GreasyPenguin66 Nice !!!
Very cool setup, and kudos on making it work with multiple users!
Quick question, shouldn't the JUPYTERHUB_API_TOKEN env variable be enough to gain access to the server? Why did you need to add it to the 'nbserver-x.json' as well?
I guess this is from clearml-server and seems to be bottlenecking artifact transfer speed.
I'm assuming you need multiple "file-server" instances running on the "clearml-server" with a load-balancer of a sort...
Not sure of the cause, but if you do:
mp.set_start_method('fork', force=True)
There is no semaphore leakage
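A minimal sketch of the idea (assuming a standard multiprocessing Pool next to Task.init; names are just for illustration):
import multiprocessing as mp
from clearml import Task

def square(x):
    return x * x

if __name__ == "__main__":
    # force the 'fork' start method before any pools/processes are created
    # (fork is only available on Linux/macOS)
    mp.set_start_method('fork', force=True)
    task = Task.init(project_name="examples", task_name="mp fork test")
    with mp.Pool(4) as pool:
        print(pool.map(square, range(8)))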
Hi @<1610083503607648256:profile|DiminutiveToad80>
This depends on how you configure the agents in your clearml.conf
You can use HTTPS if user/pass are configured, or you can force SSH, in which case the agent will auto-mount your host's SSH folder into the container and use it.
[None](https://github.com/allegroai/clearml-agent/blob/0254279ed5987fbc69cebae245efaea33aec1ff2/docs/cl...
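For reference, a rough sketch of the relevant agent settings in clearml.conf (values are placeholders; use either the user/pass pair or the SSH option):
agent {
    # HTTPS authentication for git (username + password / personal access token)
    git_user: "myuser"
    git_pass: "mytoken"
    # or force SSH - the agent will mount the host's ~/.ssh into the container
    force_git_ssh_protocol: true
}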
PlainSquid19 I will look into it as well.
maybe for some reason model.keras_model.save_weights
is not caught ...
So far my local and remote GitLab repositories are synchronized. I suspect that the
Failed applying git diff, see diff above
error is caused by a cached repository from which ClearML tries to run the process. I've cleaned the cache, but it hasn't helped.
Hmm can you test with empty "uncommitted changes" ?
Just making sure: when you say it still doesn't work, you are not trying to run the Task with a git diff that includes the binary data, right?
Yes, but where I can fi...
CloudyHamster42 you mean that when you set sdk.metrics.tensorboard_single_series_per_graph
to True and you rerun the experiment, you are still getting multiple series on the same graph?
What's your Trains version?
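For reference, the setting in question lives in the sdk section of the configuration file, something like this (a sketch, adjust to your setup):
sdk {
    metrics {
        # when true, each TensorBoard series is reported as its own single-series graph
        tensorboard_single_series_per_graph: true
    }
}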
Each of these steps,
[2], [3], [4], [5 & 6]
can be thought of as independent Kedro nodes that can be reused in the future. Now, how to integrate this with ClearML is unclear to us.
The same can be said for ClearML: each of these steps is a ClearML Task (with its own repo/environment)
It sounds (and I might be completely off here, so please feel free to correct me) that the main use for Kedro is the nice web UI of the pipeline (which I agree looks very cool).
Th...
SlipperyDove40 following up on the missing section name, this seems like a backwards compatibility issue. Try calling with backwards_compatibility=False
my_params = Task.get_parameters(backwards_compatibility=False)
This should always add the section name prefix.
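A quick sketch of what I mean (project/parameter names here are just for illustration):
from clearml import Task

task = Task.init(project_name="examples", task_name="params demo")
task.connect({"batch_size": 32})

# with backwards_compatibility=False the returned keys keep the section prefix,
# e.g. "General/batch_size" instead of just "batch_size"
my_params = task.get_parameters(backwards_compatibility=False)
print(my_params)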
Hi @<1610083503607648256:profile|DiminutiveToad80>
I think we will need more context for the log...
but I think there is something wrong with the GCP resource configuration of your autoscaler
Can you send the full autoscaler log and the configuration ?
Generally speaking, the agent will convert the repo URL to the auth scheme it is configured with: ssh->http if using user/pass, and http->ssh if using SSH
Where exactly are the model files stored on the pod?
clearml cache folder, usually under ~/.clearml
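If you need to relocate it, the cache base folder can be set in clearml.conf (the path below is the default; just a sketch):
sdk {
    storage {
        cache {
            # base folder used to cache downloaded models and artifacts
            default_base_dir: "~/.clearml/cache"
        }
    }
}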
Currently I encounter the problem that I always get a 404 HTTP error when I try to access the model via the...
How are you deploying it? I would start by debugging: run everything in the docker-compose (single machine), make sure everything is working, and then deploy to the cluster
(because on a cluster level, it could be a general routing issue, way before getting t...
Hi @<1649221394904387584:profile|RattySparrow90>
: Are the models I defined to be served e.g. via the CLI downloaded to the serving pod
Yes, this is done automatically and online (i.e. when you update using the CLI/API), based on the models/endpoints you set
So that they are physically lying there as a file I can see in the filesystem?
They are, and cached there
Or is it more the case that the pod gets the model when needed/when an API call for this model is incoming?
I...
As long as you import clearml in the main script, it should work. Regarding the Nvidia container, it should not interfere with any running processes; the only issue is the memory limit. BTW any reason not to spin an agent on a dedicated machine? What is the GPU used for on the clearml server machine?
PleasantGiraffe85 can you send examples of the different git repo links (one internal one public) ?
JitteryCoyote63
Yes, this is extremely annoying. I think it was updated on the community server, let me check if we deployed a new docker with a fix ...
RobustGoldfish9 I see.
So in theory spinning an experiment on an agent would be clone code -> build docker -> mount code -> execute code inside docker?
(no need for requirements etc.?)
can someone show me an example of how
PipelineController.create_draft
I think the idea is to store a draft version of the pipeline (not the decorator type, I think, but the one launching pre-executed Tasks).
GiganticTurtle0 I'm not sure I fully understand how / why you are using it, can you expand?
EDIT:
However, my intention is ONLY to create it to be executed later on.
Hmm, so maybe like enqueue it?
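Something along these lines, assuming the steps are pre-executed Tasks (all project/task names below are placeholders, so adjust to your setup):
from clearml import PipelineController

pipe = PipelineController(name="pipeline demo", project="examples", version="1.0.0")
pipe.add_step(
    name="step_1",
    base_task_project="examples",
    base_task_name="step 1 base task",
)
pipe.add_step(
    name="step_2",
    parents=["step_1"],
    base_task_project="examples",
    base_task_name="step 2 base task",
)
# store the pipeline as a draft Task to be enqueued/executed later, instead of starting it now
pipe.create_draft()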
GiganticTurtle0 fix was just pushed to GitHub 🙂
pip install git+
Ex: Expecting value: line 1 column 1 (char 0)
K8S Glue pods monitor: Failed parsing kubectl output:
Run with --debug as the first parameter
Are you running the latest from the git repo ?
Also in the same open docker session, can you try:
$LOCAL_PYTHON -m clearml_agent execute --disable-monitoring --id <task_id_here>
Where the Task ID is one of the failed executions (only reset it before)
Yes, you are too quick for the resource monitoring 🙂
Ohh sorry, you will also need to fix the def _patched_task_function
The parameter order is important as the partial call relies on it.
This is odd, how are you spinning clearml-serving ?
You can also do it synchronously:
predict_a = self.send_request(endpoint="/test_model_sklearn_a/", version=None, data=data)
predict_b = self.send_request(endpoint="/test_model_sklearn_b/", version=None, data=data)
I'm assuming some package imports absl (the TF defines package) and that's the reason you see the TF defines. Does that make sense?
Hi UnsightlyHorse88
Hmm, try adding to your clearml.conf file:
agent.cpu_only = true
if that does not work, try adding to the OS environment:
export CLEARML_CPU_ONLY=1