So I can set output_uri = "s3://<bucket_name>/prefix" and the local models will be loaded into the s3 bucket by ClearML ?
Yes, magic 🙂
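To make this concrete, here is a rough plain-Python sketch of where a framework-saved model ends up once output_uri is set on Task.init. The path layout used here (project/task_name.task_id/models/) is an assumption for illustration only, not the documented API:

```python
def remote_model_uri(output_uri, project, task_name, task_id, filename):
    """Illustrative sketch only: build the remote destination a locally
    saved model would be uploaded to when output_uri is set.
    The exact layout is an assumption, not ClearML's documented behavior."""
    return f"{output_uri.rstrip('/')}/{project}/{task_name}.{task_id}/models/{filename}"

# Hypothetical values, purely for demonstration:
print(remote_model_uri("s3://my-bucket/prefix", "MyProject", "train", "abc123", "model.pt"))
```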
We do upload the final model manually.
wait you said upload manually, and now you are saying "saved automatically", I'm confused.
Hi TastyOwl44
So this depends on your code itself, but usually you need a CPU machine to run the ClearML server (or use the free community server), then a machine to run the pipeline controller (usually the same machine running the clearml-server, as the pipeline control code is basically a controller only and does not execute the Task itself), and lastly you need machines with GPUs running the clearml-agent (these GPU machines are the ones actually doing the training, inference, etc.)
Okay so my thinking is, on the PipelineController / decorator we will have: abort_all_running_steps_on_failure=False (if True, on a step failing it will abort all running steps and leave).
Then per step / component decorator we will have: continue_pipeline_on_failure=False (if True, on a step failing, the rest of the pipeline DAG will continue).
GiganticTurtle0 wdyt?
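To make the semantics of the two proposed flags concrete, here is a toy plain-Python simulation. The flag names are taken from the discussion above; this is NOT the ClearML API, just a sketch of how the two levels of control would interact:

```python
def run_pipeline(steps, abort_all_running_steps_on_failure=False):
    """steps: list of (name, callable, continue_pipeline_on_failure).

    Toy sequential stand-in for a pipeline DAG runner, illustrating
    the proposed controller-level and per-step failure flags."""
    completed, aborted = [], []
    for name, fn, continue_on_failure in steps:
        try:
            fn()
            completed.append(name)
        except Exception:
            if abort_all_running_steps_on_failure:
                # controller-level flag: abort everything still pending and leave
                aborted = [s for s, _, _ in steps if s not in completed and s != name]
                break
            if continue_on_failure:
                # per-step flag: this step failed, but the rest of the DAG continues
                continue
            break
    return completed, aborted

def ok(): pass
def boom(): raise RuntimeError("step failed")

# "b" fails but is marked continue_pipeline_on_failure=True, so "c" still runs
print(run_pipeline([("a", ok, False), ("b", boom, True), ("c", ok, False)]))
```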
UnsightlyShark53 See if this one solves the problem :)
BTW: the reasoning for the message is that when running the task with "trains-agent", if the parsing of the argparser happens before the Task is initialized, the patching code doesn't know if it is supposed to override the values. But this scenario was fixed a long time ago, and I think the error was mistakenly left behind...
Hi UnsightlyShark53 apologies for this delayed reply, slack doesn't alert users unless you add @ , so things sometimes get lost :(
I think you pointed at the correct culprit...
Did you manage to overcome the circular include?
BTW , how could I reproduce it? It will be nice if we could solve it
no available 🙂
I want that last python program to be executed with the environment that was created by the agent for this specific task
Well basically they all inherit the Python environment that points to the venv they started from, so at least in theory it should be transparent when the agent is spinning the initial process.
I eventually found a different way of achieving what I needed
Now I'm curious, what did you end up doing ?
JitteryCoyote63 you mean in runtime where the agent is installing? I'm not sure I fully understand the use case?!
Hi! I was wondering why ClearML recognizes Scikit-learn scalers as Input Models...
Hi GiganticTurtle0
any joblib.load/save is logged by clearml (it cannot actually differentiate what it is used for ...)
You can of course disable it with Task.init(..., auto_connect_frameworks={'joblib': False})
in my repo I maintain a bash script to setup a separate python env.
Hmm interesting, now I have to wonder what is the difference ? meaning why doesn't the agent build a similar one based on the requirements ?
RipeGoose2 models are automatically registered
i.e. added to the models artifactory, but it only points to where the files are stored
They will only actually be uploaded if you are passing the output_uri argument to Task.init.
If you want to disable this behavior you can pass Task.init(..., auto_connect_frameworks={'pytorch': False})
Hmm so there is a way to add callbacks (somewhat cumbersome, and we would love feedback) so you can filter them out.
What do you think, would that work?
We workaround the issue by downloading the file with a request and unzipping only when needed.
We have located the issue, it seems the file-server is changing the header when sending back the file (basically saying CSV with gzip compression, which in turn will cause any http download client to automatically unzip the content). Working on a hot fix for it 🙂
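The workaround mentioned above (download the raw bytes, unzip only when needed) can be sketched in plain Python. This is an illustration of the idea, not the actual code from the thread; it detects a gzip stream by its magic bytes instead of trusting the Content-Encoding header:

```python
import gzip

def maybe_decompress(raw: bytes) -> bytes:
    """Return decompressed bytes if `raw` is a gzip stream, else return as-is.
    gzip streams always start with the two magic bytes 0x1f 0x8b."""
    if raw[:2] == b"\x1f\x8b":
        return gzip.decompress(raw)
    return raw

payload = b"col1,col2\n1,2\n"
compressed = gzip.compress(payload)

# Works whether or not the server (or the HTTP client) already unzipped the body
print(maybe_decompress(compressed) == payload)
print(maybe_decompress(payload) == payload)
```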
Hi GiddyPeacock64
If you already have K8s set up, and are already using ClearML:
In your Kubeflow YAML: trains-agent execute --id <task_id> --full-monitoring
This will install everything your Task needs inside the docker. Just make sure that you pass the env variables configuring ClearML, see here:
https://github.com/allegroai/clearml-server/blob/6434f1028e6e7fd2479b22fe553f7bca3f8a716f/docker/docker-compose.yml#L127
Is it possible for two agents to be utilizing the same GPU?
It is, as long as memory wise they do not limit one another.
(If you are using k8s and clearml enterprise, then it supports GPU slicing and dynamic memory allocation)
So if any step corresponding to 'inference_orchestrator_1' fails, then 'inference_orchestrator_2' keeps running.
GiganticTurtle0 I'm not sure it makes sense to halt the entire pipeline if one step fails.
That said, how about using the post_execution callback, then check if the step failed, you could stop the entire pipeline (and any running steps), what do you think?
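The callback idea above can be sketched in plain Python. This is a toy stand-in, not the ClearML PipelineController API: a per-step post-execution callback inspects the step's status and halts the whole pipeline on failure:

```python
class Pipeline:
    """Toy sequential pipeline used only to illustrate the callback idea."""

    def __init__(self):
        self.stopped = False
        self.executed = []

    def run(self, steps, post_execution=None):
        for name, fn in steps:
            if self.stopped:
                break  # a callback asked us to stop; skip remaining steps
            try:
                fn()
                status = "completed"
            except Exception:
                status = "failed"
            self.executed.append((name, status))
            if post_execution:
                post_execution(self, name, status)

def stop_on_failure(pipeline, step_name, status):
    # the post-execution callback checks the step status
    # and stops the entire pipeline if the step failed
    if status == "failed":
        pipeline.stopped = True

p = Pipeline()
p.run([("a", lambda: None), ("b", lambda: 1 / 0), ("c", lambda: None)],
      post_execution=stop_on_failure)
print(p.executed)  # "c" never runs because "b" failed
```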
Or maybe you could bundle some parameters that belong to PipelineDecorator.component into a high-level configuration variable (something like PipelineDecorator.global_config (?))
So in the PipelineController we have a per step callback and generic callbacks (i.e. for all the steps), is this what you are referring to ?
Well, I can see the difference here. Using the new pipelines generation the user has the flexibility to play with the returned values of each step.
Yep 🙂
GiganticTurtle0 My apologies, I made a mistake, this will not work 🙁
In the example above "step_two" is executed "instantaneously" , meaning it is just launching the remote task, it is not actually waiting for it.
This means an exception will not be raised in the "correct" context (actually it will be raised in a background thread).
That means that I think we have to have a callback function, otherwise there is no actual way to catch the failed pipeline task.
The new parameter abort_on_failed_steps could be a list containing the name of the ...
I like that, we can also have it as an argument per step (i.e. the decorator can say, abort_pipeline_on_fail or continue_pipeline_processing)
Hi RoundMosquito25
Hi, are there available somewhere examples of testing in ClearML? For example unit tests that check if parameters are passed correctly to new tasks etc.?
What do you mean by "testing in ClearML" ?
For example unit tests that check if parameters are passed correctly
Passed where / how? Are we thinking agents here ?
is the model overridden or its version is automatically increased?
You will have another model, with the same name (assuming the second Task has the same name), but a new ID. So if I understand you correctly, we have auto-versioning :)
Understood, trains does not have auto versioning
What do you mean auto versioning ?
task name is not unique, task ID is unique, you can have multiple tasks with the same name and you can edit the name post execution
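The name-vs-ID behavior described above can be illustrated with a toy registry in plain Python (not ClearML code): registering two models with the same name yields two distinct IDs, and both remain retrievable by name:

```python
import itertools

class ModelRegistry:
    """Toy illustration: names are not unique, every registration gets a fresh ID."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._models = []

    def register(self, name):
        model_id = next(self._ids)
        self._models.append({"name": name, "id": model_id})
        return model_id

    def by_name(self, name):
        # multiple entries can share the same name
        return [m for m in self._models if m["name"] == name]

reg = ModelRegistry()
first = reg.register("my_model")
second = reg.register("my_model")   # same name, new ID
print(first, second, len(reg.by_name("my_model")))
```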
Thanks DefeatedOstrich93
Let me check if I can reproduce it.
But the git apply failed, the error message is the "xxx already exists in working directory" (xxx is the name of the untracked file)
DefeatedOstrich93 what's the clearml-agent version?
GrievingTurkey78 I see,
Basically the arguments after the -m src.train in the remote execution should be ignored (they are not needed).
Change the m in the Args section under the configuration. Let me know if it solved it.
Okay how do I reproduce it ?