without the ClearML Server in-between.
You mean the upload/download is slow? What is the reasoning behind removing the ClearML server?
ClearML Agent per step
You can use the ClearML Agent to build a docker per Task, so all you need is just to run the docker. Will that help?
Let me verify something in the code,
So I can set output_uri = "s3://<bucket_name>/prefix" and the local models will be loaded into the s3 bucket by ClearML ?
Yes, magic 🙂
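For example, a minimal sketch (bucket name and project/task names are placeholders, and it assumes your S3 credentials are already configured in clearml.conf):
from clearml import Task

# any model your framework saves (e.g. torch.save / joblib.dump) is now
# uploaded to the bucket automatically instead of only logging the local path
task = Task.init(
    project_name="examples",
    task_name="model upload demo",
    output_uri="s3://<bucket_name>/prefix",
)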
Hi ConvolutedBee40
If we deploy a task to clearml-server, will it automatically scale?
The way it works is with agents and the agent glue: basically k8s is used as the resource allocator and the ClearML agent as the orchestrator. Did that answer the question?
Even if you had any packages, I'm pretty sure there is nothing for you to worry about; it will just list them, and if they are preinstalled, the preinstalled ones will be used.
Let's assume the host has a folder for all users for persistent storage, for example '/mnt/user_data/', and you have a user named 'myuser' with a matching subfolder '/mnt/user_data/myuser'.
Then we can do:
clearml-session ... --docker "my_docker_image -v /mnt/user_data/:/host_mount/" --user-folder "/host_mount/myuser"
BTW: the next time you call clearml-session, these will become the default parameters, so no need to change anything 🙂
Hmm, might be, check if your files server is running and configured properly
Hey LethalDolphin75, when it works, could you PR it?
Hi GiganticTurtle0
The main issue is the cache=True: it will cause the second call to the function to essentially reuse the Task, ending with the same result. Can you test with cache=False in the decorator?
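Something like this, as a quick sanity check (the component name and return value are only illustrative):
from clearml import PipelineDecorator

# cache=False forces the component to actually re-run
# instead of reusing the previous Task's cached result
@PipelineDecorator.component(cache=False, return_values=["result"])
def my_step(param):
    return param * 2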
Hi RoughTiger69
A. Yes, makes total sense. Basically you can use Task.export / Task.import to achieve this (notice we assume the dataset artifact links are accessible from both servers; usually this is the case).
B. The easiest way would be to use subprocesses: one subprocess exports from dev, with the credentials and configuration passed via the OS environment, and another subprocess imports it into the prod server (again with the OS environment pointing to the prod server). Make sense?
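A rough sketch of the idea (hostnames, keys, task id and the helper script names are all hypothetical):
import os
import subprocess
import sys

# export_task.py would do:  print(json.dumps(Task.get_task(task_id=sys.argv[1]).export_task()))
# import_task.py would do:  Task.import_task(json.loads(sys.stdin.read()))

dev_env = dict(os.environ,
               CLEARML_API_HOST="https://dev-api.example.com",
               CLEARML_API_ACCESS_KEY="<dev_key>",
               CLEARML_API_SECRET_KEY="<dev_secret>")
prod_env = dict(os.environ,
                CLEARML_API_HOST="https://prod-api.example.com",
                CLEARML_API_ACCESS_KEY="<prod_key>",
                CLEARML_API_SECRET_KEY="<prod_secret>")

# subprocess 1: export the Task from the dev server
exported = subprocess.run(
    [sys.executable, "export_task.py", "<task_id>"],
    env=dev_env, capture_output=True, text=True, check=True).stdout

# subprocess 2: import it into the prod server
subprocess.run([sys.executable, "import_task.py"],
               env=prod_env, input=exported, text=True, check=True)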
then will clearml associate that image with my experiment and always use that image with it,
when you say "agent to use my docker image," I'm assuming you mean the configuration file or the --docker argument; in both cases this means the default container.
This means that if the Task does not specify a docker, it will use the one you set in the conf/argument, but Tasks can always specify a different docker to use, and the agent will pull the requested docker based on the Task's entry.
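For example, from the Task itself (the image name is only an example, and I'm assuming the docker_image keyword; the same field can also be edited in the UI):
from clearml import Task

task = Task.init(project_name="examples", task_name="custom docker demo")
# when an agent runs this Task it will pull this image instead of its default container
task.set_base_docker(docker_image="nvcr.io/nvidia/pytorch:23.03-py3")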
Eve...
Can you let me know if I can override the docker image using template.yaml?
No, you cannot.
But you can pass the OS environment variable "CLEARML_DOCKER_IMAGE" to set a different default one.
Once a model is saved and published, it should be downloadable, right?
Well, that depends on whether you configured ClearML to auto-upload it (by default it will just log the "local location").
To auto-upload, add output_uri=True to Task.init (or specify a destination with output_uri="s3://bucket/").
You can also configure it as default here:
https://github.com/allegroai/clearml/blob/65f1c0baa124efb05fb7894a5386f0dd52c0536b/docs/clearml.conf#L163
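i.e. something along these lines in clearml.conf (the bucket/prefix is a placeholder):
sdk {
    development {
        # auto-upload destination for all models/artifacts
        default_output_uri: "s3://<bucket_name>/prefix"
    }
}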
Is it possible to launch a task from Machine C to the queue that Machine B's agent is listening to?
Yes, that's the idea
Do I have to have anything installed (aside from the trains PIP package) on Machine C to do so?
Nothing, pure magic 🙂
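A minimal sketch of launching from Machine C (the queue name is just an example; with the older trains package the import would be from trains instead of clearml):
from clearml import Task

task = Task.init(project_name="examples", task_name="remote run")
# stop local execution and enqueue the Task on the queue Machine B's agent is listening to
task.execute_remotely(queue_name="default", exit_process=True)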
What's the OS / Python version?
Thanks GrievingTurkey78
Sure, just PR it (should work with any Python/Hydra version):
kwargs['config'] = config
kwargs['task_function'] = partial(PatchHydra._patched_task_function, task_function,)
result = PatchHydra._original_run_job(*args, **kwargs)
So I think it makes more sense in this case to work with the former.
Totally!
Thank you for saying! 🙂
BroadMole98 thank you for noticing!
I'll make sure it is fixed (a few other properties are also missing there, not sure why, I'll ask them to take a look)
Python 3.8, I can quickly check, give me a minute.
Our remote machine is Windows 10
JumpyDragonfly13 seems like the Windows 10 + docker is the issue (that would explain the OCI error)
Is this relevant ?
https://github.com/microsoft/WSL/issues/5100
Hi PompousBeetle71, Trains will log all the torch.save calls; I'm assuming it is not actually used for the rest of the files in that folder.
If you'd like to share a code snippet, we could see if we could auto-magically log it. You could use artifacts and store the entire folder; it will zip it and upload it. Then you can reuse it from other experiments. https://allegro.ai/docs/task.html?highlight=artifact#trains.task.Task.upload_artifact
Example:
task.upload_artifact('transformer', './my_...
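A fuller sketch of the same idea (folder and names are only examples):
from clearml import Task

task = Task.init(project_name="examples", task_name="artifact folder demo")
# passing a folder: it is zipped and uploaded as a single artifact
task.upload_artifact(name="transformer", artifact_object="./my_transformer_dir/")

# later, from another experiment, fetch and unpack it
prev = Task.get_task(project_name="examples", task_name="artifact folder demo")
local_copy = prev.artifacts["transformer"].get_local_copy()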
Any idea where that could come from? Could we turn off the local logging as well - in these kinds of runs we don't need it?
It is supposed to create it automatically... I tested with other examples (clearml version 1.7.3rc1) and everything seems to work.
What am I missing? How do we recreate the issue? Can you verify it is still not working with the latest RC?
It also seems that PipelineDecorator.upload_artifact is not compatible with caching, sadly.
Both use the exact same mechanism for uploading artifacts (i.e. including caching for downloaded artifacts). In terms of caching pipeline components, this works at the component level (i.e. same code/task + same arguments = cache hit).
What exactly are you getting? How is it that "PipelineDecorator.upload_artifact" uploads to a different storage? Is that reproducible?
Hi ReassuredOwl55
The easiest is to configure it as the default output_uri in the clearml.conf file of the agent, wdyt?
https://github.com/allegroai/clearml-agent/blob/ebb955187dea384f574a52d059c02e16a49aeead/docs/clearml.conf#L430
Hi @<1643060801088524288:profile|HarebrainedOstrich43>
You are absolutely correct, we just fixed nested decorators in pipelines a week ago; let me check if the RC with the fix is already out.
Hi RipeGoose2
I think it "should" take care of uploading the artifacts as well (they are included in the zip file created by the offline package).
Notice that the "default_output_uri" on the remote machine is meaningless, as it stores everything locally anyhow. It will only have an effect on the machine that actually imports the offline session.
Make sense ?
Hi @<1585078763312386048:profile|ArrogantButterfly10>
Now I want to clone the pipeline and change the hyperparameters of the train task, is it possible? If so, how?
The pipeline arguments are for the pipeline DAG/logic; you need to pass one of them as an argument to the training step/task (see the sketch below). Make sense?
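Something like this (the step/argument names are only illustrative):
from clearml import PipelineDecorator

@PipelineDecorator.component(return_values=["model_path"])
def train(learning_rate):
    # hypothetical training step
    print(f"training with lr={learning_rate}")
    return "model.pt"

@PipelineDecorator.pipeline(name="training pipeline", project="examples", version="1.0")
def pipeline_logic(learning_rate=0.001):
    # the pipeline argument is forwarded to the training step, so cloning the
    # pipeline and editing learning_rate also changes the step's hyperparameter
    return train(learning_rate=learning_rate)

if __name__ == "__main__":
    PipelineDecorator.run_locally()  # for quick local debugging; remove to run through the queue
    pipeline_logic()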