Now in case I needed to do it, can I add new parameters to the cloned experiment, or will these get deleted?
Adding new parameters is supported 🙂
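For example, a minimal sketch (the task ID, parameter name and queue name below are just placeholders):

    from clearml import Task

    # clone an existing experiment
    cloned = Task.clone(source_task='<source_task_id>', name='clone with extra params')
    # add a brand-new parameter on top of the cloned ones
    cloned.set_parameter('Args/new_param', 42)
    # send the clone to an agent
    Task.enqueue(cloned, queue_name='default')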
1.0.1 is only for the clearml python client, no need for a server upgrade (or agent)
(some packages that are not inside the cache seem to be missing and then everything fails)
How did that happen?
Hi BattyLion34
script_a.py generates file test.json in project folder
So let's assume "script_a" generates something and puts it under /tmp/my_data
Then it can create a dataset from the folder /tmp/my_data, with Dataset.create() -> Dataset.sync -> Dataset.upload -> Dataset.finalize
See example: https://github.com/alguchg/clearml-demo/blob/main/process_dataset.py
Then "script_b" can get a copy of the dataset using "Dataset.get()", see examp...
Hi MoodyCentipede68, I think I saw something like it, can you post the full log? The Triton error is above; also, I think it restarted the container automatically and then it worked
Long story short, not any longer (in previous versions of k8s it was possible, but after the runtime container change it is not supported)
Hi WickedGoat98
"Failed uploading to //:8081/files_server:"
Seems like the problem. What do you have defined as files_server in the trains.conf?
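For reference, the files_server is set in the api section of trains.conf / clearml.conf; the host and ports below are just an example:

    api {
        web_server: http://localhost:8080
        api_server: http://localhost:8008
        files_server: http://localhost:8081
    }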
Wait, who is creating this file? I thought you removed it in the uncommitted changes
A single query will return if the agent is running anything, and for how long, but I do not think you can get the idle time ...
if executed remotely...
You mean cloning the local execution, sending to the agent, then when running on the agent the Args/command is updated to a list ?
Does clearml resolve the CUDA Version from driver or conda?
Actually it starts with the default CUDA based on the host driver, but when it installs the conda env it takes it from the "installed packages" (i.e. the one you used to execute the code in the first place)
Regarding the link, I could not find the exact version but this is close enough I guess:
None
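If you ever need to pin it explicitly, the agent section of clearml.conf usually lets you force the versions (values below are just an example):

    agent {
        # force the CUDA/cuDNN version used for package resolution
        cuda_version: 11.1
        cudnn_version: 8.0
    }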
Sure, ReassuredTiger98 just add them after the docker image in the "Base Docker image" section under the execution Tab. The same applies for setting it from code.
example: nvcr.io/nvidia/tensorflow:20.11-tf2-py3 -v /mnt/data:/mnt/data
You can also always force extra docker run arguments by changing the clearml.conf on the agent itself:
https://github.com/allegroai/clearml-agent/blob/822984301889327ae1a703ffdc56470ad006a951/docs/clearml.conf#L121
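For example, in the agent's clearml.conf (the mount below is just an illustration):

    agent {
        # extra arguments added to every docker run this agent launches
        extra_docker_arguments: ["-v", "/mnt/data:/mnt/data"]
    }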
Hi @<1730758665054457856:profile|MysteriousCrab4>
do I get to have the autoscaler feature,
You have the open source one here: None
In the managed Pro tier you have the fancy UI AWS/GCP autoscaler (with some extra features)
And there is the Scale/Enterprise tiers with more sophisticated features like Vault on top of that
Hurray conda.
Notice it does include cudatoolkit, but conda ignores it
cudatoolkit~=11.1.1
Can you test the same one, only search and replace ~= with == ?
How so? Installing a local package should work, what am I missing?
so I wanted to keep our "fork" of the autoscaler but I guess this is not supported.
you are correct 🙂
I wonder, " I customized it a bit to our workflow
" what did you add?
Well, if the "video" from TB is not in mp4/gif format then someone will have to encode it.
I was just pointing out that for the encoding part we might need an additional package
Okay yes, that's exactly the reason!! Cross origin blocks the file link
without the ClearML Server in-between.
You mean the upload/download is slow? What is the reasoning behind removing the ClearML server ?
ClearML Agent per step
You can use the ClearML agent to build a docker per Task, so all you need to do is run the docker. Will that help?
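Something along these lines (a sketch; the task ID and image name are placeholders, and exact flags may vary between versions):

    # build a docker image containing the full execution environment of a specific Task
    clearml-agent build --id <task_id> --docker --target my-task-image

    # afterwards, running the step boils down to running that image
    docker run --rm my-task-image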
Hi @<1661180197757521920:profile|GiddyShrimp15>
I think there is a better channel for this kind of question
(they will be able to help with that)
Hi ImpressionableRaven99
Yes, it is 🙂
Call this one before Task.init, and it will run offline (at the end of the execution, you will get a link to the local zip file of the execution):
Task.set_offline(True)
Then later you can import it to the system with:
Task.import_offline_session('./my_task_aaa.zip')
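Put together, a minimal sketch of the offline flow (project/task names are placeholders):

    from clearml import Task

    # enable offline mode before Task.init
    Task.set_offline(True)
    task = Task.init(project_name='examples', task_name='offline run')
    # ... run the experiment as usual, everything is recorded locally ...
    task.close()

    # later, on a machine with server access, import the produced zip
    Task.import_offline_session('./my_task_aaa.zip')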
The cool thing about using the trains-agent is that you can change any experiment parameter and automate the process, so you get hyper-parameter optimization out of the box, and you can build complicated pipelines
https://github.com/allegroai/trains/tree/master/examples/optimization/hyper-parameter-optimization
https://github.com/allegroai/trains/blob/master/examples/automation/task_piping_example.py
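With the current clearml package the same idea looks roughly like this (a sketch; the base task ID, parameter names and metric are placeholders):

    from clearml.automation import (
        HyperParameterOptimizer, UniformIntegerParameterRange, DiscreteParameterRange
    )

    optimizer = HyperParameterOptimizer(
        base_task_id='<base_task_id>',
        hyper_parameters=[
            UniformIntegerParameterRange('General/epochs', min_value=1, max_value=10),
            DiscreteParameterRange('General/batch_size', values=[32, 64, 128]),
        ],
        objective_metric_title='validation',
        objective_metric_series='accuracy',
        objective_metric_sign='max',
        execution_queue='default',
        max_number_of_concurrent_tasks=2,
    )
    optimizer.start()
    optimizer.wait()
    optimizer.stop()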
Hmmm that is a good use case to have (maybe we should have --stop get an argument ?)
Meanwhile you can do:
$ clearml-agent daemon --gpus 0 --queue default
$ clearml-agent daemon --gpus 1 --queue default
then to stop only the second one:
$ clearml-agent daemon --gpus 1 --queue default --stop
wdyt?
I remember being told that the clearml.conf on the client will not be used in a remote execution like the above, so I think this was the problem.
SubstantialElk6 the configuration should be set on the agent's machine (i.e. clearml.conf that is on the machine running the agent)
- Users have no choice of defining their own repo destination of choice.
In the UI you can specify a different destination for the models/artifacts under the "Execution" tab, Output "destination". Is this...
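If it helps, the equivalent from code is usually the output_uri argument of Task.init (a sketch; the bucket path is just a placeholder):

    from clearml import Task

    # store models/artifacts in your own destination instead of the default files server
    task = Task.init(
        project_name='examples',
        task_name='custom output destination',
        output_uri='s3://my-bucket/models',
    )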
Hmm so the SaaS service ? and when you delete (not archive) a Task it does not ask for S3 credentials when you select delete artifacts ?