Any specific use case for the required "draft" mode?
Okay, I think I lost you...
DilapidatedDucks58 you mean detect at which "iteration" the max value was reported, and then extract all the other metrics for that iteration ?
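If so, a minimal sketch of that idea (the per-iteration history shape here is hypothetical; with ClearML you would first pull the scalar series from the Task):

```python
def metrics_at_best(history, key):
    """Given a list of per-iteration metric dicts, return the full
    dict for the iteration where `key` is maximal."""
    return max(history, key=lambda row: row[key])

# Hypothetical history, e.g. assembled from a task's reported scalars
history = [
    {"iteration": 0, "accuracy": 0.71, "loss": 0.90},
    {"iteration": 1, "accuracy": 0.84, "loss": 0.55},
    {"iteration": 2, "accuracy": 0.79, "loss": 0.60},
]
best = metrics_at_best(history, "accuracy")
```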
Are you running a jupyter notebook inside vscode ?
I assume so 🙂 Datasets are kind of agnostic to the data itself, for the Dataset it's basically a file hierarchy
Failing when passing the diff to the git command...
WackyRabbit7 hmmm, seems like a non-regular character inside the diff.
Let me check something
Is there any known issue with Amazon SageMaker and ClearML?
On the contrary it actually works better on Sagemaker...
Here is what I did on SageMaker:
- Created a new SageMaker instance
- Opened Jupyter notebook
- Started a new notebook (conda_python3 / conda_py3_pytorch)
Then I just did "!pip install clearml" and Task.init
Is there any difference ?
Hi BitterStarfish58
What's the clearml version you are using ?
dataset upload both work fine
Artifacts / Datasets are uploaded correctly ?
Can you test if it works if you change " http://files.community.clear.ml " to " http://files.clear.ml " ?
DefeatedCrab47 yes that is correct. I actually meant if you see it on TensorBoard's UI 🙂
Anyhow, if it is there, you should find it under the Task's Results > Debug Samples
CrookedWalrus33 any chance you can think of a sample code to reproduce?
I think this all ties into the non-standard git repo definition. I cannot find any other reason for it. Is it actually stuck for 5 min at the end of the process, waiting for the repo detection?
JitteryCoyote63 to filter out archived tasks (i.e. exclude archived tasks):
Task.get_tasks(project_name="my-project", task_name="my-task", task_filter=dict(system_tags=["-archived"]))
Once a model is saved and published, it should be downloadable, right?
Well, that depends on whether you configured ClearML to auto-upload it (by default it will just log the "local location").
To auto-upload, add output_uri=True to Task.init
(or specify a destination with output_uri="s3://bucket/")
You can also configure it as default here:
https://github.com/allegroai/clearml/blob/65f1c0baa124efb05fb7894a5386f0dd52c0536b/docs/clearml.conf#L163
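A sketch of the relevant clearml.conf section for that default (key name taken from the linked file; the bucket path is a placeholder):

```
sdk {
    development {
        # Default destination for uploaded model snapshots / artifacts
        default_output_uri: "s3://bucket/folder"
    }
}
```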
Yes, no reason to attach the second one (imho)
Let me know if there is an issue 🙂
I was unable to reproduce, but I added a few safety checks. I'll make sure they are available on the master in a few minutes, could you maybe rerun after?
The AWS autoscaler will work with IAM roles as long as they are configured on the machine itself. For SageMaker job scheduling (I'm assuming this is what you are referring to, and not the notebook) you need to select the instance type as well (basically the same as EC2). What do you mean by using the k8s glue, like inherit and implement the same mechanism but for SageMaker instead of kubectl?
Will using Model.remove completely delete from storage as well?
Correct, see the argument delete_weights_file=True
We should probably add set_task_type 🙂
Assuming the git repo looks something like:
.git
readme.txt
module
  +---- script.py
The working directory should be "."
The script path should be: "-m module.script"
And under Configuration/Args, you should have:
args1 = value
args2 = another_value
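Those Args entries mirror the script's own command-line arguments; a minimal sketch of what module/script.py might contain (argument names taken from the Args above, defaults hypothetical):

```python
import argparse

def parse_args(argv=None):
    # ClearML's argparse integration picks these up and shows them
    # under the Task's Configuration/Args section
    parser = argparse.ArgumentParser()
    parser.add_argument("--args1", default="value")
    parser.add_argument("--args2", default="another_value")
    return parser.parse_args(argv)

if __name__ == "__main__":
    args = parse_args()
    print(args.args1, args.args2)
```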
Make sense?
BTW: how did it get there ?
Hi FranticCormorant35
So Tasks have parent field, that would link one to another.
Unfortunately there is no visual representation for it.
What we did with the hyper-parameter for example, was also to add a tag with the ID of the "parent" Task. This would make sense if you have multiple tasks all generated from the same "parent", like in hyper-parameter optimization.
What's your use case? Is it a single evaluation Task per training, or multiple, cron-job alike?
Thanks EnviousStarfish54 we are working on moving them there!
BTW, in the meantime, please feel free to open a GitHub issue under train, at least until they are moved (hopefully end of Sept).
But adding a simple force_download flag to get_local_copy sounds like a good idea
I think that just backing up /opt/clearml and moving it should be just fine 🤔
Hi @<1554275779167129600:profile|ProudCrocodile47>
Do you mean @ clearml.io ?
If so, then this is the same domain (.ml is sometimes flagged as spam, I'm assuming this is why they use it)
BTW: the agent will resolve pytorch based on the installed CUDA version.
So a bit of explanation on how conda is supported. First, conda is not recommended; the reason is that it is very easy to create a conda setup that conda itself cannot reproduce (yes, exactly that). So what trains-agent does is first try to install all the packages it can with conda (not one by one, because that would break conda dependencies), then the packages it failed to install from conda it will install using pip.
Thanks for the details TroubledJellyfish71 !
So the agent should have automatically resolved this line:
torch == 1.11.0+cu113
into the correct torch version (based on the cuda version installed, or cpu version if no cuda is installed)
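Purely to illustrate how such a pin decomposes (this is not the agent's actual code), the "+cu113" part is a local version suffix that can be split off from the package name and version:

```python
import re

def split_requirement(line):
    """Split a pip pin like 'torch == 1.11.0+cu113' into
    (package, version, local_suffix). Illustrative only - the real
    clearml-agent resolution logic is more involved."""
    m = re.match(r"\s*([\w-]+)\s*==\s*([\d.]+)(?:\+(\w+))?\s*$", line)
    return (m.group(1), m.group(2), m.group(3)) if m else None
```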
Can you send the Task log (console) as executed by the agent (and failed)?
(you can DM it to me, so it's not public)