Found it: definitely a bug in the callback. It has no effect on the HPO process itself
btw: you can also do cron for that:
@reboot sleep 60 && clearml-agent daemon ...
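If it helps, a complete crontab entry might look like the following (the queue name and flags here are assumptions for illustration; check `clearml-agent daemon --help` for your setup):

```
# edit with: crontab -e
# wait 60s after boot (for networking to come up), then start the agent on the "default" queue
@reboot sleep 60 && clearml-agent daemon --queue default --detached
```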
Sure thing, I'll fix the "create_draft" docstring to suggest it
It runs into the above error when I clone the task or reset it.
from here:
AssertionError: ERROR: --resume checkpoint does not exist
I assume the "internal" code state changed, and now it is looking for a file that does not exist. How would your code state change? In other words, why would it be looking for the file only when cloning? Could it be you put the state on the Task, then cloned it (i.e. cloned the exact same dict), and now the newly cloned Task "thinks" it is resuming?
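One way to make the script robust to this is to ignore a stale resume path instead of asserting. A minimal sketch, assuming the checkpoint path is a plain string parameter (the function name and behavior are my own, not a ClearML API):

```python
import os

def resolve_resume(resume_path: str) -> str:
    """Ignore a resume checkpoint path inherited from a cloned task's stored state.

    If the path was carried over by cloning but the file does not exist on this
    machine, start from scratch instead of raising an AssertionError.
    """
    if resume_path and not os.path.exists(resume_path):
        print(f"checkpoint {resume_path!r} not found, starting from scratch")
        return ""
    return resume_path
```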
I found "scheduler" on allegroai github, is it something related to the case I want to make?
MoodyCentipede68 it is exactly what you are looking for 🙂
Do notice that you need to make sure you have your services queue configured and running for that to work 🙂
Hi FierceHamster54
This is already supported. Unfortunately, the open-source version only supports static allocation (i.e. you can spin up multiple agents and connect each one to a specific number of GPUs); the dynamic option (where a single agent allocates jobs to multiple GPUs / slices) is only part of the enterprise edition
(there is the hidden assumption there that if you spent so much on a DGX you are probably not a small team 🙂)
Hi NastyOtter17
"Project" is so ambiguous
LOL yes, this is something GCP/GS is using:
https://googleapis.dev/python/storage/latest/client.html#module-google.cloud.storage.client
The agent is using Bash (but when you add a command line to the docker run, .bashrc is not executed, hence no conda in PATH)
Maybe add the full path to the conda executable:
docker_setup_bash_script = [
    "export PATH=/workspace/miniconda/bin:$PATH",
    "export LOCAL_PYTHON=/workspace/miniconda/bin/python3",
    "/workspace/miniconda/bin/conda activate /PATH_GOES_HERE"
]
Hi ThankfulOwl72 checkout TrainsJob object. It should essentially do what you need:
https://github.com/allegroai/trains/blob/master/trains/automation/job.py#L14
BoredHedgehog47 this is basically a wizard explaining the steps, see the 3 tabs 🙂
BTW, you can launch an experiment directly from CLI with clearml-task
https://clear.ml/docs/latest/docs/apps/clearml_task
🙂
I'm trying to create a task that is not in repository root folder.
JuicyFox94 if the Task is not in a repo folder, you mean it is in a remote repository, right?
This means the repo should be in the form of " https://github.com/ " or "ssh://"
It failed to deduce that this is a remote repository (maybe we can improve the auto-detection?!)
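For reference, the auto-detection essentially has to distinguish remote URL forms from local paths; a simplified illustration of the idea (not the actual clearml code):

```python
def looks_remote(repo: str) -> bool:
    """Crude check: treat these URL schemes/prefixes as a remote repository."""
    return repo.startswith(("https://", "http://", "ssh://", "git@"))
```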
Oh I see, basically a UI feature.
I'm assuming this is not just changing the x-axis in the UI, but somehow storing the x-axis as part of the reported scalars?
But do consider a sort of a designer's press kit on your page haha
That is a great idea!
Also you can use:
https://2928env351k1ylhds3wjks41-wpengine.netdna-ssl.com/wp-content/uploads/2019/11/Clear_ml_white_logo.svg
Hi WickedGoat98
"Failed uploading to //:8081/files_server:"
Seems like that's the problem. What do you have defined as files_server in trains.conf?
Things to check:
- Task.connect is called before the dictionary is actually used
- Just in case, do configs['training_configuration'] = Task.connect(configs['training_configuration'])
- Add print(configs['training_configuration']) after the Task.connect call, making sure the parameters were passed correctly
Hi @<1523701260895653888:profile|QuaintJellyfish58>
Is there a way or a trigger to detect when the number of workers in a queue reaches zero?
You mean to spin them down? What's the rationale?
Iād like to implement a notification system that alerts me when there are no workers left in the queue.
How are they "dropping" ?
Specifically to your question, let me check; I'm sure there is an API that gets that data, because you can see it in the UI 🙂
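In the meantime, the alert logic itself is simple: poll the worker count and notify on the transition to zero. A sketch under the assumption that you wire the count and the notification hook to the ClearML API and e.g. Slack yourself:

```python
def check_and_alert(worker_count: int, was_empty: bool, notify) -> bool:
    """Call notify() exactly once when the queue's worker count drops to zero.

    Returns the new "empty" state, to be passed back in on the next poll,
    so repeated polls while the queue stays empty do not re-alert.
    """
    is_empty = worker_count == 0
    if is_empty and not was_empty:
        notify()  # e.g. send a Slack/email message
    return is_empty
```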
Hi @<1523704198338711552:profile|RoughTiger69>
From this scenario can we assume the "selection" will be tagging the model manually?
Also, how would a human operator decide on the best model? That is, what is the input to base the decision on?
Hmm, this is a good question. I "think" the easiest is to mount the .ssh folder from the host to the container itself. Then also mount clearml.conf into the container with force_git_ssh_protocol: true, see here:
https://github.com/allegroai/clearml-agent/blob/6c5087e425bcc9911c78751e2a6ae3e1c0640180/docs/clearml.conf#L25
btw: ssh credentials, even though they sound more secure, are usually less so (since they easily contain too-broad credentials and other access rights), just my 2 cents 🙂 I ...
Maybe combine the two: with an unload gRPC API we could move that ability into the "preprocessing" logic, wdyt?
function and just seem to be getting an "isadirectory" error?
Can you post here what you are getting ? which clearml version are you using ?!
also tried manually adding
leap==0.4.1
in the task UI which didn't work.
That has to work, if it did not, can you send the log for the failed Task (or the Task that did not install it)?
The environment in the logs does show that leap is being installed potentially from a cache?
- leap @ file:///opt/keras-hannd...
I'm not sure this is configurable from the outside 🙂
BTW if the plots are too complicated to convert to interactive plotly graphs, they will be rendered to images and the server will show them. This is usually the case with seaborn plots
First let's try to test if everything works as expected. Since 405 really feels odd to me here. Can I suggest following one of the examples start to end to test the setup, before adding your model?
the only port configurations that will work are 8080 / 8008 / 8081
Hi @<1536518770577641472:profile|HighElk97>
Is there a way to change the smoothing algorithm?
Just like with TB, this is front-end, not really something you can control ...
That said you can report a smoothed value (i.e. via python) as additional series, wdyt ?
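For example, you could compute a TensorBoard-style exponential moving average yourself and report it as an extra series. A minimal sketch (the weight value is just an example; the commented reporting calls assume a ClearML task object):

```python
def ema_smooth(values, weight=0.6):
    """TensorBoard-style exponential moving average smoothing."""
    smoothed, last = [], values[0]
    for v in values:
        last = last * weight + (1 - weight) * v
        smoothed.append(last)
    return smoothed

# then, assuming a ClearML task, report it as an additional series:
# logger = task.get_logger()
# for i, v in enumerate(ema_smooth(raw_losses)):
#     logger.report_scalar("loss", "loss_smoothed", value=v, iteration=i)
```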
Hi GracefulDog98
The agent will map the ~/.ssh folder automatically into the docker's /root/.ssh
It will also convert http links to ssh pull if you set force_git_ssh_protocol
in your clearml.conf :
https://github.com/allegroai/clearml-agent/blob/351f0657c3dcf707659875d7e0a52fa387709978/docs/clearml.conf#L25
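Putting it together, the relevant clearml.conf fragment might look like this (the explicit .ssh mount is an example for a non-default setup; the automatic mapping above usually makes it unnecessary):

```
agent {
    # convert http(s) git links to ssh before pulling
    force_git_ssh_protocol: true

    # example only: explicitly mount the host's .ssh folder read-only
    extra_docker_arguments: ["-v", "/home/user/.ssh:/root/.ssh:ro"]
}
```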
SubstantialElk6 I know they have full permission control in the enterprise edition; if this is something you need, I suggest you contact http://allegro.ai 🙂