In particular, what does the external trigger poll? Is it a queue somewhere on ClearML, or is any arbitrary queue like SQS supported?
Never mind. I think I figured it out. Thanks for your help 🙂
Ok, that explains a lot. The new user was using version 1.x.x and I was using version 0.17.x. That's why my task was being set to draft and his was being aborted.
There is no specific use case for draft mode; it was just the state I understood to be used for enqueuing a newly created task. I assume 'aborted' now serves the same purpose.
We are planning on using airflow as the orchestration, but that may not fit your needs. I would say that the tool of choice is highly context specific.
We will be using airflow to trigger clearml-defined pipelines based on 'trigger' events, such as degradation in model performance, error alerts (e.g. at the data transformation task) etc.
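To make the idea concrete, here is a minimal sketch of how such a trigger could look. The degradation check is plain Python (the metric and threshold are illustrative, not our actual values), and the commented-out part shows how an Airflow task could clone and enqueue a ClearML template task; `fetch_current_auc`, the project/task names, and the queue name are all assumptions.

```python
def should_trigger_retrain(current_auc: float, baseline_auc: float,
                           tolerance: float = 0.02) -> bool:
    """Decide whether model performance has degraded enough to
    trigger the retraining pipeline (threshold is illustrative)."""
    return (baseline_auc - current_auc) > tolerance


# Inside an Airflow PythonOperator callable, the decision could then
# enqueue a ClearML pipeline (requires a live ClearML setup):
#
# from clearml import Task
#
# def trigger_pipeline():
#     if should_trigger_retrain(fetch_current_auc(), BASELINE_AUC):
#         template = Task.get_task(project_name="pipelines",
#                                  task_name="retrain-pipeline")
#         cloned = Task.clone(source_task=template)
#         Task.enqueue(cloned, queue_name="default")
```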
Rightttttt, I think I am starting to understand the architecture now lol. Thank you so much for your help!
@<1687643893996195840:profile|RoundCat60> Hey Alex. Could you take a look at this when you're free later on please
From what I can tell, Docker has some leakage here: temp files are not removed correctly, resulting in a build-up of disk usage.
See the following for more details
https://stackoverflow.com/questions/46672001/is-it-safe-to-clean-docker-overlay2
https://forums.docker.com/t/some-way-to-clean-up-identify-contents-of-var-lib-docker-overlay/30604
https://docs.docker.com/storage/storagedriver/overlayfs-driver/
I'm going to write a clean-up script and add it to the cron. I don't bel...
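A minimal sketch of what that cron-driven clean-up could look like (assuming the docker CLI is on the PATH; the prune flags and 72-hour threshold are illustrative, not the final script):

```python
import subprocess


def build_prune_command(until_hours: int = 72) -> list:
    """Build a `docker system prune` command that removes stopped
    containers, dangling images, networks and build cache older than
    the given age. --force skips the confirmation prompt (needed
    when running non-interactively under cron)."""
    return [
        "docker", "system", "prune",
        "--force",
        "--filter", f"until={until_hours}h",
    ]


def run_cleanup(until_hours: int = 72) -> int:
    """Run the prune; intended to be called from a daily cron entry."""
    return subprocess.run(build_prune_command(until_hours), check=True).returncode
```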
Thanks maestro. Will give this a go
Out of curiosity, is there a reason why utils is not a package in its own right?
I have found that a private PyPi repo really does help with managing dependencies
On my local I have clearml 0.17.4
using this method training_task.set_model_label_enumeration(label_map)
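For reference, a minimal sketch of that call — the class names and ids in the label map are made-up examples, and the ClearML wiring is commented out so the snippet stands alone:

```python
# Hypothetical label enumeration: class name -> integer id
label_map = {
    "negative": 0,
    "positive": 1,
}

# With a live ClearML task, this attaches the enumeration to the
# task's output models (visible under the model's "labels" tab):
#
# from clearml import Task
# training_task = Task.init(project_name="demo", task_name="train")
# training_task.set_model_label_enumeration(label_map)
```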
I am struggling to fill in the values for the template. Some are obvious, others are not
I think there is more complexity to what I am trying to achieve, but this will be a good start. Thanks!
I will need to log the dataset ID, the transformer (not the NN architecture, just a data transformer), the model (with all hyperparameters and metadata), etc., and how they all link together.
Yes it does 🙂 I suspected this was the process. Thanks Jake. One last question, more about the architecture design: is it advised to have the ClearML server instance and a 'worker' instance listening to the queue as separate remote machines, or can I use the same instance for the web UI and as a worker? I understand that processing pipelines may be compute-intensive enough to consume all resources and break the web UI, but I was wondering whether using a single large instance is a po...
While we are here - excuse my ignorance for now if this has already been stated in the docs ..
Is it possible to launch multiple clearml-agents on a dedicated clearml-agent server? I noticed that with one agent, only one task gets executed at a time.
Hey Martin. We have managed to resolve this. FYI, the issue was with the resolving of the host: it had to be changed from @github.com to whatever the host is called in the ssh config file!
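For anyone hitting the same thing, an illustrative ~/.ssh/config entry — the host alias and key path here are made-up examples:

```
Host github.com-mycompany
    HostName github.com
    User git
    IdentityFile ~/.ssh/deploy_key_py_db
```

The pip requirement then needs to reference the alias rather than the bare host, e.g. git+ssh://git@github.com-mycompany/15gifts/py-db.git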
I don't even know if I have a valid concern for this. Just a little worried as airflow is accessible by more departments than just DS, which could result in some disasters
Awesome, thank you Jake! Very helpful. For a lot of the models we run, we do not require GPU resources, so it's good to know that a beefy instance should be able to run the experiments.
Hey Martin. By labels map, I'm referring to the labels map assigned to the model. The one you can view in the models tab // labels
A new user is trying to push tasks, and each task instantly changes from running to aborted.
Our model store consists of metadata stored in the DWH, and model artifacts stored in S3. We technically use ClearML for managing the hardware resource for running experiments, but have our own custom logging of metrics etc. Just wondering how tricky integrating a trigger would be for that
Hey guys. Installing from the private repo is still failing. We have added the relevant deploy key to the repo, but I still get an error when trying to clone and install. Any ideas?
This is what I see under the 'Installed Packages' section
```
# Python 3.8.5 (default, Sep 4 2020, 02:22:02) [Clang 10.0.0 ]
azure_storage_blob == 12.6.0
boto3 == 1.11.17
clearml == 0.17.4
git+ssh://git@github.com/15gifts/py-db.git
Detailed import analysis
**************************
IMPORT PACKAGE azure_st...
```
The only downside, which is not related to ClearML, is that CodeArtifact authorisation tokens have a minimum lifespan of 15 minutes. Setting up envs before task execution usually takes less than a couple of minutes, so the token lingers in the background. Nonetheless, all works as expected!
Thanks AnxiousSeal95 , will check it out! 🙂
It should be a draft, so that it can be enqueued
Haha no not that much, I was just trying to play around with removing tasks etc, and didn't want to remove tasks created by co-workers.
Out of interest, is there a reason these are read-only? The code for these tasks is on github right?
Can I use the task scheduler to schedule an update task every, say, 10 minutes? Would that keep it from being deleted?
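Something along these lines might work — a sketch only: the keyword-argument names for `TaskScheduler.add_task` are assumptions based on the `clearml.automation` API and may differ between versions, so the actual calls are commented out.

```python
INTERVAL_MINUTES = 10  # illustrative "keep-alive" interval


def make_schedule_kwargs(task_id: str, queue: str = "default") -> dict:
    """Build keyword arguments for a recurring 10-minute schedule
    (parameter names are assumed, check your clearml version)."""
    return {
        "schedule_task_id": task_id,
        "queue": queue,
        "minute": INTERVAL_MINUTES,
        "recurring": True,
    }


# Actual wiring (requires a running ClearML setup):
#
# from clearml.automation import TaskScheduler
# scheduler = TaskScheduler()
# scheduler.add_task(**make_schedule_kwargs("<task-id>"))
# scheduler.start_remotely(queue="services")
```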