Ohhh ok. So I can actually remove this if those workers are no longer in use.
Awesome, thank you. I will give that a try later this week and update if it works as expected! May finally solve my private dependencies issue 😂
One question: you can also set agent.package_manager.extra_index_url, but since this is dynamic, will pip install still pick up the extra index URL from the pip config file, or does it have to be set in this agent config variable?
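For reference, a minimal sketch of what that setting could look like in clearml.conf. The URL is a placeholder, and since the CodeArtifact token is embedded in the URL, a static value here would go stale within 12 hours:
```
agent {
    package_manager {
        # Placeholder CodeArtifact index URL; the embedded auth token expires,
        # so a fixed value here only works if it is refreshed out of band.
        extra_index_url: ["https://my-domain-123456789012.d.codeartifact.eu-west-1.amazonaws.com/pypi/my-repo/simple/"]
    }
}
```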
By script, do you mean entering these two lines separately as a list for that extra_docker_shell_scripts argument?
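For what it's worth, my understanding is that it would look something like this in clearml.conf, assuming the agent's extra_docker_shell_script list setting (both lines are placeholders matching the approach discussed above):
```
agent {
    # Commands run inside the docker container before the task starts.
    extra_docker_shell_script: [
        "pip install awscli",
        "aws codeartifact login --tool pip --domain my-domain --domain-owner 123456789012 --repository my-repo",
    ]
}
```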
Did the shell script route work? I have a similar question.
It's a little more complicated because the index URL is not fixed; it contains a token that is only valid for a maximum of 12 hours. That means the ~/.config/pip/pip.conf file will also need to be updated every 12 hours. Fortunately, this edit happens automatically when you authenticate to AWS CodeArtifact by logging in on the command line.
My current thinking is as follows:
Install the awscli
- pip install awscli
(c...
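Roughly, the sequence I have in mind, with all domain/account/repo names as placeholders:
```
# Install the AWS CLI, then log in to CodeArtifact; the login rewrites the
# pip configuration with a fresh token, which is why it has to re-run before
# the 12-hour expiry (e.g. from cron or at container start-up).
pip install awscli
aws codeartifact login --tool pip --domain my-domain --domain-owner 123456789012 --repository my-repo
```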
Sounds good to me. Thanks Martin 🙂
Thanks Martin. I think I have found where the error is!
Yeah, it's not urgent. I will change the labels around to avoid this error 🙂 thanks for checking!
No worries, happy to help with the bug hunt 😄
This is only a suspicion; it could be something else. In my case, there is no artifact or other config with a dict containing that key. Only the label map contains that key.
Hey Alon. Thanks for the response. I'm not quite sure I follow your answer. Where do I add this git+ssh and how?
To report the metric to ClearML, would that just be a batch update at every interval t?
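Something like this hand-rolled sketch is what I had in mind, using the standard ClearML Logger API; fetch_metric() is a hypothetical stand-in for whatever queries the latest value:
```python
import random
import time

from clearml import Task


def fetch_metric() -> float:
    # Placeholder for your own query against the system being monitored.
    return random.random()


task = Task.init(project_name="examples", task_name="metric-reporting")
logger = task.get_logger()

interval_seconds = 60  # assumption: one batch update per minute
for iteration in range(100):
    logger.report_scalar(title="db", series="latest_value", value=fetch_metric(), iteration=iteration)
    time.sleep(interval_seconds)
```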
We resolved this issue by developing a package to handle connecting to and querying databases. This package is then used inside the task to pull data from the data warehouse. There is a DevOps component here for authorising access to the relevant secret (we used Secrets Manager on AWS): the clearml-agent instances are launched with role permissions which allow access to the relevant secrets. Hope that is helpful to you.
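As an illustrative sketch of that pattern, the shared package reads DB credentials from Secrets Manager using the instance role, so no keys live in the task code (the secret name is a placeholder):
```python
import json

import boto3


def get_db_credentials(secret_name: str = "prod/dwh/credentials") -> dict:
    # Uses the role attached to the clearml-agent instance; no keys in code.
    client = boto3.client("secretsmanager")
    response = client.get_secret_value(SecretId=secret_name)
    return json.loads(response["SecretString"])
```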
Where are you storing your secret, JitteryCoyote63?
Is there any documentation on how to set up the config for the agent?
Our model store consists of metadata stored in the DWH and model artifacts stored in S3. We technically use ClearML for managing the hardware resources for running experiments, but we have our own custom logging of metrics etc. Just wondering how tricky integrating a trigger would be for that.
Yeah that could be one approach.
I mean, is it possible to create a trigger task that reads a message from a queue, where the message contains information about whether a pipeline needs to be triggered or not?
In particular, what does the external trigger poll? Is it a queue somewhere on ClearML, or is any arbitrary queue like SQS supported?
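In case it helps frame the question, here is the kind of hand-rolled setup I am imagining, rather than a built-in feature: a long-running task polls SQS and, when a message says so, clones a template task and enqueues it. The queue URL, template task ID, and queue name are placeholders:
```python
import json

import boto3
from clearml import Task

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.eu-west-1.amazonaws.com/123456789012/pipeline-triggers"

while True:
    messages = sqs.receive_message(QueueUrl=QUEUE_URL, WaitTimeSeconds=20).get("Messages", [])
    for msg in messages:
        body = json.loads(msg["Body"])
        if body.get("trigger_pipeline"):
            # Clone a pre-configured template task and push it to an execution queue.
            template = Task.get_task(task_id="<template-task-id>")
            Task.enqueue(Task.clone(source_task=template), queue_name="default")
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
```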
2021-03-01 20:51:55,655 - clearml.Task - INFO - Completed model upload to s3://15gifts-clearml/artefacts/pre-engine-traits/logistic-regression-paths-and-sales-tfidf-device-brand.8d68e9a649824affb9a9edf7bfbe157d/models/tfidf-logistic-regression-1614631915-8d68e9a649824affb9a9edf7bfbe157d.pkl
2021-03-01 20:51:57,207 - clearml.Task - INFO - Waiting to finish uploads
On my local machine I have clearml 0.17.4.
I think there is more complexity to what I am trying to achieve, but this will be a good start. Thanks!
I will need to log the dataset ID, the transformer (not the NN architecture, just a data transformer), the model (with all hyperparameters and metadata), etc., and how they all link together.
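As a rough sketch of how those pieces could hang off one task with the standard connect/artifact APIs (all names and values below are placeholders):
```python
from clearml import Task
from sklearn.preprocessing import StandardScaler

task = Task.init(project_name="examples", task_name="linked-metadata")

# Dataset ID and hyperparameters, connected so they appear in the UI.
task.connect({"dataset_id": "abc123", "model_type": "logistic-regression", "C": 1.0}, name="config")

# The fitted data transformer and the trained model, stored as artifacts.
task.upload_artifact(name="transformer", artifact_object=StandardScaler())
task.upload_artifact(name="model", artifact_object="model.pkl")  # placeholder path to a saved model
```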
A new user is trying to push tasks, and the task is instantly changed from running to aborted.
Ok, that explains a lot. The new user was using version 1.x.x and I was using version 0.17.x. That is why my task was being created as a draft and his was being aborted.
There is no specific use case for draft mode; it was just the mode I understood to be used for enqueuing a newly created task, but I assume that aborted now has the same functionality.
It should be a draft, so that it can be enqueued
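For reference, a minimal sketch of that flow, assuming Task.create() (which makes a task in draft state) and a placeholder queue name:
```python
from clearml import Task

draft = Task.create(project_name="examples", task_name="my-draft-task")
Task.enqueue(draft, queue_name="default")  # an agent listening on the queue picks it up
```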
The reason I am asking is that we have servers with large RAM capacity but minimal storage capacity, meaning that objects held in memory can sometimes exceed storage capacity if an export is required.
Oh, that may work. Is there any docs/demos on this?
using this method training_task.set_model_label_enumeration(label_map)
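For anyone finding this later, a minimal sketch of that call; the labels and values here are placeholders:
```python
from clearml import Task

training_task = Task.init(project_name="examples", task_name="label-enum")
label_map = {"background": 0, "cat": 1, "dog": 2}  # {label: integer} mapping
training_task.set_model_label_enumeration(label_map)
```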