
SuccessfulKoala55 thanks for your help as always. I will try to create a DAG on Airflow using the SDK to implement some form of retention policy which removes things that are not necessary. We independently store metadata on the artefacts we produce, and mostly use ClearML as the experiment manager, so a lot of the events data can be cleared.
Yes it does, but that requires me to manually create a new agent every time I want to run a different env, no?
ECR access should be enabled as part of the role the agent instance assumes when it runs a task
I can't figure out from the examples how the external trigger works. All of our model performance stats are in the DWH, and we want to build triggers based on that. Is it possible to integrate that with ClearML triggers and schedulers?
In particular, what does the external trigger poll? Is it a queue somewhere on ClearML, or is any arbitrary queue like SQS supported?
Yeah that could be one approach.
I mean, is it possible to create a trigger task that reads a message from a queue? And that message contains information about whether a pipeline needs to be triggered or not
Say we have a DAG running on airflow every 30 mins. The purpose of this DAG is to aggregate results of model performance. If model performance is poor, then it sends a message to a queue with some config on which model to re-train.
I would like to use a TaskScheduler to poll this queue every X interval, to check whether a training pipeline needs to be kickstarted or not
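Concretely, something like this rough sketch is what I have in mind (the SQS queue URL, pipeline template task ID, and queue names are placeholders, and I'm assuming the schedule_function hook on TaskScheduler is the right way to run arbitrary polling code):

```python
import json
import boto3  # assumes AWS credentials are available to the scheduler process
from clearml import Task
from clearml.automation import TaskScheduler

QUEUE_URL = "https://sqs.eu-west-1.amazonaws.com/123456789012/retrain-requests"  # placeholder
PIPELINE_TEMPLATE_ID = "<pipeline-controller-task-id>"  # placeholder template task

def poll_retrain_queue():
    # Drain pending messages; each message carries the retrain config
    sqs = boto3.client("sqs")
    resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=10)
    for msg in resp.get("Messages", []):
        config = json.loads(msg["Body"])
        # Clone the pipeline controller template and enqueue it for execution
        pipeline = Task.clone(
            source_task=PIPELINE_TEMPLATE_ID,
            name="retrain {}".format(config.get("model", "unknown")),
        )
        Task.enqueue(pipeline, queue_name="default")
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])

scheduler = TaskScheduler()
# Run the polling function every 10 minutes
scheduler.add_task(schedule_function=poll_retrain_queue, minute=10, recurring=True)
scheduler.start_remotely(queue="services")
```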
Can I use the task scheduler to schedule an update task every, say, 10 mins? Would that keep it from being deleted?
To report the metric to ClearML, would that just be a batch update every t interval?
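Something like this minimal sketch is what I'm imagining (project/task names and the values are placeholders):

```python
from clearml import Task

# Reuse the same monitoring task across runs so the scalars accumulate
task = Task.init(project_name="monitoring", task_name="model-performance",
                 reuse_last_task_id=True)
logger = task.get_logger()

# One batch of aggregated metrics, e.g. pulled from the DWH
hourly_rmse = [0.12, 0.15, 0.11]
for i, rmse in enumerate(hourly_rmse):
    # `iteration` acts as the time axis of the scalar plot
    logger.report_scalar(title="performance", series="rmse", value=rmse, iteration=i)
logger.flush()
```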
Using SSH credentials - replacing https url '' with ssh url ''
Replacing original pip vcs 'git+' with ''
Collecting py_db
Cloning ssh://@github.com/15gifts/py-db.git (to revision 851daa87317e73b4602bc1bddeca7ff16e1ac865) to /tmp/pip-install-zpiar1hv/py-db
Running command git clone -q 'ssh://@github.com/15gifts/py-db.git' /tmp/pip-install-zpiar1hv/py-db
2021-12-08 15:56:31
ERROR: Repository not found.
fatal: Could not read from remote repository.
Please...
This is included as part of the config file at ~/clearml.conf on the clearml-agent:
extra_docker_shell_script: [ "apt-get install -y awscli", "aws codeartifact login --tool pip --repository data-live --domain ds-15gifts-code", ]
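For reference, the relevant block of ~/clearml.conf looks roughly like this (same repository/domain as above):

```
agent {
    # Shell commands executed inside the docker container before the task starts
    extra_docker_shell_script: [
        "apt-get install -y awscli",
        "aws codeartifact login --tool pip --repository data-live --domain ds-15gifts-code",
    ]
}
```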
Not sure how to get a log from the CLI, but I can get the error from the ClearML server UI, one sec
I don't think we explicitly pass the package path to the agent. I expect it to run a regular pip install, but it seems to be doing it via git somehow.
Only downside, which is not related to ClearML, is that CodeArtifact authorisation tokens have to have a minimum lifespan of 15 mins. Usually, setting up envs before task execution takes less than a couple of minutes, so the token lingers in the background. Nonetheless, all works as expected!
I can authorise CodeArtifact if I SSH into the server, and install the private package with no issues. It seems like something is forcing clearml-agent to install via GitHub cloning rather than directly via pip. Not sure if this is a configuration I have set up myself, or whether the server is configured to do this.
Oh, that may work. Is there any docs/demos on this?
In particular, I am trying to find a neat way to query all available models, and use tags to know the context. As it stands, I log the model accuracies/RMSEs as part of the metadata, alongside the training data filepath. The issue is that this is not the neatest way of querying models across tasks without a lot of laborious manual lifting. Suggestions welcome.
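For illustration, this is the kind of query I'm after, sketched with Model.query_models (the project name and tags are placeholders):

```python
from clearml import Model

# Query registered models by tag; tags encode the model's context
models = Model.query_models(
    project_name="my-project",   # placeholder
    tags=["churn"],              # placeholder tag
    only_published=False,
)
for m in models:
    print(m.id, m.name, m.tags)
```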
While we're here, how can I return the model accuracy (or any performance metric, for that matter) given a model belonging to a particular task? Is this information stored anywhere, or do I need to explicitly log this data somehow?
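E.g. something along these lines, assuming the metric was reported as a scalar on the training task (the model ID and metric names are placeholders):

```python
from clearml import Model, Task

model = Model(model_id="<model-id>")  # placeholder
# The task that produced the model holds its reported scalars
task = Task.get_task(task_id=model.task)
metrics = task.get_last_scalar_metrics()
# Expected shape: {title: {series: {"last": ..., "min": ..., "max": ...}}}
print(metrics.get("performance", {}).get("rmse", {}).get("last"))
```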
Did the shell script route work? I have a similar question.
It's a little more complicated because the index URL is not fixed; it contains a token which is only valid for a maximum of 12 hours. That means the ~/.config/pip/pip.conf file will also need to be updated every 12 hours. Fortunately, this editing is done automatically by authenticating against AWS CodeArtifact on the command line by logging in.
My current thinking is as follows:
- Install the awscli: pip install awscli
(c...
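Roughly, the login step looks like this (same repository/domain as earlier; the login command rewrites ~/.config/pip/pip.conf with a fresh token):

```bash
# Install the AWS CLI, then authenticate pip against CodeArtifact;
# the token it writes is valid for at most 12 hours
pip install awscli
aws codeartifact login --tool pip --repository data-live --domain ds-15gifts-code
```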
One question - you can also set the agent.package_manager.extra_index_url, but since this is dynamic, will pip install still pick up the extra index URL from the pip config file? Or does it have to be set in this agent config variable?
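For reference, the static form would be something like this in ~/clearml.conf (the index URL is a placeholder), though with a rotating token I suspect relying on the pip.conf written by the login step is the safer route:

```
agent {
    package_manager {
        # Static extra index URLs added to every pip install the agent runs
        extra_index_url: ["https://my-domain-123456789012.d.codeartifact.eu-west-1.amazonaws.com/pypi/data-live/simple/"]
    }
}
```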
Awesome, thank you. I will give that a try later this week and update if it worked as expected! May finally solve my private dependencies issue 😂
By script, do you mean entering these two lines separately as a list for that extra_docker_shell_script argument?
Sounds good to me. Thanks Martin 🙂
Hey Martin. By labels map, I'm referring to the labels map assigned to the model, the one you can view in the models tab under "labels",
set using this method: training_task.set_model_label_enumeration(label_map)
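A minimal example of what I mean (the class names/values are placeholders):

```python
from clearml import Task

training_task = Task.init(project_name="my-project", task_name="train")  # placeholder names
# Attach the label map to the task's output model; this is what shows up
# in the models tab under "labels"
label_map = {"background": 0, "cat": 1, "dog": 2}
training_task.set_model_label_enumeration(label_map)
```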
This is a suspicion only. It could be something else. In my case, there is no artifact or other config with a dict containing that key. Only the label map contains that key
Yeah, it's not urgent. I will change the labels around to avoid this error 🙂 thanks for checking!
No worries, happy to help with the bug hunt 😄
We are planning on using Airflow for orchestration, but that may not fit your needs. I would say that the tool of choice is highly context-specific.
We will be using Airflow to trigger ClearML-defined pipelines based on 'trigger' events, such as degradation in model performance, error alerts (e.g. at the data transformation task), etc.
Any news on this bug?
Nope, from a remote server. The issue was that I had installed the package from git locally, so when pushing the task, ClearML assumed it should also install it from git. I've since installed the package from the private PyPI and it all works as expected now 🙂