So how do I ensure that artefacts are uploaded to the correct bucket from within clearml?
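For reference, a minimal sketch of how this could look — the bucket/prefix below is a placeholder for your own. Task.init takes an output_uri that controls where artifacts and models are uploaded:
```
from clearml import Task

# All artifact/model uploads for this task go under output_uri.
# "s3://my-ds-bucket/projects" is a placeholder -- substitute your own bucket/prefix.
task = Task.init(
    project_name="examples",
    task_name="artifact destination demo",
    output_uri="s3://my-ds-bucket/projects",
)

# This artifact is now uploaded to the bucket above, not the default files server.
task.upload_artifact(name="training_stats", artifact_object={"rows": 12345})
```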
That's a good question, which I don't have an answer to 😅 I was hoping to be able to store the config file in some kind of secrets vault and authenticate via some in-memory mechanism or similar
I don't even know if this is a valid concern. Just a little worried, as Airflow is accessible by more departments than just DS, which could result in some disasters
Oh, that may work. Are there any docs/demos on this?
I think there is more complexity to what I am trying to achieve, but this will be a good start. Thanks!
I will need to log the dataset ID, the transformer (not the NN architecture, just a data transformer), the model (with all hyperparameters & metadata), etc., and how they all link together
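For what it's worth, a rough sketch of tying those pieces to a single task — the transformer config and dataset ID values here are illustrative, not from any actual setup:
```
from clearml import Task

task = Task.init(project_name="examples", task_name="linked metadata demo")

# Connect the data-transformer config so it is tracked (and editable on clones).
transformer_cfg = {"scaler": "standard", "clip_outliers": True}  # illustrative
task.connect(transformer_cfg, name="transformer")

# Free-form links such as the dataset ID can go into user properties,
# which are searchable from the UI/API.
task.set_user_properties(dataset_id="<dataset-id>", snapshot_window="14d")
```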
Did the shell script route work? I have a similar question.
It's a little more complicated because the index URL is not fixed; it contains a token which is only valid for a maximum of 12 hours. That means the ~/.config/pip/pip.conf file will also need to be updated every 12 hours. Fortunately, this edit is made automatically when you authenticate to AWS CodeArtifact by logging in on the command line.
My current thinking is as follows (rough sketch after this list):
Install the awscli
- pip install awscli
(c...
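For reference, a sketch of the refresh step in Python — boto3's get_authorization_token call is real, but the domain/owner/repo/region values are placeholders (the aws codeartifact login CLI command performs the same pip.conf edit automatically):
```
import configparser
from pathlib import Path

import boto3

# Fetch a fresh CodeArtifact token (valid for up to 12 hours).
client = boto3.client("codeartifact", region_name="eu-west-1")
token = client.get_authorization_token(
    domain="my-domain",
    domainOwner="123456789012",
    durationSeconds=43200,  # 12 hours, the maximum
)["authorizationToken"]

index_url = (
    f"https://aws:{token}@my-domain-123456789012.d.codeartifact."
    f"eu-west-1.amazonaws.com/pypi/my-repo/simple/"
)

# Rewrite ~/.config/pip/pip.conf with the refreshed index URL.
pip_conf = Path.home() / ".config" / "pip" / "pip.conf"
config = configparser.ConfigParser()
config["global"] = {"index-url": index_url}
pip_conf.parent.mkdir(parents=True, exist_ok=True)
with pip_conf.open("w") as f:
    config.write(f)
```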
One question - you can also set agent.package_manager.extra_index_url, but since the index URL is dynamic, will pip install still pick up the extra index URL from the pip config file? Or does it have to be set in this agent config variable?
Awesome, thank you. I will give that a try later this week and update if it worked as expected! May finally solve my private dependencies issue 😂
Sounds good to me. Thanks Martin 🙂
This is a suspicion only. It could be something else. In my case, there is no artifact or other config with a dict containing that key. Only the label map contains that key.
Hey Martin. By labels map, I'm referring to the labels map assigned to the model. The one you can view in the models tab // labels
A new user is trying to push tasks, and the task instantly changes from running to aborted
It should be a draft, so that it can be enqueued
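In case it helps, a minimal sketch of the draft-then-enqueue flow — the task ID and queue name are placeholders:
```
from clearml import Task

# Cloning an existing task produces a new task in draft state.
draft = Task.clone(source_task="<task-id>", name="cloned run")

# A draft task can then be enqueued for an agent to pick up.
Task.enqueue(draft, queue_name="default")
```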
We are planning to use Airflow as an extension of clearml itself, for several tasks:
We want to isolate the data validation steps from the general training pipeline; the validation will be handled using some base logic and some more advanced validations using something like Great Expectations. Our training data will be a snapshot from the most recent 2 weeks, and this training data will be used across multiple tasks to automate the scheduling and execution of training pipelines periodically e...
In particular, I am trying to find a neat way to query all available models and use tags to know the context. As it stands, I log the model accuracies/RMSEs as part of the metadata, alongside the training data filepath. The issue is that querying models across tasks this way involves a lot of laborious manual lifting. Suggestions welcome
While we're here, how can I return the model accuracy (or any performance metric, for that matter) for a model or models belonging to a particular task? Is this information stored anywhere, or do I need to explicitly log it somehow?
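For reference, a sketch of the query side, assuming tags were set on the models at training time — project/tag names are placeholders, and I'm assuming each model keeps a reference to its creating task:
```
from clearml import Model, Task

# Query models by project/tags instead of walking tasks manually.
models = Model.query_models(project_name="examples", tags=["prod-candidate"])

for model in models:
    # The creating task holds the last reported scalars
    # (e.g. an "accuracy" series, if one was reported during training).
    task = Task.get_task(task_id=model.task)
    scalars = task.get_last_scalar_metrics()
    print(model.name, scalars.get("accuracy", {}))
```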
Thanks maestro. Will give this a go
We are planning on using Airflow for orchestration, but that may not fit your needs. I would say that the tool of choice is highly context-specific.
We will be using Airflow to trigger clearml-defined pipelines based on 'trigger' events, such as degradation in model performance, error alerts (e.g. at the data transformation task), etc.
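A bare-bones sketch of what such a trigger DAG could look like — the DAG id, queue name, pipeline task ID, and the degradation check are all placeholders:
```
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from clearml import Task

def retrain_if_degraded():
    # Placeholder check: in practice, compare live metrics against a threshold.
    degraded = True
    if degraded:
        # Clone the pipeline's controller task and enqueue the draft copy.
        draft = Task.clone(source_task="<pipeline-task-id>")
        Task.enqueue(draft, queue_name="pipelines")

with DAG(
    dag_id="clearml_retrain_trigger",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    PythonOperator(task_id="check_and_trigger", python_callable=retrain_if_degraded)
```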
As in, an object from memory directly, without having to export it to a file first. I thought boto3 could handle this, but looking at the docs again, it doesn't look like it. 'File-like objects' is their term, so maybe an export is required
The reason I am asking is that we have servers with large RAM capacity but minimal storage capacity, meaning that objects held in memory can sometimes surpass the available disk space if an export is required
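For what it's worth, boto3's "file-like object" includes an in-memory buffer, so a sketch like this should avoid touching disk entirely — bucket/key names are placeholders:
```
import io
import pickle

import boto3

obj = {"weights": [0.1, 0.2, 0.3]}  # stand-in for the large in-memory object

# Serialise straight into an in-memory buffer -- nothing is written to disk.
buffer = io.BytesIO(pickle.dumps(obj))

# upload_fileobj accepts any file-like object, including io.BytesIO.
s3 = boto3.client("s3")
s3.upload_fileobj(buffer, "my-bucket", "models/obj.pkl")
```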
Yeah, it's not urgent. I will change the labels around to avoid this error 🙂 thanks for checking!
using this method: training_task.set_model_label_enumeration(label_map)
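i.e. something along these lines (the label values are illustrative):
```
from clearml import Task

training_task = Task.init(project_name="examples", task_name="label map demo")

# Label-name -> integer-id mapping attached to the task's output model.
label_map = {"background": 0, "cat": 1, "dog": 2}  # illustrative values
training_task.set_model_label_enumeration(label_map)
```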
No worries, happy to help with the bug hunt 😄
Any news on this bug?
We resolved this issue by developing a package that handles connecting to and querying the databases. This package is then used inside the task to pull data from the data warehouse. There is a DevOps component here for authorising access to the relevant secret (we used Secrets Manager on AWS). The clearml-agent instances are launched with role permissions which allow access to the relevant secrets. Hope that is helpful to you
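A stripped-down sketch of the pattern — the secret name and JSON shape are placeholders; the credentials come from the agent machine's IAM role, so nothing is baked into the task code:
```
import json

import boto3

def get_db_credentials(secret_id: str) -> dict:
    """Fetch DB credentials from AWS Secrets Manager via the instance role."""
    client = boto3.client("secretsmanager")
    response = client.get_secret_value(SecretId=secret_id)
    return json.loads(response["SecretString"])

# Usage inside a task ("prod/warehouse" is a placeholder secret name):
creds = get_db_credentials("prod/warehouse")
```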
Where are you storing your secret JitteryCoyote63 ?
Thanks Jake, I will have a look. Is there a reason a lot of disk space would be used on the server instance? Is there something in the config I can change to ensure that minimal disk space is used on that server, with S3 used for most of the storage?