While we are here - excuse my ignorance for now if this has already been stated in the docs...
Is it possible to launch multiple clearml-agents on a dedicated clearml-agent server? I noticed that with one agent, only one task gets executed at a time
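If I'm reading the docs right, you can run several agent daemons on the same machine, each pulling from a queue, and tasks will then execute concurrently - a sketch, with a placeholder queue name:

```
# each daemon runs one task at a time from its queue
clearml-agent daemon --queue default --detached
clearml-agent daemon --queue default --detached
```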
As in an object from memory directly, without having to export the file first. I thought boto3 could handle this, but looking at the docs again, it doesn't look like it. "File-like objects" is their term, so maybe an export is required
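That said, an io.BytesIO buffer should satisfy boto3's "file-like object" requirement, so an export to disk may not be needed - a minimal sketch, assuming pickle serialisation and placeholder bucket/key names (my_object stands for whatever is held in memory):

```python
import io
import pickle

import boto3

s3 = boto3.client("s3")

# serialise the in-memory object into a file-like buffer - nothing is written to disk
buffer = io.BytesIO(pickle.dumps(my_object))
buffer.seek(0)

# upload_fileobj accepts any file-like object
s3.upload_fileobj(buffer, "my-bucket", "artefacts/my_object.pkl")
```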
Yeah, I would say a demo on this would be great. I think this task is difficult as is, given the differences in deployment architectures, but for common tasks it would be good to have some additional docs/examples 🙂
We are planning to use Airflow as an extension of clearml itself, for several tasks:
we want to isolate the data validation steps from the general training pipeline; the validation will be handled using some base logic, plus some more advanced validations using something like Great Expectations. Our training data will be a snapshot from the most recent 2 weeks, and this training data will be used across multiple tasks to automate the scheduling and execution of training pipelines periodically e...
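To make the scheduling piece concrete, a rough sketch of an Airflow DAG that clones and enqueues a ClearML task every two weeks - all project, task, and queue names below are placeholders:

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator
from clearml import Task


def enqueue_training_pipeline():
    # clone a template task and push the clone onto the agent queue
    template = Task.get_task(project_name="my_project", task_name="training_pipeline_template")
    cloned = Task.clone(source_task=template)
    Task.enqueue(cloned, queue_name="default")


with DAG(
    dag_id="periodic_training",
    start_date=datetime(2021, 1, 1),
    schedule_interval=timedelta(weeks=2),  # matches the 2-week snapshot cadence
    catchup=False,
) as dag:
    PythonOperator(task_id="enqueue_training", python_callable=enqueue_training_pipeline)
```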
Hey Alon. Thanks for the response. I'm not quite sure I follow your answer. Where do I add this git+ssh and how?
Hey Martin. By labels map, I'm referring to the labels map assigned to the model - the one you can view in the Models tab, under Labels
The reason I am asking is that we have servers with large RAM capacity but minimal storage capacity, meaning that objects held in memory can sometimes surpass storage capacity if an export is required
Another update - the task runs fine and installs the packages from the correct index URL. However, by default, py_db @ git .. is added to the installed packages panel. Could this be coming from a requirements.txt file somewhere? To get it to work, I have to remove the @ git part, and then it works. Just very strange that it defaults to a git pip install 🤔
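In case it helps anyone else, one workaround that may avoid hand-editing the panel is forcing the recorded requirement before Task.init - a sketch, with placeholder project/task names:

```python
from clearml import Task

# record py_db as a plain pip requirement instead of the git URL
# picked up from the local environment; must run before Task.init
Task.add_requirements("py_db")  # optionally pin: Task.add_requirements("py_db", "==1.2.3")

task = Task.init(project_name="my_project", task_name="my_task")
```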
Did the shell script route work? I have a similar question.
It's a little more complicated because the index URL is not fixed; it contains a token which is only valid for a maximum of 12 hours. That means the ~/.config/pip/pip.conf file will also need to be updated every 12 hours. Fortunately, this edit is made automatically by logging in to AWS CodeArtifact on the command line.
My current thinking is as follows:
Install the awscli - pip install awscli (c...
This is included as part of the config file at ~/clearml.conf on the clearml-agent:
extra_docker_shell_script: [
    "apt-get install -y awscli",
    "aws codeartifact login --tool pip --repository data-live --domain ds-15gifts-code",
]
Not sure how to get a log from the CLI but I can get the error from the clearml server UI, one sec
Using this method: training_task.set_model_label_enumeration(label_map)
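For context, the enumeration is a class-name-to-integer dict - a minimal example (the class names here are made up):

```python
label_map = {"background": 0, "cat": 1, "dog": 2}
training_task.set_model_label_enumeration(label_map)
```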
Just upgraded matplotlib, going to test now
Sounds good to me. Thanks Martin 🙂
Thanks Ariel, will give it a watch now
Yeah, it's not urgent. I will change the labels around to avoid this error 🙂 thanks for checking!
No worries, happy to help with the bug hunt 😄
It just hangs when trying to upload. Maybe that is the reason the plots are not logging?
The task is dependent on a few artefacts from another task. Is there anything else I can do here?
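For reference, pulling artefacts from the producing task looks roughly like this - the task ID and artefact name are placeholders:

```python
from clearml import Task

producer = Task.get_task(task_id="aabbccdd11223344")

# download a local copy of the artefact file
path = producer.artifacts["training_data"].get_local_copy()

# or deserialise the artefact object directly
obj = producer.artifacts["training_data"].get()
```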
This is a suspicion only. It could be something else. In my case, there is no artifact or other config with a dict containing that key. Only the label map contains that key
Nope, from a remote server. It was because I had installed the package from git locally, so when pushing the task, clearml assumed it should also install from git. I've since installed the package from the private PyPI and it all works as expected now 🙂
So how do I ensure that artefacts are uploaded in the correct bucket from within clearml?
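As far as I can tell, passing output_uri to Task.init routes uploaded artefacts and models to the given bucket - a sketch, with placeholder names and paths:

```python
from clearml import Task

task = Task.init(
    project_name="my_project",
    task_name="my_task",
    output_uri="s3://my-bucket/clearml",  # artefacts and models land here
)

task.upload_artifact(name="stats", artifact_object={"rows": 12345})
```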
Hey Martin. We have managed to resolve this. FYI, the issue was with resolving the host: it had to be changed from @github.com to the host alias defined in the SSH config file!
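For anyone else hitting this, an illustrative ~/.ssh/config entry (the alias, repo, and key path are made up) - the repo URL then references git@github-work:org/repo.git instead of git@github.com:org/repo.git:

```
Host github-work
    HostName github.com
    User git
    IdentityFile ~/.ssh/id_rsa_work
```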
I am struggling to fill in the values for the template. Some are obvious, others are not
Ok, that explains a lot. The new user was using version 1.x.x and I was using version 0.17.x. That is why my task was being set to draft and his was being aborted.
There is no specific use case for draft mode - it was just the mode I understood to be used for enqueuing a newly created task, but I assume that aborted now has the same functionality
Oh great, thanks! Was trying to figure out how the method knows that the docker image ID belongs to ECR. Do you have any insight into that?
This is the link I am trying to access None
So do I set report_image to False for the plots to appear in the Plots tab?
If it's uploaded as an image, what is the target destination for logging?
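From what I can tell, report_image toggles the destination - a sketch, with placeholder project/task names:

```python
import matplotlib.pyplot as plt
from clearml import Task

task = Task.init(project_name="my_project", task_name="plot_test")

fig = plt.figure()
plt.plot([1, 2, 3], [4, 5, 6])

# report_image=False -> interactive plot in the Plots tab;
# report_image=True -> rendered image, shown under Debug Samples instead
task.get_logger().report_matplotlib_figure(
    title="example", series="line", iteration=0, figure=fig, report_image=False
)
```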
I will try adding some print statements to test the hanging issue