I am struggling to fill in the values for the template. Some are obvious, others are not.
Yeah I would say that a demo on this would be great. I think this task is difficult as is given the differences in deployment architectures, but for common tasks it would be good to have some additional docs/examples 🙂
Rightttttt, I think I am starting to understand the architecture now lol. Thank you so much for your help!
This is a suspicion only. It could be something else. In my case, there is no artifact or other config with a dict containing that key. Only the label map contains that key
Yeah, it's not urgent. I will change the labels around to avoid this error 🙂 thanks for checking!
using this method training_task.set_model_label_enumeration(label_map)
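For context, a minimal sketch of how that call is used (the project/task names and the label map values here are illustrative, not the real ones):

from clearml import Task

# Illustrative label map; the real keys and values come from the project's own labels
label_map = {"background": 0, "class_a": 1, "class_b": 2}

training_task = Task.init(project_name="examples", task_name="label-map-demo")
training_task.set_model_label_enumeration(label_map)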
Any news on this bug?
Here is the error message from the console:
Collecting git+ssh://****@github.com/15gifts/py-db.git
  Cloning ssh://****@github.com/15gifts/py-db.git to /tmp/pip-req-build-xai2xts_
  Running command git clone -q 'ssh://****@github.com/15gifts/py-db.git' /tmp/pip-req-build-xai2xts_
ERROR: Repository not found.
fatal: Could not read from remote repository.
Please make sure you have the correct access rights and the repository exists.
Never mind. I think I figured it out. Thanks for your help 🙂
Is there any documentation on how to set up the config for the agent?
That's a good question, which I don't have an answer to 😅 I was hoping to be able to store the config file in some kind of secrets vault and authenticate via some in-memory trace or so
Thanks Jake, I will have a look. Is there a reason a lot of disk space would be used on the server instance? Is there something in the config I can change to ensure that minimal memory is used on that server, and mostly S3 is used for storage?
After some additional inspection, seems like the issue is docker related:
7.7G    /var/lib/docker/overlay2/
This is the directory which is causing the device storage issues.
So how do I ensure that artefacts are uploaded in the correct bucket from within clearml?
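For what it's worth, one possible sketch of pointing uploads at a specific bucket via the task's output_uri; the bucket name, prefix, and project/task names below are placeholders, not the real setup:

from clearml import Task

# output_uri sets the default destination for output models and artifacts;
# the bucket name and prefix here are placeholders
task = Task.init(
    project_name="examples",
    task_name="artifact-upload-demo",
    output_uri="s3://my-company-bucket/clearml",
)
task.upload_artifact(name="predictions", artifact_object={"rows": 123})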
Using SSH credentials - replacing https url '' with ssh url ''
Replacing original pip vcs 'git+' with ''
Collecting py_db
  Cloning ssh://@github.com/15gifts/py-db.git (to revision 851daa87317e73b4602bc1bddeca7ff16e1ac865) to /tmp/pip-install-zpiar1hv/py-db
  Running command git clone -q 'ssh://@github.com/15gifts/py-db.git' /tmp/pip-install-zpiar1hv/py-db
2021-12-08 15:56:31
ERROR: Repository not found.
fatal: Could not read from remote repository.
Please...
Another update - the tasks run fine and install the packages from the correct index URL. However, by default, py_db @ git .. is added in the installed packages panel. Could this be from a requirements.txt file somewhere? To get it to work, I have to remove the @ git part, and then it works. Just very strange that it defaults to a git pip install 🤔
This is included as part of the config file at ~/clearml.conf on the clearml-agent:
extra_docker_shell_script: [
    "apt-get install -y awscli",
    "aws codeartifact login --tool pip --repository data-live --domain ds-15gifts-code",
]
Not sure how to get a log from the CLI but I can get the error from the clearml server UI, one sec
I can authorise CodeArtifact if I ssh into the server, and install the private package with no issues. Seems like something is forcing clearml-agent to install via github cloning, rather than pip directly. Not sure if this is a configuration I have set up myself, or whether the server is configured to do this
I don't think we explicitly pass the package path to the agent. I expect it to run a regular pip install but it seems to be doing it via git somehow
Nope, from a remote server. It was that I had installed the package from git locally, so when pushing the task, clearml assumed it should also install from git. I since installed the package from the private pypi and it all works as expected now 🙂
Okay, solved the problem. It is using the version that is locally installed (on my laptop). Is there a way to prevent this? Perhaps a requirements.txt or something like that?
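One possible sketch (not necessarily what was done here) for pinning the recorded requirements to an explicit file instead of the locally-installed git package; the file name and project/task names are assumptions, and both calls must come before Task.init:

from clearml import Task

# Record requirements from an explicit requirements file instead of the
# locally-installed (git-based) package; must be called before Task.init().
# The file name "requirements.txt" is an assumption.
Task.force_requirements_env_freeze(requirements_file="requirements.txt")

task = Task.init(project_name="examples", task_name="pinned-requirements-demo")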
One question - you can also set the agent.package_manager.extra_index_url, but since this is dynamic, will pip install still add the extra index URL from the pip config file? Or does it have to be set in this agent config variable?
I think there is more complexity to what I am trying to achieve, but this will be a good start. Thanks!
I will need to log the data set ID, the transformer (not the NN architecture, just a data transformer), the model (with all hyperparameters & metadata), etc., and how all of these link together
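As a rough sketch of how those pieces might be recorded on a single task; all names, IDs, parameter values, and the placeholder transformer object below are made up for illustration:

from clearml import Task

task = Task.init(project_name="examples", task_name="training-run-demo")

# Hyperparameters & metadata (illustrative values)
task.connect({"learning_rate": 0.01, "n_estimators": 200})

# Record which data set the run used (hypothetical dataset ID)
task.set_user_properties(dataset_id="d3adb33f")

# Store the fitted data transformer alongside the run; placeholder object here
fitted_transformer = {"type": "StandardScaler", "mean_": [0.1, 0.2]}
task.upload_artifact(name="transformer", artifact_object=fitted_transformer)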
Only downside, which is not related to clearml, is that codeartifact authorisation tokens have to have a minimum lifespan of 15 mins. Usually, setting up envs before task execution takes less than a couple minutes, so the token lingers in the background. Nonetheless, all works as expected!
Where are you storing your secret, JitteryCoyote63?
We resolved this issue by developing a package that handles connecting to and querying the databases. This package is then used inside the task for pulling the data from the data warehouse. There is a devops component here for authorising access to the relevant secret (we used Secrets Manager on AWS). The clearml-agent instances are launched with role permissions which allow access to the relevant secrets. Hope that is helpful to you
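For illustration, a minimal sketch of the secret-fetching side of such a package, assuming the clearml-agent instance role already grants access; the secret name and function name are placeholders, not the real package:

import json
import boto3

def get_db_credentials(secret_name: str = "prod/dwh/credentials") -> dict:
    """Fetch database credentials from AWS Secrets Manager.

    Relies on the instance role of the clearml-agent machine for auth,
    so no keys need to live in the task code or the clearml config.
    """
    client = boto3.client("secretsmanager")
    response = client.get_secret_value(SecretId=secret_name)
    return json.loads(response["SecretString"])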
Ok, that explains a lot. The new user was using version 1.x.x and I was using version 0.17.x. That is why, I think, my task was being drafted and his was being aborted.
There is no specific use case for draft mode - it was just the mode I understood to be used for enqueuing a newly created task, but I assume that aborted now has the same functionality
And how will it know that the container is on ECR instead of some other container repository?
I can't figure out from the examples how the external trigger works. All of our model performance stats are in the DWH, and we want to build triggers based on that. Is it possible to integrate that with ClearML triggers and schedulers?
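In case it helps frame the question, a rough polling sketch of what such an external trigger could look like outside ClearML's built-in trigger classes; the warehouse query, threshold, template task ID and queue name are all hypothetical:

import time
from clearml import Task

def fetch_latest_accuracy() -> float:
    """Placeholder for a real query against the data warehouse."""
    return 0.91  # hypothetical value

RETRAIN_TEMPLATE_TASK_ID = "abc123"  # hypothetical template task ID
THRESHOLD = 0.90

while True:
    if fetch_latest_accuracy() < THRESHOLD:
        # Clone the template task and enqueue it for a clearml-agent to pick up
        new_task = Task.clone(source_task=RETRAIN_TEMPLATE_TASK_ID, name="retrain")
        Task.enqueue(new_task, queue_name="default")
        break
    time.sleep(3600)  # check once an hour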