I tried.
It looks like this:
` sudo apt update
sudo apt install amazon-ecr-credential-helper
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin **** `
But my problem is that I can't even see whether my init script runs properly (I tried adding a print statement but I cannot see the output anywhere, neither in the autoscaler nor in the task).
I should note that it works when I run the container locally (with no external env variables).
Important to note: I am running my instances on GCP, but the container is on ECR (AWS).
I do have the configuration vault feature.
I managed to make it work.
It seems I had been using it wrong.
To handle the multiple credentials one has to go through the ClearML SDK.
So I just started using StorageManager and it works.
Thanks.
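For reference, a minimal sketch of what I ended up with (bucket names and paths below are placeholders; the per-bucket credentials come from clearml.conf / the configuration vault):
` from clearml import StorageManager

# credentials for each bucket are resolved from clearml.conf (sdk.aws.s3.credentials)
# or from the configuration vault, so several sets of keys can coexist
local_copy = StorageManager.get_local_copy(remote_url="s3://my-bucket/some/artifact.zip")

# upload to a different bucket that uses a different set of credentials
StorageManager.upload_file(
    local_file=local_copy,
    remote_url="s3://my-other-bucket/some/artifact.zip",
) `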
It's generated automatically by the HPO script.
So it might be added inside the report completion section.
Just found this thread,
https://clearml.slack.com/archives/CTK20V944/p1639037799391000
I will try to follow it and see (although it looks the same as what I tried).
TimelyPenguin76 Maybe you were able to find the problem?
I don't remember what the solution was.
I might have just updated my ClearML version...
The folder is rather small, 3.5 MB.
I am able to see it in the artifacts, but I can't download it (the address is wrong).
I see,
Is there a way to "clear" a queue from Python?
A "purge" method for clearml.backend_api.session.client.Queue?
I can only see the current length of the queue; how do I remove all tasks or specific tasks?
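Something along these lines is what I'm after (just a sketch using the backend APIClient; I'm assuming the queue objects returned by get_all carry their entries and that tasks.dequeue is exposed, untested):
` from clearml.backend_api.session.client import APIClient

client = APIClient()

# assumption: get_all(name=...) returns the matching queues including their entries
queue = client.queues.get_all(name="my_queue")[0]

# dequeue every task currently waiting in the queue
for entry in queue.entries:
    client.tasks.dequeue(task=entry.task) `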
Nope. It gives me errors.
Just like the person who replied in the thread I linked in my previous reply here.
As a followup on this one @<1523701070390366208:profile|CostlyOstrich36>:
how do I make my agent install Python 3.9 and not 3.7?
agent.default_python: 3.9
agent.python_binary: 3.9
but I'm getting this in the task:
Python executable with version '3.9' requested by the Task, not found in path, using '/clearml_agent_venv/bin/python3' (v3.7.3) instead
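Next thing I'll try is pointing python_binary at a full interpreter path rather than just the version number; a sketch of the clearml.conf agent section (the path is an assumption and has to actually exist on the agent machine):
` agent {
    # version to fall back to when the task does not request one
    default_python: 3.9
    # full path to the interpreter binary, not just the version number
    python_binary: "/usr/bin/python3.9"
} `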
It's a private image (based off of this image).
` ======================================
Welcome to the Google Deep Learning VM
Version: pytorch-gpu.1-11.m91
Based on: Debian GNU/Linux 10 (buster) (GNU/Linux 4.19.0-21-cloud-amd64 x86_64) `
I am leaving the docker line empty, so I assume there's no docker spun up for my agent.
I don't think it's related to the region.
I do have the log of the autoscaler.
We also have an autoscaler that was implemented from scratch before ClearML had the autoscaler application.
I wouldn't want to share the autoscaler log with this channel.
CostlyOstrich36
Thank you,
Solved,
I messaged with Alon from your team and he will upload an update to the old repository.
Thanks,
solved.
I tried deleting ~/clearml.conf (apparently it already existed)
and rerunning clearml-init.
Still no good; it only applied with errors.
So anyway,
you can pickle the above object (pickle the study).
But you can't actually pickle the optimizer itself, as you said.
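A tiny sketch of what I mean (assuming the study uses the in-memory Optuna storage, otherwise pickling it may not work):
` import pickle

# an_optimizer is the ClearML HyperParameterOptimizer instance from before;
# pickle only the underlying Optuna study, not the optimizer wrapper itself
study = an_optimizer.get_optimizer()._study
with open("study.pkl", "wb") as f:
    pickle.dump(study, f) `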
I took it offline with Alon Shomrat from ClearML.
It seems like the problem is solved (at least for now).
It's hard for me to tell why, and it's hard for him too.
Adding the flags he added also didn't help.
e.g.
` my_optimizer = an_optimizer.get_optimizer()
plot_optimization_history(my_optimizer._study) `
Since my_optimizer._study is an optuna object
As a matter of fact, all my tasks are in "running" state although some of them have failed.
Update:
Managed to get the credentials attached to the configuration when the task is spun up,
although boto3 in the script still uses the "default" access keys instead of the newly added keys.
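The workaround I'm trying for now is building the boto3 session from the injected keys explicitly instead of relying on the default credential chain (the env var names below are placeholders for wherever the new keys end up):
` import os
import boto3

# build the session from the explicitly provided keys instead of letting boto3
# fall back to its default credential chain (env var names are placeholders)
session = boto3.session.Session(
    aws_access_key_id=os.environ["NEW_AWS_ACCESS_KEY_ID"],
    aws_secret_access_key=os.environ["NEW_AWS_SECRET_ACCESS_KEY"],
)
s3 = session.client("s3") `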
