Unfortunately, our security posture is so strict that we cannot have an agent git user that has unfettered read access to all repos.
Hi, any idea if I can achieve this? I just need a list of usernames.
Try setting docker_force_pull: true under the agent section of your agent's clearml.conf.
AlertBlackbird30, actually the log says 10.2. docker_cmd = nvidia/cuda:10.2-devel-ubuntu18.04 -e GIT_SSL_NO_VERIFY=true
We are deploying ClearML Server via docker-compose.
For ClearML-Agent, we have the choice of Docker or K8S, with K8S preferred (using the Glue).
For K8S, we can't get the glue to work ( https://clearml.slack.com/archives/CTK20V944/p1614525898114200?thread_ts=1613923591.002100&cid=CTK20V944 ) so we can't make an assessment of whether it actually works for us.
Hi AgitatedDove14, I changed everything to CUDA 10.1 and tried again, with the same error. The section is as follows. I made sure torch==1.6.0+cu101 and torchvision==0.8.2+cu101 are in the PyPI repo. But the same error still came up.
` # Python 3.6.9 (default, Oct 8 2020, 12:12:24) [GCC 8.4.0]
boto3 == 1.14.56
clearml == 0.17.4
numpy == 1.19.1
torch == 1.6.0
torchvision == 0.7.0
Detailed import analysis
**************************
IMPORT PACKAGE boto3
clearml.storage: 0
IMPORT PACKAG...
Hi AgitatedDove14, what version should I change it to? I'm currently on v0.17.2rc3.
I'm also beginning to think this is related to https://clearml.slack.com/archives/CTK20V944/p1620664770492400 . Previously, when I set force_repo_requirements_txt=true and system_site_packages: true, it seemed to work. Upgrading to v1.02 seems to have changed things.
This one can be solved with shared cache + pipeline step, refreshing the cache in the shared cache machine.
Would you have an example of this in your code blogs to demonstrate this utilisation?
Hi, we are still not getting the model repo to work, mainly due to clearml.storage failing to save the models.
We tried vanilla boto3 code and it works, but we can't figure out why we get ConnectionResetError 104 when clearml does it.
How do we configure clearml to correspond to the following boto code?
S3 = boto3.resource('s3', endpoint_url='https://ecs.ai', aws_access_key_id='mykey', aws_secret_access_key='mysevret', config=Config(signature_version='s3v4'), region_name='us-east-1', ve...
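A rough sketch of how the endpoint in that boto3 call could be split into a host-style entry. This rests on an assumption worth checking against the clearml.conf reference: that ClearML's S3 credentials for a non-AWS endpoint are listed per host in host:port form (under sdk.aws.s3.credentials) instead of as a full endpoint URL. The helper name endpoint_to_clearml_host is hypothetical, and the key and secret are the placeholders from the snippet above.

```python
from urllib.parse import urlparse


def endpoint_to_clearml_host(endpoint_url: str) -> dict:
    """Split a boto3-style endpoint URL into a host:port string and a TLS flag.

    Hypothetical helper: the returned keys mirror what a host-based S3
    credentials entry is assumed to need, not a confirmed ClearML API.
    """
    parsed = urlparse(endpoint_url.strip())
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        raise ValueError(f"not an http(s) endpoint: {endpoint_url!r}")
    secure = parsed.scheme == "https"
    # Fall back to the scheme's default port when the URL does not name one.
    port = parsed.port or (443 if secure else 80)
    return {"host": f"{parsed.hostname}:{port}", "secure": secure}


entry = endpoint_to_clearml_host("https://ecs.ai")
# Placeholder credentials copied from the boto3 call; replace with real ones.
entry.update(key="mykey", secret="mysevret")
print(entry)
```

If the assumption holds, the printed host, key, secret and secure values are what would go into one credentials entry, and the bucket would then be addressed through that same host:port.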
Oh, this means I have been using the latest agent, which is v1.0.0. The problems are still there.
the default for base_pod_num is 1.
Hi, it makes sense if I only had to change hyperparameters, but not when I am still changing the model architecture (training code), training, and repeating.
Hi CostlyOstrich36, that's correct.
AgitatedDove14, I'm Jax, not Manoj! lol. 😅 😅
Thanks. That seems to work. I have a question: does it save the best model or the model from the last epoch?
Thanks. Which brings me to the question. How does ClearML deal with all the CVEs? What is your process for response?
Yes! I definitely think this is important, and hopefully we will see something there (or at least in the docs)
Hi AgitatedDove14, any updates in the docs to demonstrate this yet?
Any comments on using the global python libraries without the need to 'pip install' anything?
[root@2c7498711bef elasticsearch]# curl
`
{
"cluster_name" : "clearml",
"status" : "red",
"timed_out" : false,
"number_of_nodes" : 1,
"number_of_data_nodes" : 1,
"active_primary_shards" : 4,
"active_shards" : 4,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 8,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 0,
"active_shards_percent_as_number" ...
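To make the health output above easier to read at a glance, here is a minimal sketch that summarises the shard counts. It only uses the fields visible in the paste; summarize_health is a hypothetical helper, and the percentage is computed here from those counts, not taken from the truncated active_shards_percent_as_number field. In Elasticsearch, a red status means at least one primary shard is unassigned.

```python
import json

# Fields copied from the pasted cluster health response (truncated fields omitted).
HEALTH_JSON = """
{
  "cluster_name": "clearml",
  "status": "red",
  "number_of_nodes": 1,
  "active_primary_shards": 4,
  "active_shards": 4,
  "initializing_shards": 0,
  "unassigned_shards": 8
}
"""


def summarize_health(health: dict) -> str:
    """Return a one-line summary of how many shards are active vs. unassigned."""
    active = health["active_shards"]
    pending = health["initializing_shards"] + health["unassigned_shards"]
    total = active + pending
    # Our own ratio of active shards to all counted shards.
    share = 100.0 * active / total if total else 100.0
    return (
        f"{health['cluster_name']}: status={health['status']}, "
        f"{active}/{total} shards active ({share:.1f}%), "
        f"{health['unassigned_shards']} unassigned"
    )


summary = summarize_health(json.loads(HEALTH_JSON))
print(summary)
```

With 8 of 12 shards unassigned on a single node, the next thing to look at would be why those shards cannot be allocated.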
Hi, for both of them, args.lastiter is the exact same value. But when plotted out, they are actually 2 iterations apart.
Hi SuccessfulKoala55, I managed to install clearml-agent==1.0.1rc5. However, the same issues occur.
so the clearml-agent daemon needs higher privilege?
It's a local deployment. I was only asked for a username, with no need to enter a password. Once I'm in, I don't see an option in my profile to set a password either. Nor is there integration with LDAP, for example.