Thanks, this would be a good alternative before the enterprise version comes in. How is this different from argparse, btw?
Some breakthrough: the problem is that we switched the web, API, and files servers to use HTTPS (SSL) endpoints. I switched back to HTTP endpoints to test this theory.
Although it's not printing the error, I suspect it's not able to connect due to the lack of the self-signed cert. Previously this wasn't an issue; not sure what changed in clearml-agent 1.1.0.
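If the self-signed cert turns out to be the cause, this is the clearml.conf section I'd look at (a sketch, assuming the standard api.verify_certificate option; the endpoint hostnames are placeholders):
```
api {
    web_server: https://clearml.example.com:8080     # hypothetical hosts
    api_server: https://clearml.example.com:8008
    files_server: https://clearml.example.com:8081
    # with a self-signed cert, either add it to the system CA bundle,
    # or (less safe) disable verification entirely:
    verify_certificate: false
}
```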
There's a secondary issue resulting from this; I will put it on a new thread.
Hi, we are still not getting the model repo to work, mainly due to clearml.storage failing to save the models.
We tried vanilla boto3 code and it works, but we can't figure out why we get ConnectionResetError 104 when ClearML does it.
How do we configure ClearML to correspond to the following boto3 code?
s3 = boto3.resource('s3', endpoint_url='https://ecs.ai', aws_access_key_id='mykey', aws_secret_access_key='mysecret', config=Config(signature_version='s3v4'), region_name='us-east-1', ve...
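In case it's useful, a sketch of what I believe the matching clearml.conf section would look like (assuming the standard sdk.aws.s3 credentials list for non-AWS endpoints; key/secret are the same placeholders as in the boto3 call):
```
sdk {
    aws {
        s3 {
            region: "us-east-1"
            credentials: [
                {
                    # non-AWS S3 endpoint, equivalent to boto3's endpoint_url='https://ecs.ai'
                    host: "ecs.ai:443"
                    key: "mykey"
                    secret: "mysecret"
                    multipart: false
                    secure: true    # use https
                }
            ]
        }
    }
}
```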
Likely network. Can you run a curl against the ClearML API server from the Jenkins stage and see if that gets through?
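Something along these lines, assuming the default API server port (the host is hypothetical; debug.ping is a lightweight health-check endpoint):
```
# run from the Jenkins stage; a JSON reply means the API server is reachable
curl http://clearml-server.example.com:8008/debug.ping
```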
Hi, it makes sense if I only had to change hyperparameters, but that's not the case when I am still changing the model architecture (training code), then training and repeating.
Thanks. That's easy to miss, as it's not quite apparent in the main docs. How should I pass in environment variables with Task?
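In docker mode, one documented route is passing them as docker arguments on the task (a minimal sketch; the project/task names and variables are hypothetical):
```python
from clearml import Task

task = Task.init(project_name='examples', task_name='env-var-demo')

# everything after the image name is forwarded to `docker run`,
# so -e flags become environment variables inside the agent's container
task.set_base_docker('python:3.8 -e MY_VAR=value -e OTHER_VAR=other')
```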
Hi SuccessfulKoala55, I was referring to Task.init() or any other SDK API that we use in our training code.
And out of curiosity, what did you think we were talking about? Because I didn't see anywhere else that might print the secrets.
Alright, thanks. It's important we clarify it works before we migrate the infra.
Hi, just wondering if I did something wrong here. Would k8s-glue be the reason it's not working? I'm purchasing the enterprise version, and if the vault has the same problem it'll be a big issue.
Sorry, I don't quite understand this. The task itself was submitted as I ran the code on the client. I suppose the dependency requirements would be copied over as the experiment is cloned?
Hi, we did a check. Only 7.16.1 and above, and 6.8.21 and above, mitigate the attack. What's the current version that ClearML is using?
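For anyone checking their own deployment, the bundled Elasticsearch reports its version on its root endpoint (a sketch, assuming the default port from the ClearML Server docker-compose):
```
# prints the version "number" field, e.g. "7.6.2"
curl -s http://localhost:9200 | grep '"number"'
```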
Thought this looked familiar.
https://clearml.slack.com/archives/CTK20V944/p1635323823155700?thread_ts=1635323823.155700&cid=CTK20V944
Where should I indicate this in the configuration?
Any idea?
Hi. Anything that can point to activity by a user.
Hi, by agent logs I suppose you meant the logs from the ClearML server console panel?
I'm having the same problem. Are you using the latest clearml-agent? Is your docker image running as the root user by default?
Hi CostlyOstrich36, that's correct.
Thanks. Which brings me to the question: how does ClearML deal with all the CVEs? What is your process for responding to them?
Hi, by deployment strategies I meant canary, blue-green, etc. I figured this should be done by clearml-serving, and maybe Seldon as well.
Thanks, it's attached.
I also noted that the status on ClearML is always 'Pending', unlike others which say 'Running'. Is this a side effect of using k8s glue?
It's 0.17-63.
It doesn't appear on the profile page.
```
[root@2c7498711bef elasticsearch]# curl ...
{
  "index" : "events-training_stats_scalar-d1bd92a3b039400cbafc60a7a5b1e52b",
  "shard" : 0,
  "primary" : false,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "CLUSTER_RECOVERED",
    "at" : "2021-05-22T11:33:38.932Z",
    "last_allocation_status" : "no_attempt"
  },
  "can_allocate" : "no",
  "allocate_explanation" : "cannot allocate because allocation is not permitted to any of the nodes",
  "node_allocation_decisi...
```
From the ClearML perspective, how would we enable this, considering we don't have direct control over, or even the IPs of, the agents?
Thanks. We set this configuration, and the client ran and submitted the job for remote execution (agent running k8s glue). However, when the job runs and tries to save into the model repo, this error comes up:
clearml.storage - ERROR - Failed creating storage object s3://ecs.ai Reason: Missing key and secret for S3 storage access (s3://ecs.ai).
I remember being told that the clearml.conf on the client will not be used in a remote execution like the above, so I think this was the problem. I also...
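For anyone else hitting this: since the client's clearml.conf isn't shipped with the task, the credentials have to exist where the agent runs. A sketch of the two usual routes (env-var names are the standard AWS ones; values are placeholders):
```
# Option 1: sdk.aws.s3 block in the clearml.conf used by the agent pod
sdk.aws.s3 {
    credentials: [
        { host: "ecs.ai:443", key: "mykey", secret: "mysecret", secure: true }
    ]
}

# Option 2: environment variables injected into the agent/pod spec
AWS_ACCESS_KEY_ID=mykey
AWS_SECRET_ACCESS_KEY=mysecret
```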
They don't have the same Python version. I do notice that if the client is using Python 3.8, remote execution will try to use that same version, even though the docker image doesn't have it installed.
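If the agent is picking up the client's Python version, a possible override on the agent side is pinning the interpreter (a sketch, assuming the agent.python_binary key in the agent's clearml.conf; the path is whatever interpreter exists in your docker image):
```
agent {
    # use this interpreter instead of matching the version recorded by the client
    python_binary: "/usr/bin/python3.6"
}
```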
Ok sure. Thanks.
Yeah that'll cover the first two points, but I don't see how it'll end up as a dataset catalogue as advertised.
OK, thanks. That explains a lot. We have been doing this wrong the whole time, thinking that the clearml.conf on the client side would be honoured by the remote agent execution. In reality, only the api section is used.