That is the problem: the if condition is not evaluating to True.
It would be great if the docker_bash_setup_script had output I could see.
I don't know how to do that.
JuicyFox94 AgitatedDove14 I made the PR here: https://github.com/allegroai/clearml-helm-charts/pull/106
Do you want me to PR that fix?
Are you able to do a screenshare to discuss this? I'm not sure I understand the k8s glue agent's purpose.
Would using Ubuntu 22.04 still work for task execution?
SuccessfulKoala55 It looks like it should eval to True?
Hey, triggering the tasks from the CLI resolved the Python path issues!
Also, how do I give the k8s glue agent permissions to spin EC2 nodes up and down?
I got the EFS volume mounted. I'm curious what advantage there would be to using the StorageManager.
It will then parse the above information from my local workstation?
Okay, that makes sense now, thanks!
I verified in the pod YAML that it is set correctly.
The API URL works fine and returns 200.
Then it tries to curl the files API and gets a 405.
I don't see any requests
I can see this log message in the nginx controller:
"GET / HTTP/1.1" 405 178 "-" "curl/7.79.1" 95 0.003 [clearml-clearml-fileserver-8081] [] 10.36.1.61:8081 178 0.004 405 b4f5caf7665ffa1e8823a195ae41ec26
Perhaps the 405 is coming from nginx.
I just opened a shell in the API server and tried to curl my files URL, and the curl just hangs with no response.
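To tell apart the three outcomes described above (a 200 from the API, a 405 from the files endpoint, and a curl that simply hangs), a small probe with an explicit timeout can be run from a shell inside the pod. This is a sketch using only the Python standard library. `probe` is a hypothetical helper, and the URLs passed in are placeholders for your own API and files server addresses.

```python
import socket
import sys
import urllib.error
import urllib.request
from typing import Optional


def probe(url: str, timeout: float = 5.0) -> Optional[int]:
    """Hypothetical helper: GET url and return the HTTP status, or None if no response arrives."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses (e.g. a 405) still mean some server answered.
        return err.code
    except (urllib.error.URLError, socket.timeout, TimeoutError):
        # Connection refused, DNS failure, or a hang past the timeout.
        return None


if __name__ == "__main__":
    # Usage: python probe.py <api_url> <files_url>
    for url in sys.argv[1:]:
        print(url, probe(url))
```

A `None` result means no HTTP response at all (refused, unresolvable, or timed out), which points at networking; an integer means something, possibly the ingress controller, answered the request.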
Yep, I updated those as well.
The worker now shows up in the dashboard.
Thank you for the help!
OK yes, this is the problem.
Okay, so basically the DL framework manages the master/worker relationship. I just need to use pod replicas for my k8s agents.
I think the best change would be to respect the value set at https://github.com/allegroai/clearml-helm-charts/blob/19a6785a03b780c2d22da1e79bcd69ac9ffcd839/charts/clearml-agent/values.yaml#L50 so that you could change it down the road if the infra/hosting changes. Also, in this case I'm uploading the data to the public file server URL, but my k8s pod can't reach that URL for security reasons.
IMO, the dataset shouldn't be tied to the clearml.conf URLs it was uploaded with, since those URLs could change. It should respect the file server URL the agent is configured with.
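As a minimal sketch of the remapping proposed above, assuming the dataset stores absolute file URLs and the agent knows both the base URL the data was uploaded to and its own file server URL, the host prefix could be swapped at download time. `remap_file_url` and the example URLs are hypothetical illustrations, not part of the ClearML API.

```python
def remap_file_url(stored_url: str, uploaded_base: str, agent_base: str) -> str:
    """Hypothetical helper: rewrite a stored file URL to point at the agent's file server.

    Only URLs under uploaded_base are rewritten; everything else is returned unchanged.
    """
    uploaded_base = uploaded_base.rstrip("/")
    agent_base = agent_base.rstrip("/")
    # Match the base exactly or as a path prefix, so a look-alike host is not rewritten.
    if stored_url == uploaded_base or stored_url.startswith(uploaded_base + "/"):
        return agent_base + stored_url[len(uploaded_base):]
    return stored_url


if __name__ == "__main__":
    # Hypothetical public upload URL vs. an in-cluster file server URL.
    print(remap_file_url(
        "https://files.example.com/datasets/abc123/data.zip",
        "https://files.example.com",
        "http://clearml-fileserver:8081",
    ))
```

URLs that don't start with the upload base (for example `s3://` links) pass through untouched, so only file-server links are redirected.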