I just opened a shell with the API and tried to curl my files URL, and the curl just hangs with no response.
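A hanging curl like the one above can be narrowed down with a connection probe that has an explicit timeout. This is a minimal sketch using only the Python standard library, not anything from the ClearML SDK; the function name `can_connect` and the URL in the usage note are hypothetical.

```python
import socket
from urllib.parse import urlsplit


def can_connect(url: str, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the URL's host and port opens within `timeout` seconds."""
    parts = urlsplit(url)
    if parts.hostname is None:
        raise ValueError(f"no host in URL: {url!r}")
    # Fall back to the scheme's default port when the URL does not name one.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        with socket.create_connection((parts.hostname, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, DNS failures, and timeouts.
        return False
```

Calling `can_connect("http://files.example.com:8081")` (a placeholder address) from the same shell gives a quick True/False. It only checks that the port accepts connections, so a True result with a curl that still hangs would suggest the problem sits above the TCP layer.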
It seems like the ClearML Python SDK might have issues when a subprocess is opened?
Are there any workarounds for this issue? Our team is evaluating this product to potentially buy an enterprise license. If we can't fetch data, this is a problem.
Figured this out, the value is parsed from my local clearml.conf file
Err, maybe not. I don't know where it's being fetched from.
Yeah, let me unwind some changes so I can pinpoint the issue.
You could change infrastructure or hosting, and now your data is associated with the wrong URL
Yep, that fixed it, using references like clearml-webserver.clearml.svc.cluster.local:80
Okay, so I just tried this and immediately I'm getting "Failed to establish a new connection" errors, because the file server URL in my clearml.conf is the k8s DNS name. So I'm sort of stuck: if I revert it to the public DNS name, then on Dataset.get I will get the same failure.
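The underlying problem in the last few messages is that a stored file URL embeds one host name, while the machine reading it may only be able to reach another. As a rough illustration only (this is plain `urllib.parse`, not a ClearML feature, and the helper `swap_host` plus both host names below are hypothetical), mapping a public host to an in-cluster one could look like this:

```python
from urllib.parse import urlsplit, urlunsplit


def swap_host(url: str, old_netloc: str, new_netloc: str) -> str:
    """Return `url` with its host[:port] replaced when it matches `old_netloc`."""
    parts = urlsplit(url)
    if parts.netloc != old_netloc:
        # Leave URLs that point anywhere else untouched.
        return url
    return urlunsplit(parts._replace(netloc=new_netloc))


# Hypothetical public and in-cluster addresses, for illustration only.
public = "files.example.com:8081"
internal = "clearml-fileserver.clearml.svc.cluster.local:8081"
print(swap_host("http://files.example.com:8081/proj/data.zip", public, internal))
# -> http://clearml-fileserver.clearml.svc.cluster.local:8081/proj/data.zip
```

Matching on the full `host:port` string keeps the rewrite from touching URLs that merely contain the public name somewhere in their path.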
I made the PR here JuicyFox94 AgitatedDove14 https://github.com/allegroai/clearml-helm-charts/pull/106
For instance, if I wanted the default queue and gpu queue that I create, how do I do that?
Just curious, if https://github.com/allegroai/clearml-helm-charts/blob/19a6785a03b780c2d22da1e79bcd69ac9ffcd839/charts/clearml-agent/values.yaml#L50 is a value I can set, where is it used? It would be great if it overrides the Dataset.get embedded url parsed from my clearml conf file
Thanks that worked, I had to set the AWS_PROFILE as well
that is the containerinit logs from k8glueagent
Also how do I provide the k8 glue agent permissions to spin up/down ec2 nodes?
When I click on a task's details -> info tab, it seems like each task is set up to run on a single pod/node based on attributes like GPU memory, OS, number of cores, and worker.
Yep got it, I was under the impression I could set those values in the UI but I now see they are parsed from my local workstation
So it's not the files server, it's every server.
Seems like it's just missing the brackets.
do I have to fetch it via code? I was hoping to not modify my scripts
AgitatedDove14 Will I need sudo permissions if I add this script to extra_docker_shell_script: echo "192.241.xx.xx venus.example.com venus" >> /etc/hosts
Okay, so basically the DL framework manages the master/worker relationship. I just need to use pod replicas for my k8 agents.
Would I copy and paste this block to produce another queue and k8 glue agent?
AgitatedDove14 note the missing brackets https://github.com/allegroai/clearml-helm-charts/blob/main/charts/clearml-agent/templates/agentk8sglue-deployment.yaml#L22
Yes I will try that
