Hi SuccessfulKoala55, just wondering how I can follow up on this.
Ok thanks, that worked.
No issues. I know it's hard to track open threads with Slack. I wish there were a plugin for this too. 🙂
Any idea where I can find the relevant API calls for this?
Oh, this means I have been using the latest agent, v1.0.0, and the problems were still there.
Unfortunately, due to security, clients can't have direct access to the nodes. Are there any possible workarounds at the moment?
Thanks. Have a better understanding now.
like create multiple datasets?
create parent (all) - upload to S3
create child1 (first 100k)
create child2 (second 100k)...blah blah
Then only pull indices from the children. Technically workable (roughly like the sketch below), but I'm not sure it's the best approach since different people have different batch sizes in mind.
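For reference, a minimal sketch of what I mean using the ClearML Dataset API (the project name, paths, and S3 bucket are placeholders):

```python
from clearml import Dataset

# Parent dataset holds all 200k files and uploads them once to S3
parent = Dataset.create(dataset_name="all", dataset_project="my_project")
parent.add_files("/data/all_samples")                 # placeholder local path
parent.upload(output_url="s3://my-bucket/datasets")   # placeholder bucket
parent.finalize()

# Children inherit the parent's file index by reference (no re-upload),
# then drop everything outside their own slice
child1 = Dataset.create(
    dataset_name="first_100k",
    dataset_project="my_project",
    parent_datasets=[parent.id],
)
child1.remove_files("second_100k/*")  # placeholder slice layout
child1.finalize()
```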
Hi SuccessfulKoala55, is there a channel here that posts version updates?
In the Kube logs of the pod, I see 'Err:1 http://security.ubuntu.com/ubuntu bionic-security InRelease Temporary failure resolving http://security.ubuntu.com'. My guess is it's trying to do an apt update.
As we are on a disconnected network, we have a server hosting the repo, but under a different name.
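One workaround I'm considering is the agent's extra_docker_shell_script option to repoint apt inside the container before the task starts; a sketch in clearml.conf, where internal-mirror.local is a placeholder for our repo server:

```
agent {
    # Commands run inside the task container before the job starts;
    # rewrite apt sources to use the internal mirror instead of ubuntu.com
    extra_docker_shell_script: [
        "sed -i 's|http://security.ubuntu.com|http://internal-mirror.local|g' /etc/apt/sources.list",
        "sed -i 's|http://archive.ubuntu.com|http://internal-mirror.local|g' /etc/apt/sources.list"
    ]
}
```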
Hi, any idea if I can achieve this? I just need a list of usernames.
If you can directly access the machine running the agent, yes you could. If not, a reverse proxy is in the works.
Hi AgitatedDove14, I might have misunderstood your previous comment above. Do you mean that clearml-session can only work, regardless of whether X forwarding is configured, if we have direct access to the Kubernetes worker when we run the K8s glue?
We did some testing today, and clearml-session tried to tunnel with a K8s cluster IP, and thus failed.
If we set up an ingress with Me...
Hi. Anything that can point to activity by a user.
Hi CostlyOstrich36, thanks. I will check with the Enterprise team then.
Hi AgitatedDove14, thanks.
In this case I am running the k8s glue (machine glue), which will then spawn off pods on the Kubernetes worker (machine worker). So when you say direct access, are you referring to the glue machine or the K8s worker machine?
Hi, just wondering if I did something wrong here. Would k8s-glue be the reason it's not working? I'm purchasing the enterprise version, and if the vault has the same problem it'll be a big issue.
Yeah, I got that too. This happens when I run the client code on the same machine as the clearml-agent, so I'm wondering if sharing the same clearml.conf causes that problem. Is there a way to specify the clearml.conf instead of defaulting to ~/clearml.conf?
This is an env var?
CLEARML_CONFIG_FILE
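For anyone following along, a minimal sketch of how that variable could be used to keep the client on its own config (the path is a placeholder):

```python
import os

# Set this before importing clearml so the SDK reads the custom config
# instead of defaulting to ~/clearml.conf (the path is a placeholder)
os.environ["CLEARML_CONFIG_FILE"] = "/path/to/client_clearml.conf"

from clearml import Task

task = Task.init(project_name="examples", task_name="separate-config test")
```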
Hi AgitatedDove14, I was referring to task.set_base_docker("nvcr.io/nvidia/tensorflow:19.11-tf2-py3 --env TRAINS_AGENT_GIT_USER=git_username_here --env TRAINS_AGENT_GIT_PASS=git_password_here")
The above will give the error: skipping docker argument TRAINS_AGENT_GIT_USER=git_username_here (only -e --env supported) TRAINS_AGENT_GIT_PASS=git_username_here (only -e --env supported)
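For completeness, this is the variant I'd try next, switching to the short -e form the error message mentions (a sketch; the credentials are placeholders):

```python
from clearml import Task

task = Task.init(project_name="examples", task_name="base-docker test")
# Same image, but passing the variables with -e instead of --env,
# in case the argument parser only matches the short flag form
task.set_base_docker(
    "nvcr.io/nvidia/tensorflow:19.11-tf2-py3"
    " -e TRAINS_AGENT_GIT_USER=git_username_here"
    " -e TRAINS_AGENT_GIT_PASS=git_password_here"
)
```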
Hi CostlyOstrich36, that's correct.
We are using the k8s glue to spawn the job. Would you be able to walk through in detail what happens when the above code executes?
Ok, i guess i will have to kill the whole thing and refresh it.
It's a local deployment. I was only presented with a username field, without a need to enter a password. Once I'm in, I don't see an option in my profile to set a password either. Neither is there integration with LDAP, for example.
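From what I can tell, a self-hosted server can enable password logins via fixed users in the apiserver config; a sketch, assuming the default config location (the user entry is a placeholder):

```
# /opt/clearml/config/apiserver.conf
auth {
    fixed_users {
        enabled: true
        pass_hashed: false
        users: [
            { username: "jane", password: "change_me", name: "Jane Doe" }
        ]
    }
}
```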
I'm not familiar with elastic. What role does elastic play in ClearML?
docker exec clearml-elastic curl
zsh: no matches found:
Hi AgitatedDove14, I changed everything to CUDA 10.1 and tried again, with the same error. The section is as follows. I made sure torch==1.6.0+cu101 and torchvision==0.8.2+cu101 are in the pypi repo, but the same error still came up.
` # Python 3.6.9 (default, Oct 8 2020, 12:12:24) [GCC 8.4.0]
boto3 == 1.14.56
clearml == 0.17.4
numpy == 1.19.1
torch == 1.6.0
torchvision == 0.7.0 `