 
AgitatedDove14, would you elaborate on this resolution process?
Ok sure. Thanks.
Hi SuccessfulKoala55, is there a channel here that posts version updates?
I used the nvcr PyTorch image and instructed ClearML to inherit the global dependencies. No need to install torch, and it works well.
Here's my two cents worth.
I thought it's really nice to start off the topic by highlighting 'pipelines'; it's unfortunately one of the most overlooked components when people start off with ML work. Your article mentioned drifts and how the MLOps process covers them. I think there are 2 more components that are important and deserve some mention. Retraining pipelines: ML engineers tend not to give much thought to how they want to transition a training pipeline in development to an automated retraining pipe...
clearml=1.0.3
python=3.8.10
clearml-data upload --id 12314jhg42342j4j --storage
http://ecs.ai is an on-prem DELL EMC ECS that serves as our S3 storage, configured with a self-signed cert.
Space is way above nominal. What created this folder that it's trying to process? What processing is this?
Processing /tmp/build/80754af9/attrs_1604765588209/work
Are there any paths on the agent machine that I can clear out to remove any possible issues from previous versions?
From ClearML's perspective, how would we enable this, considering we don't have direct control over, or even the IPs of, the agents?
Create immutable and differentiable versions on-prem or in the cloud with our data agnostic solution.
I see, I understand better now. Thanks.
Thanks. Gonna try that out. But I hit another snag. Strangely, the Agent is not creating the right venv. This is what the Agent created.
` pip:
- asn1crypto==0.24.0
- attrs==20.3.0
- certifi==2020.12.5
- chardet==4.0.0
- cryptography==2.1.4
- Cython==0.29.22
- furl==2.1.0
- future==0.18.2
- humanfriendly==9.1
- idna==2.6
- importlib-metadata==3.7.0
- jsonschema==3.2.0
- keyring==10.6.0
- keyrings.alt==3.0
- orderedmultidict==1.0.1
- pathlib2==2.3.5
- psutil==5.8.0
- pycrypto==2.6.1
- pygobject...
That didn't work either...
Hi, the problem is the same.
I noticed that it's not checking out the latest version in GitLab. This latest version would contain the requirements.txt.
Using cached repository in "/root/.clearml/vcs-cache/pytorchmnist.f220373e7227ec760b28c7f4cd99b534/pytorchmnist"
warning: redirecting to
Note: checking out 'cfb833bcc70f3e10d3b6a96cfad3225ed682382b'.
But I'm guessing this block below applied the diff. Does it include the requirements.txt though?
` HEAD is now at cfb833b Upload New Fil...
Yes, as listed in the snippet. The torch library is torchvision.
Thanks. That's easy to miss, as it's not quite apparent in the main docs. How should I pass in env variables with Task?
Yeah, the issue is that ClearML is unable to talk to the nodes, because PyTorch distributed needs to know their IPs. There is some sort of integration missing that would enable this.
The first is probably done using pipeline controllers, the second using Datasets or HyperDatasets. It's not very clear how the last one is achieved, especially the searchable data catalogs.
What feature on this paid roadmap are you referring to? I am indeed communicating with Noem on paid features.
Hi SuccessfulKoala55, just wondering how I can follow up on this.
Try setting docker_force_pull: true under the agent section of your agent's clearml.conf.
It's hard to tell, but the agent change was a significant one. Unless the Python version has something to do with it.
Thanks SuccessfulKoala55. Just PM'ed him.
Hi, thanks. How about the Agent: does its docker mode or k8s mode require docker.sock to be exposed?
Hi,
It did, nvidia/cuda:10.1-runtime-ubuntu18.04.
So if I need to set this every time, what is the following config for? And how do I pass in new env parameters?
` default_docker: {
    # default docker image to use when running in docker mode
    image: "dockerrepo/mydocker:custom"
    # optional arguments to pass to docker image
    # arguments: ["--ipc=host", ]
    arguments: ["--env GIT_SSL_NO_VERIFY=true",]
} `
This would be solved if --env GIT_SSL_NO_VERIFY=true is passed to the k8s pod that's spawned to run the job. Currently it's not.
Thanks 👍 . Should I create an issue on GitHub?
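For reference, here is a minimal sketch of how the `--env KEY=VALUE` entries for the `arguments` list could be generated from a plain dict. Note that `docker_env_arguments` is a hypothetical helper I made up for illustration; it is not part of ClearML or the agent, and it only builds the strings in the same shape as the config snippet above. Whether those arguments actually reach the k8s pod is a separate question.

```python
from typing import Dict, List


def docker_env_arguments(env: Dict[str, str]) -> List[str]:
    """Hypothetical helper (not a ClearML API): turn a dict of environment
    variables into "--env KEY=VALUE" strings, one per variable."""
    args: List[str] = []
    for key, value in env.items():
        # Reject names that would produce a malformed "--env" argument.
        if not key or "=" in key or any(ch.isspace() for ch in key):
            raise ValueError(f"invalid environment variable name: {key!r}")
        args.append(f"--env {key}={value}")
    return args


# Same variable as in the default_docker snippet.
arguments = docker_env_arguments({"GIT_SSL_NO_VERIFY": "true"})
print(arguments)
```

The resulting list can be pasted into the `arguments:` entry by hand; values containing spaces would need extra quoting, which this sketch does not handle.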