In the Kube logs of the pod, i see 'Err:1 http://security.ubuntu.com/ubuntu bionic-security InRelease Temporary failure resolving http://security.ubuntu.com '. My guess is it's trying to do an apt update.
As we are on a disconnected network, we have a server hosting the repo but under a different name.
Hi, it is missing --docker on the agent. Thanks! The dynamic GPU option is only available with the Enterprise version, right?
I see, so it's a path. Another question: as far as i can tell, clearml-data will download entire datasets before starting training. This isn't ideal when we are dealing with billions of datasets (e.g. we might want to download a subset at a time, send it to the GPU for training and then use the CPU to concurrently pull another subset). Any comments on this?
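To make the "pull the next subset while the current one trains" idea concrete, here is a minimal sketch in plain Python. It does not use any clearml-data API; `fetch_subset` and the subset ids are hypothetical placeholders for whatever actually downloads one subset.

```python
import queue
import threading


def prefetch_subsets(subset_ids, fetch_subset, max_ahead=1):
    """Yield fetched subsets in order while a background thread
    downloads up to `max_ahead` subsets in advance."""
    buffer = queue.Queue(maxsize=max_ahead)

    def worker():
        try:
            for subset_id in subset_ids:
                # Blocks when the buffer is full, so we never run too far ahead.
                buffer.put(("ok", fetch_subset(subset_id)))
            buffer.put(("done", None))
        except Exception as exc:
            # Hand download errors to the consumer instead of losing them.
            buffer.put(("error", exc))

    threading.Thread(target=worker, daemon=True).start()

    while True:
        status, payload = buffer.get()
        if status == "done":
            return
        if status == "error":
            raise payload
        yield payload


# Usage sketch: train on one subset while the next one is being pulled.
# for subset in prefetch_subsets(range(10), my_download_function):
#     train_on(subset)
```

A bounded queue keeps memory and disk use predictable: the downloader stalls as soon as it is `max_ahead` subsets in front of the training loop.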
Ok, i guess i will have to kill the whole thing and refresh it.
Thanks that did solve the problem, the tasks are running again.
thanks GrumpyPenguin23 , i'll look deeper into that. This kinda fits what i am looking for but it's for TRAINS and there's no technical how-to.
https://clear.ml/blog/stop-using-kubernetes-for-ml-ops/
No issues. I know it's hard to track open threads with Slack. I wish there was a plugin for this too. 🙂
Can i dig into the mongodb or ES to pull these data?
Got that, thanks. Just to better understand: when clearml-data uploads my recursive folder of image data, it converts it into a compressed form with a different folder structure than the original dataset.
When my software pulls the data, i'm returned a str. How would we manipulate the data from there?
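A minimal sketch of working with that str, assuming it is the path of the local folder where the dataset was extracted (for example what `Dataset.get(...).get_local_copy()` hands back). The helper name and the suffix list are made up for illustration.

```python
from pathlib import Path

# Hypothetical set of extensions; adjust to whatever your dataset holds.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def list_images(local_copy_path):
    """Return all image files under the dataset folder, sorted by path."""
    root = Path(local_copy_path)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


# Usage sketch (assumes the clearml package and a real dataset id):
# from clearml import Dataset
# folder = Dataset.get(dataset_id="<your dataset id>").get_local_copy()
# for image_path in list_images(folder):
#     ...  # open with PIL / OpenCV / your own loader
```

From there it is ordinary file handling: the returned paths can go straight into whatever loader the training code already uses.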
Do you mean by this that you want to be able to seamlessly deploy models that were tracked using ClearML experiment manager with ClearML serving?
Ideally that's best. Imagine that i used Spacy (among other frameworks) and i just need to add one or two lines of clearml code in my python scripts and i get to track the experiments. Then when it comes to deployment, i don't have to worry about Spacy having a model format that Triton doesn't recognise.
Do you want clearml serving ...
It's a local deployment. I was only presented with username without a need to enter passwords. When I'm in, I don't see an option in my profile to set a password as well. Neither is there integration with ldap for example.
Hi SuccessfulKoala55 , I was referring to Task.init() or any other SDK API that we use in our training code.
Hi, just wondering if this 'feature: Passing env via the code' is in the works?
https://clearml.slack.com/archives/CTK20V944/p1616677400127900?thread_ts=1616585832.098200&cid=CTK20V944
Thanks CostlyOstrich36 , how do i know how the parts are indexed in the first place? Or rather, how are chunks and parts defined? Say in the context of images, videos, text documents, etc.
docker exec clearml-elastic curl
zsh: no matches found:
Hi TimelyPenguin76 , i am adding a debug sample to an existing task using the above method. What should i put for the iteration? I do not want to overwrite existing ones but i do not know what's the last count. This is for both scalar and media reporting.
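One possible way to avoid overwriting, sketched under the assumption that the task object exposes `get_last_iteration()` (i believe the ClearML `Task` does, but please check on your SDK version). The two helper names are made up for illustration.

```python
def next_free_iteration(task):
    """Return the iteration right after the last one reported on the task.

    Assumes `task.get_last_iteration()` returns an int (or None when
    nothing has been reported yet).
    """
    last = task.get_last_iteration()
    return (last or 0) + 1


def report_scalar_after_last(task, title, series, value):
    """Report one scalar at the next free iteration and return that iteration."""
    iteration = next_free_iteration(task)
    task.get_logger().report_scalar(
        title=title, series=series, value=value, iteration=iteration
    )
    return iteration


# Usage sketch (assumes an existing task id):
# from clearml import Task
# task = Task.get_task(task_id="<existing task id>")
# report_scalar_after_last(task, "loss", "val", 0.12)
```

The same iteration number could be passed to the media reporting calls so scalars and debug samples stay aligned.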
Hi, for both of them, args.lastiter is the exact same value. But when plotted out, they are actually 2 iterations apart.
Hi AgitatedDove14 , i changed everything to cuda 10.1 and tried again with the same error. The section is as follows. I made sure torch==1.6.0+cu101 and torchvision==0.8.2+cu101 are in the pypi repo. But the same error still came up.
` # Python 3.6.9 (default, Oct 8 2020, 12:12:24) [GCC 8.4.0]
boto3 == 1.14.56
clearml == 0.17.4
numpy == 1.19.1
torch == 1.6.0
torchvision == 0.7.0
Detailed import analysis
**************************
IMPORT PACKAGE boto3
clearml.storage: 0
IMPORT PACKAG...
AlertBlackbird30 , Actually the log says 10.2.
docker_cmd = nvidia/cuda:10.2-devel-ubuntu18.04 -e GIT_SSL_NO_VERIFY=true
Hi SuccessfulKoala55 , thanks. Opened an issue on the ClearML-Agent GH at https://github.com/allegroai/clearml-agent/issues/67
Hi, the problem is the same.
I noticed that it's not checking out the latest version in gitlab. This latest version would contain the requirements.txt.
Using cached repository in "/root/.clearml/vcs-cache/pytorchmnist.f220373e7227ec760b28c7f4cd99b534/pytorchmnist"
warning: redirecting to
Note: checking out 'cfb833bcc70f3e10d3b6a96cfad3225ed682382b'.
But i'm guessing this block below applied the diff..does it include the requirements.txt though?
` HEAD is now at cfb833b Upload New Fil...
I thought of another potential way but not sure if the SDK supports it.
We will perform manual save and upload of model using vanilla boto3 and credentials passed in as env var. Use ClearML SDK to update the Model Repo on the location of the model, without ClearML uploading it explicitly.
Would the above work?
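A rough sketch of that idea, under two assumptions to verify on your SDK version: that `OutputModel.update_weights()` accepts a `register_uri` for a file that is already uploaded, and that boto3 picks the credentials up from the usual env vars. The helper names, bucket and key are hypothetical.

```python
def s3_uri(bucket, key):
    """Build the s3:// URI that gets registered in the model repo."""
    return "s3://{}/{}".format(bucket, key.lstrip("/"))


def upload_and_register(task, local_path, bucket, key):
    """Upload the weights with plain boto3, then only register the location."""
    # Imported lazily so the URI helper above works without these packages.
    import boto3
    from clearml import OutputModel

    key = key.lstrip("/")
    # boto3 reads AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY from the environment.
    # For a non-AWS S3 endpoint you would also pass endpoint_url=... here.
    boto3.client("s3").upload_file(local_path, bucket, key)

    model = OutputModel(task=task)
    # Assumption: register_uri records the URI without ClearML uploading the file.
    model.update_weights(register_uri=s3_uri(bucket, key))
    return model


# Usage sketch:
# upload_and_register(Task.current_task(), "model.pt", "my-bucket", "exp1/model.pt")
```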
Is there any way to see an error log from that?
Hi SuccessfulKoala55 , just wondering how i can follow up on this.
Yes of course, it's a long one.
like create multiple datasets?
create parent (all) - upload to S3
create child1 (first 100k)
create child2 (second 100k)...blah blah
Then only pull indices from the children. Technically workable, but not sure if it's the best approach since different ppl have different batch sizes in mind.
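The index split behind the parent/child layout can be sketched in plain Python. The 100k chunk size comes from the example above; the function name is made up, and creating the actual child datasets is left out on purpose.

```python
def split_indices(total, chunk_size):
    """Split range(total) into consecutive ranges of at most chunk_size items."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [
        range(start, min(start + chunk_size, total))
        for start in range(0, total, chunk_size)
    ]


# Usage sketch: one child dataset per range, e.g. child1 = first 100k, child2 = second 100k.
# for child_number, index_range in enumerate(split_indices(250000, 100000), start=1):
#     ...  # create child<child_number> from the files at index_range
```

Since the chunk size is just a parameter, the same split can be redone for a different size, though each size would mean its own set of children.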