Hi, it is missing --docker on the agent. Thanks! The dynamic GPU option is only available with the Enterprise version, right?
Hi. Yup the model was not physically uploaded with the up:port into the bucket, although ClearML does indicate that it's there, except that I can't download it. I also verified this with another S3 client, the model was not there as well.
Hi CostlyOstrich36, nothing in particular. I was doing some research and noticed that ML pipelines were not mentioned even once in the literature, so I wonder if one should be done. I'm looking at other aspects as well, but I'll gradually ask about those.
Hi, thanks for the examples! I will look into them. Quite a few of my teams use tf datasets to pull data directly from object stores, so tfrecords and the like are heavily involved. I'm trying to figure out whether they should version the raw data or the tfrecords with ClearML, and whether downloading the entire set of data to local can be avoided, as tf datasets is able to handle batch downloading quite well.
Hi, building a container with vscode is not possible. If I have an alternative location for vscode, where should I indicate it in the configuration?
I didn't track the version for this change in behaviour. But the last time I tried, it was able to download the content after I provided the credentials.
Thanks, that did solve the problem, the tasks are running again.
In the Kube logs of the pod, I see 'Err:1 http://security.ubuntu.com/ubuntu bionic-security InRelease Temporary failure resolving http://security.ubuntu.com'. My guess is it's trying to do an apt update.
As we are on a disconnected network, we have a server hosting the repo but under a different name.
Transform feature engineering and data processing code into recurring data ingestion workflows. Start building data stores, develop, automate, and schedule complex data processing jobs.
Yes, as listed in the snippet. The torch library is torchvision.
clearml-serving does not support spaCy models out of the box, among many others; it only supports the following:
Support Machine Learning Models (Scikit Learn, XGBoost, LightGBM)
Support Deep Learning Models (Tensorflow, PyTorch, ONNX).
An easy way to extend support to different models would be a boon.
I believe in such scenarios, a custom engine would be required. I would like to know how difficult it is to create a custom engine with clearml-serving. For example, in this...
I would say it's intermittent.
I see, so it's a path. Another question: as far as I can tell, clearml-data will download entire datasets before starting training. This isn't ideal when we are dealing with billions of datasets (e.g. we might want to download a subset at a time, send it to the GPU for training, and then use the CPU to concurrently pull another subset). Any comments on this?
I did another test by running kubectl exec pod-name -- echo $PIP_INDEX_URL and it returned nothing. So the env vars are not passed to the container at all.
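To make the question above concrete, here is a minimal sketch of the "pull the next subset while the current one trains" pattern, using only the Python standard library. It is not a clearml-data feature; prefetch_subsets, fake_fetch and the subset names are hypothetical, and the fetch function stands in for whatever actually downloads one subset from an object store.

```python
import queue
import threading
from typing import Callable, Iterable, Iterator, List


def prefetch_subsets(
    subset_ids: Iterable[str],
    fetch: Callable[[str], List[int]],
    max_prefetch: int = 2,
) -> Iterator[List[int]]:
    """Yield fetched subsets in order while a background thread fetches ahead."""
    # Bounded queue: the worker blocks once max_prefetch subsets are waiting,
    # so at most a few subsets are held in memory at any time.
    buffer: queue.Queue = queue.Queue(maxsize=max_prefetch)
    sentinel = object()

    def worker() -> None:
        try:
            for subset_id in subset_ids:
                buffer.put(fetch(subset_id))
        finally:
            # Always signal the end, even if fetch() raised in this thread.
            buffer.put(sentinel)

    threading.Thread(target=worker, daemon=True).start()
    while True:
        item = buffer.get()
        if item is sentinel:
            return
        yield item


def fake_fetch(subset_id: str) -> List[int]:
    # Hypothetical stand-in for downloading one subset from an object store.
    return [len(subset_id)] * 3


# The consumer (e.g. a training loop) sees one subset at a time.
batches = list(prefetch_subsets(["part-0", "part-01", "part-012"], fake_fetch))
```

In this sketch a failed fetch simply ends the stream early, and the consumer is expected to drain the iterator; a real loader would need to propagate errors and handle early exit.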
Sorry AgitatedDove14 can you bump me to that thread?
It also stopped taking in tasks from the queue after that.
Hi AgitatedDove14, I changed everything to cuda 10.1 and tried again, with the same error. The section is as follows. I made sure torch==1.6.0+cu101 and torchvision==0.8.2+cu101 are in the pypi repo, but the same error still came up.
` # Python 3.6.9 (default, Oct 8 2020, 12:12:24) [GCC 8.4.0]
boto3 == 1.14.56
clearml == 0.17.4
numpy == 1.19.1
torch == 1.6.0
torchvision == 0.7.0
Detailed import analysis
**************************
IMPORT PACKAGE boto3
clearml.storage: 0
IMPORT PACKAG...
Hi.
We tried as advised above and it still didn't work.
Host: http://ecs.ai:443
output_uri = S3://ecs.ai:443/bucketname
This time round the client gave this error.
botocore.exceptions.ConnectionClosedError: Connection was closed before we received a valid response from endpoint URL: 'http://ecs.ai/bucketname/.clearml.test'.
It's quite apparent that whatever ClearML passed to boto3 ends up as an HTTP call instead of HTTPS, which is wrong.
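As a rough illustration of the mismatch described above (a plain-HTTP URL aimed at port 443), here is a small standard-library sketch. The helper names endpoint_url and looks_mismatched are hypothetical and this is not how ClearML or boto3 build their endpoint; it only shows how the scheme and port of the host string can be checked.

```python
from urllib.parse import urlsplit


def endpoint_url(host: str, secure: bool) -> str:
    """Build an endpoint URL from a 'host[:port]' string and a TLS flag."""
    scheme = "https" if secure else "http"
    return f"{scheme}://{host}"


def looks_mismatched(url: str) -> bool:
    """Flag plain-HTTP URLs that target port 443, the default HTTPS port."""
    parts = urlsplit(url)
    return parts.scheme == "http" and parts.port == 443


# The host value from the post above is flagged as suspicious.
configured_host_is_suspicious = looks_mismatched("http://ecs.ai:443")
# The same host with an explicit HTTPS scheme is not flagged.
https_endpoint = endpoint_url("ecs.ai:443", secure=True)
```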
Hi AgitatedDove14, I was referring to
task.set_base_docker("nvcr.io/nvidia/tensorflow:19.11-tf2-py3 --env TRAINS_AGENT_GIT_USER=git_username_here --env TRAINS_AGENT_GIT_PASS=git_password_here")
The above will give the error
skipping docker argument TRAINS_AGENT_GIT_USER=git_username_here (only -e --env supported) TRAINS_AGENT_GIT_PASS=git_username_here (only -e --env supported)
Hi, currently the ClearML SDK only supports Python. If I want to run my ML in other languages, can I use an SDK in that language? Or are there other means, such as Web API calls, that do the same as the SDK?
Thanks, could you share the URL to this full API documentation?
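For reference, this is how I would expect a string like the one above to break down into an image plus environment variables. It is only a sketch with a hypothetical helper, split_docker_spec, written with the standard library; it is not the agent's actual argument parser, whose behaviour is what the error message is about.

```python
import shlex
from typing import Dict, List, Tuple


def split_docker_spec(spec: str) -> Tuple[str, Dict[str, str], List[str]]:
    """Split 'image [args...]' into the image, env vars, and leftover args."""
    tokens = shlex.split(spec)
    image, rest = tokens[0], tokens[1:]
    env: Dict[str, str] = {}
    other: List[str] = []
    i = 0
    while i < len(rest):
        tok = rest[i]
        if tok in ("-e", "--env") and i + 1 < len(rest):
            # Form: --env KEY=VALUE (flag and value are separate tokens).
            key, _, value = rest[i + 1].partition("=")
            env[key] = value
            i += 2
        elif tok.startswith("--env="):
            # Form: --env=KEY=VALUE (flag and value in one token).
            key, _, value = tok[len("--env="):].partition("=")
            env[key] = value
            i += 1
        else:
            other.append(tok)
            i += 1
    return image, env, other


image, env, other = split_docker_spec(
    "nvcr.io/nvidia/tensorflow:19.11-tf2-py3 "
    "--env TRAINS_AGENT_GIT_USER=git_username_here "
    "--env TRAINS_AGENT_GIT_PASS=git_password_here"
)
```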
Ah ok. So it will be fixed on the ClearML server web UI as well? (See screenshots).
Previously we had similar issues when we switched images used in agent. Might want to check on that.
Ah ok, so if I see Jax's workspace on https://app.community.clear.ml/dashboard, then I'm on the right track? How regularly does this reset, then?
Hi, is this currently not working? http://app.community.clear.ml ? I noticed that the ClearML UI is cached in the browser, and if the backend is not running, it's not clear to the user that something is wrong (except for broken pages).
Yes it is! But ClearML didn't support multi-node training out of the box in a way that streamlines the process. So we are trying to figure out a way to do it.