I used the nvcr PyTorch image and instructed ClearML to inherit the global dependencies. No need to install torch, and it works well.
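For reference, a minimal sketch of the clearml.conf settings this setup relies on, assuming the agent runs in docker mode; the image tag is illustrative, not the one actually used:

```
# run tasks inside the NGC PyTorch image (tag is an example placeholder)
agent.default_docker.image = "nvcr.io/nvidia/pytorch:21.10-py3"
# let the task environment inherit the packages already in the container,
# so torch does not need to be reinstalled
agent.package_manager.system_site_packages = true
```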
Does the glue write any error logs anywhere? I only see `CLEARML_AGENT_UPDATE_VERSION =` and nothing else.
Likely network. Can you run a curl against the ClearML API server from the Jenkins stage and see if that gets through?
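Something along these lines, assuming the default API server port 8008 and that the standard debug ping endpoint is exposed; adjust the host and port to the actual deployment:

```bash
# quick reachability check from the Jenkins stage;
# host and port are placeholders for your ClearML server
curl -s http://your-clearml-server:8008/debug.ping
```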
Hi. Yup, the model was not physically uploaded with the ip:port into the bucket, although ClearML does indicate that it's there, except that I can't download it. I also verified this with another S3 client; the model was not there either.
Hi CostlyOstrich36 , nothing in particular. I was doing research and noticed that ML pipelines were not mentioned even once in the literature, so I wonder whether they should be. I'm looking at other aspects as well, but I'll gradually ask about those.
Hi, how may I call task.init() within these subprocesses without write access to the 3rd-party scripts and Python executables?
Hi, by agent logs I suppose you meant the logs from the ClearML server console panel?
If we run all the rank 0 and rank n tasks individually, it defeats the purpose of using ClearML.
Here's my two cents' worth.
I thought it was really nice to start off the topic highlighting 'pipelines'; it's unfortunately one of the most-missed components when people start off with ML work. Your article mentioned drifts and how the MLOps process covers them. I thought there are two more components that are important and deserve some mention. Retraining pipelines: ML engineers tend not to give much thought to how they want to transition a training pipeline in development into an automated retraining pipe...
AlertBlackbird30 , actually the log says 10.2: `docker_cmd = nvidia/cuda:10.2-devel-ubuntu18.04 -e GIT_SSL_NO_VERIFY=true`
Yes of course, it's a long one.
Where should I indicate it in the configuration?
Any idea?
Hi, building a container with vscode is not possible. If I have an alternative location for vscode, where should I indicate it in the configuration?
The agent is running on a disconnected server in docker mode. I have a client that runs clearml-session, and I saw from the agent's logs that the installation of vscode fails.
Hi. The upgrade seems to have gone well, but I'm seeing one weird output. When I ran a task and observed the software installed under the EXECUTION tab, I still see clearml=0.17. Is this expected?
I think in general, the 'published' action can be considered an 'approval'. The question is, how do we control who has the authority to 'publish'? The Web UI today does not support any uploads outside of the coding environment; it would be nice if that were supported. For now, the only workaround is to include parameters that store document URLs in the user properties.
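A minimal sketch of that workaround via the ClearML SDK's user-properties call; the property name, project/task names, and URL are hypothetical placeholders:

```python
from clearml import Task

# hypothetical project/task names for illustration
task = Task.init(project_name="examples", task_name="approval demo")

# store the approval document's location as a user property so reviewers
# can find it in the task's USER PROPERTIES section in the Web UI
task.set_user_properties(
    approval_doc_url="https://wiki.example.com/approvals/model-v3"  # placeholder URL
)
```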
Hi CostlyOstrich36 , thanks. I will check with the Enterprise team then.
And yes, there is stuff in there. In fact it's been running for a few weeks with no issue. This appears to have happened after I added new workers, though I can't be sure this is the cause. Is there a limit to the number of workers that I can add in the community edition?
What's the difference between --template-yaml and --overrides-yaml? I used the latter to ensure the GPU is passed in.
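For context, one plausible shape of an overrides file that requests a GPU, using the standard Kubernetes resources stanza; the exact nesting the glue expects may differ by version, so treat this as a sketch:

```yaml
# illustrative --overrides-yaml content: request one GPU for the task pod
spec:
  containers:
    - resources:
        limits:
          nvidia.com/gpu: 1
```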
Hi, I changed it, but it still points to https://files.pythonhosted.org/packages .
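If the goal is to point the agent's pip at an internal mirror, a hedged alternative to the environment variable is the package-manager section of clearml.conf; the mirror URL below is a placeholder:

```
# add an internal index in addition to the default PyPI
agent.package_manager.extra_index_url: ["https://pypi.mycompany.example/simple"]
```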
[root@2c7498711bef elasticsearch]# curl -XGET
yellow open events-training_stats_scalar-d1bd92a3b039400cbafc60a7a5b1e52b 4hAFNtGkRr-CHNGnUYfbTA 1 1 4724 271 660.9kb 660.9kb
yellow open events-log-d1bd92a3b039400cbafc60a7a5b1e52b M3qgFy1HRU2PibDOr1YOdw 1 1 1221 20 1013.6kb 1013.6kb
red open worker_stats_d1bd92a3b039400cbafc60a7a5b1e52b_2021-05 EQK8mnlhRxCrrKK3clcUFA 1 1
red open queue_metrics_d1bd92a3b039400cbafc60a7a5b1e52b_...
Thanks AgitatedDove14 , unfortunately it didn't take effect.
Ok thanks, looking forward to it. Could you share details of the bug you encountered?
Ok, i guess i will have to kill the whole thing and refresh it.
So these (PIP_INDEX_URL) weren't used when ClearML started running pip.
Hi, this is what I got. No mention of the env variables.
Current configuration (clearml_agent v0.17.2, location: /home/jax/clearml.conf):
api.version = 1.5
api.verify_certificate = true
api.default_version = 1.5
api.http.max_req_size = 15728640
api.http.retries.total = 240
api.http.retries.connect = 240
api.http.retries.read = 240
api.http.retries.redirect = 240
api.http.retries.status = 240
api.http.retries.backoff_factor = 1.0
api.http.retries.backoff_max = 120.0
ap...
I did another test by running kubectl exec pod-name -- echo $PIP_INDEX_URL and it returned nothing. So the env variables are not passed to the container at all.
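One caveat worth noting about that test: without quoting, the variable is expanded by the local shell before kubectl runs, so an empty result is ambiguous. A sketch of how to evaluate it inside the container instead, assuming the pod name is a placeholder:

```bash
# force the expansion to happen inside the container's shell
kubectl exec pod-name -- sh -c 'echo "$PIP_INDEX_URL"'

# or dump the container environment directly and filter
kubectl exec pod-name -- env | grep PIP_INDEX_URL
```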
clearml-serving does not support spaCy models out of the box, among many others; ClearML-Serving only supports the following:
Machine Learning models (Scikit-Learn, XGBoost, LightGBM)
Deep Learning models (TensorFlow, PyTorch, ONNX)
An easy way to extend support to different models would be a boon.
I believe in such scenarios a custom engine would be required. I would like to know how difficult it is to create a custom engine with clearml-serving. For example, in this...
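For a sense of scale, here is a rough sketch of a custom-engine preprocess module, following the pattern of clearml-serving's custom examples (a preprocess.py exposing a Preprocess class). The spaCy specifics and the request/response keys are illustrative assumptions, not a tested recipe:

```python
# preprocess.py -- hedged sketch of a clearml-serving custom engine
from typing import Any


class Preprocess(object):
    def __init__(self):
        # instantiated once per endpoint
        self.model = None

    def load(self, local_file_name: str) -> Any:
        # load the registered model artifact; for spaCy this would be
        # roughly spacy.load() on the downloaded model directory (assumption)
        import spacy
        self.model = spacy.load(local_file_name)
        return self.model

    def preprocess(self, body: dict, state: dict, collect_custom_statistics_fn=None) -> Any:
        # pull the raw text out of the request payload ("text" key assumed)
        return body.get("text", "")

    def process(self, data: Any, state: dict, collect_custom_statistics_fn=None) -> Any:
        # run the model itself
        doc = self.model(data)
        return [(ent.text, ent.label_) for ent in doc.ents]

    def postprocess(self, data: Any, state: dict, collect_custom_statistics_fn=None) -> dict:
        # shape the JSON response returned to the caller
        return {"entities": data}
```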