Ah, so in the future, we can add non-clearml code as a step in the pipeline controller.
So now you don't have any failures, just the GPU usage issue?
I didn't run hyper_parameter_optimizer.py, as I figured that if there is already a problem with the base task, there's no point running the whole series of experiments.
How about running the ClearML agent in docker mode?
Previously, we had our clearml-agent run on the bare-metal machine instead of in docker mode, and there wasn't any issue.. though I haven't tried that with the 0.17.2 version.
Let me run the clearml-agent outside the k8s cluster and get back to you.
```
Could not load dynamic library 'libcupti.so.11.0'; dlerror: libcupti.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2021-03-11 09:11:17.368793: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcupti.so'; dlerror: libcupti.so: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2021-03-11 09...
```
Hi TimelyPenguin76 ,
Instead of running hyper_parameter_optimizer.py, I tried running base_template_keras_simple.py instead.. It seems that it didn't use the GPU; however, when I ssh into the clearml-glueq-id-ffaf55c984ea4dbfb059387b983746ba:gpuall pod and run nvidia-smi, it gives an output.
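For reference, a quick sanity check one could run from inside the pod (plain TensorFlow 2.x, nothing ClearML-specific) to see whether TF itself can see the GPU:

```python
import tensorflow as tf

# Lists the GPUs TensorFlow can actually use; an empty list means TF was
# installed CPU-only or the CUDA libraries are not visible to it.
print(tf.config.list_physical_devices("GPU"))
```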
Hi, here are some workarounds I thought of.. Btw AnxiousSeal95 , I haven't tried your comments yet.
1) Attach a clearml-task id to each new dataset-id.
So in the future, when new data comes in, get the last data commit from the Dataset project and the clearml-task attached to it. Then clone that clearml-task and pass in the new data (roughly like the sketch after option 2 below). The only downside is the need to clone the clearml-task.
Or alternatively
2) Attach the git SHA of the processing code to each new dataset-id.
This can't give the exact code ...
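For option 1, something like this sketch is what I have in mind (names and the tag convention are made up, and I haven't tried it):

```python
from clearml import Dataset, Task

# Get the latest data commit from the dataset project
latest = Dataset.get(dataset_project="PROJ_1", dataset_name="my_dataset")

# Hypothetical convention: the processing task id was attached as a tag
base_task_id = latest.tags[0]

# Clone the processing task, point it at the new data, and enqueue it
cloned = Task.clone(source_task=base_task_id, name="process_new_commit")
cloned.set_parameter("General/dataset_id", latest.id)
Task.enqueue(cloned, queue_name="default")
```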
Okie.. I have two different projects on the ClearML web server.
The first project (PROJ_1) stores datasets only, using clearml-data. The second project (PROJ_2) is a clearml-pipeline project which pulls the latest committed dataset from PROJ_1 and does a few other steps ... Right now, I manually start PROJ_2 when I know the dataset has been updated in PROJ_1.
One use case now:
1. Load data from Label Studio (manager to manually approve)
2. Push data to clearml-data
3. Run training (manager to manually publish)
4. Push the model URI to the next step
5. Seldon deploys it
Later, if Seldon detects a data drift, it will automatically re-run steps 2-5..
At this point, we haven't drilled down into all of it yet, but steps 2-4 might look roughly like the sketch below.
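(Task names are made up, and the PipelineController API has changed between clearml versions, so treat this as a shape rather than a drop-in script.)

```python
from clearml.automation.controller import PipelineController

# Each step clones an existing template task and runs it once its parents finish
pipe = PipelineController(default_execution_queue="default",
                          add_pipeline_tags=True)
pipe.add_step(name="push_data",
              base_task_project="PROJ_2", base_task_name="push_to_clearml_data")
pipe.add_step(name="train", parents=["push_data"],
              base_task_project="PROJ_2", base_task_name="train_model")
pipe.add_step(name="publish", parents=["train"],
              base_task_project="PROJ_2", base_task_name="publish_model")
pipe.start()
pipe.wait()
pipe.stop()
```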
Btw, this is just the example code from the clearml repo..
Yup, I used the values file for the agent. However, I manually edited it for the agentservices (as there was no example for it on GitHub).. Also, I am not sure what CLEARML_HOST_IP should be (I left it empty).
Hmm, unfortunately it is still pending, as in nothing is running.
CostlyOstrich36 :
They mentioned that they already have a Nexus backend, so I was just wondering if we could use it for storage purposes.
In our local setup, we use MinIO, though.
AgitatedDove14 Not creating but more for orchestrating...
Currently, we manually push a dataset to clearml-data.
We have a pipeline controller Task which takes in data from the ClearML dataset, runs preprocessing, runs training, and publishes a model (if a certain threshold is met).
We have a ClearML monitor which watches all published models. It pushes the URI of the published model to a RabbitMQ queue.
We have a subscriber (Python code) listening to the RabbitMQ queue. This takes in the URI from t...
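The subscriber is basically this shape (a minimal sketch using the pika client; host and queue name are made up):

```python
import pika

def on_model_published(channel, method, properties, body):
    # The monitor pushed the published model's URI as the message body
    model_uri = body.decode()
    print(f"New published model: {model_uri}")
    # ... hand the URI over to the deployment step from here

connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
channel = connection.channel()
channel.queue_declare(queue="published_models")
channel.basic_consume(queue="published_models",
                      on_message_callback=on_model_published,
                      auto_ack=True)
channel.start_consuming()
```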
MagnificentSeaurchin79 How do we do this? Can it be done via ClearML itself?
Sounds like you need to run a service that monitors for new commits in PROJ_1 and triggers the pipeline.
Is this some sort of polling?
At the end of the day, we are just worried about whether this will hog resources compared to a webhook.. Any ideas?
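For scale, the polling service itself would be tiny; something like this (the pipeline task id and names are placeholders):

```python
import time
from clearml import Dataset, Task

PIPELINE_TASK_ID = "<pipeline-controller-task-id>"  # placeholder
last_seen = None

while True:
    # Dataset.get with project/name returns the latest version of the dataset
    ds = Dataset.get(dataset_project="PROJ_1", dataset_name="my_dataset")
    if ds.id != last_seen:
        last_seen = ds.id
        # New data commit: clone the pipeline controller and enqueue it
        pipeline = Task.clone(source_task=PIPELINE_TASK_ID,
                              name="pipeline_on_new_data")
        Task.enqueue(pipeline, queue_name="services")
    time.sleep(300)  # one cheap API call every five minutes
```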
Hi AgitatedDove14 , this isn't the issue. With or without specifying the queue, I get this error when I run the "create" version, as compared to the "init" version.
I wonder whether this is some issue with using the "create" version together with execute_remotely()..
We can always use the latest ClearML.
We were thinking of a use case for a client who has Sonatype Nexus in their environment. Could we leverage it, or would we need MinIO instead?
Something is weird.. It is showing workers which are not running now...
Hi, for the values.yaml, is there some reference for it, especially for assigning more memory to the webserver service, etc.? I tried googling around, but so far no luck.
This is from my k8s cluster. Using the ClearML Helm chart, I was able to set this up.
I ran this on my local machine: `clearml-task --project playground --name tensorboard_toy --script tensorboard_toy.py --requirements requirements.txt --queue myqueue`
Just figured it out..
Seems like the docker image below didn't have the tensorflow package: `tensorflow/tensorflow:latest-devel-gpu`
I should have checked prior... My bad..
Thanks for the help
Hi Martin, I just untemplated the chart with `helm template clearml-server-chart-0.17.0+1.tgz` and found these lines inside: `- name: CLEARML_AGENT_DOCKER_HOST_MOUNT value: /opt/clearml/agent:/root/.clearml`. Upon ssh-ing into the folders on both the physical node (/opt/clearml/agent) and the pod (/root/.clearml), it seems there are some files there.. So the mounting worked, it seems.
I am not sure I get your answer. Should I change the values to something else?
Thanks
Hi AgitatedDove14 , attached is my "create" version compared to the "init" version..
When I enqueue both the "init" and "create" versions into my clearmlQueue, it seems the "create" version doesn't execute at all.
It just prints "2021-05-26 16:02:13,053 - clearml - WARNING - Terminating local execution process" and says it has completed successfully.
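For context, this is my understanding of the difference between the two versions (a sketch with assumed names; maybe I'm holding it wrong):

```python
from clearml import Task

# "init" version: Task.init binds to the running script, so
# execute_remotely() can stop the local run and re-launch it on an agent.
task = Task.init(project_name="playground", task_name="init_version")
task.execute_remotely(queue_name="clearmlQueue")

# "create" version: Task.create builds a standalone task definition that is
# not bound to the current process, so it must be enqueued explicitly.
task = Task.create(project_name="playground", task_name="create_version",
                   script="base_template_keras_simple.py")
Task.enqueue(task, queue_name="clearmlQueue")
```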
Hi AgitatedDove14 , Just updated that flag, but the problem continues..
```
agent.package_manager.system_site_packages = true
.....
Environment setup completed successfully
Starting Task Execution:

ClearML results page: files_server:
Traceback (most recent call last):
  File "base_template_keras_simple.py", line 15, in <module>
    import tensorflow as tf  # noqa: F401
  File "/root/.clearml/venvs-builds/3.6/lib/python3.6/site-packages/clearml/binding/import_bind.py", line 59, in __pat...
```
