No no, I mean now I can export a CSV file into clearml-data. I was wondering if it is possible to export directly from a SQL database.
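For context, this is roughly what I had in mind, as a sketch only: it assumes a SQLAlchemy-compatible database and the clearml Dataset API, and the connection string, table name, and dataset/project names are all placeholders:

```python
import pandas as pd
from sqlalchemy import create_engine
from clearml import Dataset

# Placeholder connection string and table name
engine = create_engine("postgresql://user:password@db-host/mydb")
df = pd.read_sql("SELECT * FROM training_samples", engine)

# Dump the query result to CSV, then register the file with clearml-data
df.to_csv("training_samples.csv", index=False)

dataset = Dataset.create(dataset_name="sql-export", dataset_project="demo")
dataset.add_files("training_samples.csv")
dataset.upload()
dataset.finalize()
```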
This is my example: iterations is set to 10, so there are 10 runs. Looking at the 4th run, it shows 60% of jobs, 91% of iterations, 94% of time. What does that mean?
No, the agent can be on any machine.
But the agent has to be running on the machine with the GPU.
One use case for now:
1. Load data from Label Studio (manager manually approves)
2. Push data to clearml-data
3. Run training (manager manually publishes)
4. Push the model URI to the next step
5. Seldon deploys it
Later, if Seldon detects a data drift, it will automatically re-run steps 2-5 (a rough sketch of how that re-trigger could look is below).
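To make that re-trigger concrete, here is a minimal sketch of what the drift handler could do, assuming the pipeline controller already exists as a ClearML Task; the task ID and queue name below are placeholders, not our actual setup:

```python
from clearml import Task

def on_drift_detected():
    # Assumption: the ID of the existing pipeline controller task is known (or looked up by name)
    pipeline_task = Task.get_task(task_id="PIPELINE_CONTROLLER_TASK_ID")  # placeholder
    # Clone the controller and enqueue the clone so steps 2-5 run again
    new_run = Task.clone(source_task=pipeline_task, name="re-run pipeline on data drift")
    Task.enqueue(new_run, queue_name="services")  # placeholder queue name
```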
At this point, we haven't drilled all of it down yet.
CostlyOstrich36:
They mentioned that they already have a Nexus backend, so I was just wondering if we could use it for storage purposes.
In our local setup, we use MinIO though.
We might also have some other steps incorporated for other tools. We intend to have Label Studio upstream, so we definitely need some orchestrator tool.
I did update it to clearml-agent 0.17.2; however, the issue still persists for this long-lasting service pod.
However, this issue goes away when dynamically allocating pods using the Kubernetes Glue (k8s_glue_example.py).
Nice tutorial. Though personally, I prefer a more clean-cut presentation (without the yays and muaks or the turtle). 😄 But as long as the content is there, it shouldn't matter...
Hi AgitatedDove14, IMHO links are definitely better, unless someone decides to archive their Tasks. Just wondering about the possibility.
Hi Martin, I just un-templated the Helm chart with helm template clearml-server-chart-0.17.0+1.tgz
I found these lines inside:
- name: CLEARML_AGENT_DOCKER_HOST_MOUNT
  value: /opt/clearml/agent:/root/.clearml
After SSH-ing into both the physical node (/opt/clearml/agent) and the pod (/root/.clearml), it seems there are some files there, so the mounting appears to have worked.
I am not sure I get your answer. Should I change the values to something else?
Thanks
Maybe more of a data repository than a model repository...
AgitatedDove14
Just figured it out:
node.base_task_id is the base task, which will always be in draft mode. Instead, we should use node.executed, which references the currently executed node.
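For reference, a minimal sketch of where node.executed shows up, assuming a recent clearml version and a per-step callback on the PipelineController; the project, task, and queue names are placeholders:

```python
from clearml import Task
from clearml.automation import PipelineController

def step_completed(pipeline, node):
    # node.base_task_id is the draft template task;
    # node.executed holds the ID of the task instance that actually ran
    executed_task = Task.get_task(task_id=node.executed)
    print(f"Step '{node.name}' ran as task {node.executed} ({executed_task.status})")

pipe = PipelineController(name="demo-pipeline", project="demo", version="0.1")
pipe.add_step(
    name="train",
    base_task_project="demo",            # placeholder project
    base_task_name="training template",  # placeholder template task
    post_execute_callback=step_completed,
)
pipe.start(queue="services")
```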
Yup, I updated this in my local clearml.conf... or should I be updating this elsewhere as well?
Hi AgitatedDove14, just updated that flag, but the problem continues.
```
agent.package_manager.system_site_packages = true
.....
Environment setup completed successfully
Starting Task Execution:
ClearML results page: files_server:
Traceback (most recent call last):
  File "base_template_keras_simple.py", line 15, in <module>
    import tensorflow as tf  # noqa: F401
  File "/root/.clearml/venvs-builds/3.6/lib/python3.6/site-packages/clearml/binding/import_bind.py", line 59, in __pat...
```
AgitatedDove14 Not for creating, but more for orchestrating...
Currently, we manually push a dataset to clearml-dataset.
We have a pipeline controller Task which takes in data from clearml-dataset, runs preprocessing, runs training, and publishes a model (if a certain threshold is met).
We have a ClearML monitor which watches all published models. It pushes the URI of the published model to a RabbitMQ queue.
We have a subscriber (Python code) listening to RabbitMQ. This takes in the URI from t...
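For the subscriber side, a rough sketch assuming pika and a queue named published_models; the host, queue name, and message field are placeholders for illustration:

```python
import json
import pika

def on_message(channel, method, properties, body):
    # Assumption: the monitor publishes a JSON payload that contains the model URI
    payload = json.loads(body)
    model_uri = payload["model_uri"]  # hypothetical field name
    print(f"New published model: {model_uri}")
    # ... hand the URI over to the deployment step (e.g. Seldon) here ...
    channel.basic_ack(delivery_tag=method.delivery_tag)

connection = pika.BlockingConnection(pika.ConnectionParameters(host="rabbitmq-host"))
channel = connection.channel()
channel.queue_declare(queue="published_models", durable=True)
channel.basic_consume(queue="published_models", on_message_callback=on_message)
channel.start_consuming()
```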
Mostly DL, but I suppose there could be ML use cases as well.
Hi AgitatedDove14, this isn't the issue. With or without specifying the queue, I get this error when I use the "Create version", as compared to the "Init version".
I wonder whether this is some issue with using the Create version together with execute_remotely().
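For comparison, this is roughly how the "Init version" of my script behaves, as a minimal sketch; the project, task, and queue names are placeholders:

```python
from clearml import Task

# "Init version": Task.init binds to the currently running script
task = Task.init(project_name="demo", task_name="keras training")
# Stop local execution here and enqueue the task for an agent to pick up
task.execute_remotely(queue_name="default", exit_process=True)

# Everything below only runs on the agent
import tensorflow as tf  # noqa: F401
# ... training code ...
```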
More than the documentation, my main issue was that the name executed is far too vague. Maybe something like executed_task_id or something along those lines would be more appropriate. 👍
Yeah, within ClearML we use the PipelineController. We are now mainly looking for a single tool to stitch together other products.
But of course, we will give precedence to tools that work best with ClearML. Hence asking whether anyone has had similar experience setting up such systems.
I just downloaded the logs from the failed task. It seems I have set agent.package_manager.system_site_packages: true in the agent as well.