Yeah, currently we are evaluating Seldon.. But I was wondering whether the clearml enterprise version would do something similar?
However, I am able to get it to work, if I launch a clearml-agent outside the kubernetes ecosystem.
It would be good if there were a yaml file to deploy clearml-agents into the k8s system.
I just checked the /root/clearml.conf file and it just contains sdk{ }
Just to add on, I am using minikube now.
Could it be another application's "elasticsearch-pv" and not clearml's?
AgitatedDove14 I am confused now.. Is this feature not available in the k8s glue? Or is it going to be implemented?
It is like generating a report at the Task level (esp. for Training Jobs).. It's like packaging a report out per Training job..
I just downloaded the logs from the failed task. It seems I have set agent.package_manager.system_site_packages: true in the agent as well.
When I push a job to an agent node, I get this error:
"Error response from daemon: network None not found"
For the clearml-agent deployment file, I updated this line
python3 -m pip install clearml-agent==0.17.2rc4
and restarted the deployment. However the conf file is still empty.
Should I also update clearml-agent in the clearml-agent-services deployment file?
AgitatedDove14
Just figured it out..
node.base_task_id is the base task, which will always be in draft mode. Instead we should use node.executed, which references the currently executed node.
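To illustrate the difference, here is a minimal sketch. The Node class below is a hypothetical stand-in written for this example, not the clearml class, and it assumes executed holds the ID of the task that actually ran (and is empty until then).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical stand-in for a pipeline node (not the clearml class)."""
    name: str
    base_task_id: str               # template task, stays in draft mode
    executed: Optional[str] = None  # assumed: ID of the task that actually ran


def executed_task_id(node: Node) -> str:
    """Return the ID of the executed task, never the draft base task."""
    if node.executed is None:
        raise ValueError(f"node {node.name!r} has not been executed yet")
    return node.executed


# Example values are made up for illustration.
node = Node(name="step2", base_task_id="base-123", executed="run-456")
print(executed_task_id(node))
```

Raising on a node that has not run yet avoids silently falling back to the draft base task.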
Currently, in the diagram here.. the ClearML file server is shown as a local storage drive. Our 2 primary concerns:
1) Is there any way we can scale this file server when our data volume explodes? Maybe it wouldn't be an issue in the K8s environment anyway. Or can it be configured such that all data is stored in HDFS (which helps with scalability)?
2) Is there any security to protect the data in this storage?
No, the agent can be on any machine.
But the agent has to be running on the machine with the GPU.
Hi, using the pipeline examples, with
step1_dataset_artifact.py, step2_data_processing.py, step3_train_model.py ==> pipeline_controller.py
In the above example, the pipeline_controller is stringing together 3 python files. Could it string together 3 containers instead? Of course, we can manually build each into a docker image, but does clearml have a similar approach baked in?
Is this some sort of polling ?
At the end of the day, we are just worried whether this will hog resources compared to a web-hook. Any ideas?
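To make the polling concern concrete, here is a minimal sketch of a polling loop. It is a generic illustration, not how clearml implements it; poll_until and its parameters are names made up for this example. The cost is one status check per interval, whereas a web-hook would only do work when an event arrives.

```python
import time
from typing import Callable


def poll_until(check: Callable[[], bool], interval_sec: float,
               max_checks: int, sleep: Callable[[float], None] = time.sleep) -> int:
    """Call check() every interval_sec until it returns True.

    Returns the number of checks made, which is the cost of polling.
    """
    for attempt in range(1, max_checks + 1):
        if check():
            return attempt
        if attempt < max_checks:
            sleep(interval_sec)  # idle between checks, no CPU used here
    raise TimeoutError(f"condition not met after {max_checks} checks")


# Simulated status source: not ready twice, then ready.
statuses = iter([False, False, True])
calls = poll_until(lambda: next(statuses), interval_sec=0.01, max_checks=10)
print(calls)
```

A longer interval means fewer checks but slower reaction, which is the trade-off against a web-hook.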
Hi, will proceed to close this thread. We found some issue with the underlying docker in our machines. We have now shifted to another k8s cluster of EC2 instances in AWS.
nice... we need moarrrrrrrr !!!!!!!!
It would be really helpful if you could do the next episode on setting up clearml in kubernetes.. 😇
Anyway, keep up the good work for the community
kkie.. was checking in the forum (if anyone knows anything) before asking them..
TimelyPenguin76 :
from clearml import Dataset
ds = Dataset.get(dataset_project="PROJ_1", dataset_name="dataset")
Hi AgitatedDove14 ,
At this point, showing the URL of the clearml task might be sufficient. Unless in the future, someone wants it to be customised.
But the bigger question is whether there is a tool to aid with this workflow building. We are currently experimenting with airflow/prefect.
More than the documentation, my main issue was that the name executed is far too vague.. Maybe something like executed_task_id or something along that line would be more appropriate. 👍
AgitatedDove14 We too self host (on prem) the helm charts in our local k8s ecosystem.
Triggering - Will be a nice feature indeed. Currently we are using clearml.monitors to address this.
Is it the UI presenting the entire workflow? - This portion will also be nice. (Let's say someone uses 1) clearml dataset -> 2) Pipeline Controller (contains preprocessing, training, hyperparameter tuning) -> 3) clearml-serving.) It would help if they could see the entire thing in one flow.
We are using seldon f...
What does a control plane do? I can't understand this..
Like, the serving engine will get the user input, preprocess it, run inference on it and send back the results..
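The flow described above (input -> preprocess -> infer -> send back results) could be sketched like this. It is a toy illustration only, not clearml-serving or Seldon code; the function names and the averaging "model" are assumptions made up for the example.

```python
from typing import Dict, List


def preprocess(raw: str) -> List[float]:
    """Turn the raw user input (comma-separated numbers) into features."""
    return [float(x) for x in raw.split(",")]


def infer(features: List[float]) -> float:
    """Toy stand-in for a model: returns the mean of the features."""
    return sum(features) / len(features)


def postprocess(score: float) -> Dict[str, float]:
    """Package the prediction into the response sent back to the user."""
    return {"score": round(score, 3)}


def serve(raw: str) -> Dict[str, float]:
    """One request through the serving flow: preprocess, infer, respond."""
    return postprocess(infer(preprocess(raw)))


print(serve("1,2,3"))
```

Keeping each stage as its own function means the model can be swapped without touching the input or output handling.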


