
TimelyPenguin76 :
```
from clearml import Dataset

ds = Dataset.get(dataset_project="PROJ_1", dataset_name="dataset")
```
The use case is: let's say I run python k8s_glue_example.py --queue glue_q
And someone pushes a hyperparameter optimization job with over 100 experiments to glue_q. One minute later, I push a simple training job to glue_q, but I will be forced to wait for the 100 experiments to finish.
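Maybe one workaround on our side (just a sketch, assuming a second queue named high_priority_q exists and that running another glue instance in the cluster is acceptable) is to run one glue instance per queue, so short jobs are not stuck behind a long HPO run:
```
# first glue instance serves the bulk/HPO queue
python k8s_glue_example.py --queue glue_q

# second glue instance (separate process) serves a small, fast-turnaround queue
python k8s_glue_example.py --queue high_priority_q
```
Tasks enqueued to high_priority_q would then be scheduled independently of the 100-experiment backlog in glue_q.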
Mostly DL, but I suppose there could be ML use cases also
Hi AgitatedDove14 , now we prefer to run dynamic agents instead, using python3 k8s_glue_example.py
In this case, is it still possible to pass --order-fairness at the queue level, or is this more of an Enterprise edition feature?
AgitatedDove14 I am confused now.. Isn't this feature unavailable in the k8s glue? Or is it going to be implemented?
No no, I mean that right now I can export a CSV file into clearml-data. I was wondering if it is possible to export directly from a SQL database.
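Something like this is what I had in mind (just a sketch; the connection string, table name and dataset names are made up, and it assumes pandas and SQLAlchemy are available), dumping a SQL table to a CSV first and then registering it with clearml-data:
```
import pandas as pd
from sqlalchemy import create_engine
from clearml import Dataset

# hypothetical connection string and table name -- replace with real ones
engine = create_engine("postgresql://user:password@db-host:5432/mydb")
df = pd.read_sql("SELECT * FROM training_samples", engine)

# dump the query result to a local CSV file
df.to_csv("training_samples.csv", index=False)

# register the CSV with clearml-data (project/name are placeholders)
ds = Dataset.create(dataset_project="PROJ_1", dataset_name="sql_export")
ds.add_files("training_samples.csv")
ds.upload()
ds.finalize()
```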
This is from my k8s cluster. Using the ClearML Helm chart, I was able to set this up.
Ah, so in the future, we can add non-ClearML code as a step in the pipeline controller.
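Something like this is what I am picturing (just a sketch, assuming a function-step API along the lines of PipelineController.add_function_step from later ClearML releases; the function, project and queue names here are made up):
```
from clearml import PipelineController

def preprocess_data(raw_path):
    # plain Python (non-ClearML) code wrapped as a pipeline step
    return raw_path + ".cleaned"

# project / name / version are placeholders
pipe = PipelineController(name="demo-pipeline", project="PROJ_1", version="0.0.1")

# wrap an arbitrary function as a step; it runs as its own Task when executed
pipe.add_function_step(
    name="preprocess",
    function=preprocess_data,
    function_kwargs=dict(raw_path="/data/raw.csv"),
    function_return=["cleaned_path"],
)

pipe.start(queue="default")
```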
What does a control plane do? I can't understand this..
Like, the serving engine will get the user input, preprocess it, run inference, and send back the results..
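Roughly this kind of flow is what I mean (a generic sketch only, not the actual clearml-serving API; the function names and the model object are made up):
```
import numpy as np

def preprocess(request_body: dict) -> np.ndarray:
    # turn the raw user input into the tensor the model expects
    return np.asarray(request_body["features"], dtype=np.float32).reshape(1, -1)

def postprocess(prediction: np.ndarray) -> dict:
    # turn the raw model output into a JSON-friendly response
    return {"prediction": prediction.tolist()}

def handle_request(request_body: dict, model) -> dict:
    # model is assumed to expose a predict() method (e.g. an sklearn estimator)
    features = preprocess(request_body)
    prediction = model.predict(features)
    return postprocess(prediction)
```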
Yeah, I restarted the deployment and SSHed into the host machine also.. (Img below)
GitHub Issue: https://github.com/allegroai/clearml-agent/issues/50
AgitatedDove14 , I have added the GitHub issue as requested. Thanks for the help. 👍
Hi AgitatedDove14 ,
At this point, showing the URL of the ClearML task might be sufficient, unless in the future someone wants it to be customised.
But the bigger question is: is there a tool to aid with building this workflow? We are currently experimenting with Airflow/Prefect.
Hi Martin, I just rendered the chart with helm template clearml-server-chart-0.17.0+1.tgz
I found these lines inside:
```
- name: CLEARML_AGENT_DOCKER_HOST_MOUNT
  value: /opt/clearml/agent:/root/.clearml
```
Upon SSHing into the folders on both the physical node (/opt/clearml/agent) and the pod (/root/.clearml), there are some files there, so it seems the mounting worked.
I am not sure I get your answer. Should I change the values to something else?
Thanks
TimelyPenguin76 : Yup, that's what I do now.. However, I should configure it to use some distributed storage later.
nice... we need moarrrrrrrr !!!!!!!!
It would be really helpful if you could do the next episode on setting up ClearML in Kubernetes.. 😇
Anyway, keep up the good work for the community.
Hi AgitatedDove14 , just quoting your reply on https://github.com/allegroai/clearml-agent/issues/50#issuecomment-811554045 : "Basically as jobs are pulled by order, they are pushed into the k8s, then if we hit the max k8s instance limit, we stop pulling jobs until a k8s job is completed, then we continue."
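Just to check I understand that behaviour, here is how I read it, as a rough sketch (made-up Python, not the actual clearml-agent internals):
```
from collections import deque

def pull_and_launch(queue: deque, running_pods: int, max_pods: int) -> int:
    # jobs are pulled strictly in queue order and pushed into k8s;
    # once the pod limit is hit, we stop pulling until a pod completes
    while queue and running_pods < max_pods:
        job = queue.popleft()  # strict FIFO -- no fairness between submitters
        print(f"launching pod for {job}")
        running_pods += 1
    return running_pods

# 100 optimization jobs enqueued first, then my single DL job
q = deque([f"hpo_{i}" for i in range(100)] + ["my_dl_training"])
pull_and_launch(q, running_pods=0, max_pods=10)
# only hpo_0..hpo_9 get pods; my_dl_training stays queued behind the other 90
```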
For this scenario,
k8s has an instance limit of 10 (let's say)
I run an Optimization (it has about 100 jobs), but only the first 10 will be pulled into k8s. After this, I run a single Deep Learning (DL)...
It is like generating a report at the Task level (especially for training jobs), i.e. packaging a report per training job..
We have k8s on EC2 instances in the cloud. I'll try it there tomorrow and report back..
Let me run the clearml-agent outside the k8s system.. and get back to you.
AgitatedDove14 We too self-host (on-prem) the Helm charts in our local k8s ecosystem.
Triggering - will be a nice feature indeed; currently we are using ClearML monitors to address this.
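Roughly this is what our monitoring looks like (just a sketch; the project name and the alerting function are placeholders, and it simply polls with Task.get_tasks rather than using any dedicated trigger API):
```
import time
from clearml import Task

def send_alert(message: str) -> None:
    # placeholder -- in practice this posts to Slack / email / etc.
    print(message)

seen = set()
while True:
    # poll the project and alert on newly failed tasks
    for task in Task.get_tasks(project_name="PROJ_1"):
        if task.id not in seen and task.get_status() == "failed":
            seen.add(task.id)
            send_alert(f"Task {task.name} ({task.id}) failed")
    time.sleep(60)
```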
Is it the UI presenting the entire workflow? That portion would also be nice. Let's say someone uses 1) a ClearML Dataset -> 2) a Pipeline Controller (containing preprocessing, training, hyperparameter tuning) -> 3) clearml-serving.. It would be great if they could see the entire thing in one flow.
We are using seldon f...
I just checked the /root/clearml.conf file and it just contains sdk { }
Hi, for the values.yaml, is there some reference for it, especially for things like assigning more memory to the webserver service etc.? I tried googling around but so far no luck.
Yeah, within ClearML we use the PipelineController. We are now mainly looking for a single tool to stitch together other products.
But of course, we will give first precedence to tools which work best with ClearML. Thus I'm asking if anyone has had similar experience setting up such systems.
Hi AgitatedDove14 , IMHO links are definitely better, unless someone decides to archive their Tasks.. Just wondering about the possibility.
I did update it to clearml-agent 0.17.2, however the issue still persists for this long-lasting service pod.
However, this issue goes away when dynamically allocating pods using the Kubernetes Glue (k8s_glue_example.py).
Just to add on, I am using minikube now.