Maybe more of a data repository than a model repository...
For me too, I had this issue.. I realised that the k8s glue wasn't using the GPU resource, unlike when running it as a plain clearml-agent.. TimelyPenguin76 suggested using the latest CUDA 11.0 images, though that didn't work either.
Mostly DL, but I suppose there could be ML use cases also
Yes, I am already using a Pipeline.
2. I have another project built using the Pipeline. The pipeline always loads the last committed dataset from the above Dataset project and runs a few other steps.
Just not sure how to make the Pipeline listen for changes in the Dataset project.
TimelyPenguin76 :
from clearml import Dataset
ds = Dataset.get(dataset_project="PROJ_1", dataset_name="dataset")
MagnificentSeaurchin79 : How to do this? Can it be done via ClearML itself?
sounds like you need to run a service to monitor for new commits in PROJ_1, to trigger the pipeline
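A minimal sketch of such a monitor service, as a plain polling loop. The names `get_latest_dataset_id` (e.g. a wrapper around `Dataset.get(...).id` for PROJ_1) and `launch_pipeline` are hypothetical callables you would supply, not ClearML APIs; this just shows the polling idea, not a production implementation:

```python
import time


def watch_for_new_dataset(get_latest_dataset_id, launch_pipeline,
                          poll_seconds=60, max_polls=None):
    """Poll for new dataset commits and launch the pipeline on each new id.

    get_latest_dataset_id: hypothetical callable returning the id of the
        most recent dataset commit (e.g. wrapping Dataset.get(...).id).
    launch_pipeline: hypothetical callable that enqueues the pipeline run,
        given the new dataset id.
    max_polls: optional cap on iterations (handy for testing); None = forever.
    """
    last_seen = None
    polls = 0
    while max_polls is None or polls < max_polls:
        current = get_latest_dataset_id()
        if current is not None and current != last_seen:
            launch_pipeline(current)  # new commit detected -> trigger pipeline
            last_seen = current
        polls += 1
        if max_polls is None or polls < max_polls:
            time.sleep(poll_seconds)
    return last_seen
```

In practice you would run this as a long-lived service task (or look at ClearML's scheduling/trigger utilities, if your version ships them) rather than hand-rolling the loop.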
TimelyPenguin76 : Yup, that's what I do now.. However, I should configure it to use some distributed storage later
sure, I'll post some questions once I wrap my mind around it..
Thanks JuicyFox94 .
Not really from a DevOps background, let me try to digest this.. 🙏
nice.. this looks a bit friendly.. 🙂 .. Let me try it.. Thanks
We have to do it on-premise.. Cloud providers are not allowed for the final implementation. Of course, for now we use the cloud to test out our ideas.
Just to add on, I am using minikube now.
Hi SuccessfulKoala55 , kkie..
1) Actually, I am using AWS now. I am trying to set up the ClearML server in K8s. However, the clearml-agents will just be separate EC2 instances/Docker images.
2) For phase 2, I will try the ClearML AWS AutoScaler Service.
3) At this point, I think I will have a crack at JuicyFox94 's solution as well.
Our main goal, which maybe I should have stated earlier: we are data scientists who need an MLOps environment to track and also run our experiments..
AgitatedDove14
Just figured it out..
node.base_task_id is the base task, which will always be in draft mode. Instead, we should use node.executed, which references the currently executed task.
More than the documentation, my main issue was that the name executed is far too vague.. Maybe something like executed_task_id or something along those lines would be more appropriate. 👍
Yeah, currently we are evaluating Seldon.. But I was wondering whether the ClearML enterprise version would do something similar?
It is like generating a report at the Task level (esp. for training jobs).. i.e. packaging up a report per training job..
Hi AgitatedDove14 , imho links are def better, unless someone decides to archive their Tasks.. Just wondering about the possibility only..
Ah kk, the ---laptop:0 worker is gone now.. But wrt our original qn, I can see the agent (worker) in the clearml-server UI..
This is from my k8s cluster. Using the ClearML Helm package, I was able to set this up.
I just changed the YAML file of clearml-agent to get it to start with the line below.
python3 -m pip install clearml-agent==0.17.2rc4
Nothing changed.. the clearml.conf is still as is (empty)
Hi AgitatedDove14 , I also fiddled around by changing this line and restarted the deployment. But it just reverts back to 0.17.2rc4 again.
python3 -m pip install clearml-agent==0.17.2rc3
kkie.. was checking in the forum (if anyone knows anything) before asking them..