Hi AgitatedDove14 , thanks for the explanation.
python k8s_glue_example.py --queue high_priority_q --ports-mode --num-of-services 10
python k8s_glue_example.py --queue low_priority_q --ports-mode --num-of-services 2
Would the above be a good way to simulate the below?
clearml-agent daemon --queue high_priority_q low_priority_q
Hi AgitatedDove14
I am still not very clear on how to use this, even after looking at k8s_glue_example.py's code.
Is it possible to give a sample usage of how this works?
python k8s_glue_example.py --ports-mode --num-of-services
Another question: I am still not sure how this resolves my original question.
https://github.com/allegroai/clearml-agent/issues/50#issuecomment-811554045
How will imposing an instance limit prevent or allow the --order-fairness feature, for example, which ex...
Yup, I updated this in my local clearml.conf... Or should I be updating this elsewhere as well?
Yeah, currently we are evaluating Seldon.. but was wondering whether the ClearML enterprise version would do something similar?
I ran this on my local machine:
clearml-task --project playground --name tensorboard_toy --script tensorboard_toy.py --requirements requirements.txt --queue myqueue
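For context, a minimal sketch of what a tensorboard_toy.py along those lines might look like (the metric names and loop are assumptions, not the actual script):
```python
# tensorboard_toy.py -- hypothetical sketch, not the actual script
import numpy as np
from torch.utils.tensorboard import SummaryWriter
from clearml import Task

# Register the run with ClearML so the TensorBoard scalars are captured
task = Task.init(project_name="playground", task_name="tensorboard_toy")

writer = SummaryWriter(log_dir="./tb_logs")
for step in range(100):
    # Log a noisy decreasing "loss" so something shows up in the Scalars tab
    writer.add_scalar("toy/loss", 1.0 / (step + 1) + 0.05 * np.random.rand(), step)
writer.close()
```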
Nice.. this looks a bit friendlier 🙂 .. Let me try it.. Thanks
I just downloaded the logs from the failed task. Seems I have set agent.package_manager.system_site_packages: true in the agent as well.
This is my example. Iterations is 10, so there are 10 runs. Looking at the 4th run: 60% of the jobs, 91% of iterations, 94% of time.. What does it mean?
Is this some sort of polling?
At the end of the day, we are just worried whether this will hog resources compared to a webhook. Any ideas?
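In case it helps to connect those percentages to something concrete, here is a hedged sketch of how a HyperParameterOptimizer is typically configured (all names and values below are assumptions, not my setup). The jobs/iteration/time percentages presumably track the total_max_jobs, max_iteration_per_job and time_limit_per_job budgets, and pool_period_min is the interval at which the optimizer polls the running experiments:
```python
# Hedged sketch only -- parameter names/values are assumptions
from clearml.automation import (
    HyperParameterOptimizer,
    RandomSearch,
    UniformIntegerParameterRange,
)

optimizer = HyperParameterOptimizer(
    base_task_id="<base_task_id>",          # template experiment to clone
    hyper_parameters=[
        UniformIntegerParameterRange("General/epochs", min_value=5, max_value=15),
    ],
    objective_metric_title="validation",
    objective_metric_series="loss",
    objective_metric_sign="min",
    optimizer_class=RandomSearch,
    execution_queue="myqueue",
    max_number_of_concurrent_tasks=2,
    total_max_jobs=10,           # "% of jobs" budget
    max_iteration_per_job=1000,  # "% of iterations" budget
    time_limit_per_job=60.0,     # minutes; "% of time" budget
    pool_period_min=2.0,         # how often (minutes) running jobs are polled
)
optimizer.start()
optimizer.wait()
optimizer.stop()
```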
Could not load dynamic library 'libcupti.so.11.0'; dlerror: libcupti.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2021-03-11 09:11:17.368793: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcupti.so'; dlerror: libcupti.so: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2021-03-11 09...
Hi guys,
I filled in the default_output_uri in the conf file, but it doesn't get reflected in the ClearML UI.
Disclaimer: ClearML is set up as a k8s pod using the Helm charts.
sdk {
  development {
    # Default Task output_uri. If output_uri is not provided to Task.init, default_output_uri will be used instead.
    default_output_uri: " "
  }
}
Let me run the clearml-agent outside the k8s system.. and get back to you.
What does a control plane do? I can't understand this..
Like the serving engine, which will get the user input, preprocess it, run inference, and send back the results..
Hi SuccessfulKoala55 , okie..
1) Actually, now I am using AWS. I am trying to set up the ClearML server in k8s. However, the clearml-agents will be just another EC2 instance/docker image.
2) For phase 2, I will try the ClearML AWS AutoScaler Service.
3) At this point, I think I will have a crack at JuicyFox94 's solution as well.
Hi, is there some reference for the values.yaml, especially if we want to assign more memory to the webserver service, etc.? I tried googling around but so far no luck.
Maybe more of a data repository than a model repository...
Using clearml-task, I am able to pass in the exact requirements.txt file; I am not sure how we can accomplish that when using the Python train_it.py and execute_remotely() option.
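As a hedged sketch of what I have in mind (assuming Task.add_requirements() accepts a requirements file path when called before Task.init — worth verifying against the installed clearml version; project/queue names are placeholders):
```python
# train_it.py -- illustrative sketch only
from clearml import Task

# Pin the exact requirements for the remote run; must be called before Task.init()
Task.add_requirements("requirements.txt")

task = Task.init(project_name="playground", task_name="train_it")

# Stop the local run here and enqueue the task for an agent to execute
task.execute_remotely(queue_name="myqueue", exit_process=True)

# ... actual training code below only runs on the agent ...
```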
AgitatedDove14
So now you don't have any failures, but a GPU usage issue?
I didn't run the hyper_parameter_optimizer.py, as I was thinking that if there is already a problem with the base, there is no use in running the series of experiments.
How about running the ClearML agent in docker mode?
Previously, we had our clearml-agent running on the bare-metal machine instead of in docker mode. There wasn't any issue.. though I haven't tried with the 0.17.2 version.
Mostly DL, but I suppose there could be ML use cases also
Yeah, that worked.. I was running the agent on a different machine, as our deployment of ClearML was in k8s.
The above screenshot is from my local settings... My agents run in the k8s system (like in a pod)
Hi AgitatedDove14 , attached is my create version compared to the init version..
When I enqueue both the init and create versions into my clearmlQueue, it seems the create version doesn't execute at all.
It just mentions "2021-05-26 16:02:13,053 - clearml - WARNING - Terminating local execution process" and says it has completed successfully.
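For reference, a hedged sketch of what the "create" flavour boils down to (project/queue/script names are assumptions): Task.create() only registers a task definition and, unlike Task.init(), runs nothing locally, so it has to be enqueued explicitly for an agent to pick it up.
```python
from clearml import Task

# Illustrative sketch only -- names below are placeholders
task = Task.create(
    project_name="playground",
    task_name="tensorboard_toy_create",
    script="tensorboard_toy.py",
    requirements_file="requirements.txt",
)
# Nothing executes locally; push the task to a queue for the agent
Task.enqueue(task, queue_name="clearmlQueue")
```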
We have k8s on EC2 instances in the cloud. I'll try it there tomorrow and report back..