Okie, now I get it. I set up the clearml-agent on an EC2 instance, and it works now.
Thanks
Could it be another application's "elasticsearch-pv" and not clearml's?
It'd be good if there was a YAML file to deploy clearml-agents into the k8s system.
However, I am able to get it to work if I launch a clearml-agent outside the Kubernetes ecosystem.
When I push a job to an agent node, I get this error:
"Error response from daemon: network None not found"
Hi, sorry for the delayed response. Btw, all the pods are running fine.
This is where I downloaded the log. Seems like some docker issue, though I can't seem to figure it out. As an alternative, I spawned a clearml-agent outside the k8s environment and it was able to execute well.
Hi, will proceed to close this thread. We found some issue with the underlying docker on our machines. We have now shifted to another k8s cluster of EC2 instances in AWS.
Btw, this is just the example code from the clearml repo.
I just had to set up the clearml-agent on my machine. Closing this issue.
Hmm, unfortunately it is still pending, as in nothing is running.
Currently, in the diagram here, the ClearML File Server is shown as a local storage drive. Our two primary concerns:
Is there any way we can scale this file server when our data volume explodes? Maybe it wouldn't be an issue in the k8s environment anyway. Or can it also be configured such that all data is stored in HDFS (which helps with scalability)? Is there any security to protect the data in this storage?
Is there any documentation on how we can use this ports mode? I didn't seem to find any. Thanks.
Hi AgitatedDove14, now we prefer to run dynamic agents instead, using python3 k8s_glue_example.py
In this case, is it still possible to pass --order-fairness at the queue level, or is this more of an Enterprise edition feature?
AgitatedDove14 I am confused now. Is this feature not available in the k8s glue? Or is it going to be implemented?
Hi AgitatedDove14, just re your reply on https://github.com/allegroai/clearml-agent/issues/50#issuecomment-811554045 : "Basically as jobs are pulled by order, they are pushed into the k8s, then if we hit the max k8s instance limit, we stop pulling jobs until a k8s job is completed, then we continue."
For this scenario,
k8s has an instance limit of 10 (let's say)
I run Optimization (it has about 100 jobs), but only the first 10 will be pulled into k8s. After this, I run a single Deep Learning (DL)...
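Just to check my understanding of the quoted behavior, here's a rough sketch of that pull-until-limit loop. This is my own illustration, not ClearML's actual code; names like MAX_K8S_JOBS and dispatch are hypothetical.

```python
from collections import deque

MAX_K8S_JOBS = 10  # hypothetical cluster instance limit

def dispatch(queue, running, limit=MAX_K8S_JOBS):
    """Pull tasks in queue order and push them to k8s until the
    instance limit is hit; remaining tasks stay queued."""
    while queue and len(running) < limit:
        running.append(queue.popleft())
    return queue, running

# 100 optimization jobs queued first, then one DL job behind them
queue = deque([f"opt-{i}" for i in range(100)] + ["dl-task"])
running = []
queue, running = dispatch(queue, running)
# only the first 10 optimization jobs run; the DL task waits in the
# queue until completed jobs free up slots and dispatch runs again
```

If this is right, the DL task can only start after enough of the 100 optimization jobs finish, which is exactly why I'm asking about --order-fairness below.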
Hi AgitatedDove14, thanks for the explanation.
python k8s_glue_example.py --queue high_priority_q --ports-mode --num-of-services 10
python k8s_glue_example.py --queue low_priority_q --ports-mode --num-of-services 2
Would the above be a good way to simulate the below?
clearml-agent daemon --queue high_priority_q low_priority_q
Hi AgitatedDove14
I am still not very clear on using this, even after looking at k8s_glue_example.py's code.
Is it possible to give a sample usage of how this works?
python k8s_glue_example.py --ports-mode --num-of-services
Another question: I am still not sure how this resolves my original question.
https://github.com/allegroai/clearml-agent/issues/50#issuecomment-811554045
How will imposing an instance limit prevent or allow the --order-fairness feature, for example, which ex...
Yup, I updated this in my local clearml.conf. Or should I be updating this elsewhere as well?
Yeah, currently we are evaluating Seldon. But I was wondering whether the ClearML enterprise version would do something similar?
I ran this on my local machine:
clearml-task --project playground --name tensorboard_toy --script tensorboard_toy.py --requirements requirements.txt --queue myqueue