
We are running the workers on bare metal and clearml-server on Kubernetes. I was trying to find out what the min and max values mean for the metrics above.
What do you mean by how much is reserved? Are you running with an agent?
Thanks for the discussion 🙂
It helped me too.
It would be great to have possible fields in the given parameters mentioned here: https://clear.ml/docs/latest/docs/references/api/tasks#post-tasksget_all
Any clue how I can figure those out?
I see it now.
` "5451af93e0bf68a4ab09f654b222ccae": {
"1b790a3da2e8d6cd939cf271694fe81b": {
"metric": ":monitor:gpu",
"variant": "gpu_0_utilization",
"value": 0.0,
"min_value": 0.0,
"max_value": 3.542
},
"409d4e6ad9b69b3224fceeac6e265ddc": {
"metric": ":monitor:gpu",
"variant": "gpu_0_mem_used_gb",
"value": 0.0,
...
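For reference, a minimal sketch of pulling those entries out of a task's last_metrics. It assumes the nesting shown above (an outer key per metric hash, an inner key per variant hash). The helper name gpu_variants is hypothetical, and the sample dict reuses only the one complete entry from the output above.

```python
def gpu_variants(last_metrics, metric_name=":monitor:gpu"):
    """Return {variant: entry} for one metric from a nested last_metrics dict."""
    found = {}
    for variants in last_metrics.values():
        for entry in variants.values():
            if entry.get("metric") == metric_name:
                found[entry["variant"]] = entry
    return found


# Sample copied from the response above (only the complete entry).
sample = {
    "5451af93e0bf68a4ab09f654b222ccae": {
        "1b790a3da2e8d6cd939cf271694fe81b": {
            "metric": ":monitor:gpu",
            "variant": "gpu_0_utilization",
            "value": 0.0,
            "min_value": 0.0,
            "max_value": 3.542,
        },
    },
}

for variant, entry in gpu_variants(sample).items():
    print(variant, entry["value"], entry["min_value"], entry["max_value"])
```

Passing a different metric_name should work the same way for other monitor metrics, as long as they follow the same nesting.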
Exactly. I am trying to create an alert for tasks that have a GPU/CPU allocated but have not been utilizing it for some period of time.
So, if a task exists, a GPU will be allocated to it. I need to check whether the task is actually using it or is just idle.
I found system_tags and all the metrics including CPU, but I can't find any field that mentions a reported GPU scalar or GPU utilization.
I found a lot of questions in this group's past chats, including some from you, about the k8s glue with ClearML.
Do you mean it recently became part of the enterprise version?
AgitatedDove14
Phew. Makes sense. I am testing it by updating FROM in the Dockerfile.
Fingers crossed.
I need to use this image in Kubernetes.
AgitatedDove14 I am upgrading pip before this. 😕
AgitatedDove14 I found that the issue is with pycryptodome 😕
The error starts from here. Maybe it's a specific version of it. Digging more.
` #13 101.0 note: This error originates from a subprocess, and is likely not a problem with pip.
#13 101.0 ERROR: Failed building wheel for pycryptodome
#13 101.0 Running setup.py clean for pycryptodome
#13 104.9 Building wheel for numpy (pyproject.toml): started
#13 158.5 Building wheel for numpy (pyproject.toml): finished with status 'er...
How can this even be a Python issue when one endpoint returns a response and the other does not?
` # which python
/Users/anuj.tyagi/clearml_api/venv/bin/python
(venv) LMWPRW6F3:clearml_api root# pip freeze | grep clearml
clearml==1.7.2
Traceback (most recent call last):
File "get_all_task.py", line 8, in <module>
print (client.tasks.get_all())
File "/Users/anuj.tyagi/clearml_api/venv/lib/python3.8/site-packages/clearml/backend_api/session/client/client.py", line 422, in get
result=self.session.send(request_cls(*args, **kwargs)),
File "/Users/anuj.tyagi/clearml_api/venv/lib...
` "tags": [],
"system_tags": [
"interactive"
],
"status_changed": "2022-10-13 17:05:22.844000+00:00",
"status_message": "",
"status_reason": "",
"last_worker": "xxx01:!2c1:cpu:10:service:0a750bd8a09b4063a59c96b4370d0815",
"last_worker_report": "2022-10-30 15:23:18.695000+00:00",
"last_update": "2022-10-30 15:23:18.695000+00:00",
"last_change": "2022-10-30 15:23:18.695000+00:00",
"last_iteration": 0,
"last_metrics": {
"29c6dd717a649...
E.g. to query tasks that are Running, you mean status=["in_progress"]? How do I figure out the other possible values I can use with the status parameter?
Another question:
Can I filter only tasks that started, say, 10 minutes ago? Is there a parameter for that as well?
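I am not sure whether the server can filter on this directly, so here is a minimal client-side sketch that keeps only tasks whose timestamp is at least N minutes old. It assumes each task is available as a dict with a timestamp field such as status_changed, holding either a timezone-aware datetime or a string in the format from the response I pasted earlier. The helper name older_than and the sample rows are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def older_than(tasks, minutes, field="status_changed", now=None):
    """Keep tasks whose `field` timestamp is at least `minutes` old."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(minutes=minutes)
    kept = []
    for task in tasks:
        stamp = task.get(field)
        if stamp is None:
            continue  # no timestamp reported, skip
        if isinstance(stamp, str):
            # Handles strings like "2022-10-13 17:05:22.844000+00:00"
            stamp = datetime.fromisoformat(stamp)
        if stamp <= cutoff:
            kept.append(task)
    return kept


# Hypothetical rows; only the timestamp format is taken from the real response.
tasks = [
    {"id": "a", "status_changed": "2022-10-13 17:05:22.844000+00:00"},
    {"id": "b", "status_changed": "2022-10-30 15:20:00.000000+00:00"},
]
now = datetime(2022, 10, 30, 15, 23, 18, tzinfo=timezone.utc)
print([t["id"] for t in older_than(tasks, 10, now=now)])
```

The timestamps must be timezone-aware for the comparison to work. Which field best represents the start time is something I would still need to confirm against the API reference.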
My goal is to detect when a task does not use its allocated resources (e.g. GPU) for some period of time.
I am still trying to understand clearml api response.
Do you have any clue how I can get that from client.tasks.get_all(status=["in_progress"])?
If a task has a GPU allocated but is not using it, would it also be in the in_progress status? I want to collect those tasks.
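To make the "idle for some period" part concrete, here is a rough sketch of the bookkeeping I have in mind. It assumes something else polls the in_progress tasks on a schedule and passes in a {task_id: latest GPU utilization} mapping. The class name IdleTracker, the threshold, and the period are all hypothetical choices, not anything ClearML provides.

```python
from datetime import datetime, timedelta, timezone


class IdleTracker:
    """Remember when each task was first seen idle and flag long idlers."""

    def __init__(self, threshold=1.0, period=timedelta(minutes=30)):
        self.threshold = threshold  # utilization (%) at or below this is idle
        self.period = period        # how long a task may stay idle
        self.idle_since = {}        # task_id -> first time seen idle

    def update(self, utilization_by_task, now=None):
        """Feed one poll result; return task ids idle for at least `period`."""
        now = now or datetime.now(timezone.utc)
        # Forget tasks that are no longer reported (finished or stopped).
        for task_id in list(self.idle_since):
            if task_id not in utilization_by_task:
                del self.idle_since[task_id]
        alerts = []
        for task_id, utilization in utilization_by_task.items():
            if utilization > self.threshold:
                self.idle_since.pop(task_id, None)  # busy again, reset
                continue
            first_idle = self.idle_since.setdefault(task_id, now)
            if now - first_idle >= self.period:
                alerts.append(task_id)
        return alerts


tracker = IdleTracker(threshold=1.0, period=timedelta(minutes=10))
t0 = datetime(2022, 10, 30, 15, 0, tzinfo=timezone.utc)
print(tracker.update({"a": 0.0, "b": 50.0}, now=t0))
print(tracker.update({"a": 0.0, "b": 0.5}, now=t0 + timedelta(minutes=10)))
```

A task is only flagged once it has stayed at or below the threshold across polls for the whole period, so a single low reading does not trigger an alert.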
I see the task runtime info. I guess it shows current utilization, not allocation, but I am not sure.
"runtime": {
"progress": "0",...
I see. Browser dev tools are useful here for finding the API endpoints used for the data, and
https://github.com/allegroai/clearml/blob/master/clearml/task.py#L987 is what I was looking for.
Thanks
I see. It's showing since experiment started.
Worked with Bullseye image. Thanks for the suggestion.
This worked out.
`
from clearml.backend_api.session.client import APIClient
# Create an instance of APIClient
client = APIClient()
# List all workers registered with the server
worker_list = client.workers.get_all()
print(worker_list) `
I see. I am getting an error in the HTML output:
<noscript>Please enable JavaScript to continue using this application.</noscript>
oh! yeah. That worked out. Thanks a lot.