Answered
Hi, I am trying to pull API data from the /tasks.get_all endpoint

Hi,

I am trying to pull API data from the /tasks.get_all endpoint:
https://clear.ml/docs/latest/docs/references/api/tasks#post-tasksget_all

I can pull data from client.workers.get_all(), so the info in my conf file is correct.

With the /tasks.get_all endpoint, I am getting an error in the response.

This is the snippet I am using:

```python
from clearml.backend_api.session.client import APIClient
from time import time

# Create an instance of APIClient
client = APIClient()

tasks = client.tasks.get_all()
for task in tasks:
    print(task.data)
```

and this is the error I am getting:

```
Traceback (most recent call last):
  File "get_all_task.py", line 7, in <module>
    tasks = client.tasks.get_all()
  File "/Users/anuj.tyagi/Library/Python/3.8/lib/python/site-packages/clearml/backend_api/session/client/client.py", line 422, in get
    result=self.session.send(request_cls(*args, **kwargs)),
  File "/Users/anuj.tyagi/Library/Python/3.8/lib/python/site-packages/clearml/backend_api/session/client/client.py", line 124, in send
    raise APIError(result, extra_info="Invalid response")
clearml.backend_api.session.client.client.APIError: APIError: Invalid response: code 200: OK
```

  
  
Posted 2 years ago

Answers 30


This section is internal implementation - we can't guarantee it will not be changed. As for unused GPU - in general if you run a task with the agent having the --gpu switch a GPU will be allocated for as long as the task is running. I think the main concern is trying to make sure your task makes the most out of the GPU...?

  
  
Posted 2 years ago

Is it your own server installation or are you using the SaaS?

  
  
Posted 2 years ago

same error for tasks.get_all() endpoint

  
  
Posted 2 years ago

DrabCockroach54 I just tested with both ClearML SDK 1.7.1 and 1.7.2 and both returned a valid response to client.tasks.get_all() when running against the free-hosted app.clear.ml

  
  
Posted 2 years ago

I think the only reason you'll get that is if the returned payload was stripped somehow from the call result

  
  
Posted 2 years ago

How do I know what the possible options for status are? The same goes for the other parameters.
I don't see those in the documentation.
https://clear.ml/docs/latest/docs/references/api/tasks#post-tasksget_all

  
  
Posted 2 years ago

Perhaps due to size? Are you running behind any firewall or any other network component?

  
  
Posted 2 years ago

can you try something like:
client.tasks.get_all(status=["in_progress"])
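For completeness, here is that suggestion dropped into the snippet from the question (a sketch; the only change is the status filter, using the value later confirmed to work in this thread):

```python
from clearml.backend_api.session.client import APIClient

# Create an instance of APIClient (credentials are read from clearml.conf)
client = APIClient()

# Filtering by status keeps the response payload much smaller than an
# unfiltered tasks.get_all() call
tasks = client.tasks.get_all(status=["in_progress"])
for task in tasks:
    print(task.data)
```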

  
  
Posted 2 years ago

That's exactly what I did... I was thinking more in terms of the size of the response body and not the different endpoint

  
  
Posted 2 years ago

oh! yeah. That worked out. Thanks a lot.

  
  
Posted 2 years ago

ok

  
  
Posted 2 years ago

I found system_tags and all the metrics including CPU, but I can't find any field that mentions a reported GPU scalar or GPU utilization.

  
  
Posted 2 years ago

It would be great to have the possible fields for the given parameters mentioned here: https://clear.ml/docs/latest/docs/references/api/tasks#post-tasksget_all
Any clue how I can figure those out?

  
  
Posted 2 years ago

When in table view (rows) there is a small icon next to the 'Started' column. There you can configure time periods you'd like to view 🙂

  
  
Posted 2 years ago

How can this even be a Python issue when one endpoint gives a response and the other doesn't?

  
  
Posted 2 years ago

Exactly. I am trying to create an alert for tasks that have a GPU/CPU allocated but have not been utilizing it for some period of time.
So, if a task is there, a GPU will be allocated to it. I will need to check whether the task is using it or just sitting idle.

  
  
Posted 2 years ago

E.g. to query tasks that are both Running --> you mean status=["in_progress"]?? How do I figure out the other possible values I can use with the status parameter?

Another one:
Filter only tasks that started, say, 10 min ago. Is there a parameter for that as well?

  
  
Posted 2 years ago

```
# which python
/Users/anuj.tyagi/clearml_api/venv/bin/python
(venv) LMWPRW6F3:clearml_api root# pip freeze | grep clearml
clearml==1.7.2
```

```
Traceback (most recent call last):
  File "get_all_task.py", line 8, in <module>
    print (client.tasks.get_all())
  File "/Users/anuj.tyagi/clearml_api/venv/lib/python3.8/site-packages/clearml/backend_api/session/client/client.py", line 422, in get
    result=self.session.send(request_cls(*args, **kwargs)),
  File "/Users/anuj.tyagi/clearml_api/venv/lib/python3.8/site-packages/clearml/backend_api/session/client/client.py", line 124, in send
    raise APIError(result, extra_info="Invalid response")
clearml.backend_api.session.client.client.APIError: APIError: Invalid response: code 200: OK
```

  
  
Posted 2 years ago

I am running in a virtual env now too.

  
  
Posted 2 years ago

I see. The dev tools are useful here for finding the API endpoints used for the data, and
https://github.com/allegroai/clearml/blob/master/clearml/task.py#L987 is what I was looking for.
Thanks

  
  
Posted 2 years ago

My goal is to detect events where a task does not use its allocated resources (e.g. GPU) for some period of time.
I am still trying to understand the ClearML API response.

Do you have any clue how I can get that from client.tasks.get_all(status=["in_progress"])?
If a task has a GPU allocated but is not using it, would it also be in the in_progress status? I want to collect those tasks.

I see the task runtime info. I guess it's the current utilization, not the allocation, but I'm not sure.

"runtime": {
"progress": "0",
"platform": "linux",
"python_version": "3.8.0",
"python_exec": "/root/.clearml/venvs-builds/3.8/bin/python",
"OS": "Linux-5.15.0-1013-gcp-x86_64-with-glibc2.27",
"processor": "x86_64",
"cpu_cores": 256,
"memory_gb": 1007.7,
"hostname": "",
"gpu_count": 1,
"gpu_type": "NVIDIA xxx -40GB",

  
  
Posted 2 years ago

SuccessfulKoala55 Yeah, that's possible, but then I don't see why a firewall would block only one endpoint's response. I tried both workers.get_all() and get_stats(), and both worked.
Can you share the snippet you used for tasks.get_all()?

```python
from clearml.backend_api.session.client import APIClient
from time import time

# Create an instance of APIClient
client = APIClient()

tasks = client.tasks.get_all()
```

This is what I used.
The doc mentions the required request body parameter types. Do I need to add those as parameters?
I think something is either wrong with my request or with my permissions.
I am testing the API against the production ClearML server running in our environment. The server itself is running fine.

  
  
Posted 2 years ago

I see it now.

"5451af93e0bf68a4ab09f654b222ccae": { "1b790a3da2e8d6cd939cf271694fe81b": { "metric": ":monitor:gpu", "variant": "gpu_0_utilization", "value": 0.0, "min_value": 0.0, "max_value": 3.542 }, "409d4e6ad9b69b3224fceeac6e265ddc": { "metric": ":monitor:gpu", "variant": "gpu_0_mem_used_gb", "value": 0.0, "min_value": 0.0, "max_value": 0.0 }, "74646afee0e0ab18d3cbd08ce1ff6aa3": { "metric": ":monitor:gpu", "variant": "gpu_0_mem_usage", "value": 0.002, "min_value": 0.002, "max_value": 54.739 }, "abdb01e1de566d2165e902fe0839465e": { "metric": ":monitor:gpu", "variant": "gpu_0_mem_free_gb", "value": 47.461, "min_value": 21.482, "max_value": 47.461 }, "db472ace8c40b8a9f3e11ec348920662": { "metric": ":monitor:gpu", "variant": "gpu_0_temperature", "value": 46.0, "min_value": 45.0, "max_value": 59.46 } } },

  
  
Posted 2 years ago

> query tasks that are both Running --> You mean status=["in_progress"]

Yes!

> How do I figure out other possible parameter I can use with status parameter?

https://clear.ml/docs/latest/docs/references/api/tasks#post-tasksget_all
https://clear.ml/docs/latest/docs/references/api/definitions#taskstask

> Filter only tasks that start say 10 min ago. Is there any parameter for it also?

last_update or created, then use a similar filter to this one:
https://github.com/allegroai/clearml/blob/ff7b174bf162347b82226f413040ff6473401e92/examples/services/cleanup/cleanup_service.py#L70
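A rough sketch of such a filter, assuming the ">=<timestamp>" comparison syntax used for status_changed in the linked cleanup example also applies to created (worth double-checking against your server version):

```python
from datetime import datetime, timedelta
from clearml.backend_api.session.client import APIClient

client = APIClient()

# Tasks created in the last 10 minutes; the ">=..." prefix follows the
# date-comparison style used in the linked cleanup_service example (assumption)
ten_minutes_ago = datetime.utcnow() - timedelta(minutes=10)
tasks = client.tasks.get_all(
    status=["in_progress"],
    created=[">={}".format(ten_minutes_ago.strftime("%Y-%m-%dT%H:%M:%S"))],
    order_by=["-created"],
    page_size=100,
)
```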

  
  
Posted 2 years ago

DrabCockroach54, you can set it all up yourself. I suggest you open the developer tools (F12) and see how it is done in the UI. You can then implement the same requests in code.

For example, filtering for tasks that started 10 minutes ago is something you can do via the UI.

  
  
Posted 2 years ago

Yeah, the docstring is always the most up to date 🙂

  
  
Posted 2 years ago

DrabCockroach54 that is quite cool!
Basically, here is what I would do:
1. Query tasks that are both Running and do not have the system tag "development" (that means they are running on agents), and filter only tasks that started, say, 10 min ago
2. Go over the list and check whether (1) they have a GPU scalar reported (meaning a GPU is accessible) and (2) the min/max/value of GPU utilization is under 5%
wdyt?
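A minimal sketch of that flow, assuming the last_metrics layout shown earlier in this thread, that a "-" prefix in system_tags excludes a tag, and the same ">=<timestamp>" date-filter syntax as in the cleanup example (all of these are assumptions to verify, not a definitive implementation):

```python
from datetime import datetime, timedelta
from clearml.backend_api.session.client import APIClient

client = APIClient()
since = datetime.utcnow() - timedelta(minutes=10)

# 1. Running tasks without the "development" system tag (i.e. running on agents),
#    started in the last ~10 minutes
tasks = client.tasks.get_all(
    status=["in_progress"],
    system_tags=["-development"],   # assumed: "-" prefix excludes the tag
    started=[">={}".format(since.strftime("%Y-%m-%dT%H:%M:%S"))],  # assumed filter syntax
)

# 2. Keep tasks that do report GPU scalars but whose utilization stays under 5%
idle_gpu_tasks = []
for task in tasks:
    metrics = task.data.to_dict().get("last_metrics") or {}  # assumed accessor
    gpu_variants = [
        v for m in metrics.values() for v in m.values()
        if v.get("metric") == ":monitor:gpu" and "utilization" in v.get("variant", "")
    ]
    if gpu_variants and max(v.get("max_value", 0) for v in gpu_variants) < 5:
        idle_gpu_tasks.append(task.data.id)

print(idle_gpu_tasks)
```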

  
  
Posted 2 years ago

"tags": [], "system_tags": [ "interactive" ], "status_changed": "2022-10-13 17:05:22.844000+00:00", "status_message": "", "status_reason": "", "last_worker": "xxx01:!2c1:cpu:10:service:0a750bd8a09b4063a59c96b4370d0815", "last_worker_report": "2022-10-30 15:23:18.695000+00:00", "last_update": "2022-10-30 15:23:18.695000+00:00", "last_change": "2022-10-30 15:23:18.695000+00:00", "last_iteration": 0, "last_metrics": { "29c6dd717a649f7c1835bfa9249b3142": { "028d9091618657f296222d768c3dd9b8": { "metric": ":monitor:machine", "variant": "network_rx_mbs", "value": 1.691, "min_value": -23.836, "max_value": 301.954 }, "1a760266c35f86529f9c669d539a2297": { "metric": ":monitor:machine", "variant": "io_read_mbs", "value": 0.201, "min_value": 0.0, "max_value": 919.899 }, "22db6a87b76b02b50d0a8c54879484ce": { "metric": ":monitor:machine", "variant": "io_write_mbs", "value": 1.312, "min_value": 0.279, "max_value": 2098.717 }, "3964adf302d5c935e9a2451b45bd53a5": { "metric": ":monitor:machine", "variant": "memory_free_gb", "value": 911.466, "min_value": 656.194, "max_value": 943.75 }, "5385df90d0d0ad8955159a5307d34b38": { "metric": ":monitor:machine", "variant": "cpu_usage", "value": 41.059, "min_value": 3.538, "max_value": 93.25 }, "5d2e34a3c7e733e0549fa6d9c9666ce3": { "metric": ":monitor:machine", "variant": "network_tx_mbs", "value": 1.741, "min_value": -334.512, "max_value": 291.802 }, "7e44abd211aa00a7c3bf5090fb33df90": { "metric": ":monitor:machine", "variant": "memory_used_gb", "value": 1.204, "min_value": 0.143, "max_value": 1.205 }, "f4f4fd050d744fb78fc0bb7b5a2a9f99": { "metric": ":monitor:machine", "variant": "disk_free_percent", "value": 44.5, "min_value": 42.6, "max_value": 51.5 } } }, "hyperparams": { "interactive_session": { "user_base_directory": { "section": "interactive_session", "name": "user_base_directory", "value": "~/", "type": "str" }, "ssh_server": { "section": "interactive_session", "name": "ssh_server", "value": "True", "type": "bool" }, "default_docker": { "section": "interactive_session", "name": "default_docker", "value": " ", "type": "str" }, "jupyterlab": { "section": "interactive_session", "name": "jupyterlab", "value": "True", "type": "bool" }, "vscode_server": { "section": "interactive_session", "name": "vscode_server", "value": "True", "type": "bool" }, "public_ip": { "section": "interactive_session", "name": "public_ip", "value": "False", "type": "bool" }, "ssh_ports": { "section": "interactive_session", "name": "ssh_ports", "value": "", "type": "str" }, "vscode_version": { "section": "interactive_session", "name": "vscode_version", "value": "", "type": "str" } }, "properties": { "external_address": { "section": "properties", "name": "external_address", "value": "" }, "internal_ssh_port": { "section": "properties", "name": "internal_ssh_port", "value": "" }, "jupyter_port": { "section": "properties", "name": "jupyter_port", "value": "" }, "internal_stable_ssh_port": { "section": "properties", "name": "internal_stable_ssh_port", "value": "" }, "vscode_port": { "section": "properties", "name": "vscode_port", "value": "" } } },

  
  
Posted 2 years ago

You should have the metric :monitor:gpu with the variant gpu_0_utilization.
Since I see you have none of those, that points to no GPU driver...
Could that be?

  
  
Posted 2 years ago