Hi, I am trying to pull API data from the /tasks.get_all endpoint

Hi,

I am trying to pull API data from the /tasks.get_all endpoint:
https://clear.ml/docs/latest/docs/references/api/tasks#post-tasksget_all

I can pull data from client.workers.get_all(), so my info in the conf file is correct.

With the /tasks.get_all endpoint, I am getting an error in the response.

This is the snippet I am using:

` from clearml.backend_api.session.client import APIClient
from time import time

# Create an instance of APIClient
client = APIClient()

tasks = client.tasks.get_all()
for task in tasks:
    print(task.data) `

and this is the error I am getting:

Traceback (most recent call last):
  File "get_all_task.py", line 7, in <module>
    tasks = client.tasks.get_all()
  File "/Users/anuj.tyagi/Library/Python/3.8/lib/python/site-packages/clearml/backend_api/session/client/client.py", line 422, in get
    result=self.session.send(request_cls(*args, **kwargs)),
  File "/Users/anuj.tyagi/Library/Python/3.8/lib/python/site-packages/clearml/backend_api/session/client/client.py", line 124, in send
    raise APIError(result, extra_info="Invalid response")
clearml.backend_api.session.client.client.APIError: APIError: Invalid response: code 200: OK

  
  
Posted one year ago

Answers 30


Same error for the tasks.get_all() endpoint.

  
  
Posted one year ago

When in table view (rows) there is a small icon next to the 'Started' column. There you can configure time periods you'd like to view 🙂

  
  
Posted one year ago

SuccessfulKoala55 Yeah, that's possible, but then I don't get how a firewall would block only one endpoint's response. I tried both workers.get_all() and get_stats(); both worked.
Can you share the snippet you used for tasks.get_all() ?

` from clearml.backend_api.session.client import APIClient
from time import time

# Create an instance of APIClient
client = APIClient()

tasks = client.tasks.get_all() `

This is what I used.
The doc mentions a required request body parameter type. Do I need to add this as a parameter?
I think something is either wrong with my request or it could be my permissions.
I am testing the API against the production ClearML server running in our environment; the server itself is running fine.

  
  
Posted one year ago

That's exactly what I did... I was thinking more in terms of the size of the response body and not the different endpoint

  
  
Posted one year ago

I think the only reason you'll get that is if the returned payload was stripped somehow from the call result

  
  
Posted one year ago

DrabCockroach54 that is quite cool!
Basically here is what I would do:
Query tasks that are both running and do not have the system tag "development" (that means they are running on agents), and filter only tasks that started, say, 10 min ago. Then go over the list and see if (1) they have a GPU scalar reported (meaning a GPU is accessible) and (2) the min/max/value of GPU utilization is under 5%. wdyt?
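A minimal sketch of that query with APIClient (the exact filter parameters, e.g. the leading "-" to exclude a system tag, are assumptions to verify against the tasks.get_all docs):

```python
from clearml.backend_api.session.client import APIClient

client = APIClient()

# Running tasks that do NOT carry the "development" system tag,
# i.e. tasks presumably executed by agents rather than from a dev machine.
tasks = client.tasks.get_all(
    status=["in_progress"],
    system_tags=["-development"],
)
print(len(tasks))
```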

  
  
Posted one year ago

Perhaps due to size? Are you running behind any firewall or any other network component?

  
  
Posted one year ago

can you try something like:
client.tasks.get_all(status=["in_progress"])

  
  
Posted one year ago

Is it your own server installation or are you using the SaaS?

  
  
Posted one year ago

My goal is to detect events where a task does not use its allocated resources (e.g. GPU) for some period of time.
I am still trying to understand the ClearML API response.

Do you have any clue how I can get that from client.tasks.get_all(status=["in_progress"]) ?
If a task has a GPU allocated but is not using it, would it also be in the in_progress status? I want to collect those tasks.

I see the task runtime info. I guess it's current utilization, not allocation, but I'm not sure.

"runtime": {
"progress": "0",
"platform": "linux",
"python_version": "3.8.0",
"python_exec": "/root/.clearml/venvs-builds/3.8/bin/python",
"OS": "Linux-5.15.0-1013-gcp-x86_64-with-glibc2.27",
"processor": "x86_64",
"cpu_cores": 256,
"memory_gb": 1007.7,
"hostname": "",
"gpu_count": 1,
"gpu_type": "NVIDIA xxx -40GB",

  
  
Posted one year ago

I see it now.

"5451af93e0bf68a4ab09f654b222ccae": { "1b790a3da2e8d6cd939cf271694fe81b": { "metric": ":monitor:gpu", "variant": "gpu_0_utilization", "value": 0.0, "min_value": 0.0, "max_value": 3.542 }, "409d4e6ad9b69b3224fceeac6e265ddc": { "metric": ":monitor:gpu", "variant": "gpu_0_mem_used_gb", "value": 0.0, "min_value": 0.0, "max_value": 0.0 }, "74646afee0e0ab18d3cbd08ce1ff6aa3": { "metric": ":monitor:gpu", "variant": "gpu_0_mem_usage", "value": 0.002, "min_value": 0.002, "max_value": 54.739 }, "abdb01e1de566d2165e902fe0839465e": { "metric": ":monitor:gpu", "variant": "gpu_0_mem_free_gb", "value": 47.461, "min_value": 21.482, "max_value": 47.461 }, "db472ace8c40b8a9f3e11ec348920662": { "metric": ":monitor:gpu", "variant": "gpu_0_temperature", "value": 46.0, "min_value": 45.0, "max_value": 59.46 } } },

  
  
Posted one year ago

I am running in a virtual env now, too.

  
  
Posted one year ago

DrabCockroach54, you can set it all up. I suggest you open developer tools (F12) and see how it is done in the UI. You can then implement this in code.

For example, filtering tasks that started 10 minutes ago is something you can view via the UI.

  
  
Posted one year ago

Yeah, the docstring is always the most updated 🙂
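If it helps, one way to read that docstring locally is via the generated request classes (a sketch; the module path below is what current clearml versions expose, but double-check it on your install):

```python
from clearml.backend_api.services import tasks as tasks_service

# Prints the generated schema/docstring for tasks.get_all,
# including the values accepted by fields such as `status`.
help(tasks_service.GetAllRequest)
```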

  
  
Posted one year ago

You should have the metric :monitor:gpu with the variant gpu_0_utilization.
Since I see you have none of those, that points to no GPU driver...
Could that be?

  
  
Posted one year ago

I see. Dev tools are useful here for finding the API endpoints used for the data, and
https://github.com/allegroai/clearml/blob/master/clearml/task.py#L987 is what I was looking for.
Thanks

  
  
Posted one year ago

ok

  
  
Posted one year ago

This section is internal implementation - we can't guarantee it will not be changed. As for unused GPU - in general if you run a task with the agent having the --gpu switch a GPU will be allocated for as long as the task is running. I think the main concern is trying to make sure your task makes the most out of the GPU...?

  
  
Posted one year ago

"query tasks that are both Running --> You mean status=["in_progress"]?"

Yes!

"How do I figure out other possible parameters I can use with the status parameter?"

https://clear.ml/docs/latest/docs/references/api/tasks#post-tasksget_all
https://clear.ml/docs/latest/docs/references/api/definitions#taskstask

"Filter only tasks that started, say, 10 min ago. Is there any parameter for it also?"

last_update or created, then use a filter similar to this one:
https://github.com/allegroai/clearml/blob/ff7b174bf162347b82226f413040ff6473401e92/examples/services/cleanup/cleanup_service.py#L70
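A minimal sketch of that time filter, loosely following the linked cleanup_service.py (the ">=<datetime>" string syntax is borrowed from that script's status_changed filter; treat its use here as an assumption):

```python
from datetime import datetime, timedelta
from clearml.backend_api.session.client import APIClient

client = APIClient()

# Running tasks whose status changed within the last 10 minutes.
ten_minutes_ago = datetime.utcnow() - timedelta(minutes=10)
tasks = client.tasks.get_all(
    status=["in_progress"],
    status_changed=[">={}".format(ten_minutes_ago)],
)
print([t.id for t in tasks])
```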

  
  
Posted one year ago

E.g. to query tasks that are both Running --> you mean status=["in_progress"]? How do I figure out other possible parameters I can use with the status parameter?

Another one:
Filter only tasks that started, say, 10 min ago. Is there any parameter for it also?

  
  
Posted one year ago

"tags": [], "system_tags": [ "interactive" ], "status_changed": "2022-10-13 17:05:22.844000+00:00", "status_message": "", "status_reason": "", "last_worker": "xxx01:!2c1:cpu:10:service:0a750bd8a09b4063a59c96b4370d0815", "last_worker_report": "2022-10-30 15:23:18.695000+00:00", "last_update": "2022-10-30 15:23:18.695000+00:00", "last_change": "2022-10-30 15:23:18.695000+00:00", "last_iteration": 0, "last_metrics": { "29c6dd717a649f7c1835bfa9249b3142": { "028d9091618657f296222d768c3dd9b8": { "metric": ":monitor:machine", "variant": "network_rx_mbs", "value": 1.691, "min_value": -23.836, "max_value": 301.954 }, "1a760266c35f86529f9c669d539a2297": { "metric": ":monitor:machine", "variant": "io_read_mbs", "value": 0.201, "min_value": 0.0, "max_value": 919.899 }, "22db6a87b76b02b50d0a8c54879484ce": { "metric": ":monitor:machine", "variant": "io_write_mbs", "value": 1.312, "min_value": 0.279, "max_value": 2098.717 }, "3964adf302d5c935e9a2451b45bd53a5": { "metric": ":monitor:machine", "variant": "memory_free_gb", "value": 911.466, "min_value": 656.194, "max_value": 943.75 }, "5385df90d0d0ad8955159a5307d34b38": { "metric": ":monitor:machine", "variant": "cpu_usage", "value": 41.059, "min_value": 3.538, "max_value": 93.25 }, "5d2e34a3c7e733e0549fa6d9c9666ce3": { "metric": ":monitor:machine", "variant": "network_tx_mbs", "value": 1.741, "min_value": -334.512, "max_value": 291.802 }, "7e44abd211aa00a7c3bf5090fb33df90": { "metric": ":monitor:machine", "variant": "memory_used_gb", "value": 1.204, "min_value": 0.143, "max_value": 1.205 }, "f4f4fd050d744fb78fc0bb7b5a2a9f99": { "metric": ":monitor:machine", "variant": "disk_free_percent", "value": 44.5, "min_value": 42.6, "max_value": 51.5 } } }, "hyperparams": { "interactive_session": { "user_base_directory": { "section": "interactive_session", "name": "user_base_directory", "value": "~/", "type": "str" }, "ssh_server": { "section": "interactive_session", "name": "ssh_server", "value": "True", "type": "bool" }, "default_docker": { "section": "interactive_session", "name": "default_docker", "value": " ", "type": "str" }, "jupyterlab": { "section": "interactive_session", "name": "jupyterlab", "value": "True", "type": "bool" }, "vscode_server": { "section": "interactive_session", "name": "vscode_server", "value": "True", "type": "bool" }, "public_ip": { "section": "interactive_session", "name": "public_ip", "value": "False", "type": "bool" }, "ssh_ports": { "section": "interactive_session", "name": "ssh_ports", "value": "", "type": "str" }, "vscode_version": { "section": "interactive_session", "name": "vscode_version", "value": "", "type": "str" } }, "properties": { "external_address": { "section": "properties", "name": "external_address", "value": "" }, "internal_ssh_port": { "section": "properties", "name": "internal_ssh_port", "value": "" }, "jupyter_port": { "section": "properties", "name": "jupyter_port", "value": "" }, "internal_stable_ssh_port": { "section": "properties", "name": "internal_stable_ssh_port", "value": "" }, "vscode_port": { "section": "properties", "name": "vscode_port", "value": "" } } },

  
  
Posted one year ago

Exactly. I am trying to create an alert for tasks that have a GPU/CPU allocated but are not utilizing it for some period of time.
So, if a task is there, a GPU will be allocated to it. I will need to check whether the task is using it or just idle.

  
  
Posted one year ago

How can it even be this kind of issue with Python, when one endpoint is giving a response and the other is not?

  
  
Posted one year ago

I found system_tags and all the metrics including CPU, but I can't find any field that mentions a GPU scalar or GPU utilization.

  
  
Posted one year ago

DrabCockroach54 I just tested with both ClearML SDK 1.7.1 and 1.7.2 and both returned a valid response to client.tasks.get_all() when running against the free-hosted app.clear.ml

  
  
Posted one year ago

` # which python
/Users/anuj.tyagi/clearml_api/venv/bin/python
(venv) LMWPRW6F3:clearml_api root# pip freeze | grep clearml
clearml==1.7.2

Traceback (most recent call last):
  File "get_all_task.py", line 8, in <module>
    print (client.tasks.get_all())
  File "/Users/anuj.tyagi/clearml_api/venv/lib/python3.8/site-packages/clearml/backend_api/session/client/client.py", line 422, in get
    result=self.session.send(request_cls(*args, **kwargs)),
  File "/Users/anuj.tyagi/clearml_api/venv/lib/python3.8/site-packages/clearml/backend_api/session/client/client.py", line 124, in send
    raise APIError(result, extra_info="Invalid response")
clearml.backend_api.session.client.client.APIError: APIError: Invalid response: code 200: OK `

  
  
Posted one year ago

oh! yeah. That worked out. Thanks a lot.

  
  
Posted one year ago

How do I know what the possible options for status are? Same for the other parameters.
I don't see those in the documentation:
https://clear.ml/docs/latest/docs/references/api/tasks#post-tasksget_all

  
  
Posted one year ago

It would be great to have the possible values for the given parameters documented here: https://clear.ml/docs/latest/docs/references/api/tasks#post-tasksget_all
Any clue how I can figure those out?

  
  
Posted one year ago