Furthermore, when using APIClient(), users is not a valid endpoint at all.
class APIClient(object):
auth = None # type: Any
queues = None # type: Any
tasks = None # type: Any
workers = None # type: Any
events = None # type: Any
models = None # type: Any
projects = None # type: Any
This is taken from clearml/backend_api/session/client/client.py
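The excerpt above lists the service attributes APIClient exposes. A quick sanity check (service names copied from the excerpt; the helper function is mine, not part of ClearML) shows why users is not among them:

```python
# Service attributes exposed by APIClient, copied from the excerpt above
# (clearml/backend_api/session/client/client.py). "users" is notably absent.
API_CLIENT_SERVICES = {
    "auth", "queues", "tasks", "workers", "events", "models", "projects",
}

def is_valid_service(name: str) -> bool:
    """Return True if `name` is one of the services the APIClient excerpt lists."""
    return name in API_CLIENT_SERVICES

print(is_valid_service("tasks"))  # True
print(is_valid_service("users"))  # False -- not a valid endpoint
```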
Given that nvidia-smi is working, you may have already done that. In that case, depending on your Ubuntu version, you may have another problem: Ubuntu 22+ has this issue, which has a workaround. This also caught me out...
@SuccessfulKoala55 Just following up, as I figured out what was happening here, and it could be useful for the future.
The prefilled value for Number of GPUs in the GCP Autoscaler is 1.
When one ticks Run in CPU mode (no gpus), the GPU Type and Number of GPUs fields are hidden. However, the values from these fields are still submitted in the API request (I'm guessing here) when the Autoscaler is launched.
Hence, to get past this, you need to...
If a Task is in the 'Completed' state, I think the only option is to 'Reset' it (see image). You do clear the previous run's execution, but I think for a repetitive task this is fine.
Maybe this should only be the case if it is in a 'Completed' state rather than 'Failed'. I can see that in the 'Failed' case you would not want to clear the execution, because you would want to see why it failed. Thoughts?
This is not working. Please see None, which details the problem.
Hi, we encountered this a while ago. In our case, there was an issue with running Docker containers with GPU on Ubuntu 22.04.
See this issue for more info:
Is there a way I can do this with the python APIClient or even with the requests library?
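For the requests-library route, the general pattern is to call the ClearML API server directly with a bearer token, mirroring the curl calls later in this thread. A minimal standard-library sketch (the host, token, and endpoint here are placeholders/assumptions; substitute your own API server URL and a real endpoint and payload):

```python
import json
import urllib.request

API_HOST = "https://api.clear.ml"  # assumption: the hosted api server, as pinged later in the thread
TOKEN = "<TOKEN>"                  # bearer token, as in the curl example below

def build_request(endpoint: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) an authenticated POST to the given API endpoint."""
    return urllib.request.Request(
        url=f"{API_HOST}/{endpoint}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {TOKEN}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_request("debug.ping", {})
print(req.full_url)    # https://api.clear.ml/debug.ping
# Sending would be: urllib.request.urlopen(req)
```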
@BraveToad81
Apologies for the delay.
I have obfuscated the private information with XXX. Let me know if you think any of it is relevant.
{"gcp_project_id":"XXX","gcp_zone":"XXX","subnetwork":"XXX","gcp_credentials":"{\n \"type\": \"service_account\",\n \"project_id\": \"XXX\",\n \"private_key_id\": \"XXX\",\n \"private_key\": \"XXX\",\n \"client_id\": \"XXX\",\n \"auth_uri\": \"XXX\",\n \"token_uri\": \"XXX\",\n \"auth_provider_x509_cert_url\": \"XXX\",\n \"client_x509_cert_url\": \"...
Is there documentation for this? I was not able to figure it out, unfortunately.
@EnthusiasticShrimp49 How do I specify not to attach a GPU? I thought ticking 'Run in CPU Mode' would be sufficient. Is there something else I'm missing?
I don't think there's really a way around this, because AWS Lambda doesn't allow for multiprocessing.
Instead, I've resorted to using a ClearML Scheduler, running on a t3.micro instance, for jobs that I want to run on a cron.
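ClearML's scheduler handles the cron logic for you; as a stripped-down illustration of the underlying idea (computing the next daily run time), here is a standard-library sketch — the function name and behaviour are mine, not ClearML's:

```python
from datetime import datetime, timedelta

def next_daily_run(now: datetime, hour: int, minute: int = 0) -> datetime:
    """Return the next occurrence of hour:minute strictly after `now`."""
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(days=1)  # today's slot already passed; run tomorrow
    return candidate

now = datetime(2023, 5, 1, 12, 30)
print(next_daily_run(now, hour=6))   # 2023-05-02 06:00:00 (6am already passed today)
print(next_daily_run(now, hour=18))  # 2023-05-01 18:00:00 (later today)
```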
Thanks Jake. Do you know how I set the GPU count to 0?
I have been having the same error since yesterday on Ubuntu; it works fine on Mac.
I cannot ping api.clear.ml
I am using ClearML version 1.9.1. In code, I create a plot using matplotlib. I can see it in TensorBoard, but it is not available in ClearML Plots.
$ curl -H "Authorization: Bearer <TOKEN>" -X GET
{"meta":{"id":"ed6c52d030f240a89f001b447ee64a6b","trx":"ed6c52d030f240a89f001b447ee64a6b","endpoint":{"name":"debug.ping","requested_version":"2.26","actual_version":"1.0"},"result_code":200,"result_subcode":0,"result_msg":"OK","error_stack":null,"error_data":{},"alarms":{}},"data":{"msg":"Hello World"}}%
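The response above can also be inspected programmatically. Parsing the JSON body pasted above shows the ping succeeded, and also that the server answered with actual_version 1.0 despite 2.26 being requested:

```python
import json

# The debug.ping response body pasted above.
raw = (
    '{"meta":{"id":"ed6c52d030f240a89f001b447ee64a6b",'
    '"trx":"ed6c52d030f240a89f001b447ee64a6b",'
    '"endpoint":{"name":"debug.ping","requested_version":"2.26","actual_version":"1.0"},'
    '"result_code":200,"result_subcode":0,"result_msg":"OK",'
    '"error_stack":null,"error_data":{},"alarms":{}},'
    '"data":{"msg":"Hello World"}}'
)

resp = json.loads(raw)
print(resp["meta"]["result_code"])                 # 200
print(resp["data"]["msg"])                         # Hello World
print(resp["meta"]["endpoint"]["actual_version"])  # 1.0, not the requested 2.26
```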
$ curl -H "Authoriz...
