
@<1523701087100473344:profile|SuccessfulKoala55> Thanks for getting back to me. My image contains clearml-agent==1.9.1. There is a recent release, 1.9.2, and now on every run the agent installs this newer version because of the -U flag being passed. From the docs it looks like there may be a way to prevent this upgrade, but it's not clear to me exactly how to do it. Is it possible?
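For anyone searching later: one approach that may work (treat the exact variable name as an assumption and verify it against the clearml-agent docs/source) is pinning the agent package the daemon installs inside the container via the CLEARML_AGENT_DOCKER_AGENT_REPO environment variable. A rough sketch:

import os
import subprocess

env = os.environ.copy()
# assumed env var: tells the daemon which clearml-agent package to install inside
# the container, so pinning it here should stop the -U upgrade to 1.9.2
env["CLEARML_AGENT_DOCKER_AGENT_REPO"] = "clearml-agent==1.9.1"

# "default" is a placeholder queue name
subprocess.run(["clearml-agent", "daemon", "--queue", "default", "--docker"], env=env, check=True)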
If a Task is in the 'Completed' state, I think the only option is to 'Reset' it (see image). You do clear the previous run's execution, but I think for a repetitive task this is fine.
Maybe this should only be the case if it is in a 'Completed' state rather than 'Failed'. I can see that for a 'Failed' task you would not want to clear the execution, because you would want to see why it failed. Thoughts?
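For completeness, a rough sketch of doing the same thing from code with the SDK (the task ID and queue name are placeholders); Task.reset should clear the previous execution much like the UI Reset button does:

from clearml import Task

task = Task.get_task(task_id="<completed-task-id>")  # placeholder ID
task.reset()                                         # clear the previous run's execution state
Task.enqueue(task, queue_name="default")             # placeholder queue; sends it back for re-execution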
I am using ClearML version 1.9.1. In code, I am creating a plot using matplotlib. I am able to see it in TensorBoard, but it is not available in ClearML Plots.
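In case it helps with debugging, a minimal sketch of reporting the figure explicitly instead of relying on the automatic matplotlib capture (project, task, title and series names are placeholders):

import matplotlib.pyplot as plt
from clearml import Task

task = Task.init(project_name="examples", task_name="matplotlib plot")  # placeholder names
fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4])
# explicitly send the figure to the task's Plots section
task.get_logger().report_matplotlib_figure(title="my plot", series="series A", figure=fig, iteration=0)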
Thanks. I am trying to completely minimise the start-up time. Given I am using a docker image which already has clearml-agent and pip installed, is there a way I can skip this installation when a task starts up using the daemon?
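For reference, the kind of thing I have in mind, assuming the clearml-agent environment variables CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL and CLEARML_AGENT_SKIP_PIP_VENV_INSTALL behave as documented (queue name and interpreter path are placeholders; in docker mode these variables would need to be visible inside the container, e.g. baked into the image):

import os
import subprocess

env = os.environ.copy()
env["CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL"] = "1"               # reuse the existing python environment
env["CLEARML_AGENT_SKIP_PIP_VENV_INSTALL"] = "/usr/bin/python3"  # skip the pip/venv bootstrap, use this interpreter

subprocess.run(["clearml-agent", "daemon", "--queue", "default"], env=env, check=True)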
I’ve had some issues with clearml sessions. I’d be interested in seeing a PR. Would you mind posting a link please?
Furthermore, when using APIClient(), users is not a valid endpoint at all.
class APIClient(object):
auth = None # type: Any
queues = None # type: Any
tasks = None # type: Any
workers = None # type: Any
events = None # type: Any
models = None # type: Any
projects = None # type: Any
This is taken from clearml/backend_api/session/client/client.py
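In case it's useful to anyone else, a rough workaround sketch is to drop down to the lower-level Session and call the users service directly. This assumes Session.send_request accepts service/action/json keyword arguments and returns a requests-style response, which is worth double-checking against your SDK version:

from clearml.backend_api import Session

session = Session()  # credentials are picked up from clearml.conf / environment
resp = session.send_request(service="users", action="get_all", json={})
if resp.ok:
    print(resp.json())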
Further to this, I have dug in a bit more. This is working as expected for ClearML 1.8.3 but not for ClearML 1.9.0. I looked at the commits and found that a change had been made to the _decode_image method:
This aligns with the error message I'm seeing:
2023-02-08 15:17:25,539 - clearml - WARNING - Error: I/O operation on closed file.
Can this be actioned for the next release please?
I have been getting the same error since yesterday on Ubuntu. It works fine on Mac.
I cannot ping api.clear.ml
👍 thanks for clearing that up @<1523701087100473344:profile|SuccessfulKoala55>
@<1523701070390366208:profile|CostlyOstrich36> Thank you. Which docker image do you use with this machine image?
This is not working. Please see None, which details the problem.
I believe this was an example report I made for a demo and I've since deleted the tasks which generated it 👍
Thanks Jake. Do you know how I set the GPU count to 0?
I cannot ping api.clear.ml on Ubuntu. Works fine on Mac though.
Given that nvidia-smi is working, you may have already done that. In that case, depending on your Ubuntu version, you may have another problem: Ubuntu 22+ has this issue, which has a workaround. This also caught me out...
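A quick way to check whether you're hitting it is to see if the GPU is visible inside a plain container at all; rough sketch below (the CUDA image tag is just an example):

import subprocess

# run nvidia-smi inside a throwaway container; on an affected Ubuntu 22.04 host this
# tends to fail even though nvidia-smi works fine on the host itself
result = subprocess.run(
    ["docker", "run", "--rm", "--gpus", "all", "nvidia/cuda:11.8.0-base-ubuntu22.04", "nvidia-smi"],
    capture_output=True, text=True,
)
print(result.stdout or result.stderr)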
It's not immediately obvious from the GCP documentation, and you don't need to do this on AWS or Azure, so it can catch you out. For what it's worth, the image I used originally was from the same family Marko has referenced above.
I run GPU instances successfully using the GCP Autoscaler. Have you included this line in the autoscaler's init script? It was a gotcha for me...
/opt/deeplearning/install-driver.sh
Solved for me as well now.
No particular reason. This was our first time trying it and it seemed the quickest way to get off the ground. When I try without it, I get a similar error when connecting, although that could be due to the instance.
Hi, we encountered this a while ago. In our case, there is an issue with running Docker containers with GPU support on Ubuntu 22.04.
See this issue for more info:
@<1537605940121964544:profile|EnthusiasticShrimp49> How do I specify that a GPU should not be attached? I thought ticking 'Run in CPU Mode' would be sufficient. Is there something else I'm missing?
Apologies for the delay.
I have obfuscated the private information with XXX. Let me know if you think any of it is relevant.
{"gcp_project_id":"XXX","gcp_zone":"XXX","subnetwork":"XXX","gcp_credentials":"{\n \"type\": \"service_account\",\n \"project_id\": \"XXX\",\n \"private_key_id\": \"XXX\",\n \"private_key\": \"XXX\",\n \"client_id\": \"XXX\",\n \"auth_uri\": \"XXX\",\n \"token_uri\": \"XXX\",\n \"auth_provider_x509_cert_url\": \"XXX\",\n \"client_x509_cert_url\": \"...