I have managed to connect. Our EC2 instances run in a private subnet, so I believe that is why the SSH connection was not working. Once I connected to my VPN, it worked.
@<1523701070390366208:profile|CostlyOstrich36> Thank you. Which docker image do you use with this machine image?
Following up on this, I have investigated further. This works as expected with ClearML 1.8.3 but not with ClearML 1.9.0.
I looked at the commits and found that a change had been made to the `_decode_image` method:
This aligns with the error message I'm seeing:
2023-02-08 15:17:25,539 - clearml - WARNING - Error: I/O operation on closed file.
Can this be actioned for the next release plea...
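As a generic illustration of how this class of error arises (this is not ClearML's `_decode_image` code, which is not reproduced here), the sketch below reads from an in-memory buffer after a `with` block has already closed it. The function name and the fake bytes are made up for the example.

```python
import io


def read_after_close() -> str:
    """Return the error message raised when a closed buffer is read."""
    buffer = io.BytesIO(b"fake image bytes")  # stand-in for image data
    with buffer:
        pass  # leaving the with-block closes the buffer
    try:
        buffer.read()  # any read after close raises ValueError
    except ValueError as exc:
        return str(exc)
    return "no error"


message = read_after_close()
print(message)
```

Python raises a `ValueError` that mentions the closed file here, which matches the wording of the warning above. If the changed method closes a stream before another part of the code reads it, the symptom would look like this.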
👍 Thanks for getting back to me.
Another issue I found was that I could only use VPC subnets from the Google Cloud project I am launching the VMs in.
I cannot use shared VPC subnets from another project. This would be a useful feature to implement, as GCP recommends segmenting the cloud estate so that the VPC and the VMs are in different projects.
I am having the same error since yesterday on Ubuntu. Works fine on Mac.
I cannot ping api.clear.ml
Apologies for the delay.
I have obfuscated the private information with `XXX`. Let me know if you think any of it is relevant.
{"gcp_project_id":"XXX","gcp_zone":"XXX","subnetwork":"XXX","gcp_credentials":"{\n \"type\": \"service_account\",\n \"project_id\": \"XXX\",\n \"private_key_id\": \"XXX\",\n \"private_key\": \"XXX\",\n \"client_id\": \"XXX\",\n \"auth_uri\": \"XXX\",\n \"token_uri\": \"XXX\",\n \"auth_provider_x509_cert_url\": \"XXX\",\n \"client_x509_cert_url\": \"...
Hi,
I've managed to fix it.
Basically, I had a tracker running on our queues to ensure that none of them were lagging. This was using `get_next_task` from `APIClient().queues`.
If you call `get_next_task`, it removes the task from the queue but does not put it into another state. I think this is because `get_next_task` is typically followed immediately by something that makes the task run in the daemon or deletes it.
Hence you end up in this weird state where the task thinks it's queued bec...
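To make the failure mode concrete, here is a toy in-memory model of it. It does not use the ClearML API: `ToyTask`, `ToyQueue` and `peek` are hypothetical names invented for this sketch, and only the behaviour of `get_next_task` (remove the entry, leave the status alone) mirrors what is described above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToyTask:
    task_id: str
    status: str = "queued"  # what the task itself believes


@dataclass
class ToyQueue:
    entries: List[ToyTask] = field(default_factory=list)

    def get_next_task(self) -> Optional[ToyTask]:
        """Destructive: removes the first entry but leaves its status untouched."""
        return self.entries.pop(0) if self.entries else None

    def peek(self) -> Optional[ToyTask]:
        """Non-destructive: looks at the first entry without removing it."""
        return self.entries[0] if self.entries else None


task = ToyTask("task-1")
queue = ToyQueue([task])

# A monitor that only peeks leaves the queue intact.
peeked = queue.peek()
still_queued_after_peek = task in queue.entries

# A monitor that dequeues orphans the task: it still says "queued"
# but is no longer in any queue, so nothing will ever pick it up.
popped = queue.get_next_task()
orphaned = popped is not None and popped.status == "queued" and popped not in queue.entries
print(still_queued_after_peek, orphaned)
```

The point for a monitoring job is to use a read-only call to inspect the queue rather than one that dequeues.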
I’ve had some issues with clearml sessions. I’d be interested in seeing a PR. Would you mind posting a link please?
👍 thanks for clearing that up @<1523701087100473344:profile|SuccessfulKoala55>
Yep that's correct. If I have a task which runs every 5 minutes, I don't want a new task every 5 minutes as that will create a lot of tasks over a day. It would be better if I had just one task.