
No particular reason. This was our first time trying it, and it seemed the quickest way to get off the ground. When I try without it, I get a similar error trying to connect, although that could be due to the instance.
Further to this, I have investigated some more. This works as expected with ClearML 1.8.3 but not with ClearML 1.9.0.
I looked at the commits and found that a change had been made to the _decode_image method:
This aligns with the error message I'm seeing:
2023-02-08 15:17:25,539 - clearml - WARNING - Error: I/O operation on closed file.
Can this be actioned for the next release plea...
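For readers unfamiliar with this warning, here is a minimal sketch of the general failure mode behind "I/O operation on closed file". It is not ClearML's actual code and makes no claim about what _decode_image does; the function name and the fake bytes are made up for illustration. It only shows that reading from a buffer after it has been closed raises this error in Python.

```python
import io


def read_after_close() -> str:
    """Return the error message raised when reading an already-closed buffer."""
    with io.BytesIO(b"fake image bytes") as buffer:
        pass  # leaving the with-block closes the buffer

    try:
        buffer.read()  # the buffer is closed at this point
    except ValueError as exc:
        return str(exc)
    return "no error"


if __name__ == "__main__":
    print(read_after_close())
```

If a library closes an image buffer before another component reads it, a message of this kind is what one would expect to see.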
Solved for me as well now.
The code is quite nested but I've tried to extract the important parts (summary_writer is a TensorBoard logger).
self.figure, (ax1, ax2, axc) = plt.subplots(1, 3, figsize=(total_width, total_height), facecolor="white")
self.summary_writer = self.tb_logger.experiment
self.summary_writer.add_figure(Partition.TRAINING.value, train_plot.figure, global_step=self.current_epoch + 1)
The train_plot.figure is a matplotlib figure created using seaborn.
Let me know if this...
Ah, didn’t know that. Yes in that case that would work 👍
I ran again without the debug mode option and got this error:
>
> Starting Task Execution:
>
>
> Traceback (most recent call last):
> File "/root/.clearml/venvs-builds/3.6/code/interactive_session.py", line 377, in <module>
> from tcp_proxy import TcpProxy
> ModuleNotFoundError: No module named 'tcp_proxy'
>
> Process failed, exit code 1
If a Task is in the 'Completed' state, I think the only option is to 'Reset' it (see image). You do clear the previous run's execution, but I think for a repetitive task this is fine.
Maybe this should only be the case if it is in a 'Completed' state rather than 'Failed'. I can see that in this case you would not want to clear the execution because you would want to see why it Failed. Thoughts?
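The rule suggested above could be sketched as a small helper. This is only an illustration of the proposed policy, not existing ClearML behaviour; the function name and the lowercase status strings ("completed", "failed") are assumptions.

```python
def should_reset(status: str) -> bool:
    """Proposed policy: reuse (reset) a task only if its last run completed.

    A failed task is left alone so its execution details stay available
    for debugging. Status strings here are assumed, not taken from ClearML.
    """
    return status.strip().lower() == "completed"


if __name__ == "__main__":
    for status in ("completed", "failed"):
        print(status, "->", "reset" if should_reset(status) else "keep")
```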
I am using ClearML version 1.9.1. In code, I am creating a plot using matplotlib. I am able to see this in TensorBoard, but it is not available in ClearML Plots.
Here it is:
I have managed to connect. Our EC2 instances run in a private subnet, which I believe is why the SSH connection was not working. Once I connected to my VPN, it worked.
$ curl -H "Authorization: Bearer <TOKEN>" -X GET
{"meta":{"id":"ed6c52d030f240a89f001b447ee64a6b","trx":"ed6c52d030f240a89f001b447ee64a6b","endpoint":{"name":"debug.ping","requested_version":"2.26","actual_version":"1.0"},"result_code":200,"result_subcode":0,"result_msg":"OK","error_stack":null,"error_data":{},"alarms":{}},"data":{"msg":"Hello World"}}%
$ curl -H "Authoriz...
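The same check can be scripted. The sketch below only parses a response body shaped like the one shown above and reports whether the ping succeeded; the function name is made up, and the sample string is a shortened copy of that output rather than a new server response.

```python
import json


def ping_succeeded(body: str) -> bool:
    """Return True if a debug.ping style response reports result_code 200 / OK."""
    payload = json.loads(body)
    meta = payload.get("meta", {})
    return meta.get("result_code") == 200 and meta.get("result_msg") == "OK"


# Shortened copy of the response pasted above.
sample = (
    '{"meta":{"endpoint":{"name":"debug.ping"},'
    '"result_code":200,"result_msg":"OK"},'
    '"data":{"msg":"Hello World"}}'
)

if __name__ == "__main__":
    print(ping_succeeded(sample))
```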
Is there documentation for this? Unfortunately, I was not able to figure it out.
Yep that's correct. If I have a task which runs every 5 minutes, I don't want a new task every 5 minutes as that will create a lot of tasks over a day. It would be better if I had just one task.
This is not working. Please see None which details the problem
Apologies for the delay.
I have obfuscated the private information with XXX. Let me know if you think any of it is relevant.
{"gcp_project_id":"XXX","gcp_zone":"XXX","subnetwork":"XXX","gcp_credentials":"{\n \"type\": \"service_account\",\n \"project_id\": \"XXX\",\n \"private_key_id\": \"XXX\",\n \"private_key\": \"XXX\",\n \"client_id\": \"XXX\",\n \"auth_uri\": \"XXX\",\n \"token_uri\": \"XXX\",\n \"auth_provider_x509_cert_url\": \"XXX\",\n \"client_x509_cert_url\": \"...
@EnthusiasticShrimp49 How do I specify to not attach a GPU? I thought ticking 'Run in CPU Mode' would be sufficient. Is there something else I'm missing?
Thanks Jake. Do you know how I set the GPU count to 0?
This is something you can do in the GCP console, so one would imagine it can also be done using their Python library.
I think the limitation is that you can only pass a relative subnet path in the GCP Autoscaler console. Then, by the looks of the error message, the ClearML Autoscaler constructs the full path under the hood: /project/<project_id>/subnet/<subnet_id>.
I'd like the option to specify the full path myself in the Autoscaler which would then allow me to use a shared subnet.
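To make the request concrete, here is a rough sketch of the behaviour I have in mind. It is not the Autoscaler's actual code: resolve_subnet is a hypothetical helper, and the path shape is copied from my description of the error message above, so real GCP resource paths may look different.

```python
def resolve_subnet(project_id: str, subnet: str) -> str:
    """Return a full subnet path, leaving an already-full path untouched.

    Hypothetical helper: the "/project/<id>/subnet/<id>" shape is an
    assumption taken from the error message, not from ClearML or GCP docs.
    """
    if "/" in subnet.strip("/"):
        # Looks like a full path already, e.g. a subnet shared from another project.
        return subnet
    # Relative name: prefix it with the autoscaler's own project.
    return f"/project/{project_id}/subnet/{subnet.strip('/')}"


if __name__ == "__main__":
    print(resolve_subnet("my-project", "my-subnet"))
    print(resolve_subnet("my-project", "/project/host-project/subnet/shared-subnet"))
```

With something like this, a relative name keeps working as it does today, while a full path pointing at a shared subnet would be passed through unchanged.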
I did not touch the interactive session code at all.
I installed clearml-session using pip and ran the above command with a task id from a task I'd already run.