I have been getting the same error since yesterday on Ubuntu. It works fine on Mac.
I cannot ping api.clear.ml
No particular reason. This was our first time trying it and it seemed the quickest way to get off the ground. When I try without it, I get a similar error when connecting, although that could be due to the instance.
I did not touch the interactive session code at all.
I installed clearml-session using pip and ran the above command with a task ID from a task I'd already run.
I ran again without the debug mode option and got this error:
>
> Starting Task Execution:
>
>
> Traceback (most recent call last):
> File "/root/.clearml/venvs-builds/3.6/code/interactive_session.py", line 377, in <module>
> from tcp_proxy import TcpProxy
> ModuleNotFoundError: No module named 'tcp_proxy'
>
> Process failed, exit code 1
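One way to narrow down a ModuleNotFoundError like the one above is to check, from inside the same environment that runs the script, whether Python can locate the module at all. This is only a minimal sketch using the standard library; the helper name is_importable is made up for illustration, and it does not explain why tcp_proxy is missing.

```python
import importlib.util


def is_importable(module_name):
    """Return True if Python can locate a top-level module in this environment."""
    # find_spec returns None when no finder on sys.path knows the module.
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "tcp_proxy" is the module named in the traceback above.
    for name in ("json", "tcp_proxy"):
        print(name, "importable:", is_importable(name))
```

Running this with the interpreter from the failing virtual environment shows whether the module is visible there, which may differ from a local machine.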
I have managed to connect. Our EC2 instances run in a private subnet, which I believe is why the SSH connection was failing. Once I connected to my VPN, it worked.
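For anyone hitting the same thing, a quick reachability check can help tell a network problem apart from a ClearML problem. This is a rough sketch using only the standard library; the function name can_reach, the host and the port are placeholders, not values from my setup.

```python
import socket


def can_reach(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection raises OSError (including timeouts) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Placeholder address: replace with the instance's private IP and SSH port.
    print(can_reach("10.0.0.1", 22))
```

If this returns False without the VPN and True with it, the instance is simply not routable from outside the private subnet.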
I believe this was an example report I made for a demo and I've since deleted the tasks which generated it 👍
Nope. But there are steps you can take to prevent this through publishing tasks and reports I believe.
👍 thanks for clearing that up @<1523701087100473344:profile|SuccessfulKoala55>
The code is quite nested, but I've tried to extract the important parts (summary_writer is a tensorboard logger).
self.figure, (ax1, ax2, axc) = plt.subplots(1, 3, figsize=(total_width, total_height), facecolor="white")
self.summary_writer = self.tb_logger.experiment
self.summary_writer.add_figure(Partition.TRAINING.value, train_plot.figure, global_step=self.current_epoch + 1)
The train_plot.figure is a matplotlib figure created using seaborn.
Let me know if this...
Further to this, I have investigated in more detail. This works as expected with ClearML 1.8.3 but not with ClearML 1.9.0.
I looked at the commits and found that a change had been made to the _decode_image method:
This aligns with the error message I'm seeing:
2023-02-08 15:17:25,539 - clearml - WARNING - Error: I/O operation on closed file.
Can this be actioned for the next release plea...
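To show what this class of error means in general, here is a tiny standalone sketch. It is not the ClearML _decode_image code, just plain Python reproducing the same message by reading from an in-memory buffer after it has been closed; the byte string is a stand-in for image data.

```python
import io

# Stand-in for an encoded image held in memory.
buffer = io.BytesIO(b"fake image bytes")

# Reading while the buffer is open works as usual.
data_before_close = buffer.read()

# Once the buffer is closed, any further read raises ValueError.
buffer.close()
try:
    buffer.read()
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

So the warning suggests that something is trying to read a file-like object after it has already been closed.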
I’ve had some issues with clearml sessions. I’d be interested in seeing a PR. Would you mind posting a link please?
Is there a way I can do this with the python APIClient or even with the requests library?
Yep that's correct. If I have a task which runs every 5 minutes, I don't want a new task every 5 minutes as that will create a lot of tasks over a day. It would be better if I had just one task.
This is not working. Please see None which details the problem
$ curl -H "Authorization: Bearer <TOKEN>" -X GET
{"meta":{"id":"ed6c52d030f240a89f001b447ee64a6b","trx":"ed6c52d030f240a89f001b447ee64a6b","endpoint":{"name":"debug.ping","requested_version":"2.26","actual_version":"1.0"},"result_code":200,"result_subcode":0,"result_msg":"OK","error_stack":null,"error_data":{},"alarms":{}},"data":{"msg":"Hello World"}}%
$ curl -H "Authoriz...
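For reference, the same kind of call can be put together in Python. This is only a sketch with the standard library's urllib rather than requests; the helper name build_api_request and the base URL are placeholders I made up, and only debug.ping is taken from the output above. It builds the request without sending it.

```python
import json
import urllib.request


def build_api_request(base_url, endpoint, token, payload=None):
    """Build (but do not send) a bearer-authenticated request to an API endpoint."""
    url = f"{base_url.rstrip('/')}/{endpoint}"
    headers = {"Authorization": f"Bearer {token}"}
    data = None
    if payload is not None:
        # Send a JSON body with POST when a payload is given.
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    method = "POST" if data is not None else "GET"
    return urllib.request.Request(url, data=data, headers=headers, method=method)


if __name__ == "__main__":
    # Placeholder base URL and token: substitute your own API server and token.
    req = build_api_request("https://api.example.invalid", "debug.ping", "<TOKEN>")
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req)
```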
Is there documentation for this? Unfortunately, I was not able to figure it out.
Furthermore, when using APIClient(), users is not a valid endpoint at all.
class APIClient(object):
    auth = None  # type: Any
    queues = None  # type: Any
    tasks = None  # type: Any
    workers = None  # type: Any
    events = None  # type: Any
    models = None  # type: Any
    projects = None  # type: Any
This is taken from clearml/backend_api/session/client/client.py
According to the documentation, users.user should be a valid endpoint?
Ah, didn’t know that. Yes in that case that would work 👍
Hi,
I've managed to fix it.
Basically, I had a tracker running on our queues to ensure that none of them were lagging. This was using get_next_task from APIClient().queues.
If you call get_next_task, it removes the task from the queue but does not put it into another state. I think this is because get_next_task is typically followed immediately by something that either runs the task in the daemon or deletes it.
Hence you end up in this weird state where the task thinks it's queued bec...
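To make the problem concrete, here is a toy model of what I described. It is not ClearML code; ToyQueue, pop_next and peek_next are hypothetical names, and the point is only that a monitoring job should read the queue without removing anything from it.

```python
from collections import deque


class ToyQueue:
    """Toy model: queue entries and task statuses are tracked separately."""

    def __init__(self):
        self.entries = deque()
        self.status = {}

    def enqueue(self, task_id):
        self.entries.append(task_id)
        self.status[task_id] = "queued"

    def pop_next(self):
        # Mirrors the behaviour described above: the entry is removed,
        # but the task's status is left untouched.
        return self.entries.popleft() if self.entries else None

    def peek_next(self):
        # Non-destructive read, which is what a monitoring tracker needs.
        return self.entries[0] if self.entries else None


if __name__ == "__main__":
    q = ToyQueue()
    q.enqueue("task-a")
    q.pop_next()
    # The task still reports "queued" although no queue holds it any more.
    print(q.status["task-a"], list(q.entries))
```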
I don't think there's really a way around this because AWS Lambda doesn't allow for multiprocessing.
Instead, I've resorted to using a clearml Scheduler which runs on a t3.micro instance for jobs which I want to run on a cron.