I ran again without the debug mode option and got this error:
>
> Starting Task Execution:
>
> Traceback (most recent call last):
>   File "/root/.clearml/venvs-builds/3.6/code/interactive_session.py", line 377, in <module>
>     from tcp_proxy import TcpProxy
> ModuleNotFoundError: No module named 'tcp_proxy'
>
> Process failed, exit code 1
$ curl -H "Authorization: Bearer <TOKEN>" -X GET
{"meta":{"id":"ed6c52d030f240a89f001b447ee64a6b","trx":"ed6c52d030f240a89f001b447ee64a6b","endpoint":{"name":"debug.ping","requested_version":"2.26","actual_version":"1.0"},"result_code":200,"result_subcode":0,"result_msg":"OK","error_stack":null,"error_data":{},"alarms":{}},"data":{"msg":"Hello World"}}%
$ curl -H "Authoriz...
I believe this was an example report I made for a demo and I've since deleted the tasks which generated it 👍
I am having the same error since yesterday on Ubuntu. Works fine on Mac.
I cannot ping api.clear.ml
I don't think there's really a way around this because AWS Lambda doesn't allow for multiprocessing.
Instead, I've resorted to using a clearml Scheduler running on a t3.micro instance for jobs that I want to run on a cron schedule.
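For context, a minimal sketch of what that scheduler looks like; the task ID, queue name, and cron times below are placeholders, not the ones I actually use:

```python
from clearml.automation import TaskScheduler

# Scheduler process that runs on a small always-on instance (e.g. a t3.micro)
# instead of AWS Lambda.
scheduler = TaskScheduler()

# Re-launch an existing template task every day at 06:00 on a given queue.
# "TEMPLATE_TASK_ID" and "default" are placeholders.
scheduler.add_task(
    schedule_task_id="TEMPLATE_TASK_ID",
    queue="default",
    hour=6,
    minute=0,
)

# Blocks and keeps scheduling; start_remotely() could instead push the
# scheduler itself onto an agent queue.
scheduler.start()
```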
This is not working. Please see None which details the problem
Nope. But there are steps you can take to prevent this through publishing tasks and reports, I believe.
I have managed to connect. Our EC2 instances run in a private subnet, so I believe that's why the SSH connection wasn't working. Once I connected to my VPN, it worked.
If a Task is in the 'Completed' state, I think the only option is to 'Reset' it (see image). You do clear the previous run's execution, but I think for a repetitive task this is fine.
Maybe this should only be the case if it is in a 'Completed' state rather than 'Failed'. I can see that in the 'Failed' case you would not want to clear the execution, because you would want to see why it failed. Thoughts?
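For what it's worth, a rough sketch of doing the same reset from the SDK rather than the UI; the task ID and queue name are placeholders, and depending on the task's state `force=True` may or may not be needed:

```python
from clearml import Task

# Placeholder ID of the completed task to re-run.
task = Task.get_task(task_id="COMPLETED_TASK_ID")

# Clears the previous run's execution state so the task can be enqueued again.
# Depending on the task's state, force=True may be required.
task.reset()

# Optionally push it straight back onto an execution queue ("default" is a placeholder).
Task.enqueue(task, queue_name="default")
```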
According to the documentation, `users.user` should be a valid endpoint?
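A quick way I'd check it, mirroring the curl call above; the host and token are placeholders, and the assumption that it accepts a plain GET is a guess on my part:

```python
import requests

API_HOST = "https://api.clear.ml"  # placeholder API host
TOKEN = "<TOKEN>"                  # placeholder bearer token

# Hit the endpoint the docs appear to list and inspect the raw response.
resp = requests.get(
    f"{API_HOST}/users.user",
    headers={"Authorization": f"Bearer {TOKEN}"},
)
print(resp.status_code)
print(resp.text)
```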
Ah, didn’t know that. Yes in that case that would work 👍
@<1523701087100473344:profile|SuccessfulKoala55> Just following up as I figured out what was happening here and could be useful for the future.
The prefilled value for `Number of GPUs` in the GCP Autoscaler is `1`. When one ticks `Run in CPU mode (no gpus)`, it hides the `GPU Type` and `Number of GPUs` fields. However, the values that were in these fields are still submitted in the API request (I'm guessing here) when the Autoscaler is launched.
Hence, to get past this, you need to...
I’ve had some issues with clearml sessions. I’d be interested in seeing a PR. Would you mind posting a link please?
Solved for me as well now.
@<1523701087100473344:profile|SuccessfulKoala55> Thanks for getting back to me. My image contains `clearml-agent==1.9.1`. There was a recent release, `1.9.2`, and now on every run the agent installs this newer version thanks to the `-U` flag which is being passed. From the docs it looks like there may be a way to prevent this upgrade, but it's not clear to me exactly how to do it. Is it possible?