But the solution in the answer doesn’t help, because when I do the reverse with -R the server can’t be brought up.
I tried to run trains-compose without -d to see the log:
trains-agent-services | trains_agent: ERROR: Connection Error: it seems api_server is misconfigured. Is this the TRAINS API server http://apiserver:8008 ?
trains-agent-services | http://192.5.53.86:8081 http://192.5.53.86:8080 http://apiserver:8008
I didn’t assign anything to TRAINS_HOST_IP; not sure if apiserver:8008 caused the problem.
Yes, I think trains might wrap the torch.load function, but the thing is that I need to load some parts of the dataset using torch.load, so this error shows up many times during training. I found I can use this line:
task = Task.init(project_name="Alfred", task_name="trains_plot", auto_connect_frameworks={'pytorch': False})
But does that mean I can no longer monitor the torch.load function?
I’m trying to install it on my lab server, but the same problem happens: when I try to create credentials it reports an error, but this time it gives more info:
Error 301 : Invalid user id: id=f46262bde88b4928997351a657901d8b, company=d1bd92a3b039400cbafc60a7a5b1e52b
I have two laptops, one running Ubuntu 20.04 and one running macOS, both on my local network. I installed the server on the Ubuntu machine, SSH into it from the Mac to bring up the server, then set up a tunnel using ssh -L.
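For anyone trying the same setup, here is a minimal sketch of how the ssh -L call could be put together. It assumes the three ports that show up in the log above (8080, 8081 and 8008) are the ones that need forwarding, and that each is forwarded to the same port number on the server machine. The helper name build_tunnel_command and the host user@ubuntu-laptop are made up for illustration.

```python
import shlex

# Ports taken from the log above (assumption: these are the ones to forward).
SERVER_PORTS = (8080, 8008, 8081)


def build_tunnel_command(remote, ports=SERVER_PORTS):
    """Return the argv for an ssh call that forwards each local port
    to the same port on the remote machine."""
    argv = ["ssh", "-N"]  # -N: only forward ports, do not run a remote command
    for port in ports:
        # local_port:host_as_seen_from_remote:remote_port
        argv += ["-L", f"{port}:localhost:{port}"]
    argv.append(remote)
    return argv


if __name__ == "__main__":
    # "user@ubuntu-laptop" is a placeholder, not a real host.
    print(shlex.join(build_tunnel_command("user@ubuntu-laptop")))
```

The printed command can be pasted into a terminal on the Mac, or the list can be handed to subprocess.run directly.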
Yes, when I put the task init into the spawn function it runs without error, but it seems that each child process has its own experiment:
ClearML Task: created new task id=54ce0761934c42dbacb02a5c059314da
ClearML Task: created new task id=fe66f8ec29a1476c8e6176989a4c67e9
ClearML results page:
ClearML results page:
ClearML Task: overwriting (reusing) task id=de46ccdfb6c047f689db6e50e6fb8291
ClearML Task: created new task id=91f891a272364713a4c3019d0afa058e
ClearML re...
But it seems buggy
I tried from clearml.backend_api.session import client, no luck
Guess my best chance is to check out the agent source code, right?
I can comment on the GitHub issue
Yeah, I’m done with the test, now I can run it as you said
I’ll try it tomorrow and let you know if there is anything wrong
It’s shared, but only user files: everything under the ~/ directory
Apiclient will report
We all use conda, so I guess there’s no need for docker
Then access the 8008 through the tunnel
I’ll finish my breakfast real quick and get on in a couple of minutes
Guess I’ll need to implement job scheduling myself
This is so awesome
Sure I'm right here with you
Or can I enable agent in this kind of local mode?
Yeah, the ultimate goal I'm trying to achieve is to run tasks flexibly. For example, before running, a task could carry a claim saying how many resources it needs, and the agent would run it as soon as it finds there are enough resources
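To make the idea concrete, here is a rough sketch of that kind of resource-claim scheduling. It is not how the agent works and uses nothing from ClearML: Job, ResourceScheduler and the single "gpus" resource are all made up for illustration. Jobs are started in strict FIFO order, so a large job at the head of the queue blocks smaller ones behind it.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    gpus: int  # resources the job claims before it is allowed to start


class ResourceScheduler:
    """Start queued jobs in FIFO order whenever enough GPUs are free."""

    def __init__(self, total_gpus):
        self.free_gpus = total_gpus
        self.pending = deque()
        self.running = {}

    def submit(self, job):
        """Queue a job and return the names of jobs started as a result."""
        self.pending.append(job)
        return self._dispatch()

    def finish(self, name):
        """Release a finished job's GPUs and return the names of newly started jobs."""
        job = self.running.pop(name)
        self.free_gpus += job.gpus
        return self._dispatch()

    def _dispatch(self):
        started = []
        # Strict FIFO: stop at the first queued job that does not fit.
        while self.pending and self.pending[0].gpus <= self.free_gpus:
            job = self.pending.popleft()
            self.free_gpus -= job.gpus
            self.running[job.name] = job
            started.append(job.name)
        return started
```

With two GPUs in total, submitting a 2-GPU job starts it right away, two 1-GPU jobs submitted afterwards wait in the queue, and both start once the first job is marked finished.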
So is there any tutorial on this topic?
Works fine, awesome!
What can I do to help extend it?