Then access port 8008 through the tunnel.
I have two laptops, one running Ubuntu 20.04 and one running macOS, both on my local network. I installed the server on the Ubuntu machine, SSH into it from the Mac to bring up the server, and then set up a tunnel using ssh -L.
But the solution in the answer doesn’t help, because when I do the reverse with -R the server can’t be brought up.
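For reference, the local forward described above (ssh -L from the Mac to the Ubuntu machine) could be assembled roughly like this. This is only a sketch: the host name `user@ubuntu-laptop` is a placeholder, and forwarding 8080 alongside 8008 is an assumption based on the ports mentioned in this thread.

```python
import shlex


def tunnel_command(user_host, ports=(8008,)):
    """Build an `ssh -L` command that forwards each local port to the
    same port on the remote machine (the one running the server)."""
    argv = ["ssh", "-N"]  # -N: forward ports only, do not run a remote command
    for port in ports:
        # local_port:host_as_seen_from_remote:remote_port
        argv += ["-L", f"{port}:localhost:{port}"]
    argv.append(user_host)
    return argv


if __name__ == "__main__":
    # "user@ubuntu-laptop" is a hypothetical host, replace with your own.
    print(shlex.join(tunnel_command("user@ubuntu-laptop", (8008, 8080))))
```

Run on the Mac, the printed command keeps the tunnel open until it is interrupted; the server then answers on the Mac's own localhost ports.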
Yes, when I put the task init into the spawn function, it runs without error, but it seems that each child process has its own experiment:
ClearML Task: created new task id=54ce0761934c42dbacb02a5c059314da
ClearML Task: created new task id=fe66f8ec29a1476c8e6176989a4c67e9
ClearML results page:
ClearML results page:
ClearML Task: overwriting (reusing) task id=de46ccdfb6c047f689db6e50e6fb8291
ClearML Task: created new task id=91f891a272364713a4c3019d0afa058e
ClearML re...
I can get the result now, but it failed with this
I found the server API reference here https://allegro.ai/clearml/docs/rst/references/clearml_api_ref , but I'm not sure how to use it. For example, for /debug.ping, should I send the request to “http://localhost:8080/debug/ping” or “http://localhost:8080/debug.ping”?
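One way to settle the question empirically is to try both URL shapes and compare the HTTP status codes. The sketch below uses only the standard library; the helper names are made up for illustration, and the base URL is an assumption (the API may be served on a different port than the web UI, for example the 8008 mentioned earlier in this thread).

```python
import urllib.error
import urllib.request


def candidate_urls(base, endpoint):
    """Return both spellings of an endpoint: 'service.action' and 'service/action'."""
    service, action = endpoint.strip("/").split(".", 1)
    root = base.rstrip("/")
    return [f"{root}/{service}.{action}", f"{root}/{service}/{action}"]


def probe(url, timeout=5.0):
    """Return the HTTP status code for a GET on url, or None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return None  # connection refused, timeout, DNS failure, ...


if __name__ == "__main__":
    # Assumed base URL; adjust host and port to your own setup.
    for url in candidate_urls("http://localhost:8080", "/debug.ping"):
        print(url, "->", probe(url))
```

Whichever spelling returns 200 is the one the server accepts; a None for both suggests the base URL or port is wrong rather than the path.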
For example, I have this experiment with 99 tasks, and I’d like to pull all the scalar data. How can I achieve this? Thank you!
Is there any documentation for this?
Thank you! Another question: this method seems to require getting the results one by one on the fly. Because I have lots of completed experiments, is there a way to pull all scalars at once? Or can I get the experiment list and pull the data?
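A possible shape for the "list the experiments, then pull the data" approach, as a sketch only: it assumes the `clearml` package is installed and configured, uses `Task.get_tasks` and `Task.get_reported_scalars` from the SDK, and assumes the scalars come back as a nested dict of the form `{title: {series: {"x": [...], "y": [...]}}}`. The project name and the helper names are placeholders.

```python
def flatten_scalars(task_id, scalars):
    """Turn {title: {series: {"x": [...], "y": [...]}}} into flat rows
    of (task_id, title, series, x, y)."""
    rows = []
    for title, series_map in scalars.items():
        for series, points in series_map.items():
            for x, y in zip(points.get("x", []), points.get("y", [])):
                rows.append((task_id, title, series, x, y))
    return rows


def pull_project_scalars(project_name):
    """Collect the reported scalars of every task in a project."""
    # Imported here so flatten_scalars can be used without clearml installed.
    from clearml import Task

    rows = []
    for task in Task.get_tasks(project_name=project_name):
        rows.extend(flatten_scalars(task.id, task.get_reported_scalars()))
    return rows


if __name__ == "__main__":
    # "my_project" is a hypothetical project name.
    for row in pull_project_scalars("my_project"):
        print(row)
```

The flat rows can then be written to CSV or loaded into a dataframe, so the completed experiments are fetched in one pass instead of one by one during training.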
Unfortunately, no. When I click the link, there is nothing there: https://demoapp.demo.clear.ml/projects/0d49bffcdaa441c2aa3224054737d0bd/experiments/26dd46ec11fd4f95ba522955820a8444/output/log
Can you tell me how I can find out where the scalar log is?
Yes, I think trains might wrap the torch.load function, but the thing is that I need to load some parts of the dataset using torch.load, so this error shows up many times during training. I found I can use this line: task = Task.init(project_name="Alfred", task_name="trains_plot", auto_connect_frameworks={'pytorch': False})
But does that mean I can no longer monitor the torch.load function?
Yeah, I’m done with the test; now I can run it as you said.
I’ve been adding multi-node support to my code, and I found that our lab seems to share only user files across nodes, because I installed trains on one node but it doesn’t appear on the others.
I’ve never done this before, let me do a quick search.
I’ll get back to you after I get this done.
Not for now; I think it can only run on multiple GPUs on one node.
We all use conda, so I guess there’s no need for Docker.
I see. So now we are trying to let the agent bring up the experiments separately and see if they can communicate with each other, right?