This one should work:
` path = task.connect_configuration(path, name=name)
if task.running_locally():
    my_params = read_from_path(path)
    my_params = change_params(my_params)  # change some stuff
    # store back the change; my_params is assumed to be the content of the param file (text)
    task.set_configuration_object(name=name, config_text=my_params) `
I keep getting a "failed getting token" error
MiniatureCrocodile39 which server are you using?
Hi CleanPigeon16
Yes there is. When you clone the pipeline in the UI, go to Configuration/Pipeline/continue_pipeline and change it to True
Hmm, I see the jump from 50 to 100. Is that consistent with the last iteration on the aborted Task (before continuing)?
Hi SourOx12
How do you set the iteration when you continue the experiment? Is it with Task.init continue_last_task?
SourOx12
Hmmm. So if the last iteration was 75, the next iteration (after we continue) will be 150?
Optional[Sequence[Union[str, Dataset]]]
None, a list of strings, or a list of Dataset objects (each one is a parent; multiple parents are supported)
This is not very clear from the documentation
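As a rough illustration of the accepted types quoted above, here is a minimal sketch of passing parents when creating a dataset. The wrapper name `create_child_dataset` and its arguments are hypothetical; the only library call used is `Dataset.create` with its `parent_datasets` argument, and actually calling the function assumes clearml is installed and a server is configured.

```python
from typing import Any, Optional, Sequence, Union


def create_child_dataset(
    name: str,
    project: str,
    parents: Optional[Sequence[Union[str, Any]]] = None,
):
    """Create a dataset version that inherits from zero or more parents.

    `parents` may be None, a list of dataset ID strings, or a list of
    Dataset objects; each entry is treated as one parent.
    This wrapper is a hypothetical helper, not part of clearml.
    """
    # Imported lazily so the sketch can be read/loaded without clearml installed.
    from clearml import Dataset

    return Dataset.create(
        dataset_name=name,
        dataset_project=project,
        parent_datasets=parents,
    )
```

Usage would look like `create_child_dataset("merged", "examples", parents=[id_a, id_b])`, where `id_a` and `id_b` are placeholder dataset IDs.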
ElegantCoyote26 which documentation are you referring to?
So the thing is, clearml automatically detects the last iteration of the previous run. My assumption is that you also add it, hence the double shift.
SourOx12 could that be it?
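To make the suspected "double shift" concrete, here is a small arithmetic sketch. It is a simplified model, not clearml code: the function `reported_iteration` and its parameters are hypothetical, and it only assumes that the reported iteration is the local step plus whatever offsets get applied.

```python
def reported_iteration(local_step: int, auto_offset: int, manual_offset: int = 0) -> int:
    """Iteration number that would end up on the plot in this simplified model."""
    return local_step + auto_offset + manual_offset


last_iteration = 75  # last iteration of the aborted run, as in the thread

# Offset applied once (detected automatically on continue).
once = reported_iteration(0, auto_offset=last_iteration)

# Offset applied twice: automatically, and again by the user's own code.
twice = reported_iteration(0, auto_offset=last_iteration, manual_offset=last_iteration)

print(once, twice)  # 75 150
```

If the user's script already adds the previous iteration count itself, dropping that manual offset should be enough to remove the jump under this assumption.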
Do you think this is better? (The API documentation comes directly from the Python docstring, so the code will always have the latest documentation.)
https://github.com/allegroai/clearml/blob/c58e8a4c6a1294f8acec6ed9cba81c3b91aa2abd/clearml/datasets/dataset.py#L633
SourOx12
Run this example:
https://github.com/allegroai/clearml/blob/master/examples/reporting/scalar_reporting.py
Once, then change line #26 to: task = Task.init(project_name="examples", task_name="scalar reporting", continue_last_task=True)
and run again.
I'm not sure I follow the example... Are you sure this experiment continued a previous run?
What was the last iteration on the previous run?
Hi GreasyPenguin14
Yes, I think you are right, the series name should be next to the title. Let me check it...
GreasyPenguin14 let me check with the guys when the next version is due.
Are you using the self-hosted server or the community server?
ETA for the next release is end of the month/early March, it is planned to include many other improvements 🙂
We have tried to manually restart tasks, reloading all the scalars from a dead task and loading the latest saved torch model.
Hi ThickKitten19
How did you try to restart them? How are you monitoring dying instances? Where and how are they running?
DefeatedCrab47 if TB has it as an image, you should find it under "debug_samples" as an image.
Can you locate it there?
is removed from the experiment list?
You mean archived?
shows that the trains-agent is stuck running the first experiment, not
the trains_agent execute --full-monitoring --id a445e40b53c5417da1a6489aad616fee
is the second trains-agent instance running inside the docker. If the task is aborted, this process should have quit...
Any suggestions on how I can reproduce it?
ElegantKangaroo44 it seems to work here?!
https://demoapp.trains.allegro.ai/projects/0e152d03acf94ae4bb1f3787e293a9f5/experiments/48907bb6e870479f8b230e6b564cd52e/output/metrics/plots
ElegantKangaroo44 definitely a bug, will be fixed in 0.15.1 (release in a week or so)
https://github.com/allegroai/trains/issues/140
Feel free to add to the UI request list:
https://github.com/allegroai/trains/issues/81
ElegantKangaroo44 I think TrainsCheckpoint would probably be the easiest solution. I mean it will not be a must, but another option to deepen the integration and allow us more flexibility.
It all depends on how we store the meta-data on the performance. You could actually retrieve it from, say, the val metric and deduce the epoch based on that.
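The idea of deducing the epoch from the validation metric can be sketched as below. This is plain Python under stated assumptions: the per-epoch values are already available as a list (how they are fetched from the server is left out), and the function name `best_epoch` and the sample numbers are made up for illustration.

```python
from typing import Sequence, Tuple


def best_epoch(val_metric: Sequence[float], higher_is_better: bool = True) -> Tuple[int, float]:
    """Return (epoch_index, value) of the best entry in a per-epoch metric series."""
    if not val_metric:
        raise ValueError("val_metric is empty")
    pick = max if higher_is_better else min
    # Choose the index whose metric value is best; ties resolve to the earliest epoch.
    epoch = pick(range(len(val_metric)), key=lambda i: val_metric[i])
    return epoch, val_metric[epoch]


# Made-up validation accuracy, one value per epoch (epoch 0 first).
history = [0.61, 0.72, 0.70, 0.75, 0.74]
epoch, value = best_epoch(history)
print(epoch, value)  # 3 0.75
```

For a loss-style metric, pass `higher_is_better=False` so the lowest value wins.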
"Updates a few seconds ago"
That just means that the process is not dead.
Yes that seemed to be stuck 😞
Any chance you can verify with the RC version?
I'll try to dig into the commits, maybe I can come up with an explanation ...
Hi @<1664079296102141952:profile|DangerousStarfish38>
You mean spin the agent on multiple Windows machines? Yes, that is supported. I think it is limited to venv (i.e. not docker) mode, but other than that it should work out of the box.
Thanks for answering. Yes, this is exactly what I wanted.
Hmm, should be possible. How slow is the update, that we want to save the time?