The rest of the configuration is set with environment variables.
No, one worker (trains-agent-1) "forgets" from time to time the experiment it is currently running and picks up another experiment on top of it.
Oh, I found it:
user@trains-agent-1: ps -ax
5199 ? Sl 29:25 python3 -m trains_agent --config-file ~/trains.conf daemon --queue default --log-level DEBUG --detached
6096 ? Sl 30:04 python3 -m trains_agent --config-file ~/trains.conf daemon --queue default --log-level DEBUG --detached
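The `ps -ax` output shows two `trains_agent daemon` processes (PIDs 5199 and 6096) on the same machine, both started with the same config file and listening on the same `default` queue, which plausibly explains why the worker seems to pick up a second experiment. A minimal sketch for spotting such duplicates, assuming the `ps -ax` column layout shown above (PID, TTY, STAT, TIME, COMMAND); `find_duplicate_daemons` is a hypothetical helper, not part of trains-agent, and it only captures the first queue name after `--queue`.

```python
import re
from collections import defaultdict


def find_duplicate_daemons(ps_lines):
    """Group trains_agent daemon processes by (config file, queue) and
    return only the groups that contain more than one PID.

    Hypothetical helper: assumes each line looks like
    `PID TTY STAT TIME COMMAND...`, as in the `ps -ax` output above.
    """
    groups = defaultdict(list)
    for line in ps_lines:
        fields = line.split(None, 4)  # PID, TTY, STAT, TIME, COMMAND
        if len(fields) < 5:
            continue
        pid, command = fields[0], fields[4]
        # Skip the header line and any unrelated processes.
        if "trains_agent" not in command or " daemon" not in command:
            continue
        config = re.search(r"--config-file\s+(\S+)", command)
        queue = re.search(r"--queue\s+(\S+)", command)  # first queue only
        key = (config.group(1) if config else None,
               queue.group(1) if queue else None)
        groups[key].append(int(pid))
    return {key: pids for key, pids in groups.items() if len(pids) > 1}


ps_output = """\
  PID TTY      STAT   TIME COMMAND
 5199 ?        Sl    29:25 python3 -m trains_agent --config-file ~/trains.conf daemon --queue default --log-level DEBUG --detached
 6096 ?        Sl    30:04 python3 -m trains_agent --config-file ~/trains.conf daemon --queue default --log-level DEBUG --detached
"""
print(find_duplicate_daemons(ps_output.splitlines()))
# {('~/trains.conf', 'default'): [5199, 6096]}
```

An empty result means no two daemons share the same config file and queue on that machine.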
Awesome! (There is a broken link in the migration guide, step 3: https://allegro.ai/docs/deploying_trains/trains_server_es7_migration/)
When an experiment on trains-agent-1 finishes, I randomly see either no experiment or the long experiment, and when two experiments are running, I randomly see one of the two.
So there are two possible cases for trains-agent-1:
- it picks up a new experiment -> the "workers" tab randomly shows one of the two experiments
- there is no new experiment to start in the default queue -> the tab randomly shows no experiment or the one it is running
For the moment this is what I would be inclined to believe
Looks like it's a hurray then!
I want the clearml-agent/instance to stop right after the experiment/training is "paused" (experiment marked as stopped + artifacts saved).
AgitatedDove14 yes, but I don't see in the docs how to attach it to the logger of the EarlyStopping handler.
v0.17.5rc2
Just tested locally, and in the terminal it's the same: with the hack it works; without the hack, the logger messages are not shown.
AgitatedDove14 I was able to redirect the logger by doing this:
clearml_logger = Task.current_task().get_logger().report_text
early_stopping = EarlyStopping(...)
early_stopping.logger.debug = clearml_logger
early_stopping.logger.info = clearml_logger
early_stopping.logger.setLevel(logging.DEBUG)
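Instead of overwriting `logger.debug` and `logger.info`, one alternative (a swapped-in technique, not what was done above and not a ClearML-provided API) is to attach a standard `logging.Handler` that forwards each formatted record to a one-argument callable, such as ClearML's `Logger.report_text`. A minimal sketch, assuming the handler's logger is reachable as `early_stopping.logger` as in the snippet above; `CallableHandler` is a hypothetical name, and the demo collects messages in a plain list instead of ClearML so it runs standalone.

```python
import logging


class CallableHandler(logging.Handler):
    """Forward every formatted log record to a one-argument callable.

    Hypothetical helper: with ClearML, the callable could be
    Task.current_task().get_logger().report_text (assumption).
    """

    def __init__(self, report, level=logging.NOTSET):
        super().__init__(level)
        self._report = report

    def emit(self, record):
        try:
            # Without a custom Formatter this is just the record's message.
            self._report(self.format(record))
        except Exception:
            self.handleError(record)


# Standalone demo: a list stands in for ClearML's report_text,
# and a named logger stands in for early_stopping.logger.
received = []
demo_logger = logging.getLogger("early_stopping_demo")
demo_logger.setLevel(logging.DEBUG)
demo_logger.addHandler(CallableHandler(received.append))
demo_logger.propagate = False  # keep the demo from also printing via the root logger

demo_logger.info("demo info message")
demo_logger.debug("demo debug message")
print(received)
# ['demo info message', 'demo debug message']
```

Because this adds an extra destination rather than replacing the logger's methods, any handlers already attached to the logger keep working.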
Should I open an issue in the clearml-agent GitHub repo?
Very good job! One note: in this version of the web server, the experiment type logos are all blank. What was the reason for changing them? Having a color code in the logos helps a lot to quickly check the type of each experiment task, doesn't it?
OK, so the problem was indeed the way Docker was installed (with snap).
Add carriage return flush support using the sdk.development.worker.console_cr_flush_period configuration setting (GitHub trains Issue 181)
SuccessfulKoala55 Am I doing or saying something wrong regarding the problem of flushing every 5 seconds? (See my previous message.)
AgitatedDove14 Yes, that might work; the first one (with conda) might work as well. I will give it a try, thanks!
I also don't understand what you mean by "unless the domain is different"...
Just as SSH keys are global, I would have expected the git credentials to be used for any git operation.
Sure, I'll be happy to debug that!
That would be awesome, yes; the only thing is that I have zero knowledge of the pip codebase.
Still failing with the same error.