I didn't use ignite callbacks, for future reference:
```
from ignite.engine import Events
from ignite.handlers import EarlyStopping

early_stopping_handler = EarlyStopping(...)

# clearml_logger is the ClearML task logger, engine is the ignite training Engine
def log_patience(_):
    clearml_logger.report_scalar("patience", "early_stopping", early_stopping_handler.counter, engine.state.epoch)

engine.add_event_handler(Events.EPOCH_COMPLETED, early_stopping_handler)
engine.add_event_handler(Events.EPOCH_COMPLETED, log_patience)
```
Hi TimelyPenguin76 ,
trains-server: 0.16.1-320
trains: 0.15.1
trains-agent: 0.16
Ok, this I cannot locate
So previous_task actually ignored the output_uri
SuccessfulKoala55 I deleted all :monitor:machine and :monitor:gpu series, but that only deleted ~20M documents out of 320M in the events-training_debug_image-xyz index. I would now like to understand which experiments contain most of the documents, so I can delete them. I would like to aggregate the number of documents per experiment. Is there a way to do that using the ES REST API?
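For reference, this is the kind of aggregation I have in mind — a sketch only, assuming the event documents carry a keyword-mapped task field holding the experiment ID and that ES is reachable on localhost:9200 (both assumptions on my side):
```
import requests

# Count documents per experiment (task) with a terms aggregation.
# Index pattern, host/port and the "task" field name are assumptions.
query = {
    "size": 0,
    "aggs": {
        "docs_per_task": {
            "terms": {"field": "task", "size": 50, "order": {"_count": "desc"}}
        }
    },
}

resp = requests.get(
    "http://localhost:9200/events-training_debug_image-*/_search",
    json=query,
)
resp.raise_for_status()
for bucket in resp.json()["aggregations"]["docs_per_task"]["buckets"]:
    print(bucket["key"], bucket["doc_count"])
```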
ubuntu18.04 is actually 64 MB, I can live with that
That's why I suspected trains was installing a different version than the one I expected
correct, you could also use
Task.create
that creates a Task but does not do any automagic.
Yes, I didn't use it so far because I didn't know what to expect since the doc states:
"Create a new, non-reproducible Task (experiment). This is called a sub-task."
In my GitHub Action, I should just spin up a dummy clearml server and run the task against it
is there a command / file for that?
What I put in the clearml.conf is the following:
```
agent.package_manager.pip_version = "==20.2.3"
agent.package_manager.extra_index_url: [" "]
agent.python_binary = python3.8
```
Hi SuccessfulKoala55 , will I be able to update all references to the old s3 bucket using this command?
BTW, is there any specific reason for not upgrading to clearml?
I just didn't have time so far ๐
I don't think it is, I was rather wondering how you handled it, to understand potential sources of slowdown in the training code
very cool, good to know, thanks SuccessfulKoala55
Ah, I see, it is not supported by the autoscaler: https://github.com/allegroai/clearml/blob/282513ac33096197f82e8f5ed654948d97584c35/trains/automation/aws_auto_scaler.py#L120-L125
I opened an issue ( https://github.com/pytorch/ignite/issues/2343 ) in ignite's repo and a PR ( https://github.com/pytorch/ignite/pull/2344 ), could you please have a look? There might be a bug in clearml Task.init in distributed envs
Hi AgitatedDove14 , initially I was doing this, but then I realised that with the approach you suggest all the packages of the local environment also end up in the "installed packages", while in reality I only need the dependencies of the local package. That's why I use _update_requirements ; with this approach only the required packages will be installed by the agent
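Roughly what that looks like in my code — a sketch only, since _update_requirements is a private API and the requirement string below is a made-up placeholder:
```
from clearml import Task

task = Task.init(project_name="examples", task_name="train")  # placeholder names

# Private API (may change between versions): replace the auto-detected
# "installed packages" so the agent only installs the listed requirements,
# i.e. the dependencies of the local package rather than the whole local env.
task._update_requirements(["my-local-package==0.1.0"])  # hypothetical requirement
```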
I also tried setting ebs_device_name = "/dev/sdf" - didn't work
my agents are all on 0.16 and I install trains 0.16rc2 in each Task executed by the agent
How exactly is the clearml-agent killing the task?
I have two controller tasks running in parallel in the trains-agent services queue