Hi TimelyPenguin76,
trains-server: 0.16.1-320
trains: 0.15.1
trains-agent: 0.16
Ok, this I cannot locate
In all the steps I want to store them as artifacts on S3 because it's very convenient.
The last step should merge them all, i.e., it needs to know all the artifacts of the previous steps.
So previous_task actually ignored the output_uri
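A minimal sketch of what I mean for the merge step, assuming the ids of the previous steps are known; step_task_ids and the artifact name "data" are placeholders, not from the actual pipeline:
```python
from trains import Task

step_task_ids = ["<task-id-1>", "<task-id-2>"]  # placeholder ids of the previous steps
for task_id in step_task_ids:
    step_task = Task.get_task(task_id=task_id)
    # Download the step's artifact from S3 to a local path
    local_copy = step_task.artifacts["data"].get_local_copy()
    # ... merge local_copy into the final result
```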
SuccessfulKoala55 I deleted all :monitor:machine and :monitor:gpu series, but that only deleted ~20M documents out of 320M documents in the events-training_debug_image-xyz index. I would now like to understand which experiments contain most of the documents, to delete them. I would like to aggregate the number of documents per experiment. Is there a way to do that using the ES REST API?
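Something like this is what I had in mind, a minimal sketch assuming each event document carries a task keyword field holding the owning experiment's id (the index name is the one from my message):
```python
import requests

# Terms aggregation: count documents per task (experiment), sorted by
# descending document count.
query = {
    "size": 0,
    "aggs": {
        "docs_per_task": {
            "terms": {"field": "task", "size": 50, "order": {"_count": "desc"}}
        }
    },
}
resp = requests.post(
    "http://localhost:9200/events-training_debug_image-xyz/_search",
    json=query,
)
for bucket in resp.json()["aggregations"]["docs_per_task"]["buckets"]:
    print(bucket["key"], bucket["doc_count"])
```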
I get the following error:
ubuntu18.04 is actually 64 MB, I can live with that 🙂
That's why I suspected trains was installing a different version than the one I expected
correct, you could also use Task.create, which creates a Task but does not do any automagic.
Yes, I haven't used it so far because I didn't know what to expect, since the doc states:
"Create a new, non-reproducible Task (experiment). This is called a sub-task."
In my GitHub Action, I should just have a dummy clearml server and run the task there, connected to that dummy server
is there a command / file for that?
What I put in the clearml.conf is the following:
```
agent.package_manager.pip_version = "==20.2.3"
agent.package_manager.extra_index_url: [" "]
agent.python_binary = python3.8
```
Hi SuccessfulKoala55 , will I be able to update all references to the old s3 bucket using this command?
BTW, is there any specific reason for not upgrading to clearml?
I just didn't have time so far 🙂
That would be amazing!
I don't think it is, I was rather wondering how you handled it, to understand potential sources of slowdown in the training code
very cool, good to know, thanks SuccessfulKoala55 🙂
Ah, I see, it is not supported by the autoscaler: https://github.com/allegroai/clearml/blob/282513ac33096197f82e8f5ed654948d97584c35/trains/automation/aws_auto_scaler.py#L120-L125
I opened an issue ( https://github.com/pytorch/ignite/issues/2343 ) in ignite's repo and a PR ( https://github.com/pytorch/ignite/pull/2344 ), could you please have a look? There might be a bug in clearml Task.init in distributed envs
Hi AgitatedDove14, initially I was doing this, but then I realised that with the approach you suggest all the packages of the local environment also end up in the "installed packages", while in reality I only need the dependencies of the local package. That's why I use _update_requirements; with this approach only the packages required will be installed in the agent
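Roughly like this, a sketch; _update_requirements is a private method so the signature may change between versions, and my_package is a placeholder for the local package:
```python
from trains import Task

task = Task.init(project_name="examples", task_name="local package run")
# Private API (may change between versions): replace the auto-detected
# requirements so only the local package's dependencies are recorded.
# "my_package" is a placeholder name.
task._update_requirements(["my_package==1.0"])
```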
I also tried setting ebs_device_name = "/dev/sdf" - didn't work
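For context, this is roughly where I set it, assuming the resource_configurations layout the autoscaler expects; every value below is a placeholder and the exact keys may differ between versions:
```python
# Sketch of one autoscaler resource configuration entry.
RESOURCE_CONFIGURATIONS = {
    "gpu_machine": {
        "instance_type": "p3.2xlarge",
        "is_spot": False,
        "availability_zone": "us-east-1b",
        "ami_id": "ami-0123456789abcdef0",  # placeholder AMI
        "ebs_device_name": "/dev/sdf",
        "ebs_volume_size": 100,
        "ebs_volume_type": "gp3",
    }
}
```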