AgitatedDove14 yes! I now realise that the ignite event callbacks do not seem to be fired (I tried to print a debug message on a custom Events.ITERATION_COMPLETED handler and I cannot see it logged)
I followed https://github.com/NVIDIA/nvidia-docker/issues/1034#issuecomment-520282450 and now it seems to be setting up properly
but not as much as the ELB reports
AgitatedDove14 Yes exactly, I tried the fix suggested in the github issue urllib3>=1.25.4 and the ImportError disappeared 🙂
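For reference, a minimal sketch of how one could check whether the installed urllib3 meets the `>=1.25.4` bound from that GitHub issue. It uses only the standard library; the helper names `parse_version` and `urllib3_is_recent_enough` are made up for illustration, and the simple parser assumes plain dotted numeric versions.

```python
from importlib import metadata

# Minimum version mentioned in the GitHub issue (urllib3>=1.25.4).
MIN_URLLIB3 = (1, 25, 4)


def parse_version(text):
    """Turn '1.25.4' into (1, 25, 4); stops at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def urllib3_is_recent_enough(installed=None):
    """Return True if urllib3 >= 1.25.4, False if older, None if not installed.

    `installed` is a hypothetical override so the check can run without urllib3.
    """
    if installed is None:
        try:
            installed = metadata.version("urllib3")
        except metadata.PackageNotFoundError:
            return None
    return parse_version(installed) >= MIN_URLLIB3


if __name__ == "__main__":
    print(urllib3_is_recent_enough())
```

If this prints `False`, upgrading with `pip install "urllib3>=1.25.4"` would be the fix described above.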
AgitatedDove14 Is it possible to shut down the server while an experiment is running? I would like to resize the volume and then restart it (should take ~10 mins)
I will let the team answer you on that one 🙂
AgitatedDove14 How can I filter out archived tasks? I don't see this option
This is how I start the agent that is running the two experiments in parallel: python3 -m trains_agent --config-file "~/trains.conf" daemon --queue default --log-level DEBUG --detached
Hi AgitatedDove14 , I don’t see any in https://pytorch.org/ignite/_modules/ignite/handlers/early_stopping.html#EarlyStopping but I guess I could override it and add one?
Hi CostlyOstrich36 , most of the time I want to compare two experiments in the DEBUG SAMPLE, so if I click on one sample to enlarge it I cannot see the others. Also, once I close the panel, the iteration number is not updated
Thanks for the hack! The use case is the following: I have a controller that creates training/validation/testing tasks by cloning (so that the parent task id is properly set to the controller). Otherwise I could simply create these tasks with Task.init, but then I would need to manually set the parent task for each one of these tasks, probably with a similar hack, right?
I am not sure what you mean by "unless the domain is different"? Personal Access Tokens are designed such that, to allow cloning a private repo, the user has to give the PAT full access to repos, including public repos. So it should also work with all other git repos
with the CLI, on a conda env located in /data
Still getting the same error, it is not taken into account 🤔
oh wait, actually I am wrong
Setting it after the training correctly updated the task and I was able to store artifacts remotely
I should also rename the /opt/trains/data/elastic_migrated_2020-08-11_15-27-05 folder to /opt/trains/data/elastic before running the migration tool, right?
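A rough sketch of what that rename could look like if done from Python rather than with `mv`. This assumes the server is stopped first and that nothing already lives at the target path; whether the migration tool actually needs the rename is the open question above, not something this snippet confirms. `rename_migrated_folder` is a hypothetical helper name.

```python
from pathlib import Path


def rename_migrated_folder(src, dst):
    """Rename src to dst, refusing to overwrite an existing dst."""
    src, dst = Path(src), Path(dst)
    if not src.is_dir():
        raise FileNotFoundError(f"source folder not found: {src}")
    if dst.exists():
        # Bail out instead of clobbering an existing data folder.
        raise FileExistsError(f"target already exists: {dst}")
    src.rename(dst)
    return dst


if __name__ == "__main__":
    # Paths taken from the message above.
    rename_migrated_folder(
        "/opt/trains/data/elastic_migrated_2020-08-11_15-27-05",
        "/opt/trains/data/elastic",
    )
```

Refusing to overwrite the target is a deliberate choice here, so an existing elastic folder is never silently replaced.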
Here I have to do it for each task, is there a way to do it for all tasks at once?
in my clearml.conf, I only have:
sdk.aws.s3.region = eu-central-1
sdk.aws.s3.use_credentials_chain = true
agent.package_manager.pip_version = "==20.2.3"
Not of the ES cluster, I only created a backup of the clearml-server instance disk, I didn’t think there could be a problem with ES…
But that was too complicated, I found an easier approach
The file /tmp/.clearml_agent_out.j7wo7ltp.txt does not exist
Thanks! Corrected both, now it's building