
I think we should switch back and have a configuration to control which mechanism the agent uses, wdyt? (edited)
That sounds great!
AgitatedDove14 After investigation, it turned out another program on the machine had consumed all the available memory, most likely causing the OS to kill the agent/task
I created a snapshot of both disks
Hi AgitatedDove14, I don’t see one in https://pytorch.org/ignite/_modules/ignite/handlers/early_stopping.html#EarlyStopping but I guess I could override it and add one?
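(A minimal sketch of what subclassing it could look like; the on_improvement hook is a hypothetical addition, not part of ignite’s API:)

from ignite.handlers import EarlyStopping

class PatchedEarlyStopping(EarlyStopping):
    # Same behaviour as ignite's EarlyStopping, plus a hypothetical
    # on_improvement hook fired whenever the best score improves
    def __call__(self, engine):
        previous_best = self.best_score
        super().__call__(engine)
        improved = self.best_score is not None and (
            previous_best is None or self.best_score > previous_best
        )
        if improved:
            self.on_improvement(engine)

    def on_improvement(self, engine):
        # custom behaviour goes here (e.g. reporting the new best score)
        pass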
Hi, /opt/clearml is ~40 MB, /opt/clearml/data is ~50 GB
Should I try to disable dynamic mapping before doing the reindex operation?
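(For illustration, a rough sketch of what that could look like against the plain Elasticsearch REST API via Python requests; the cluster URL and index names are placeholders, and the real mappings would need to come from the ClearML index templates:)

import requests

ES = "http://localhost:9200"  # placeholder cluster URL

# Create the destination index with dynamic mapping disabled
# ("events-log-new" / "events-log-old" are placeholder names)
requests.put(f"{ES}/events-log-new", json={"mappings": {"dynamic": False}})

# Copy the documents from the old index into the new one
requests.post(f"{ES}/_reindex", json={
    "source": {"index": "events-log-old"},
    "dest": {"index": "events-log-new"},
})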
Although task.data.last_iteration is correct when resuming, there is still this doubling effect when logging metrics after resuming 😞
AppetizingMouse58 btw I had to delete the old logs index before creating the alias, otherwise ES won’t let me create an alias with the same name as an existing index
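(A sketch of that sequence, again against the plain REST API with Python requests; index and alias names are placeholders:)

import requests

ES = "http://localhost:9200"  # placeholder cluster URL

# An alias cannot have the same name as an existing index, so the old index
# has to be deleted before the alias pointing at the new index is created
requests.delete(f"{ES}/events-log-old")
requests.post(f"{ES}/_aliases", json={
    "actions": [{"add": {"index": "events-log-new", "alias": "events-log-old"}}],
})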
interestingly, it works on one machine, but not on another one
AgitatedDove14 Didn’t work 😞
I’ll definitely check that out! 🤩
I cannot share the file itself, but here are some potentially helpful points:
- Multiple lines are empty
- One line is empty but contains spaces (6, to be exact)
- The last line of the file is empty
SuccessfulKoala55 Am I doing/saying something wrong regarding the problem of flushing every 5 secs? (See my previous message)
I reindexed only the logs to a new index afterwards; I am now doing the same with the metrics, since they cannot be displayed in the UI because of their wrong dynamic mappings
Ok, in that case it probably doesn’t work, because if the default value is 10 secs, it doesn’t match what I get in the logs of the experiment: every second the tqdm adds a new line
now I can do nvcc --version
and I get:
Cuda compilation tools, release 10.1, V10.1.243
Also I can simply delete the /elastic_7 folder, I don’t use it anymore (I have a remote ES cluster). In that case, I guess I would have enough space?
on /data or /opt/clearml? these are two different disks
--- /data ----------
 48.4 GiB [##########] /elastic_7
  1.8 GiB [          ] /shared
879.1 MiB [          ] /fileserver .
163.5 MiB [          ] /clearml_cache .
 38.6 MiB [          ] /mongo
  8.0 KiB [          ] /redis
Yes! not a strong use case though, rather I wanted to ask if it was supported somehow
Woohoo! Thanks 👌
No I agree, it’s probably not worth it
MagnificentSeaurchin79 You could also just fork the tensorflow repo, make your changes in a specific branch, and reference your forked repo and custom branch in the install_requires of your setup.py
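(A minimal sketch of what that setup.py could look like; the fork URL, branch and package names are placeholders:)

from setuptools import setup, find_packages

setup(
    name="my_package",          # placeholder package name
    version="0.1.0",
    packages=find_packages(),
    install_requires=[
        # PEP 508 direct reference to the forked repo and custom branch
        "tensorflow @ git+https://github.com/your-user/tensorflow.git@your-branch",
    ],
)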
I asked this question some time ago; I think this is just not implemented, but it shouldn’t be difficult to add? I am also interested in such a feature!
yes, because it won’t install the local package whose setup.py has the problematic install_requires described in my previous message