Some more context: the second experiment finished and now, in the UI, in the Workers & Queues tab, I randomly see trains-agent-1 | - | - | - | ... and, after refreshing the page, trains-agent-1 | long-experiment | 12h | 72000 |
I made sure before deleting the old index that the number of docs matched
Should I try to disable dynamic mapping before doing the reindex operation?
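Something like this is what I have in mind - just a sketch against the ES REST API, with made-up index names and assuming the server's Elasticsearch is reachable on localhost:9200:

```python
import requests

ES = "http://localhost:9200"  # assumption: the ClearML/trains Elasticsearch instance

# Create the destination index with dynamic mapping disabled, so reindexing
# cannot add unexpected fields on the fly (index names are placeholders).
requests.put(
    f"{ES}/events-log-new",
    json={"mappings": {"dynamic": False}},
).raise_for_status()

# Copy everything from the old index into the new one.
requests.post(
    f"{ES}/_reindex",
    json={"source": {"index": "events-log-old"}, "dest": {"index": "events-log-new"}},
).raise_for_status()
```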
my agents are all 0.16 and I install trains 0.16rc2 in each Task being executed by the agent
SuccessfulKoala55 Here is the trains-elastic error
how would it interact with the clearml-server api service? would it be completely transparent?
That said, you might have accessed the artifacts before any of them were registered
I called task.wait_for_status() to make sure the task is done
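For reference, this is roughly what I do (a sketch - the task id is a placeholder):

```python
from clearml import Task

# Fetch the task whose artifacts I want to read (placeholder id).
task = Task.get_task(task_id="<child-task-id>")

# Block until the task reaches a final state before reading its artifacts,
# otherwise the artifact registry may still be empty.
task.wait_for_status()
task.reload()

print(list(task.artifacts.keys()))
```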
Yes, but I am not certain how: I just deleted the /data folder and restarted the server
You are right, thanks! I was trying to move /opt/trains/data to an external disk, mounted at /data
So I created a symlink /opt/trains/data -> /data
it worked for the other folder, so I assume yes --> I archived /opt/trains/data/mongo, sent the archive via scp, unarchived it, updated the rights, and now it works
Early debugging signals show that auto_connect_frameworks={'matplotlib': False, 'joblib': False}
seems to have a positive impact - it is running now, I will confirm in a bit
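Concretely, this is the kind of Task.init call I mean (just a sketch, project/task names are placeholders):

```python
from clearml import Task

# Disable the matplotlib and joblib bindings so ClearML does not hook into
# figure creation / model saving for this run.
task = Task.init(
    project_name="my-project",     # placeholder
    task_name="long-experiment",   # placeholder
    auto_connect_frameworks={"matplotlib": False, "joblib": False},
)
```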
you mean to run it on the CI machine?
yes
That should not happen, no? Maybe there is a bug that needs fixing in clearml-agent?
It is just to test that the logic being executed under if not Task.running_locally()
is correct
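i.e. something along these lines (a sketch, project/task names are placeholders):

```python
from clearml import Task

task = Task.init(project_name="my-project", task_name="ci-check")  # placeholders

if not Task.running_locally():
    # Only executed when the script runs under a clearml-agent,
    # which is the branch I want to make sure behaves correctly.
    print("running remotely under an agent")
else:
    print("running locally / on the CI machine")
```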
AppetizingMouse58 btw I had to delete the old logs index before creating the alias, otherwise ES won't let me create an alias with the same name as an existing index
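Roughly what I ran (a sketch - index/alias names are made up):

```python
import requests

ES = "http://localhost:9200"  # assumption: the ClearML/trains Elasticsearch instance

# The alias cannot have the same name as an existing index,
# so the old index has to go first.
requests.delete(f"{ES}/events-log-old").raise_for_status()

# Point the old name at the new index as an alias.
requests.post(
    f"{ES}/_aliases",
    json={"actions": [{"add": {"index": "events-log-new", "alias": "events-log-old"}}]},
).raise_for_status()
```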
I think that somewhere a reference to the figure is still alive, so plt.close("all") and gc cannot free the figures and they end up accumulating. I don't know where yet
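To illustrate what I suspect (a minimal sketch, not my actual training code): if anything keeps a reference to the figures, plt.close("all") plus gc cannot reclaim them:

```python
import gc
import matplotlib
matplotlib.use("Agg")  # headless backend, just for the sketch
import matplotlib.pyplot as plt

leaked = []

for step in range(1000):
    fig, ax = plt.subplots()
    ax.plot(range(10))
    leaked.append(fig)    # a stray reference like this keeps every figure alive

    plt.close("all")      # closes the figures as far as pyplot is concerned...
    gc.collect()          # ...but they cannot be freed while `leaked` still holds them
```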
Also, what is the benefit of having index.number_of_shards = 1 by default
for the metrics and the logs indices? Having more shards allows scaling and later moving them to separate nodes if needed - the default heap size being 2 GB, it should be possible, no?
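For example, something like this (a sketch - the index name and shard count are illustrative):

```python
import requests

ES = "http://localhost:9200"  # assumption: the ClearML/trains Elasticsearch instance

# Create a metrics index with more than one primary shard, so it could later
# be spread across several data nodes.
requests.put(
    f"{ES}/events-metrics-new",
    json={"settings": {"index": {"number_of_shards": 3, "number_of_replicas": 0}}},
).raise_for_status()
```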
I am using 0.17.5, it could be either a bug in ignite or indeed a delay on the send. I will try to build a simple reproducible example to understand the cause
Alright, thanks for the answer! Seems legit then
Indeed, I actually had the old configuration that was not JSON - I converted it to JSON and now it works
Actually I think I am approaching the problem from the wrong angle
Because it lives behind a VPN and GitHub workers don't have access to it
I tested by installing flask in the default env -> it was installed in the ~/.local/lib/python3.6/site-packages
folder. Then I created a venv with the --system-site-packages flag.
I activated the venv and flask was indeed available
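In code terms, the test amounts to something like this (a sketch using the stdlib venv module, the path is a placeholder):

```python
import venv

# Create a virtual environment that can also see the user/system site-packages,
# which is why the flask installed in ~/.local/lib/python3.6/site-packages
# is importable from inside the venv.
venv.create("/tmp/test-venv", system_site_packages=True, with_pip=True)
```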
Hi CostlyOstrich36, most of the time I want to compare two experiments in the DEBUG SAMPLES tab, so if I click on one sample to enlarge it I cannot see the others. Also, once I close the panel, the iteration number is not updated