This is consistent: each time I send a new task to the default queue, if trains-agent-1 has only one task running (the long one), it will pick another one. If I add one more experiment to the queue at that point (trains-agent-1 running two experiments at the same time), that experiment will stay in the queue (trains-agent-2 and trains-agent-3 will not pick it because they are also running experiments)
(I am not part of the awesome ClearML team, just a happy user 🙂)
AppetizingMouse58 btw I had to delete the old logs index before creating the alias, otherwise ES won't let me create an alias with the same name as an existing index
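Roughly the sequence I mean, sketched with a placeholder host and placeholder index/alias names:
```python
# Sketch only: the ES host, index and alias names below are placeholders
import requests

ES = "http://localhost:9200"     # assumed local Elasticsearch
OLD_INDEX = "old_logs_index"     # placeholder for the existing logs index
NEW_INDEX = "new_logs_index"     # placeholder for the index the alias should point to
ALIAS = "old_logs_index"         # the alias reuses the old index name

# ES refuses to create an alias whose name collides with an existing index,
# so the old index has to be deleted first
requests.delete(f"{ES}/{OLD_INDEX}")
requests.post(
    f"{ES}/_aliases",
    json={"actions": [{"add": {"index": NEW_INDEX, "alias": ALIAS}}]},
)
```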
AgitatedDove14 I see that the default is sample_frequency_per_sec=2., but in the UI I don't see that resolution (i.e. it logs every ~120 iterations, corresponding to ~30 secs). What is the difference with report_frequency_sec=30.?
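To illustrate how I read these two settings (my interpretation of the names, not the actual implementation): samples are taken sample_frequency_per_sec times per second, but they are only flushed to the server every report_frequency_sec seconds, roughly like this:
```python
# Conceptual sketch of sampling vs. reporting; read_machine_stats is a stand-in helper
import random
import time

SAMPLE_FREQUENCY_PER_SEC = 2.0   # how often a machine-stats sample is taken
REPORT_FREQUENCY_SEC = 30.0      # how often the aggregated samples are sent to the server

def read_machine_stats():
    # stand-in for reading CPU/GPU/memory usage
    return random.random()

samples, last_report = [], time.time()
for _ in range(1000):
    samples.append(read_machine_stats())
    if time.time() - last_report >= REPORT_FREQUENCY_SEC:
        # one reported point aggregates ~60 samples at the default settings
        print("reporting average of", len(samples), "samples:", sum(samples) / len(samples))
        samples, last_report = [], time.time()
    time.sleep(1.0 / SAMPLE_FREQUENCY_PER_SEC)
```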
Hi SuccessfulKoala55, it's not really wrong, rather I don't understand it: the docker image with the args after it
Just found it, yeah, very cool! Thanks!
So the wheel that was working for me was this one: [torch-1.11.0+cu115-cp38-cp38-linux_x86_64.whl](https://download.pytorch.org/whl/cu115/torch-1.11.0%2Bcu115-cp38-cp38-linux_x86_64.whl)
Maybe there is a setting in docker to move the space it uses to a different location? I can simply increase the storage of the first disk, no problem with that
but then why do I have to do task.connect_configuration(read_yaml(conf_path))._to_dict()?
Why not task.connect_configuration(read_yaml(conf_path)) simply?
I mean what is the benefit of returning ProxyDictPostWrite instead of a dict?
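For reference, the pattern I am talking about, as a minimal sketch (read_yaml is just a small helper that loads the YAML into a plain dict; the project/task names and file path are placeholders):
```python
import yaml
from clearml import Task

def read_yaml(path):
    # load the YAML file into a plain dict
    with open(path) as f:
        return yaml.safe_load(f)

task = Task.init(project_name="example", task_name="config-demo")

# connect_configuration hands back a proxy dict (ProxyDictPostWrite)
# rather than the plain dict that was passed in
config = task.connect_configuration(read_yaml("conf.yaml"))

# what I currently do to get a regular dict out of it
params = config._to_dict()
```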
I see what I described in https://allegroai-trains.slack.com/archives/CTK20V944/p1598522409118300?thread_ts=1598521225.117200&cid=CTK20V944 :
randomly, one of the two experiments is shown for that agent
CostlyOstrich36 How is clearml-session setting the ssh config?
AgitatedDove14 In theory yes, there is no downside; in practice, running an app inside docker inside a VM might introduce slowdowns. I guess it's on me to check whether this slowdown is negligible or not
No space, I will add and test 🙂
Installing collected packages: my-engine
  Attempting uninstall: my-engine
    Found existing installation: my-engine 1.0.0
    Uninstalling my-engine-1.0.0:
      Successfully uninstalled my-engine-1.0.0
Successfully installed my-engine-1.0.0
yes, the only thing I changed is:
install_requires=[ ..., "my-dep @ git+ " ]
to:
install_requires=[ ..., "git+ " ]
yes, because it won't install the local package, whose setup.py has the install_requires problem described in my previous message
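To make the change concrete, here is the shape of the setup.py I mean, with a placeholder dependency name and git URL (the real URL is omitted above):
```python
# setup.py -- sketch only; "my-dep" and the git URL are placeholders
from setuptools import setup, find_packages

setup(
    name="my-engine",
    version="1.0.0",
    packages=find_packages(),
    install_requires=[
        # before: PEP 508 direct reference
        # "my-dep @ git+https://example.com/org/my-dep.git",
        # after: plain VCS URL string
        "git+https://example.com/org/my-dep.git",
    ],
)
```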
my agents are all on 0.16 and I install trains 0.16rc2 in each Task being executed by the agent
AgitatedDove14 I was able to redirect the logger by doing so:
import logging
# report_text sends a plain line of text to the task's console log
clearml_logger = Task.current_task().get_logger().report_text
early_stopping = EarlyStopping(...)
# point the handler's debug/info calls at the ClearML reporter instead of the default logger
early_stopping.logger.debug = clearml_logger
early_stopping.logger.info = clearml_logger
early_stopping.logger.setLevel(logging.DEBUG)
I followed https://github.com/NVIDIA/nvidia-docker/issues/1034#issuecomment-520282450 and now it seems to be set up properly
Ok to be fair I get the same curve even when I remove clearml from the snippet, not sure why