ResponsiveHedgehong88, do you have a way to log into the machine and check its state or whether there were any errors? Is there any chance it's running out of memory? The agent also keeps a local log; can you take a look there to see if there is any discrepancy?
Hi CostlyOstrich36, is there a default location for the agent's local log?
When the agent starts running a task it will print out where the logs are being saved
For example:
task 613b77be5dac4f6f9eaea7962bf4e034 pulled from eb1c9d9c680d4bdea2dbf5cf90e54af2 by worker worker-bruce:3
Running task '613b77be5dac4f6f9eaea7962bf4e034'
Storing stdout and stderr log to '/tmp/.clearml_agent_out._sox_04u.txt', '/tmp/.clearml_agent_out._sox_04u.txt'
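If the agent's console output was captured somewhere, the log location can be pulled out of it with a small script. This is a minimal sketch that assumes the line is worded exactly as in the example above; other agent versions may print it differently, and `find_agent_log_paths` is a hypothetical helper name, not part of ClearML.

```python
import re

# Pattern for the startup line that names the local log files.
# Assumption: the wording matches the example output quoted in this thread.
LOG_LINE = re.compile(r"Storing stdout and stderr log to '([^']+)', '([^']+)'")


def find_agent_log_paths(console_text):
    """Return (stdout_path, stderr_path) found in the console text, or None."""
    match = LOG_LINE.search(console_text)
    return match.groups() if match else None


sample = (
    "Running task '613b77be5dac4f6f9eaea7962bf4e034' "
    "Storing stdout and stderr log to "
    "'/tmp/.clearml_agent_out._sox_04u.txt', '/tmp/.clearml_agent_out._sox_04u.txt'"
)
print(find_agent_log_paths(sample))
```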
Unfortunately the experiment ran in Docker and the container is already down... I don't know if this happened at the same time. So you're saying it might be a memory issue? Any other hints I could check while running a new experiment?
ResponsiveHedgehong88, you can try mounting the container's /tmp/ folder to a directory on the host, so the data isn't lost and can be inspected later. This could give us a better idea of what's happening.
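One way to keep the /tmp/ contents is a Docker bind mount (`-v host_dir:container_dir`). The sketch below only builds the argument list; the host directory `/data/agent_tmp` and the helper `tmp_mount_args` are made-up names for illustration. How the arguments reach the container depends on your setup (for example, an extra-docker-arguments setting in the agent configuration), so treat that part as an assumption to verify.

```python
import os


def tmp_mount_args(host_dir):
    """Build docker arguments that bind-mount host_dir over the container's /tmp."""
    # Docker bind mounts need an absolute host path.
    host_path = os.path.abspath(os.path.expanduser(host_dir))
    return ["-v", f"{host_path}:/tmp"]


# Hypothetical host directory; pick one that exists on the agent machine.
args = tmp_mount_args("/data/agent_tmp")
print(" ".join(args))
```

After the run, the agent's log files should still be available under the host directory even if the container has been removed.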
Ok good idea thanks, will do in the next run
Also, in the Scalars section you can see the machine statistics, which may give you an idea. If memory usage is high, this might be the issue. If not, we can (probably) rule out this hypothesis.
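To cross-check the memory hypothesis from inside the training script, the process can print its own peak memory near the end of the run. This is a rough sketch using Python's standard `resource` module (Unix only); it is not a ClearML feature, and `peak_memory_mb` is a hypothetical helper name.

```python
import resource
import sys


def peak_memory_mb():
    """Return the peak resident set size of this process in megabytes."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


print(f"peak resident memory: {peak_memory_mb():.1f} MB")
```

If this number is close to the container's memory limit shortly before the run dies, that supports the out-of-memory explanation.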
Hi ResponsiveHedgehong88, I was trying to do the same thing, but the loggerhook doesn't seem to work. The console log and scalar logs didn't show up when I registered it via init.py and loaded it via log_config. Could you share how you configured it?