Why do you want to keep containers running after the code has finished executing?
In case I want to debug why it failed: if it failed due to a tiny issue, maybe I could fix it locally and restart the training command from within docker, instead of waiting for the docker environment to get assembled again.
Aren't you getting logs from the docker container via ClearML? I think you could build that capability fairly easily with ClearML, maybe add a PR?
Yeah, I am getting logs, but they are extremely puzzling to me. I would appreciate actually having access to the whole package structure... Indeed, can you maybe point to where the docker command is composed? I've been looking for it for the past 30 mins or so. Not so familiar with the internals really 😕
Yeah, I am getting logs, but they are extremely puzzling to me. I would appreciate actually having access to the whole package structure...
The actual packages are updated back into the "Installed Packages" section (under the Execution tab).
Indeed, can you maybe point to where the docker command is composed?
https://github.com/allegroai/clearml-agent/blob/178af0dee84e22becb9eec8f81f343b9f2022630/clearml_agent/commands/worker.py#L3694
🙂
BTW: you can run/build the entire thing on your machine with: clearml-agent execute --docker --id <task_id>
or: clearml-agent build --docker --id <task_id> --target <image_name_here>
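For reference, a possible local debug loop built on those two commands: build the task's image once, then open a shell in it to inspect the package layout and rerun the training command by hand. This is only a sketch; the entrypoint override, the presence of bash and pip inside the image, and the train.py entry point are assumptions, not something clearml-agent guarantees.

```
# Sketch of a local debug loop (assumes the built image contains bash and pip;
# <task_id>, debug-image and train.py are placeholders).

# 1. Rebuild the task's environment as a local docker image
clearml-agent build --docker --id <task_id> --target debug-image

# 2. Open an interactive shell in that image instead of running the task
docker run -it --entrypoint /bin/bash debug-image

# 3. Inside the container: inspect what actually got installed, then rerun by hand
pip freeze
python train.py
```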
ah, stupid me. I was looking everywhere but not in clearml-agent 🙂 thanks!
But who exactly executes the agent in this case? Does it happen on my machine, the server, or the worker?
But who exactly executes the agent in this case?
With both the execute / build commands, you execute it on your machine, for debugging purposes. Make sense?
No no, that one is clear. I'm asking about the general workflow...
But from the diagrams it looks like it's the worker that runs that worker.py you pointed to above.
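For what it's worth, a sketch of the distinction (queue name and task id are placeholders): in the general workflow it is indeed the worker machine that runs the agent, by keeping the agent daemon alive, pulling tasks from a queue, and spinning up the docker containers; execute / build run that same agent code once, on your own machine, for debugging.

```
# General workflow: runs on the worker machine, pulls tasks from a queue
# and launches the docker containers for them.
clearml-agent daemon --queue default --docker

# Debugging: run a single task through the same agent code on your own machine.
clearml-agent execute --docker --id <task_id>
```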