If I want to debug why it failed, and it failed due to a tiny issue, I'd like to maybe fix it locally and restart the training command from within docker, not wait for the docker image to get assembled again.
Yeah, I am getting logs, but they are extremely puzzling to me. I would appreciate actually having access to the whole package structure. Indeed, can you maybe point me to where the docker command is composed? I've been looking for it for the past 30 minutes or so; not so familiar with the internals, really 😕
No no, that one is clear. I am asking about the general workflow...
But from the diagrams it looks like it's the worker that runs that worker.py you pointed to above.
ah, stupid me. I was looking everywhere but not in clearml-agent 🙂 thanks!
But who exactly executes the agent in this case? Does it happen on my machine, the server, or the worker?
Okay, after digging a little, I think the snippet above is not valid: "standalone script" assumes just a single Python file. I still need the repo where the training scripts live, and it just can't be cloned on the remote worker. Is it possible to somehow pack the local repo to be executed on the remote worker?
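For context, this is roughly what I'm running locally (a minimal sketch; the project and queue names are just placeholders for my setup). As far as I understand, ClearML captures the current git repo plus uncommitted changes as a diff, but the agent still has to clone the repo itself on the worker:

```python
# minimal sketch of my local run (project/queue names are placeholders)
from clearml import Task

# Task.init picks up the current git repo and stores uncommitted changes as a diff
task = Task.init(project_name="my-project", task_name="train")

# stop local execution and enqueue the task for a remote agent;
# the agent then tries to clone the same repo on the worker and apply the diff
task.execute_remotely(queue_name="default", clone=False, exit_process=True)

# ...training code below never runs locally once execute_remotely() is called
```

So the diff gets "packed", but the repo itself still needs to be reachable from the worker, which is exactly my problem.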
It was rather at the infrastructure level: the worker doesn't have access to the repository. I found a workaround of using a mirror repo that is accessible by the worker.
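In case it helps anyone else, this is roughly how I point the task at the mirror (a sketch; the mirror URL, branch, and script path are placeholders for my setup):

```python
# sketch: create a task that the agent will clone from the mirror repo instead
from clearml import Task

task = Task.create(
    project_name="my-project",
    task_name="train-from-mirror",
    repo="https://git.mirror.internal/team/training.git",  # mirror reachable by the worker
    branch="main",
    script="train.py",  # entry point, relative to the repo root
)

# push it to the queue that the agent on the worker is listening on
Task.enqueue(task, queue_name="default")
```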