Thanks for your help SuccessfulKoala55 ! Appreciate the patience š
Is there a preferred way to stop the agent?
I'll kill the agent and try again but with the detached mode š¤
So theĀ
..data
Ā referenced in the example above are part of the git repository?
Yup š
I just used this to create the dual_gpu
queue:clearml-agent daemon --queue dual_gpu --create-queue --gpus 0,1 --detached
A follow up question (instead of opening a new thread), is there a way I could signal some files/directories to be copied to theĀ
execute_remotely
Ā task?
For that you'll need to use a Git repository - the repository will be automatically cloned when running the task remotely.
You can always upload using the StorageManager and download if the file is not there
I was thinking of using the --volume
settings in clearml.conf
to mount the relevant directories for each user (so it's somewhat customizable). Would that work?
It would be amazing if one can specify specific local dependencies for remote execution, and those would be uploaded to the file server and downloaded before the code starts executing
A follow up question (instead of opening a new thread), is there a way I could signal some files/directories to be copied to the execute_remotely
task?
So the ..data
referenced in the example above are part of the git repository?
What about setting the working_directory
to the user working directory using Task.init
or Task.create
?
Seemed to work fine again in detached mode, what went wrong there :shocked_face_with_exploding_head:
This follows the standard ClearML remote execution practice - an agent runs the task, and either uses the actual python code file (stored entirely in the server under the uncommitted changes section), or clones a git repository
I guess following the example https://github.com/allegroai/clearml/blob/master/examples/advanced/execute_remotely_example.py , it's not clear to me how the server has access to the data loaders location when it hits execute_remotely
The idea is that the features would be copied/accessed by the server, so we can transition slowly and not use the available storage manager for data monitoring
Is there a preferred way to stop the agent?
Same agent command + --stop
Don't do it in detached mode - do it in another console window
From the log you shared, the task is picked up by theĀ
worker_d1bd92a3b039400cbafc60a7a5b1e52b_4e831c4cbaf64e02925b918e9a3a1cf6_<hostname>:gpu0,1
Ā worker
I can try and target the default one if it helps..?
Hm, that seems less than ideal. I was hoping I could pass some CSV locations. I'll try and find a workaround for that. Thanks!
What about setting theĀ
working_directory
Ā to the user working directory usingĀ
Task.init
Ā orĀ
Task.create
?
The working_directory
is simply one of the parameters used when cloning a git repository, so it won't work...
You can rely on a fixed mount point, for example, but that requires more setup
It failed on some missing files in my remote_execution, but otherwise seems fine now
Or store as a configuration item (if it's not a lots of data)