It does not 🙂
We started discussing it here - https://clearml.slack.com/archives/CTK20V944/p1640955599257500?thread_ts=1640867211.238900&cid=CTK20V944
You suggested this solution - https://clearml.slack.com/archives/CTK20V944/p1640973263261400?thread_ts=1640867211.238900&cid=CTK20V944
And I eventually found this solution to work - https://clearml.slack.com/archives/CTK20V944/p1641034236266500?thread_ts=1640867211.238900&cid=CTK20V944
Feels like we've been over this 😄 Has there been new developments perhaps?
It's essentially that this - https://clear.ml/docs/latest/docs/guides/advanced/multiple_tasks_single_process cannot work in a remote execution.
That could work, given that:
Could we add a preview section? One reason I don't like using the configuration section is that it makes debugging much much harder. Will the clearml-agent download and unzip the files, placing them into the same local folder as needed for execution? What if we want to include non-configuration objects? (i.e. the model case I listed)
Feels like we've been over this
LOL, I think I can't wrap my head around the use case 🙂
When running locally, this is "out of the box", as we can init and close before and after each model.
I finally got it!
Task.init should be dubbed "init Main task" , automagic kicks in Only when it is the only one existing. You remote execution is "linear" Task after Task, in theory a good candidate for pipeline.
Basically option (2) , the main task is being "replaced" (which locally would be task.close), right?
Your use case is a good example of
Task.create use case (which you use), as it does n't mean it is the "main" Task, but as a result, all reports have to be done manualy.
How would you improve the current state? (take into account that when the agent spins the remote Task, it will continue to report console outputs, and monitor the status of the Original task)
hmm, yes, but then this kind of a hacky solution... The original #340 was about packaging source code that was not in git... Now we want to add "data" (even if ephemeral) on to it, no?
My thinking is somehow make sure a Task can reference a "Dataset" to be downloaded before it starts by the agent ?!
regrading the artifact, yes that make sense, I guess this is why there is "input" type for an artifact, the actual use case was never found (I guess until now?! what are you point there?)
Regrading the configuration
It's very useful for us to be able to see the contents of the configuration and understand
Wouldn't that just do exactly what you are looking for:
` local_config_file_that_i_can_always_open = task.connect_configuration("important", "/path/to/config/I/only/have/on/my/machine")
with open(local_config_file_that_i_can_always_open, 'rt') as f:
do something `This means that when running locally:
local_config_file_that_i_can_always_open == "/path/to/config/I/only/have/on/my/machine"And when running with an agent
local_config_file_that_i_can_always_open == "/tmp/config/file/from_my_machine.stuff"wdyt?
It's always the details... Is the new Task running inside a new subprocess ?
basically there is a difference between
remote task spawning new tasks (as subprocesses, or as jobs on remote machine), remote task still running remote task, is being replaced by a spawned task (same process?!)UnevenDolphin73 am I missing a 3rd option? which of these is your case?
p,s. I have a suspicion that there might be a misuse of "Task" here?! What are you considering a Task? (from clearml perspective a Task is Not a function, a Task is an entire standalone process, usually not a very short one)
UnevenDolphin73 following the discussion https://clearml.slack.com/archives/CTK20V944/p1643731949324449 , I suggest this change in the pseudo code
` # task code
task = Task.init(...)
if not task.running_locally() and task.is_main_task():
# pre-init stage
StorageManager.download_folder(...) # Prepare local files for execution
StorageManager.upload_file(...) # Repeated for many files needed
task.execute_remotely(...) `Now when I look at is, it kinds of make sense to have to callbacks added to execute_remotely (basically covering the exact if statement I have above)
AgitatedDove14 the issue was that we'd like the remote task to be able to spawn new tasks, which it cannot do if I use
When would this callback be called? I'm not sure I understand the usecase.
Am I making sense ?
No, not really. I don't see how
task.connect_configuration interacts with our existing CLI? Additionally, the documentation for
task.connect_configuration say the second argument is the name of a file, not the path to it? So something is off
The new task is not running inside a new subprocess. Our platform trains several models, and we'd like each of them to be tracked in their own
Task . When running locally, this is "out of the box", as we can init and close before and after each model.
When running remotely, one cannot close the main task (since it is what orchestrates everything), and so this workaround was needed.
Honestly, this is all related to issue #340.
makes total sense.
But actually this id different from #340. The feature is to store the Data on the Task, this means each Task in your "pipeline" will be upload a new copy of the data. No?
I'd suggest some
method for remote execution maybe
That is a good idea, in theory it can also be used in local execution
Are they ephemeral or later used by other Tasks, execution etc ?
For example: configuration files, they are specific for an execution, and someone will edit them.
Initial weights files, are something that multiple execution might needs them, and they will be used to restore an execution. Data, even if changing, is usually used by multiple executions tasks etc.
It seems like you treat these files as "configurations", is that right ?
Hmm, so what I'm thinking is "extending" the capabilities of the "configuration" section (as it seems this is the right context). Allowing to upload a bunch of files (with the same mechanism as artifacts), as zip files, in the configuration "editable" section have the URL storing the zip, together with the target folder. wdyt?
One reason I don't like using the configuration section is that it makes debugging much much harder.
debugging ? please explain how it relates to the configuration, and presentation (i.e. preview)
Yes in theory, but in your case it will not change things, unless these "configurations" are copied on any Task (which is just storage, otherwise no real harm)
I was thinking "zip" file that the Task creates and uploads, and a new configuration type, say "external/zip" , and in the config section have something like
Debugging. It's very useful for us to be able to see the contents of the configuration and understand what is going on and what is meant to be going on. Without a preview (which in our case is the entire content of the configuration file), one has to take an annoying route of downloading the files etc. The configurations are uploaded to a single task and then linked across all task to conserve storage space (so the S3 storage point is identical across tasks) Sure, sounds good. I think it's a bit odd to enforce this only to the configurations section though? 🤔 It would be nice to be able to say "this artifact/configuration is needed prior to execution" for each artifact/configuration separately.
As the meme goes, well yes but actually no, since the input path is provided via argparse? I'm also not sure how this would help debug from the WebUI - you can't really see the contents of a zipped file/the configuration tab is too messy for such a nested configuration as the one we have. It's best suited as an artifact.
EDIT: Or am I missing something? Point being, when the remote execution begins, the entry point tries to run e.g.
python train.py --config_file path/to/local/file.yaml which then fails, since that file does not exist.
Even if we override
argparse with some arguments from ClearML (I suppose this is the idea behind the autoconnect feature?), then our yaml file contains
!include instructions, referring to other yaml files, using a relative path. Then, our config file may or may not refer to additional files (again relative path).
As of now, we internally analyze the configuration file, use
StorageManager to upload everything, check if we're running as a remote execution before doing anything, and if so, download using
StorageManager and cleanup using
task.upload_artifact(..., is_requirement=True) ,
Just implies these artifacts/configurations must be downloaded prior to running the code itself; then you also don't have to worry about zipping? 🤔
UnevenDolphin73 I have a suspicion we have a few terms mixed:
These are essentially key/value.
when you call Task. connect (dict_with_params), clearml will flatten the dict and you end up with key/value
configuration objects :
These are actually blobs of text, the UI will show as is
When you call my_local_file=Task. connect_configuration (name, "path/to/config/file")
The entire Content of the config file is stored on the Task object itself.
Back to the use case, instead of:
python train.py --config_file path/to/local/file.yaml
then inside the code:
` my_param_file = args.config_file
with open(my_param_file, 'rt'):
read parse etc. `You could bake both into the same line:
` my_param_file = task.connect_configuration("config_file", "path/to/local/file.yaml")
with open(my_param_file, 'rt'):
read parse etc. `When the "connect_configuration" is used, it actually combines the need to have both an argument pointing to the config file, and the content of the config file. It was designed to solve this exact use case. Am I making sense ?
"then our yaml file contains
Is this the point where connect_configuration breaks ?
Honestly, this is all related to issue #340. The only reason we have this to begin with is because we need one separate "initializer" task that downloads the remote cache and prepares the agent environment for execution (downloading the configuration files, etc).
Otherwise it fits perfectly with pipelines, but we're not there yet.
In the local execution we don't have this initializer task, so we use
Task.init() before starting to work on a model, and
task.close() when we're done.
I'd suggest some
task.detach() method for remote execution maybe? We still find use-cases for this initializer task (it holds the original user and as such we can easily filter and report it to Slack, etc).