`task.upload_artifact(..., is_requirement=True)`, `task.connect_configuration(..., is_requirement=True)`
Just implies these artifacts/configurations must be downloaded prior to running the code itself; then you also don't have to worry about zipping?
1. Debugging. It's very useful for us to be able to see the contents of the configuration and understand what is going on and what is meant to be going on. Without a preview (which in our case is the entire content of the configuration file), one has to take the annoying route of downloading the files etc.
2. The configurations are uploaded to a single task and then linked across all tasks to conserve storage space (so the S3 storage point is identical across tasks).
3. Sure, sounds good. I think it's a bit odd to enforce this only on the configurations section though? It would be nice to be able to say "this artifact/configuration is needed prior to execution" for each artifact/configuration separately.
It does not
We started discussing it here - https://clearml.slack.com/archives/CTK20V944/p1640955599257500?thread_ts=1640867211.238900&cid=CTK20V944
You suggested this solution - https://clearml.slack.com/archives/CTK20V944/p1640973263261400?thread_ts=1640867211.238900&cid=CTK20V944
And I eventually found this solution to work - https://clearml.slack.com/archives/CTK20V944/p1641034236266500?thread_ts=1640867211.238900&cid=CTK20V944
Am I making sense ?
No, not really. I don't see how `task.connect_configuration` interacts with our existing CLI? Additionally, the documentation for `task.connect_configuration` says the second argument is the name of a file, not the path to it? So something is off
And yes, our flow would break anyway with the internal references within the yaml file. It would be much simpler if we could specify the additional files
UnevenDolphin73 I have a suspicion we have a few terms mixed:
hyperparameters :
These are essentially key/value.
When you call `Task.connect(dict_with_params)`, ClearML will flatten the dict and you end up with key/value pairs
configuration objects :
These are actually blobs of text; the UI will show them as is
When you call `my_local_file = Task.connect_configuration(name, "path/to/config/file")`
the entire content of the config file is stored on the Task object itself.
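To make the "flatten" step for hyperparameters concrete, here is a minimal sketch of turning a nested dict into key/value pairs. It is an illustration only, not ClearML's actual implementation; the helper name `flatten_dict` and the `/` separator are assumptions made for this example.

```python
from typing import Any, Dict


def flatten_dict(params: Dict[str, Any], prefix: str = "", sep: str = "/") -> Dict[str, Any]:
    """Flatten a nested dict into a single-level dict of key/value pairs.

    Hypothetical helper: the separator and naming are assumptions,
    chosen only to illustrate what "flattening" means here.
    """
    flat: Dict[str, Any] = {}
    for key, value in params.items():
        full_key = f"{prefix}{sep}{key}" if prefix else str(key)
        if isinstance(value, dict):
            # Recurse into nested sections, carrying the path so far.
            flat.update(flatten_dict(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


nested = {"model": {"lr": 0.01, "layers": 3}, "seed": 42}
print(flatten_dict(nested))
```

A configuration object, by contrast, would keep the file's text intact rather than splitting it into individual keys.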
Back to the use case, instead of: `python train.py --config_file path/to/local/file.yaml`
then inside the code:
`my_param_file = args.config_file
with open(my_param_file, 'rt') as f:
    ...  # read, parse, etc.`
You could bake both into the same line:
`my_param_file = task.connect_configuration("config_file", "path/to/local/file.yaml")
with open(my_param_file, 'rt') as f:
    ...  # read, parse, etc.`
When `connect_configuration` is used, it covers both needs at once: an argument pointing to the config file, and the content of the config file itself. It was designed to solve this exact use case. Am I making sense?
EDIT:
"then our yaml file contains `!include`"
Is this the point where connect_configuration breaks ?
As the meme goes, well yes but actually no, since the input path is provided via argparse? I'm also not sure how this would help debug from the WebUI - you can't really see the contents of a zipped file/the configuration tab is too messy for such a nested configuration as the one we have. It's best suited as an artifact.
EDIT: Or am I missing something? Point being, when the remote execution begins, the entry point tries to run e.g. `python train.py --config_file path/to/local/file.yaml`, which then fails, since that file does not exist.
Even if we override `argparse` with some arguments from ClearML (I suppose this is the idea behind the autoconnect feature?), our yaml file contains `!include` instructions, referring to other yaml files, using a relative path. Then, our config file may or may not refer to additional files (again with a relative path).
As of now, we internally analyze the configuration file, use `StorageManager` to upload everything, check if we're running as a remote execution before doing anything, and if so, download using `StorageManager` and clean up using `boto3`.
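As a rough illustration of the "analyze the configuration file" step, the sketch below collects every file reachable through `!include` directives, resolving each relative path against the file that contains it. This is not the code used in the thread; the function name `find_includes`, the regular expression, and the assumed directive form (`key: !include relative/path.yaml`) are all assumptions.

```python
import re
from pathlib import Path
from typing import List

# Assumed directive form: `key: !include relative/path.yaml` (optionally quoted).
INCLUDE_PATTERN = re.compile(r"!include\s+['\"]?([^\s'\"#]+)['\"]?")


def find_includes(config_path: Path) -> List[Path]:
    """Return all files reachable from config_path via !include directives.

    Relative paths are resolved against the directory of the including file.
    Missing files are skipped, and each file is visited at most once.
    """
    root = config_path.resolve()
    visited = {root}
    found: List[Path] = []
    stack = [root]
    while stack:
        current = stack.pop()
        text = current.read_text(encoding="utf-8")
        for match in INCLUDE_PATTERN.finditer(text):
            target = (current.parent / match.group(1)).resolve()
            if target in visited or not target.is_file():
                continue
            visited.add(target)
            found.append(target)
            stack.append(target)
    return found
```

The resulting list, together with the root file, is what would need to be uploaded before remote execution and downloaded again on the agent with the same relative layout.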
UnevenDolphin73 following the discussion https://clearml.slack.com/archives/CTK20V944/p1643731949324449 , I suggest this change in the pseudo code
`# task code
task = Task.init(...)
if not task.running_locally() and task.is_main_task():
    # pre-init stage
    StorageManager.download_folder(...)  # Prepare local files for execution
else:
    StorageManager.upload_file(...)  # Repeated for many files needed
    task.execute_remotely(...)`
Now that I look at it, it kind of makes sense to have two callbacks added to `execute_remotely` (basically covering the exact if statement I have above)
wdyt?
Yeah that works too. So one can override the queue ID but not the worker
Honestly, this is all related to issue #340. The only reason we have this to begin with is because we need one separate "initializer" task that downloads the remote cache and prepares the agent environment for execution (downloading the configuration files, etc).
Otherwise it fits perfectly with pipelines, but we're not there yet.
In the local execution we don't have this initializer task, so we use `Task.init()` before starting to work on a model, and `task.close()` when we're done.
I'd suggest some `task.detach()` method for remote execution maybe? We still find use-cases for this initializer task (it holds the original user and as such we can easily filter and report it to Slack, etc).
Hmm, yes, but then this is kind of a hacky solution... The original #340 was about packaging source code that was not in git... Now we want to add "data" (even if ephemeral) on top of it, no?
My thinking is somehow make sure a Task can reference a "Dataset" to be downloaded before it starts by the agent ?!
I guess it's mixed. If #340 is resolved, then this initializer task will be a no-op: detach, and init-close new tasks as needed.
I didn't mention code in #340 nor did I mention data here. The idea was to package non git-specific files for remote execution
Hmm, so what I'm thinking is "extending" the capabilities of the "configuration" section (as it seems this is the right context): allow uploading a bunch of files as zip files (with the same mechanism as artifacts), and have the configuration "editable" section hold the URL storing the zip, together with the target folder. wdyt?
That could work, given that:
1. Could we add a preview section? One reason I don't like using the configuration section is that it makes debugging much much harder.
2. Will the clearml-agent download and unzip the files, placing them into the same local folder as needed for execution?
3. What if we want to include non-configuration objects? (i.e. the model case I listed)
Since this is a single process, most of these are only needed once when our "initializer" task starts and loads.
1.
One reason I don't like using the configuration section is that it makes debugging much much harder.
Debugging? Please explain how it relates to the configuration and its presentation (i.e. preview)
2.
Yes in theory, but in your case it will not change things, unless these "configurations" are copied on any Task (which is just storage, otherwise no real harm)
3.
I was thinking of a "zip" file that the Task creates and uploads, and a new configuration type, say "external/zip", and in the config section have something like
url:
target: ./
wdyt?
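To picture the proposed flow, here is a minimal local sketch of the two halves: packing a folder into a zip (what the Task would create and upload) and unpacking it into a `target` folder (what the agent would do before execution). The "external/zip" type is only a proposal in this thread, so nothing below is an existing ClearML feature; `pack_folder` and `unpack_to_target` are hypothetical names, and the upload/download of the zip to the `url` is left out.

```python
import zipfile
from pathlib import Path


def pack_folder(source_dir: Path, zip_path: Path) -> Path:
    """Zip every file under source_dir, storing paths relative to source_dir.

    zip_path is assumed to live outside source_dir, otherwise the
    archive would try to include itself.
    """
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for file_path in sorted(source_dir.rglob("*")):
            if file_path.is_file():
                arcname = file_path.relative_to(source_dir).as_posix()
                archive.write(file_path, arcname)
    return zip_path


def unpack_to_target(zip_path: Path, target_dir: Path) -> None:
    """Extract the archive into target_dir, recreating the relative layout."""
    target_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target_dir)
```

Because paths inside the archive are relative, unpacking into `target: ./` would restore the same layout the code expects locally, which is what keeps relative references between config files working.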
Honestly, this is all related to issue #340.
makes total sense.
But actually this is different from #340. The feature is to store the data on the Task, which means each Task in your "pipeline" will upload a new copy of the data. No?
I'd suggest some `task.detach()` method for remote execution maybe
That is a good idea, in theory it can also be used in local execution
It's okay. I was originally hoping to delete my "initializer" task, but I'll just archive it if someone is interested in the worker data etc. Setting the queue is quite nice.
I think this should get my team excited enough
Feels like we've been over this
LOL, I think I can't wrap my head around the use case
When running locally, this is "out of the box", as we can init and close before and after each model.
I finally got it! `Task.init` should be dubbed "init Main task"; the automagic kicks in only when it is the only one existing. Your remote execution is "linear", Task after Task, in theory a good candidate for a pipeline.
Basically option (2) , the main task is being "replaced" (which locally would be task.close), right?
Your use case is a good example of where `Task.create` fits (which you use), as it doesn't mean it is the "main" Task, but as a result, all reports have to be done manually.
How would you improve the current state? (take into account that when the agent spins the remote Task, it will continue to report console outputs, and monitor the status of the Original task)
Are they ephemeral or later used by other Tasks, execution etc ?
For example: configuration files, they are specific for an execution, and someone will edit them.
Initial weights files are something that multiple executions might need, and they will be used to restore an execution. Data, even if changing, is usually used by multiple executions, tasks, etc.
It seems like you treat these files as "configurations", is that right ?
Feels like we've been over this. Have there been new developments perhaps?
It's essentially that this - https://clear.ml/docs/latest/docs/guides/advanced/multiple_tasks_single_process cannot work in a remote execution.
AgitatedDove14 the issue was that we'd like the remote task to be able to spawn new tasks, which it cannot do if I use `Task.init` before `override_current_task_id(None)`.
When would this callback be called? I'm not sure I understand the usecase.
The new task is not running inside a new subprocess. Our platform trains several models, and we'd like each of them to be tracked in their own `Task`. When running locally, this is "out of the box", as we can init and close before and after each model.
When running remotely, one cannot close the main task (since it is what orchestrates everything), and so this workaround was needed.
It's always the details... Is the new Task running inside a new subprocess ?
Basically there is a difference between:
1. remote task spawning new tasks (as subprocesses, or as jobs on a remote machine), remote task still running
2. remote task is being replaced by a spawned task (same process?!)
UnevenDolphin73 am I missing a 3rd option? Which of these is your case?
P.S. I have a suspicion that there might be a misuse of "Task" here?! What are you considering a Task? (From the ClearML perspective a Task is not a function; a Task is an entire standalone process, usually not a very short one.)
So one can override the queue ID but not the worker
apparently ... I can't think of a good reason for that actually ...
Regarding the artifact, yes, that makes sense. I guess this is why there is an "input" type for an artifact; the actual use case was never found (I guess until now?! What is your point there?)
Regarding the configuration
It's very useful for us to be able to see the contents of the configuration and understand
Wouldn't that just do exactly what you are looking for:
`local_config_file_that_i_can_always_open = task.connect_configuration("important", "/path/to/config/I/only/have/on/my/machine")
with open(local_config_file_that_i_can_always_open, 'rt') as f:
    ...  # do something`
This means that when running locally:
`local_config_file_that_i_can_always_open == "/path/to/config/I/only/have/on/my/machine"`
And when running with an agent:
`local_config_file_that_i_can_always_open == "/tmp/config/file/from_my_machine.stuff"`
wdyt?
Most of these are configurations (specific for an execution, but one such configuration defines multiple tasks). Some models might be uploaded if the user does not use our built-in link to ClearML model fetching.
But since this has come up a lot recently, any updates on #340?
UnevenDolphin73
we'd like the remote task to be able to spawn new tasks,
Why is this an issue? this should work out of the box ?