
I think it's still an issue, though not a critical one, because we have another way to do it and it works
Suppose I have the following scenario (real-world project, real ML pipeline scenario)
- I have separate projects for different steps (ETL, train, test, tensorrt conversion...). Every step has its own git repository, docker image, branch, etc.
- For quite a long time all the steps were not functioning as parts of an automated pipeline. For example, collaborative experimentation (training and validation steps). We were just focusing on reproducibility/versioning etc
- After some time, we decided...
So I had a base_task for every step. Then I wanted to modify a shitton of things, and I found no better way than to export, modify, and import to another task. This second task then serves as the step, without cloning (that was my intention)
@<1523701070390366208:profile|CostlyOstrich36> On the screenshot, the upper task has the lower task as parent
Before the code I shared, there were some lines like this
step_base_task = Task.get_task(project_name=cfg[name]['base_project'],
                               task_name=cfg[name]['base_name'])
export_data = step_base_task.export_task()
# ... modify export_data in-place ...
task = Task.import_task(export_data)
pipe.add_step(base_task_id=task.id, clone_base_task=False, ...)
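The "modify export_data in-place" part could look roughly like the sketch below. It works on a plain dictionary, so it runs without a ClearML server. The helper name prepare_step_export, the sample dictionary and the image name are made up for illustration, and only keys mentioned in this thread (script.branch, script.requirements, container.image) are touched; the dictionary returned by export_task() has many more fields.

```python
import copy


def prepare_step_export(export_data, image, branch):
    """Return a modified copy of an exported task dictionary.

    Hypothetical helper: only keys mentioned in this thread are changed.
    """
    data = copy.deepcopy(export_data)  # leave the base task export untouched
    script = data.setdefault('script', {})
    script['branch'] = branch
    # Emptied as in this thread, with the intent of avoiding installs at launch
    script['requirements'] = {}
    data.setdefault('container', {})['image'] = image
    return data


# Stand-in for step_base_task.export_task(); the values are invented
sample_export = {
    'script': {'branch': 'dev', 'requirements': {'pip': 'easydict'}},
    'container': {'image': 'python:3.8'},
}
step_export = prepare_step_export(
    sample_export, image='my-registry/my-image:tag', branch='main')
```

Working on a copy keeps the base task's export reusable for several steps; the result would then be passed to Task.import_task as in the lines above.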
@<1523701205467926528:profile|AgitatedDove14> yeah, I'll try calling task.reset() before add_step
No, IMO it's better to leave task_overrides arguments with "." - the same structure as in the dictionary we get from export_data - this is more intuitive
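To illustrate why the dotted keys feel natural, here is a minimal sketch of how a key such as 'script.branch' maps onto the nested dictionary. apply_overrides is a hypothetical helper written for this example, not ClearML's implementation, and the sample dictionary is invented.

```python
def apply_overrides(task_dict, overrides):
    """Apply {'a.b': value} style overrides to a nested dict, in place."""
    for dotted_key, value in overrides.items():
        *parents, leaf = dotted_key.split('.')
        node = task_dict
        for key in parents:
            # Walk down the nesting, creating missing levels on the way
            node = node.setdefault(key, {})
        node[leaf] = value
    return task_dict


# Invented stand-in for part of an exported task dictionary
exported = {'script': {'branch': 'dev', 'version_num': 'abc123'}}
apply_overrides(exported, {'script.branch': 'main', 'script.version_num': ''})
```

Each segment before the last "." selects one level of nesting, and the last segment names the field to set.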
In principle, I can modify almost everything with task_overrides, omitting the export part, and it's fine. But it seems that by exporting I can change more things, for example project_name
The base task is in draft status, so when I call Task.import_task it imports the draft status as well, am I right?
We digressed a bit from the original thread topic though 😆 About clone_base_task=False.
I ended up using task_overrides for every change, and this way I only need 2 tasks (a base task and a step task, thus I use clone_base_task=True and it works as expected - yay!)
So, the problem I described in the beginning can be reproduced only this way:
- to have a base task
- export_data - modify - import_data - have a second task
- pass the second task to add_step with `cl...
task = Task.import_task(export_data)
pipe.add_step(
    name=name,
    base_task_id=task.id,
    parents=parents,
    task_overrides={'script.branch': 'main', 'script.version_num': ''},
    execution_queue=pipe_cfg['step_queue'],
    cache_executed_step=True,
    clone_base_task=False
)
@<1523701205467926528:profile|AgitatedDove14> clearml 1.1.1
Yeah, of course it is in draft mode (Task.import_task creates a task in draft mode, it is the lower task on the screenshot)
Maybe being able to change 100% of things with task_overrides would be the most convenient way
SparklingElephant70 in WebUI Execution/SCRIPT PATH
Or maybe there is a log with something more informative that I could check
I specifically set it as empty with export_data['script']['requirements'] = {} in order to reduce overhead during launch. I have everything installed inside the container
It doesn't install anything with pip during launch, I'm assuming it should take everything from the container itself (otherwise there would be a huge overhead). It simply fails trying to import things in the script
File "preprocess.py", line 4, in <module>
    from easydict import EasyDict as edict
ModuleNotFoundError: No module named 'easydict'
When I launch tasks with a pipeline, they keep complaining about missing pip packages. I run it inside a docker container, and I'm sure these packages are present inside it (when I launch the container locally, run python3 and import them, it works like a charm). Any ideas how to fix this?
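One way to narrow this down is to print which interpreter the step actually runs under and whether the packages are visible to it, then compare that with the same output from a manual python3 session in the container. This is a generic Python sketch, not a ClearML feature; the function names are made up, and it rests on the assumption that the step may be running under a different interpreter or virtual environment than the manual session.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names the current interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def describe_environment(names):
    """Collect interpreter details to compare a task run with a manual run."""
    return {
        'executable': sys.executable,
        'prefix': sys.prefix,
        # True when running inside a venv created with the venv module
        'in_virtualenv': sys.prefix != sys.base_prefix,
        'missing': missing_modules(names),
    }


# Put this at the top of the step script, before the failing import
print(describe_environment(['easydict']))
```

If the executable or prefix differs between the two runs, the packages installed in the container's system Python are probably not the ones the step sees.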
I have a base task for each pipeline step. When I initialize a pipeline, for each step I clone the corresponding task, modify it and add it as a step. Tasks are launched from a pipeline, not the CLI. I'm absolutely sure the docker argument is not empty (I specify it with export_data['container']['image'] = 'http://registry.gitlab.com/cherrylabs/ml/clearml-demo:clearml', and it shows on the Web UI)