Actually, it already fails if my tasks themselves import modules that I wrote myself. For example, say I have the following files in my repo:
pipeline.py
task1.py
task2.py
utils.py
where pipeline.py imports task1 and task2, and task1 imports utils. Then I get a ModuleNotFoundError on the remote agent.
Hi @<1724960468822396928:profile|CumbersomeSealion22> , what was the structure that worked previously for you and what is the new structure?
Hi @<1724960468822396928:profile|CumbersomeSealion22>
As soon as I refactor my project into multiple folders, where on top-level I put my pipeline file, and keep my tasks in a subfolder, the clearml agent seems to have problems:
Notice that you need to specify the git repo for each component. If you have a process (step) with more than a single file, you have to have those files inside a git repository, otherwise the agent will not be able to bring them to the remote machine
Yes, I do have my files in the git repo. Although I have not quite understood which part it takes from the remote git repo, and which part it takes from my local system.
It seems that one also needs to explicitly hand in the git repo in the pipeline and task definitions via PipelineController, since otherwise the agent starts the task process in some arbitrary working directory, where, of course, it cannot find any of my own modules.
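To illustrate the failure mode described here, the following is a minimal sketch using only the Python standard library, not ClearML itself. It recreates task1.py and utils.py from the example above in a temporary folder, then runs "import task1" once from inside that folder and once from an unrelated working directory. The helper names run_import and demo, and the folder names "repo" and "elsewhere", are made up for this illustration.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_import(cwd: Path):
    """Run `import task1` in a fresh interpreter started from `cwd`.

    With `python -c`, the current working directory is placed at the
    front of sys.path, so only modules located in `cwd` are importable.
    """
    proc = subprocess.run(
        [sys.executable, "-c", "import task1"],
        cwd=cwd,
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stderr


def demo():
    """Return (result_inside_repo, result_outside_repo)."""
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp) / "repo"            # stands in for the git checkout
        elsewhere = Path(tmp) / "elsewhere"  # stands in for an unrelated working dir
        repo.mkdir()
        elsewhere.mkdir()

        # Same layout as in the question: task1 imports utils.
        (repo / "utils.py").write_text("VALUE = 1\n")
        (repo / "task1.py").write_text("import utils\n")

        return run_import(repo), run_import(elsewhere)


if __name__ == "__main__":
    inside, outside = demo()
    print("from repo root:", inside[0])
    print("from elsewhere:", outside[0])
    print(outside[1])
```

Started from the folder that contains the files, the import succeeds. Started from anywhere else, the interpreter has no way to locate task1 or utils, which matches the error seen on the agent when the step is not tied to the repository.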
And a second thing: When starting a pipeline, it seems that ClearML would take my local virtual environment and extract its dependencies from that, independent of the requirements.txt or pyproject.toml I have in my repo. I noticed this because I happened to have installed my own project locally into my virtual env (let's call it foo==0.1), and when I started the pipeline, the agent looked for a package called foo==0.1, which it (obviously) didn't find on PyPI, so the pipeline failed. Can I tell ClearML not to use my local virtual env, but rather to install what it needs only from the requirements.txt or pyproject.toml files?
And a third thing: The pipeline installs dependencies. Also, the tasks install dependencies, too. How do I avoid redundant dependency installations?
Yes, I do have my files in the git repo. Although I have not quite understood which part it takes from the remote git repo, and which part it takes from my local system.
it will do "git pull" on the remote machine and then apply any uncommitted changes it has stored in the Task
It seems that one also needs to explicitly hand in the git repo in the pipeline and task definitions via PipelineController,
Correct, unless the pipeline logic and the steps are in the same git repo. You can verify this by clicking on the details of each step and checking what is listed under the repo section in the Execution tab.
And a second thing: When starting a pipeline, it seems that ClearML would take my local virtual environment and extract its dependencies from that, independent of the requirements.txt
Correct. If you want to disable this behaviour, pass packages=False to the decorator, and the agent will default to the packages in your git repo.
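Putting the two answers together, a step definition might look roughly like the sketch below. It assumes a reasonably recent ClearML version in which PipelineDecorator.component accepts repo, repo_branch and packages=False. The repository URL, the branch name, the step function and the utils.process helper are all placeholders invented for this example, not values from this thread.

```python
# Keyword arguments shared by the pipeline steps.
# The URL and branch are hypothetical placeholders; use your own repository.
STEP_KWARGS = {
    "repo": "https://github.com/your-org/your-project.git",
    "repo_branch": "main",
    # Per the reply above: do not derive packages from the local environment,
    # let the agent fall back to what the git repo specifies.
    "packages": False,
    "return_values": ["result"],
}


def build_step():
    """Create a pipeline component bound to the repository above."""
    # Imported here so the settings can be inspected without clearml installed.
    from clearml.automation.controller import PipelineDecorator

    @PipelineDecorator.component(**STEP_KWARGS)
    def step_one(x):
        # The import sits inside the function body on the assumption that the
        # step runs as its own task, from the root of the cloned repository.
        import utils  # your own module from the repo

        return utils.process(x)  # hypothetical helper

    return step_one
```

Keeping the shared arguments in one dictionary is only a convenience so that every step points at the same repository and uses the same packages setting.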