I mean that I have a script for a data preprocessing task where I need the following dependencies:
```python
import sys
from pathlib import Path
from contextlib import contextmanager

import numpy as np
from clearml import Task

# Make the custom helpers module importable from its local directory.
with add_temporary_module_search_path("/home/user/myclearML/"):
    from helpers import (
        read_netcdf_dataset,
        write_records,
    )
```
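(For context, `add_temporary_module_search_path` is a small custom context manager; a minimal sketch, assuming it just prepends the directory to `sys.path` for the duration of the block:)

```python
@contextmanager
def add_temporary_module_search_path(path: str):
    """Temporarily add a directory to the front of sys.path."""
    sys.path.insert(0, path)
    try:
        yield
    finally:
        sys.path.remove(path)
```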
However, the `xarray` package is a dependency of the `helpers` module (it is required by the `read_netcdf_dataset` function). Since `helpers` is a custom module that is imported into the preprocessing task script, ClearML is unable to detect `xarray` as a dependency and, therefore, does not install it in the environment it creates for the preprocessing task.
That is the reason why I add the `Task.add_requirements` call, to indicate to the agent that I will need those dependencies.
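In the preprocessing script that part looks something like this (a minimal sketch; the project/task names and the pinned version are placeholders):

```python
from clearml import Task

# Must be called before Task.init() so the agent picks it up
# when it builds the task's environment.
Task.add_requirements("xarray")
# Or pin a specific version (version shown is illustrative):
# Task.add_requirements("xarray", "2024.1.0")

task = Task.init(project_name="my_project", task_name="preprocessing")
```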
The problem is that the agent reinstalls all the requirements for the next task (the training task), even though both tasks share the same environment. So my question is whether there is a way to tell the `PipelineController` to generate the package environment only once and reuse it for the training task as well.
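For reference, the pipeline is wired up roughly like this (a sketch; project, task, and step names are placeholders):

```python
from clearml import PipelineController

pipe = PipelineController(
    name="preprocess-and-train",
    project="my_project",
    version="1.0",
)
# First step: the preprocessing task whose environment is built from add_requirements.
pipe.add_step(
    name="preprocess",
    base_task_project="my_project",
    base_task_name="preprocessing",
)
# Second step: the training task, which currently triggers a full reinstall
# even though it needs the same environment as the preprocessing step.
pipe.add_step(
    name="train",
    parents=["preprocess"],
    base_task_project="my_project",
    base_task_name="training",
)
pipe.start()
```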