Answered

Hey, I have a pipeline defined from code, but I have a problem with caching. ClearML didn't cache already executed steps (I tried to re-run the pipeline from the web UI). Did I miss something?

pipe.add_function_step(
    name='download_data',
    function=download_data_step,
    function_kwargs=dict(
        dataset_version='${pipeline.dataset_version}',
        file_name='${pipeline.file_name}',
    ),
    function_return=['data_path'],
    cache_executed_step=True,
    repo=repo,
    repo_branch=repo_branch,
    working_dir=working_dir
)

Also, I have a problem with pipeline execution on my local machine, specifically with imports. It looks like ClearML does not know about the imports used by the tasks (in my example, task1.py imports utils.some_utils_functions.py). If I add the absolute path of utils to sys.path it works, but I do not like this solution. The problem goes away if I execute the pipeline on a remote machine and pass the git repo, branch, and working dir as arguments to add_function_step, but I didn't manage to solve it on my local machine. Here is my dir structure:

tasks
  utils
    some_utils_functions.py
  task1.py
  task2.py
pipeline_controller.py

I start the pipeline with:

pipe.start_locally(run_pipeline_steps_locally=True)
  
  
Posted 3 months ago

3 Answers


Hi @<1702492411105644544:profile|YummyGrasshopper29>! To enable caching while using a repo, you also need to specify a commit (as the repo might change, which would invalidate the caching). We will add a warning regarding this in the near future.
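
For example, here is a minimal sketch of the step from the question with the revision pinned. The repo_commit argument and the placeholder commit hash are assumptions based on the advice above, not taken from the original snippet:

pipe.add_function_step(
    name='download_data',
    function=download_data_step,
    function_kwargs=dict(
        dataset_version='${pipeline.dataset_version}',
        file_name='${pipeline.file_name}',
    ),
    function_return=['data_path'],
    cache_executed_step=True,
    repo=repo,
    repo_branch=repo_branch,
    repo_commit='<commit-sha>',  # placeholder: pin the exact revision so the cached step stays valid
    working_dir=working_dir
)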
Regarding the imports: we are aware that there are some problems when executing the pipeline remotely as described. At the moment, appending to sys.path is one of the only solutions (other than making utils a package on your local machine so it can be imported from anywhere). We will look into this as well ASAP.
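
For reference, a minimal sketch of the sys.path workaround mentioned above, assuming pipeline_controller.py sits next to the tasks directory from the question; the path handling here is an illustration, not an official ClearML fix:

# at the top of pipeline_controller.py, before the steps are defined
import sys
from pathlib import Path

# assumption: the tasks directory (which contains utils/) lives next to this file
tasks_dir = Path(__file__).resolve().parent / 'tasks'
sys.path.append(str(tasks_dir))  # makes `import utils.some_utils_functions` resolvable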

  
  
Posted 3 months ago

Thanks a lot! I do not have a problem executing the pipeline remotely; I have a problem executing it locally.

  
  
Posted 3 months ago

@<1702492411105644544:profile|YummyGrasshopper29> you could try adding the directory you start the pipeline from to the Python path. Then you would run the pipeline like this:

 PYTHONPATH="${PYTHONPATH}:/path/to/pipeline_dir" python my_pipeline.py
  
  
Posted 3 months ago