Hi again.
Thanks for the previous replies and links, but I haven't been able to find the answer to my question: how do I prevent ClearML from saving the content of a URI returned by a task at all?
I'm using this simplified snippet (it avoids fastai and large data):
```python
from clearml.automation.controller import PipelineDecorator
from clearml import TaskTypes


@PipelineDecorator.component(
    return_values=["run_datasets_path"], cache=False, task_type=TaskTypes.data_processing
)
def make_dataset(datasets_path, run_id):
    from pathlib import Path

    # Write the (potentially very large) output under <datasets_path>/<run_id>
    run_datasets_path = Path(datasets_path) / run_id
    run_datasets_path.mkdir(parents=True, exist_ok=True)
    with open(run_datasets_path / 'very_large_data_file.txt', 'w') as fp:
        fp.write('large amount of data\n')
    return run_datasets_path


@PipelineDecorator.pipeline(
    name="test_pipeline",
    project="lavi_evaluation",
    version="0.2",
)
def fastai_image_classification_pipeline(datasets_path, run_id):
    print("make dataset")
    run_dataset_path = make_dataset(datasets_path=datasets_path, run_id=run_id)
    print(f"ret run_dataset_path: {run_dataset_path}")
    print("pipeline complete")


if __name__ == "__main__":
    from pathlib import Path
    PipelineDecorator.run_locally()
    fastai_image_classification_pipeline("/data/my_datasets_path", 'run_id_1')
```

The contents of `run_datasets_path` are zipped and saved to the ClearML files server. I want them to go nowhere, not even to some alternative location.

The return value of my task is also changed: instead of the path where my task writes its files, I get back the cache path that ClearML uses. I'd like to understand why this happens (and how to avoid it). I'd also like to know why caching is applied even though the decorator specifies `cache=False`.

Help very much appreciated. I know that in real scenarios the data generated within a node would need to go somewhere or it will be deleted, but I'd like to see how this can be controlled, both with and without ClearML automation.
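For the "without ClearML automation" side, a minimal sketch of the same flow with the decorators removed might look like this. Because both functions are plain Python, the caller receives exactly the local path that `make_dataset` returns, and nothing is zipped, uploaded, or cached. The function names mirror the snippet in the question; `run_pipeline_without_clearml` is a hypothetical wrapper, and a temporary directory stands in for `/data/my_datasets_path` so the sketch runs anywhere.

```python
import tempfile
from pathlib import Path


def make_dataset(datasets_path, run_id):
    """Plain-Python version of the component: no ClearML decorator is involved."""
    run_datasets_path = Path(datasets_path) / run_id
    run_datasets_path.mkdir(parents=True, exist_ok=True)
    with open(run_datasets_path / "very_large_data_file.txt", "w") as fp:
        fp.write("large amount of data\n")
    # The caller gets this exact local path back; no copy of the data is made.
    return run_datasets_path


def run_pipeline_without_clearml(datasets_path, run_id):
    """Hypothetical stand-in for the decorated pipeline function."""
    print("make dataset")
    run_dataset_path = make_dataset(datasets_path=datasets_path, run_id=run_id)
    print(f"ret run_dataset_path: {run_dataset_path}")
    print("pipeline complete")
    return run_dataset_path


if __name__ == "__main__":
    # A temporary directory stands in for /data/my_datasets_path in this sketch.
    with tempfile.TemporaryDirectory() as tmp:
        run_pipeline_without_clearml(tmp, "run_id_1")
```

Without ClearML, deciding where the data lives and when it gets cleaned up is entirely up to the caller; here the temporary directory is deleted when the `with` block exits.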