Well actually I have tried a different approach and it works.
`
import os
from clearml import Dataset, Task, TaskTypes

# args, OBJECT_NAME and DATA_RAW are defined earlier in the script
task = Task.init(
    project_name=args['cml_project_name'],
    task_type=TaskTypes.data_processing,
    task_name=f'Dataset for {os.path.basename(OBJECT_NAME)}',
    tags=args['cml_tags'].split(','),
    output_uri=args['cml_output_uri'],
    auto_connect_frameworks=True,
)
dataset = Dataset.create(
    dataset_name=os.path.basename(OBJECT_NAME),
    dataset_tags=args['cml_tags'].split(','),
    dataset_project=args['cml_project_name'],
)
dataset.add_files(DATA_RAW, verbose=True)
# upload data to s3
dataset.upload(output_url=args['cml_output_uri'])
dataset.finalize(verbose=True)
task.close()
`
With this approach there is a master task and a separate task for dataset creation, but cloning the master task and sending it to an agent works just fine.
GentleSwallow91, you can also use Task.create():
https://clear.ml/docs/latest/docs/references/sdk/task#taskcreate
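To make the Task.create() suggestion concrete, here is a rough sketch rather than a tested recipe. It assumes `args` is a plain dict with the same keys as in the snippet above; `script_path` and `queue_name` are hypothetical placeholders for your own script and agent queue. The clearml import is kept inside the function so the keyword-building helper can be checked without a server.

```python
import os


def build_create_kwargs(args, object_name, script_path):
    """Collect the keyword arguments passed on to Task.create()."""
    return {
        "project_name": args["cml_project_name"],
        "task_name": f"Dataset for {os.path.basename(object_name)}",
        "task_type": "data_processing",  # Task.create() takes the type as a string
        "script": script_path,  # hypothetical path to the script the agent should run
    }


def create_and_enqueue(args, object_name, script_path, queue_name):
    """Create a draft task without running it locally, then hand it to an agent."""
    from clearml import Task  # imported lazily; needs a configured ClearML setup

    task = Task.create(**build_create_kwargs(args, object_name, script_path))
    task.add_tags(args["cml_tags"].split(","))
    # queue_name is an assumption: use whichever queue your agent listens on
    Task.enqueue(task, queue_name=queue_name)
    return task
```

Unlike Task.init(), this only registers the task on the server, so nothing is executed on the machine that calls it.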
You could probably either start the task first (using Task.init) and then set the parameters if needed, or attach the dataset to the task itself.
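A possible way to combine both ideas is sketched below, under a few assumptions: `args` is a dict with the keys used earlier in the thread, `data_dir` stands in for DATA_RAW, and "attach the dataset to the task" is read as passing `use_current_task=True` to Dataset.create(), which creates the dataset on the current task instead of a separate one. `parse_tags` is a small hypothetical helper, not part of ClearML. I have not checked how finalize() affects the task's status in this mode, so treat it as a starting point.

```python
import os


def parse_tags(raw):
    """Split a comma-separated tag string, dropping blanks and stray spaces."""
    return [tag.strip() for tag in raw.split(",") if tag.strip()]


def build_dataset_on_current_task(args, object_name, data_dir):
    """Start the task first, register its parameters, then build the dataset on it."""
    from clearml import Dataset, Task  # imported lazily; needs a configured ClearML setup

    name = os.path.basename(object_name)
    task = Task.init(
        project_name=args["cml_project_name"],
        task_name=f"Dataset for {name}",
        task_type=Task.TaskTypes.data_processing,
        output_uri=args["cml_output_uri"],
    )
    # Registering the dict makes the values editable on a cloned task
    task.connect(args)

    dataset = Dataset.create(
        dataset_name=name,
        dataset_project=args["cml_project_name"],
        dataset_tags=parse_tags(args["cml_tags"]),
        use_current_task=True,  # assumption: this is what "attach" means here
    )
    dataset.add_files(data_dir, verbose=True)
    dataset.upload(output_url=args["cml_output_uri"])
    dataset.finalize(verbose=True)
    return dataset.id
```

With this layout there is a single task to clone and enqueue, instead of a master task plus a dataset task.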