Hi, I'm having a hard time uploading files as metadata to datasets.
I need to log a dictionary with its key order preserved, but ClearML sorts the saved dictionary and the user has no control over this behavior. Hence, I'm creating a JSON file and logging it to my dataset as metadata, and that is where I'm having trouble.
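(For context, a minimal sketch of the ordering behavior I mean, assuming task.connect() as the direct logging call; the project/task names here are just for illustration:)
from clearml import Task

task = Task.init(project_name='test', task_name='ordering-demo')

# keys inserted in my intended order...
pipe_dict = {'step_2': 'third', 'step_0': 'first', 'step_1': 'second'}

# ...but the connected parameters end up saved/displayed sorted,
# so my insertion order is lost
task.connect(pipe_dict)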
my code:
import json
from clearml import Task
from clearml import Dataset
task = Task.current_task()
pipe_dict = {'step_0': 'this is a test example', 'step_1': 'this is a test example', 'step_2': 'this is a test example'}
dataset = Dataset.create(dataset_name='test', dataset_project='test')
# make my own JSON file (order preserved, not sorted by ClearML)
json_pipe = json.dumps(pipe_dict)
with open('pipeline.json', 'w') as f:
    f.write(json_pipe)
# let's see the content of the JSON file
with open('pipeline.json', 'r') as f:
    print(f.read())
# link the json file as metadata
dataset.set_metadata('pipeline.json', metadata_name='pipeline')
dataset.upload() # is this a must?
# failing when trying to get the metadata
my_pipe = dataset.get_metadata('pipeline')
print('success')
CLI command:
clearml-task --queue k8s_scheduler --project test --name test --script Scripts/clearml_tests/json_file.py --requirements Scripts/clearml_tests/requirements.txt
Req file:
clearml==1.12.2
boto3
Error (full log is attached):
Traceback (most recent call last):
File "/root/.clearml/venvs-builds/3.10/task_repository/CorrAlgo.git/Scripts/clearml_tests/json_file.py", line 32, in <module>
my_pipe = dataset.get_metadata('pipeline')
File "/root/.clearml/venvs-builds/3.10/lib/python3.10/site-packages/clearml/datasets/dataset.py", line 876, in get_metadata
return metadata.get()
File "/root/.clearml/venvs-builds/3.10/lib/python3.10/site-packages/clearml/binding/artifacts.py", line 171, in get
local_file = self.get_local_copy(raise_on_error=True, force_download=force_download)
File "/root/.clearml/venvs-builds/3.10/lib/python3.10/site-packages/clearml/binding/artifacts.py", line 240, in get_local_copy
raise ValueError(
ValueError: Could not retrieve a local copy of artifact pipeline, failed downloading
What have I tried so far:
- Running the script locally works as expected.
- Tried logging f"{task.cache_dir}/pipeline.json" instead of only "pipeline.json".
- Finalizing the dataset before getting the metadata solves the issue, but I would like to keep the dataset in uploading mode (see the sketch below).
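(This is roughly the finalize workaround I tested, continuing from the script above; it works, but it closes the dataset, which is exactly what I'm trying to avoid:)
# works, but finalize() closes the dataset and I want to keep it open for further uploads
dataset.upload()
dataset.finalize()
my_pipe = dataset.get_metadata('pipeline')
print(my_pipe)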
Would love to get some help, I'm pretty stuck here 😞
Thanks!