Hi, I'm having a hard time uploading files as metadata to datasets.
I need to log a dictionary with its key order preserved, but ClearML sorts the saved dictionary and the user has no control over this behavior. Hence, I'm creating a JSON file and logging it to my dataset as metadata, and that is where I'm having trouble.
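(For context, a minimal sketch of the ordering behavior I mean, assuming task.connect() as the direct logging call; the project/task names here are just for illustration:)
from clearml import Task

task = Task.init(project_name='test', task_name='ordering-demo')

# keys inserted in my intended order...
pipe_dict = {'step_2': 'third', 'step_0': 'first', 'step_1': 'second'}

# ...but the connected parameters end up saved/displayed sorted,
# so my insertion order is lost
task.connect(pipe_dict)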
my code:
import json
from clearml import Task
from clearml import Dataset
task = Task.current_task()
pipe_dict = {'step_0': 'this is a test example', 'step_1': 'this is a test example', 'step_2': 'this is a test example'}
dataset = Dataset.create(dataset_name='test', dataset_project='test')
# make my own JSON file (order preserved, not sorted by ClearML)
json_pipe = json.dumps(pipe_dict)
with open('pipeline.json', 'w') as f:
    f.write(json_pipe)
# let's see the content of the JSON file
with open('pipeline.json', 'r') as f:
    print(f.read())
# link the json file as metadata
dataset.set_metadata('pipeline.json', metadata_name='pipeline')
dataset.upload() # is this a must?
# failing when trying to get the metadata
my_pipe = dataset.get_metadata('pipeline')
print('success')
CLI command:
clearml-task --queue k8s_scheduler --project test --name test --script Scripts/clearml_tests/json_file.py --requirements Scripts/clearml_tests/requirements.txt
Req file:
clearml==1.12.2
boto3
Error (full log is attached):
Traceback (most recent call last):
File "/root/.clearml/venvs-builds/3.10/task_repository/CorrAlgo.git/Scripts/clearml_tests/json_file.py", line 32, in <module>
my_pipe = dataset.get_metadata('pipeline')
File "/root/.clearml/venvs-builds/3.10/lib/python3.10/site-packages/clearml/datasets/dataset.py", line 876, in get_metadata
return metadata.get()
File "/root/.clearml/venvs-builds/3.10/lib/python3.10/site-packages/clearml/binding/artifacts.py", line 171, in get
local_file = self.get_local_copy(raise_on_error=True, force_download=force_download)
File "/root/.clearml/venvs-builds/3.10/lib/python3.10/site-packages/clearml/binding/artifacts.py", line 240, in get_local_copy
raise ValueError(
ValueError: Could not retrieve a local copy of artifact pipeline, failed downloading
What have I tried so far:
- Running the script locally works as expected.
- Tried logging f"{task.cache_dir}/pipeline.json" instead of only "pipeline.json".
- Finalizing the dataset before getting the metadata solves the issue, but I would like to keep the dataset in uploading mode (see the sketch below).
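(This is roughly the finalize workaround I tested, continuing from the script above; it works, but it closes the dataset, which is exactly what I'm trying to avoid:)
# works, but finalize() closes the dataset and I want to keep it open for further uploads
dataset.upload()
dataset.finalize()
my_pipe = dataset.get_metadata('pipeline')
print(my_pipe)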
Would love to get some help, I'm pretty stuck here 😞
Thanks!