Dear Community, I Have Tried To Use

Dear community,
I have tried to use clearml-data sync to update a previously created ClearML dataset with a folder's contents. Nothing had changed since I created the dataset, and this was the output:

clearml-data - Dataset Management & Versioning CLI
Syncing dataset id <ID> to local folder .
Generating SHA2 hash for x files
100%|█████████████████████████████████████████| x/x [x<x,  xit/s]
Hash generation completed
Sync completed: 0 files removed, 0 added, 0 modified
Finalizing dataset
Pending uploads, starting dataset upload to 

File compression and upload completed: total size 0 B, 0 chunk(s) stored (average size 0 B)
clearml - INFO - No pending files, skipping upload.
clearml.Task - ERROR - Action failed <400/110: tasks.add_or_update_artifacts/v2.10 (Invalid task status: expected=created, status=completed)> (task=<ID>, artifacts=[{'key': 'state', 'type': 'dict', 'uri': '
<PROJECT_NAME>/.datasets/DATASET_NAME/DATASET_FOLDER/artifacts/state/state.json', 'content_size': xxx, 'hash': 'xxxx', 'timestamp': xxx, 'type_data': {'preview': 'Dataset state\nFiles added/modified: x - total size x MB\nCurrent dependency graph: {\n  "xxx": []\n}\n', 'content_type': 'application/json'}, 'display_data': [('files added', 'x'), ('files modified', '0'), ('files removed', '0')]}], force=True)

Is it normal that it crashes? Did I do something wrong, or is it related to the fact that there was no change?
Thanks in advance!

  
  
Posted 2 months ago

Answers 4


Hi @<1523701435869433856:profile|SmugDolphin23> , and thank you for your prompt response.

To make sure I understand: what is the intended workflow if I want to keep the same dataset (which should therefore have the same name as before, with everything else similar) but generate a new version of it? Is this what a child dataset is meant for, or does it mean that I should not have finalised my dataset in the first place? If the latter, how am I supposed to know when I can finalise a dataset?

I am particularly puzzled because the documentation of clearml-data sync says "This option is useful in case a user has a single point of truth (i.e. a folder) which updates from time to time", which suggests to me that I can run it regularly whenever I update my "truth folder". But the documentation also states "This command also uploads the data and finalizes the dataset automatically.", which implies that afterwards I can no longer use this command. Did I misunderstand something?

Thank you in advance for your support!

  
  
Posted 2 months ago

Thanks for your answers!

  
  
Posted 2 months ago

@<1668427963986612224:profile|GracefulCoral77> You can either create a child dataset or keep working with the same dataset, as long as it is not finalized.
You can skip the finalization using the --skip-close argument. Anyhow, I can see why the current workflow is confusing. I will discuss it with the team; maybe we should allow syncing unfinalized datasets as well.
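For example, a sync that keeps the dataset open for later updates might look roughly like this (a sketch only: the dataset ID is a placeholder, and exact flags can vary between clearml versions, so check clearml-data sync --help on yours):

```shell
# Sync the local folder into an existing, still-unfinalized dataset,
# skipping the automatic finalization so the dataset can be synced again later.
clearml-data sync --id <DATASET_ID> --folder . --skip-close

# ...repeat the sync above whenever the "truth folder" changes...

# Once the dataset is truly complete, upload and finalize it explicitly.
clearml-data close
```

With --skip-close the dataset stays in the "created" state, so subsequent syncs avoid the tasks.add_or_update_artifacts error shown above.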

  
  
Posted 2 months ago

Hi @<1668427963986612224:profile|GracefulCoral77>! The error is a bit misleading. What it actually means is that you shouldn't attempt to modify a finalized ClearML dataset (I suppose that is what you are trying to achieve). Instead, you should create a new dataset that inherits from the finalized one and sync that dataset, or leave the dataset in an unfinalized state.
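A minimal sketch of the child-dataset approach (project and dataset names here are placeholders, and flags may differ slightly across clearml versions):

```shell
# Create a new dataset version that inherits from the finalized one.
clearml-data create --project "MyProject" --name "MyDataset" --parents <FINALIZED_DATASET_ID>

# Sync the truth folder into the newly created (still-open) child dataset;
# only the changes relative to the parent are stored.
clearml-data sync --folder .
```

The child gets its own ID, so the finalized parent stays immutable while the new version accumulates the changes.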

  
  
Posted 2 months ago