I'm New To Using Datasets, If My Git Project Root Is

Answered

I'm new to using datasets. If my git project root is myProject and I expect file.json to be at the root level, how do I accomplish this?

  				
Posted 2 years ago by BoredHedgehog47


Answers 18

OK, but if you were to run it on a different machine (or as a different user!), it wouldn't work.

  				
Posted 2 years ago by AbruptCow41

Or is there an easier way?

  				
Posted 2 years ago by BoredHedgehog47

ok good to know

  				
Posted 2 years ago by BoredHedgehog47

It is not cached directly in the ~/.clearml folder. There are some directories inside (one for storage, one for pip, another for venvs, etc.).

So in your case it would be stored in ~/.clearml/cache/storage_manager/datasets/ds_{ds_id}/my_file.json
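As a rough illustration of why hard-coding that location is fragile, the sketch below rebuilds the path from its parts. The helper name `cached_file_path` and the `home` parameter are hypothetical, and the directory layout is taken only from the path quoted above; the real layout may differ between ClearML versions or cache settings.

```python
import os


def cached_file_path(ds_id, filename, home=None):
    """Rebuild the cache path quoted above for a given dataset ID.

    Hypothetical helper: the layout is assumed from the path in this thread.
    """
    # Defaults to the current user's home directory, so the result changes
    # from user to user and from machine to machine.
    home = home or os.path.expanduser("~")
    return os.path.join(
        home, ".clearml", "cache", "storage_manager",
        "datasets", f"ds_{ds_id}", filename,
    )
```

The path depends on both the home directory and the dataset ID, so a literal string copied from one machine is unlikely to stay valid elsewhere.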

  				
Posted 2 years ago by AbruptCow41

you would, but I’d advise against it, since that is not the intended way

  				
Posted 2 years ago by AbruptCow41

After proving we can run our training, I would then advise we update our code base

  				
Posted 2 years ago by BoredHedgehog47

Thanks!

  				
Posted 2 years ago by BoredHedgehog47

ClearML downloads and caches datasets under the ~/.clearml/ folder, so yes, you need to modify your code.
from clearml import Dataset
import os
dataset_folder = Dataset.get(dataset_project="<project name>", dataset_name="<dataset name>", dataset_version="<version>").get_local_copy()
file_json_path = os.path.join(dataset_folder, 'file.json')
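If existing scripts expect file.json at the project root, one possible approach is to fetch the dataset folder and copy the file into place before the rest of the code runs. This is only a sketch: the helper names `get_dataset_folder` and `place_at_project_root` are hypothetical, and it assumes `Dataset.get` accepts `dataset_project` and `dataset_name` as in recent ClearML releases.

```python
import os
import shutil


def get_dataset_folder(dataset_project, dataset_name):
    """Return the local cached folder of a ClearML dataset (hypothetical helper)."""
    # Imported lazily so the copy helper below works without ClearML installed.
    from clearml import Dataset
    dataset = Dataset.get(dataset_project=dataset_project, dataset_name=dataset_name)
    return dataset.get_local_copy()


def place_at_project_root(dataset_folder, project_root, filename="file.json"):
    """Copy one file from the dataset folder to the project root."""
    src = os.path.join(dataset_folder, filename)
    dst = os.path.join(project_root, filename)
    shutil.copyfile(src, dst)  # overwrites an existing file at dst
    return dst
```

A call such as `place_at_project_root(get_dataset_folder("<project name>", "<dataset name>"), os.getcwd())` at the top of the entry script would leave the rest of the code unchanged, assuming the script is run from the project root.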

  				
Posted 2 years ago by AbruptCow41

Could I simply reference the files by name and pass in a string such as ~/.clearml/my_file.json?

  				
Posted 2 years ago by BoredHedgehog47

You can save it as a dataset and then fetch it at run time, or am I missing something?
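A minimal sketch of the "save it as a dataset" step might look like the following. The function name `upload_file_as_dataset` and the `dataset_cls` parameter are hypothetical (the latter exists only so the flow can be exercised without a ClearML server); the `create`, `add_files`, `upload`, and `finalize` calls are assumed to behave as in recent ClearML releases.

```python
def upload_file_as_dataset(path, dataset_project, dataset_name, dataset_cls=None):
    """Create a dataset containing one file and return the dataset ID."""
    if dataset_cls is None:
        # Imported lazily; requires ClearML to be installed and configured.
        from clearml import Dataset as dataset_cls
    ds = dataset_cls.create(dataset_name=dataset_name, dataset_project=dataset_project)
    ds.add_files(path=path)  # register the local file with the dataset
    ds.upload()              # push the file contents to storage
    ds.finalize()            # close the dataset so it can be fetched later
    return ds.id
```

This would typically be run once from a machine that has file.json, after which the training code can fetch the dataset by project and name.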

  				
Posted 2 years ago by CostlyOstrich36

I’m afraid I don’t think there is a way around this without modifying your code.

  				
Posted 2 years ago by AbruptCow41

So it caches any files that are under the same project name to ~/.clearml/?

  				
Posted 2 years ago by BoredHedgehog47

do I have to fetch it via code? I was hoping to not modify my scripts

  				
Posted 2 years ago by BoredHedgehog47

Sure. My git repo myProject.git does not have file.json checked into VCS. I'd like to add this file at experiment runtime or equivalent.

  				
Posted 2 years ago by BoredHedgehog47

Can you please elaborate on what you mean?

  				
Posted 2 years ago by CostlyOstrich36

I wouldn't be able to pass in ~/.clearml/cache/storage_manager/datasets/ds_{ds_id}/my_file.json as an argument?

  				
Posted 2 years ago by BoredHedgehog47

This would be a short term solution as we build a proof of concept

  				
Posted 2 years ago by BoredHedgehog47

I assumed I would need to upload it and then reference it somehow?

  				
Posted 2 years ago by BoredHedgehog47
