Hi, A Question About Dataset Storage Suppose I Create A Dataset Like This

Answered

Hi, a question about dataset storage

Suppose I create a dataset like this:
from clearml import Dataset
dataset = Dataset.create(dataset_name='my_name', dataset_project='my_project')
What I expect: a dataset with the name my_name in the project my_project
What I get: a dataset with the name my_name in the project my_project/.datasets/my_name

What am I doing wrong? Why such a project name?

The client version is 1.9.1. The problem doesn't reproduce on some older versions.

  				
Posted 2 years ago by MelancholyElk85


Answers 8

Hi MelancholyElk85
So the way datasets work now is that each dataset is actually an entity (folder) inside a project, all under the hidden .datasets subproject.
This is so that data and tasks are both in the same project, but at the same time will not intersect with subprojects of the same name. Does that make sense?
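A minimal sketch of the layout described above, assuming (as this thread shows for client 1.9.1) that a dataset named my_name in my_project is backed by the hidden project my_project/.datasets/my_name. The helper hidden_dataset_project is hypothetical and not part of the clearml SDK; it only illustrates the path scheme.

```python
def hidden_dataset_project(dataset_project: str, dataset_name: str) -> str:
    """Return the hidden project path that backs a dataset (hypothetical helper).

    Mirrors the layout observed in this thread: each dataset gets its own
    subproject, named after the dataset, under a hidden ".datasets" subproject.
    """
    # Strip stray slashes so "my_project/" and "my_project" map to the same path
    project = dataset_project.strip("/")
    return f"{project}/.datasets/{dataset_name}"


if __name__ == "__main__":
    print(hidden_dataset_project("my_project", "my_name"))
```

Because the dataset's subproject sits under .datasets, it cannot collide with a regular subproject such as my_project/my_name.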

  				
Posted 2 years ago by AgitatedDove14

AgitatedDove14 I still have the name my_name, but the project name is my_project/.datasets/my_name rather than my_project/.datasets

  				
Posted 2 years ago by MelancholyElk85

> I still have name my_name, but the project name my_project/.datasets/my_name rather than my_project/.datasets

Yes, this is the expected behavior.

> And I don't see any new projects / subprojects where that dataset creation Task is stored

They are marked "hidden", so by default you cannot see them in the UI (they will only appear in the Datasets page).
You can show hidden projects in the UI by going to your Settings page, selecting "Configuration", and then clicking "Show Hidden Projects".

> And also to resolve the project name depending on the version of a client

If the client is using Dataset.get(dataset_project="my_project", dataset_name="my_name") with the new clearml SDK, it should work.

  				
Posted 2 years ago by AgitatedDove14

And I don't see any new projects / subprojects where that dataset creation Task is stored

Previously I had a separate, manually created project where I stored all newly created datasets for my main project. Very neat.

Now the task is visible only in the "All Experiments" section, but there is no separate project in the web UI where I could see it...

  				
Posted 2 years ago by MelancholyElk85

AgitatedDove14 yeah, makes sense, that would require some refactoring in our projects though...

But why is my_name a subproject? Why not just my_project/.datasets?

  				
Posted 2 years ago by MelancholyElk85

The refactoring is to account for the new project names, and also to resolve the project name depending on the version of a client.
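A hypothetical sketch of that client-side resolution, assuming the only difference between client versions is whether the stored project name carries the /.datasets/<name> suffix (older clients: plain my_project; 1.9.1: my_project/.datasets/my_name). user_facing_project is an illustrative name, not a clearml API; as AgitatedDove14 points out, Dataset.get in the newer SDK is meant to handle this internally.

```python
def user_facing_project(stored_project: str) -> str:
    """Map a stored project name back to the project the user passed in (hypothetical).

    Assumes newer clients store datasets under "<project>/.datasets/<name>",
    while older clients stored them directly under "<project>".
    """
    marker = "/.datasets/"
    if marker in stored_project:
        # New layout: keep everything before the hidden ".datasets" segment
        return stored_project.split(marker, 1)[0]
    # Old layout: the stored name already is the user-facing project
    return stored_project


if __name__ == "__main__":
    print(user_facing_project("my_project/.datasets/my_name"))
    print(user_facing_project("my_project"))
```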

  				
Posted 2 years ago by MelancholyElk85

AgitatedDove14 is this expected behavior?

  				
Posted 2 years ago by MelancholyElk85

Why would that require refactoring? The Dataset class should take care of it internally, no?
The reason my_name is a subproject is so that every version can be a "Task" inside that project; it's just easier to manage (or at least that was the idea).
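A toy model of that design, assuming each dataset version is one task record whose project is the dataset's hidden subproject. TaskRecord and dataset_versions are hypothetical stand-ins, not clearml classes. The point is that with one subproject per dataset, listing every version of my_name is a single project-equality filter, and a regular my_project/my_name subproject is not picked up.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TaskRecord:
    """Minimal stand-in for a task entry (hypothetical, not a clearml class)."""
    project: str
    name: str


def dataset_versions(tasks: List[TaskRecord], dataset_project: str, dataset_name: str) -> List[str]:
    """Return names of the tasks that live in one dataset's hidden subproject."""
    target = f"{dataset_project.strip('/')}/.datasets/{dataset_name}"
    # One project per dataset means one equality check finds every version
    return [t.name for t in tasks if t.project == target]


tasks = [
    TaskRecord("my_project/.datasets/my_name", "v1"),
    TaskRecord("my_project/.datasets/my_name", "v2"),
    TaskRecord("my_project/.datasets/other", "v1"),
    TaskRecord("my_project/my_name", "training run"),
]
print(dataset_versions(tasks, "my_project", "my_name"))  # ['v1', 'v2']
```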

  				
Posted 2 years ago by AgitatedDove14


2K Views
