Hi All! I'm Trying To Organize My Workflow With ClearML, And I Found Out About Datasets. I Like The Concept And I Wonder If I Can Connect A Dataset To A Task / Experiment? Currently The Dataset Appears As Another Task In The Project Page. Thanks!

Answered

Hi all!
I'm trying to organize my workflow with ClearML, and I found out about Datasets.
I like the concept and I wonder if I can connect a dataset to a task / experiment? Currently the dataset appears as another task in the project page.
Thanks!

  				
Posted 2 years ago by WhoppingMole85


Answers 19

hey WhoppingMole85
Do you want to initiate a task and link it to a dataset, or simply create a dataset?

  				
Posted 2 years ago by SweetBadger76

Hey David!
I want to initiate a task and link it to an existing dataset 🙂 (The final goal is for me to be able to see which dataset each task was run on)

  				
Posted 2 years ago by WhoppingMole85

You can initiate your task as usual. When the task uses a dataset - for example, by retrieving it with Dataset.get - the dataset will be registered in the task's Info section (check in the UI) 😊
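A minimal sketch of that flow, assuming a reachable ClearML server, a recent clearml client, and an existing dataset. The function name, project name, task name, and alias are placeholders. The `alias` argument appears in the `Dataset.get` signature of newer clients; as far as I know it is what records the dataset ID on the task, so treat that part as an assumption.

```python
def run_with_dataset(dataset_id, project_name="my project", task_name="train on dataset"):
    """Start a task, then fetch an existing dataset from inside it."""
    # Imported here so the module still loads where clearml is not installed.
    from clearml import Dataset, Task

    # Start the experiment first so the dataset access is attributed to it.
    task = Task.init(project_name=project_name, task_name=task_name)

    # Retrieve the existing dataset by ID ("training_data" is a placeholder alias).
    dataset = Dataset.get(dataset_id=dataset_id, alias="training_data")

    # Download (or reuse a cached) read-only local copy of the dataset files.
    local_path = dataset.get_local_copy()
    return task, local_path
```

Calling `run_with_dataset("<your dataset id>")` at the top of a training script should be enough to have the dataset show up on the task in the UI.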

  				
Posted 2 years ago by SweetBadger76

Alright! So I followed these steps, but unfortunately I can't see anything in the Info section (neither the task's nor the dataset's). Am I missing something?

  				
Posted 2 years ago by WhoppingMole85

Can you tell me your clearml and clearml server versions, please?

  				
Posted 2 years ago by SweetBadger76

Sure!
WebApp: 1.5.0-192 • Server: 1.5.0-192 • API: 2.18

Client: 1.0.5

  				
Posted 2 years ago by WhoppingMole85

We have released a lot of versions since that one 🙂 🙂
Can you please try to upgrade to the latest clearml (1.6.2) and try again?

  				
Posted 2 years ago by SweetBadger76

Yes! I actually upgraded this morning and got this version 🤔 Is https://clear.ml/docs/latest/docs/deploying_clearml/upgrade_server_linux_mac the latest tutorial to follow?

  				
Posted 2 years ago by WhoppingMole85

Yes it is 🙂 Did you manage to upgrade?
We also added a lot of new dataset features in version 1.6.2!

  				
Posted 2 years ago by SweetBadger76

I will try to run it again and update 🙂

  				
Posted 2 years ago by WhoppingMole85

Good morning SweetBadger76 !
I ran the commands from the "how to update" page, but I still have the same versions... I tried opening the page in incognito mode to make sure it's not the cache, but I still get version 1.5.0

What am I missing?

  				
Posted 2 years ago by WhoppingMole85

Hey WhoppingMole85, good morning!
Try upgrading the client with pip:
pip install clearml -U
and then check the installed version with:
pip show clearml
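Note that the versions reported earlier in the thread list the WebApp/Server and the Client separately; the pip commands above only affect the client package. As a hedged sketch, the same check can be done from Python with the standard library. The helper names are hypothetical, and the version parsing is deliberately naive (it only compares the leading numeric parts).

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_tuple(version):
    """Turn '1.6.2' into (1, 6, 2); stops at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def client_is_at_least(minimum, package="clearml"):
    """True if the package is installed at `minimum` or newer."""
    current = installed_version(package)
    return current is not None and version_tuple(current) >= version_tuple(minimum)
```

For example, `client_is_at_least("1.6.2")` tells you whether the environment you are running in actually picked up the upgrade.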

  				
Posted 2 years ago by SweetBadger76

It worked!
But now I have another question: When I try to get the dataset by the dataset name I get this error:
` In [11]: d = Dataset.get(dataset_name='dataset name')

AttributeError Traceback (most recent call last)
Input In [11], in <cell line: 1>()
----> 1 d = Dataset.get(dataset_name='dataset name')

File ~/.local/lib/python3.8/site-packages/clearml/datasets/dataset.py:1534, in Dataset.get(cls, dataset_id, dataset_project, dataset_name, dataset_tags, only_completed, only_published, auto_create, writable_copy, dataset_version, alias, overridable, **kwargs)
1531 return dataset
1533 if not dataset_id:
-> 1534 dataset_id = cls._find_dataset_id(
1535 dataset_project=dataset_project,
1536 dataset_name=dataset_name,
1537 dataset_version=dataset_version,
1538 raise_on_error=False,
1539 dataset_tags=dataset_tags,
1540 dataset_filter=dict(
1541 system_tags=[cls.__tag, "-archived"],
1542 order_by=["-created"],
1543 type=[str(Task.TaskTypes.data_processing)],
1544 page_size=1,
1545 page=0,
1546 status=["published"]
1547 if only_published
1548 else ["published", "completed", "closed"]
1549 if only_completed
1550 else None,
1551 ),
1552 )
1553 if not dataset_id and not auto_create:
1554 raise ValueError(
1555 "Could not find Dataset {} {}".format(
1556 "id" if dataset_id else "project/name/version",
1557 dataset_id if dataset_id else (dataset_project, dataset_name, dataset_version),
1558 )
1559 )

File ~/.local/lib/python3.8/site-packages/clearml/datasets/dataset.py:2941, in Dataset._find_dataset_id(cls, dataset_project, dataset_name, dataset_version, dataset_tags, dataset_filter, raise_on_error)
2939 dataset_filter["search_hidden"] = True
2940 dataset_filter["allow_extra_fields"] = True
-> 2941 hidden_dataset_project, _ = cls._build_hidden_project_name(dataset_project, dataset_name)
2942 tasks = Task.get_tasks(
2943 project_name=hidden_dataset_project,
2944 task_name=exact_match_regex(dataset_name) if dataset_name else None,
2945 tags=dataset_tags,
2946 task_filter=dataset_filter,
2947 )
2948 if not tasks and raise_on_error:

File ~/.local/lib/python3.8/site-packages/clearml/datasets/dataset.py:2978, in Dataset._build_hidden_project_name(cls, dataset_project, dataset_name)
2965 @classmethod
2966 def _build_hidden_project_name(cls, dataset_project, dataset_name):
2967 # type: (str, str) -> Tuple[str, str]
2968 """
2969 Build the corresponding hidden name of a dataset, given its dataset_project
2970 and dataset_name
(...)
2976 is the parent project
2977 """
-> 2978 dataset_project = cls._remove_hidden_part_from_dataset_project(dataset_project)
2979 if bool(Session.check_min_api_server_version(cls.__min_api_version)):
2980 parent_project = "{}.datasets".format(dataset_project + "/" if dataset_project else "")

File ~/.local/lib/python3.8/site-packages/clearml/datasets/dataset.py:2998, in Dataset._remove_hidden_part_from_dataset_project(cls, dataset_project)
2987 @classmethod
2988 def _remove_hidden_part_from_dataset_project(cls, dataset_project):
2989 # type: (str, str) -> str
2990 """
2991 The project name contains the '.datasets' part, as well as the dataset_name.
2992 Remove those parts and return the project used when creating the dataset.
(...)
2996 :return: The project name without the '.datasets' part
2997 """
-> 2998 return dataset_project.partition("/.datasets/")[0]

AttributeError: 'NoneType' object has no attribute 'partition' `
It works well with the ID, but not with the name. Any ideas?

  				
Posted 2 years ago by WhoppingMole85

hey
You have 2 options to retrieve a dataset: by its ID, or by dataset_project AND dataset_name. Those two work together, so you need to pass both of them!
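A small sketch of those two lookup modes, using the parameter names from the `Dataset.get` signature in the traceback above (`dataset_id`, `dataset_project`, `dataset_name`). The helper names are hypothetical; the validation simply fails early with a clearer message than the `AttributeError` seen when only the name is passed.

```python
def dataset_lookup_kwargs(dataset_id=None, dataset_project=None, dataset_name=None):
    """Build keyword arguments for Dataset.get, rejecting a name without a project."""
    if dataset_id:
        # Option 1: the ID alone identifies the dataset.
        return {"dataset_id": dataset_id}
    if dataset_project and dataset_name:
        # Option 2: project and name must be passed together.
        return {"dataset_project": dataset_project, "dataset_name": dataset_name}
    raise ValueError("Pass dataset_id, or both dataset_project and dataset_name")


def get_dataset(**lookup):
    """Fetch a dataset after validating the lookup arguments."""
    # Imported lazily so the helper above can be used without clearml installed.
    from clearml import Dataset

    return Dataset.get(**dataset_lookup_kwargs(**lookup))
```

With this, `get_dataset(dataset_project="my project", dataset_name="dataset name")` and `get_dataset(dataset_id="<id>")` both work, while `get_dataset(dataset_name="dataset name")` raises a `ValueError` before contacting the server.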

  				
Posted 2 years ago by SweetBadger76

Amazing! Thanks so much!
Perhaps it's worth mentioning on the https://clear.ml/docs/latest/docs/clearml_data/clearml_data_sdk#accessing-datasets page 🙂

  				
Posted 2 years ago by WhoppingMole85

Yes, it could be worth it. I will submit it, thanks. The same applies to Task.get_task(): either the ID or project_name/task_name
🙂

  				
Posted 2 years ago by SweetBadger76

Another question - when trying to upload an external file (S3) I get an error because I have no key and secret configured. How can I configure them?

  				
Posted 2 years ago by WhoppingMole85

found it! 🙂

  				
Posted 2 years ago by WhoppingMole85

great ! 😉

  				
Posted 2 years ago by SweetBadger76
