Answered

Hi all!
I'm trying to organize my workflow with ClearML, and I found out about Datasets.
I like the concept and I wonder if I can connect a dataset to a task / experiment? Currently the dataset appears as another task in the project page.
Thanks!

  
  
Posted one year ago

Answers 19


Can you tell me what your clearml and clearml-server versions are, please?

  
  
Posted one year ago

found it! 🙂

  
  
Posted one year ago

It worked!
But now I have another question: When I try to get the dataset by the dataset name I get this error:
` In [11]: d = Dataset.get(dataset_name='dataset name')

AttributeError Traceback (most recent call last)
Input In [11], in <cell line: 1>()
----> 1 d = Dataset.get(dataset_name='dataset name')

File ~/.local/lib/python3.8/site-packages/clearml/datasets/dataset.py:1534, in Dataset.get(cls, dataset_id, dataset_project, dataset_name, dataset_tags, only_completed, only_published, auto_create, writable_copy, dataset_version, alias, overridable, **kwargs)
1531 return dataset
1533 if not dataset_id:
-> 1534 dataset_id = cls._find_dataset_id(
1535 dataset_project=dataset_project,
1536 dataset_name=dataset_name,
1537 dataset_version=dataset_version,
1538 raise_on_error=False,
1539 dataset_tags=dataset_tags,
1540 dataset_filter=dict(
1541 system_tags=[cls.__tag, "-archived"],
1542 order_by=["-created"],
1543 type=[str(Task.TaskTypes.data_processing)],
1544 page_size=1,
1545 page=0,
1546 status=["published"]
1547 if only_published
1548 else ["published", "completed", "closed"]
1549 if only_completed
1550 else None,
1551 ),
1552 )
1553 if not dataset_id and not auto_create:
1554 raise ValueError(
1555 "Could not find Dataset {} {}".format(
1556 "id" if dataset_id else "project/name/version",
1557 dataset_id if dataset_id else (dataset_project, dataset_name, dataset_version),
1558 )
1559 )

File ~/.local/lib/python3.8/site-packages/clearml/datasets/dataset.py:2941, in Dataset._find_dataset_id(cls, dataset_project, dataset_name, dataset_version, dataset_tags, dataset_filter, raise_on_error)
2939 dataset_filter["search_hidden"] = True
2940 dataset_filter["allow_extra_fields"] = True
-> 2941 hidden_dataset_project, _ = cls._build_hidden_project_name(dataset_project, dataset_name)
2942 tasks = Task.get_tasks(
2943 project_name=hidden_dataset_project,
2944 task_name=exact_match_regex(dataset_name) if dataset_name else None,
2945 tags=dataset_tags,
2946 task_filter=dataset_filter,
2947 )
2948 if not tasks and raise_on_error:

File ~/.local/lib/python3.8/site-packages/clearml/datasets/dataset.py:2978, in Dataset._build_hidden_project_name(cls, dataset_project, dataset_name)
2965 @classmethod
2966 def _build_hidden_project_name(cls, dataset_project, dataset_name):
2967 # type: (str, str) -> Tuple[str, str]
2968 """
2969 Build the corresponding hidden name of a dataset, given its dataset_project
2970 and dataset_name
(...)
2976 is the parent project
2977 """
-> 2978 dataset_project = cls._remove_hidden_part_from_dataset_project(dataset_project)
2979 if bool(Session.check_min_api_server_version(cls.__min_api_version)):
2980 parent_project = "{}.datasets".format(dataset_project + "/" if dataset_project else "")

File ~/.local/lib/python3.8/site-packages/clearml/datasets/dataset.py:2998, in Dataset._remove_hidden_part_from_dataset_project(cls, dataset_project)
2987 @classmethod
2988 def _remove_hidden_part_from_dataset_project(cls, dataset_project):
2989 # type: (str, str) -> str
2990 """
2991 The project name contains the '.datasets' part, as well as the dataset_name.
2992 Remove those parts and return the project used when creating the dataset.
(...)
2996 :return: The project name without the '.datasets' part
2997 """
-> 2998 return dataset_project.partition("/.datasets/")[0]

AttributeError: 'NoneType' object has no attribute 'partition' `
It works well with the id, but not with the name. Any ideas?

  
  
Posted one year ago

You can initiate your task as usual. When a dataset is used inside it (for example, the task could start by retrieving the dataset with Dataset.get), the dataset will be registered in the task's Info section (check in the UI) 😊
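A minimal sketch of the flow described above, assuming a reasonably recent clearml client and a configured server. The function name `run_with_dataset` and all project and dataset names are placeholders, and the extra `task.connect` call is an optional addition that is not part of the reply:

```python
def run_with_dataset(project_name, task_name, dataset_project, dataset_name):
    """Start a task, then fetch an existing dataset from inside it."""
    # Imported lazily so that defining this helper does not need a configured client.
    from clearml import Dataset, Task

    # Start the task first, as usual.
    task = Task.init(project_name=project_name, task_name=task_name)

    # Retrieve the dataset while the task is running. According to the reply
    # above, this use is what shows up in the task's Info section.
    dataset = Dataset.get(dataset_project=dataset_project, dataset_name=dataset_name)

    # Download (or reuse a cached) read-only copy of the dataset files.
    local_path = dataset.get_local_copy()

    # Assumption, not from the thread: also store the dataset ID as a task
    # parameter so it can be found among the task's hyperparameters.
    task.connect({"dataset_id": dataset.id}, name="dataset")

    return task, dataset, local_path
```

Calling it would look like `run_with_dataset("my project", "train", "my project", "my dataset")`, with the names replaced by your own.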

  
  
Posted one year ago

hey
You have 2 options to retrieve a dataset: by its id, or by dataset_project AND dataset_name. Those two work together, so you need to pass both of them!
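The traceback earlier in the thread comes from passing only dataset_name, which leaves dataset_project as None. As a rough sketch, a small wrapper can reject that combination before calling the library. `get_dataset` is a hypothetical helper written for this illustration, not part of clearml; only `Dataset.get` and its `dataset_id`, `dataset_project` and `dataset_name` parameters come from the traceback:

```python
def get_dataset(dataset_id=None, dataset_project=None, dataset_name=None):
    """Fetch a dataset by ID, or by project AND name together."""
    # Fail early with a readable message instead of the AttributeError above.
    if not dataset_id and not (dataset_project and dataset_name):
        raise ValueError(
            "Pass either dataset_id, or both dataset_project and dataset_name"
        )

    # Imported lazily so the argument check works without a configured client.
    from clearml import Dataset

    if dataset_id:
        return Dataset.get(dataset_id=dataset_id)
    return Dataset.get(dataset_project=dataset_project, dataset_name=dataset_name)
```

With this helper, `get_dataset(dataset_name="dataset name")` raises a ValueError, while `get_dataset(dataset_project="my project", dataset_name="dataset name")` is forwarded to `Dataset.get`.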

  
  
Posted one year ago

Another question: when trying to upload an external file (S3), I get an error because I have no key and secret configured. How can I configure them?

  
  
Posted one year ago

Yes it is 🙂 Did you manage to upgrade?
We also added a lot of new dataset features in version 1.6.2!

  
  
Posted one year ago

hey WhoppingMole85
Do you want to initiate a task and link it to a dataset, or simply create a dataset?

  
  
Posted one year ago

Sure!
WebApp: 1.5.0-192 • Server: 1.5.0-192 • API: 2.18

Client: 1.0.5

  
  
Posted one year ago

great ! 😉

  
  
Posted one year ago

Yes! I actually upgraded this morning and got this version 🤔 Is https://clear.ml/docs/latest/docs/deploying_clearml/upgrade_server_linux_mac the latest tutorial to follow?

  
  
Posted one year ago

Hey David!
I want to initiate a task and link it to an existing dataset 🙂 (The final goal is for me to be able to see which dataset each task was run on)

  
  
Posted one year ago

Amazing! Thanks so much!
Perhaps it's worth mentioning on the https://clear.ml/docs/latest/docs/clearml_data/clearml_data_sdk#accessing-datasets page 🙂

  
  
Posted one year ago

We have released a lot of versions since that one 🙂 🙂
Can you please try to upgrade to the latest clearml (1.6.2) and try again?

  
  
Posted one year ago

Yes, it could be worth it. I will submit it, thanks. It is the same for Task.get_task(): either id, or project_name and task_name.
🙂

  
  
Posted one year ago

hey WhoppingMole85 good morning !
try to pip it !
pip install clearml -U
and then check with
pip show clearml
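If you prefer to check from inside Python (for example in the same notebook that raised the error), something along these lines should work. It is only a sketch: `version_tuple` and `installed_at_least` are helper names made up here, and the comparison ignores pre-release suffixes:

```python
import re
from importlib.metadata import PackageNotFoundError, version


def version_tuple(text):
    """Turn '1.6.2' into (1, 6, 2); a suffix such as 'rc1' is ignored."""
    return tuple(int(n) for n in re.findall(r"\d+", text)[:3])


def installed_at_least(package, minimum):
    """Return True if `package` is installed at `minimum` or newer."""
    try:
        current = version(package)
    except PackageNotFoundError:
        # The package is not installed in this environment at all.
        return False
    return version_tuple(current) >= version_tuple(minimum)
```

For the versions mentioned in this thread, `installed_at_least("clearml", "1.6.2")` would be the check to run after upgrading.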

  
  
Posted one year ago

I will try to run it again and update 🙂

  
  
Posted one year ago

Good morning SweetBadger76 !
I ran the commands from the "how to update" page but I still have the same versions... I tried opening the page in incognito to make sure it's not the cache, but I still get version 1.5.0

What am I missing?

  
  
Posted one year ago

Alright! So I followed these steps but unfortunately I can't see anything in the Info section (neither the task's nor the dataset's). Am I missing something?

  
  
Posted one year ago