hey WhoppingMole85
Do you want to initiate a task and link it to a dataset, or simply create a dataset?
Hey David!
I want to initiate a task and link it to an existing dataset 🙂 (The final goal is for me to be able to see which dataset each task was run on)
You can initiate your task as usual. When a dataset is used in it (for example, the task could start by retrieving it with Dataset.get), the dataset will be registered in the Info section (check in the UI) 😊
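A minimal sketch of that flow, assuming a reachable ClearML server and configured credentials. The function name and its arguments are placeholders for illustration, not part of ClearML, and whether the dataset then appears on the task may depend on the client and server versions in use.

```python
def run_task_on_dataset(project_name, task_name, dataset_project, dataset_name):
    """Start a task, then fetch an existing dataset from inside it.

    Hypothetical helper: only Task.init, Dataset.get and get_local_copy
    are ClearML calls; everything else is illustrative.
    """
    # Imported lazily so the sketch can be read without clearml installed.
    from clearml import Dataset, Task

    # Initiate the task as usual.
    Task.init(project_name=project_name, task_name=task_name)

    # Retrieve the dataset from within the running task.
    dataset = Dataset.get(
        dataset_project=dataset_project,
        dataset_name=dataset_name,
    )

    # Download a read-only local copy and return its folder path.
    return dataset.get_local_copy()
```

Calling it would look like `run_task_on_dataset("my project", "my task", "my project", "dataset name")`, where all four strings are made-up names.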
Alright! So I followed these steps but unfortunately I can't see anything in the Info section (neither the task's nor the dataset's). Am I missing something?
Can you tell me what your clearml and ClearML Server versions are, please?
Sure! WebApp: 1.5.0-192 • Server: 1.5.0-192 • API: 2.18
Client: 1.0.5
We have released a lot of versions since that one 🙂 🙂
Can you please try to upgrade to the latest clearml (1.6.2) and try again?
Yes! I actually upgraded this morning and got this version 🤔 Is https://clear.ml/docs/latest/docs/deploying_clearml/upgrade_server_linux_mac the latest tutorial to follow?
Yes it is 🙂 Did you manage to upgrade?
We also added a lot of new dataset features in version 1.6.2!
I will try to run it again and update 🙂
Good morning SweetBadger76 !
I ran the commands from the "how to update" page but I still have the same versions... I tried opening the page in an incognito window to make sure it's not the cache, but I still get version 1.5.0
What am I missing?
hey WhoppingMole85 good morning !
Try upgrading it with pip: `pip install clearml -U`
and then check with `pip show clearml`
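The same check can be done from Python. This is a small sketch using only the standard library (Python 3.8+); the helper name is made up here. It reports the version of the installed Python client package, which is separate from the server version shown in the web app.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str = "clearml") -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None
```

After the upgrade, `installed_version()` should return the new client version as a string.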
It worked!
But now I have another question: When I try to get the dataset by the dataset name I get this error:
` In [11]: d = Dataset.get(dataset_name='dataset name')
AttributeError Traceback (most recent call last)
Input In [11], in <cell line: 1>()
----> 1 d = Dataset.get(dataset_name='dataset name')
File ~/.local/lib/python3.8/site-packages/clearml/datasets/dataset.py:1534, in Dataset.get(cls, dataset_id, dataset_project, dataset_name, dataset_tags, only_completed, only_published, auto_create, writable_copy, dataset_version, alias, overridable, **kwargs)
1531 return dataset
1533 if not dataset_id:
-> 1534 dataset_id = cls._find_dataset_id(
1535 dataset_project=dataset_project,
1536 dataset_name=dataset_name,
1537 dataset_version=dataset_version,
1538 raise_on_error=False,
1539 dataset_tags=dataset_tags,
1540 dataset_filter=dict(
1541 system_tags=[cls.__tag, "-archived"],
1542 order_by=["-created"],
1543 type=[str(Task.TaskTypes.data_processing)],
1544 page_size=1,
1545 page=0,
1546 status=["published"]
1547 if only_published
1548 else ["published", "completed", "closed"]
1549 if only_completed
1550 else None,
1551 ),
1552 )
1553 if not dataset_id and not auto_create:
1554 raise ValueError(
1555 "Could not find Dataset {} {}".format(
1556 "id" if dataset_id else "project/name/version",
1557 dataset_id if dataset_id else (dataset_project, dataset_name, dataset_version),
1558 )
1559 )
File ~/.local/lib/python3.8/site-packages/clearml/datasets/dataset.py:2941, in Dataset._find_dataset_id(cls, dataset_project, dataset_name, dataset_version, dataset_tags, dataset_filter, raise_on_error)
2939 dataset_filter["search_hidden"] = True
2940 dataset_filter["allow_extra_fields"] = True
-> 2941 hidden_dataset_project, _ = cls._build_hidden_project_name(dataset_project, dataset_name)
2942 tasks = Task.get_tasks(
2943 project_name=hidden_dataset_project,
2944 task_name=exact_match_regex(dataset_name) if dataset_name else None,
2945 tags=dataset_tags,
2946 task_filter=dataset_filter,
2947 )
2948 if not tasks and raise_on_error:
File ~/.local/lib/python3.8/site-packages/clearml/datasets/dataset.py:2978, in Dataset._build_hidden_project_name(cls, dataset_project, dataset_name)
2965 @classmethod
2966 def _build_hidden_project_name(cls, dataset_project, dataset_name):
2967 # type: (str, str) -> Tuple[str, str]
2968 """
2969 Build the corresponding hidden name of a dataset, given its dataset_project
2970 and dataset_name
(...)
2976 is the parent project
2977 """
-> 2978 dataset_project = cls._remove_hidden_part_from_dataset_project(dataset_project)
2979 if bool(Session.check_min_api_server_version(cls.__min_api_version)):
2980 parent_project = "{}.datasets".format(dataset_project + "/" if dataset_project else "")
File ~/.local/lib/python3.8/site-packages/clearml/datasets/dataset.py:2998, in Dataset._remove_hidden_part_from_dataset_project(cls, dataset_project)
2987 @classmethod
2988 def _remove_hidden_part_from_dataset_project(cls, dataset_project):
2989 # type: (str, str) -> str
2990 """
2991 The project name contains the '.datasets' part, as well as the dataset_name.
2992 Remove those parts and return the project used when creating the dataset.
(...)
2996 :return: The project name without the '.datasets' part
2997 """
-> 2998 return dataset_project.partition("/.datasets/")[0]
AttributeError: 'NoneType' object has no attribute 'partition' `
It works well with the id, but not with the name. Any ideas?
hey
You have 2 options to retrieve a dataset: by its id, or by dataset_project AND dataset_name. Those two work together, so you need to pass both of them!
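A minimal sketch of those two options, with a small guard that fails early instead of hitting the AttributeError above. `dataset_lookup_kwargs` and `get_dataset` are hypothetical helpers written for this example; only `Dataset.get` and its `dataset_id`, `dataset_project` and `dataset_name` arguments come from ClearML.

```python
from typing import Dict, Optional


def dataset_lookup_kwargs(
    dataset_id: Optional[str] = None,
    dataset_project: Optional[str] = None,
    dataset_name: Optional[str] = None,
) -> Dict[str, str]:
    """Build arguments for Dataset.get: an id, or project and name together."""
    if dataset_id:
        return {"dataset_id": dataset_id}
    if dataset_project and dataset_name:
        return {"dataset_project": dataset_project, "dataset_name": dataset_name}
    raise ValueError(
        "Pass either dataset_id, or both dataset_project and dataset_name"
    )


def get_dataset(**lookup):
    """Fetch a dataset after validating the lookup arguments."""
    # Imported lazily; calling this needs clearml and a reachable server.
    from clearml import Dataset

    return Dataset.get(**dataset_lookup_kwargs(**lookup))
```

For example, `get_dataset(dataset_project="my project", dataset_name="dataset name")` passes both values through, while `get_dataset(dataset_name="dataset name")` raises a ValueError before any request is made.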
Amazing! Thanks so much!
Perhaps it's worth mentioning on the https://clear.ml/docs/latest/docs/clearml_data/clearml_data_sdk#accessing-datasets page 🙂
Yes, it could be worth it. I will submit it, thanks. It is the same for Task.get_task(): either the id, or project_name and task_name together
🙂
Another question: when trying to upload an external file (S3) I get an error because I have no key and secret configured. How can I configure them?