Answered

Let's say I have a project called Proj1 that stores datasets of type "Data Process". What is the best practice for getting the latest dataset?
For example, I create the first dataset (A). Then, using clearml-data, I add another dataset (B) as a child of the previous one (A), so B is now the latest. If I call Task.get_task(project_name='Proj1'), how do I get B?
Currently, I just add a tag to the final dataset ...

  
  
Posted 3 years ago

Answers 5


Hi, another question.
dataset_upload_task = Task.get_task(task_id=args['dataset_task_id'])
iris_pickle = dataset_upload_task.artifacts['dataset'].get_local_copy()
How would I replicate the above for a Dataset? That is, how do I get the iris_pickle file? I hacked something together like the line below:
ds.get_mutable_local_copy(target_folder='data')
Subsequently, I also have to load the file by name. I wonder whether there is a more elegant way.

  
  
Posted 3 years ago

Hi DeliciousBluewhale87 ,

You can get the latest dataset by calling Dataset.get:

from clearml import Dataset
ds = Dataset.get(dataset_project="dataset-project", dataset_name="dataset-task-name")

This will return the latest dataset from the project.
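If you also want to keep using the tag you already put on the final dataset, a small wrapper along these lines might help. This is only a sketch: `get_latest_dataset` and its `dataset_cls` parameter are made-up names for illustration, and it assumes your clearml version accepts a `dataset_tags` argument in `Dataset.get`, so please check that against the SDK reference first.

```python
def get_latest_dataset(project, name, tags=None, dataset_cls=None):
    """Fetch a dataset by project and name, optionally filtered by tags.

    `dataset_cls` is a hypothetical hook so the function can be tried
    without a ClearML server; by default the real clearml.Dataset is used.
    """
    if dataset_cls is None:
        # Imported lazily so the module loads even without clearml installed.
        from clearml import Dataset
        dataset_cls = Dataset

    kwargs = {"dataset_project": project, "dataset_name": name}
    if tags:
        # Assumption: Dataset.get supports dataset_tags in your clearml version.
        kwargs["dataset_tags"] = list(tags)
    return dataset_cls.get(**kwargs)


# Hypothetical usage; "my-data" and "final" are placeholder values:
#   ds = get_latest_dataset("Proj1", "my-data", tags=["final"])
```

Without `tags`, this is just the plain `Dataset.get` call from above.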

  
  
Posted 3 years ago

Yeah, seems good for now. Thanks.

  
  
Posted 3 years ago

Hi DeliciousBluewhale87 ,

You can just get a local copy of the dataset with ds.get_local_copy(). This downloads the dataset from the dataset task (using the cache) and returns a path to the downloaded files.

That path contains all the files in the dataset. You can go over them with ds.list_files() (or take ds.list_files()[0] if you have only one file) and pick the one you want.

Maybe something like:

import os

ds_path = ds.get_local_copy()
iris_pickle_file_name = ds.list_files()[0]
iris_pickle_path = os.path.join(ds_path, iris_pickle_file_name)

Can this do the trick?
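If the dataset may contain more than one file, picking `[0]` gets fragile, so you could wrap the same steps in a helper that looks the file up by name. A rough sketch follows; `load_pickle_from_dataset` is a made-up name, "iris.pkl" is an assumed file name, and the helper relies only on the two calls mentioned above (`get_local_copy()` and `list_files()`).

```python
import os
import pickle


def load_pickle_from_dataset(ds, file_name=None):
    """Load one pickled file from a dataset object.

    `ds` only needs get_local_copy() and list_files(). If `file_name`
    is omitted, the dataset must contain exactly one file.
    """
    ds_path = ds.get_local_copy()   # local (cached) copy of the dataset
    files = ds.list_files()         # relative paths of files in the dataset

    if file_name is None:
        if len(files) != 1:
            raise ValueError(
                "Dataset has %d files, pass file_name explicitly" % len(files)
            )
        file_name = files[0]
    elif file_name not in files:
        raise FileNotFoundError("%r is not listed in the dataset" % file_name)

    with open(os.path.join(ds_path, file_name), "rb") as f:
        return pickle.load(f)


# Hypothetical usage with a real dataset:
#   from clearml import Dataset
#   ds = Dataset.get(dataset_project="dataset-project", dataset_name="dataset-task-name")
#   iris = load_pickle_from_dataset(ds, "iris.pkl")  # "iris.pkl" is an assumed name
```

Only unpickle files from datasets you trust, since `pickle.load` can execute arbitrary code.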

  
  
Posted 3 years ago

Nice, tks

  
  
Posted 3 years ago