
I'm a bit confused. It seems like something has changed with how ClearML handles recording datasets in tasks.

It used to be the case that when I created a dataset under a task, ClearML would automatically record the ID of the dataset in the Hyperparameters/Datasets section (see first attached image). I'd also get all the relevant datasets under the info section. Now I don't get either (see second image). Instead, I get a "General" section in hyperparameters that tells me how many files were changed but nothing about the ID of the dataset. From what I can tell, the task no longer stores the dataset ID anywhere, so I have no means to track the dataset with the task.

I did update from 1.11.0 to 1.11.1, but I had the problem even when I reverted to 1.11.0. I'm using ClearML through the web client; I'm not self-hosting or using a SaaS.

I'll reply with a snippet with the code I'm using to construct the dataset. It's a bit of a long process because I do a series of things. Namely: check to see if a dataset exists with that name already, compare the local data to the remote data, and if there's a change, I upload the new dataset as a child of the last version.
image
image

  
  
Posted one year ago

Answers 7


PRed: None

  
  
Posted one year ago

@EnthusiasticCow4 A PR would be greatly appreciated. If the problem lies in _query_tasks, then it should be addressed there.

  
  
Posted one year ago

Alright, I'll try and put that together for Monday.

  
  
Posted one year ago

I see. Thanks for the insight. That seems to be the case. I'm struggling a bit with datasets, for example when I want to trace the genealogy of a dataset that's used by both traditional tasks and pipelines. I'll try and write something up about the challenges around that when I get the chance. But your comment revealed another issue:

It appears that the partial name matching isn't going well. I'm unclear why this wouldn't be matching. In the attached photo you can see the input for partial_name is '[LTV] Dataset Test', and you can see from the unfiltered search that there are many datasets titled identically. Yet with that search criteria I get 0 results. One would assume that a partial match would include perfect matches?
image

  
  
Posted one year ago

Yes, it indeed appears to be a regex issue. If I run:

Dataset.list_datasets(
    dataset_project=self.task.get_project_name(),
    partial_name=re.escape('[LTV] Dataset Test'),
    only_completed=True,
)

It works as expected. I'm not sure how raw you want to leave the partial_name features. I could create a PR to fix this but would you want me to re.escape at the list_datasets() level? Or go deeper and do it at Task._query_tasks() level?

  
  
Posted one year ago

The plot thickens. It seems like there's something odd going on with the interaction between [LTV] and additional text. If I just search [LTV] it works, and if I just search Dataset Test it works, but if I put them together it breaks the search. Now that I think about it, there are other oddities in the web interface that might be explained by bugs around using brackets in names.
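
For anyone following along, this behaviour is what you would expect if partial_name is treated as a regular expression, which is what the rest of the thread suggests. The sketch below uses Python's own re module as a stand-in; it does not call ClearML, and the server-side matching may differ in detail. The helper name matches is made up for illustration.

```python
import re

# Dataset name taken from the thread.
name = "[LTV] Dataset Test"


def matches(pattern: str, text: str) -> bool:
    """Hypothetical helper: True if the regex pattern is found anywhere in text."""
    return re.search(pattern, text) is not None


# Unescaped, "[LTV]" is a character class: a single character that is L, T or V.
print(matches("[LTV]", name))         # finds the "L" inside the brackets
print(matches("Dataset Test", name))  # plain literal text, found as-is

# Combined, the pattern needs an L, T or V directly before " Dataset Test",
# but in the real name that position holds "]", so nothing matches.
print(matches("[LTV] Dataset Test", name))

# Escaping makes the brackets literal, so the full name matches again.
print(matches(re.escape("[LTV] Dataset Test"), name))
```

Under that assumption, the two searches that work on their own and the combined search that fails are all consistent with ordinary regex semantics rather than a separate bracket-handling bug.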

  
  
Posted one year ago

Hi @EnthusiasticCow4! Note that the Datasets section is created only if you get the dataset with an alias. Are you sure that number_of_datasets_on_remote != 0?
If so, can you provide a short snippet that would help us reproduce? The code you posted looks fine to me, not sure what the problem could be.

  
  
Posted one year ago