Bug?

dataset name is ignored if use_current_task=True

  
  
Posted one year ago

Answers 14


I have a task where I create a dataset, but I also create a set of matplotlib figures, some numeric statistics and a pandas table that describe the data. I would like these to be associated with the dataset and viewable from the ClearML web page for the dataset.

Oh sure, use https://clear.ml/docs/latest/docs/references/sdk/dataset#get_logger they will be visible on the Dataset page on the version in question
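A minimal sketch of what that could look like, assuming a recent clearml version where `Dataset.get_logger()` returns a regular `Logger`. The helper names `report_dataset_summary` and `example_usage` and their arguments are made up for illustration; the sketch only relies on the Logger methods `report_matplotlib_figure`, `report_single_value` and `report_table`, and it reports each figure explicitly.

```python
def report_dataset_summary(logger, figures, stats, table):
    """Report figures, single-value statistics and one table through a ClearML Logger.

    logger  -- e.g. the object returned by dataset.get_logger()
    figures -- dict mapping a plot title to a matplotlib figure
    stats   -- dict mapping a statistic name to a number
    table   -- a pandas DataFrame describing the data
    """
    # Every figure is reported explicitly, one call per figure.
    for title, figure in figures.items():
        logger.report_matplotlib_figure(title=title, series="dataset", figure=figure)
    # Scalar statistics that have no iteration axis.
    for name, value in stats.items():
        logger.report_single_value(name=name, value=value)
    logger.report_table(title="summary", series="dataset", table_plot=table)


def example_usage(dataset, figures, stats, table):
    # `dataset` is assumed to be a clearml.Dataset created earlier in the script.
    report_dataset_summary(dataset.get_logger(), figures, stats, table)
```

The helper takes the logger as an argument so the same code should also work with `task.get_logger()` if the reports are meant to go on the Task instead.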

  
  
Posted one year ago

I just think that the create function should expect dataset_name to be None in the case of use_current_task=True (or allow the dataset name to differ from the task name)

I think you are correct, at least we should output a warning that it is ignored ... I'll make sure we do 🙂
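For illustration only, the kind of check being discussed could be sketched like this. It is not ClearML's implementation: `effective_dataset_name` is a hypothetical helper, and it assumes the behaviour described in this thread (with use_current_task=True the dataset takes the Task name).

```python
import warnings


def effective_dataset_name(dataset_name, task_name, use_current_task):
    """Return the name the dataset would end up with, warning if the requested one is dropped."""
    if not use_current_task:
        return dataset_name
    # Assumption from the thread: the dataset is stored on the Task and inherits its name.
    if dataset_name is not None and dataset_name != task_name:
        warnings.warn(
            f"dataset_name={dataset_name!r} is ignored because use_current_task=True; "
            f"the dataset will be named {task_name!r}",
            UserWarning,
            stacklevel=2,
        )
    return task_name
```

Calling it with matching names, or with dataset_name=None, stays silent; only a mismatch triggers the warning.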

  
  
Posted one year ago

Hi PanickyMoth78

dataset name is ignored if use_current_task=True

Kind of. It stores the Dataset on the Task itself (so dataset.name becomes the Task name). Actually, we should probably deprecate this feature, I think it is too confusing?!
What was the use case for using it ?

  
  
Posted one year ago

I don't mind assigning to the task the same name that I'd assign to the dataset. I just think that the create function should expect dataset_name to be None in the case of use_current_task=True (or allow the dataset name to differ from the task name)

  
  
Posted one year ago

Yep, the automagic only kicks in with Task.init... The main difference, and the advantage of using a Dataset object, is that the underlying Task resides in a specific structure that is used when searching based on project/name/version. Other than that, it should just work

  
  
Posted one year ago

I have a task where I create a dataset, but I also create a set of matplotlib figures, some numeric statistics and a pandas table that describe the data. I would like these to be associated with the dataset and viewable from the ClearML web page for the dataset.

  
  
Posted one year ago

Hmm interesting...
of course you can do:
dataset._task.connect(...)
But maybe it should be public?!

How are you using that (I mean in the context of a Dataset)?
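A small sketch of that workaround, under the assumption that the Dataset object keeps its backing Task in the private `_task` attribute as mentioned above. `connect_dataset_args` is a hypothetical helper name, and since `_task` is not public API it may change between clearml releases.

```python
def connect_dataset_args(dataset, dataset_args):
    """Connect a dict of arguments to the Task that backs a ClearML Dataset."""
    # `_task` is a private attribute (taken from the thread), not a documented API.
    backing_task = getattr(dataset, "_task", None)
    if backing_task is None:
        raise AttributeError("dataset has no backing task to connect arguments to")
    # Task.connect returns the connected object, which should be used from here on.
    return backing_task.connect(dataset_args)
```

When the dataset is created with use_current_task=True, calling `connect` directly on the current Task avoids touching the private attribute.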

  
  
Posted one year ago

I was doing it with the task that I had been using. Mostly for logging arguments that control what the dataset will contain.

  
  
Posted one year ago

Yeah. I was only using the task for the process of creating the dataset.

My code does start out with a step that checks for the existence of the dataset, returning it if it exists (search by project name/dataset name/version) rather than recreating it.
I noticed the name mismatch when that check kept failing me...

I think that init-ing the encompassing task with the relevant dataset name still allows me to search for the dataset by dataset_name=task_name / project_name (shared by both dataset and task) / dataset_version.

So I guess I'll switch back to initiating a task (with the dataset name as the task name) and setting use_current_task=True in Dataset.create().

Does that alleviate the concern around:

The main difference and the advantage of using a Dataset object is the underlying Task resides in a specific structure that is used when searching based on project/name/version,

?

  
  
Posted one year ago

here is what I do:
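```python
from clearml import Dataset, Task


# The snippet sits inside a function (note the `return`); this wrapper name is illustrative.
def get_or_create_dataset(bucket_name, dataset_name, dataset_version):
    # Reuse the dataset if it already exists (search by project / name / version).
    try:
        dataset = Dataset.get(
            dataset_project=bucket_name,
            dataset_name=dataset_name,
            dataset_version=dataset_version,
        )
        print(
            f"dataset found {dataset.project}/{dataset.name} v{dataset.version}\n(id: {dataset.id})"
        )
        return dataset
    except ValueError:
        pass

    # Otherwise create it on the current task, named after the dataset.
    task = Task.current_task()
    if task is None:
        task = Task.init(project_name=bucket_name, task_name=dataset_name)
    dataset = Dataset.create(
        dataset_name=dataset_name,  # has no effect
        dataset_project=bucket_name,
        dataset_version=dataset_version,
        output_uri=f"gs://{bucket_name}",
        description="cropped_images",
        use_current_task=True,
    )
```
Having run this once, Dataset.get will find the dataset the next time around.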
  
  
Posted one year ago

I think your use case is the original idea behind the "use_current_task" option: it was basically designed to connect the code that creates the Dataset with the dataset itself.
I think the only caveat in the current implementation is that it should "move" the current Task into the dataset project / set the name. wdyt?

  
  
Posted one year ago

Just verified with the code base, should work out of the box 🙂 nothing to worry about

  
  
Posted one year ago

Oh sure, use https://clear.ml/docs/latest/docs/references/sdk/dataset#get_logger they will be visible on the Dataset page on the version in question

That sounds simple enough.
Though I imagine I'd need to explicitly report every figure. Correct?

  
  
Posted one year ago

hmm.
this isn't supported though:
dataset_args = dataset.connect(dataset_args)

  
  
Posted one year ago