Answered
Is There An Easy Way To Add A Link To One Of The Tasks Panels? (As An Artifact, Configuration, Info, Etc)?

Is there an easy way to add a link to one of the tasks panels? (as an artifact, configuration, info, etc)?

EDIT: And follow up regarding the dataset. As discussed somewhere previously, the datasets are now automatically moved to a hidden "sub-project" prefixed with .datasets . This creates several annoyances that I believe should be treated:
First, when looking at the parent project, it appears as if there are two projects nested (the actual project, and the hidden .datasets project). That's fine, except when you click on the actual parent project, you have to click again on the same project name to see the tasks, whereas previously (when there was no .datasets), a single click sufficed.
Second, when you enter this .datasets project via the parent project, it appears empty (tasks are hidden?), adding to the nuisance.
Finally, if you try to delete a project that has this - you can't. You have to find the dataset in the Datasets tab, delete it from there, and only then can you delete the project (since you cannot delete the task from the hidden project).
See examples in https://clearml.slack.com/archives/CTK20V944/p1662633944688589?thread_ts=1661256050.014979&cid=CTK20V944

  
  
Posted one year ago

Answers 28


Basically you have the details from the Dataset page, why should it be mixed with the others ?

Because maybe it contains code and logs on how to prepare the dataset. Or maybe the user just wants increased visibility for the dataset itself in the tasks view.

why would you need the Dataset Task itself is the main question?

For the same reason as above. Visibility and ease of access. Coupling relevant tasks and dataset in the same project makes it easier to understand that they're linked together.

Not sure I can imagine one, can you provide an example?

Yes. Because my old https://github.com/allegroai/clearml/issues/395 has never been resolved (though closed), we use the dataset object to upload e.g. local files needed for remote execution. These are not the same as actual datasets, but can be reused and can be useful for introspection.

What you mean here is, if the dataset ".dataset" project is already hidden, why do we also "hide" the Tasks inside ?

No, I mean why does it show up in the task view (see attached image), forcing me to click twice on the same project name.

I'm a bit at a loss for words in describing this. Would be happy to show quickly via e.g. a Slack call/huddle.

  
  
Posted one year ago

AgitatedDove14 Basically the fact that this happens without user control is very frustrating - https://github.com/allegroai/clearml/blob/447714eaa4ac09b4d44a41bfa31da3b1a23c52fe/clearml/datasets/dataset.py#L191

  
  
Posted one year ago

What if I have multiple files that are not in the same folder? (That is the current use-case)

I think you can do weights_filenames=['a_folder/firstfile.bin', 'b_folder/secondfile.bin']
(it will look for a common file path for both, so it retains the folder structure)
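As an aside, the "common file path" behavior described above can be sketched with the standard library alone. This is only an illustration of the idea (hypothetical file paths), not ClearML's actual implementation:

```python
import os

# Hypothetical input files living in two sibling folders
files = ["/data/a_folder/firstfile.bin", "/data/b_folder/secondfile.bin"]

# The common root a packager could strip while archiving
root = os.path.commonpath(files)
print(root)  # /data

# Paths relative to the common root - this is the folder
# structure that would survive inside the package
relative = [os.path.relpath(f, root) for f in files]
print(relative)  # ['a_folder/firstfile.bin', 'b_folder/secondfile.bin']
```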

Our workaround now for using a Dataset as we do, is to store the dataset ID as a configuration parameter, so it's always included too

Exactly, so with Input Model it's the same only kind of built in 🙂

  
  
Posted one year ago

can I assume these files are reused

A definite maybe, they may or may not be used, but we'd like to keep that option 🙃

Maybe the "old" way Dataset were shown is better suited ?

It was, but then it's gone now 😞

I see your point, this actually might be a "bug"?!

I would say so myself, but could be also by design..?

Awesome, I'll ask Product to reach out

LMK, happy to help out!
I know our use case is maybe a very different one, but generalizing from it would surely be beneficial 🙂

  
  
Posted one year ago

I'll give it a shot. Honestly, the SDK documentation for both InputModel and OutputModel is (sorry)

horrible

...

I have to agree, we are changing this interface, I do not think it is good 😞

  
  
Posted one year ago

Yes. Because my old

has never been resolved (though closed), we use the dataset object to upload e.g. local files needed for remote execution.

Ohh no, I remember... Following this line, can I assume these files are reused, i.e. this is not "per instance"? I have to admit I have a feeling this is a very unique use case. And maybe the "old" way Datasets were shown is better suited?

No, I mean why does it show up in the task view (see attached image), forcing me to click twice on the same project name.

I see your point, this actually might be a "bug"?!

I'm a bit at a loss for words in describing this. Would be happy to show quickly via e.g. a Slack call/huddle.

Awesome, I'll ask Product to reach out 🙂

  
  
Posted one year ago

Well, -ish. Ideally what we're after is one of the following:
- Couple a task with a dataset, and keep it visible in its destined location.
- Create a dataset separately from the task, and have control over its visibility and location.
- If it's hidden, it should not affect normal UI interaction (most annoying is having to click twice on the same project name when there are hidden datasets, which do not appear in the project view).

  
  
Posted one year ago

LOL love that approach.
Basically here is what I'm thinking,
```
from clearml import Task, InputModel, OutputModel

task = Task.init(...)

# run this part once
if task.running_locally():
    my_auxiliary_stuff = OutputModel()
    my_auxiliary_stuff.system_tags = ["DATA"]
    my_auxiliary_stuff.update_weights_package(weights_path="/path/to/additional/files")
    input_my_auxiliary = InputModel(model_id=my_auxiliary_stuff.id)
    task.connect(input_my_auxiliary, "my_auxiliary")

task.execute_remotely()
my_auxiliary_path = task.models["input"]["my_auxiliary"].get_weights_package(return_path=True)
```
I might have some typos but it should do the trick.
You will have a "Model" with all your auxiliary data, and when you clone the Task it will copy the reference to the data. But when you delete a Task it will not, by default, delete the Model (aka the data).
WDYT?

  
  
Posted one year ago

I'll give it a shot. Honestly, the SDK documentation for both InputModel and OutputModel is (sorry) horrible ...

Can't wait for the documentation revamping.

  
  
Posted one year ago

Hi UnevenDolphin73

Is there an easy way to add a link to one of the tasks panels? (as an artifact, configuration, info, etc)?

You can add a link as an artifact, that is probably the easiest:
task.upload_artifact(name="just link", artifact_object=" ")

EDIT: And follow up regarding the dataset. As discussed somewhere previously, the datasets are now automatically moved to a hidden "sub-project" prefixed with

.datasets

. This creates several annoyances that I believe should be treated: ...

Yes, Datasets in the UI should be accessed from the Datasets tab (the .datasets etc. we can think of as implementation details).
That said, I think the main issue is what happens if you do "use current Task" for the dataset; then things become more complicated and less intuitive. Is this the correct context?

  
  
Posted one year ago

That gives us the benefit of creating "local datasets" (confined to the scope of the project, do not appear in Datasets tabs, but appear as normal tasks within the project)

  
  
Posted one year ago

For now we've monkey-patched it to our usecase:

LOL, that's a cool hack

That gives us the benefit of creating "local datasets" (confined to the scope of the project, do not appear in

Datasets

tabs, but appear as normal tasks within the project)

So what would be a "perfect" solution here?
I think I'm missing the point on why it became an issue in the first place.
Notice that in new versions Dataset will be registered on the Tasks that use them (they are already there in the Info Tab, and will be part of the configuration as well, so that you can override them if you wish when running remotely).
The second point is to better highlight the "creating Task" of a dataset, so that the preprocessing code is more visible in the Dataset UI.
What else am I missing ?

  
  
Posted one year ago

Any sneak preview? 😉 😁

  
  
Posted one year ago

The current implementation (since 1.6.3 I think) creates the issues in the linked comment (with images to visualize).

Understood, basically the moment we add nested project view to the dataset (and pipelines for that matter, and both are already being worked on), it should solve everything. Is that correct?

  
  
Posted one year ago

Hmm, maybe the right way to do so is to abuse "models", which are entities: you can specify a system_tag on them, they can store a folder (and extract it if you need), they belong to projects, and they are cloned and can be changed.
wdyt?

  
  
Posted one year ago

Why is it using an OutputModel and an InputModel?

So calling OutputModel will create the new Model entity and upload the data, InputModel will store it as required input Model.
Basically on the Task you have input & output section, when you clone the Task you are copying the input section into the newly created Task, and the assumption is that when you execute it, your code will create the output section.
Here, when you clone the Task you will be cloning the reference to the InputModel (i.e. your data), and it will always go with you.
wdyt?

  
  
Posted one year ago

Looks good! Why is it using an OutputModel and an InputModel?

  
  
Posted one year ago

Why does ClearML hide the dataset task from the main WebUI?

Basically you have the details from the Dataset page, why should it be mixed with the others ?

If I specified a project for the dataset, I specifically want it there, in that project, not hidden away in some

.datasets

hidden sub-project.

This may be a request for a "Dataset" tab under the project; why you would need the Dataset Task itself is the main question.

Not all dataset objects are equal, and perhaps not all of them should appear in the

Datasets

panel.

Not sure I can imagine one, can you provide an example?
If a dataset is already hidden - its project should not appear anywhere in the project view. Users anyway can't access it from the UI (since it's hidden), but now have additional clutter and require additional clicks to get to where they wanted.

What you mean here is, if the dataset ".dataset" project is already hidden, why do we also "hide" the Tasks inside?

  
  
Posted one year ago

A definite maybe, they may or may not be used, but we'd like to keep that option

The precursor to the question is the idea of storing local files as "input artifacts" on the Task, which means that if the Task is cloned the links go with it. Let's assume for a second this is the case, how would you upload these artifacts in the first place?

  
  
Posted one year ago

Hi AgitatedDove14 !

Ah, thanks! I'll use the artifacts for linking.

We've forgone the "use current task" already because it indeed made things even more difficult (the task that was used is then automatically hidden by this automatic renaming of dataset tasks).
The current implementation (since 1.6.3 I think) creates the issues in the linked comment (with images to visualize).

  
  
Posted one year ago

packages an entire folder as zip

What if I have multiple files that are not in the same folder? (That is the current use-case)

It otherwise makes sense I think 🙂
Our workaround now for using a Dataset as we do, is to store the dataset ID as a configuration parameter, so it's always included too 😉

  
  
Posted one year ago

For now we've monkey-patched it to our usecase:

```
Dataset._Dataset__hidden_tag = "active"

def foo(cls, dataset_project, dataset_name):
    dataset_project = dataset_project or "Datasets"
    return dataset_project, dataset_project.rpartition("/")[0]

Dataset._build_hidden_project_name = foo
```
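For clarity, here is what the replacement name builder returns, reduced to plain Python (the project names are hypothetical; `cls` and `dataset_name` are unused, mirroring the patch above):

```python
# Stand-alone copy of the replacement function from the monkey-patch
def foo(cls, dataset_project, dataset_name):
    dataset_project = dataset_project or "Datasets"
    return dataset_project, dataset_project.rpartition("/")[0]

# With a nested project path, the "parent" is everything before the last "/"
print(foo(None, "team/vision/experiments", "my-ds"))
# ('team/vision/experiments', 'team/vision')

# With no project given, it falls back to "Datasets" (whose parent is "")
print(foo(None, None, "my-ds"))
# ('Datasets', '')
```

In other words, the patch keeps the dataset task in the project the user named, instead of rerouting it into a hidden .datasets sub-project.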
  
  
Posted one year ago

I'm not entirely sure I understand the flow but I'll give it a go. I have two final questions:
1. This seems to only work for a single file (weights_path implies a single file, not multiple ones). Is that the case?
2. Why do you see this as preferred to the dataset method we have now? 🤔

  
  
Posted one year ago

I commented on your suggestion to this on GH. Uploading the artifacts would happen via some SDK before switching to remote execution.
When cloning a task (via WebUI or SDK), a user should have the option to also clone these input artifacts or to simply link to the originals. If linking to the originals, then if the original task is deleted - that is the user's mistake.

Alternatively, this potentially suggests "Input Datasets" (as we're imitating now), such that they are not tied to the original task. These can also hold references to all tasks that use them, so deleting them would be made harder

  
  
Posted one year ago

Those are cool and very welcome additions (hopefully the additional info in the Info tab will be a link?) 😁

The main issue is the clutter that the forced renaming creates, as shown in the pictures I attached in the other thread.
- Why does ClearML hide the dataset task from the main WebUI? Users should have some control over that.
- If I specified a project for the dataset, I specifically want it there, in that project, not hidden away in some .datasets hidden sub-project.
- Not all dataset objects are equal, and perhaps not all of them should appear in the Datasets panel.
- If a dataset is already hidden - its project should not appear anywhere in the project view. Users anyway can't access it from the UI (since it's hidden), but now have additional clutter and require additional clicks to get to where they wanted.

  
  
Posted one year ago

This seems to only work for a single file (weights_path implies a single file, not multiple ones). Is that the case?

See update_weights_package - it actually packages an entire folder as a zip and will do the extraction when you get it back (check the function docstring, I think you can also specify wildcards etc. if needed)
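The pack-then-extract round trip described here (a whole folder zipped on upload, extracted on retrieval, with its relative structure intact) can be illustrated with the standard library alone. This is a sketch of the behavior, not ClearML's actual code:

```python
import os
import tempfile
import zipfile

# Build a throwaway source folder with a couple of nested files
src = tempfile.mkdtemp()
os.makedirs(os.path.join(src, "sub"))
for rel in ("top.bin", os.path.join("sub", "nested.bin")):
    with open(os.path.join(src, rel), "wb") as f:
        f.write(b"payload")

# "Upload": pack the whole folder, preserving relative paths
archive = os.path.join(tempfile.mkdtemp(), "package.zip")
with zipfile.ZipFile(archive, "w") as zf:
    for dirpath, _, filenames in os.walk(src):
        for name in filenames:
            full = os.path.join(dirpath, name)
            zf.write(full, arcname=os.path.relpath(full, src))

# "Get it back": extract into a fresh folder, structure intact
dst = tempfile.mkdtemp()
with zipfile.ZipFile(archive) as zf:
    zf.extractall(dst)

extracted = sorted(os.path.relpath(os.path.join(d, n), dst)
                   for d, _, ns in os.walk(dst) for n in ns)
print(extracted)  # ['sub/nested.bin', 'top.bin']
```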

Why do you see this as preferred to the dataset method we have now?

So it answers a few requirements that you raised:
- It is fully visible as part of the project and a separate entity
- When you clone a Task it will go with it (and will let you change it in the UI if needed)
- It is not actually data but additional required inputs to execute (closer to an input model than to a standalone dataset)
- It has a simple interface that does not require differentiable storage but allows multiple "versions" nonetheless
- It is coupled with the Task & Project and not a "standalone" dataset
wdyt?

  
  
Posted one year ago

I'm not sure what you mean by "entity", but honestly anything works. We're already monkey-patching our way 😄

  
  
Posted one year ago

We are working hard on release 1.7 once that is out we will push an RC for review (I hope) 🙂

  
  
Posted one year ago