Answered

Is There An Easy Way To Add A Link To One Of The Tasks Panels? (As An Artifact, Configuration, Info, Etc)? Edit: And Follow Up Regarding The Dataset. As Discussed Somewhere Previously, The Datasets Are Now Automatically Moved To A Hidden "Sub-Project" Pr

Is there an easy way to add a link to one of the tasks panels? (as an artifact, configuration, info, etc)?

EDIT: And follow up regarding the dataset. As discussed somewhere previously, the datasets are now automatically moved to a hidden "sub-project" prefixed with .datasets. This creates several annoyances that I believe should be treated:
First, when looking at the parent project, it appears as if there are two nested projects (the actual project, and the hidden .datasets project). That's fine, except that when you click on the actual parent project, you have to click again on the same project name to see the tasks, whereas previously (when there was no .datasets) you only needed a single click.
When you enter this .datasets project via the parent project, it appears empty (tasks are hidden?), adding to the nuisance.
Finally, if you try to delete a project that has this, you can't. You have to find the dataset in the Datasets tab, delete it from there, and only then can you delete the project (since you cannot delete the task from the hidden project).
See examples in https://clearml.slack.com/archives/CTK20V944/p1662633944688589?thread_ts=1661256050.014979&cid=CTK20V944

  				
Posted 3 years ago by UnevenDolphin73


Answers 28

> Yes. Because my old https://github.com/allegroai/clearml/issues/395 has never been resolved (though closed), we use the dataset object to upload e.g. local files needed for remote execution.

Ohh, now I remember... Following this line, can I assume these files are reused, i.e. this is not "per instance"? I have to admit I have a feeling this is a very unique use case, and maybe the "old" way Datasets were shown is better suited?

> No, I mean why does it show up in the task view (see attached image), forcing me to click twice on the same project name.

I see your point, this actually might be a "bug"?!

> I'm a bit lost for words in describing this. Would be happy to show quickly via e.g. a Slack call/huddle.

Awesome, I'll ask Product to reach out 🙂

  				
Posted 3 years ago by AgitatedDove14

Looks good! Why is it using an OutputModel and an InputModel?

  				
Posted 3 years ago by UnevenDolphin73

> Why does ClearML hide the dataset task from the main WebUI?

Basically you have the details on the Dataset page, why should it be mixed with the others?

> If I specified a project for the dataset, I specifically want it there, in that project, not hidden away in some .datasets hidden sub-project.

This may be a request for a "Dataset" tab under the project; why would you need the Dataset Task itself is the main question?

> Not all dataset objects are equal, and perhaps not all of them should appear in the Datasets panel.

Not sure I can imagine one, can you provide an example?

> If a dataset is already hidden - its project should not appear anywhere in the project view. Users anyway can't access it from the UI (since it's hidden), but now have additional clutter and require additional clicks to get to where they wanted.

What you mean here is: if the dataset ".dataset" project is already hidden, why do we also "hide" the Tasks inside?

  				
Posted 3 years ago by AgitatedDove14

Hmm, maybe the right way to do so is to abuse "models": they are an entity of their own, you can specify a system_tag on them, they can store a folder (and extract it if you need), they belong to projects, and they are cloned and can be changed.
wdyt?

  				
Posted 3 years ago by AgitatedDove14

AgitatedDove14 Basically the fact that this happens without user control is very frustrating - https://github.com/allegroai/clearml/blob/447714eaa4ac09b4d44a41bfa31da3b1a23c52fe/clearml/datasets/dataset.py#L191

  				
Posted 3 years ago by UnevenDolphin73

> can I assume these files are reused

A definite maybe, they may or may not be used, but we'd like to keep that option 🙃

> Maybe the "old" way Dataset were shown is better suited ?

It was, but then it's gone now 😞

> I see your point, this actually might be a "bug"?!

I would say so myself, but it could also be by design..?

> Awesome, I'll ask Product to reach out

LMK, happy to help out!
I know our use case is maybe a very different one, but generalizing from it would surely be beneficial 🙂

  				
Posted 3 years ago by UnevenDolphin73

> The current implementation (since 1.6.3 I think) creates the issues in the linked comment (with images to visualize).

Understood. Basically, the moment we add a nested project view to the datasets (and to pipelines for that matter, both are already being worked on), it should solve everything. Is that correct?

  				
Posted 3 years ago by AgitatedDove14

Hi UnevenDolphin73

> Is there an easy way to add a link to one of the tasks panels? (as an artifact, configuration, info, etc)?

You can add a link as an artifact, that is probably the easiest:
task.upload_artifact(name="just link", artifact_object=" ")
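
A minimal sketch of the link-as-artifact idea, for illustration only. It assumes that when `artifact_object` is a string holding a URL, ClearML registers the link instead of uploading a file, which is worth verifying against the SDK version in use. The helper name `attach_link`, the project and task names, and the example.com URL are all hypothetical.

```python
def attach_link(task, name, url):
    """Register `url` on `task` as an artifact named `name`.

    `task` is expected to be a clearml.Task, or any object exposing the same
    `upload_artifact(name=..., artifact_object=...)` method.
    """
    if "://" not in url:
        # Guard added by this sketch: a string without a scheme would not be
        # interpreted as a link.
        raise ValueError(f"expected a URL with a scheme, got {url!r}")
    return task.upload_artifact(name=name, artifact_object=url)


# Typical use (needs a configured ClearML environment, names are placeholders):
#   from clearml import Task
#   task = Task.init(project_name="examples", task_name="link demo")
#   attach_link(task, "design doc", "https://example.com/design-doc")
```

The link should then show up under the task's Artifacts tab.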

> EDIT: And follow up regarding the dataset. As discussed somewhere previously, the datasets are now automatically moved to a hidden "sub-project" prefixed with .datasets. This creates several annoyances that I believe should be treated: ...

Yes, Datasets should be accessed in the UI from the Datasets tab (the .datasets etc. we can think of as implementation details).
That said, I think the main issue is what happens if you do "use current Task" for the dataset; then things become more complicated and less intuitive. Is this the correct context?

  				
Posted 3 years ago by AgitatedDove14

> Why is it using an OutputModel and an InputModel?

Calling OutputModel creates the new Model entity and uploads the data; InputModel stores it as a required input Model.
Basically, a Task has an input section and an output section. When you clone the Task, you copy the input section into the newly created Task, and the assumption is that when you execute it, your code will create the output section.
Here, when you clone the Task, you also clone the reference to the InputModel (i.e. your data), so it will always go with you.
wdyt?

  				
Posted 3 years ago by AgitatedDove14

> A definite maybe, they may or may not be used, but we'd like to keep that option

The precursor to the question is the idea of storing local files as "input artifacts" on the Task, which means that if the Task is cloned the links go with it. Let's assume for a second this is the case, how would you upload these artifacts in the first place?

  				
Posted 3 years ago by AgitatedDove14

> Basically you have the details from the Dataset page, why should it be mixed with the others ?

Because maybe it contains code and logs on how to prepare the dataset. Or maybe the user just wants increased visibility for the dataset itself in the tasks view.

> why would you need the Dataset Task itself is the main question?

For the same reason as above. Visibility and ease of access. Coupling relevant tasks and dataset in the same project makes it easier to understand that they're linked together.

> Not sure I can imagine one, can you provide an example?

Yes. Because my old https://github.com/allegroai/clearml/issues/395 has never been resolved (though closed), we use the dataset object to upload e.g. local files needed for remote execution. These are not the same as actual datasets, but can be reused and can be useful for introspection.

> What you mean here is, if the dataset ".dataset" project is already hidden, why do we also "hide" the Tasks inside ?

No, I mean why does it show up in the task view (see attached image), forcing me to click twice on the same project name.

I'm a bit lost for words in describing this. Would be happy to show quickly via e.g. a Slack call/huddle.

  				
Posted 3 years ago by UnevenDolphin73

> I'll give it a shot. Honestly, the SDK documentation for both InputModel and OutputModel is (sorry) horrible ...

I have to agree, we are changing this interface, I do not think it is good 😞

  				
Posted 3 years ago by AgitatedDove14

Hi AgitatedDove14 !

Ah, thanks! I'll use the artifacts for linking.

We've forgone the "use current task" already because it indeed made things even more difficult (the task that was used is then automatically hidden by this automatic renaming of dataset tasks).
The current implementation (since 1.6.3 I think) creates the issues in the linked comment (with images to visualize).

  				
Posted 3 years ago by UnevenDolphin73

> For now we've monkey-patched it to our usecase:

LOL, that's a cool hack

> That gives us the benefit of creating "local datasets" (confined to the scope of the project, do not appear in Datasets tabs, but appear as normal tasks within the project)

So what would be a "perfect" solution here?
I think I'm missing the point on why it became an issue in the first place.
Notice that in new versions Datasets will be registered on the Tasks that use them (they are already there in the Info tab, and will be part of the configuration as well, so that you can override them if you wish when running remotely).
The second point is to better highlight the "creating Task" of a dataset, so that the preprocessing code is more visible in the Dataset UI.
What else am I missing?

  				
Posted 3 years ago by AgitatedDove14

I'm not entirely sure I understand the flow, but I'll give it a go. I have two final questions:
This seems to only work for a single file (weights_path implies a single file, not multiple ones). Is that the case?
Why do you see this as preferred to the dataset method we have now? 🤔

  				
Posted 3 years ago by UnevenDolphin73

I'm not sure what you mean by "entity", but honestly anything works. We're already monkey-patching our way 😄

  				
Posted 3 years ago by UnevenDolphin73

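For now we've monkey-patched it to our usecase:

```python
from clearml import Dataset

# replace the private (name-mangled) hidden tag
Dataset._Dataset__hidden_tag = "active"


def foo(cls, dataset_project, dataset_name):
    # keep the requested project as-is instead of a ".datasets" sub-project
    dataset_project = dataset_project or "Datasets"
    return dataset_project, dataset_project.rpartition("/")[0]


# classmethod wrapper so the call works on the class as well as on an instance
Dataset._build_hidden_project_name = classmethod(foo)
```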
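
To make the effect of the patched function concrete, here is the same logic as a standalone function, runnable without ClearML. My reading, an assumption based on the linked source rather than anything confirmed in this thread, is that the first returned element is the project the dataset task ends up in and the second is its parent project. The name `build_project_name` and the project names are hypothetical.

```python
def build_project_name(dataset_project, dataset_name):
    # Same logic as the patched function above, minus the class argument.
    # dataset_name is accepted only to mirror the original signature.
    dataset_project = dataset_project or "Datasets"
    return dataset_project, dataset_project.rpartition("/")[0]


print(build_project_name("Research/Vision", "aux-files"))  # ('Research/Vision', 'Research')
print(build_project_name("Vision", "aux-files"))           # ('Vision', '')
print(build_project_name(None, "aux-files"))               # ('Datasets', '')
```

Note that no `.datasets` component is appended, so the dataset task stays in the project that was requested.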

  				
Posted 3 years ago by UnevenDolphin73

We are working hard on release 1.7 once that is out we will push an RC for review (I hope) 🙂

  				
Posted 3 years ago by AgitatedDove14

I'll give it a shot. Honestly, the SDK documentation for both InputModel and OutputModel is (sorry) horrible ...

Can't wait for the documentation revamping.

  				
Posted 3 years ago by UnevenDolphin73

> packages an entire folder as zip

What if I have multiple files that are not in the same folder? (That is the current use-case)

It otherwise makes sense I think 🙂
Our workaround now for using a Dataset as we do, is to store the dataset ID as a configuration parameter, so it's always included too 😉
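
A rough sketch of what such a workaround could look like, not the actual code used here. It assumes `task.connect` accepts a dict plus a section name and returns the connected dict, so the ID stored on the task is copied on clone and can be overridden before a remote run. The helper name `connect_dataset_id`, the section name `aux_data`, and the ID string in the usage comment are hypothetical.

```python
def connect_dataset_id(task, default_dataset_id, section="aux_data"):
    """Store a dataset ID on `task` as a configuration parameter and return
    the value the task ends up with (possibly overridden remotely)."""
    params = {"dataset_id": default_dataset_id}
    connected = task.connect(params, name=section)
    # Fall back to the original dict if connect() returns nothing.
    return (connected or params)["dataset_id"]


# Typical use (needs a configured ClearML environment, values are placeholders):
#   from clearml import Task, Dataset
#   task = Task.init(project_name="examples", task_name="aux data demo")
#   dataset_id = connect_dataset_id(task, "<dataset id>")
#   local_path = Dataset.get(dataset_id=dataset_id).get_local_copy()
```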

  				
Posted 3 years ago by UnevenDolphin73

Any sneak preview? 😉 😁

  				
Posted 3 years ago by UnevenDolphin73

Well, -ish. Ideally what we're after is one of the following:
Couple a task with a dataset. Keep it visible in its destined location.
Create a dataset separately from the task. Have control over its visibility and location.
If it's hidden, it should not affect normal UI interaction (most annoying is having to click twice on the same project name when there are hidden datasets, which do not appear in the project view).

  				
Posted 3 years ago by UnevenDolphin73

> This seems to only work for a single file (weights_path implies a single file, not multiple ones). Is that the case?

See update_weights_package: it actually packages an entire folder as zip and will do the extraction when you get it back (check the function docstring, I think you can also specify a wildcard etc. if needed).

> Why do you see this as preferred to the dataset method we have now?

It answers a few requirements that you raised:
It is fully visible as part of the project and is a separate entity.
When you clone a Task it will go with it (and will let you change it in the UI if needed).
It is not actually data but additional required inputs to execute (closer to an input model than to a standalone dataset).
It has a simple interface that does not require differentiable storage but allows multiple "versions" nonetheless.
It is coupled with the Task & Project and is not a "standalone" dataset.
wdyt?

  				
Posted 3 years ago by AgitatedDove14

That gives us the benefit of creating "local datasets" (confined to the scope of the project, do not appear in Datasets tabs, but appear as normal tasks within the project)

  				
Posted 3 years ago by UnevenDolphin73

> What if I have multiple files that are not in the same folder? (That is the current use-case)

I think you can do weights_filenames= ['a_folder/firstfile.bin', 'b_folder/secondfile.bin']
(it will look for a common file path for both so it retains the folder structure)
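
To illustrate the "common file path" idea in plain Python: the sketch below only shows how a shared root preserves relative folder structure and makes no claim about how ClearML implements it internally. The helper name `relative_to_common_root` and the `/data/run` prefix are hypothetical; `posixpath` is used instead of `os.path` only so the printed result is the same on every platform.

```python
import posixpath


def relative_to_common_root(filenames):
    """Return the deepest directory shared by all files, and each file's
    path relative to that directory."""
    root = posixpath.commonpath(filenames)
    return root, [posixpath.relpath(f, root) for f in filenames]


root, relative = relative_to_common_root([
    "/data/run/a_folder/firstfile.bin",
    "/data/run/b_folder/secondfile.bin",
])
print(root)      # /data/run
print(relative)  # ['a_folder/firstfile.bin', 'b_folder/secondfile.bin']
```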

> Our workaround now for using a Dataset as we do, is to store the dataset ID as a configuration parameter, so it's always included too

Exactly, so with Input Model it's the same only kind of built in 🙂

  				
Posted 3 years ago by AgitatedDove14

I commented on your suggestion to this on GH. Uploading the artifacts would happen via some SDK call before switching to remote execution.
When cloning a task (via WebUI or SDK), a user should have the option to either clone these input artifacts as well or simply link to the original. If linking to the original and the original task is then deleted, that is the user's mistake.

Alternatively, this potentially suggests "Input Datasets" (as we're imitating now), such that they are not tied to the original task. These can also hold references to all tasks that use them, so deleting them would be made harder.

  				
Posted 3 years ago by UnevenDolphin73

Those are cool and very welcome additions (hopefully the additional info in the Info tab will be a link?) 😁

The main issue is the clutter that the forced renaming creates, as shown in the pictures I attached in the other thread.
Why does ClearML hide the dataset task from the main WebUI? Users should have some control over that.
If I specified a project for the dataset, I specifically want it there, in that project, not hidden away in some .datasets hidden sub-project.
Not all dataset objects are equal, and perhaps not all of them should appear in the Datasets panel.
If a dataset is already hidden - its project should not appear anywhere in the project view. Users anyway can't access it from the UI (since it's hidden), but now have additional clutter and require additional clicks to get to where they wanted.

  				
Posted 3 years ago by UnevenDolphin73

LOL love that approach.
Basically here is what I'm thinking:

```python
from clearml import Task, InputModel, OutputModel

task = Task.init(...)

# run this part once
if task.running_locally():
    # package the auxiliary files as a "Model" entity and tag it
    my_auxiliary_stuff = OutputModel()
    my_auxiliary_stuff.system_tags = ["DATA"]
    my_auxiliary_stuff.update_weights_package(weights_path="/path/to/additional/files")
    # register the same entity as an input of this Task, so clones keep the reference
    input_my_auxiliary = InputModel(model_id=my_auxiliary_stuff.id)
    task.connect(input_my_auxiliary, "my_auxiliary")

task.execute_remotely()
# from here on the code runs remotely: fetch the package and get its local path
my_auxiliary_path = task.models["input"]["my_auxiliary"].get_weights_package(return_path=True)
```

I might have some typos but it should do the trick.
You will have a "Model" with all your auxiliary data, and when you clone the Task it will copy the reference to the data. But when you delete a Task it will not, by default, delete the Model (aka the data).
WDYT?

  				
Posted 3 years ago by AgitatedDove14
