JealousParrot68
Moderator
3 Questions, 28 Answers
  Active since 10 January 2023
  Last activity one year ago

Reputation: 0

Badges (1): 18 × Eureka!
0 Votes · 18 Answers · 994 Views · 3 years ago
Is it possible to filter tasks by their output and input names using .get_tasks? · 0 Votes · 13 Answers · 1K Views · 3 years ago
Do tasks that are created through create_function_task run the entry_script again instead of just the pure function? · 0 Votes · 3 Answers · 970 Views · 3 years ago
0 Hi Again

Ok, that makes more sense, thanks.

3 years ago
0 Hi Again

AgitatedDove14 awesome, is there an example somewhere by any chance?

3 years ago
0 Hi Again

AgitatedDove14 does this release include the decorator for function tasks? 🙂

3 years ago
0 Hmm Is There Any Clear (Pun Intended) Documentation On The Roles Of Storagemanager, Dataset And Artefacts? It Seems To Me There Are Various Overlapping Roles And I'm Not Sure I Fully Grasp The Best Way Of Using Them. Especially When Looking At The Way Da

Ok, the caching part is nice. I think the tricky part (as always) is going to be all the edge cases. E.g. in my preprocessing pipeline I might have a lot of tasks so that I can parallelise nicely, but at the cost of quite a lot of boilerplate code for getting and writing artefacts, as well as having a lot of tasks in the UI. Let's see.
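
A minimal sketch of the kind of artifact boilerplate being referred to, assuming one preprocessing task produces a DataFrame and a downstream task picks it up via artifacts (project, task, and artifact names are hypothetical):

```python
import pandas as pd
from clearml import Task

# --- producing (preprocessing) task ---
task = Task.init(project_name="preprocessing", task_name="step_1")  # hypothetical names
df = pd.DataFrame({"feature": [1, 2, 3]})
task.upload_artifact(name="features", artifact_object=df)

# --- downstream task (separate script / process) ---
producer = Task.get_task(project_name="preprocessing", task_name="step_1")
features = producer.artifacts["features"].get()  # retrieve the stored DataFrame
```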

3 years ago
0 Hi Again

Also the docstring is a bit inconclusive:
    Launch every 15 minutes
    add_task(task_id='1235', queue='default', minute=15)
    Launch every 1 hour
    add_task(task_id='1235', queue='default', hour=1)
but then later:
    :param minute: If specified launch Task at a specific minute of the day (Valid values 0-60)
    :param hour: If specified launch Task at a specific hour (24h) of the day (Valid values 0-24)
The first seems to imply that 15 will launch every 15 minu...

3 years ago
0 Hi

I'm an early user but happy to chat 😉

3 years ago
0 Is It Possible To Filter Tasks By Their Output And Input Names Using .Get_Tasks?

A bit fiddly to figure out, but I think it works. I can't seem to filter by artifact names though (AgitatedDove14 correct me here please), but the other filters work fine.
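
For reference, a sketch of the workaround being described: query with the filters Task.get_tasks does support, then check artifact names client-side (the project and artifact names below are hypothetical):

```python
from clearml import Task

# Artifact names are not a supported server-side filter, so fetch candidate
# tasks by project (or name/tags) and filter on task.artifacts afterwards.
candidates = Task.get_tasks(project_name="my_project")  # hypothetical project

wanted = "features"  # hypothetical artifact name
matching = [t for t in candidates if wanted in t.artifacts]
```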

3 years ago
0 I Saw Some Talk Of Clearml + Kedro On Reddit. Is That A Good Approach?

AgitatedDove14

That's definitely very easy, I'm still not sure how Kedro scales on clusters. From what I saw, and I might have missed it, it seems more like a single instance with sub-processes, but no real ability to setup diff environment for the diff steps in the pipeline, is this correct ?

sub-processes is an option but it supports much more: https://kedro.readthedocs.io/en/stable/10_deployment/01_deployment_guide.html one can containerise the whole pipeline and run it pretty m...

3 years ago
0 Is It Possible To Filter Tasks By Their Output And Input Names Using .Get_Tasks?

Does that API endpoint return the same thing as get_tasks does?

3 years ago
0 Is It Possible To Filter Tasks By Their Output And Input Names Using .Get_Tasks?

Yes, that was what I was looking for 🙂 Ok, no worries, I have some ideas for a workaround for now 🙂

3 years ago
0 I Saw Some Talk Of Clearml + Kedro On Reddit. Is That A Good Approach?

AgitatedDove14 HollowKangaroo16 have you two had any further success on the kedro/clearml front?

I have been looking into this as well. The impression I have so far is that clearml is similar to mlflow just on steroids because it provides additional capabilities around orchestration and experimentation.

AgitatedDove14
Kedro in my opinion is a really nice tool to keep a clean code base for building complex Data Science projects (consisting of one or more pipelines). The UI is really se...

3 years ago
0 Do Tasks That Are Created Through Create_Function_Task Run The Entry_Script Again Instead Of Just The Pure Function?

AgitatedDove14 good morning first of all 😄 yeah I know the decorator is coming and that is what I am looking for. But nonetheless I still wanted to play around with things a bit and was curious about the behaviour I saw. But I also saw that it is documented 😄 sorry
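
For context, a minimal sketch of create_function_task as discussed (all names are hypothetical); the parent script runs normally and the function is registered as its own draft task:

```python
from clearml import Task


def preprocess(value):
    # pure function to be executed as its own task
    return value * 2


task = Task.init(project_name="examples", task_name="parent")  # hypothetical names
# Creates a draft task wrapping the function; an agent executing it uses the
# parent entry script as its execution context (the behaviour asked about above)
func_task = task.create_function_task(
    func=preprocess, func_name="preprocess", task_name="preprocess task", value=21
)
```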

3 years ago
0 Is It Possible To Filter Tasks By Their Output And Input Names Using .Get_Tasks?

Ok, so in that case I'll run my own requests against the API endpoint.
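
A sketch of how that could look with ClearML's APIClient (the project id placeholder and the only_fields paths are assumptions):

```python
from clearml.backend_api.session.client import APIClient

client = APIClient()
# tasks.get_all mirrors the tasks.get_all REST endpoint that get_tasks uses
tasks = client.tasks.get_all(
    project=["<project-id>"],                            # placeholder project id
    only_fields=["id", "name", "execution.artifacts"],   # assumed field paths
)
```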

3 years ago
0 Hmm Is There Any Clear (Pun Intended) Documentation On The Roles Of Storagemanager, Dataset And Artefacts? It Seems To Me There Are Various Overlapping Roles And I'm Not Sure I Fully Grasp The Best Way Of Using Them. Especially When Looking At The Way Da

AgitatedDove14 any idea what that is?
If I have a task and I upload a dataframe with task.upload_artifact('test', dataframe)
and then, on the same task, call task.artifacts['test'].get(), I always get an error ...

Are you able to reproduce it?
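
A minimal reproduction sketch of what is being described (project, task, and artifact names are hypothetical):

```python
import pandas as pd
from clearml import Task

task = Task.init(project_name="debug", task_name="artifact repro")  # hypothetical names
df = pd.DataFrame({"a": [1, 2, 3]})

task.upload_artifact("test", df)
# Reading the artifact back from the same task is what reportedly raises the error:
restored = task.artifacts["test"].get()
```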

3 years ago
0 Hmm Is There Any Clear (Pun Intended) Documentation On The Roles Of Storagemanager, Dataset And Artefacts? It Seems To Me There Are Various Overlapping Roles And I'm Not Sure I Fully Grasp The Best Way Of Using Them. Especially When Looking At The Way Da

AgitatedDove14 as always, much obliged for your fast responses, this is actually incredible!

Yeah, a bit clearer; something like this in the docs would be really helpful 😉 At least the last part, as StorageManager is actually quite clear.

Maybe I can sum up my understanding?
So am I right in the assumption that I can manage data, and the passing of it between tasks, either by
managing it in a folder structure via datasets, with the potential issue of syncing a lot of data between tasks ...
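
As a point of comparison for the summary above, a sketch of the Dataset route (dataset and project names are hypothetical):

```python
from clearml import Dataset

# Producer: version a folder of preprocessed files as a dataset
ds = Dataset.create(dataset_name="preprocessed", dataset_project="my_project")  # hypothetical
ds.add_files(path="./output")
ds.upload()
ds.finalize()

# Consumer: fetch a local (cached) copy in a later task
local_path = Dataset.get(
    dataset_name="preprocessed", dataset_project="my_project"
).get_local_copy()
```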

3 years ago
0 Hmm Is There Any Clear (Pun Intended) Documentation On The Roles Of Storagemanager, Dataset And Artefacts? It Seems To Me There Are Various Overlapping Roles And I'm Not Sure I Fully Grasp The Best Way Of Using Them. Especially When Looking At The Way Da

AgitatedDove14 Might just be an error on my side, but if I use a pandas DataFrame as an artefact and then use the .get() method in another task, I get a compression error. If I use .get_local_copy() I can use: df = pd.read_csv(task.artifacts['bla'].get_local_copy(), compression=None) and it works. But I need the compression=None, otherwise I'll get the same error as with .get(). I'll build a minimal example for you tomorrow.
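
The workaround from the message above as a small sketch (project and task names are hypothetical; 'bla' is the artifact name from the message):

```python
import pandas as pd
from clearml import Task

producer = Task.get_task(project_name="my_project", task_name="producer")  # hypothetical
# .get() reportedly raises a compression error, whereas reading the local copy
# with compression=None works:
df = pd.read_csv(producer.artifacts["bla"].get_local_copy(), compression=None)
```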

3 years ago
0 I Saw Some Talk Of Clearml + Kedro On Reddit. Is That A Good Approach?

Does that mean the entire pipeline will be running on the instance spinning the container ?
From here, this is what I understand:

Yes, I think that is the easiest case; however, I don't think it would be all that difficult to add metadata to the nodes that specifies what kind of queue or node it should be run on.

Yep, this is exactly what's coming in the next release of Pipelines (RC should be out in a week or so)

Well if that is coming out soon I'll wait with further developmen...

3 years ago
0 When Using Something Like Pdf2Image Which Requires Poppler (Which Can Be Installed With Conda), How Can I Ensure That The Task Can Run On An Agent Correctly? As Of Now It Doesn't Know About Poppler

Would that mean if you are running 2-3 clearml agents for 2-3 projects that their environment has to be such that they could run each of the 3 projects (each having different requirements)?

What is the pattern for starting an agent within a project-specific docker container based on the task? Would that be handled via the service queue? Or can you already configure that on a task level by providing a docker file?
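
One task-level option along the lines being asked about, sketched with a placeholder image; the agent would then need to run in docker mode (clearml-agent daemon --queue default --docker):

```python
from clearml import Task

task = Task.init(project_name="my_project", task_name="train")  # hypothetical names
# Ask agents running in docker mode to execute this task inside a specific image
# (an image with poppler preinstalled, in this thread's case)
task.set_base_docker("python:3.9-slim")  # placeholder image name
```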

3 years ago