Yes exactly, sorry
Point 3: being showcased in GrumpyPenguin23's video
Hmm, this is odd, is this a download issue? If this is reproducible maybe we should investigate further...
I'll keep you informed as I play around with it 🙂
Minimum example:
```python
import pandas as pd
from clearml import Task

task = Task.init(project_name='examples', task_name='artifact test')  # hypothetical project/task names
df = pd.DataFrame([[1, 2, 3], [1, 2, 3]])
task.upload_artifact('test', df)
task.artifacts['test'].get()  # this is the call that fails for me
```
Ah I see, ok I'll have to wait then thanks
Ok, the caching part is nice. I think the tricky part (as always) is going to be all the edge cases. E.g. in my preprocessing pipeline I might have a lot of tasks so that I can parallelise nicely, but at the cost of quite a lot of boilerplate code for getting and writing artefacts, as well as having a lot of tasks in the UI. Let's see
Also the docstring is a bit inconclusive:
    Launch every 15 minutes
        add_task(task_id='1235', queue='default', minute=15)
    Launch every 1 hour
        add_task(task_id='1235', queue='default', hour=1)
but then later:
    :param minute: If specified launch Task at a specific minute of the day (Valid values 0-60)
    :param hour: If specified launch Task at a specific hour (24h) of the day (Valid values 0-24)
The first seems to imply that 15 will launch every 15 minu...
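For reference, this is how the ambiguity plays out in code (just a sketch; the add_task call and its argument names are copied from the docstring quoted above, and the actual behaviour is exactly what I'm unsure about):
```python
from clearml.automation import TaskScheduler  # assuming this is the scheduler the docstring belongs to

scheduler = TaskScheduler()

# The example section reads as "launch every 15 minutes",
# while the ":param minute:" description reads as "launch at minute 15 of each hour".
# Same call, two very different schedules:
scheduler.add_task(task_id='1235', queue='default', minute=15)
```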
A bit fiddly to figure out, but I think it works. I can't seem to filter by artifact names (AgitatedDove14 correct me here please), but other filters work fine.
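For context, this is roughly the query I was playing with (a sketch; the task_filter fields are my guess at the syntax, and filtering on artifact names is exactly the part I couldn't get working):
```python
from clearml import Task

# project/status filters work fine for me; artifact-name filtering is the missing piece
tasks = Task.get_tasks(
    project_name='my_project',              # hypothetical project name
    task_filter={'status': ['completed']},
)
```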
AgitatedDove14
That's definitely very easy. I'm still not sure how Kedro scales on clusters. From what I saw, and I might have missed it, it seems more like a single instance with sub-processes, but no real ability to set up a different environment for the different steps in the pipeline, is this correct?
Sub-processes is an option, but it supports much more: https://kedro.readthedocs.io/en/stable/10_deployment/01_deployment_guide.html one can containerise the whole pipeline and run it pretty m...
Does that API endpoint return the same thing as get_tasks?
yes that was what I was looking for 🙂 ok no worries I have some ideas on a workaround for now 🙂
AgitatedDove14 HollowKangaroo16 have you two had any further success on the kedro/clearml front?
I have been looking into this as well. The impression I have so far is that clearml is similar to mlflow just on steroids because it provides additional capabilities around orchestration and experimentation.
AgitatedDove14
Kedro in my opinion is a really nice tool to keep a clean code base for building complex Data Science projects (consisting of one or more pipelines). The UI is really se...
are you using kedro with dagster?
AgitatedDove14 good morning first of all 😄 yeah I know the decorator is coming and that is what I am looking for. But nonetheless I still wanted to play around with things a bit and was curious about the behaviour I saw. But I also saw that it is documented 😄 sorry
any idea when that hot fix is coming?
ok so that way I'll run my own requests against the API endpoint
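i.e. something along these lines, going straight at tasks.get_all (a sketch, assuming the APIClient wrapper; the filter fields are illustrative):
```python
from clearml.backend_api.session.client import APIClient

client = APIClient()
# tasks.get_all should be the endpoint backing Task.get_tasks, as far as I understand
tasks = client.tasks.get_all(
    project=['<project_id>'],   # placeholder project id
    status=['completed'],
)
```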
AgitatedDove14 any idea on what that is?
If I have a task and I upload a dataframe with task.upload_artifact('test', dataframe)
and then on the same task call task.artifacts['test'].get(), I always get an error ...
Are you able to reproduce it?
AgitatedDove14 as always much obliged for your fast responses, this is actually incredible!
Yeah, a bit clearer, something like this in the docs would be really helpful 😉 At least the last part, as StorageManager is actually quite clear.
Maybe I can sum up my understanding?
So am I right in the assumption that I can manage data, and the passing of it between tasks, either by
Managing it in a folder structure via datasets, with the potential issue of syncing a lot of data between tasks ...
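To make that first option concrete, this is roughly what I mean by managing data via datasets (a sketch based on my understanding of the Dataset API; names and paths are made up):
```python
from clearml import Dataset

# producing task: register a folder as a dataset version
ds = Dataset.create(dataset_name='preprocessed', dataset_project='my_project')
ds.add_files('data/preprocessed/')
ds.upload()
ds.finalize()

# consuming task: pull the data locally (potentially syncing a lot of it)
local_dir = Dataset.get(dataset_name='preprocessed', dataset_project='my_project').get_local_copy()
```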
AgitatedDove14 Might be just an error on my side, but if I use a pandas DataFrame as an artefact and then use the .get() method in another task, I get a compression error. If I use .get_local_copy() I can use: df = pd.read_csv(task.artifacts['bla'].get_local_copy(), compression=None)
and it works. But I need the compression=None, otherwise I get the same error as with .get()
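For completeness, this is the workaround I ended up with in the consuming task (a sketch; the producer task id and the 'bla' artifact name are placeholders from my side):
```python
import pandas as pd
from clearml import Task

# fetch the task that produced the artifact
producer = Task.get_task(task_id='<producer_task_id>')  # placeholder id

# .get() raises the compression error for me, so go through the local copy instead
local_path = producer.artifacts['bla'].get_local_copy()
df = pd.read_csv(local_path, compression=None)  # compression=None is what makes it work
```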
I'll build a minimal example tomorrow for you
Does that mean the entire pipeline will be running on the instance spinning the container ?
From here: this is what I understand:
Yes, I think that is the easiest case. However, I don't think it would be all that difficult to add metadata to the nodes that specifies what kind of queue or node they should run on.
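Something along these lines is what I have in mind (just a sketch; the gpu_queue tag convention is made up, and a small translation layer would still have to map the tag to a clearml queue):
```python
from kedro.pipeline import node


def preprocess(raw_data):
    ...


# tag the node with the queue it should be scheduled on
preprocess_node = node(
    preprocess,
    inputs="raw_data",
    outputs="preprocessed_data",
    tags=["gpu_queue"],
)
```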
Yep, this is exactly what's coming in the next release of Pipelines (RC should be out in a week or so)
Well if that is coming out soon I'll wait with further developmen...
I won't ask when the decorator is coming 😉
Would that mean that if you are running 2-3 clearml agents for 2-3 projects, their environments would have to be such that each agent could run any of the 3 projects (each having different requirements)?
What is the pattern for starting an agent within the project-specific docker container based on the task? Would that be handled via the service queue? Or can you already configure that at the task level by providing a docker file?
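For context, the pattern I was imagining (a sketch of my assumption, not necessarily how it actually works): the agent runs in docker mode and each task declares its own base image, e.g.:
```python
from clearml import Task

task = Task.init(project_name='project_a', task_name='train')  # hypothetical names

# assumption: each task carries its own container spec, so a docker-mode agent
# (e.g. `clearml-agent daemon --queue default --docker`) can spin up the right
# environment per task instead of one shared environment for all projects
task.set_base_docker('python:3.9-slim')
```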