Would that mean if you are running 2-3 clearml agents for 2-3 projects that their environment has to be such that they could run each of the 3 projects (each having different requirements)?
What is the pattern to start an agent within the project specific docker container based on the task? Would that be handled via the service queue? Or can you already configure that on a task level providing a docker file?
That's definitely very easy, but I'm still not sure how Kedro scales on clusters. From what I saw, and I might have missed it, it seems more like a single instance with sub-processes, with no real ability to set up different environments for the different steps in the pipeline. Is this correct?
Sub-processes are an option, but it supports much more: https://kedro.readthedocs.io/en/stable/10_deployment/01_deployment_guide.html one can containerise the whole pipeline and run it pretty m...
are you using kedro with dagster?
Does that mean the entire pipeline will be running on the instance spinning the container ?
From here: this is what I understand:
Yes, I think that is the easiest case. However, I don't think it would be all that difficult to add metadata to the nodes that specifies on what kind of queue or node each one should be run.
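To make the idea of per-node queue metadata concrete, here is a minimal sketch. Everything in it is hypothetical: `Node`, the `"queue"` metadata key, and `group_by_queue` are illustrative names only, not part of the Kedro or ClearML APIs.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Node:
    """Hypothetical pipeline node; NOT a Kedro or ClearML class."""
    name: str
    func: Callable[[], object]
    # Free-form metadata; the "queue" key (an assumption) names the
    # execution queue this node would like to run on.
    metadata: Dict[str, str] = field(default_factory=dict)


def group_by_queue(nodes: List[Node], default_queue: str = "default") -> Dict[str, List[str]]:
    """Map each queue name to the names of the nodes assigned to it, in pipeline order."""
    plan: Dict[str, List[str]] = defaultdict(list)
    for node in nodes:
        plan[node.metadata.get("queue", default_queue)].append(node.name)
    return dict(plan)


pipeline = [
    Node("load_data", lambda: None),                      # no metadata: falls back to "default"
    Node("train_model", lambda: None, {"queue": "gpu"}),
    Node("report", lambda: None, {"queue": "cpu"}),
]
plan = group_by_queue(pipeline)
```

A scheduler could then enqueue each group on the matching queue; how that hand-off would actually work depends on the orchestration tool and is left out here.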
Yep, this is exactly what's coming in the next release of Pipelines (RC should be out in a week or so)
Well if that is coming out soon I'll wait with further developmen...
AgitatedDove14, HollowKangaroo16, have you two had any further success on the kedro/clearml front?
I have been looking into this as well. The impression I have so far is that clearml is similar to mlflow just on steroids because it provides additional capabilities around orchestration and experimentation.
Kedro in my opinion is a really nice tool to keep a clean code base for building complex Data Science projects (consisting of one or more pipelines). The UI is really se...
I'm an early user but happy to chat 😉
AgitatedDove14 does this release include the decorator for function tasks? 🙂
ok that makes more sense thanks
AgitatedDove14 awesome, is there an example somewhere by any chance?
Also the docstring is a bit inconclusive:
Launch every 15 minutes
add_task(task_id='1235', queue='default', minute=15)
Launch every 1 hour
add_task(task_id='1235', queue='default', hour=1)
but then later:
:param minute: If specified launch Task at a specific minute of the day (Valid values 0-60)
:param hour: If specified launch Task at a specific hour (24h) of the day (Valid values 0-24)
The first seems to imply that 15 will launch every 15 minu...
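The two readings of `minute=15` give different launch times, which a small sketch can show. This is only an illustration of the ambiguity, not how the scheduler is implemented; both function names are made up, and "at minute 15 past the hour" is just one possible reading of the `:param minute:` text.

```python
from datetime import datetime, timedelta


def next_run_every(now: datetime, minute: int) -> datetime:
    """Reading A: minute=15 means 'repeat at an interval of 15 minutes'."""
    return now + timedelta(minutes=minute)


def next_run_at_minute(now: datetime, minute: int) -> datetime:
    """Reading B: minute=15 means 'launch at minute 15 past the hour'."""
    candidate = now.replace(minute=minute, second=0, microsecond=0)
    if candidate <= now:
        # That minute has already passed in this hour, so wait for the next hour.
        candidate += timedelta(hours=1)
    return candidate


now = datetime(2021, 1, 1, 10, 20)
run_a = next_run_every(now, 15)      # 10:35 on the same day
run_b = next_run_at_minute(now, 15)  # 11:15 on the same day
```

Starting from 10:20, reading A fires at 10:35 while reading B fires at 11:15, so the docstring examples and the parameter descriptions cannot both be describing the same behaviour.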
yes that was what I was looking for 🙂 ok no worries I have some ideas on a workaround for now 🙂
yes exactly sorry
Does that API endpoint return the same thing as get_tasks?
I won't ask when the decorator is coming 😉
AgitatedDove14 good morning first of all 😄 yeah I know the decorator is coming and that is what I am looking for. But nonetheless I still wanted to play around with things a bit and was curious about the behaviour I saw. But I also saw that it is documented 😄 sorry
ok so that way I'll run my own requests against the API endpoint
A bit fiddly to figure out, but I think it works. I can't seem to filter on artifact names (AgitatedDove14, correct me here please), but other filters work fine.
AgitatedDove14 as always much obliged to your fast responses this is actually incredible!
Yeah, a bit clearer. Something like this in the docs would be really helpful 😉 At least the last part, as StorageManager is actually quite clear.
Maybe I can sum up my understanding?
So am I right in the assumption that I can manage data and the passing of such between tasks either by
Managing them in a folder structure via datasets with the potential issue of syncing a lot of data between tasks ...
point 3. being showcased in GrumpyPenguin23's video
Hmm this is odd, is this a download issue? if this is reproducible maybe we should investigate further...
I'll keep you informed as I play around with it 🙂
Ok, the caching part is nice. I think the tricky part (as always) is going to be all the edge cases. E.g. in my preprocessing pipeline I might have a lot of tasks so that I can parallelise nicely, but at the cost of quite a lot of boilerplate code for getting and writing artefacts, as well as having a lot of tasks in the UI. Let's see
AgitatedDove14 Might be just an error on my side, but if I use a pandas DataFrame as an Artefact and then use the .get() method in another task, I get a compression error. If I use .get_local_copy() I can use:
df = pd.read_csv(task.artifacts['bla'].get_local_copy(), compression=None)
and it works. But I need the compression=None, otherwise I'll get the same error as with .get(). I'll build a minimal example tomorrow for you.
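Since `compression=None` makes the read succeed, one possible explanation (an assumption, not confirmed anywhere in this thread) is that pandas' default `compression='infer'` picks a codec from the file extension that does not match the actual file contents. A quick way to check a local copy is to look for the gzip magic bytes. The sketch below uses only the standard library and temporary files it creates itself; `looks_gzipped` is a made-up helper name.

```python
import gzip
import os
import tempfile

GZIP_MAGIC = b"\x1f\x8b"  # the first two bytes of every gzip stream


def looks_gzipped(path: str) -> bool:
    """Return True if the file starts with the gzip magic bytes."""
    with open(path, "rb") as fh:
        return fh.read(2) == GZIP_MAGIC


tmp_dir = tempfile.mkdtemp()
plain_path = os.path.join(tmp_dir, "plain.csv.gz")  # .gz extension, but plain text inside
real_path = os.path.join(tmp_dir, "real.csv.gz")    # genuinely gzip-compressed

with open(plain_path, "w") as fh:
    fh.write("a,b,c\n1,2,3\n")
with gzip.open(real_path, "wt") as fh:
    fh.write("a,b,c\n1,2,3\n")

plain_is_gz = looks_gzipped(plain_path)
real_is_gz = looks_gzipped(real_path)
```

Running the same check on the path returned by `.get_local_copy()` would show whether the extension and the contents agree, which may help narrow down where the error comes from.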
AgitatedDove14 any idea on what that is?
If I have a task and I upload a dataframe with task.upload_artifact('test', dataframe)
and then on the same task do task.artifacts['test'].get() I always get an error ...
Are you able to reproduce it?
df = pd.DataFrame([[1,2,3], [1,2,3]])
Ah I see, ok I'll have to wait then thanks
any idea when that hot fix is coming?