Could work! Is there a way to visualize the pipeline such that this step is “stuck” in executing?
the above only passes the overrides if I am not mistaken
and of course this solution forces me to do a git push for all the other dependent modules when creating the task…
sure CostlyOstrich36
I have something like the following:
@PipelineDecorator.component(....)
def my_task(...):
    from my_module1 import my_func1
    from my_module2 import ....
my_module1 and 2 are modules that are a part of the same project source. they don’t come as a separate package.
Now when I run this in clearml, these imports don’t work.
These functions may require transitive imports of course, so the following doesn’t work:
PipelineDecorator.component(helper_function=[my_fu...
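For context, this is roughly the workaround I was referring to: if I read the docs right, the component can be attached to the project repo so the neighbouring modules are importable on the agent, which is exactly what forces the git push mentioned above. Repo URL, branch and module names below are placeholders:

```python
from clearml import PipelineDecorator

# Sketch only: point the component at the full repo so my_module1 / my_module2
# are available when the agent runs the step. URL and branch are placeholders.
@PipelineDecorator.component(
    return_values=['result'],
    repo='https://github.com/my-org/my-project.git',  # placeholder
    repo_branch='main',
)
def my_task(x):
    from my_module1 import my_func1  # neighbouring module, same source tree
    return my_func1(x)
```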
CostlyOstrich36
pipe.add_step(
    name='train',
    parents=['data_pipeline'],
    base_task_project='xxx',
    base_task_name='yyy',
    parameter_override={'OmegaConf': cfg.trainer},
)
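If the goal is to replace the whole OmegaConf configuration object rather than individual hyperparameters, something like the following might be closer (configuration_overrides is how I read the add_step docs; pipe and cfg are the same objects as above):

```python
from omegaconf import OmegaConf

# Sketch only: override the step's "OmegaConf" configuration object with the
# serialized trainer config instead of passing it via parameter_override.
pipe.add_step(
    name='train',
    parents=['data_pipeline'],
    base_task_project='xxx',
    base_task_name='yyy',
    configuration_overrides={'OmegaConf': OmegaConf.to_yaml(cfg.trainer)},
)
```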
AgitatedDove14 the emphasis is that the imports I am doing are not from external/pip packages, they are just neighbouring modules of the function I am importing. Imports that rely on pip-installed packages work well.
CostlyOstrich36 not that I am aware of any deleting etc.
I didn’t set up the env though…
I can try, but it will then hurt the download speeds. Anyhow, not reasonable behavior in my opinion
I don’t think so.
In most cases I would have multiple agents pulling from the same queue. I can’t have a queue per pipeline execution.
So if I submit A and B to the same queue, it still doesn’t guarantee that they will be pulled by the same agent….
not the most intuitive approach but I’ll give it a go
As far as I know, storage can be https://clear.ml/docs/latest/docs/integrations/storage/#direct-access .
A typical EBS volume is limited to being mounted to one machine at a time.
so in this sense, it won’t be too easy to create a solution where multiple machines consume datasets from this storage type
PS multi-attach is possible under some limitations: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-volumes-multi.html
AgitatedDove14 no clue. new folder outside of any checked out project, copied a single python file…
it is a pickle issue
‘package model doesn’t exist’
despite me attempting to add the right path to sys.path right before loading
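Roughly what I attempted (the path and file name are placeholders):

```python
import sys
import pickle

# Make the source tree that contains the `model` package importable before
# unpickling, since pickle resolves classes by their module path at load time.
sys.path.insert(0, '/path/to/project/src')  # placeholder path

with open('artifact.pkl', 'rb') as f:  # placeholder file
    obj = pickle.load(f)
```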
CostlyOstrich36 from what I gather the UI creates a task in the background, in status “hidden”, and it has like 10 fields of json configurations…
I think that in principle, if you “intercept” the calls to Model.get() or Dataset.get() from within a task, you can collect the IDs and do various things with them. You can store and visualize them for lineage, or expose them as another hyperparameter I suppose.
You’ll just need the user to name them as part of loading them in the code (in case they are loading multiple datasets/models).
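A rough sketch of what I mean by intercepting, monkey-patching Dataset.get as an illustration (the “Datasets” parameter section name is just something I made up for the example):

```python
from clearml import Dataset, Task

_original_get = Dataset.get

def _tracked_get(*args, **kwargs):
    # Call the real Dataset.get, then record the resolved dataset ID on the
    # currently running task so it can later be used for lineage.
    ds = _original_get(*args, **kwargs)
    task = Task.current_task()
    if task is not None:
        task.set_parameter('Datasets/{}'.format(ds.name), ds.id)
    return ds

# Patch class-level calls (Dataset.get(...)); the same idea applies to Model.get().
Dataset.get = _tracked_get
```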
Re. “which task did I clone from” - to my understanding the “parent” field is used for the “runtime parent”, i.e. what task started me.
This is not the same as “which task was I cloned from”
SuccessfulKoala55 been experiencing it last few days.
CostlyOstrich36 Lineage information for datasets - oversimplifying, but bear with me:
Task should have a section called “input datasets”
each time I do a Dataset.get() inside a current_task, add the dataset ID to this section
Same can work with InputModel()
This way you can have a full lineage graph (also queryable/visualizable)
AgitatedDove14 let me reach out to my pocket there 😉
I mean, if it’s not tracked, I think it would be a good feature!
AgitatedDove14 decorators. But I would consider converting it to whatever is needed in order to achieve the above
I want to have a CI/CD pipeline that, upon an Engineer A commit, ensures the pipeline is re-deployed, so that when Engineer B uses it as a template, it’s definitely the latest version of the code and process.