As far as I know, storage can be direct access: https://clear.ml/docs/latest/docs/integrations/storage/#direct-access .
A typical EBS volume is limited to being mounted to one machine at a time,
so in this sense it won't be easy to create a solution where multiple machines consume datasets from this storage type.
PS: EBS Multi-Attach ( https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-volumes-multi.html ) is possible under some limitations.
Trust me, I had to add this field to this default dict just so that clearml doesn’t delete it for me
it does appear on the task in the UI, but somehow it is not repopulated in the remote run if it's not part of the default empty dict…
I mean, if it’s not tracked, I think it would be a good feature!
AgitatedDove14 it’s pretty much similar to your proposal but with pipelines instead of tasks, right?
IrritableGiraffe81 AgitatedDove14 there are multiple levels of what the CI/CD should automate/validate.
This one is the minimal option.
Another option is:
1. CI deploys (executes) the pipeline fresh, from the committed code
2. CI waits and extracts the results (various artifacts, metrics, etc.)
3. CI compares them to the latest (published) pipeline or to absolute numbers
4. CI decides whether to publish it or not (or at least tag it as RC)
Steps 2-4 can themselves be encapsulated in a clearml task ...
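A minimal sketch of the gate in steps 2-4. Everything here is an assumption for illustration (the metric name, thresholds, and function are made up, not clearml API):

```python
# Hypothetical CI gate: compare a fresh pipeline run's metrics against
# the last published baseline and decide whether to publish or tag as RC.
# All names and thresholds are placeholders.

def decide(new_metrics: dict, baseline_metrics: dict, min_accuracy: float = 0.90) -> str:
    """Return 'publish', 'rc', or 'reject' for the fresh pipeline run."""
    acc = new_metrics.get("accuracy", 0.0)
    if acc < min_accuracy:                            # absolute-number floor
        return "reject"
    if acc >= baseline_metrics.get("accuracy", 0.0):  # beats the published baseline
        return "publish"
    return "rc"                                       # above the floor, below baseline

print(decide({"accuracy": 0.93}, {"accuracy": 0.91}))  # publish
```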
AgitatedDove14 from what I gather there is a lightly documented concept of “multi_instance_support” https://github.com/allegroai/clearml/blob/90854fa4a516fcb38ea0a5ec23894c5a3b6bbc4f/clearml/automation/controller.py#L3296 .
Do you think it can work?
CostlyOstrich36 Lineage information for datasets - oversimplifying, but bear with me:
Tasks should have a section called "input datasets"
each time I do a Dataset.get() inside a current_task, add the dataset ID to this section
Same can work with InputModel()
This way you can have a full lineage graph (also queryable/visualizable)
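To illustrate the idea, here is a pure-Python sketch (not clearml API - the registry and helper are hypothetical stand-ins for what `Dataset.get()` could record on the current task):

```python
# Hypothetical sketch of the proposed lineage tracking - not clearml API.
# Every dataset fetched during a task gets recorded into an
# "input datasets" section, yielding a queryable lineage graph.

lineage = {}  # task_id -> list of dataset IDs (the "input datasets" section)

def tracked_get(task_id: str, dataset_id: str) -> dict:
    """Stand-in for Dataset.get() that also records lineage."""
    lineage.setdefault(task_id, []).append(dataset_id)
    return {"id": dataset_id}  # stand-in for the dataset object

tracked_get("task-1", "ds-abc")
tracked_get("task-1", "ds-def")
print(lineage["task-1"])  # ['ds-abc', 'ds-def']
```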
@ https://app.slack.com/team/UT8T0V3NE is there support in the non-free version for preempting lower-priority tasks to allow a higher-priority task to come in?
and of course this solution forces me to do a git push for all the other dependent modules when creating the task…
AgitatedDove14 the emphasis is that the imports I am doing are not from external/pip packages; they are just neighbouring modules of the function I am importing. Imports that rely on pip-installed packages work well.
AgitatedDove14 I haven’t done a full design for this 😉
Just referring to how DVC claims it can detect and invalidate changes in large remote files.
So I take it there is no such feature in clear.ml 🙂
I can try, but it will then hurt download speeds. Anyhow, not reasonable behavior in my opinion.
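For context, the kind of detection mentioned above can be sketched without downloading, assuming the remote exposes cheap metadata such as size and etag (this is a generic sketch of the technique, not DVC's or clearml's implementation):

```python
# Sketch of change detection for large remote files, assuming a cheap
# remote fingerprint (size + etag) is readable without downloading.
import hashlib

def fingerprint(meta: dict) -> str:
    """Hash cheap metadata; if the hash changes, invalidate the cached copy."""
    raw = f"{meta['size']}:{meta['etag']}".encode()
    return hashlib.md5(raw).hexdigest()

cached = fingerprint({"size": 1024, "etag": "v1"})
fresh = fingerprint({"size": 1024, "etag": "v2"})
print(cached != fresh)  # True -> the remote file changed, re-fetch it
```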
SweetBadger76 I think it’s not related to the flag or whether or not I am running in a virtual env.
I just noticed that even when I clear the list of installed packages in the UI, upon startup, clearml agent still picks up the requirements.txt (after checking out the code) and tries to install it.
I wonder if there’s a way to tell it to skip this step too?
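Not an answer, but a possibly related knob, worth double-checking against the clearml-agent docs before relying on it (I'm not certain it skips the requirements.txt step specifically):

```shell
# Hedged sketch: in docker mode, the clearml-agent docs describe an env var
# for reusing the image's preinstalled Python environment instead of
# installing one (verify the exact name/behavior against current docs).
export CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=1
```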
However I see I should really have made my question clearer.
My workflow is as follows:
Engineer A develops a pipeline with a number of steps. She experiments with this pipeline until she is happy with the flow and her code
I want to have a CI/CD pipeline that, upon Engineer A's commit, ensures that the pipeline is re-deployed, such that when Engineer B uses it as a template, it's definitely the latest version of the code and process.
I suppose that yes; and I want this task to be labeled as such that it’s clear it’s the “production” task.
python 3.8
I’ve worked around the issue by doing: sys.modules['model'] = local_model_package
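For context, the trick generalizes like this (self-contained sketch; the `model` name and `predict` attribute are placeholders, not the actual package):

```python
import sys
import types

# Build a stand-in package and register it under the name the remote code
# expects, so a later `import model` resolves to our local package.
local_model_package = types.ModuleType("model")
local_model_package.predict = lambda x: x * 2  # placeholder attribute

sys.modules['model'] = local_model_package

import model  # resolves via sys.modules to the stand-in registered above
print(model.predict(21))  # 42
```

Python consults sys.modules before searching the filesystem, which is why pre-registering the package name short-circuits the normal import machinery.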
I mean that there will be no task created, and no invocation of any clear.ml API whatsoever, including no imports in the “core ML task”. This is the direction: add very small wrappers of clear.ml code around the core ML task. The clear.ml wrapper is “aware” of the core ML code, and never the other way. For cases where the wrapper is only “before” and “after” the core ML task, it’s somewhat easier to achieve. For reporting artifacts etc., which is “mid flow”, it’s m...
CostlyOstrich36 all tasks are remote.
controller - tried both
not the most intuitive approach but I’ll give it a go
DeliciousBluewhale87 what solution did you land on for this?
SuccessfulKoala55 I’ve been experiencing it for the last few days.
Re. “which task did I clone from” - to my understanding, the “parent” field is used for the “runtime parent”, i.e. the task that started me.
This is not the same as “which task was I cloned from”