
AgitatedDove14 mv command requires empty folders… so moving b into a won’t work if some subfolders are already there
if the state is:
a: a a/.DS_Store a/1.txt a/b a/b/.DS_Store a/b/1.txt a/b/c a/b/c/1.txt
Dataset B: b b/2.txt b/c b/c/2.txt
Then the command mv b a/
returns an error since a/b already exists and is not empty.
That’s exactly the issue…
As a result, I need to do something which copies the files (e.g. cp -r or StorageManager.upload_folder('b', 'a'))
but this is expensive
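For reference, merging b into an existing a/b without copying can be sketched in plain Python by moving files one at a time. This is only a minimal sketch and not a ClearML feature: merge_tree is a hypothetical helper, it assumes both folders are on the same filesystem (so each move is a cheap rename), and it overwrites files that share the same relative path.

```python
import os
import shutil


def merge_tree(src: str, dst: str) -> None:
    """Move every file under src into dst, keeping what dst already has."""
    for root, _dirs, files in os.walk(src):
        rel = os.path.relpath(root, src)
        target_dir = dst if rel == "." else os.path.join(dst, rel)
        os.makedirs(target_dir, exist_ok=True)
        for name in files:
            src_file = os.path.join(root, name)
            dst_file = os.path.join(target_dir, name)
            try:
                # A rename is cheap when src and dst share a filesystem.
                os.replace(src_file, dst_file)
            except OSError:
                # Fall back to copy + delete (e.g. across filesystems).
                shutil.copy2(src_file, dst_file)
                os.remove(src_file)
    # Only empty directories are left under src at this point.
    shutil.rmtree(src)
```

With the state above, merge_tree("b", "a/b") would leave a/b/1.txt, a/b/2.txt, a/b/c/1.txt and a/b/c/2.txt in place and remove b.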
Sure, but was wondering if it has more of a “first class citizen” status for tracking… e.g. something you can visualize in the UI or query via API
I mean, if it’s not tracked, I think it would be a good feature!
Re. “which task did I clone from” - to my understanding the “parent” field is used for “runtime parent” - i.e. what task started me.
This is not the same as “which task was I cloned from”
DeliciousBluewhale87 what solution did you land on for this?
is that because you couldn’t find a good way to have a “manual approval/selection” step in http://clear.ml ?
Apart from that seems that pipeline task could have worked?
JitteryCoyote63 how do you detect spot interruption is coming from within the http://clear.ml task in time to mark it as “resume”?
I think that in principle, if you “intercept” the calls to Model.get() or Dataset.get() from within a task, you can collect the IDs and do various stuff with them. You can store and visualize them for lineage, or expose them as another hyperparameter I suppose.
You’ll just need the user to name them as part of loading them in the code (in case they are loading multiple datasets/models).
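A rough sketch of that interception idea, kept library-agnostic so it runs on its own. InputRegistry and tracked are hypothetical names, and the only assumption about the loaded object is that it exposes an id attribute.

```python
from typing import Any, Callable, Dict


class InputRegistry:
    """Records the ID of every object loaded through a wrapped loader."""

    def __init__(self) -> None:
        self.inputs: Dict[str, str] = {}

    def tracked(self, loader: Callable[..., Any]) -> Callable[..., Any]:
        """Wrap loader so each call is stored under a user-supplied name."""

        def wrapper(name: str, *args: Any, **kwargs: Any) -> Any:
            obj = loader(*args, **kwargs)
            # Assumption: the returned object exposes an `id` attribute.
            self.inputs[name] = str(obj.id)
            return obj

        return wrapper
```

With ClearML this could presumably be wired up as get_dataset = registry.tracked(Dataset.get), with registry.inputs then passed to Task.current_task().connect(...) so the IDs show up with the task; treat that wiring as an untested assumption.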
AgitatedDove14 let me reach out to my pocket there 😉
the above only passes the overrides if I am not mistaken
I think it works.
small correction - use a slash and not a dot in configuration/OmegaConf: parameter_override={'configuration/OmegaConf': dict...')})
I want to pass the entire hydra omegaconf as a (nested) dictionary
What I’d like is to do Dataset.get("b", to="a") and have the download land the files directly there
which configuration are you passing? are you using any framework for configuration?
Yes, but this is not the use-case.
The use-case is that I have a local folder and I want to merge a dataset into it without re-fetching the local folder…
IrritableGiraffe81 AgitatedDove14 there are multiple levels of what the CI/CD should automate/validate.
This one is the minimal option.
Another option is:
1. CI deploys (executes) the pipeline fresh, from the committed code
2. CI waits and extracts the results (various artifacts, metrics etc.)
3. CI compares them to the latest (published) pipeline or to absolute numbers
4. CI decides whether to publish it or not (or at least tag it as RC).
Steps 2-4 can themselves be encapsulated in a clearml task ...
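The compare-and-decide part (steps 3-4) could look roughly like the sketch below. decide_release, the metric names and the publish/rc/reject policy are all hypothetical; it assumes higher is better for every metric and says nothing about how the metrics are fetched.

```python
from typing import Dict, Optional


def decide_release(
    candidate: Dict[str, float],
    baseline: Optional[Dict[str, float]] = None,
    minimums: Optional[Dict[str, float]] = None,
    tolerance: float = 0.0,
) -> str:
    """Return "publish", "rc" or "reject" for a fresh pipeline run."""
    # Absolute numbers: any metric below its floor rejects the run.
    for name, floor in (minimums or {}).items():
        if candidate.get(name, float("-inf")) < floor:
            return "reject"
    if baseline is None:
        # Nothing published yet to compare with, so only tag as RC.
        return "rc"
    # Relative check against the latest published pipeline.
    for name, published in baseline.items():
        if candidate.get(name, float("-inf")) < published - tolerance:
            return "rc"
    return "publish"
```

For example, decide_release({"accuracy": 0.90}, baseline={"accuracy": 0.85}) returns "publish", while a candidate at 0.80 against the same baseline is only tagged "rc".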
AgitatedDove14 thanks, it was late and I wasn’t sure if I needed to use one of clearml’s “certified” AMIs or just a vanilla one.
nifty trick ! replacing the git metadata inside the task and the rest happens automatically!
However I see I should really have made my question clearer.
My workflow is as follows:
Engineer A develops a pipeline with a number of steps. She experiments with this pipeline until she is happy with the flow and her code
Engineer B is in charge of running Engineer A’s pipeline with different parameters and investigating the results
I want to have a CI/CD pipeline that, upon Engineer A’s commit, ensures that the pipeline is re-deployed such that when Engineer B uses it as a template, it’s definitely the latest version of the code and process
The training pipeline that is considered “best of breed” is committed to Git and deployed by CI/CD; tagged in ClearML clearly.
Users of this pipeline know it’s the “official” training flow that they can now play with using configuration.
Goal is to ensure that “official” pipelines are source controlled.
makes sense?
I suppose that yes; and I want this task to be labeled as such that it’s clear it’s the “production” task.
CostlyOstrich36 Lineage information for datasets - oversimplifying but bear with me:
Task should have a section called “input datasets”
each time I do a Dataset.get() inside a current_task, add the dataset ID to this section
Same can work with InputModel()
This way you can have a full lineage graph (also queryable/visualizable)
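To illustrate what “queryable” could mean here, a toy sketch over in-memory records. upstream_inputs and both mappings are hypothetical: task_inputs stands in for the proposed “input datasets” section, and produced_by maps a dataset/model ID to the task that created it.

```python
from typing import Dict, List, Set


def upstream_inputs(
    task_id: str,
    task_inputs: Dict[str, List[str]],
    produced_by: Dict[str, str],
) -> Set[str]:
    """Return every dataset/model ID the task depends on, transitively."""
    found: Set[str] = set()
    visited_tasks: Set[str] = set()
    stack = [task_id]
    while stack:
        current = stack.pop()
        if current in visited_tasks:
            continue
        visited_tasks.add(current)
        for input_id in task_inputs.get(current, []):
            if input_id not in found:
                found.add(input_id)
                producer = produced_by.get(input_id)
                if producer is not None:
                    # Follow the lineage into the task that made this input.
                    stack.append(producer)
    return found
```

So if a "train" task read "ds_clean", and "ds_clean" was produced by a "clean" task that read "ds_raw", querying "train" gives back both dataset IDs.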