
CostlyOstrich36 If I delete the origin and all other info and set it to tag_name=‘xxx’ then it is able to work
not the most intuitive approach but I’ll give it a go
yes and no.
This is a pseudo flow:
Data download -> pre-processing -> model training (e.g. HPT) -> model evaluation (per variant) -> model comparison dashboard -> human selects the best model using a heuristic and the status of the weather -> model packaging -> inference tests etc.
I could divide it into two pipelines:
Data download --> dashboard
Packaging --> …
Where packaging takes a parameter which is the human selected ID of the model.
However, this way, I lose the context of the ent...
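For illustration, the split could look something like this with PipelineDecorator (a minimal sketch; the names packaging_pipeline, package_model and selected_model_id are mine, and the packaging body is elided):
```
from clearml import PipelineDecorator

@PipelineDecorator.component(return_values=["package_path"])
def package_model(model_id: str):
    # Fetch the human-selected model by its ID and package it
    from clearml import InputModel
    weights = InputModel(model_id=model_id).get_local_copy()
    # ... build the deployable package from the weights ...
    return weights

@PipelineDecorator.pipeline(name="packaging", project="demo", version="1.0")
def packaging_pipeline(selected_model_id: str):
    # The parameter is the model ID a human picked off the comparison dashboard
    package_model(model_id=selected_model_id)

if __name__ == "__main__":
    PipelineDecorator.run_locally()
    packaging_pipeline(selected_model_id="<model-id-chosen-by-human>")
```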
AgitatedDove14 thanks, it was late and I wasn’t sure if I needed to use one of the clearml “certified” AMIs or just a vanilla one.
CostlyOstrich36 I confirm this was the case.
So:
```
# module_a.py
@PipelineDecorator.pipeline()
...
from module_b import my_func
x = my_func()
```
```
# module_b.py
@PipelineDecorator.component()
def my_func():
    pass
```
Under these circumstances, the pipeline is created correctly and runs correctly.
But when I clone it (or click “Run” and submit) - it fails with the error above.
Moving my_func from module_b to module_a solves this.
To me this looks like a bug, or at least unreasonable and undocumented...
CostlyOstrich36 all tasks are remote.
controller - tried both
AgitatedDove14 looks like service-writing-time for me!
PS can you point me to some official example/doc for how to persist/restore state so that tasks are restartable?
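Something along these lines is what I mean (my own sketch, not an official pattern; I’m assuming artifacts plus continue_last_task are the right vehicle for checkpointing):
```
from clearml import Task

# Reattach to the previous run so its artifacts are visible here
task = Task.init(project_name="demo", task_name="restartable-step",
                 continue_last_task=True)

# Restore checkpointed state if a previous (interrupted) run saved one
start_epoch = 0
if "state" in task.artifacts:
    start_epoch = task.artifacts["state"].get()["epoch"]

for epoch in range(start_epoch, 100):
    # ... one resumable unit of work ...
    # Persist progress so a restarted run can pick up from here
    task.upload_artifact("state", artifact_object={"epoch": epoch + 1})
```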
AgitatedDove14 I see the continue_pipeline flag.
I want to resume the same instance of the pipeline.
When I want to resume the pipeline, I can only re-enqueue it - I cannot reset parameters (right?)
So it seems that for the pipeline to resume with the “continue pipeline” mode,
I need to pass “continue_pipeline” the first time I submit the pipeline.
Hopefully it will be ignored during the first run and just behave like a new run, and only really kick in when the pipeline is resumed....
could work! is there a way to visualize the pipeline such that this step is “stuck” in executing?
SweetBadger76 I think it’s not related to the flag or whether or not I am running in a virtual env.
I just noticed that even when I clear the list of installed packages in the UI, upon startup, clearml agent still picks up the requirements.txt (after checking out the code) and tries to install it.
I wonder if there’s a way to tell it to skip this step too?
CostlyOstrich36 from what I gather the UI creates a task in the background, in status “hidden”, and it has like 10 fields of json configurations…
nifty trick! replacing the git metadata inside the task and the rest happens automatically!
So “The” pipeline Engineer A creates, once updated with the latest code and perhaps run once as a test by CI/CD, should be “tainted” as “The production” version of that pipeline, so that Engineer B’s code always uses the latest released pipeline code
However I see I should really have made my question clearer.
My workflow is as follows:
Engineer A develops a pipeline with a number of steps. She experiments with this pipeline until she is happy with the flow and her code
I suppose that yes; and I want this task to be labeled such that it’s clear it’s the “production” task.
I want to have a CI/CD pipeline that, upon Engineer A’s commit, ensures that the pipeline is re-deployed, such that when Engineer B uses it as a template, it’s definitely the latest version of the code and process
Engineer B is in charge of running Engineer A’s pipeline with different parameters and investigate the results
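Concretely, the CI/CD step I have in mind would be roughly this (a sketch; looking the controller up by ID and the “production” tag name are my own choices):
```
from clearml import Task

# After the pipeline run triggered by Engineer A's commit succeeds,
# tag its controller task so Engineer B always clones the released one
controller = Task.get_task(task_id="<pipeline-controller-task-id>")
controller.add_tags(["production"])
```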
AgitatedDove14 no clue. new folder outside of any checked out project, copied a single python file…
if the state is :
```
Dataset A:
a
a/.DS_Store
a/1.txt
a/b
a/b/.DS_Store
a/b/1.txt
a/b/c
a/b/c/1.txt

Dataset B:
b
b/2.txt
b/c
b/c/2.txt
```
Then the command `mv b a/` returns an error, since the target a/b already exists and is not empty.
That’s exactly the issue…
As a result, I need to do something which copies the files (e.g. cp -r or StorageManager.upload_folder(‘b’, ‘a’)), but this is expensive
What I’d like is to do Dataset.get(“b”, to=‘a’) and have the download land the files directly there
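The closest I can see is get_mutable_local_copy, though as far as I understand it still copies out of the local cache rather than downloading straight into the target, and I’m not sure how overwrite behaves on a non-empty folder:
```
from clearml import Dataset

# Materialize dataset "b" under the existing folder "a".
# Note: this copies from the local dataset cache into the target,
# so it is still a copy rather than a direct download into "a".
ds = Dataset.get(dataset_name="b")
ds.get_mutable_local_copy(target_folder="a", overwrite=True)
```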
The training pipeline that is considered “best of breed” is committed to Git and deployed by CI/CD; tagged in ClearML clearly.
Users of this pipeline know it’s the “official” training flow that they can now play with using configuration.
Goal is to ensure that “official” pipelines are source controlled.
makes sense?
and of course this solution forces me to do a git push for all the other dependent modules when creating the task…
RAM = 16GB. The task consumed 32GB of memory in total (I had to add 16GB of swap).