Sure, but was wondering if it has more of a “first class citizen” status for tracking… e.g. something you can visualize in the UI or query via API
AgitatedDove14 thanks, it was late and I wasn’t sure if I needed to use one of ClearML’s “certified” AMIs or just a vanilla one.
The training pipeline that is considered “best of breed” is committed to Git and deployed by CI/CD; tagged in ClearML clearly.
Users of this pipeline know it’s the “official” training flow that they can now play with using configuration.
Goal is to ensure that “official” pipelines are source controlled.
makes sense?
So “The” pipeline Engineer A creates, once updated with the latest code and perhaps run once as a test by CI/CD, should be “tainted” as “The production” version of that pipeline, so that Engineer B’s code always uses the latest released pipeline code.
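The “latest released pipeline” lookup can be pictured as a filter over tagged runs. This is a minimal sketch only: `PipelineRun`, the `production` tag name and `latest_production` are hypothetical stand-ins for whatever the tracking server returns, not ClearML API calls.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class PipelineRun:
    # Hypothetical record; in practice these fields would come from the tracking server.
    run_id: str
    created: datetime
    tags: set = field(default_factory=set)


def latest_production(runs, tag="production") -> Optional[PipelineRun]:
    """Return the newest run carrying `tag`, or None if nothing is tagged."""
    tagged = [r for r in runs if tag in r.tags]
    return max(tagged, key=lambda r: r.created, default=None)


runs = [
    PipelineRun("a1", datetime(2022, 1, 1), {"production"}),
    PipelineRun("a2", datetime(2022, 2, 1)),                  # Engineer A's experiment, untagged
    PipelineRun("a3", datetime(2022, 3, 1), {"production"}),  # tagged by CI/CD after the test run
]
chosen = latest_production(runs)
```

Engineer B's code would then clone or parameterize `chosen` rather than any untagged experiment.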
However I see I should really have made my question clearer.
My workflow is as follows:
Engineer A develops a pipeline with a number of steps. She experiments with this pipeline until she is happy with the flow and her code
Engineer B is in charge of running Engineer A’s pipeline with different parameters and investigating the results
Nifty trick! Replacing the git metadata inside the task and the rest happens automatically!
Yes, but this is not the use-case.
The use-case is that I have a local folder and I want to merge a dataset into it without re-fetching the local folder…
is that because you couldn’t find a good way to have a “manual approval/selection” step in http://clear.ml ?
Apart from that seems that pipeline task could have worked?
which configuration are you passing? are you using any framework for configuration?
may I also add that PyYAML is the worst thing in the history of python dependency hell?
AgitatedDove14 looks like service-writing-time for me!
PS can you point me to some official example/ doc for how to persist/restore state so that tasks are restartable?
It’s more like this:
I have a pipeline, ran on all data.
Now I change/add a sub-dag to the pipeline
I want to run only that sub-dag on all historical data in ad-hoc manner
And then next runs will run the full dag (e.g. only on new data)
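Picking “only that sub-dag” amounts to collecting the changed steps plus everything downstream of them. A minimal sketch, assuming the DAG is available as a plain dict of step name to consumers; the step names and the `downstream` helper are hypothetical, not part of ClearML:

```python
from collections import deque


def downstream(dag, changed):
    """Return `changed` plus every step reachable from it.

    `dag` maps each step name to the list of steps that consume its output.
    """
    seen = set(changed)
    queue = deque(changed)
    while queue:
        step = queue.popleft()
        for child in dag.get(step, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


# Hypothetical pipeline: ingest -> clean -> train, plus a new branch clean -> stats -> report
dag = {
    "ingest": ["clean"],
    "clean": ["train", "stats"],
    "stats": ["report"],
    "train": [],
    "report": [],
}
to_rerun = downstream(dag, ["stats"])  # only the new branch runs on historical data
```

The ad-hoc backfill would run just `to_rerun`, reusing stored outputs of the untouched steps, while regular runs on new data keep executing the full DAG.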
AgitatedDove14
Sort of.
I would go with something which is more like:
` execution_plan = {'step_b': 'b_result', 'step_c': None, ...}

@PipelineDecorator.pipeline(...)
def pipeline(execution_plan):
    step_results = {}
    # pseudocode: get_dag() stands for "iterate over the steps in DAG order"
    for step in pipeline.get_dag():
        if step.name in execution_plan:
            # reuse the supplied result if there is one, otherwise run the step
            step_results[step.name] = execution_plan[step.name] or step(**step_results)
`
The ‘execution plan’ specifies the list of steps to run (keys) and for each, whether we should use a u...
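The same idea can be tried without a server. This is a minimal sketch in plain Python, assuming an ordered list of `(name, function)` pairs stands in for the DAG; `run_with_plan` and the step names are hypothetical, and nothing here calls ClearML.

```python
def run_with_plan(steps, execution_plan):
    """Run only the steps named in `execution_plan`.

    `steps` is an ordered list of (name, function) pairs standing in for the DAG.
    A plan value of None means "run the step"; any other value is reused as
    that step's result without running it.
    """
    results = {}
    for name, func in steps:
        if name not in execution_plan:
            continue  # step is not part of this run
        supplied = execution_plan[name]
        results[name] = supplied if supplied is not None else func(results)
    return results


# Hypothetical steps; each one receives the results gathered so far.
steps = [
    ("step_a", lambda r: "a_result"),
    ("step_b", lambda r: "b_fresh"),
    ("step_c", lambda r: r["step_b"] + "+c"),
]
results = run_with_plan(steps, {"step_b": "b_result", "step_c": None})
```

The sketch tests `is not None` rather than using `or`, so a supplied result that happens to be falsy (such as `0` or an empty list) is still reused instead of triggering a re-run.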
SweetBadger76 thanks for your reply.
One quirk I found was that even with this flag on, the agent decides to install whatever is in the requirements.txt.
SweetBadger76 I think it’s not related to the flag or whether or not I am running in a virtual env.
I just noticed that even when I clear the list of installed packages in the UI, upon startup, clearml agent still picks up the requirements.txt (after checking out the code) and tries to install it.
I wonder if there’s a way to tell it to skip this step too?
JitteryCoyote63 how do you detect spot interruption is coming from within the http://clear.ml task in time to mark it as “resume”?
ok, hours of debugging later, I realized that the auto_scaler example initializes a https://github.com/allegroai/clearml/blob/721569bb77d89d89e5b4f32a0ed98311c4574650/examples/services/aws-autoscaler/aws_autoscaler.py#L68 the task is initialized on the remote side.
Apparently, https://github.com/allegroai/clearml/blob/721569bb77d89d89e5b4f32a0ed98311c4574650/examples/services/aws-autoscaler/aws_autoscaler.py#L103, doesn’t populate that dict with any keys that don’t already exist in it.
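The behaviour described above resembles an update that only overrides keys already present. A minimal sketch of that semantics, showing why a dict that starts out empty stays empty; `override_existing` is a hypothetical helper written for illustration, not the code behind the linked line:

```python
def override_existing(defaults, stored):
    """Copy values from `stored` into `defaults`, but only for keys `defaults` already has."""
    for key in defaults:
        if key in stored:
            defaults[key] = stored[key]
    return defaults


stored = {"region": "us-east-1", "max_instances": 4}   # hypothetical values saved earlier

empty = override_existing({}, stored)                   # no keys to match against
seeded = override_existing({"region": None}, stored)    # only 'region' is filled in
```

Under this assumption, seeding the dict with every expected key (even with placeholder values) before the update is what lets the stored values come through.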
...
DeliciousBluewhale87 what solution did you land on for this?
RAM=16
Task consumed 32GB memory total (had to add 16GB of swap)
AgitatedDove14
What was important for me was that the user can define the entire workflow and that I can see its status as one ‘pipeline’ in the UI (vs. disparate tasks).
perform query
process records into a labeling assignment
call labeling system API
wait for an external hook when labels are ready
clean the labels
upload them to a dataset
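The steps above can be sketched as a resumable sequence that records which steps have finished, so that re-running after the long wait does not repeat earlier work. This is a generic pattern under stated assumptions, not a ClearML feature: `run_resumable`, the JSON state file, the step names and the `labels_ready` callback are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


def run_resumable(steps, state_file, labels_ready):
    """Run `steps` in order, skipping any already recorded in `state_file`.

    Returns False if it stopped at "wait_for_labels" because `labels_ready()`
    was False; returns True once every step has completed.
    """
    path = Path(state_file)
    done = json.loads(path.read_text()) if path.exists() else []
    for name, func in steps:
        if name in done:
            continue                       # finished in an earlier run
        if name == "wait_for_labels" and not labels_ready():
            return False                   # stop here; run again later to resume
        func()
        done.append(name)
        path.write_text(json.dumps(done))  # persist progress after each step
    return True


calls = []  # records which steps actually executed
names = ["query", "create_assignment", "call_labeling_api",
         "wait_for_labels", "clean_labels", "upload_dataset"]
steps = [(n, lambda n=n: calls.append(n)) for n in names]

with tempfile.TemporaryDirectory() as tmp:
    state = Path(tmp) / "state.json"
    first = run_resumable(steps, state, labels_ready=lambda: False)   # labels not ready yet
    after_first = list(calls)
    second = run_resumable(steps, state, labels_ready=lambda: True)   # hook has fired
```

The second call picks up at the wait step, so the first three steps run exactly once across both invocations.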
Do you know which specific API I need to call to signal “resume” after “abort”?
not “reset” I presume?
CostlyOstrich36 yes, for the cache.
AgitatedDove14 I am not sure a queue will be sufficient. It would require a queue per execution of the pipeline.
Really what I need is for A and B to be separate tasks, but guarantee they will be assigned to the same machine so that the clearml dataset cache on that machine will be warm.
Is there a way to group A and B into a sub-pipeline, have the pipeline be queued and executed remotely, but the tasks A and B inside it be treated like local tasks? or s...
not sure I follow.
how can a cronjob solve this for me?
I want to manage the dataset creation task(s) in http://clear.ml .
This flow is triggered say manually whenever I want to create a train/test set for my model.
it just so happens that somewhere in this flow, the code needs to “wait” for days/weeks for the assignment to be ready.
AgitatedDove14 I tried your idea.
See code below.
Once the pipeline exists, I use the UI -> enqueue.
However it does seem to repeat the first task again when I (re) enqueue it.
Any ideas?
` from time import sleep
from clearml import PipelineDecorator, Task, TaskTypes

@PipelineDecorator.component(execution_queue='default', return_values=['message'], task_type=TaskTypes.data_processing)
def get_dateset_id():
    message = "ccd8a65770e1407394cd3648246e4d25"
    return message
@PipelineDecora...
What I’d like is to do Dataset.get(“b”, to=‘a’) and have the download land the files directly there
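The merge being asked for can be sketched with plain folders: copy only the files whose content is not already present in the target. This is an illustration under assumptions, not a ClearML API; `merge_into` and `file_hash` are hypothetical helpers, and folder `b` stands in for the dataset's files.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def file_hash(path):
    """SHA-256 of a file's content, used to detect files that are already present."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def merge_into(src, dst):
    """Copy files from `src` into `dst`, skipping files whose content is already there.

    Returns the relative paths that were actually copied.
    """
    src, dst = Path(src), Path(dst)
    copied = []
    for item in sorted(src.rglob("*")):
        if not item.is_file():
            continue
        rel = item.relative_to(src)
        target = dst / rel
        if target.exists() and file_hash(target) == file_hash(item):
            continue  # identical content already present locally
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(item, target)
        copied.append(rel.as_posix())
    return copied


# Demo with throwaway folders: 'a' is the local folder, 'b' stands in for the dataset.
with tempfile.TemporaryDirectory() as tmp:
    a, b = Path(tmp, "a"), Path(tmp, "b")
    a.mkdir()
    b.mkdir()
    (a / "x.txt").write_text("same")
    (b / "x.txt").write_text("same")
    (b / "y.txt").write_text("new")
    copied = merge_into(b, a)
    merged = sorted(p.name for p in a.iterdir())
```

For a remote dataset the comparison would presumably be made against hashes from the dataset's file listing, so that unchanged files are never downloaded in the first place.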