and for the record - to override hydra params the syntax is: parameter_override={'Hydra/x.y': 1234}, where x.y=1234 is how you would override the param via the CLI
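For context, a minimal sketch of where that parameter_override would go when adding an existing task as a pipeline step (project/task names here are placeholders, not from the thread):
```
from clearml import PipelineController

# hypothetical project/task names, only to show where parameter_override fits
pipe = PipelineController(name='hydra override example', project='examples', version='0.0.1')
pipe.add_step(
    name='train',
    base_task_project='examples',
    base_task_name='hydra training task',
    # equivalent of passing `x.y=1234` on the Hydra command line
    parameter_override={'Hydra/x.y': 1234},
)
pipe.start()
```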
DeliciousBluewhale87 what solution did you land on for this?
It’s more like this:
I have a pipeline that ran on all the data.
Now I change/add a sub-dag to the pipeline.
I want to run only that sub-dag on all the historical data, in an ad-hoc manner.
And then the next runs will run the full dag (e.g. only on new data).
AgitatedDove14
Sort of.
I would go with something which is more like:
```
execution_plan = {'step_b': 'b_result', 'step_c': None, ...}

@PipelineDecorator.pipeline(...)
def pipeline(execution_plan):
    step_results = {}
    for step in pipeline.get_dag():
        if step.name in execution_plan:
            # use the supplied cached result if given, otherwise actually run the step
            step_results[step.name] = execution_plan[step.name] or step(**step_results)
```
The ‘execution plan’ specifies the list of steps to run (keys) and, for each, whether we should use a u...
SweetBadger76 I think it’s not related to the flag or whether or not I am running in a virtual env.
I just noticed that even when I clear the list of installed packages in the UI, upon startup, clearml agent still picks up the requirements.txt (after checking out the code) and tries to install it.
I wonder if there’s a way to tell it to skip this step too?
AgitatedDove14 from what I gather there is a lightly documented concept of “multi_instance_support” https://github.com/allegroai/clearml/blob/90854fa4a516fcb38ea0a5ec23894c5a3b6bbc4f/clearml/automation/controller.py#L3296 .
Do you think it can work?
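If it helps, this is roughly how I imagined wiring it in, assuming the flag is simply passed to the pipeline decorator (names are placeholders):
```
from clearml import PipelineDecorator

@PipelineDecorator.pipeline(
    name='example pipeline',
    project='examples',
    version='0.0.1',
    multi_instance_support=True,  # the lightly documented flag linked above
)
def my_pipeline(seed: int = 42):
    # ... pipeline steps ...
    pass
```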
yes and no.
This is a pseudo flow:
Data download -> pre-processing -> model training (e.g. HPT) - > model evaluation (per variant) -> model comparison dashboard -> human selects the best model using a heuristic and the status of the weather -> model packaging -> inference tests etc.
I could divide it into two pipelines:
Data download --> dashboard
Packaging --> …
Where packaging takes a parameter which is the human selected ID of the model.
However, this way, I lose the context of the ent...
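Roughly, the second (packaging) pipeline would look something like this (a sketch only; the component body and names are placeholders):
```
from clearml import PipelineDecorator

@PipelineDecorator.component(return_values=['package_path'])
def package_model(model_id: str):
    from clearml import InputModel
    model = InputModel(model_id=model_id)
    weights_path = model.get_weights()  # download the human-selected model
    # ... build the deployable package from weights_path ...
    return weights_path

@PipelineDecorator.pipeline(name='packaging', project='examples', version='0.0.1')
def packaging_pipeline(model_id: str):
    package_model(model_id=model_id)
```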
AgitatedDove14 I tried your idea.
See code below.
Once the pipeline exists, I use the UI -> enqueue.
However, it does seem to repeat the first task again when I (re)enqueue it.
Any ideas?
```
from time import sleep

from clearml import PipelineDecorator, Task, TaskTypes


@PipelineDecorator.component(execution_queue='default', return_values=['message'], task_type=TaskTypes.data_processing)
def get_dateset_id():
    message = "ccd8a65770e1407394cd3648246e4d25"
    return message


@PipelineDecora...
```
RAM = 16GB.
The task consumed 32GB of memory in total (I had to add 16GB of swap).
IrritableGiraffe81 AgitatedDove14 there are multiple levels of what the CI/CD should automate/validate.
This one is the minimal option.
Another option is:
1. CI deploys (executes) the pipeline fresh, from the committed code
2. CI waits and extracts the results (various artifacts, metrics etc.)
3. CI compares them to the latest (published) pipeline, or to absolute numbers
4. CI decides whether to publish it or not (or at least tag it as RC).
Steps 2-4 can themselves be encapsulated in a clearml task ...
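For steps 2-4, I imagine something along these lines (a sketch, assuming the metric of interest lives under a 'validation/accuracy' scalar and that tagging as RC is enough; names are placeholders):
```
from clearml import Task

def compare_and_tag(new_task_id: str, baseline_task_id: str):
    new_task = Task.get_task(task_id=new_task_id)
    baseline = Task.get_task(task_id=baseline_task_id)

    # get_last_scalar_metrics() -> {title: {series: {'last': ..., 'min': ..., 'max': ...}}}
    new_acc = new_task.get_last_scalar_metrics()['validation']['accuracy']['last']
    base_acc = baseline.get_last_scalar_metrics()['validation']['accuracy']['last']

    if new_acc >= base_acc:
        new_task.add_tags(['RC'])  # step 4: mark the fresh run as a release candidate
    return new_acc, base_acc
```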
AgitatedDove14 the mv command requires empty folders… so moving b into a won’t work if some subfolders are already there.
CostlyOstrich36 all tasks are remote.
controller - tried both
AgitatedDove14 decorators, but I would consider converting it to whatever is needed in order to achieve the above.
I want to pass the entire hydra omegaconf as a (nested) dictionary
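Something like this is what I have in mind (a sketch; cfg is whatever Hydra hands the @hydra.main entry point, and the configuration name is just illustrative):
```
from clearml import Task
from omegaconf import OmegaConf

def log_full_config(cfg):
    # convert the OmegaConf object into a plain nested dict
    nested = OmegaConf.to_container(cfg, resolve=True)
    task = Task.current_task()
    if task is not None:
        # store the whole config as one nested configuration object on the task
        task.connect_configuration(nested, name='omegaconf')
    return nested
```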
AgitatedDove14 nope… you can run md5 on the file as stored in the remote storage (nfs or s3)
may I also add that PyYAML is the worst thing in the history of python dependency hell?
and of course this solution forces me to do a git push for all the other dependent modules when creating the task…
JitteryCoyote63 how do you detect that a spot interruption is coming, from within the clearml task, in time to mark it as “resume”?
CostlyOstrich36 Lineage information for datasets - oversimplifying, but bear with me:
The task should have a section called “input datasets”.
Each time I do a Dataset.get() inside a current_task, add the dataset ID to this section.
The same can work with InputModel().
This way you can have a full lineage graph (also queryable/visualizable).
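As a manual approximation of the idea above, something like this wrapper is what I mean (names are placeholders; ideally the SDK would do it implicitly):
```
from clearml import Dataset, Task

def get_dataset_tracked(dataset_id: str) -> Dataset:
    ds = Dataset.get(dataset_id=dataset_id)
    task = Task.current_task()
    if task is not None:
        # record the dataset ID on the consuming task, i.e. the "input datasets" section
        task.connect_configuration({'input_datasets': [ds.id]}, name='input datasets')
    return ds
```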
OK, hours of debugging later, I realized that the auto_scaler example initializes a task ( https://github.com/allegroai/clearml/blob/721569bb77d89d89e5b4f32a0ed98311c4574650/examples/services/aws-autoscaler/aws_autoscaler.py#L68 ), i.e. the task is initialized on the remote side.
Apparently, https://github.com/allegroai/clearml/blob/721569bb77d89d89e5b4f32a0ed98311c4574650/examples/services/aws-autoscaler/aws_autoscaler.py#L103 doesn’t populate that dict with any keys that don’t already exist in it.
...
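In other words, the behaviour I observed (not something I found documented) is roughly this, so every expected key has to be pre-populated locally:
```
from clearml import Task

task = Task.init(project_name='examples', task_name='autoscaler config example')

hyper_params = {
    # pre-populate every key you expect remote execution / the UI to fill in;
    # keys missing from this dict simply never show up after connect()
    'git_user': '',
    'git_pass': '',
    'cloud_credentials_key': '',
}
task.connect(hyper_params)
```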
SweetBadger76 thanks for your reply.
One quirk I found was that even with this flag on, the agent decides to install whatever is in the requirements.txt.
CostlyOstrich36 from what I gather the UI creates a task in the background, in status “hidden”, and it has like 10 fields of json configurations…
Nifty trick! Replacing the git metadata inside the task, and the rest happens automatically!
I mean that there will be no task created, and no invocation of any clearml API whatsoever, including no imports, in the “core ML task”. This is the direction - add very small wrappers of clearml code around the core ML task. The clearml wrapper is “aware” of the core ML code, and never the other way around. For cases where the wrapper is only “before” and “after” the core ML task, it’s somewhat easier to achieve. For reporting artifacts etc. which is “mid flow” - it’s m...
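For the “before/after” case, the wrapper would look roughly like this (a sketch; core_ml.train is a placeholder for the clearml-agnostic code):
```
from clearml import Task

def main():
    task = Task.init(project_name='examples', task_name='core-ml-wrapper')

    params = {'epochs': 10, 'lr': 1e-3}
    task.connect(params)                        # "before" the core ML code

    from core_ml import train                   # core code has no clearml imports
    model_path, metrics = train(**params)

    task.upload_artifact('model', model_path)   # "after" the core ML code
    logger = task.get_logger()
    for name, value in metrics.items():
        logger.report_scalar(title=name, series='final', value=value, iteration=0)

if __name__ == '__main__':
    main()
```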
CostlyOstrich36 I confirm this was the case.
So:
```
# module_a.py
@PipelineDecorator.pipeline()
...
from module_b import my_func
x = my_func()
```
```
# module_b.py
@PipelineDecorator.component()
def my_func():
    pass
```
Under these circumstances, the pipeline is created correctly and runs correctly.
But when I clone it (or click “Run” and submit) - it fails with the error above.
Moving my_func from module_a to module_b solves this.
To me this looks like a bug or unreasonable and undocumented...
AgitatedDove14 it’s pretty much similar to your proposal but with pipelines instead of tasks, right?
AgitatedDove14
the root git path should be part of your PYTHONPATH automatically
That’s true but it doesn’t respect the root package (sources root or whatever).
i.e. if all my packages are under /path/to/git/root/src/
So I had to add it explicitly via a docker init script…
