
I suppose that yes; and I want this task to be labeled as such that it’s clear it’s the “production” task.
I tested it again with much smaller data and it seems to work.
I am not sure what the difference is between the use-cases. It seems like something specifically about this particular (big) parent doesn’t agree with clearml…
I think that in principle, if you “intercept” the calls to Model.get() or Dataset.get() from within a task, you can collect the IDs and do various stuff with them. You can store and visualize them for lineage, or expose them as another hyperparameter, I suppose.
You’ll just need the user to name them as part of loading them in the code (in case they are loading multiple datasets/models).
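To make the idea concrete, a minimal sketch of such an “intercept” (the wrapper and the “Datasets/…” parameter section are my own naming, not a ClearML feature):
```python
from clearml import Dataset, Task

def tracked_dataset_get(lineage_name, **get_kwargs):
    # Plain Dataset.get(), plus recording the consumed dataset ID on the
    # current task so it can be seen in the UI / queried later for lineage.
    dataset = Dataset.get(**get_kwargs)
    task = Task.current_task()
    if task is not None:
        # "Datasets/<name>" is an arbitrary parameter section chosen for illustration
        task.set_parameter("Datasets/{}".format(lineage_name), dataset.id)
    return dataset

# the user names each dataset they load, as suggested above
train_ds = tracked_dataset_get("train", dataset_project="my_project", dataset_name="train_v1")
```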
Is that because you couldn’t find a good way to have a “manual approval/selection” step in http://clear.ml ?
Apart from that, it seems that a pipeline task could have worked?
CostlyOstrich36 from what I gather the UI creates a task in the background, in status “hidden”, and it has like 10 fields of json configurations…
not the most intuitive approach but I’ll give it a go
Re. “which task did I clone from” - to my understanding the “parent” field is used for the “runtime parent”, i.e. which task started me.
This is not the same as “which task was I cloned from”.
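For reference, a minimal sketch of reading that field via the SDK (assuming the task object exposes it as task.parent; the task ID is a placeholder):
```python
from clearml import Task

task = Task.get_task(task_id="<some-task-id>")  # placeholder ID
# per the above: this is the runtime parent (the task that started this one),
# not the task it was cloned from
print(task.parent)
```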
Engineer B is in charge of running Engineer A’s pipeline with different parameters and investigating the results.
OK, hours of debugging later, I realized that the auto_scaler example initializes a https://github.com/allegroai/clearml/blob/721569bb77d89d89e5b4f32a0ed98311c4574650/examples/services/aws-autoscaler/aws_autoscaler.py#L68 - the task is initialized on the remote side.
Apparently, https://github.com/allegroai/clearml/blob/721569bb77d89d89e5b4f32a0ed98311c4574650/examples/services/aws-autoscaler/aws_autoscaler.py#L103 doesn’t populate that dict with any keys that don’t already exist in it.
...
As far as I know, storage can be accessed directly: https://clear.ml/docs/latest/docs/integrations/storage/#direct-access .
A typical EBS volume is limited to being mounted to one machine at a time,
so in this sense it won’t be too easy to create a solution where multiple machines consume datasets from this storage type.
PS: multi-attach ( https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-volumes-multi.html ) is possible under some limitations.
The above only passes the overrides, if I am not mistaken.
RAM = 16 GB
The task consumed 32 GB of memory in total (I had to add 16 GB of swap).
AgitatedDove14 thanks, it was late and I wasn’t sure if I needed to use one of clearml’s “certified” AMIs or just a vanilla one.
Trust me, I had to add this field to this default dict just so that clearml doesn’t delete it for me.
It does appear on the task in the UI; it’s just somehow not repopulated in the remote run if it’s not part of the default empty dict…
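A minimal sketch of what I mean, based on the behaviour described above (project/field names are illustrative):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="autoscaler-config")  # illustrative names

# Pre-declare the field in the default dict, even if empty: per the behaviour
# above, keys missing from this dict are not repopulated on the remote run,
# even though they show up on the task in the UI.
config = {
    "my_extra_field": "",  # hypothetical key, pre-declared so the remote run keeps it
}
task.connect(config, name="General")
```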
AgitatedDove14 thanks, good idea.
My main issue with this approach is that it breaks the workflow into an async set of tasks:
- One task sends a list of images for labeling and terminates.
- An external webhook calls http://clear.ml and creates a dataset from the labels returned from the labeling task.
- A trigger wakes up the label post-processing/splitting logic (see the sketch below).
It will be hard to understand where things stand from looking at the UI.
I was wondering if the “waiting” operator can actua...
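For the “trigger wakes up the post-processing” step above, a rough sketch assuming clearml’s TriggerScheduler (the project/queue names and exact parameters are illustrative):
```python
from clearml.automation import TriggerScheduler

scheduler = TriggerScheduler(pooling_frequency_minutes=5)
scheduler.add_dataset_trigger(
    schedule_task_id="<post-processing-task-id>",  # task to clone & enqueue when labels land
    schedule_queue="default",
    trigger_project="labeling_output",             # fire when a new dataset appears in this project
    name="labels-ready",
)
# run the scheduler itself as a service task
scheduler.start_remotely(queue="services")
```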
AgitatedDove14
Sort of.
I would go with something which is more like:
```python
execution_plan = {'step_b': 'b_result', 'step_c': None, ...}

@PipelineDecorator.pipeline(...)
def pipeline(execution_plan):
    step_results = {}
    for step in pipeline.get_dag():
        if step.name in execution_plan:
            step_results[step.name] = execution_plan[step.name] or step(**step_results)
```
The ‘execution plan’ specifies the list of steps to run (keys) and, for each, whether we should use a u...
and of course this solution forces me to do a git push for all the other dependent modules when creating the task…
sure CostlyOstrich36
I have something like the following:
```python
@PipelineDecorator.component(....)
def my_task(...):
    from my_module1 import my_func1
    from my_module2 import ....
```
my_module1 and my_module2 are modules that are part of the same project source; they don’t come as a separate package.
Now when I run this in clearml, these imports don’t work.
These functions may require transitive imports of course, so the following doesn’t work:
PipelineDecorator.component(helper_function=[my_fu...
AgitatedDove14 1.1.5.
Yes - first locally, then it aborts (while running locally, presumably).
Then I re-enqueue it via the UI and it seems to run on the agent.
AgitatedDove14
What was important for me was that the user can define the entire workflow and that I can see its status as one ‘pipeline’ in the UI (vs. disparate tasks).
- Perform query
- Process records into a labeling assignment
- Call labeling system API
- Wait for an external hook when labels are ready
- Clean the labels
- Upload them to a dataset
Do you know which specific API I need to call to signal “resume” after “abort”?
Not “reset”, I presume?
Sure, but I was wondering if it has more of a “first-class citizen” status for tracking… e.g. something you can visualize in the UI or query via the API.
AgitatedDove14 the emphasis is that the imports I am doing are not from external/pip packages; they are just neighbouring modules of the function I am importing. Imports that rely on pip-installed packages work well.
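To illustrate what “neighbouring modules” means here, a hypothetical layout (all names made up):
```python
# my_project/
# ├── pipeline.py      <- defines the PipelineDecorator components
# ├── my_module1.py    <- plain neighbouring module, not pip-installed
# └── my_module2.py    <- may itself import my_module1 (a transitive import)

# pipeline.py
from clearml import PipelineDecorator

@PipelineDecorator.component(return_values=["result"])
def my_task(x):
    # fine locally; on the agent these modules are not packaged with the
    # component unless the whole repo is made available to it
    from my_module1 import my_func1
    return my_func1(x)
```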
SmugHippopotamus96 how did this setup work for you? are you using an autoscaling node group for the jobs?
with or without GPU?
Any additional tips on usage?
IrritableGiraffe81 AgitatedDove14 there are multiple levels of what the CI/CD should automate/validate.
This one is the minimal option.
Another option is:
1. CI deploys (executes) the pipeline fresh, from the committed code.
2. CI waits and extracts the results (various artifacts, metrics, etc.).
3. CI compares them to the latest (published) pipeline or to absolute numbers.
4. CI decides whether to publish it or not (or at least tag it as an RC).
Steps 2-4 can themselves be encapsulated in a clearml task ...
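As a sketch of steps 2-4 (project, metric and tag names are illustrative, and I’m assuming the nested dict layout returned by get_last_scalar_metrics()):
```python
from clearml import Task

def last_scalar(task, title, series):
    # get_last_scalar_metrics() -> {title: {series: {"last": ..., "min": ..., "max": ...}}}
    return task.get_last_scalar_metrics()[title][series]["last"]

# step 2: the fresh pipeline run executed by CI
fresh = Task.get_task(task_id="<fresh-pipeline-task-id>")

# step 3: the latest published pipeline as the baseline
baseline = Task.get_tasks(
    project_name="my_project",
    task_filter={"status": ["published"], "order_by": ["-last_update"]},
)[0]

# step 4: tag as a release candidate only if it doesn't regress
if last_scalar(fresh, "validation", "accuracy") >= last_scalar(baseline, "validation", "accuracy"):
    fresh.add_tags(["RC"])
```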
If the state is:
Dataset A:
```
a
a/.DS_Store
a/1.txt
a/b
a/b/.DS_Store
a/b/1.txt
a/b/c
a/b/c/1.txt
```
Dataset B:
```
b
b/2.txt
b/c
b/c/2.txt
```
then the command `mv b a/` returns an error, since a/b already exists and is not empty.
That’s exactly the issue…
As a result, I need to do something which copies the files (e.g. `cp -r` or `StorageManager.upload_folder('b', 'a')`), but this is expensive.
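A minimal sketch of a cheaper alternative (assuming both trees live on the same filesystem, so moves are renames rather than copies):
```python
import shutil
from pathlib import Path

def merge_move(src: Path, dst: Path) -> None:
    # Move every file from src into the matching location under dst, creating
    # directories as needed; existing directories under dst are reused
    # (same-named files would be overwritten).
    for path in src.rglob("*"):
        target = dst / path.relative_to(src)
        if path.is_dir():
            target.mkdir(parents=True, exist_ok=True)
        else:
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(path), str(target))  # falls back to copy across filesystems

# merge b into a/b, sidestepping the "Directory not empty" error from `mv b a/`
merge_move(Path("b"), Path("a/b"))
```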