AgitatedDove14
What was important for me was that the user can define the entire workflow and that I can see its status as one ‘pipeline’ in the UI (vs. disparate tasks).
perform query -> process records into a labeling assignment -> call labeling system API -> wait for an external hook when labels are ready -> clean the labels -> upload them to a dataset
Do you know which specific API I need to call to signal “resume” after “abort”?
not “reset” I presume?
I think it has something to do with ClearML, since I can run this code as pure Python without ClearML. When I activate ClearML, I see that `torch.load()` hits `import_bind.__patched_import3` when trying to deserialize the saved model.
Python 3.8
I’ve worked around the issue by doing: `sys.modules['model'] = local_model_package`
I will try and get back to this area of the code soon
it is a pickle issue
the error is ‘package model doesn’t exist’, despite me attempting to add the right path to sys.path right before loading
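The `sys.modules` workaround above can be sketched end-to-end with plain Python (no torch or ClearML here; the package name `model` and `local_model_package` mirror the names from the message, but the classes are stand-ins): pickle records the defining module's name when serializing, so aliasing the package under that recorded name before loading lets deserialization find it.

```python
import pickle
import sys
import types

# Stand-in for the locally importable package that defines the model class;
# in the real code this would be `local_model_package`.
local_model_package = types.ModuleType("model")

class Net:
    def __init__(self, weights):
        self.weights = weights

# Pretend Net was originally defined in a package named "model",
# which is the name pickle will record in the serialized blob.
Net.__module__ = "model"
local_model_package.Net = Net

# The workaround: expose the package under the name pickle recorded,
# *before* calling pickle.loads() / torch.load().
sys.modules["model"] = local_model_package

blob = pickle.dumps(Net([1, 2, 3]))
restored = pickle.loads(blob)
print(restored.weights)  # -> [1, 2, 3]
```

Without the `sys.modules["model"] = ...` line, unpickling would fail with a "No module named 'model'"-style error, which matches the symptom described above.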
could work! is there a way to visualize the pipeline such that this step is “stuck” in executing?
yes and no.
This is a pseudo flow:
Data download -> pre-processing -> model training (e.g. HPT) - > model evaluation (per variant) -> model comparison dashboard -> human selects the best model using a heuristic and the status of the weather -> model packaging -> inference tests etc.
I could divide it into two pipelines:
Data download --> dashboard
Packaging --> …
Where packaging takes a parameter which is the human selected ID of the model.
However, this way, I lose the context of the ent...
CostlyOstrich36 from what I gather the UI creates a task in the background, in status “hidden”, and it has something like 10 fields of JSON configuration…
CostlyOstrich36 not that I am aware of deleting etc.
I didn’t set up the env though…
I tested it again with much smaller data and it seems to work.
I am not sure what the difference is between the use-cases. It seems like something about this particular (big) parent specifically doesn’t agree with ClearML…
It seems to work fine when the parent is on clear.ml storage (tried with toy example of data)
Tried with 1.6.0, doesn’t work
```
# this is the parent
clearml-data create --project xxx --name yyy --output-uri
clearml-data add --files folder1
clearml-data close

# this is the child, where XYZ is the parent's id
clearml-data create --project xxx --name yyy1 --parents XYZ --output-uri
clearml-data add --files folder2
clearml-data close
# now I get the error above
```
no, I tried either with very small files or with 20GB as the parent
JitteryCoyote63 how do you detect that a spot interruption is coming, from within the ClearML task, in time to mark it as “resume”?
AgitatedDove14 from what I gather there is a lightly documented concept of “multi_instance_support” https://github.com/allegroai/clearml/blob/90854fa4a516fcb38ea0a5ec23894c5a3b6bbc4f/clearml/automation/controller.py#L3296 .
Do you think it can work?
AgitatedDove14 it’s pretty much similar to your proposal but with pipelines instead of tasks, right?
yeah, it’s a tradeoff that depends on parameters that lie outside the realm of human comprehension.
Let’s call it voodoo.
Yes, the manual selection can be done via tagging a model.
The main thing is that I want the selection to be part of the overall flow.
I want the task of human tagging a model to be “just another step in the pipeline”
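One way to make the human tagging “just another step” is a step that simply blocks until the tag shows up. A minimal, ClearML-free sketch of that waiting logic (the `check` callable is a hypothetical stand-in for whatever queries the model registry, e.g. something built on `Model.query_models(tags=[...])`):

```python
import time

def wait_for(check, poll_seconds=1.0, timeout=30.0):
    """Block until check() returns a truthy value, polling periodically."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = check()
        if result:
            return result
        time.sleep(poll_seconds)
    raise TimeoutError("no model was tagged within the timeout")

# Toy usage: the 'tag' appears on the third poll.
state = {"polls": 0}
def fake_tag_query():
    state["polls"] += 1
    return "model-123" if state["polls"] >= 3 else None

print(wait_for(fake_tag_query, poll_seconds=0.01))  # -> model-123
```

Run as a pipeline step, this keeps the whole flow as one pipeline in the UI, with the step showing as “executing” until a human tags a model; whether that matches the desired UX is exactly the open question in this thread.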
AgitatedDove14 I haven’t done a full design for this 😉
Just referring to how DVC claims it can detect and invalidate changes in large remote files.
So I take it there is no such feature in ClearML 🙂
AgitatedDove14 Not sure the pipeline decorator is what I need.
Here’s a very simplified example to my question.
Say I want to train my model on some data.
Before adding ClearML, the code looks something like: `def train(data_dir, ...): ...`
Now I want to leverage the data versioning capability in ClearML
So now, the code needs to fetch dataset by ID, save it locally, and let the model train on it as before:
```
from clearml import Dataset

def train_clearml(dataset_id...
```
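A hedged sketch of how that wrapper could look (assuming the data was registered with clearml-data; `Dataset.get(dataset_id=...).get_local_copy()` is the ClearML call that downloads a dataset version to a local cache folder — the `epochs` parameter is just illustrative):

```python
def train(data_dir, epochs=3):
    # Core ML code: unchanged, and unaware of ClearML.
    return f"trained for {epochs} epochs on {data_dir}"

def train_clearml(dataset_id, **train_kwargs):
    # Thin wrapper: resolve the dataset ID to a local folder,
    # then hand off to the original training entry point.
    from clearml import Dataset  # import kept out of the core ML code
    data_dir = Dataset.get(dataset_id=dataset_id).get_local_copy()
    return train(data_dir, **train_kwargs)

print(train("/tmp/toy_data", epochs=1))
```

Keeping the `clearml` import inside the wrapper matches the direction described below: the wrapper knows about the core code, never the other way around.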
I mean that there will be no task created and no invocation of any ClearML API whatsoever, including no imports in the “core ML task”. This is the direction: add very small wrappers of ClearML code around the core ML task. The ClearML wrapper is “aware” of the core ML code, and never the other way around. For cases where the wrapper runs only “before” and “after” the core ML task, it’s somewhat easier to achieve. For reporting artifacts etc., which happens “mid flow”, it’s m...
that’s the thing. I want it to appear as one long pipeline, vs. triggering a new set of steps after the approval. So “wait” is a better metaphor for me
AgitatedDove14 nope… you can run md5 on the file as stored in the remote storage (NFS or S3)
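Detecting a change then reduces to hashing the stored bytes. A plain-Python version for an NFS-mounted path, reading in chunks so large files don't fill RAM (for S3, one would typically compare the object's ETag instead, which equals the MD5 only for non-multipart uploads):

```python
import hashlib
import os
import tempfile

def file_md5(path, chunk_size=1 << 20):
    """MD5 of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Toy usage with a temporary file standing in for the remote copy.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"hello")
    tmp = f.name
print(file_md5(tmp))  # md5 of b"hello"
os.unlink(tmp)
```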
which configuration are you passing? are you using any framework for configuration?
AgitatedDove14 no clue. new folder outside of any checked out project, copied a single python file…
CostlyOstrich36
```
pipe.add_step(
    name='train',
    parents=['data_pipeline'],
    base_task_project='xxx',
    base_task_name='yyy',
    parameter_override={'OmegaConf': cfg.trainer},
)
```
I want to pass the entire hydra omegaconf as a (nested) dictionary
the above only passes the overrides, if I am not mistaken
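For plain (non-Hydra) hyperparameters, `parameter_override` addresses individual values with flat `'Section/name'` keys, so a nested config would have to be flattened first. A small helper sketch — the `'Args'` section name and `/` separator follow ClearML's usual convention, but check them against the parameter sections of your actual base task:

```python
def flatten(cfg, prefix="Args", sep="/"):
    """Flatten a nested dict into {'Args/a/b': value} style override keys."""
    flat = {}
    for key, value in cfg.items():
        path = f"{prefix}{sep}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=path, sep=sep))
        else:
            flat[path] = value
    return flat

cfg = {"trainer": {"lr": 3e-4, "batch_size": 64}, "seed": 7}
print(flatten(cfg))
# {'Args/trainer/lr': 0.0003, 'Args/trainer/batch_size': 64, 'Args/seed': 7}
```

The result could be passed as `parameter_override=flatten(cfg)`; whether the Hydra-specific `'OmegaConf'` entry accepts a whole nested config in one shot is the open question here.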