CostlyOstrich36 If I delete the origin and all other info and set it to tag_name=‘xxx’ then it is able to work
AgitatedDove14 nope… you can run md5 on the file as stored in the remote storage (nfs or s3)
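For reference, a minimal sketch of that md5 comparison (the file paths are placeholders):
```python
import hashlib

def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the md5 of a file in chunks, so large artifacts don't load into RAM."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# compare the local copy against the one stored on NFS/S3 after download:
# md5_of_file("local/model.pt") == md5_of_file("/mnt/nfs/models/model.pt")
```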
I think it has something to do with clearml, since I can run this code as pure python without clearml; when I activate clearml, I see that torch.load() hits the import_bind.__patched_import3 when trying to deserialize the saved model
I tested it again with much smaller data and it seems to work.
I am not sure what the difference between the use-cases is. It seems like something specifically about the particular (big) parent doesn’t agree with clearml…
AgitatedDove14 thanks, good idea.
My main issue with this approach is that it breaks the workflow into an async set of tasks:
- One task sends a list of images for labeling and terminates
- An external webhook calls clearml and creates a dataset from the labels returned from the labeling task
- A trigger wakes up the label post-processing/splitting logic (see the sketch below)
It will be hard to understand where things are standing from looking at the UI.
I was wondering if the “waiting” operator can actua...
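For the trigger step above, a minimal sketch assuming the TriggerScheduler API from clearml.automation (trigger/project names are hypothetical, and argument names may differ between versions):
```python
from clearml.automation import TriggerScheduler

def on_labels_dataset(task_id):
    # hypothetical hook: kick off the label post-processing/splitting logic here
    print(f"new labels dataset ready, task {task_id}")

trigger = TriggerScheduler(pooling_frequency_minutes=5)
trigger.add_dataset_trigger(
    name="labels-ready",                  # hypothetical trigger name
    trigger_project="labeling",           # hypothetical project holding the label datasets
    schedule_function=on_labels_dataset,  # invoked when a matching dataset appears
)
trigger.start()  # blocks and polls the server
```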
AgitatedDove14 from what I gather there is a lightly documented concept of “multi_instance_support” https://github.com/allegroai/clearml/blob/90854fa4a516fcb38ea0a5ec23894c5a3b6bbc4f/clearml/automation/controller.py#L3296 .
Do you think it can work?
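From the linked source, multi_instance_support is an argument of PipelineDecorator.pipeline; a hedged sketch of how it would be used (names are placeholders, behavior may vary by version):
```python
from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component(return_values=["x"])
def step():
    return 1

@PipelineDecorator.pipeline(
    name="pipe", project="examples", version="0.1",
    multi_instance_support=True,  # allow several concurrent instances of this pipeline
)
def pipeline_logic():
    print(step())
```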
Trust me, I had to add this field to this default dict just so that clearml doesn’t delete it for me
it does appear on the task in the UI; it’s just somehow not repopulated in the remote run if it’s not part of the default empty dict…
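For context, a minimal sketch of the behavior being described (an assumed repro, not verified):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="connect-dict")  # hypothetical names
# the field had to be added to the default dict explicitly, as a placeholder:
config = {"real_field": "value", "placeholder_field": ""}
task.connect(config)
# the report above: a key absent from this default dict still shows on the task
# in the UI, but is not repopulated into `config` when the task runs remotely
```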
AgitatedDove14 it’s pretty much similar to your proposal but with pipelines instead of tasks, right?
AgitatedDove14 1.1.5.
Yes - first locally, then it aborts (while running locally presumably).
then I re-enqueue it via the UI and it seems to run on the agent
I will try and get back to this area of the code soon
CostlyOstrich36 I’ve tried the pipeline_from_decorator.py example and it works.
Could it be a sensitivity to some components being on a different python .py file relative to the controller itself?
Tried with 1.6.0, doesn’t work
```
# this is the parent
clearml-data create --project xxx --name yyy --output-uri
clearml-data add folder1
clearml-data close

# this is the child, where XYZ is the parent's id
clearml-data create --project xxx --name yyy1 --parents XYZ --output-uri
clearml-data add folder2
clearml-data close

# now I get the error above
```
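For reference, the same parent/child flow through the Python Dataset API (a sketch; project and folder names mirror the CLI placeholders above):
```python
from clearml import Dataset

parent = Dataset.create(dataset_project="xxx", dataset_name="yyy")
parent.add_files("folder1")
parent.upload()    # uploads to the dataset's configured output URI
parent.finalize()  # equivalent of `clearml-data close`

child = Dataset.create(
    dataset_project="xxx", dataset_name="yyy1",
    parent_datasets=[parent.id],  # "XYZ" in the CLI example
)
child.add_files("folder2")
child.upload()
child.finalize()
```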
It seems to work fine when the parent is on clear.ml storage (tried with toy example of data)
no, I tried either with very small files or with 20GB as the parent
AgitatedDove14 the mv command requires the destination folders to be empty… so moving b into a won’t work if some subfolders already exist there
python 3.8
I’ve worked around the issue by doing: sys.modules['model'] = local_model_package
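In context, the workaround looks roughly like this (a sketch; the real package location is hypothetical):
```python
import sys
import torch

# torch.load() resolves the pickled class's module path via import, so aliasing
# the module name stored in the checkpoint to the local package lets it deserialize
import my_project.model as local_model_package  # hypothetical local location

sys.modules['model'] = local_model_package  # 'model' is the name baked into the checkpoint
net = torch.load('model.pt')                # hypothetical checkpoint path
```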
AgitatedDove14
the root git path should be part of your PYTHONPATH automatically
That’s true but it doesn’t respect the root package (sources root or whatever).
i.e. if all my packages are under /path/to/git/root/src/
So I had to add it explicitly via a docker init script…
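An alternative to the docker init script (a sketch; the path is a placeholder): prepend the sources root at the top of the entry script, before any project imports run.
```python
import sys

# make the sources root importable, since only the git root is added automatically
sys.path.insert(0, "/path/to/git/root/src")
```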
AgitatedDove14 yes, I am passing this flag to the agent with CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=1 clearml-agent….
running inside docker
and it still tries to install the requirements.txt
Using 1.3.1
CostlyOstrich36 all tasks are remote.
controller - tried both
CostlyOstrich36 I confirm this was the case.
So:
```
# module_a.py
@PipelineDecorator.pipeline()
def pipeline_logic():  # hypothetical name for the elided pipeline body
    from module_b import my_func
    x = my_func()
```
```
# module_b.py
@PipelineDecorator.component()
def my_func():
    pass
```
Under these circumstances, the pipeline is created and runs correctly
But when I clone it (or click “Run” and submit) - it fails with the error above.
Moving my_func from module_a to module_b solves this.
To me this looks like a bug or unreasonable and undocumented...
@ https://app.slack.com/team/UT8T0V3NE is there support in a non-free version for preempting lower-priority tasks to allow a higher-priority task to come in?
But you already have all the entries defined here:
yes but it’s missing a field that is actually found and parsed from my local autoscaler.yaml….
AgitatedDove14 I see the continue_pipeline flag.
I want to resume the same instance of the pipeline.
When I want to resume the pipeline, I can only re-enqueue it - I cannot reset parameters (right?)
So it seems that for the pipeline to resume in “continue pipeline” mode, I need to pass continue_pipeline the first time I submit the pipeline.
Hopefully it will be ignored during the first run and just behave like a new run, and only really kick in when the pipeline is resumed....
SmugHippopotamus96 how did this setup work for you? are you using an autoscaling node group for the jobs?
with or without GPU?
Any additional tips on usage?
I mean that there will be no task created, and no invocation of any clearml API whatsoever, including no imports in the “core ML task”. This is the direction - add very small wrappers of clearml code around the core ML task. The clearml wrapper is “aware” of the core ML code, and never the other way. For cases where the wrapper is only “before” and “after” the core ML task, it’s somewhat easier to achieve. For reporting artifacts etc. which is “mid flow” - it’s m...
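A sketch of that direction, assuming hypothetical names (core_ml has no clearml imports; only the wrapper touches the API):
```python
from clearml import Task

from core_ml import train  # pure ML code, no clearml imports inside

def tracked_train(**params):
    # "before" wrapper: the clearml side knows about the core ML task...
    task = Task.init(project_name="examples", task_name="train")
    task.connect(params)
    result = train(**params)  # ...and the core ML task never knows about clearml
    # "after" wrapper: report outputs; "mid flow" reporting is the harder case
    task.upload_artifact("result", result)
    return result
```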
AgitatedDove14 can you share if there is a plan to put the gcp autoscaler in the open source?
I think it works.
small correction - use slash and not dot in configuration/OmegaConf: parameter_override={'configuration/OmegaConf': dict(...)}
and for the record - to override hydra params the syntax is: parameter_override={'Hydra/x.y': 1234}
where x.y=1234 is how you would override the param via the cli
I want to pass the entire hydra omegaconf as a (nested) dictionary
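Putting the two corrections together, a sketch of both override styles on a pipeline step (task/project names are placeholders):
```python
from clearml import PipelineController

pipe = PipelineController(name="pipe", project="examples", version="0.1")
pipe.add_step(
    name="train",
    base_task_project="examples",
    base_task_name="train-base",
    parameter_override={
        "Hydra/x.y": 1234,  # same effect as overriding x.y=1234 on the CLI
        # the whole (nested) OmegaConf also goes under a slash-separated key:
        # "configuration/OmegaConf": {...},
    },
)
pipe.start()
```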