
ok, hours of debugging later, I realized that the auto_scaler example initializes a default config dict ( https://github.com/allegroai/clearml/blob/721569bb77d89d89e5b4f32a0ed98311c4574650/examples/services/aws-autoscaler/aws_autoscaler.py#L68 ) before the task is initialized on the remote side.
Apparently, https://github.com/allegroai/clearml/blob/721569bb77d89d89e5b4f32a0ed98311c4574650/examples/services/aws-autoscaler/aws_autoscaler.py#L103 doesn’t populate that dict with any keys that don’t already exist in it.
...
But you already have all the entries defined here:
yes, but it’s missing a field that is actually found and parsed from my local autoscaler.yaml…
Trust me, I had to add this field to this default dict just so that clearml doesn’t delete it for me
it does appear on the task in the UI, just somehow not repopulated in the remote run if it’s not a part of the default empty dict…
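For anyone hitting the same thing, a minimal sketch of the workaround (the field names here are made up; only Task.init and task.connect are the real API):
```python
from clearml import Task

task = Task.init(project_name="DevOps", task_name="AWS Auto-Scaler")

# connect() only round-trips keys that already exist in the dict, so any
# field parsed from a local autoscaler.yaml must be seeded here with a
# default value, or it silently disappears when the task runs remotely.
hyper_params = {
    "cloud_provider": "",
    "git_user": "",
    "my_custom_field": "",  # hypothetical extra field from autoscaler.yaml
}
task.connect(hyper_params)
```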
not the most intuitive approach but I’ll give it a go
yeah, it’s a tradeoff that depends on parameters that lie outside the realm of human comprehension.
Let’s call it voodoo.
Yes, the manual selection can be done via tagging a model.
The main thing is that I want the selection to be part of the overall flow.
I want the task of a human tagging a model to be “just another step in the pipeline”
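A sketch of what I mean, assuming a human tags the model “approved” in the UI (the step function itself is hypothetical; Model.query_models is the real SDK call):
```python
import time

from clearml import Model

def wait_for_approved_model(project_name: str, poll_seconds: int = 60) -> str:
    """Hypothetical pipeline step: block until a human tags a model 'approved'."""
    while True:
        # look for models a human has tagged via the UI
        models = Model.query_models(project_name=project_name, tags=["approved"])
        if models:
            return models[0].id  # hand the chosen model ID to the next step
        time.sleep(poll_seconds)
```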
It’s more like this:
I have a pipeline, ran on all data.
Now I change/add a sub-dag to the pipeline
I want to run only that sub-dag on all historical data in an ad-hoc manner
And then next runs will run the full dag (e.g. only on new data)
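For context, a rough sketch of the pattern I’m after, leaning on step caching so unchanged steps are skipped on re-runs (the step functions are stand-ins):
```python
from clearml import PipelineController

def preprocess():
    return "prepped"

def new_substep():
    return "added later"

pipe = PipelineController(name="full-dag", project="examples", version="1.0")
# cached steps that already ran with identical code/inputs are reused, so
# re-launching after adding new_substep only executes the new sub-dag
pipe.add_function_step(name="preprocess", function=preprocess, cache_executed_step=True)
pipe.add_function_step(name="new_substep", function=new_substep, parents=["preprocess"])
pipe.start_locally(run_pipeline_steps_locally=True)
```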
and of course this solution forces me to do a git push for all the other dependent modules when creating the task…
Sure, but was wondering if it has more of a “first class citizen” status for tracking… e.g. something you can visualize in the UI or query via API
SweetBadger76 thanks for your reply.
One quirk I found was that even with this flag on, the agent decides to install whatever is in the requirements.txt.
may I also add that PyYAML is the worst thing in the history of python dependency hell?
AgitatedDove14 nope… you can run md5 on the file as stored in the remote storage (NFS or S3)
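e.g. a streaming md5 so a huge file doesn’t blow up memory (for S3 you could also compare against the object’s ETag, with the usual multipart caveats):
```python
import hashlib

def file_md5(path: str, chunk_size: int = 1 << 20) -> str:
    """md5 of a (possibly huge) file on an NFS mount, read in 1MB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```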
AgitatedDove14 looks like service-writing-time for me!
PS: can you point me to an official example/doc for how to persist/restore state so that tasks are restartable?
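In the meantime I’m assuming something artifact-based along these lines (untested sketch; it assumes a restarted run still sees the previously uploaded artifacts):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="restartable")

# restore checkpointed state if this task already ran (partially) before
state = task.artifacts["state"].get() if "state" in task.artifacts else {"epoch": 0}

for epoch in range(state["epoch"], 10):
    # ... actual work goes here ...
    state["epoch"] = epoch + 1
    task.upload_artifact("state", artifact_object=state)  # checkpoint progress
```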
SweetBadger76 I think it’s not related to the flag or whether or not I am running in a virtual env.
I just noticed that even when I clear the list of installed packages in the UI, upon startup, clearml agent still picks up the requirements.txt (after checking out the code) and tries to install it.
I wonder if there’s a way to tell it to skip this step too?
AgitatedDove14 yes, I am passing this flag to the agent with CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=1 clearml-agent…
running inside docker
and it still tries to install the requirements.txt
Using 1.3.1
I think that in principle, if you “intercept” the calls to Model.get() or Dataset.get() from within a task, you can collect the IDs and do various stuff with them. You can store and visualize them for lineage, or expose them as another hyperparameter, I suppose.
You’ll just need the user to name them as part of loading them in the code (in case they are loading multiple datasets/models).
CostlyOstrich36 Lineage information for datasets - oversimplifying, but bear with me:
Task should have a section called “input datasets”
each time I do a Dataset.get() inside a current_task, add the dataset ID to this section
Same can work with InputModel()
This way you can have a full lineage graph (also queryable/visualizable)
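A rough sketch of the interception idea (the wrapper and the section name are made up, not an existing API; Dataset.get and Task.connect are real):
```python
from clearml import Dataset, Task

def get_tracked_dataset(name: str, dataset_id: str) -> Dataset:
    """Hypothetical wrapper: record every consumed dataset on the current task."""
    ds = Dataset.get(dataset_id=dataset_id)
    # store the ID under an "Input Datasets" section so lineage is queryable later
    Task.current_task().connect({name: ds.id}, name="Input Datasets")
    return ds
```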
nifty trick! replacing the git metadata inside the task, and the rest happens automatically!
As far as I know, storage can be direct-access: https://clear.ml/docs/latest/docs/integrations/storage/#direct-access
typical EBS is limited to being mounted to 1 machine at a time.
so in this sense, it won’t be too easy to create a solution where multiple machines consume datasets from this storage type
PS: EBS multi-attach ( https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-volumes-multi.html ) is possible under some limitations
which configuration are you passing? are you using any framework for configuration?
I mean, if it’s not tracked, I think it would be a good feature!
It seems to work fine when the parent is on clear.ml storage (tried with toy example of data)
AgitatedDove14 the emphasis is that the imports I am doing are not from external/pip packages; they are just neighbouring modules of the function I am importing. Imports that rely on pip-installed packages work well
AgitatedDove14 thanks, good idea.
My main issue with this approach is that it breaks the workflow into an async set of tasks:
1. One task sends a list of images for labeling and terminates
2. An external webhook calls http://clear.ml and creates a dataset from the labels returned from the labeling task
3. A trigger wakes up the label post-processing/splitting logic
It will be hard to understand where things are standing from looking at the UI.
I was wondering if the “waiting” operator can actua...
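For reference, the trigger piece of that flow would look roughly like this (the task ID, queue, and project names are placeholders):
```python
from clearml.automation import TriggerScheduler

trigger = TriggerScheduler(pooling_frequency_minutes=5)
# wake up the post-processing task whenever a new labels dataset
# lands in the labeling project
trigger.add_dataset_trigger(
    name="labels-ready",
    schedule_task_id="<post-processing-task-id>",
    schedule_queue="default",
    trigger_project="labeling",
)
trigger.start()
```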
CostlyOstrich36 not that I am aware of (deleting etc.)
I didn’t set up the env though…
no, I tried it both with very small files and with 20GB as the parent
AgitatedDove14 thanks, it was late and I wasn’t sure if I needed to use one of clearml’s “certified” AMIs or just a vanilla one.
that’s the thing. I want it to appear like one long pipeline, vs. triggering a new set of steps after the approval. So “wait” is a better metaphor for me