Well, it is also failing within the same file if you read until the end, but for the cross-file issue, it's mostly because my repo architecture is organized in a v1/v2 scheme and I didn't want to pull a lot of unused files and inject GitHub PATs that frankly lack granularity in the worker
Well, given a file architecture looking like this:

```
|_ __init__.py
|_ my_pipeline.py
|_ my_utils.py
```
with the content of my_pipeline.py being:

```python
from clearml.automation.controller import PipelineDecorator
from clearml import Task, TaskTypes
from my_utils import do_thing

Task.force_store_standalone_script()

@PipelineDecorator.component(...)
def my_component(dataset_id: str):
    import pandas as pd
    from clearml import Dataset
    dataset = Dataset.get(dataset_id=input_dataset_id...
```
I can test it empirically, but I want to be sure what the expected behavior is so my pipeline doesn't get auto-magically broken after a patch
Well at this point I might as well try to write a PR implementing the behavior I described above
Okay, looks like the call dependency resolver does not support cross-file calls and relies instead on the local repo cloning feature to handle multiple files, so Task.force_store_standalone_script() does not allow for a pipeline defined across multiple files (now that you think of it, it was kinda implied by the name). But what is interesting is that calling an auxiliary function in the SAME file from a component also raises a NameError: <function_name> is not defined, that's ki...
It would have been great if the ClearML resolver would just inline the code of locally defined vanilla functions and execute that inlined code under the import scope of the component from which it is called
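In the meantime, a possible workaround sketch (assuming clearml is installed and configured; `do_thing` and `my_component` are the names from my snippet above, and `build_component` is just a wrapper I made up for this sketch): the component decorator takes a `helper_functions` parameter that is supposed to package locally defined functions into the component's standalone script —

```python
# Sketch of the `helper_functions` workaround; not a verified fix.

def do_thing(value):
    # Plain module-level helper; without `helper_functions` the remotely
    # executed component raises NameError when calling it.
    return value

def build_component():
    # clearml import kept inside the function so this sketch can be read
    # (and sanity-checked) without a clearml installation
    from clearml.automation.controller import PipelineDecorator

    @PipelineDecorator.component(helper_functions=[do_thing])
    def my_component(dataset_id: str):
        from clearml import Dataset
        dataset = Dataset.get(dataset_id=dataset_id)
        return do_thing(dataset)

    return my_component
```

That avoids both the cross-file pull and the same-file NameError, at the cost of listing every helper explicitly.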
You can specify a default storage URI on projects, pointing for instance to an S3 bucket
Nice it works 😍
I'll try to update the version in the image I provide to the workers of the autoscaler app (but sadly I don't control the version of the autoscaler itself since it's ClearML-managed)
It doesn't seem so if you look at the REST API documentation; it might be available as an Enterprise plan feature
Yes but not in the controller itself, which is also remotely executed in a docker container
Well, I simply duplicated code across my components instead of centralizing the operations that needed that env variable in the controller
I'm considering doing a PR in a few days to add the param if it is not too complex
And after your modifications are made you can use https://clear.ml/docs/latest/docs/references/sdk/dataset/#datasetsquash to squash your modified subset with the main dataset if you want to re-integrate it into your flow. But I don't remember if squash requires both datasets to be present locally or not...
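Something like this, as a sketch (the IDs and the name are placeholders, and it assumes a configured clearml environment):

```python
def squash_into_main(main_id: str, modified_id: str, squashed_name: str):
    """Sketch: merge a modified subset back with the main dataset version."""
    # Import kept local so the sketch can be read without clearml installed.
    from clearml import Dataset
    # Dataset.squash creates a new dataset version from the listed IDs;
    # whether both datasets get pulled locally first is the open question.
    return Dataset.squash(
        dataset_name=squashed_name,
        dataset_ids=[main_id, modified_id],
    )
```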
I checked the 'CPU-only' option in the autoscaler config, but that seemed logical at the time
And this is a standard Pro SaaS deployment; the autoscaler scale-up was triggered by the remote execution attempt of a pipeline
Oh wow, okay, I'll test it with another type
Nice, thank you for the quick response ❤
Fix confirmed on our side CostlyOstrich36 thanks for everything!
"Old tags are not deleted. When executing a Task (experiment) remotely, this method has no effect."

This description in the add_tags() doc intrigues me, though. I would like to remove a tag from a dataset version and add it to another (e.g. a used_in_last_training tag), and this method seems to only add new tags.
And additionally, does the "When executing a Task (experiment) remotely, this method has no effect." part mean that if it is executed in a remote worker inside a pipeline, without the dataset downloaded, the method will have no effect?
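What I'd want is roughly this (a sketch: the pure-Python tag computation is straightforward, but overwriting the tag list through a `tags` property setter on Dataset is an assumption to verify against the SDK, since add_tags() only appends):

```python
def moved_tags(src_tags, dst_tags, tag):
    """Pure helper: compute tag lists after moving `tag` from source to destination."""
    new_src = [t for t in src_tags if t != tag]
    new_dst = list(dst_tags) + ([tag] if tag not in dst_tags else [])
    return new_src, new_dst

def move_dataset_tag(src_id: str, dst_id: str, tag: str = "used_in_last_training"):
    # add_tags() only appends new tags; replacing the full list is assumed
    # to go through a `tags` setter — verify on your clearml version.
    from clearml import Dataset
    src = Dataset.get(dataset_id=src_id)
    dst = Dataset.get(dataset_id=dst_id)
    new_src, new_dst = moved_tags(src.tags or [], dst.tags or [], tag)
    src.tags = new_src
    dst.tags = new_dst
```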
That might be an issue with ClearML itself failing to serve proper resources if you change the path; that kind of path modification can be a hassle. If you have a domain name available, I would suggest making a subdomain of it point to the IP of your ClearML machine and just adding a sites-enabled config in nginx for it, rather than doing a proxy pass on a sub-path
You need to set a specific port for your ClearML server and then just set a rule in your reverse proxy (e.g. nginx) to point the specific route you want toward that port
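Roughly like this (a minimal sketch, assuming the ClearML web UI listens on localhost:8080 and `clearml.example.com` is a placeholder subdomain pointing at the machine):

```nginx
# Hypothetical sites-enabled config for ClearML behind nginx.
server {
    listen 80;
    server_name clearml.example.com;

    location / {
        # Forward the whole subdomain to the ClearML web server port;
        # no path rewriting needed.
        proxy_pass http://127.0.0.1:8080;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
```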
alias: "Alias of the dataset. If set, the ‘alias : dataset ID’ key-value pair will be set under the hyperparameters section ‘Datasets’"
I have to concede that I found that description a bit vague at first, but if you check https://clear.ml/docs/latest/docs/clearml_data/best_practices#organize-datasets-for-easier-access you see that:
"In cases where you use a dataset in a task (e.g. consuming a dataset), you can easily track which dataset the task is using by using `Dataset.get`...
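i.e. something like this sketch (the dataset ID and the alias string are placeholders):

```python
def fetch_training_data(dataset_id: str):
    from clearml import Dataset
    # `alias` registers the 'alias : dataset ID' pair under the consuming
    # task's "Datasets" hyperparameter section, so the lineage shows in the UI.
    ds = Dataset.get(dataset_id=dataset_id, alias="training_data")
    return ds.get_local_copy()
```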
Okay, thank you for the explanations!
We're using Ray for hyperparameter search for non-CV model successfully on ClearML
Hey @<1523701087100473344:profile|SuccessfulKoala55>, this is a fairly small dataset with a linear hierarchy of ~300 versions and a size of ~2 GB
In the meantime, is there some way to set a retention policy for the dataset versions?
Thanks a lot @<1523701435869433856:profile|SmugDolphin23> ❤