I've built a container using the same image used by the agent.
Training ran with no errors
I've also tried with clearml-1.6.5rc2, got the same error
I am lost
Hi there,
This is exactly what I want to do.
RoughTiger69
Have you been able to do it?
This is not a valid parameter: https://clear.ml/docs/latest/docs/references/sdk/task#taskinit
Also, I did not find any usage example of the setup_upload method
Thanks anyway
AgitatedDove14 Worked!
But a new error is raised:
` File "kgraph/pipelines/token_join/train/pipeline.py", line 48, in main
timestamp = pd.to_datetime(data_timestamp) if data_timestamp is not None else get_latest_version(feature_view_name)
File "/root/.clearml/venvs-builds/3.8/task_repository/Data-Science/kgraph/featurestore/query_data.py", line 77, in get_latest_version
fv = store.get_feature_view(fv_name)
File "/root/.clearml/venvs-builds/3.8/lib/python3.8/site-packages/feast/u...
AgitatedDove14 , thanks for the quick answer.
I think this is the easiest way: basically the CI/CD launches a pipeline (which under the hood is another type of Task) by querying the latest "Published" pipeline that is also not archived, then cloning + pushing it to the execution queue
Do you have an example?
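For anyone landing here, a minimal sketch of that clone-and-enqueue flow could look like the snippet below (assuming the ClearML SDK; the project name and queue name are placeholders, not something from this thread):
` from clearml import Task

# Find the most recent Published, non-archived pipeline controller task.
# "pipeline" is the system tag ClearML puts on pipeline controllers;
# project and queue names below are placeholders.
candidates = Task.get_tasks(
    project_name="kgraph",
    task_filter={
        "status": ["published"],
        "system_tags": ["pipeline", "-archived"],
        "order_by": ["-last_update"],
    },
)
latest = candidates[0]  # assumes at least one published pipeline exists

# Clone it and push the clone to an execution queue (e.g. from a CI/CD job).
new_run = Task.clone(source_task=latest, name=f"{latest.name} (CI/CD run)")
Task.enqueue(new_run, queue_name="services")
`
A CI/CD job (e.g. GitHub Actions) could then simply run a script like this to trigger a new run of the latest published pipeline.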
UI when you want to "upgrade" the production pipeline you just right click "Publish" on the pipeline
I didn't see this "publish" option for pipelines, just for models, is thi...
Hi there,
PanickyMoth78
I am having the same issue.
Some steps of the pipeline create huge datasets (some GBs) that I don't want to upload or save.
Wrapping the returns in a dict could be a solution, but honestly, I don't like it.
AgitatedDove14 Is there any better way to avoid uploading some of the artifacts of pipeline steps?
The image above shows an example of the first step of a training pipeline, which queries data from a feature store.
It gets the DataFrame, zips and uploads it (this one i...
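For context, that first step does something roughly like this (a sketch only, assuming feast as the feature store; the repo path, function name and entity DataFrame are placeholders, not code from this thread):
` import pandas as pd
from feast import FeatureStore

def query_features(feature_view_name: str, entity_df: pd.DataFrame) -> pd.DataFrame:
    # Look up the requested feature view in the feature store
    store = FeatureStore(repo_path=".")
    fv = store.get_feature_view(feature_view_name)
    features = [f"{fv.name}:{feature.name}" for feature in fv.features]
    # Materialize the historical features into a (potentially multi-GB) DataFrame
    # that gets returned to the next pipeline step
    return store.get_historical_features(entity_df=entity_df, features=features).to_df()
`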
Pipelines run on the same machine.
We already have the feature store to save all the data, that's why I don't need to save it (just a reference to the dataset version).
I understand your point.
I can have different steps of the pipeline running on different machines. But this is not my use case.
that makes sense, so why don't you point to the feature store?
I did, the first step of the pipeline queries the feature store. I mean, I set the data version as a parameter, then this step queries the data and returns it (to be used in the next step)
this will cause them to get serialized to the local machine's file system, wdyt?
I am worried about the disk space usage that may increase over time.
I just prefer not to worry about that
The transformation has some parameters that we change from time to time
I could merge some steps, but as I may want to cache them in the future, I prefer to keep them separate
I see now.
I didn't know that each step runs in a different process
Thus, the return data from step 2 needs to be available somewhere to be used in step 3.
So, how could wrapping the returns in a dict be a solution?
Will it serialize the data in the dict? (leading to the same result, data stored somewhere)
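For reference, the pattern being discussed is roughly the one below (a sketch using the PipelineDecorator API; step names, arguments and the dummy DataFrame are made up). Whatever a step returns has to be stored by ClearML so that the next step, running in its own process, can load it, which is why returning the raw DataFrame means it gets serialized somewhere either way:
` from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component(return_values=["df"])
def query_data(feature_view_name, data_timestamp=None):
    # imports go inside the component, since it runs in its own process
    import pandas as pd
    # ... query the feature store here (placeholder data below) ...
    df = pd.DataFrame({"token": ["a", "b"], "value": [1, 2]})
    # The return value is stored by ClearML as a step output so the next
    # step, which runs in a separate process, can fetch it.
    return df

@PipelineDecorator.component(return_values=["model_path"])
def train(df):
    # df is deserialized from the previous step's output before this runs
    return "model.pkl"

@PipelineDecorator.pipeline(name="training", project="kgraph", version="1.2")
def main(feature_view_name, data_timestamp=None):
    df = query_data(feature_view_name, data_timestamp)
    train(df)

if __name__ == "__main__":
    PipelineDecorator.run_locally()
    main("my_feature_view")  # placeholder argument
`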
AgitatedDove14
How do you recommend performing this task?
I mean, have a CI/CD (e.g. GitHub Actions) that updates my "production" pipeline on the ClearML UI, so a Data Scientist can start experimenting and creating jobs from the UI.
Got it!
Thanks AgitatedDove14
AgitatedDove14 Thanks for the explanation
I got it.
How can I use force_requirements_env_freeze
with PipelineDecorator()
as I do not have the Task object created?
` @PipelineDecorator.pipeline(name='training', project='kgraph', version='1.2')
def main(feature_view_name, data_timestamp=None, tk_list=None):
    """Pipeline to train ...
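For completeness, one way this could be wired up is to call it on the Task class itself, at module level before the pipeline starts (a sketch of the idea, not verified against every clearml version; the pipeline arguments are the ones from the snippet above):
` from clearml import Task
from clearml.automation.controller import PipelineDecorator

# Freeze the full pip environment instead of relying on import analysis;
# called at module level, before any Task / pipeline is created.
Task.force_requirements_env_freeze(force=True)

@PipelineDecorator.pipeline(name='training', project='kgraph', version='1.2')
def main(feature_view_name, data_timestamp=None, tk_list=None):
    """Pipeline to train ..."""
    ...

if __name__ == '__main__':
    PipelineDecorator.run_locally()
    main('my_feature_view')  # placeholder argument
`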
I don't think so AgitatedDove14
I've tested with:
PipelineDecorator.debug_pipeline()
PipelineDecorator.run_locally()
Docker
I've got no error
AgitatedDove14 is that the expected behavior for Pipelines?
Hi MotionlessCoral18
Are you running the agent inside a container?
Would you mind sharing your Dockerfile?
SubstantialElk6
I only saw your comments today (did not get notified for some reason)
Thanks for your suggestions
Found the issue.
For some reason, all parameters of the main function are passed as strings.
So I have these parameters:
` @PipelineDecorator.pipeline(name='Build Embeddings', project='kgraph', version='1.3')
def main(tk_list=[], ngram_size=2):
    ...
The ngram_size variable is an int when using PipelineDecorator.debug_pipeline()
and it is a string when using PipelineDecorator.run_locally()
I've added Python type hints and it fixed the issue:
` def main(tk_list:list = [], ngram...
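In full, the fixed signature looks roughly like this (a sketch reconstructing the truncated snippet above; presumably the type hints let ClearML cast the stringified parameter values back to the annotated types):
` from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.pipeline(name='Build Embeddings', project='kgraph', version='1.3')
def main(tk_list: list = [], ngram_size: int = 2):
    # With the hints in place, ngram_size arrives as an int in both
    # debug_pipeline() and run_locally() modes, instead of the string '2'.
    ...
`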
Thanks Martin, your suggestion solves the problem.