SubstantialElk6
Only today I saw your comments (I did not get notified for some reason)
Thanks for your suggestions
Thanks Martin, your suggestion solves the problem.
AgitatedDove14
How do you recommend performing this task?
I mean, have a CI/CD (e.g. GitHub Actions) that updates my "production" pipeline in the ClearML UI, so a Data Scientist can start experimenting and creating jobs from the UI.
AgitatedDove14, thanks for the quick answer.
I think this is the easiest way: basically the CI/CD launches a pipeline (which under the hood is another type of Task) by querying the latest "Published" pipeline that is also not archived, then cloning it and pushing it to the execution queue
Do you have an example?
UI when you want to "upgrade" the production pipeline you just right click "Publish" on the pipeline
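To make the CI/CD step concrete, here is a rough sketch of the clone-and-enqueue flow described above, using ClearML's Task.get_tasks / Task.clone / Task.enqueue API. The project path, queue name, and the exact filter values (task type "controller", status "published", the "-archived" system tag) are assumptions you would adjust to how your pipelines are stored:

```python
from clearml import Task

# Placeholder names -- adjust to your own project / queue.
PIPELINE_PROJECT = "kgraph/.pipelines/training"   # where the pipeline controller tasks live
EXECUTION_QUEUE = "services"

# Query the latest Published, non-archived pipeline controller.
candidates = Task.get_tasks(
    project_name=PIPELINE_PROJECT,
    task_filter={
        "type": ["controller"],        # pipeline controllers are Tasks of type "controller"
        "status": ["published"],       # only Published pipelines
        "system_tags": ["-archived"],  # exclude archived ones
        "order_by": ["-last_update"],  # newest first
    },
)
if not candidates:
    raise RuntimeError("No published pipeline found")

# Clone the latest published pipeline and push the clone to the execution queue.
new_run = Task.clone(source_task=candidates[0], name=f"{candidates[0].name} (CI/CD run)")
Task.enqueue(new_run, queue_name=EXECUTION_QUEUE)
print(f"Enqueued pipeline run {new_run.id}")
```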
I didn't see this "publish" option for pipelines, just for models, is thi...
Got it!
Thanks AgitatedDove14
I've built a container using the same image used by the agent.
Training ran with no errors
I've also tried with clearml-1.6.5rc2 and got the same error
I am lost
Hi there,
This is exactly what I want to do.
RoughTiger69
Have you been able to do it?
AgitatedDove14 It worked!
But a new error is raised:
` File "kgraph/pipelines/token_join/train/pipeline.py", line 48, in main
timestamp = pd.to_datetime(data_timestamp) if data_timestamp is not None else get_latest_version(feature_view_name)
File "/root/.clearml/venvs-builds/3.8/task_repository/Data-Science/kgraph/featurestore/query_data.py", line 77, in get_latest_version
fv = store.get_feature_view(fv_name)
File "/root/.clearml/venvs-builds/3.8/lib/python3.8/site-packages/feast/u...
AgitatedDove14 Thanks for the explanation
I got it.
How can I use force_requirements_env_freeze with PipelineDecorator(), as I do not have the Task object created?
@PipelineDecorator.pipeline(name='training', project='kgraph', version='1.2')
def main(feature_view_name, data_timestamp=None, tk_list=None):
    """Pipeline to train ...
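For reference, a minimal sketch of one way to combine the two, assuming that calling Task.force_requirements_env_freeze() at module level, before the decorated pipeline starts, is enough to affect the auto-created tasks (the argument value passed to main is a placeholder):

```python
from clearml import Task, PipelineDecorator

# Assumption: calling the classmethod before any Task is created makes the
# generated tasks store the fully frozen pip environment instead of only the
# directly imported packages.
Task.force_requirements_env_freeze(force=True)

@PipelineDecorator.pipeline(name='training', project='kgraph', version='1.2')
def main(feature_view_name, data_timestamp=None, tk_list=None):
    """Pipeline to train ..."""
    ...

if __name__ == '__main__':
    # Run the pipeline logic locally; steps still become Tasks under the hood.
    PipelineDecorator.run_locally()
    main(feature_view_name='my_feature_view')  # placeholder argument value
```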
Hi there,
PanickyMoth78
I am having the same issue.
Some steps of the pipeline create huge datasets (some GBs) that I don't want to upload or save.
Wrapping the returns in a dict could be a solution, but honestly, I don't like it.
AgitatedDove14 Is there any better way to avoid uploading some artifacts of pipeline steps?
The image above shows an example of the first step of a training pipeline, which queries data from a feature store.
It gets the DataFrame, zips it, and uploads it (this one i...
So, how would wrapping the returns in a dict be a solution?
Would it serialize the data in the dict? (leading to the same result, data stored somewhere)
I see now.
I didn't know that each step runs in a different process
Thus, the return data from step 2 needs to be available somewhere to be used in step 3.
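To make that concrete, here is a minimal sketch (hypothetical step names and data) of a decorated pipeline where one step's return value has to be serialized as an artifact so the next step, running in its own process, can receive it:

```python
from clearml import PipelineDecorator

@PipelineDecorator.component(return_values=['df'])
def query_data(version):
    # Hypothetical step: builds a (potentially huge) DataFrame.
    import pandas as pd
    return pd.DataFrame({'a': range(10)})

@PipelineDecorator.component(return_values=['n_rows'])
def transform(df):
    # Runs in a separate process, so `df` can only get here because the
    # previous step's return value was serialized/uploaded as an artifact.
    return len(df)

@PipelineDecorator.pipeline(name='example', project='kgraph', version='0.1')
def main(version='latest'):
    df = query_data(version)
    print(transform(df))
```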
The transformation has some parameters that we change eventually
I could merge some steps, but as I may want to cache them in the future, I prefer to keep them separate
Found the issue.
For some reason, all parameters of the main function are passed as strings.
So I have these parameters:
@PipelineDecorator.pipeline(name='Build Embeddings', project='kgraph', version='1.3')
def main(tk_list=[], ngram_size=2): ...
The ngram_size variable is an int when using PipelineDecorator.debug_pipeline()
and it is a string when I used PipelineDecorator.run_locally()
I've added Python type hints and it fixed the issue:
` def main(tk_list:list = [], ngram...
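For reference, the full type-hinted signature (reconstructed from the parameters shown above) would look roughly like this:

```python
from clearml import PipelineDecorator

@PipelineDecorator.pipeline(name='Build Embeddings', project='kgraph', version='1.3')
def main(tk_list: list = [], ngram_size: int = 2):
    # With the type hints in place, ngram_size arrives as an int instead of
    # the string it was cast to when the pipeline arguments were parsed.
    ...
```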
AgitatedDove14 is that the expected behavior for Pipelines?
The pipelines run on the same machine.
We already have the feature store to save all the data; that's why I don't need to save it (just a reference to the dataset version).
I understand your point.
I can have different steps of the pipeline running on different machines. But this is not my use case.
this will cause them to get serialized to the local machine's file system, wdyt?
I am worried about the disk space usage that may increase over time.
I just prefer not to worry about that
This is not a valid parameter: https://clear.ml/docs/latest/docs/references/sdk/task#taskinit
Also, I did not find any usage example of the setup_upload method
Thanks anyway
that makes sense, so why don't you point to the feature store?
I did, the first step of the pipeline queries the feature store. I mean, I set the data version as a parameter, then this step queries the data and returns it (to be used in the next step)
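One way to avoid shipping the DataFrame itself between steps is to pass only the feature-store version reference and have each step that needs the data query the store on its own. A rough sketch under that assumption (get_latest_version comes from the traceback above; load_data is a hypothetical helper):

```python
from clearml import PipelineDecorator

@PipelineDecorator.component(return_values=['version'])
def resolve_version(feature_view_name, data_timestamp=None):
    # Returns only a small version reference, so nothing big gets uploaded.
    from kgraph.featurestore.query_data import get_latest_version
    return data_timestamp or get_latest_version(feature_view_name)

@PipelineDecorator.component(return_values=['metrics'])
def train(feature_view_name, version):
    # Re-query the feature store inside the step instead of receiving the
    # (huge) DataFrame as a serialized artifact from a previous step.
    from kgraph.featurestore.query_data import load_data  # hypothetical helper
    df = load_data(feature_view_name, version)
    ...  # training code
    return {}

@PipelineDecorator.pipeline(name='training', project='kgraph', version='1.2')
def main(feature_view_name, data_timestamp=None):
    version = resolve_version(feature_view_name, data_timestamp)
    return train(feature_view_name, version)
```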
I don't think so, AgitatedDove14
I've tested with:
PipelineDecorator.debug_pipeline(), PipelineDecorator.run_locally(), and Docker
I've got no errors
Hi MotionlessCoral18
Are you running the agent inside a container?
Would you mind sharing your Dockerfile?