
AgitatedDove14 I have annotation logs from the end-user that I fetch periodically; I process them and I want to add them as a new version of my dataset, where each version corresponds to the data collected during a precise time window. Currently I'm doing it by fetching the latest dataset, incrementing the version and creating a new dataset version
I would like, instead of having to:
1. Fetch the latest dataset to get the current latest version
2. Increment the version number
3. Create and upload a new version of the dataset
To be able to:
1. Select a dataset project by name
2. Create a new version of the dataset by choosing which SEMVER increment (major/minor/patch) I would like for this version number, and upload
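For reference, the manual flow above looks roughly like this; a sketch against the clearml `Dataset` API, where the `bump()` helper and the project/dataset names are mine, and `Dataset.version`, `dataset_version` and `parent_datasets` assume a reasonably recent clearml release:

```python
from clearml import Dataset

def bump(version: str, part: str = "patch") -> str:
    # tiny illustrative SEMVER incrementer (not part of clearml)
    major, minor, patch = (int(p) for p in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"

# 1. fetch the latest dataset to get the current latest version
latest = Dataset.get(dataset_project="annotations", dataset_name="end-user-logs")
# 2. increment the version number
new_version = bump(latest.version or "1.0.0", part="patch")
# 3. create and upload the new version
ds = Dataset.create(
    dataset_name="end-user-logs",
    dataset_project="annotations",
    dataset_version=new_version,
    parent_datasets=[latest.id],
)
ds.add_files("processed/")
ds.upload()
ds.finalize()
```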
Well I think most of the time is taken by the setup of the venv, installing the packages defined in the imports of the pipeline component, which is normal, and some of those packages have a wheel that takes a long time to build. But most of those packages were already included in the Docker image I provided, and I get this message in my logs:
```
:: Python virtual environment cache is disabled. To accelerate spin-up time set `agent.venvs_cache.path=~/.clearml/venvs-cache` ::
```
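For anyone else hitting this: enabling the cache is an agent-side setting in `clearml.conf`; a sketch of the relevant section, assuming the stock key names (the limits shown are illustrative):

```
agent {
    venvs_cache: {
        # setting a path enables venv caching on this worker
        path: ~/.clearml/venvs-cache
        max_entries: 10
        free_space_threshold_gb: 2.0
    }
}
```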
Well, we're having a network incident at HQ so this doesn't help... but I'll keep you updated with the tests I run tomorrow
Oh wow, would definitely try it out if there were an Autoscaler App integrating it with ClearML
Okay, looks like the call dependency resolver does not support cross-file calls and relies instead on the local repo cloning feature to handle multiple files, so `Task.force_store_standalone_script()` does not allow for a pipeline defined across multiple files (now that I think of it, it was kinda implied by the name). But what is interesting is that calling an auxiliary function in the SAME file from a component also raises a `NameError: <function_name> is not defined`, that's ki...
Would have been great if the ClearML resolver would just inline the code of locally defined vanilla functions and execute that inlined code under the import scope of the component from which it is called
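For anyone hitting the same `NameError`, a minimal sketch of the pattern discussed above; it assumes the `helper_functions` argument of `PipelineDecorator.component`, which (if I read the docs right) packs the listed functions into the component's standalone script:

```python
from clearml import PipelineDecorator

def double(x):
    # plain module-level helper, not a component
    return x * 2

# Without helper_functions=[double], the component executes as a standalone
# script and calling double() raises: NameError: name 'double' is not defined
@PipelineDecorator.component(return_values=["y"], helper_functions=[double])
def my_step(x):
    return double(x)
```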
Well, it is also failing within the same file if you read until the end. But for the cross-file issue, it's mostly because of my repo architecture, organized in a v1/v2 scheme, and I didn't want to pull a lot of unused files and inject GitHub PATs (which frankly lack granularity) into the worker
Well, solved. It's not as beautiful, but I guess I can put them in an env file with an arbitrary name in the init script and just pass that file as an exec argument...
Yup, I already set up my AWS configs for ClearML that way, but I needed generally accessible credentials too, so I used the init script option in this config menu ^^
And by extension, is there a way to upsert a dataset, by automatically creating an entry with an incremented version, or creating it if it does not exist ? Or am I forced to do a get, check if the latest version is finalized, then increment the version of that version and create my new version ?
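What I mean by upsert, sketched by reusing the `bump()` helper from the earlier snippet; that `Dataset.get` raises a `ValueError` when nothing matches is my assumption of the failure mode:

```python
from clearml import Dataset

def upsert_dataset(project: str, name: str) -> Dataset:
    """Get-or-create sketch: first version if absent, bumped child otherwise."""
    try:
        latest = Dataset.get(dataset_project=project, dataset_name=name)
    except ValueError:
        # nothing matched: create the very first version
        return Dataset.create(dataset_name=name, dataset_project=project)
    if not latest.is_final():
        latest.finalize()  # or abort here, depending on the desired semantics
    return Dataset.create(
        dataset_name=name,
        dataset_project=project,
        dataset_version=bump(latest.version or "1.0.0"),
        parent_datasets=[latest.id],
    )
```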
Sure, but the same pattern can be achieved by explicitly using the `PipelineController` class and defining steps with `.add_step()` pointing to ClearML `Task` objects, right ?
The decorators simply abstract away the controller, but both methods (decorators or controller/tasks) allow you to decouple your pipeline into steps, each having an independent compute target, right ?
So basically choosing one method or the other is only a question of best practice or style ?
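To make the comparison concrete, here is what I mean by the explicit controller pattern; a sketch where the queues, project and task names are placeholders, and `parents=` expresses the dependency edge:

```python
from clearml import PipelineController

pipe = PipelineController(name="etl-pipeline", project="examples", version="0.0.1")
pipe.set_default_execution_queue("default")

# each step clones an existing ClearML Task and runs it independently
pipe.add_step(
    name="preprocess",
    base_task_project="examples",
    base_task_name="preprocess task",
)
pipe.add_step(
    name="train",
    parents=["preprocess"],   # explicit dependency on the previous step
    base_task_project="examples",
    base_task_name="train task",
    execution_queue="gpu",    # independent compute target per step
)
pipe.start()
```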
Nice, that's a great feature! I'm also trying to have a component executing Giskard QA test suites on models and data; is there a planned feature where I could suspend execution of the pipeline, and display in the UI that this pipeline step requires a human confirmation to go on or stop, while displaying arbitrary text/plot information ?
Ooooo okay, I see, with the `@PipelineDecorator.pipeline` decorator you can have a function to orchestrate your components and manipulate their return data
Btw AgitatedDove14, is there a way to define parallel tasks and use a pipeline as an acyclic compute graph instead of simply sequential tasks ?
As opposed to the Controller/Task approach, where `.add_step()` only allows executing them sequentially
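From what I gather from the examples, independent decorator steps are scheduled concurrently, so the DAG shape falls out of the data dependencies; a minimal sketch (names and queue are placeholders):

```python
from clearml import PipelineDecorator

@PipelineDecorator.component(return_values=["a"])
def step_a():
    return 1

@PipelineDecorator.component(return_values=["b"])
def step_b():
    return 2

@PipelineDecorator.component(return_values=["c"])
def step_c(a, b):
    return a + b

@PipelineDecorator.pipeline(name="dag-example", project="examples", version="0.0.1")
def pipe():
    a = step_a()         # step_a and step_b share no dependency,
    b = step_b()         # so the controller can launch them in parallel
    return step_c(a, b)  # step_c waits on both results

if __name__ == "__main__":
    PipelineDecorator.run_locally()  # or target an execution queue
    pipe()
```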
Have you identified yet whether it was a strictly internal issue, or should I continue my investigation on my side ?
Fix confirmed on our side CostlyOstrich36 thanks for everything!
Hey CostlyOstrich36, did you find anything of interest on the issue ?
Okay thanks! Please keep me posted when the hotfix is out on the SaaS
SuccessfulKoala55 Mostly the VM instance types and properties, execution queue and app name.
SmugDolphin23 But the training.py already has a ClearML task created under the hood through its ClearML integration; besides, isn't initing the task before the execution of the file, like in my snippet, sufficient ?
The worker Docker image was running on Python 3.8 and we are running on a PRO tier SaaS deployment; this failed run is from a few weeks ago and we have not run any pipeline since then
The image OS and the runner OS were both Ubuntu 22 if I remember correctly
The `train.py` is the default YOLOv5 training file; I initiated the task outside the call. Should I go edit their training command-line file ?
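For clarity, "outside the call" means something like the following; the YOLOv5 `train.run(...)` entrypoint and its arguments are from memory, and the project/task names are placeholders:

```python
from clearml import Task
import train  # YOLOv5's train.py, assuming it is importable from the repo root

# init the task outside the call, before YOLOv5 runs
task = Task.init(project_name="detection", task_name="yolov5-train")

# YOLOv5's ClearML logger calls Task.init itself; called again in the same
# process it should return the task created above rather than a new one
train.run(data="coco128.yaml", imgsz=640, epochs=3, weights="yolov5s.pt")
```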
But the task appeared with the correct name and outputs in the pipeline and the experiment manager
I'm referring to https://clearml.slack.com/archives/CTK20V944/p1668070109678489?thread_ts=1667555788.111289&cid=CTK20V944 mapping the project to a ClearML project, and https://github.com/ultralytics/yolov5/tree/master/utils/loggers/clearml which, when calling the training.py from my machine, successfully logged the training on ClearML and uploaded the artifact correctly