SubstantialElk6 I just realized 3 weeks passed, wow!
So, the good news is that we have some new examples:
https://github.com/allegroai/clearml/blob/master/examples/pipeline/pipeline_from_decorator.py
https://github.com/allegroai/clearml/blob/master/examples/pipeline/pipeline_from_functions.py
The bad news is that the documentation was postponed a bit, as we are still massaging the interface (the community is constantly pushing great ideas and use cases, and they are just too good to miss out on)
We added nested components, callbacks, and metric/artifact/model auto-logging
https://github.com/allegroai/clearml/blob/b010f775bdd72ba6729f5e1e569626692d7b18af/clearml/automation/controller.py#L454
I'm hopeful that we will be able to push an initial version next week.
Please ping if you hear nothing, we appreciate it, and it really helps with prioritizing things
Would you have an example of this in your code blogs to demonstrate this utilisation?
Yes! I definitely think this is important, and hopefully we will see something there (or at least in the docs)
Hi AgitatedDove14, any updates in the docs to demonstrate this yet?
Thanks AgitatedDove14, will take a look.
Hi SubstantialElk6
quick update, once clearml 1.1 is out, we will push the clearml-data improvement, supporting chunks per version (i.e. packaging the changeset into multiple zip files, instead of a single one as the current version does).
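To make the "chunks per version" idea concrete, here is a minimal sketch in plain Python. It is not the clearml-data implementation: `pack_in_chunks`, `read_chunks` and the byte limit are hypothetical names chosen for illustration, and the grouping is a simple greedy one.

```python
import io
import zipfile


def pack_in_chunks(files, max_chunk_bytes):
    """Greedily group files into zip archives of roughly max_chunk_bytes each.

    files: dict mapping file name -> bytes (the changeset).
    Returns a list of zip archives, each as a bytes object.
    Sizes are measured before compression; a single file larger than the
    limit still gets a chunk of its own.
    """
    groups, current, current_size = [], {}, 0
    for name, data in sorted(files.items()):
        # Start a new chunk when adding this file would exceed the limit.
        if current and current_size + len(data) > max_chunk_bytes:
            groups.append(current)
            current, current_size = {}, 0
        current[name] = data
        current_size += len(data)
    if current:
        groups.append(current)

    archives = []
    for group in groups:
        buf = io.BytesIO()
        with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
            for name, data in group.items():
                zf.writestr(name, data)
        archives.append(buf.getvalue())
    return archives


def read_chunks(archives, selected):
    """Extract only the chunks whose indices are listed in `selected`."""
    out = {}
    for index in selected:
        with zipfile.ZipFile(io.BytesIO(archives[index])) as zf:
            for name in zf.namelist():
                out[name] = zf.read(name)
    return out
```

With several small archives instead of one large zip, a consumer can call something like `read_chunks(archives, [0])` and touch only the first chunk, which is the property that makes partial downloads possible.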
Regarding (1), the server storage limit:
Ideally, we should be able to specify the batch size that we want to download, or even better, tie this in with the training by parallelising the data download, data preprocessing and batch trains.
With the next version you will be able to download a partial dataset (i.e. only selected chunks), which should help with the issue.
That said, the best solution is to configure a shared cache for all instances (both the open-source and Enterprise versions support it, with some efficiency improvements in the Enterprise version).
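On the request quoted above to overlap data download, preprocessing and training: one generic way to do this is a background thread feeding a bounded queue. The sketch below uses only the standard library and is not a ClearML API; `prefetch`, `download` and `preprocess` are hypothetical names, and `max_prefetch` is an assumed knob for how many batches may wait in memory.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def prefetch(batch_ids, download, preprocess, max_prefetch=2):
    """Yield preprocessed batches while a worker thread fetches the next ones.

    The bounded queue keeps at most `max_prefetch` batches waiting, so the
    worker runs ahead of the consumer without filling up local storage.
    """
    q = queue.Queue(maxsize=max_prefetch)

    def worker():
        try:
            for batch_id in batch_ids:
                q.put(preprocess(download(batch_id)))
        finally:
            # Always signal completion, even if download/preprocess raised.
            q.put(_DONE)

    threading.Thread(target=worker, daemon=True).start()
    while True:
        item = q.get()
        if item is _DONE:
            return
        yield item
```

The training loop then simply iterates, e.g. `for batch in prefetch(ids, download, preprocess): train_step(batch)`, where `train_step` is whatever the training code provides. Note that in this sketch an error in the worker ends the stream silently; real code would want to forward the exception to the consumer.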
- Inefficiency. The time to pull the images is the time when the GPU is not utilised.
This one can be solved with shared cache + pipeline step, refreshing the cache in the shared cache machine.
wdyt ?
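A rough sketch of the "shared cache + refresh step" idea, assuming the cache is just a directory on storage that all instances can reach. This is plain Python, not the ClearML cache implementation; `get_cached`, `refresh_cache` and the `fetch` callback are hypothetical names.

```python
import hashlib
import os
import tempfile


def cached_path(cache_dir, url):
    """Map a URL to a stable file name inside the shared cache directory."""
    return os.path.join(cache_dir, hashlib.sha256(url.encode("utf-8")).hexdigest())


def get_cached(cache_dir, url, fetch):
    """Return the local path for `url`, calling fetch(url) -> bytes only on a miss."""
    path = cached_path(cache_dir, url)
    if not os.path.exists(path):
        os.makedirs(cache_dir, exist_ok=True)
        # Write to a temp file first, then rename, so that other readers
        # of the shared directory never see a half-written entry.
        fd, tmp = tempfile.mkstemp(dir=cache_dir)
        with os.fdopen(fd, "wb") as f:
            f.write(fetch(url))
        os.replace(tmp, path)
    return path


def refresh_cache(cache_dir, urls, fetch):
    """Warm-up step: make sure every URL is present before training starts."""
    return [get_cached(cache_dir, url, fetch) for url in urls]
```

Running `refresh_cache` as an early pipeline step on a CPU machine means the GPU step that follows only hits the cache, so the time spent pulling images no longer counts as idle GPU time.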