I have ~100GB of data that I do not wish to upload to the trains-server. Instead, I would like it copied only to the host machine (an Azure container) at training time.
The data is in Azure blob storage and will be copied using a custom script just before training starts.
Hi LazyLeopard18
I think that these toy examples will help:
1. Uploading a local dataset:
https://github.com/allegroai/events/blob/master/odsc20-east/generic/dataset_artifact.py
2. Pre-processing the data:
https://github.com/allegroai/events/blob/master/odsc20-east/generic/process_dataset.py
3. Training example:
https://github.com/allegroai/events/blob/master/odsc20-east/scikit-learn/sklearn_jupyter.ipynb
LazyLeopard18 you can point the artifact directly to your Azure object storage and have StorageManager download and cache it for you:
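A minimal sketch of that flow, assuming the storage account, container, and blob path below are placeholders and that you are on the trains SDK (newer ClearML versions use `from clearml import StorageManager`):

```python
from trains import StorageManager  # on newer SDKs: from clearml import StorageManager

# Placeholder Azure blob URL -- replace the account, container and path with your own.
# Azure credentials are read from the trains/clearml configuration file (e.g. ~/trains.conf).
remote_url = "azure://mystorageaccount.blob.core.windows.net/mycontainer/dataset.zip"

# Downloads the object on first access and serves it from the local cache afterwards,
# so the data lands only on the host machine and never on the trains-server.
local_path = StorageManager.get_local_copy(remote_url=remote_url)
print(local_path)  # path to the cached copy on the host
```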