Why does it matter how ClearML stores datasets? If you get the dataset locally, all files will be unzipped.
- It takes time to compress: 8 archives, 5 GB each, take half an hour.
- I can stream archives from the bucket directly over the network for training without fetching them locally, which saves storage space (see the sketch after this list).
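For illustration, a minimal sketch of that streaming approach, assuming boto3, tar archives (which, unlike zip, can be read as a stream), and hypothetical bucket/key names:
import tarfile
import boto3

# Hypothetical bucket and key; the response body is a streaming object
s3 = boto3.client('s3')
body = s3.get_object(Bucket='my-bucket', Key='train_data/archive_01.tar')['Body']

# tarfile's streaming mode ('r|') reads sequentially, so the archive is
# consumed straight from the network without a local copy
with tarfile.open(fileobj=body, mode='r|') as archive:
    for member in archive:
        fileobj = archive.extractfile(member)
        # feed fileobj into the training pipeline here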
Let’s say I have a dataset from source A; the dataset is finalized, uploaded, and looks like this:
train_data/data_from_source_A
Each month I receive a new batch of data, create a new dataset, and upload it. After a few months my dataset looks like this:
train_data/data_from_source_A
train_data/data_from_source_B
train_data/data_from_source_C
train_data/data_from_source_D
train_data/data_from_source_E
Each batch of data was added by creating a new dataset and adding files. Now, I have a large da...
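As a side note, the monthly workflow described above maps onto ClearML's parent datasets; a minimal sketch with hypothetical names, where each new version stores only the delta against its parent:
from clearml import Dataset

# Get the latest finalized version and create the next one as its child
parent = Dataset.get(dataset_name='train_data')
child = Dataset.create(
    dataset_name='train_data',
    parent_datasets=[parent.id],
)
child.add_files('train_data/data_from_source_B')  # the new monthly batch
child.upload()
child.finalize()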
Thank you, that's a good way to handle it. Of course, it would be great to have such functionality in ClearML. This is the only thing stopping me from deploying.
I resolved the issue. Works like a charm. I disabled framework auto-logging, and ClearML no longer tries to store the local model.
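In case it helps anyone else, the fix is roughly this (project and task names are placeholders):
from clearml import Task

# auto_connect_frameworks=False turns off framework auto-logging, so the
# locally saved model is not captured and re-stored by ClearML
task = Task.init(
    project_name='my_project',
    task_name='my_task',
    auto_connect_frameworks=False,
)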
CostlyOstrich36 hello, thank you! But what if I want to have it in the open-source version? It's just one feature, and I can't convince my CTO to buy the PRO tier because of it alone 🙂
Wow, sounds great! Thank you! I’ll do some research on Terraform
Do you mean that in the Model tab, when you look into the model details, the URL points to a local location (e.g. file:///mnt/something/model)?
Exactly.
And your goal is to get a copy of that model (file) from your code, is that correct?
See, it happens when I try to connect an existing model (it's in the model registry and already uploaded to remote storage). I query this model and connect it to the task:
model = InputModel.query_models(model_name=name)
task.connect(model[0])
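If the end goal is a local copy of the file, a minimal follow-up reusing the query result above would be (get_local_copy downloads from the remote storage):
local_path = model[0].get_local_copy()  # returns the path of the downloaded file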
It's sad, but due to security measures we have to use the self-hosted version, and it seems like the PRO plan does not provide such an option.
Nothing special:
from clearml import Dataset

dataset = Dataset.create(dataset_name='my_dataset', parent_datasets=None, use_current_task=False)
dataset.add_files(dataset_dir, verbose=False)
dataset.upload(output_url='...')  # '...' stands in for the actual bucket URL
dataset.finalize(verbose=True)
SuccessfulKoala55 any hints?
Have you ever benchmarked ClearML Datasets on large datasets? How well does it handle them?
It seems like that doesn't let me use ClearML's ability to track and version datasets. I mean, I can't create the next version of a dataset from a dataset with external files.
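For context, the external-files flow I mean is roughly this (the bucket URL is a placeholder); the missing piece is creating a child version on top of such a dataset:
from clearml import Dataset

dataset = Dataset.create(dataset_name='my_dataset')
# add_external_files registers links (URLs) only; the bytes stay in the bucket
dataset.add_external_files(source_url='s3://my-bucket/train_data/')
dataset.finalize()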
It's a JSON manifest + Dockerfile in the repository.
It'd be great if ClearML Sessions could clone the repo, set up the Docker container, and open the repo in the IDE.
Right now, though, I see that clearml-session can't automatically clone the repo and open VS Code in it.