I resolved the issue. Works like a charm. I disabled framework auto-logging, and ClearML no longer tries to store the local model again.
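For reference, a minimal sketch of what disabling framework auto-logging looks like at task creation (the project and task names here are placeholders, not from the thread):

```python
def init_task_without_framework_logging():
    """Create a ClearML task with framework auto-logging disabled,
    so locally saved model files are not registered as new models."""
    # Import inside the function so the sketch can be defined
    # without a configured ClearML environment.
    from clearml import Task

    task = Task.init(
        project_name="my_project",          # placeholder
        task_name="train_without_autolog",  # placeholder
        auto_connect_frameworks=False,      # turn off all framework auto-logging
    )
    return task
```

`auto_connect_frameworks` also accepts a dict if only some frameworks should be silenced.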
Do you mean that in the Model tab when you look into the model details the URL points to a local location (e.g. file:///mnt/something/model) ?
Exactly.
And your goal is to get a copy of that model (file) from your code, is that correct ?
See, it happens when I try to connect an existing model (in the model registry, the model is already uploaded to remote storage). I query this model and connect it to the task:
model = InputModel.query_models(model_name=name)
task.connect(model[0])
CostlyOstrich36 hello, thank you! But what if I want to have it in the open-source version? It’s the only feature I need, and I can’t convince my CTO to buy the PRO tier just for it 🙂
It’s sad, but due to security measures we have to use the self-hosted version, and it seems like the PRO plan does not provide such an option.
Let’s say I have a dataset from source A; the dataset is finalised, uploaded, and looks like this:
train_data/data_from_source_A
Each month I receive a new batch of data, create a new dataset, and upload it. After a few months my dataset looks like this:
train_data/data_from_source_A
train_data/data_from_source_B
train_data/data_from_source_C
train_data/data_from_source_D
train_data/data_from_source_E
Each batch of data was added by creating a new dataset and adding the files. Now, I have a large da...
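The monthly flow described above can be sketched with the ClearML Dataset API; chaining versions through `parent_datasets` keeps earlier batches without re-uploading them (names and the helper are placeholders):

```python
def add_monthly_batch(batch_dir, parent_dataset_id):
    """Create the next dataset version on top of the previous one,
    adding only the new batch of files."""
    from clearml import Dataset

    dataset = Dataset.create(
        dataset_name="train_data",            # placeholder name
        parent_datasets=[parent_dataset_id],  # previous finalized version
    )
    dataset.add_files(batch_dir, verbose=False)
    dataset.upload()    # uses the default output URL unless one is given
    dataset.finalize()
    return dataset.id
```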
Thank you, that’s a good way to handle it. Of course, it would be great to have such functionality in ClearML. Only this stops me from deployment.
Nothing special
dataset = Dataset.create(dataset_name='my_dataset', parent_datasets=None, use_current_task=False)
dataset.add_files(dataset_dir, verbose=False)
dataset.upload(output_url='...')
dataset.finalize(verbose=True)
Have you ever benchmarked ClearML Datasets on large datasets? How well do they handle them?
@<1523701087100473344:profile|SuccessfulKoala55> any hints ?
Why does it matter how clearml stores datasets? If you get the dataset locally, all files will be unzipped.
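For completeness, a sketch of what "getting the dataset locally" looks like in code (the helper name is mine):

```python
def download_dataset(dataset_id):
    """Fetch a cached local copy of a finalized dataset; the stored
    archives are extracted, so the caller sees plain files."""
    from clearml import Dataset

    dataset = Dataset.get(dataset_id=dataset_id)
    # get_local_copy() downloads and extracts into a local cache directory
    return dataset.get_local_copy()
```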
- It takes time to compress: 8 archives, 5 GB each, take half an hour.
- I can stream archives from the bucket directly over the network for training without fetching them locally, which saves storage space.
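The streaming idea in the second point can be sketched without ClearML: Python's `tarfile` can consume a gzipped archive as a forward-only stream (mode `"r|gz"`), so members are read one by one from any file-like object, e.g. the `Body` of a boto3 `get_object()` response, without writing the archive to disk. The S3 wiring is assumed; the demo below streams an in-memory archive instead:

```python
import io
import tarfile

def iter_tar_members(fileobj):
    """Yield (name, bytes) for each regular file in a gzipped tar,
    reading the archive as a stream (no local extraction).
    `fileobj` can be any readable file-like object."""
    with tarfile.open(fileobj=fileobj, mode="r|gz") as tar:
        for member in tar:
            if member.isfile():
                # In stream mode, members must be read in order as we iterate.
                yield member.name, tar.extractfile(member).read()

# Self-contained demo: build a small archive in memory and stream it back.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:gz") as tar:
    data = b"sample bytes"
    info = tarfile.TarInfo(name="sample.bin")
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))
buf.seek(0)

for name, payload in iter_tar_members(buf):
    print(name, len(payload))  # sample.bin 12
```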
It’s a JSON manifest + a Dockerfile in the repository.
It'd be great if ClearML sessions could clone the repo, set up the docker container, and open the repo.
However, I see that clearml-session can't automatically clone the repo and open VS Code in it.
Wow, sounds great! Thank you! I’ll do some research on Terraform
It seems like that doesn't let me use ClearML's ability to track and version datasets. I mean, I can't create the next version of a dataset from a dataset with external files.