So for this, should I create a proper issue on GitHub, or is this being picked up internally, AgitatedDove14?
Yes, a structure similar to a shared folder should be the optimal solution. But I don't understand what you mean by "warm".
Because this would again cause the problems I asked about yesterday. Is there any way to access the parent dataset (assuming it's large and I don't want to download it) without using get_local_copy(), as that would solve a lot of problems? If so, where can I find it in the docs?
Anyone using a small dataset can afford to go with get_local_copy().
Yes, it's a single file from a local machine to a remote one.
CostlyOstrich36 could you explain the workflow with the MinIO server setup you just described? Should the data stored in MinIO be treated as a local folder, and then maybe synced with this method? https://clear.ml/docs/latest/docs/clearml_data/clearml_data_sdk#syncing-local-storage
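For reference, the syncing flow from the linked docs page might look roughly like the sketch below. `Dataset.create`, `sync_folder`, `upload(output_url=...)` and `finalize` are ClearML SDK calls; the function name, the dataset and project names, and the idea of pointing `output_url` at a MinIO bucket are assumptions for illustration, and it presumes MinIO credentials are already configured in clearml.conf.

```python
def sync_folder_to_minio(local_folder: str, minio_url: str) -> str:
    """Register the contents of local_folder as a dataset stored on MinIO."""
    # Imported inside the function so this module loads even where
    # clearml is not installed.
    from clearml import Dataset

    dataset = Dataset.create(
        dataset_name="large_dataset",    # placeholder name
        dataset_project="shared_data",   # placeholder project
    )
    # Record files that were added, changed or removed in the folder.
    dataset.sync_folder(local_path=local_folder)
    # Send the files to the given storage URL instead of the default target.
    dataset.upload(output_url=minio_url)
    dataset.finalize()
    return dataset.id
```

A call could look like `sync_folder_to_minio("/data/large_dataset", "s3://minio.example.local:9000/datasets")`, where the host, port and bucket are made up. Note that this still uploads the files as a ClearML dataset; it does not by itself give path-style access to the data on the server.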
Oh okay, in that case I can use the output_ur param in the file upload method.
CostlyOstrich36 Thanks for the comment. Is there an issue already open that tracks the status?
There should be a method called read_remote_copy(dataset_id: str, dataset_tag: str, mutable: bool), and it should return the path of the remote data.
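To make the request concrete, here is a rough sketch of the behaviour I have in mind. This is not an existing ClearML API: `read_remote_copy`, `SHARED_ROOT` and the one-folder-per-dataset-id layout are all hypothetical, and the `dataset_tag` argument is left out to keep it short. It assumes the dataset already sits on a shared mount that every training machine can see.

```python
import shutil
import tempfile
from pathlib import Path

# Hypothetical mount point of the shared folder on every training machine.
SHARED_ROOT = Path("/mnt/shared_datasets")


def read_remote_copy(dataset_id: str, mutable: bool = False,
                     shared_root: Path = SHARED_ROOT) -> Path:
    """Return a folder path for dataset_id without fetching an archive."""
    source = shared_root / dataset_id
    if not source.is_dir():
        raise FileNotFoundError(
            f"no dataset folder for id {dataset_id!r} under {shared_root}"
        )
    if not mutable:
        # Read-only use: hand back the single shared copy as-is.
        return source
    # Mutable use: copy to a scratch folder so the shared copy is never modified.
    target = Path(tempfile.mkdtemp(prefix=f"{dataset_id}_")) / dataset_id
    shutil.copytree(source, target)
    return target
```

The non-mutable branch is the important one: training code gets a normal folder path and nothing is copied.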
This get_local_copy() method is only useful for applications whose datasets are smaller than about 10 GB and where the training machine is the same as the dev machine. For most of us (researchers) that's not the case: we share GPU time, and this is where ClearML comes in.
Requirements: There should be only a single copy of the large dataset, preserving the original folder structure and presumed to be available remotely, and non-mutable access should be provided via dataset_id. This solves everything or atleas...
shared "warm" folder without having to download the dataset locally.
Let's say this small dataset has an ID and I can use the get_local_copy() method to cache it locally, and then I can use the remote servers to train on it. But I would like to have the same flow without downloading the full dataset, which is stored remotely.
Thanks. Let me try it and get back to you.
and this path should follow a Linux folder structure, not a single file like the current .zip.
Thank you for clarifying the parent-child thing. When I say accessing, I mean I want to use the data for training (without actually getting a local copy of it). The whole dataset (both large and small) could be created and uploaded by an admin. As a researcher, I normally work with a smaller dataset, similar to what SucculentBeetle7 described. You should also note that the whole training happens on a remote server. So this situation applies: https://clear.ml/docs/latest/docs/getting_started/...
TimelyPenguin76 Could you please clarify the process? I cannot find this in the docs. How do I create a parent-child Dataset with the same dataset_id and only access the child?
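For context, the way I understand the SDK, a child dataset is created by passing `parent_datasets` to `Dataset.create`, roughly as in the sketch below. The function name and the dataset and project names are placeholders. As far as I can tell the child gets its own ID rather than sharing the parent's, and a local copy of the child also includes the files inherited from the parent, so this alone would not avoid downloading the large parent.

```python
def create_child_dataset(parent_id: str, new_files_folder: str) -> str:
    """Create a dataset that inherits from parent_id and adds extra files."""
    # Imported inside the function so this module loads even where
    # clearml is not installed.
    from clearml import Dataset

    child = Dataset.create(
        dataset_name="small_subset",     # placeholder name
        dataset_project="shared_data",   # placeholder project
        parent_datasets=[parent_id],     # inherit the parent's content
    )
    # Only the files added here are stored with the child version.
    child.add_files(path=new_files_folder)
    child.upload()
    child.finalize()
    return child.id
```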
Feature request for this: https://clearml.slack.com/archives/CTK20V944/p1629407988075800?thread_ts=1629373886.064600&cid=CTK20V944
Yes, but for the dataset located on the server, so that I can parse it like a normal local copy.
Seems to work, thanks. But it's not as handy as the .get_local_copy() method. I will try to raise a feature request, since this again returns a .zip path. I would like to receive a local path that is easily parsable, like the method described above.
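In the meantime, one workaround is to unpack the returned archive once and hand the resulting folder to the training code. A minimal sketch using only the standard library; `extract_to_folder` is a made-up helper name, and it assumes the returned path really is a regular .zip file.

```python
import zipfile
from pathlib import Path


def extract_to_folder(zip_path: str) -> Path:
    """Unpack zip_path next to the archive and return the folder path."""
    archive = Path(zip_path)
    target = archive.with_suffix("")  # e.g. /cache/data.zip -> /cache/data
    # Skip extraction if the folder is already there from an earlier run.
    if not target.is_dir():
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target)
    return target
```

The returned folder keeps the directory layout stored in the archive, so it can be walked like any other local dataset folder.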