
Yes, it's a single file from the local machine to a remote one.
Oh okay, in that case I can use the output_url param in the upload files method!
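(For context, a minimal sketch of that flow, assuming "upload files method" refers to Dataset.upload() and its output_url parameter; the dataset name, project, file path, and bucket below are placeholders:)

```python
from clearml import Dataset

# Create a dataset version and register the single local file
ds = Dataset.create(dataset_name="my-dataset", dataset_project="examples")
ds.add_files(path="/path/to/local/file.csv")

# output_url points the uploaded content at your own object storage
# instead of the default ClearML file server
ds.upload(output_url="s3://my-bucket/datasets")
ds.finalize()
```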
CostlyOstrich36 could you explain the workflow with the MinIO server setup you just described? Should the data stored in MinIO be treated as a local folder, and then maybe I can use this method? https://clear.ml/docs/latest/docs/clearml_data/clearml_data_sdk#syncing-local-storage
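(A sketch of how that workflow might look, assuming MinIO is reachable as S3-compatible storage configured in clearml.conf; the endpoint, bucket, and folder paths are placeholders:)

```python
from clearml import Dataset

# MinIO is addressed like S3; the endpoint and credentials are assumed
# to be set in clearml.conf under sdk.aws.s3.credentials
ds = Dataset.create(dataset_name="my-dataset", dataset_project="examples")

# Treat a local working folder as the source of truth and mirror it
# into the dataset version (the "syncing local storage" method linked above)
ds.sync_folder(local_path="/data/working_copy")

# Store the dataset contents on the MinIO bucket instead of the file server
ds.upload(output_url="s3://my-minio:9000/datasets-bucket")
ds.finalize()
```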
CostlyOstrich36 Thanks for the comment. Is there an issue already open that tracks the status?
There should be a method called read_remote_copy(str: dataset_id, str: dataset_tag, bool: mutable)
and it should return the path of the remote data.
Feature request for this: https://clearml.slack.com/archives/CTK20V944/p1629407988075800?thread_ts=1629373886.064600&cid=CTK20V944
and this path should follow a Linux folder structure, not be a single file like the current .zip.
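(To make the request concrete, a hypothetical usage sketch; read_remote_copy() does not exist in the ClearML SDK, and the IDs/tags below are placeholders:)

```python
from clearml import Dataset

# Proposed API (hypothetical): return a remote path to the dataset's
# original folder structure, without downloading or zipping it
remote_path = Dataset.read_remote_copy(
    dataset_id="abc123",    # placeholder dataset ID
    dataset_tag="v1.0",     # placeholder tag
    mutable=False,          # non-mutable, read-only access
)

# remote_path would behave like a normal Linux directory tree,
# not a single .zip archive as get_local_copy() returns today
```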
Thanks. Let me try it and get back to you.
TimelyPenguin76 Could you please give more clarification about the process? I cannot find this in the docs. How do I create a parent-child Dataset with the same dataset_id and only access the child?
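(For reference, a sketch of how parent-child dataset versions are typically created with the SDK, assuming the parent_datasets argument of Dataset.create(); note that each version gets its own dataset_id rather than sharing one. Names and IDs are placeholders:)

```python
from clearml import Dataset

# Child version inheriting all files from an existing parent dataset
child = Dataset.create(
    dataset_name="my-dataset",
    dataset_project="examples",
    parent_datasets=["parent_dataset_id"],  # placeholder parent ID
)
child.add_files(path="/data/new_samples")  # only the delta is stored
child.upload()
child.finalize()

# Later, fetch just the child version by its own ID
ds = Dataset.get(dataset_id=child.id)
```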
Thank you for clarifying the parent-child thing. When I say accessing, I mean I want to use the data for training (without actually getting a local copy of it). The whole dataset (both large and small) could be created and uploaded by an admin. As a researcher, I normally work with a smaller dataset, similar to what SucculentBeetle7 has stated. You should also note that this whole training happens on a remote server. So this situation applies: https://clear.ml/docs/latest/docs/getting_started/...
This get_local_copy()
method is only useful for applications whose datasets are in the range of < 10 GB and where the training machine is the same as the dev machine. For most of us (researchers) that's not the case; we share GPU time, and this is where ClearML comes in.
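(For reference, the flow being discussed, roughly; the dataset ID is a placeholder:)

```python
from clearml import Dataset

# Downloads and extracts the full dataset archive to a local cache folder;
# fine for small datasets on a dev machine, costly for large remote ones
ds = Dataset.get(dataset_id="dataset_id_here")
local_path = ds.get_local_copy()  # read-only cached copy
print(local_path)
```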
Requirements: the large dataset should exist as only a single copy, preserving the original folder structure, and is presumed to be available remotely; non-mutable access should be provided via dataset_id. This solves everything, or at leas...
Yes, a structure similar to a shared folder should be the optimal solution. But I don't understand what you mean by "warm"!
shared "warm" folder without having to download the dataset locally.
Let's say that this small dataset has an ID and I can use the get_local_copy()
method to cache it locally, and then I can use the remote servers to train on it. But I would like to have the same flow without downloading the full dataset, which is stored remotely.
So for this, should I create a proper issue on GitHub? Or is this being picked up internally, AgitatedDove14?
Anyone who is using a small dataset can afford to go with the get_local_copy() method.
Because this would again cause the problems I asked about yesterday. Are there any ways to access the parent dataset (assuming it's large and I don't want to download it) without using get_local_copy(),
as that would solve a lot of problems? If so, where can I find them in the docs?
Seems to work, thanks. But it's not as handy as the .get_local_copy() method. I will try to raise a feature request, since this again returns a .zip path. I would like to receive a local path which is easily parsable, like the method described above.
Yes, but for the dataset located on the server, so that I can parse it like a normal local copy.