Is there any way to get dataset size without downloading state.json?
I'm doing ds = clearml.Dataset.get(dataset_id=d_id), but it immediately tries to download state.json, which is stored on S3. I'm only interested in the size and file count, which I then get by calling ds.get_metadata("state").

"state" comes from the Task, so another workaround would be to get the Task ID directly from the dataset ID.

I don't want to download state.json because:

  • It's 500+ MB
  • I would need S3 credentials, which I don't want to store on the server
Posted 6 months ago
Hi @AmiableSeaturtle81! You could get the Dataset Struct configuration object and read job_size from it, which is the dataset size in bytes. By the way, a dataset's task ID is the same as the dataset's ID, so you can call any of the ClearML task-related functions on the task you get from Task.get_task("dataset_id").
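A rough sketch of that suggestion, in case it helps. It assumes the "Dataset Struct" configuration object is stored as JSON with one entry per dataset version, each carrying job_id and job_size fields; that layout is an assumption, so check it against your own task in the UI before relying on it. The helper names dataset_size_from_struct and fetch_dataset_size are made up for this example. Only the task's configuration is read from the server, so state.json should not be fetched.

```python
import json


def dataset_size_from_struct(struct_text, dataset_id):
    """Return job_size (bytes) for the entry matching dataset_id, or None.

    Assumes (not confirmed in this thread) that the struct is JSON holding
    entries with "job_id" and "job_size" keys, either as a dict of entries
    or as a list of entries.
    """
    struct = json.loads(struct_text)
    entries = struct.values() if isinstance(struct, dict) else struct
    for entry in entries:
        if entry.get("job_id") == dataset_id and entry.get("job_size") is not None:
            # The value may be stored as a string, so normalise to int.
            return int(entry["job_size"])
    return None


def fetch_dataset_size(dataset_id):
    """Read the size through the dataset's task instead of Dataset.get()."""
    # Imported lazily so the parsing helper above works without clearml installed.
    from clearml import Task

    # The dataset ID doubles as the task ID.
    task = Task.get_task(task_id=dataset_id)
    struct_text = task.get_configuration_object("Dataset Struct")
    if not struct_text:
        return None
    return dataset_size_from_struct(struct_text, dataset_id)
```

Calling fetch_dataset_size(d_id) would then return the size in bytes, or None if the configuration object is missing or has a different shape than assumed here.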

Posted 6 months ago