
Hello everyone, while calling get_local_copy of the dataset from the fileserver, I get the path to the local copy, but the files are not downloaded and the folder is empty. Tell me what could be the problem. I don't get any additional errors or warnings. Everything looks fine, but the files do not appear at the specified path. The same problem occurs when using the get_mutable_local_copy function.

from clearml import Dataset

# Fetch the dataset by name and project.
dataset = Dataset.get(
    dataset_name="test OCR dataset",
    dataset_project="Text Recognition"
)

# Should download the files (or fetch them from cache) and return the local path.
print(dataset.get_local_copy())

Posted one year ago

7 Answers


@<1578193574506270720:profile|DashingAlligator28> I removed the nginx limits.

  
  
Posted one year ago

Hi @<1524560082761682944:profile|MammothParrot39>, did you make sure to finalize the dataset you're trying to access?
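
For reference, a dataset is only expected to be retrievable once it has been uploaded and finalized. A minimal sketch of the producing side (the source folder path here is a hypothetical placeholder):

from clearml import Dataset

dataset = Dataset.create(
    dataset_name="test OCR dataset",
    dataset_project="Text Recognition"
)
dataset.add_files(path="data/images/")  # hypothetical source folder
dataset.upload()    # compress the files and push them to the fileserver
dataset.finalize()  # mark this dataset version as complete and immutable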

  
  
Posted one year ago

@<1523701435869433856:profile|SmugDolphin23>
I re-checked with single files and with newly created datasets, and everything works properly. When I tried to create a dataset using the original data, I got the following logs. Could you suggest what might be causing this?
Uploading dataset changes (1497 files compressed to 9.07 MiB) to None
2023-05-12 08:46:03,114 - clearml.storage - ERROR - Exception encountered while uploading Failed uploading object /addudkin2/.datasets/test-addudkin/test-addudkin.19ab55776fed408cab214814543699de/artifacts/data/dataset.19ab55776fed408cab214814543699de.mr7nkbq8.zip (413): <html>
<head><title>413 Request Entity Too Large</title></head>
<body>
<center><h1>413 Request Entity Too Large</h1></center>
<hr><center>nginx</center>
</body>
</html>

WARNING:root:Failed uploading artifact 'data'. Retrying... (1/3)
2023-05-12 08:46:03,602 - clearml.storage - ERROR - Exception encountered while uploading Failed uploading object /addudkin2/.datasets/test-addudkin/test-addudkin.19ab55776fed408cab214814543699de/artifacts/data/dataset.19ab55776fed408cab214814543699de.mr7nkbq8.zip (413): <html>
<head><title>413 Request Entity Too Large</title></head>
<body>
<center><h1>413 Request Entity Too Large</h1></center>
<hr><center>nginx</center>
</body>
</html>

WARNING:root:Failed uploading artifact 'data'. Retrying... (2/3)
2023-05-12 08:46:03,920 - clearml.storage - ERROR - Exception encountered while uploading Failed uploading object /addudkin2/.datasets/test-addudkin/test-addudkin.19ab55776fed408cab214814543699de/artifacts/data/dataset.19ab55776fed408cab214814543699de.mr7nkbq8.zip (413): <html>
<head><title>413 Request Entity Too Large</title></head>
<body>
<center><h1>413 Request Entity Too Large</h1></center>
<hr><center>nginx</center>
</body>
</html>

WARNING:root:Failed uploading artifact 'data'. Retrying... (3/3)
2023-05-12 08:46:04,392 - clearml.storage - ERROR - Exception encountered while uploading Failed uploading object /addudkin2/.datasets/test-addudkin/test-addudkin.19ab55776fed408cab214814543699de/artifacts/data/dataset.19ab55776fed408cab214814543699de.mr7nkbq8.zip (413): <html>
<head><title>413 Request Entity Too Large</title></head>
<body>
<center><h1>413 Request Entity Too Large</h1></center>
<hr><center>nginx</center>
</body>
</html>

File compression and upload completed: total size 9.07 MiB, 1 chunk(s) stored (average size 9.07 MiB)
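
Note that although the last line reports completion, all three retries failed with 413, so the data artifact never actually reached the fileserver; that would explain why get_local_copy later returns an empty folder. The 413 is nginx rejecting the request body as too large. Besides raising the nginx limit (the fix reported elsewhere in this thread), a possible workaround sketch is uploading in smaller chunks; the chunk_size parameter (in MB) is an assumption about the installed clearml version, so check the SDK reference before relying on it:

# Sketch: split the compressed dataset into smaller artifacts so each
# HTTP request stays under the proxy's body-size limit.
# chunk_size support is assumed for your clearml version.
dataset.upload(chunk_size=5)  # ~5 MB chunks instead of one 9 MiB archive
dataset.finalize()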

  
  
Posted one year ago

Hi @<1524560082761682944:profile|MammothParrot39> ! A few thoughts:
You likely know this, but the files may be downloaded to something like /home/user/.clearml/cache/storage_manager/datasets/ds_e0833955ded140a69b4c9c9d8e84986c. Since .clearml is a hidden directory, you may not be able to see it in a file explorer.

If that is not the issue: are you able to download some other dataset, such as our UrbanSounds example? I'm wondering whether the problem only happens for your specific dataset.
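
One quick way to rule out the hidden-directory explanation is to list the returned path from Python rather than a file explorer, along these lines:

import os

from clearml import Dataset

dataset = Dataset.get(
    dataset_name="test OCR dataset",
    dataset_project="Text Recognition"
)
local_path = dataset.get_local_copy()

# Show the cache path and whatever files actually landed there.
print(local_path)
print(os.listdir(local_path))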

  
  
Posted one year ago

How did you solve this problem? I'm running into it too.

  
  
Posted one year ago

Problem solved. I removed the nginx limits.
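
For anyone hitting the same 413: nginx caps request bodies at 1 MiB by default via the client_max_body_size directive. A minimal sketch of the change; where this file lives depends on how the ClearML server is deployed, so treat the path as an assumption:

# e.g. /etc/nginx/conf.d/clearml.conf (location is deployment-specific)
server {
    # Allow large dataset uploads; 0 disables the body-size check entirely.
    client_max_body_size 0;
}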

  
  
Posted one year ago

@<1523701070390366208:profile|CostlyOstrich36> Yes, sure

import os

import pandas as pd
import yaml
from omegaconf import OmegaConf
from clearml import Dataset

config_path = 'configs/structured_docs.yml'

# Load the YAML config and wrap it in an OmegaConf object for dot access.
with open(config_path) as f:
    config = yaml.full_load(f)

config = OmegaConf.create(config)
path2images = config.data.images_folder


def get_data(config, split):
    # Read the annotation CSV for the given split ('train' or 'val').
    path2annotation = os.path.join(config.data.annotation_folder, f"sample_{split}.csv")
    data = pd.read_csv(path2annotation)
    return data


data_train = get_data(config, 'train')
data_val = get_data(config, 'val')
data = pd.concat([data_val, data_train])

# Build full image paths from the 'filename' column of the annotations.
files = [os.path.join(path2images, file) for file in data['filename'].values]

dataset = Dataset.create(
    dataset_name="test OCR dataset",
    dataset_project="Text Recognition"
)
# Register every image with the dataset, then upload and finalize it.
for file in files:
    dataset.add_files(path=file)

dataset.upload()
dataset.finalize()

With this script, I uploaded data to the server. You can also see the final status of the dataset in the screenshot.
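
A side note on the script above: add_files() also accepts a directory, so the per-file loop can be collapsed into a single call when the whole folder belongs in the dataset; a sketch under that assumption:

# Register the whole images folder at once instead of looping per file.
# Only appropriate if everything under path2images should be included.
dataset.add_files(path=path2images)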

  
  
Posted one year ago