
Hello everyone! When I call get_local_copy() on a dataset stored on the fileserver, I get the path to the local copy, but the files are not downloaded and the folder is empty. What could be the problem? I don't get any additional errors or warnings; everything looks fine, but no files appear at the specified path. I have the same problem with get_mutable_local_copy().

from clearml import Dataset

# Fetch the dataset from the ClearML server
dataset = Dataset.get(
    dataset_name="test OCR dataset",
    dataset_project="Text Recognition"
)

# Returns a path to a (cached) local copy of the dataset files
print(dataset.get_local_copy())


  
  
Posted 11 months ago

Answers 7


@<1523701070390366208:profile|CostlyOstrich36> Yes, sure

import os

import pandas as pd
import yaml
from clearml import Dataset
from omegaconf import OmegaConf

config_path = 'configs/structured_docs.yml'

# Load the YAML config and wrap it for attribute-style access
with open(config_path) as f:
    config = yaml.full_load(f)

config = OmegaConf.create(config)
path2images = config.data.images_folder


def get_data(config, split):
    # Read the annotation CSV for the given split ('train' / 'val')
    path2annotation = os.path.join(config.data.annotation_folder, f"sample_{split}.csv")
    return pd.read_csv(path2annotation)


data_train = get_data(config, 'train')
data_val = get_data(config, 'val')
data = pd.concat([data_val, data_train])

# Resolve every annotated filename to a full image path
files = [os.path.join(path2images, file) for file in data['filename'].values]

dataset = Dataset.create(
    dataset_name="test OCR dataset",
    dataset_project="Text Recognition"
)

# Register each image with the dataset, then upload and finalize it
for file in files:
    dataset.add_files(path=file)

dataset.upload()
dataset.finalize()

I uploaded the data to the server with this script. The final status of the dataset can be seen in the screenshot.
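
For reference, a minimal variation (an untested sketch): add_files() also accepts a directory, so a whole folder can be registered in a single call instead of looping over 1497 files. The original script deliberately selects a subset of files from the CSV, so this only applies if the folder contains exactly the files to upload; the wildcard argument below is illustrative.

dataset = Dataset.create(
    dataset_name="test OCR dataset",
    dataset_project="Text Recognition"
)
# Register the whole images folder at once (wildcard filtering is optional)
dataset.add_files(path=path2images, wildcard="*.png")
dataset.upload()
dataset.finalize()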

  
  
Posted 11 months ago

@<1523701435869433856:profile|SmugDolphin23>
I rechecked with single files and newly created datasets, and everything works properly. Then I tried to create a dataset from the original data and got the following logs. Could you suggest what might be causing this?
Uploading dataset changes (1497 files compressed to 9.07 MiB) to None
2023-05-12 08:46:03,114 - clearml.storage - ERROR - Exception encountered while uploading Failed uploading object /addudkin2/.datasets/test-addudkin/test-addudkin.19ab55776fed408cab214814543699de/artifacts/data/dataset.19ab55776fed408cab214814543699de.mr7nkbq8.zip (413): <html>
<head><title>413 Request Entity Too Large</title></head>
<body>
<center><h1>413 Request Entity Too Large</h1></center>
<hr><center>nginx</center>
</body>
</html>

WARNING:root:Failed uploading artifact 'data'. Retrying... (1/3)
2023-05-12 08:46:03,602 - clearml.storage - ERROR - Exception encountered while uploading Failed uploading object /addudkin2/.datasets/test-addudkin/test-addudkin.19ab55776fed408cab214814543699de/artifacts/data/dataset.19ab55776fed408cab214814543699de.mr7nkbq8.zip (413): <html>
<head><title>413 Request Entity Too Large</title></head>
<body>
<center><h1>413 Request Entity Too Large</h1></center>
<hr><center>nginx</center>
</body>
</html>

WARNING:root:Failed uploading artifact 'data'. Retrying... (2/3)
2023-05-12 08:46:03,920 - clearml.storage - ERROR - Exception encountered while uploading Failed uploading object /addudkin2/.datasets/test-addudkin/test-addudkin.19ab55776fed408cab214814543699de/artifacts/data/dataset.19ab55776fed408cab214814543699de.mr7nkbq8.zip (413): <html>
<head><title>413 Request Entity Too Large</title></head>
<body>
<center><h1>413 Request Entity Too Large</h1></center>
<hr><center>nginx</center>
</body>
</html>

WARNING:root:Failed uploading artifact 'data'. Retrying... (3/3)
2023-05-12 08:46:04,392 - clearml.storage - ERROR - Exception encountered while uploading Failed uploading object /addudkin2/.datasets/test-addudkin/test-addudkin.19ab55776fed408cab214814543699de/artifacts/data/dataset.19ab55776fed408cab214814543699de.mr7nkbq8.zip (413): <html>
<head><title>413 Request Entity Too Large</title></head>
<body>
<center><h1>413 Request Entity Too Large</h1></center>
<hr><center>nginx</center>
</body>
</html>

File compression and upload completed: total size 9.07 MiB, 1 chunk(s) stored (average size 9.07 MiB)
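
The 413 responses come from the nginx reverse proxy in front of the fileserver rejecting the upload, not from ClearML itself; note that after the three retries fail, the SDK still prints "upload completed", which would explain a dataset that later yields an empty local copy. Besides raising the proxy limit (see further down the thread), one possible workaround, assuming an SDK version where upload() accepts a chunk_size argument (in MB), is to split the compressed archive into chunks small enough to pass the proxy:

# Sketch: upload the dataset in ~5 MB chunks so each request stays
# below the reverse proxy's request-body limit (chunk_size is in MB)
dataset.upload(chunk_size=5)
dataset.finalize()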

  
  
Posted 11 months ago

Hi @<1524560082761682944:profile|MammothParrot39> ! A few thoughts:
You likely know this, but the files may be downloaded to something like /home/user/.clearml/cache/storage_manager/datasets/ds_e0833955ded140a69b4c9c9d8e84986c . Since .clearml is a hidden directory, a file explorer may not show it.

If that is not the issue: are you able to download some other dataset, such as our UrbanSounds example? I'm wondering whether the problem only happens for your specific dataset.
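
To quickly rule out the hidden-directory explanation, the returned path can be inspected from Python directly (a sketch using the dataset from the question):

import os

from clearml import Dataset

dataset = Dataset.get(
    dataset_name="test OCR dataset",
    dataset_project="Text Recognition"
)
path = dataset.get_local_copy()
print(path)              # typically under ~/.clearml/cache/storage_manager/datasets/
print(os.listdir(path))  # shows the contents even if a file explorer hides dot-folders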

  
  
Posted 11 months ago

Hi @<1524560082761682944:profile|MammothParrot39> , did you make sure to finalize the dataset you're trying to access?
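
A quick way to check (a sketch; is_final() is part of the Dataset API):

from clearml import Dataset

dataset = Dataset.get(
    dataset_name="test OCR dataset",
    dataset_project="Text Recognition"
)
print(dataset.is_final())  # get_local_copy() expects a finalized dataset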

  
  
Posted 11 months ago

Problem solved. I removed the nginx limits.
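
For anyone hitting the same 413: the relevant setting is most likely nginx's client_max_body_size, which defaults to 1 MiB. A sketch of the change (the exact file and server block depend on your deployment):

# In the nginx config that fronts the ClearML fileserver (assumption:
# your deployment proxies uploads through nginx)
server {
    client_max_body_size 0;  # 0 disables the request-body size check
}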

  
  
Posted 11 months ago

@<1578193574506270720:profile|DashingAlligator28> I removed the nginx limits (see above).

  
  
Posted 10 months ago

How did you solve this problem? I'm hitting it too.

  
  
Posted 10 months ago