Answered
Since V1.4.0, Our

Since v1.4.0, our StorageManager.download_folder(..., local_folder='./') is failing - we've had to revert back to 1.3.2.
I saw the changelist includes a fix to the get_local_copy but I don't see how these are affected. Any ideas? 🤔

  
  
Posted 2 years ago

Answers 30


SweetBadger76 TimelyPenguin76
We're finally tackling this (since it has kept us back at 1.3.2 even though 1.6.2 is out...), and noticed that now the bucket name is also part of the folder?

So following up from David's latest example:
StorageManager.download_folder(remote_url='s3://****-bucket/david/', local_folder='./')
actually creates a new folder ./****-bucket/david/ and puts its contents there.

EDIT: This is with us using internal MinIO, so I believe ClearML parses that endpoint incorrectly.
Since we use s3://some_ip:9000/clearml , ClearML parses it as:

Ah-ha, the bucket name is obviously some_ip:9000!

... which it isn't. Then we get a weird clearml folder.
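
To make the ambiguity described above concrete, here is a minimal sketch using only Python's standard urllib.parse. It is not ClearML's parser; split_s3_url and its host_in_url flag are hypothetical names used purely for illustration. It shows that the same URL yields a different "bucket" depending on whether the first component is read as a host (MinIO style) or as a bucket (AWS style).

```python
from urllib.parse import urlparse


def split_s3_url(url, host_in_url):
    """Split an s3:// URL into (host, bucket, key).

    Hypothetical helper, not part of ClearML.
    host_in_url=True  reads s3://host:port/bucket/key (MinIO style)
    host_in_url=False reads s3://bucket/key           (AWS style)
    """
    parsed = urlparse(url)
    path = parsed.path.lstrip("/")
    if host_in_url:
        # First path component is the bucket, the rest is the object key.
        bucket, _, key = path.partition("/")
        return parsed.netloc, bucket, key
    # No host in the URL: the network location itself is the bucket.
    return None, parsed.netloc, path


url = "s3://some_ip:9000/clearml/my_folder_of_interest"
print(split_s3_url(url, host_in_url=True))   # bucket is "clearml"
print(split_s3_url(url, host_in_url=False))  # bucket is "some_ip:9000"
```

Read the AWS way, the bucket becomes some_ip:9000 and clearml becomes the first folder of the key, which would match the extra clearml folder seen locally.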

  
  
Posted 2 years ago

Adding bucket = clearml in aws.s3.credentials did not help either

  
  
Posted 2 years ago

StorageManager.download_folder(remote_url='s3://some_ip:9000/clearml/my_folder_of_interest', local_folder='./') yields a new folder structure, ./clearml/my_folder_of_interest, rather than just ./my_folder_of_interest

  
  
Posted 2 years ago

If I got you right, clearml is a bucket and my_folder_of_interest is a sub-bucket inside clearml, right?

  
  
Posted 2 years ago

One more thing that may be helpful SweetBadger76 , I've gone ahead and looked into clearml.storage.helper , and found that at least if I specify the bucket name directly in the aws.s3.credentials configuration for MinIO, then:
In [4]: StorageHelper._s3_configurations.get_config_by_uri(' ')
Out[4]: S3BucketConfig(bucket='clearml', host='some_ip:9000', key='xxx', secret='xxx', token='', multipart=False, acl='', secure=False, region='', verify=True, use_credentials_chain=False)
That is, the bucket name is accurately fetched from the URI. This would already offload a lot of the code to simply updating the configuration 🙂

  
  
Posted 2 years ago

Do you think you could send us a bit of code so we can better understand how to reproduce the bug? In particular, how you use dotenv...
So far, something like this works normally with both clearml 1.3.2 and 1.4.0:

`
import os
from clearml import StorageManager, Task

task = Task.init(project_name=project_name, task_name=task_name)

img_path = os.path.normpath("**/Images")
img_path = os.path.join(img_path, "*.png")

print("==> Uploading to Azure")
remote_url = "azure://****.blob.core.windows.net/**/"
StorageManager.upload_file(local_file=img_path, remote_url=os.path.join(remote_url, '.png'))

print("==> Downloading")
local_folder = os.path.normpath('*****/ClearML')
StorageManager.download_folder(remote_url=remote_url, local_folder=local_folder)
`

  
  
Posted 2 years ago

Hi UnevenDolphin73
I am going to try to reproduce this issue, thanks for the details. I'll keep you updated.

  
  
Posted 2 years ago

This is because the server is also treated as a bucket (the root, to be precise). Thus you will always have at least one subfolder created in local_folder, corresponding to the bucket found at the server root.

  
  
Posted 2 years ago

You mean the host is considered the bucket, as I wrote in my earlier message as the root cause?

  
  
Posted 2 years ago

Yes

  
  
Posted 2 years ago

Will try later today TimelyPenguin76 and report back, thanks! Does this revert the behavior to the 1.3.x one?

  
  
Posted 2 years ago

Hi UnevenDolphin73 , the fix is ready, can you try it with the latest rc?

pip install clearml==1.4.2rc0

  
  
Posted 2 years ago

I have not tried to download the entire file, but yes

  
  
Posted 2 years ago

But on the other hand, when you browse your MinIO console, all the buckets are shown as directories, right? There is no file in the root dir. So we used the same logic and decided to reproduce that very same structure. Thus, when you browse local_folder, you will see the same structure as shown in the console.

  
  
Posted 2 years ago

Hi UnevenDolphin73

I have reproduced the error.
Here is the behavior of this line, depending on the version: StorageManager.download_folder('s3://mybucket/my_sub_dir/files', local_folder='./')

1.3.2 downloads the my_sub_dir content directly into ./
1.4.x downloads the my_sub_dir content into ./my_sub_dir/ (so the dotenv module can't find the file)

Please let us know if you still have issues, or if this helps you solve the problem.
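
The two layouts described in this post can be illustrated with a small, pure-Python sketch. This is not ClearML's code; local_path and its keep_structure flag are hypothetical names, and the sketch only models where a single object key would land under each of the two behaviors reported here.

```python
import posixpath


def local_path(local_folder, object_key, remote_prefix, keep_structure):
    """Return where object_key would land under local_folder.

    Hypothetical model of the two behaviors reported in this thread:
    keep_structure=False strips the requested prefix (1.3.2-like),
    keep_structure=True keeps the full key path (1.4.x-like).
    """
    prefix = remote_prefix.strip("/") + "/"
    if not keep_structure and object_key.startswith(prefix):
        relative = object_key[len(prefix):]
    else:
        relative = object_key
    return posixpath.normpath(posixpath.join(local_folder, relative))


# Same object, same request, two different local destinations.
print(local_path("./", "my_sub_dir/.env", "my_sub_dir/", keep_structure=False))
print(local_path("./", "my_sub_dir/.env", "my_sub_dir/", keep_structure=True))
```

Under the first layout the file sits directly in the local folder, where a default dotenv lookup in the working directory would find it; under the second it sits one level down.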

  
  
Posted 2 years ago

This means that the function will create a directory structure at local_folder that mirrors MinIO's. That is, it will create directories corresponding to the buckets there, hence your clearml directory, which is the bucket the function found at the server root.

  
  
Posted 2 years ago

Sounds like incorrect parsing on ClearML side then, doesn't it? At least, it does not fully support MinIO then

I don't imagine AWS users get a new folder named aws-key-region-xyz-bucket-hostname when they download_folder(...) from an AWS S3 bucket, or do they? 🤔

  
  
Posted 2 years ago

Hi UnevenDolphin73
Let me summarize, to be sure I got it 🙂

I have a MinIO server somewhere, like some_ip on port 9000, that contains a clearml bucket.
If I do StorageManager.download_folder(remote_url='s3://some_ip:9000/clearml', local_folder='./', overwrite=True)
then I'll have a clearml bucket directory created in ./ (local_folder), which will contain the bucket files.

  
  
Posted 2 years ago

Still the same issue it seems (or maybe the behaviour is the new behaviour?)

  
  
Posted 2 years ago

That's weird -- the concept of "root directory" is defined to a bucket. There is no "root dir" in S3, is there? It's only within a bucket itself.
And since the documentation states:

If we have a remote file

then StorageManager.download_folder(‘

’, ‘~/folder/’) will create ~/folder/sub/file.ext

Then I would have expected the same outcome from MinIO as I do with S3, or Azure, or any other blob container

  
  
Posted 2 years ago

The fact that the MinIO server is called "bucket" in the doc (

) is certainly confusing. I will check the reason for this choice, and also why we don't start building the structure from the bucket (the real one

)
I'll keep you updated.

  
  
Posted 2 years ago

The bucket is not a folder, it's just a container. Whether it's implemented as a folder in MinIO should be transparent, shouldn't it?

Since the "fix" in 1.4.0 onwards, we now have to download the folder, and then move all the downloaded files/folders to the correct level.
This now entails we also have to check which storage is used, so we can check if the downloaded folder will contain the bucket name or not, which seems very inconsistent?
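
For reference, the "move everything to the correct level" workaround mentioned above might look roughly like the following. This is a sketch using only the standard library; hoist_contents is a hypothetical helper name, and the nested path (for example the bucket name plus folder) is something the caller has to work out per storage backend, which is exactly the inconsistency being described.

```python
import shutil
from pathlib import Path


def hoist_contents(local_folder, nested_subpath):
    """Move everything under local_folder/nested_subpath up into local_folder.

    Hypothetical helper: raises instead of overwriting on a name clash,
    then removes the intermediate directories that were left empty.
    """
    root = Path(local_folder)
    nested = root / nested_subpath
    # Materialize the listing first, since we modify the directory while looping.
    for item in list(nested.iterdir()):
        target = root / item.name
        if target.exists():
            raise FileExistsError(f"{target} already exists")
        shutil.move(str(item), str(target))
    # Walk back up towards local_folder, deleting directories that are now empty.
    current = nested
    while current != root:
        try:
            current.rmdir()
        except OSError:
            break  # directory not empty (or not removable): stop cleaning up
        current = current.parent
```

With the MinIO example from this thread, the call would be something like hoist_contents('./', 'clearml/my_folder_of_interest') right after the download.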

  
  
Posted 2 years ago

Okay so the new functionality is maintained; thanks!

  
  
Posted 2 years ago

Thanks David! I appreciate that, it would be very nice to have a consistent pattern in this!

  
  
Posted 2 years ago

Interesting. We are opening a discussion to weigh the pros and cons of those different approaches. I'll of course keep you updated.
Could you please open a GitHub issue about that topic? 🙏
http://github.com/allegroai/clearml/issues

  
  
Posted 2 years ago

It could be related to the ClearML agent or server, then. We temporarily upload a given .env file to an internal S3 bucket (cache), then switch to remote execution. When the remote execution starts, it first looks for this .env file, downloads it using StorageManager, loads it with dotenv, and then continues the execution normally.

  
  
Posted 2 years ago

Hi UnevenDolphin73
The difference between v1.3.2 and v1.4.x (regarding download_folder) is that in 1.4.x the subfolder structure is maintained, so the .env file will not be downloaded directly into the provided local folder (here "./") unless it sits in the bucket's root folder. The function reproduces the subdirectory structure of the bucket. So you will need to pass load_dotenv() the full path to the .env file, including the filename.

For example, if I do:
StorageManager.download_folder(remote_url='s3://****-bucket/david/', local_folder='./')
then I will have to invoke load_dotenv this way:
dotenv.load_dotenv('./david/.env')
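
If the exact subfolder is not known in advance (for example because it depends on the storage backend or the clearml version), one option is to search the download folder for the file instead of hard-coding its path. The following is a minimal standard-library sketch; find_env_file is a hypothetical helper, not part of clearml or python-dotenv.

```python
from pathlib import Path


def find_env_file(local_folder, name=".env"):
    """Return the shallowest file called `name` under local_folder, or None.

    Hypothetical helper: searches recursively, so it works whether the
    file landed directly in local_folder or in a nested subfolder.
    """
    matches = [p for p in Path(local_folder).rglob(name) if p.is_file()]
    # Prefer the match closest to local_folder if there are several.
    matches.sort(key=lambda p: len(p.parts))
    return matches[0] if matches else None
```

The returned path, if any, can then be passed to dotenv.load_dotenv(...) as in the example above.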

  
  
Posted 2 years ago

Could also be that the use of ./ is the issue? I'm not sure what else I can provide you with, SweetBadger76

  
  
Posted 2 years ago