Hi all,
I am having trouble using the StorageManager to upload files to a GCP bucket. Do I need to pass the bucket path as output_uri when calling Task.init? Without that, it gives me this error message:
NoneType has not attribute upload()

I tried to look for source code, but this method is called by CacheManager.

Now I have added the bucket path as output_uri, but the clearml.conf on my local machine does not have credentials for the GCP bucket. How can I add them?

GCP access is configured on all agents in the cluster. How can I run this task remotely? Even when I try, it gives me this error:
ValueError: Could not get access credentials for 'gs://<bucket-name>/' , check configuration file ~/clearml.conf

Posted 3 years ago
Try passing the full path to the file as local_file

Posted 3 years ago

I copied the Google Storage section of clearml.conf and the creds.json from the agent to my local machine, and now it gets stuck here when I run:

Posted 3 years ago

What about output_uri?

If you are using StorageManager directly, output_uri is not relevant

Posted 3 years ago

Now I am passing it the same way you have mentioned, but my code still gets stuck as in above screenshot.

The screenshot shows a warning from pyplot (matplotlib), not ClearML, or am I missing something?

My guess is that it can't resolve credentials. It also does not give me any pop-up to log in

If it fails, you will get an error; there will never be a popup from code 🙂

... We need a more permanent place to store data

FYI, you can store the "Dataset" itself on GS instead of the default clearml file server (basically, pass it as output_uri to the Dataset when creating it).

Posted 3 years ago

Actually, the values of separated_file_posix_path and separated_file are the same

Posted 3 years ago

Notice: dataset_rgb.list_files() will list the content of the dataset, not the local files:
e.g.: /folder/myfile.ext and not /home/user/cache/folder/myfile.ext
So basically I think you are just not passing actual files. You should probably do:
for local_file in Path(folder_rgb).rglob('*'): ...
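
If only some of the dataset entries are needed, one option is to join each entry onto the local copy folder instead of globbing. This is a minimal sketch, assuming list_files() returns paths relative to the dataset root and get_local_copy() returned the local folder. The helper name resolve_local_paths is hypothetical and not part of ClearML.

```python
from pathlib import Path
from typing import Iterable, List


def resolve_local_paths(local_root: str, relative_entries: Iterable[str]) -> List[str]:
    """Join dataset-relative entries onto the local copy folder.

    Hypothetical helper (not a ClearML API). Entries that do not exist
    on disk are skipped rather than raising.
    """
    root = Path(local_root)
    resolved = []
    for entry in relative_entries:
        # Strip a leading slash so the join stays inside local_root.
        candidate = root / entry.lstrip("/")
        if candidate.is_file():
            resolved.append(str(candidate))
    return resolved
```

With the names from this thread, the call would look like resolve_local_paths(folder_rgb, dataset_rgb.list_files()), and each returned string can then be used as local_file.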

Posted 3 years ago

I need files from different folders, so I cannot upload the whole folder

Posted 3 years ago

Just added. What about output_uri?

Posted 3 years ago

Yes, I was passing that. Now I am passing it the same way you have mentioned, but my code still gets stuck as in above screenshot. My guess is that it can't resolve credentials. It also does not give me any pop-up to log in

Posted 3 years ago

So you are uploading a local file (stored in a Dataset) into a GS bucket? May I ask why?
Regarding usage (I might have a typo, but this is the gist):
StorageManager.upload_file(local_file=separated_file_posix_path, remote_url=remote_file_path + separated_file_posix_path.relative_to(files_rgb))
Notice that you need to provide the full upload URL (including the path and file name to be used on your GS storage)
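
A possible way to build that full upload URL is sketched below, assuming the local file sits under a known root folder (for example, the folder returned by get_local_copy()). The helper name build_remote_url and the example paths are hypothetical, not ClearML APIs.

```python
from pathlib import Path


def build_remote_url(remote_prefix: str, local_file: str, local_root: str) -> str:
    """Return remote_prefix plus the file's path relative to local_root.

    Hypothetical helper (not a ClearML API).
    """
    relative = Path(local_file).relative_to(local_root)
    # as_posix() keeps forward slashes in the resulting URL.
    return remote_prefix.rstrip("/") + "/" + relative.as_posix()


# Example with made-up paths: the sub-folder and file name are preserved.
print(build_remote_url("gs://bucket/folder/", "/data/ds/camera/a.png", "/data/ds"))
```

The result is a plain str, so it can be passed as remote_url to StorageManager.upload_file together with str(local_file).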

Posted 3 years ago

Hi StraightDog31

I am having trouble using the StorageManager to upload files to GCP bucket

Are you using the StorageManager directly, or are you using task.upload_artifact?
Did you provide the GS credentials in the clearml.conf file, see example here:

Posted 3 years ago

Yes, my bad. It was a VPN issue and I fixed it. Now I have one last error:
[Errno 2] No such file or directory: '<filename.extension>'

Posted 3 years ago

For separated_file_posix_path

Posted 3 years ago

remote_url = 'gs://<bucket>/6141af10b81a76709db962fd/6141b0ee5f627d0020819489/6147c3db1f0d7f48ddf8e952/camera/6141b0ee5f627d0020819489_14.png'
local_file = '6141b0ee5f627d0020819489_14.png'

Posted 3 years ago

Can you share your StorageManager usage and the error you are getting?

Posted 3 years ago

from clearml import Task, Dataset, StorageManager

dataset_rgb = Dataset.get(dataset_id=args['dataset_id_rgb'])
folder_rgb = dataset_rgb.get_local_copy()
files_rgb = dataset_rgb.list_files()
for separated_file in files_rgb:
    separated_file_posix_path = PosixPath(separated_file)
    remote_file_path = f"gs://<bucket-name>/<folder-name>"
    StorageManager.upload_file(local_file=separated_file_posix_path, remote_url=remote_file_path)

Posted 3 years ago

Notice that both need to be str
BTW, if you need the entire folder, just use StorageManager.upload_folder

Posted 3 years ago

We are uploading it to Google Bucket because this is the last step of processing after image acquisition and before ML modelling. We need a more permanent place to store data

Posted 3 years ago

Exception has occurred: ValueError Upload failed

# Upload separated files
for separated_file in Path(dataset_folder).rglob(f'*{batch_id}*'):
    remote_file_path = f"gs://<bucket>/{hatchery_id}/{batch_id}/{single_egg_object_id}/{folder_type}/{str(PurePath(separated_file)).split('/')[-1]}"
    StorageManager.upload_file(local_file=separated_file, remote_url=remote_file_path)

Posted 3 years ago

Can you print the actual values you are passing (i.e. local_file and remote_url)?
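
A small check along these lines can make the printed values easier to interpret. It is only a sketch, assuming (as discussed in this thread) that both arguments should be str and that local_file must point to an existing file. The helper name check_upload_args is hypothetical and not part of ClearML.

```python
import os


def check_upload_args(local_file, remote_url):
    """Return a list of problems found with the two arguments (empty if none).

    Hypothetical helper (not a ClearML API).
    """
    problems = []
    if not isinstance(local_file, str):
        problems.append(f"local_file is {type(local_file).__name__}, expected str")
    if not isinstance(remote_url, str):
        problems.append(f"remote_url is {type(remote_url).__name__}, expected str")
    # A bare file name is resolved against the current working directory,
    # so it fails this check unless the script runs from the file's folder.
    if not os.path.isfile(str(local_file)):
        problems.append(f"local_file does not exist: {local_file}")
    return problems
```

Printing the returned list right before the upload call shows whether a value has the wrong type or whether the local path does not resolve to a file.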

Posted 3 years ago