Notice: dataset_rgb.list_files()
will list the content of the dataset, not the local files:
e.g.: /folder/myfile.ext
and not /home/user/cache/folder/myfile.ext
So basically, I think you are just not passing actual files. You should probably do:
for local_file in Path(folder_rgb).rglob('*'): ...
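A minimal sketch of that suggestion, assuming `folder_rgb` is the directory returned by `get_local_copy()`. It walks the folder with `Path.rglob('*')`, keeps only regular files (rglob also yields sub-directories), and returns each absolute path as a `str`. The helper name `iter_local_files` is hypothetical, and the upload call itself is left out so the sketch runs without ClearML.

```python
from pathlib import Path
from typing import List


def iter_local_files(folder_rgb: str) -> List[str]:
    """Return absolute paths (as str) of every regular file under folder_rgb.

    Hypothetical helper: rglob('*') also yields sub-directories, so they are
    filtered out; sorting only makes the order deterministic.
    """
    root = Path(folder_rgb).resolve()
    return sorted(str(p) for p in root.rglob("*") if p.is_file())
```

Each returned path can then be passed as `local_file` to `StorageManager.upload_file`, together with a full `remote_url` for that file.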
I copied the Google Storage section of clearml.conf and creds.json from the agent to my local machine, and now it gets stuck here when I run:
Exception has occurred: ValueError Upload failed
Code:
# Upload separated files
for separated_file in Path(dataset_folder).rglob(f'*{batch_id}*'):
    remote_file_path = f"gs://<bucket>/{hatchery_id}/{batch_id}/{single_egg_object_id}/{folder_type}/{str(PurePath(separated_file)).split('/')[-1]}"
    StorageManager.upload_file(local_file=separated_file, remote_url=remote_file_path)
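The remote URL in the snippet above takes the file name with `str(PurePath(separated_file)).split('/')[-1]`. `PurePath(...).name` returns the same last path component and states the intent more directly. A minimal sketch that builds the URL as a plain `str`; `build_remote_url` is a hypothetical helper, its parameters mirror the variables in the snippet, and the bucket name is a placeholder.

```python
from pathlib import PurePath


def build_remote_url(bucket: str, hatchery_id: str, batch_id: str,
                     single_egg_object_id: str, folder_type: str,
                     separated_file) -> str:
    """Build gs://<bucket>/<hatchery>/<batch>/<egg>/<folder_type>/<file name>.

    Hypothetical helper mirroring the f-string in the snippet.
    PurePath(...).name is the last path component, i.e. the same value as
    str(PurePath(...)).split('/')[-1] for POSIX paths.
    """
    file_name = PurePath(separated_file).name
    parts = [hatchery_id, batch_id, single_egg_object_id, folder_type, file_name]
    return f"gs://{bucket}/" + "/".join(str(p) for p in parts)
```

Inside the loop this would be used as `StorageManager.upload_file(local_file=str(separated_file), remote_url=build_remote_url(...))`, so both arguments are strings.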
Hi StraightDog31
I am having trouble using the StorageManager to upload files to a GCP bucket.
Are you using StorageManager directly, or are you using task.upload_artifact?
Did you provide the GS credentials in the clearml.conf file? See the example here:
https://github.com/allegroai/clearml/blob/c9121debc2998ec6245fe858781eae11c62abd84/docs/clearml.conf#L110
So you are uploading a local file (stored in a Dataset) to a GS bucket? May I ask why?
Regarding usage (I might have a typo, but this is the gist):
StorageManager.upload_file(
    local_file=str(separated_file_posix_path),
    remote_url=remote_file_path + str(separated_file_posix_path.relative_to(folder_rgb))
)
Note that you need to provide the full upload URL (including the path and the file name to be used in your GS bucket).
Can you share your StorageManager usage and the error you are getting?
Note that both need to be str.
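To make the two notes above concrete: the remote URL must contain the full object path, and both arguments must be `str`, so a `Path` produced by `relative_to()` has to be converted (`as_posix()` also keeps forward slashes in the URL on Windows). A minimal sketch; `upload_args` is a hypothetical helper, and it assumes `remote_prefix` is a folder-like URL such as `gs://<bucket-name>/<folder-name>`.

```python
from pathlib import Path
from typing import Tuple


def upload_args(local_file, local_root, remote_prefix: str) -> Tuple[str, str]:
    """Return (local_file, remote_url) as plain strings.

    Hypothetical helper: the remote URL is remote_prefix + the file's path
    relative to local_root, so sub-folders of the local copy are preserved.
    """
    local_path = Path(local_file).resolve()
    relative = local_path.relative_to(Path(local_root).resolve())
    remote_url = remote_prefix.rstrip("/") + "/" + relative.as_posix()
    return str(local_path), remote_url
```

The returned pair maps directly onto `StorageManager.upload_file(local_file=..., remote_url=...)`.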
BTW, if you need the entire folder, just use StorageManager.upload_folder.
Actually, the values of separated_file_posix_path and separated_file are the same.
Try passing local_file as the full path to the file.
Yes, I was passing that. Now I am passing it the way you described, but my code still gets stuck as in the screenshot above. My guess is that it can't resolve the credentials. It also doesn't give me any login popup.
Can you print the actual values you are passing (i.e. local_file and remote_url)?
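Printing the `repr()` and type of both values often exposes the problem right away, for example a bare file name that only exists relative to some other directory. A small pre-flight check to run just before `StorageManager.upload_file`; `check_upload_args` is a hypothetical debugging helper (not part of ClearML) and only inspects the values locally, without contacting the bucket.

```python
import os
from typing import List


def check_upload_args(local_file, remote_url) -> List[str]:
    """Print both values and return a list of problems (empty means OK).

    Hypothetical debugging helper: purely local checks, no network access.
    """
    print(f"local_file={local_file!r} ({type(local_file).__name__})")
    print(f"remote_url={remote_url!r} ({type(remote_url).__name__})")
    problems = []
    if not isinstance(local_file, str):
        problems.append("local_file is not a str")
    if not isinstance(remote_url, str):
        problems.append("remote_url is not a str")
    # A relative name is resolved against the current working directory.
    if not isinstance(local_file, (str, os.PathLike)) or not os.path.isfile(local_file):
        problems.append(f"local file not found: {local_file!r} (cwd={os.getcwd()})")
    if not str(remote_url).startswith("gs://"):
        problems.append("remote_url does not start with gs://")
    if str(remote_url).endswith("/"):
        problems.append("remote_url ends with '/', so it has no object name")
    return problems
```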
What about output_uri?
If you are using StorageManager directly, output_uri is not relevant.
Now I am passing it the way you described, but my code still gets stuck as in the screenshot above.
The screenshot shows a warning from pyplot (matplotlib), not ClearML, or am I missing something?
My guess is that it can't resolve the credentials. It also doesn't give me any login popup.
If it fails, you will get an error; there will never be a popup from the code 🙂
... We need a more permanent place to store data
FYI, you can store the Dataset itself on GS instead of the default ClearML file server (basically, pass the GS location as output_uri to the Dataset when creating it).
Yes, my bad. It was a VPN issue, and I fixed it. Now I have one last error: [Errno 2] No such file or directory: '<filename.extension>'
from pathlib import PosixPath
from clearml import Task, Dataset, StorageManager

dataset_rgb = Dataset.get(dataset_id=args['dataset_id_rgb'])
folder_rgb = dataset_rgb.get_local_copy()
files_rgb = dataset_rgb.list_files()
for separated_file in files_rgb:
    separated_file_posix_path = PosixPath(separated_file)
    remote_file_path = f"gs://<bucket-name>/<folder-name>"
    StorageManager.upload_file(local_file=separated_file_posix_path, remote_url=remote_file_path)
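The `[Errno 2]` error is consistent with the first note in this thread: each `list_files()` entry is a path inside the dataset, so using it directly makes Python look for it relative to the current working directory. Assuming the entries are relative to the folder returned by `get_local_copy()` (with or without a leading `/`, as in the `/folder/myfile.ext` example), they can be mapped to real local paths as below; `resolve_dataset_entries` is a hypothetical helper.

```python
from pathlib import Path
from typing import Iterable, List


def resolve_dataset_entries(folder_rgb: str, entries: Iterable[str]) -> List[str]:
    """Map dataset-relative entries (e.g. '/folder/myfile.ext') to local files.

    Assumes folder_rgb is the local copy root. A leading '/' is stripped,
    because joining an absolute path would discard the root entirely.
    """
    root = Path(folder_rgb)
    local_files = []
    for entry in entries:
        local_path = root / entry.lstrip("/")
        if not local_path.is_file():
            # Fail early with the full path instead of a bare file name.
            raise FileNotFoundError(str(local_path))
        local_files.append(str(local_path))
    return local_files
```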
remote_url = 'gs://<bucket>/6141af10b81a76709db962fd/6141b0ee5f627d0020819489/6147c3db1f0d7f48ddf8e952/camera/6141b0ee5f627d0020819489_14.png'
local_file = '6141b0ee5f627d0020819489_14.png'
We are uploading it to a Google Cloud Storage bucket because this is the last processing step after image acquisition and before ML modelling. We need a more permanent place to store data.
I need the files in different folders, so I cannot upload the whole folder.
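When files have to land in different bucket folders, one option is to plan the destinations first and then upload file by file. A minimal sketch: `plan_uploads` is a hypothetical helper, and `folder_for` is a caller-supplied rule (also hypothetical) that maps a local file to a sub-folder name, for example a `folder_type` such as `camera`.

```python
from collections import defaultdict
from pathlib import Path
from typing import Callable, Dict, Iterable, List


def plan_uploads(local_files: Iterable[str], remote_prefix: str,
                 folder_for: Callable[[Path], str]) -> Dict[str, List[str]]:
    """Group local files by destination folder URL.

    Returns {remote_folder_url: [local_file, ...]}; the routing rule is
    entirely up to the caller-supplied folder_for function.
    """
    plan = defaultdict(list)
    for local_file in local_files:
        path = Path(local_file)
        folder_url = remote_prefix.rstrip("/") + "/" + folder_for(path).strip("/")
        plan[folder_url].append(str(path))
    return dict(plan)
```

Each file in a group can then be uploaded with `remote_url=f"{folder_url}/{Path(local_file).name}"`, which keeps the full object path in every URL.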