Hi, I'm trying to upload data from my S3 bucket into a ClearML dataset so I can start versioning it all for my ML project. I've connected to S3 successfully and configured my clearml.conf file correctly, but I'm running into a task-initialization problem when uploading subfolders of the S3 bucket directory.
This is the error log I'm receiving:
ClearML Monitor: GPU monitoring failed getting GPU reading, switching off GPU monitoring
Dataset 'VisionAI_data' found, creating a new version...
Adding files from:
2024-07-01 17:04:24,711 - clearml.storage - INFO - Uploading: 5.00MB / 32.85MB @ 8.16MBs to /var/folders/zm/vf43rrfs5y5f4tsfqhb0tgdc0000gn/T/state.2m6gxtp_.json
2024-07-01 17:04:24,774 - clearml.storage - INFO - Uploading: 10.00MB / 32.85MB @ 79.69MBs to /var/folders/zm/vf43rrfs5y5f4tsfqhb0tgdc0000gn/T/state.2m6gxtp_.json
2024-07-01 17:04:24,829 - clearml.storage - INFO - Uploading: 15.00MB / 32.85MB @ 91.88MBs to /var/folders/zm/vf43rrfs5y5f4tsfqhb0tgdc0000gn/T/state.2m6gxtp_.json
2024-07-01 17:04:24,897 - clearml.storage - INFO - Uploading: 20.00MB / 32.85MB @ 73.79MBs to /var/folders/zm/vf43rrfs5y5f4tsfqhb0tgdc0000gn/T/state.2m6gxtp_.json
2024-07-01 17:04:24,995 - clearml.storage - INFO - Uploading: 25.00MB / 32.85MB @ 51.01MBs to /var/folders/zm/vf43rrfs5y5f4tsfqhb0tgdc0000gn/T/state.2m6gxtp_.json
2024-07-01 17:04:25,343 - clearml.storage - INFO - Uploading: 30.00MB / 32.85MB @ 14.38MBs to /var/folders/zm/vf43rrfs5y5f4tsfqhb0tgdc0000gn/T/state.2m6gxtp_.json
2024-07-01 17:04:28,106 - clearml.storage - INFO - Uploading: 32.85MB / 32.85MB @ 1.03MBs to /var/folders/zm/vf43rrfs5y5f4tsfqhb0tgdc0000gn/T/state.2m6gxtp_.json
2024-07-01 17:04:28,791 - clearml.Task - ERROR - Action failed <400/110: tasks.add_or_update_artifacts/v2.10 (Invalid task status: expected=created, status=completed)> (task=d685ecee84434b469bca416fafb8bc48, artifacts=[{'key': 'state', 'type': 'dict', 'uri': '
to ClearML/.datasets/VisionAI_data/VisionAI_data.d685ecee84434b469bca416fafb8bc48/artifacts/state/state.json', 'content_size': 34450423, 'hash': 'a59aae25c98cc9a251ff989768e5c622b475516ce52ec4b030cb837a65d41a4f', 'timestamp': 1719878668, 'type_data': {'preview': 'Dataset state\nFiles added/modified: 112716 - total size 70.64 GB\nCurrent dependency graph: {\n "d685ecee84434b469bca416fafb8bc48": []\n}\n', 'content_type': 'application/json'}, 'display_data': [('files added', '112716'), ('files removed', '0'), ('files modified', '2')]}], force=True)
Traceback (most recent call last):
File "/Users/rishiarjun/Desktop/VisionAI/Vision-ML/DataEngineering/S3Connect.py", line 44, in <module>
create_or_update_dataset_from_s3(bucket_name, dataset_name, dataset_project)
File "/Users/rishiarjun/Desktop/VisionAI/Vision-ML/DataEngineering/S3Connect.py", line 37, in create_or_update_dataset_from_s3
dataset.finalize()
File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/clearml/datasets/dataset.py", line 828, in finalize
raise ValueError("Cannot finalize dataset, status '{}' is not valid".format(status))
ValueError: Cannot finalize dataset, status 'completed' is not valid
Here's my script. It's fairly straightforward: establish the connection, create a task, check whether the dataset exists, then upload three folders from the VisionAI1 bucket in S3.
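For context, here is a minimal reconstruction of that flow. The function and variable names (`create_or_update_dataset_from_s3`, `bucket_name`, `dataset_name`, `dataset_project`) are taken from the traceback; the subfolder names, the project name, and the overall structure are my assumptions, not the exact script:

```python
def build_s3_urls(bucket_name, folders):
    """Build the s3:// source URLs for each subfolder to upload (pure helper)."""
    return ["s3://{}/{}".format(bucket_name, f) for f in folders]


def create_or_update_dataset_from_s3(bucket_name, dataset_name, dataset_project):
    # Imported here so the URL helper above stays usable without clearml installed.
    from clearml import Dataset

    try:
        # Dataset.get returns the latest version; if that version was already
        # finalized, its status is 'completed' -- which is exactly what the
        # finalize() call below rejects with the error shown in the traceback.
        dataset = Dataset.get(dataset_project=dataset_project, dataset_name=dataset_name)
        print("Dataset '{}' found, creating a new version...".format(dataset_name))
    except ValueError:
        # Dataset.get raises ValueError when no matching dataset exists yet.
        dataset = Dataset.create(dataset_name=dataset_name, dataset_project=dataset_project)

    print("Adding files from:")
    for url in build_s3_urls(bucket_name, ["train", "val", "test"]):  # assumed folder names
        dataset.add_external_files(source_url=url)

    dataset.upload()
    dataset.finalize()  # raises ValueError when the dataset status is 'completed'
```

Calling `create_or_update_dataset_from_s3("VisionAI1", "VisionAI_data", "ClearML")` (the project name is assumed) reproduces the failure whenever a finalized version of the dataset already exists.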