Hi, it's me again. This time I'm trying to upload a single file as a Dataset, but I ran into the following error. The file is 13.42 GB and in Apache Arrow format. Any idea how to solve this, please? Thank you.
Generating SHA2 hash for 1 files
100%|████████████████████| 1/1 [00:38<00:00, 38.55s/it]
Hash generation completed
0%| | 0/1 [00:00<?, ?it/s]
Compressing local files, chunk 1 [remaining 1 files]
100%|████████████████████| 1/1 [15:37<00:00, 937.45s/it]
File compression completed: total size 5.34 GB, 1 chunked stored (average size 5.34 GB)
Uploading compressed dataset changes 1/1 (1 files 5.34 GB) to
2022-02-18 01:07:04,908 - clearml.storage - ERROR - Exception encountered while uploading string longer than 2147483647 bytes
Traceback (most recent call last):
  File "project-x/upload-dataset-from-local.py", line 65, in <module>
    dataset.upload()
  File "/Users/derek/.pyenv/versions/py37/lib/python3.7/site-packages/clearml/datasets/dataset.py", line 445, in upload
    delete_after_upload=True, wait_on_upload=True)
  File "/Users/derek/.pyenv/versions/py37/lib/python3.7/site-packages/clearml/task.py", line 1685, in upload_artifact
    auto_pickle=auto_pickle, preview=preview, wait_on_upload=wait_on_upload)
  File "/Users/derek/.pyenv/versions/py37/lib/python3.7/site-packages/clearml/binding/artifacts.py", line 617, in upload_artifact
    wait_on_upload=wait_on_upload)
  File "/Users/derek/.pyenv/versions/py37/lib/python3.7/site-packages/clearml/binding/artifacts.py", line 795, in _upload_local_file
    StorageManager.upload_file(local_file.as_posix(), uri, wait_for_upload=True, retries=ev.retries)
  File "/Users/derek/.pyenv/versions/py37/lib/python3.7/site-packages/clearml/storage/manager.py", line 80, in upload_file
    retries=retries,
  File "/Users/derek/.pyenv/versions/py37/lib/python3.7/site-packages/clearml/storage/cache.py", line 81, in upload_file
    local_file, remote_url, async_enable=not wait_for_upload, retries=retries,
  File "/Users/derek/.pyenv/versions/py37/lib/python3.7/site-packages/clearml/storage/helper.py", line 575, in upload
    res = self._do_upload(src_path, dest_path, extra, cb, verbose=False, retries=retries)
  File "/Users/derek/.pyenv/versions/py37/lib/python3.7/site-packages/clearml/storage/helper.py", line 979, in _do_upload
    raise last_ex
  File "/Users/derek/.pyenv/versions/py37/lib/python3.7/site-packages/clearml/storage/helper.py", line 963, in _do_upload
    if not self._upload_from_file(local_path=src_path, dest_path=dest_path, extra=extra):
  File "/Users/derek/.pyenv/versions/py37/lib/python3.7/site-packages/clearml/storage/helper.py", line 941, in _upload_from_file
    extra=extra)
  File "/Users/derek/.pyenv/versions/py37/lib/python3.7/site-packages/clearml/storage/helper.py", line 1174, in upload_object
    object_name=object_name, extra=extra, callback=callback, **kwargs)
  File "/Users/derek/.pyenv/versions/py37/lib/python3.7/site-packages/clearml/storage/helper.py", line 1094, in upload_object_via_stream
    headers=container.get_headers(full_url))
  File "/Users/derek/.pyenv/versions/py37/lib/python3.7/site-packages/requests/sessions.py", line 577, in post
    return self.request('POST', url, data=data, json=json, **kwargs)
  File "/Users/derek/.pyenv/versions/py37/lib/python3.7/site-packages/requests/sessions.py", line 529, in request
    resp = self.send(prep, **send_kwargs)
  File "/Users/derek/.pyenv/versions/py37/lib/python3.7/site-packages/clearml/backend_api/utils.py", line 85, in send
    return super(SessionWithTimeout, self).send(request, **kwargs)
  File "/Users/derek/.pyenv/versions/py37/lib/python3.7/site-packages/requests/sessions.py", line 645, in send
    r = adapter.send(request, **kwargs)
  File "/Users/derek/.pyenv/versions/py37/lib/python3.7/site-packages/requests/adapters.py", line 450, in send
    timeout=timeout
  File "/Users/derek/.pyenv/versions/py37/lib/python3.7/site-packages/urllib3/connectionpool.py", line 710, in urlopen
    chunked=chunked,
  File "/Users/derek/.pyenv/versions/py37/lib/python3.7/site-packages/urllib3/connectionpool.py", line 398, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/Users/derek/.pyenv/versions/py37/lib/python3.7/site-packages/urllib3/connection.py", line 239, in request
    super(HTTPConnection, self).request(method, url, body=body, headers=headers)
  File "/Users/derek/.pyenv/versions/3.7.12/lib/python3.7/http/client.py", line 1281, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/Users/derek/.pyenv/versions/3.7.12/lib/python3.7/http/client.py", line 1327, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/Users/derek/.pyenv/versions/3.7.12/lib/python3.7/http/client.py", line 1276, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/Users/derek/.pyenv/versions/3.7.12/lib/python3.7/http/client.py", line 1075, in _send_output
    self.send(chunk)
  File "/Users/derek/.pyenv/versions/3.7.12/lib/python3.7/http/client.py", line 997, in send
    self.sock.sendall(data)
  File "/Users/derek/.pyenv/versions/3.7.12/lib/python3.7/ssl.py", line 1034, in sendall
    v = self.send(byte_view[count:])
  File "/Users/derek/.pyenv/versions/3.7.12/lib/python3.7/ssl.py", line 1003, in send
    return self._sslobj.write(data)
OverflowError: string longer than 2147483647 bytes
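Here is the code I'm running:

from clearml import Dataset

# Create the dataset, register the local file(s), then upload and close it.
dataset = Dataset.create("C4_realnewslike_filtered", "project-x")
dataset.add_files("/Users/derek/Desktop/project-x-artifacts/filtered_dataset")
dataset.upload()  # this is the call that raises the OverflowError
dataset.finalize()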
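
As a sanity check on the numbers in the log, here is a small sketch comparing the compressed size with the limit in the error message. The limit 2147483647 is 2**31 - 1, and the traceback ends in a single `_sslobj.write(data)` call, so my guess is that the whole compressed chunk is being handed to SSL in one write. The names `SSL_WRITE_LIMIT`, `compressed_bytes`, and `exceeds_limit` are mine, and I'm assuming the "5.34 GB" in the log means decimal gigabytes.

```python
# Byte count quoted in the OverflowError message; it equals 2**31 - 1.
SSL_WRITE_LIMIT = 2_147_483_647

# Compressed size reported by the upload log, read as decimal gigabytes.
# Assumption: if the log actually means GiB, the real size is even larger.
compressed_bytes = int(5.34 * 10**9)


def exceeds_limit(n_bytes: int, limit: int = SSL_WRITE_LIMIT) -> bool:
    """Return True when a payload is larger than `limit` bytes."""
    return n_bytes > limit


if __name__ == "__main__":
    print("limit is 2**31 - 1:", SSL_WRITE_LIMIT == 2**31 - 1)
    print("compressed chunk exceeds limit:", exceeds_limit(compressed_bytes))
    print("size / limit ratio:", round(compressed_bytes / SSL_WRITE_LIMIT, 2))
```

If that reading is right, the compressed chunk is more than twice the size that a single write accepts here, which would explain why the upload fails only after compression finishes.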