SuccessfulKoala55
The dataset file URL is set on upload and stored on the server
That might be the reason. I think the server address on machine A is set to "localhost:port".
Then, if I change "localhost" to "<server IP>" on server A and re-upload the dataset, will it be accessible remotely?
Oh, wait - scratch that - is it possible that the dataset was uploaded from server A where you used localhost:8081 as the address?
SuccessfulKoala55 The file was created by running clearml-init on the CLI.
SuccessfulKoala55 Thanks a lot. Problem solved. Have a good day!
But on which machine? And what's the clearml.conf configuration there (just to make sure the generated file is OK)?
` >>> d = Dataset.get(dataset_name="Anonymous task (user@beryl 2022-03-23 04:05:19)", dataset_project="test_project").get_local_copy()
Retrying (Retry(total=2, connect=2, read=5, redirect=5, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f0c23e4b490>: Failed to establish a new connection: [Errno 111] Connection refused')': /test_project/Anonymous%20task%20%28user%2540beryl%202022-03-23%2004%253A05%253A19%29.c05641c2e1c74389b471fbc9110c302d/artifacts/state/state.json
Retrying (Retry(total=1, connect=1, read=5, redirect=5, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f0c23e4b6d0>: Failed to establish a new connection: [Errno 111] Connection refused')': /test_project/Anonymous%20task%20%28user%2540beryl%202022-03-23%2004%253A05%253A19%29.c05641c2e1c74389b471fbc9110c302d/artifacts/state/state.json
Retrying (Retry(total=0, connect=0, read=5, redirect=5, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f0c23e4b910>: Failed to establish a new connection: [Errno 111] Connection refused')': /test_project/Anonymous%20task%20%28user%2540beryl%202022-03-23%2004%253A05%253A19%29.c05641c2e1c74389b471fbc9110c302d/artifacts/state/state.json
2022-03-23 14:25:51,073 - clearml.storage - ERROR - Could not download, err: HTTPConnectionPool(host='localhost', port=8081): Max retries exceeded with url: /test_project/Anonymous%20task%20%28user%2540beryl%202022-03-23%2004%253A05%253A19%29.c05641c2e1c74389b471fbc9110c302d/artifacts/state/state.json (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f0c01f7b070>: Failed to establish a new connection: [Errno 111] Connection refused'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/user/.conda/envs/test_project/lib/python3.8/site-packages/clearml/datasets/dataset.py", line 968, in get
    raise ValueError('Could not load Dataset id={} state'.format(task.id))
ValueError: Could not load Dataset id=c05641c2e1c74389b471fbc9110c302d state
d = Dataset.get(dataset_name="Anonymous task (user@beryl 2022-03-23 04:05:19)", dataset_project="test_project").get_mutable_local_copy()
Retrying (Retry(total=2, connect=2, read=5, redirect=5, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f0c23e63790>: Failed to establish a new connection: [Errno 111] Connection refused')': /test_project/Anonymous%20task%20%28user%2540beryl%202022-03-23%2004%253A05%253A19%29.c05641c2e1c74389b471fbc9110c302d/artifacts/state/state.json
Retrying (Retry(total=1, connect=1, read=5, redirect=5, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f0c23e639d0>: Failed to establish a new connection: [Errno 111] Connection refused')': /test_project/Anonymous%20task%20%28user%2540beryl%202022-03-23%2004%253A05%253A19%29.c05641c2e1c74389b471fbc9110c302d/artifacts/state/state.json
Retrying (Retry(total=0, connect=0, read=5, redirect=5, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f0c23e63c10>: Failed to establish a new connection: [Errno 111] Connection refused')': /test_project/Anonymous%20task%20%28user%2540beryl%202022-03-23%2004%253A05%253A19%29.c05641c2e1c74389b471fbc9110c302d/artifacts/state/state.json
2022-03-23 14:27:03,162 - clearml.storage - ERROR - Could not download, err: HTTPConnectionPool(host='localhost', port=8081): Max retries exceeded with url: /test_project/Anonymous%20task%20%28user%2540beryl%202022-03-23%2004%253A05%253A19%29.c05641c2e1c74389b471fbc9110c302d/artifacts/state/state.json (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f0c25ee7f70>: Failed to establish a new connection: [Errno 111] Connection refused'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/user/.conda/envs/test_project/lib/python3.8/site-packages/clearml/datasets/dataset.py", line 968, in get
    raise ValueError('Could not load Dataset id={} state'.format(task.id))
ValueError: Could not load Dataset id=c05641c2e1c74389b471fbc9110c302d state `
So for some reason it tries to use "localhost" instead of the actual address - do you have any other config file there (perhaps an old trains.conf file?), or maybe some ClearML-related environment variables?
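A quick way to look for both on the failing machine is sketched below. This is not a ClearML tool: the helper name `find_clearml_overrides` is made up for illustration, and it assumes that the relevant environment variables start with `CLEARML_` or the legacy `TRAINS_` prefix and that the config files sit in the home directory as `clearml.conf` or `trains.conf`.

```python
import os
from pathlib import Path


def find_clearml_overrides(env=None, home=None):
    """Return (env_vars, config_files) that may affect the ClearML setup.

    Hypothetical helper: the prefixes and file names are assumptions,
    not something read from the ClearML package itself.
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)

    # Environment variables with a ClearML (or legacy Trains) prefix.
    env_vars = {
        name: value
        for name, value in env.items()
        if name.startswith(("CLEARML_", "TRAINS_"))
    }

    # Config files that actually exist in the home directory.
    config_files = [
        home / name
        for name in ("clearml.conf", "trains.conf")
        if (home / name).is_file()
    ]
    return env_vars, config_files


if __name__ == "__main__":
    found_vars, found_files = find_clearml_overrides()
    # Print names only, since some values may be credentials.
    for name in sorted(found_vars):
        print("environment variable:", name)
    for path in found_files:
        print("config file:", path)
```

If this prints more than one config file, or any host-related variable, that is a candidate for where "localhost" comes from.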
The dataset file URL is set on upload and stored on the server - you can't have different machines using a different server address for the same server
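To illustrate why this matters: a URL stored with a "localhost" host only resolves on the machine that uploaded it, which matches the `host='localhost', port=8081` in the error above. The sketch below is plain Python, not part of ClearML; `points_to_loopback` is a hypothetical helper and the example URLs are made up.

```python
import ipaddress
from urllib.parse import urlsplit


def points_to_loopback(url):
    """Return True if the URL's host is localhost or a loopback IP address."""
    host = urlsplit(url).hostname
    if host is None:
        return False
    if host.lower() == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal (a regular host name), so not loopback.
        return False


if __name__ == "__main__":
    # Made-up URLs: one as stored by an upload configured with localhost,
    # one as stored by an upload configured with a routable address.
    for url in (
        "http://localhost:8081/some_project/state.json",
        "http://192.0.2.10:8081/some_project/state.json",
    ):
        print(url, "-> loopback:", points_to_loopback(url))
```

A stored URL for which this returns True can only be downloaded on the uploading machine, so the dataset would need to be uploaded again with a reachable server address.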
is this log from running on machine "B"?
Hi MagnificentWorm7,
but it does not work with a dataset created and uploaded from a different server
What is the clearml.conf configuration on the other server?