Hello, I'm having an issue where the dataset service points to the wrong fileserver URL when tasks run remotely on ClearML Agents in k8s. I was following the onboarding guides available on YouTube. In the video, they upload the data from a local laptop with:
clearml-data create --project "Full Overview" --name "Fashion MNIST"
clearml-data add --files fashion_mnist
clearml-data close
In the following video they clone the task and run it remotely. When I do the same, the agent worker keeps failing to download the dataset, and the error references the localhost address:
2025-06-03 01:27:39,638 - clearml.storage - ERROR - Could not download
, err: HTTPConnectionPool(host='localhost', port=8081): Max retries exceeded with url: /Full%20Overview/.datasets/Fashion%20MNIST/Fashion%20MNIST.c4a325608ecd4c40a12535e9eed0f54d/artifacts/state/state.json (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f385e826480>: Failed to establish a new connection: [Errno 111] Connection refused'))
Traceback (most recent call last):
  File "/root/.clearml/venvs-builds/3.12/task_repository/ml-clearml-poc.git/scripts/scratch/train_xgboost_dataver.py", line 19, in <module>
    data_path = Dataset.get(dataset_name="Fashion MNIST", alias="Fashion MNIST").get_local_copy()
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.clearml/venvs-builds/3.12/lib/python3.12/site-packages/clearml/datasets/dataset.py", line 1806, in get
    instance = get_instance(dataset_id)
               ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.clearml/venvs-builds/3.12/lib/python3.12/site-packages/clearml/datasets/dataset.py", line 1718, in get_instance
    raise ValueError("Could not load Dataset id={} state".format(task.id))
ValueError: Could not load Dataset id=c4a325608ecd4c40a12535e9eed0f54d state
I have been trying to fix this, including setting the CLEARML_FILES_HOST env var on the agent to point to the k8s address for the fileserver service. This still does not work. When I look in the ClearML UI, I see that the dataset itself includes references to localhost.
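To make the symptom concrete, here is a minimal sketch of what I think is going on. It is my assumption, not confirmed ClearML behavior: the state.json URL looks like it was stored as an absolute URL at upload time, so it keeps the localhost host from my laptop no matter what the agent is configured with. The helper names and the in-cluster address below are hypothetical and not part of ClearML; only the failing URL is taken from the error above.

```python
from urllib.parse import urlsplit, urlunsplit

# Hostnames that only make sense on the machine that did the upload.
LOOPBACK_HOSTS = {"localhost", "127.0.0.1", "::1"}


def points_to_loopback(url: str) -> bool:
    """Return True if the URL's host is a loopback name or address."""
    return urlsplit(url).hostname in LOOPBACK_HOSTS


def with_new_base(url: str, new_base: str) -> str:
    """Swap scheme and host:port for those of new_base; keep path and query."""
    old, new = urlsplit(url), urlsplit(new_base)
    return urlunsplit((new.scheme, new.netloc, old.path, old.query, old.fragment))


# URL reconstructed from the agent error above (host, port and path).
stored = (
    "http://localhost:8081/Full%20Overview/.datasets/Fashion%20MNIST/"
    "Fashion%20MNIST.c4a325608ecd4c40a12535e9eed0f54d/artifacts/state/state.json"
)

# Hypothetical in-cluster fileserver address; the real service name will differ.
cluster_base = "http://clearml-fileserver.clearml.svc.cluster.local:8081"

if __name__ == "__main__":
    print("stored URL is loopback:", points_to_loopback(stored))
    print("URL the agent would need:", with_new_base(stored, cluster_base))
```

If that assumption holds, it would explain why changing CLEARML_FILES_HOST on the agent alone has no effect on a dataset that was already uploaded.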
I'm confused as to how this is supposed to be set up, given that I was under the impression that local/remote execution was meant to be seamless. Is there a way to make the dataset use the correct fileserver URL?