Hi all, I’ve been running into an issue lately when using get_local_copy. The way I use it is: I call get_tasks to find the previous relevant experiments in my project, then iterate over them and download the output models using get_local_copy.
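The workflow described above can be sketched roughly as follows. This is an illustrative reconstruction, not the original script: the body of fetch_previous_models, the main helper, and the project name are assumptions, and only Task.get_tasks, task.models["output"] and Model.get_local_copy are taken from the description and the traceback.

```python
def fetch_previous_models(tasks):
    """Download every output model of the given tasks.

    Returns two dicts keyed by task id:
    downloaded -> list of local paths, failed -> list of (model name, reason).
    """
    downloaded, failed = {}, {}
    for task in tasks:
        # task.models is a mapping with "input" and "output" lists of Model objects
        for model in task.models["output"]:
            try:
                path = model.get_local_copy()
            except OSError as err:
                # e.g. the WinError 123 reported in this post
                failed.setdefault(task.id, []).append((model.name, str(err)))
                continue
            if path is None:
                # get_local_copy can return None when the download fails quietly
                failed.setdefault(task.id, []).append((model.name, "no local copy"))
                continue
            downloaded.setdefault(task.id, []).append(path)
    return downloaded, failed


def main(project_name):
    # Imported here so the helper above can be exercised without a ClearML server.
    from clearml import Task

    previous_tasks = Task.get_tasks(project_name=project_name)
    downloaded, failed = fetch_previous_models(previous_tasks)
    print("downloaded:", downloaded)
    print("failed:", failed)
```

Calling main("my_project") (a placeholder name) would list which tasks downloaded cleanly and which raised, instead of stopping at the first failure.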
This is the error I get:
  File "...", line 69, in fetch_previous_models
    download_path = model.get_local_copy()
  File "...\site-packages\clearml\model.py", line 483, in get_local_copy
    return self.get_weights(raise_on_error=raise_on_error)
  File "...\site-packages\clearml\model.py", line 338, in get_weights
    return self._get_base_model().download_model_weights(raise_on_error=raise_on_error)
  File "...\site-packages\clearml\backend_interface\model.py", line 406, in download_model_weights
    if Path(dl_file).exists():
  File "...\site-packages\pathlib2\__init__.py", line 1718, in exists
    self.stat()
  File "...\site-packages\pathlib2\__init__.py", line 1523, in stat
    return self._accessor.stat(self)
  File "...\site-packages\pathlib2\__init__.py", line 646, in wrapped
    return strfunc(str(pathobj), *args)
OSError: [WinError 123] The filename, directory name, or volume label syntax is incorrect: "('<path_to_temp_folder>', '<model_name>'): <model_name>_0.pt"
When I checked <path_to_temp_folder>, I noticed that the model available there is called <model_name>_<num_epochs_model_was_trained_for>.pt (it is looking for _0.pt instead of the model from the last epoch).
I checked the web UI and the name is displayed correctly there (… _<num_epochs_model_was_trained_for>.pt).
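One way to compare the file name the server has registered with the file that actually sits in the temp folder is to print each output model's name next to the file name taken from its URL. This is a hedged sketch: registered_file_names is a hypothetical helper, and it assumes the Model objects expose name and url attributes.

```python
import posixpath
from urllib.parse import unquote, urlparse


def registered_file_names(task):
    """Return (model name, file name from the registered URL) pairs for a task."""
    pairs = []
    for model in task.models["output"]:
        url = model.url or ""
        # Keep only the last path component of the URL, e.g. "<model_name>_50.pt"
        file_name = posixpath.basename(unquote(urlparse(url).path))
        pairs.append((model.name, file_name))
    return pairs
```

If the file name printed here ends in the last epoch number while the download still looks for _0.pt, the mismatch is introduced on the client side during download rather than in what the server stored.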
Also, I set the experiment to keep a single checkpoint, so there is a single output model in the task ({'input': [<clearml.model.Model object at 0x00000170A789C2B0>], 'output': [<clearml.model.Model object at 0x00000170A788C850>]}).
Any ideas what could be causing this?
(This is on a self-hosted server 1.4.0 running on Ubuntu - the script that causes that error is running on Windows using clearml 1.4.1)