Answered
I Have An Issue Getting A Model From The Model Repository When Running A Task In A Remote Worker. I Have A Custom Model That Was Saved With OutputModel:

I have an issue getting a model from the model repository when running a task in a remote worker.
I have a custom model that was saved with OutputModel:

from clearml import OutputModel  # import added for completeness

model = OutputModel(task=task, label_enumeration={"Image": 0, "Title": 1}, framework="layout-parser", tags=["for inference"], comment="Initial model")
model.update_weights_package(weights_path="../models/initial/", auto_delete_file=False)

In another script, I query the model registry to get the model and use it for inference:

from pathlib import Path
from clearml import Model
import layoutparser as lp  # lp alias assumed from the Detectron2LayoutModel call

model_from_clearml = Model.query_models(project_name="Layout", tags=["for inference"])
print(model_from_clearml)
model_folder = model_from_clearml[0].get_local_copy()  # downloads the weights package locally
print(model_folder)
model_folder = Path(model_folder)
model = lp.Detectron2LayoutModel(
    config_path=str(model_folder / "config.yaml"),
    model_path=str(model_folder / "model_0019999.pth"))
model.detect(img)
...

I ran this script locally and it works perfectly.
But when I run it remotely, the environment is created properly and the task starts (I can see it in the ClearML UI), but downloading the model fails. The output shows:

[<clearml.model.Model object at 0x7fc8acc85ea0>] # result of model_from_clearml = Model.query_models(project_name="Layout", tags=["for inference"])
None # result of model_folder = model_from_clearml[0].get_local_copy()
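
For reference, printing the stored weights URI before downloading shows where it points (a minimal sketch, assuming the url property on the queried Model exposes the stored URI; it was not part of my original script):

# Diagnostic sketch: inspect where ClearML thinks the weights live
entry = model_from_clearml[0]
print(entry.url)  # a file:///... URI would only resolve on the machine that created the model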

Does anyone know why it cannot download the model in the remote worker?
The remote worker is actually my own PC where I ran "clearml-agent daemon --queue default --docker --detached"

  
  
Posted one year ago

5 Answers


Hi @<1523711002288328704:profile|YummyLion54> ! By default, we don't upload the models to our file server, so in the remote run we will try to pull the file from your local machine, which will fail most of the time. Specify the upload_uri as the api.files_server entry in your clearml.conf if you want to upload it to the ClearML server, or as any s3/gs/azure link if you prefer a cloud provider.
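
A minimal sketch of both options (the upload_uri argument and the default_output_uri key are as I remember them, so double-check against your SDK version; the file server URL below is a placeholder):

from clearml import OutputModel

# Option 1: pass upload_uri when registering the weights
# (task is the same Task object as in your snippet)
model = OutputModel(task=task, framework="layout-parser", tags=["for inference"])
model.update_weights_package(
    weights_path="../models/initial/",
    upload_uri="https://files.clear.ml",  # your api.files_server value, or "s3://my-bucket/models", "gs://...", "azure://..."
    auto_delete_file=False,
)

# Option 2: set a default in clearml.conf so every model/artifact gets uploaded:
#   sdk.development.default_output_uri: "https://files.clear.ml"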

  
  
Posted one year ago

Here is the full log of the remote run with the access keys redacted

  
  
Posted one year ago

@<1523701070390366208:profile|CostlyOstrich36>
Sure, here they are.
Original local run log:

2023-08-30 09:45:02
ClearML Task: overwriting (reusing) task id=ff7a3e0849bb4ce09c86f9ce1d31b5bf
2023-08-30 09:45:05
ClearML results page: 

2023-08-30 09:45:09
[<clearml.model.Model object at 0x7f937df010f0>]
2023-08-30 09:45:11
/tmp/model_package__enbe91c
/tmp/model_package__enbe91c
2023-08-30 09:45:24
Downloading from s3
2023-08-30 09:45:37
Download done
2023-08-30 09:45:37
0it [00:00, ?it/s]
2023-08-30 09:45:41
/home/ghisso/miniconda3/envs/sankei/lib/python3.10/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3483.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
2023-08-30 09:45:47
15it [00:10,  2.58it/s]
2023-08-30 09:45:57
43it [00:20,  2.78it/s]
2023-08-30 09:46:07
70it [00:30,  2.33it/s]
2023-08-30 09:46:27
ClearML results page: 

ClearML dataset page: 

2023-08-30 09:46:28
Generating SHA2 hash for 71 files
2023-08-30 09:46:28
  0%|                                                                                            | 0/71 [00:00<?, ?it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 71/71 [00:00<00:00, 839.19it/s]
2023-08-30 09:46:28
Hash generation completed
2023-08-30 09:46:32
Pending uploads, starting dataset upload to 

2023-08-30 09:46:34
Uploading dataset changes (71 files compressed to 49.06 MiB) to 

2023-08-30 09:46:34,744 - clearml.storage - INFO - Uploading: 5.06MB / 49.06MB @ 12.97MBs from /tmp/dataset.a75425cb49314c41a249b31d5d66e2ea.kmcn21h6.zip
2023-08-30 09:46:35,121 - clearml.storage - INFO - Uploading: 10.06MB / 49.06MB @ 13.26MBs from /tmp/dataset.a75425cb49314c41a249b31d5d66e2ea.kmcn21h6.zip
2023-08-30 09:46:35
2023-08-30 09:46:35,508 - clearml.storage - INFO - Uploading: 15.06MB / 49.06MB @ 12.92MBs from /tmp/dataset.a75425cb49314c41a249b31d5d66e2ea.kmcn21h6.zip
2023-08-30 09:46:35,893 - clearml.storage - INFO - Uploading: 20.06MB / 49.06MB @ 13.01MBs from /tmp/dataset.a75425cb49314c41a249b31d5d66e2ea.kmcn21h6.zip
2023-08-30 09:46:36,266 - clearml.storage - INFO - Uploading: 25.06MB / 49.06MB @ 13.39MBs from /tmp/dataset.a75425cb49314c41a249b31d5d66e2ea.kmcn21h6.zip
2023-08-30 09:46:36
2023-08-30 09:46:36,781 - clearml.storage - INFO - Uploading: 30.06MB / 49.06MB @ 9.71MBs from /tmp/dataset.a75425cb49314c41a249b31d5d66e2ea.kmcn21h6.zip
2023-08-30 09:46:37,167 - clearml.storage - INFO - Uploading: 35.06MB / 49.06MB @ 12.95MBs from /tmp/dataset.a75425cb49314c41a249b31d5d66e2ea.kmcn21h6.zip
2023-08-30 09:46:37
2023-08-30 09:46:37,801 - clearml.storage - INFO - Uploading: 40.06MB / 49.06MB @ 7.89MBs from /tmp/dataset.a75425cb49314c41a249b31d5d66e2ea.kmcn21h6.zip
2023-08-30 09:46:38,172 - clearml.storage - INFO - Uploading: 45.06MB / 49.06MB @ 13.45MBs from /tmp/dataset.a75425cb49314c41a249b31d5d66e2ea.kmcn21h6.zip
2023-08-30 09:46:40
File compression and upload completed: total size 49.06 MiB, 1 chunk(s) stored (average size 49.06 MiB)

Remote run log (at start of run):

Environment setup completed successfully
Starting Task Execution:
2023-08-30 10:17:24
ClearML results page: 

2023-08-30 10:17:30
[<clearml.model.Model object at 0x7fc8acc85ea0>]
None
Traceback (most recent call last):
  File "/root/.clearml/venvs-builds/3.10/task_repository/Sankei.git/src/infer.py", line 95, in <module>
    infer()
  File "/root/.clearml/venvs-builds/3.10/task_repository/Sankei.git/src/infer.py", line 22, in infer
    model_folder = Path(model_folder)
  File "/usr/lib/python3.10/pathlib.py", line 960, in __new__
    self = cls._from_parts(args)
  File "/usr/lib/python3.10/pathlib.py", line 594, in _from_parts
    drv, root, parts = self._parse_args(args)
  File "/usr/lib/python3.10/pathlib.py", line 578, in _parse_args
    a = os.fspath(a)
TypeError: expected str, bytes or os.PathLike object, not NoneType
2023-08-30 10:17:48
Process failed, exit code 1
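
(The TypeError above is just the downstream effect of get_local_copy() returning None; a small guard would surface the real failure earlier -- a sketch, not from the original script:)

model_folder = model_from_clearml[0].get_local_copy()
if model_folder is None:
    raise RuntimeError("Weights download failed; check where the model's stored URI points")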
  
  
Posted one year ago

Hi @<1523711002288328704:profile|YummyLion54> , can you please add a full log of both runs for reference?

  
  
Posted one year ago

Thank you! Hadn’t thought about checking that the model was actually uploaded remotely! Will try it out!

  
  
Posted one year ago