I Have an Issue Getting a Model From the Model Repository When Running a Task in a Remote Worker

I have an issue getting a model from the model repository when running a task in a remote worker.
I have a custom model that was saved with OutputModel:

model = OutputModel(task=task, label_enumeration={"Image": 0, "Title": 1},
                    framework="layout-parser", tags=["for inference"], comment="Initial model")
model.update_weights_package(weights_path="../models/initial/", auto_delete_file=False)

In another script, I query the model registry to get the model and use it for inference:

from pathlib import Path

import layoutparser as lp
from clearml import Model

# Query the registry for models in the "Layout" project tagged "for inference"
model_from_clearml = Model.query_models(project_name="Layout", tags=["for inference"])
print(model_from_clearml)
# Download the weights package and get the path to the local copy
model_folder = model_from_clearml[0].get_local_copy()
print(model_folder)
model_folder = Path(model_folder)
model = lp.Detectron2LayoutModel(
    config_path=str(model_folder / "config.yaml"),
    model_path=str(model_folder / "model_0019999.pth"))
model.detect(img)
...

I ran this script locally and it works perfectly.
But when I run it remotely, the environment is created properly and the task starts (I can see it in the ClearML UI), but downloading the model fails. The output shows:

[<clearml.model.Model object at 0x7fc8acc85ea0>] # result of model_from_clearml = Model.query_models(project_name="Layout", tags=["for inference"])
None # result of model_folder = model_from_clearml[0].get_local_copy()

Does anyone know why it cannot download the model in the remote worker?
The remote worker is actually my own PC, where I ran "clearml-agent daemon --queue default --docker --detached".

Posted one year ago

5 Answers


Hi @<1523711002288328704:profile|YummyLion54> , can you please add a full log of both runs for reference?

Posted one year ago

@<1523701070390366208:profile|CostlyOstrich36>
Sure, here they are.
Original local run log:

2023-08-30 09:45:02
ClearML Task: overwriting (reusing) task id=ff7a3e0849bb4ce09c86f9ce1d31b5bf
2023-08-30 09:45:05
ClearML results page: 

2023-08-30 09:45:09
[<clearml.model.Model object at 0x7f937df010f0>]
2023-08-30 09:45:11
/tmp/model_package__enbe91c
/tmp/model_package__enbe91c
2023-08-30 09:45:24
Downloading from s3
2023-08-30 09:45:37
Download done
2023-08-30 09:45:37
0it [00:00, ?it/s]
2023-08-30 09:45:41
/home/ghisso/miniconda3/envs/sankei/lib/python3.10/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3483.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
2023-08-30 09:45:47
15it [00:10,  2.58it/s]
2023-08-30 09:45:57
43it [00:20,  2.78it/s]
2023-08-30 09:46:07
70it [00:30,  2.33it/s]
2023-08-30 09:46:27
ClearML results page: 

ClearML dataset page: 

2023-08-30 09:46:28
Generating SHA2 hash for 71 files
2023-08-30 09:46:28
  0%|                                                                                            | 0/71 [00:00<?, ?it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 71/71 [00:00<00:00, 839.19it/s]
2023-08-30 09:46:28
Hash generation completed
2023-08-30 09:46:32
Pending uploads, starting dataset upload to 

2023-08-30 09:46:34
Uploading dataset changes (71 files compressed to 49.06 MiB) to 

2023-08-30 09:46:34,744 - clearml.storage - INFO - Uploading: 5.06MB / 49.06MB @ 12.97MBs from /tmp/dataset.a75425cb49314c41a249b31d5d66e2ea.kmcn21h6.zip
2023-08-30 09:46:35,121 - clearml.storage - INFO - Uploading: 10.06MB / 49.06MB @ 13.26MBs from /tmp/dataset.a75425cb49314c41a249b31d5d66e2ea.kmcn21h6.zip
2023-08-30 09:46:35
2023-08-30 09:46:35,508 - clearml.storage - INFO - Uploading: 15.06MB / 49.06MB @ 12.92MBs from /tmp/dataset.a75425cb49314c41a249b31d5d66e2ea.kmcn21h6.zip
2023-08-30 09:46:35,893 - clearml.storage - INFO - Uploading: 20.06MB / 49.06MB @ 13.01MBs from /tmp/dataset.a75425cb49314c41a249b31d5d66e2ea.kmcn21h6.zip
2023-08-30 09:46:36,266 - clearml.storage - INFO - Uploading: 25.06MB / 49.06MB @ 13.39MBs from /tmp/dataset.a75425cb49314c41a249b31d5d66e2ea.kmcn21h6.zip
2023-08-30 09:46:36
2023-08-30 09:46:36,781 - clearml.storage - INFO - Uploading: 30.06MB / 49.06MB @ 9.71MBs from /tmp/dataset.a75425cb49314c41a249b31d5d66e2ea.kmcn21h6.zip
2023-08-30 09:46:37,167 - clearml.storage - INFO - Uploading: 35.06MB / 49.06MB @ 12.95MBs from /tmp/dataset.a75425cb49314c41a249b31d5d66e2ea.kmcn21h6.zip
2023-08-30 09:46:37
2023-08-30 09:46:37,801 - clearml.storage - INFO - Uploading: 40.06MB / 49.06MB @ 7.89MBs from /tmp/dataset.a75425cb49314c41a249b31d5d66e2ea.kmcn21h6.zip
2023-08-30 09:46:38,172 - clearml.storage - INFO - Uploading: 45.06MB / 49.06MB @ 13.45MBs from /tmp/dataset.a75425cb49314c41a249b31d5d66e2ea.kmcn21h6.zip
2023-08-30 09:46:40
File compression and upload completed: total size 49.06 MiB, 1 chunk(s) stored (average size 49.06 MiB)

Remote run log (at start of run):

Environment setup completed successfully
Starting Task Execution:
2023-08-30 10:17:24
ClearML results page: 

2023-08-30 10:17:30
[<clearml.model.Model object at 0x7fc8acc85ea0>]
None
Traceback (most recent call last):
  File "/root/.clearml/venvs-builds/3.10/task_repository/Sankei.git/src/infer.py", line 95, in <module>
    infer()
  File "/root/.clearml/venvs-builds/3.10/task_repository/Sankei.git/src/infer.py", line 22, in infer
    model_folder = Path(model_folder)
  File "/usr/lib/python3.10/pathlib.py", line 960, in __new__
    self = cls._from_parts(args)
  File "/usr/lib/python3.10/pathlib.py", line 594, in _from_parts
    drv, root, parts = self._parse_args(args)
  File "/usr/lib/python3.10/pathlib.py", line 578, in _parse_args
    a = os.fspath(a)
TypeError: expected str, bytes or os.PathLike object, not NoneType
2023-08-30 10:17:48
Process failed, exit code 1
Posted one year ago

Hi @<1523711002288328704:profile|YummyLion54> ! By default, we don't upload the models to our file server, so in the remote run we will try to pull the file from your local machine, which will fail most of the time. Specify the upload_uri as the api.files_server entry in your clearml.conf if you want to upload it to the ClearML server, or use any s3/gs/azure link if you prefer a cloud provider.
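For reference, a minimal sketch of what that change could look like on the save side (the task_name and the s3 bucket below are placeholders, not values from this thread):

from clearml import Task, OutputModel

# output_uri=True tells ClearML to upload models to the api.files_server
# configured in clearml.conf; an "s3://...", "gs://..." or "azure://..."
# destination can be passed instead of True
task = Task.init(project_name="Layout", task_name="save-model", output_uri=True)  # task_name is a placeholder

model = OutputModel(task=task, label_enumeration={"Image": 0, "Title": 1},
                    framework="layout-parser", tags=["for inference"], comment="Initial model")
# alternatively, set the upload destination explicitly on the model itself
model.set_upload_destination("s3://my-bucket/models")  # placeholder bucket
model.update_weights_package(weights_path="../models/initial/", auto_delete_file=False)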

Posted one year ago

Thank you! Hadn’t thought about checking that the model was actually uploaded remotely! Will try it out!
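One quick way to verify is to print the model's registered URL before downloading; if it is a plain local path (or a file:// link) rather than a files-server or bucket URL, the weights were never actually uploaded. A small sketch, reusing the query from the question:

from clearml import Model

models = Model.query_models(project_name="Layout", tags=["for inference"])
# url is the registered location of the weights package; a local path here
# means only the path was stored on the server, not the file itself
print(models[0].url)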

Posted one year ago

Here is the full log of the remote run, with the access keys redacted.

Posted one year ago