AgitatedDove14 ClearML server itself and all of its components (API server etc.) are on x.x.x.69 machine.
Agents and serving are on x.x.x.68 worker machine. My model files are also there, just placed in some usual non-shared linux directory.
And I didn't do any specific configurations of the clearml fileserver docker - everything is on its defaults without a single line changed except the IP address of the ClearML server.
I tried a couple of approaches to upload my preexisting models into ClearML:
- To send them directly from .68 via the following script:
from clearml import Task, InputModel
task = Task.init(project_name='LogSentinel', task_name='Register remote model from .68')
model_file_path = "file:///10.14.158.68/home/lab-usr/logsentinel/deeplog-bestloss.pth"
model = InputModel.import_model(
    name="deeplog_bilstm",
    weights_url=model_file_path,
    project="LogSentinel",
    framework="pytorch",
)
task.connect(model)
It registers the model without any visible errors, and it appears in the model repository.
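For what it's worth, I also checked how that weights URL actually parses. A file:/// URI with three slashes has an empty host, so the IP ends up as the first path component and the URL only resolves on a machine that literally has a /10.14.158.68/... directory (this is just my own diagnostic with the stdlib, nothing clearml-specific):

```python
from urllib.parse import urlparse

# The weights URL exactly as I registered it
url = "file:///10.14.158.68/home/lab-usr/logsentinel/deeplog-bestloss.pth"
parts = urlparse(url)

print(parts.scheme)  # file
print(parts.netloc)  # '' -- empty: no host in the URL at all
print(parts.path)    # /10.14.158.68/home/... -- the IP became a directory name
```

So as far as I can tell, nothing in that URL points at the .68 machine as a host.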
- To copy the model.pth file itself to the .69 machine, then run the script for LOCAL model file upload:
from clearml import Task, InputModel
task = Task.init(project_name='LogSentinel', task_name='Register model')
model_file_path = "file:///home/lab-usr/logsentinel/deeplog-bestloss.pth"
model = InputModel.import_model(name="deeplog_bilstm", weights_url=model_file_path, project="LogSentinel", framework="pytorch")
task.connect(model)
Both variants register the model in model storage with no errors, but neither works once clearml-serving is pointed at them via clearml-serving model add: Triton fails with an error saying it cannot find the model file, and requests to the endpoint return "405 - Method Not Allowed".
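My current working theory (my own assumption, not something from the docs) is that a file:// weights URL is only resolvable on the machine whose filesystem contains the path, so the Triton container has no way to fetch it, whereas an http:// URL pointing at the fileserver (10.14.158.69:8081 in my setup, assuming the default fileserver port) would be fetchable from anywhere. A quick heuristic I used to sanity-check the registered URLs:

```python
from urllib.parse import urlparse

def is_portable_weights_url(url: str) -> bool:
    """Rough check: can a *different* machine (e.g. the Triton container
    on .68) fetch this URL? file:// is local-only; remote schemes are OK.
    This is my own heuristic, not part of the clearml SDK."""
    scheme = urlparse(url).scheme
    return scheme in ("http", "https", "s3", "gs", "azure")

# The file:// URL I registered -- local-only, so serving can't fetch it
print(is_portable_weights_url("file:///home/lab-usr/logsentinel/deeplog-bestloss.pth"))  # False

# A hypothetical fileserver URL (host/port are my assumptions) would pass
print(is_portable_weights_url("http://10.14.158.69:8081/LogSentinel/deeplog-bestloss.pth"))  # True
```

If that theory is right, I'd need to actually upload the weights to the fileserver (or other shared storage) rather than just registering a local path.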