 
AgitatedDove14
Ok, after the huge configuration file detour, we are now back to fixing genuine issues here.
To recap, in order to get the Triton container to run and to be able to connect to Azure Blob Storage, the following changes were made to the  launch_engine  method of the  ServingService  class:
For the task creation call:
The docker string was changed to remove the port specifications [to avoid the port conflicts error]. The addition of the packages argument was required, as the doc...
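For reference, a rough sketch of the kind of task creation call I mean is below. The project name, task name, docker image and package list are placeholders of my own, not the actual ServingService code, but it shows the two arguments that were changed (a docker string without port specifications, plus an explicit packages list):
```python
from clearml import Task

# Illustrative only: create the Triton engine task with a docker string that
# omits any "-p 8000:8000" style port mappings (avoiding the port conflict
# error) and pass the extra Python packages the engine needs at runtime.
engine_task = Task.create(
    project_name="serving",                          # placeholder project
    task_name="triton serving engine",               # placeholder task name
    docker="nvcr.io/nvidia/tritonserver:21.03-py3",  # image only, no port flags
    packages=["clearml", "azure-storage-blob"],      # placeholder package list
)
```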
Looking at the  _resolve_base_url()   method of the  StorageHelper class I can see that it is using  furl  to handle the path splitting for getting at the Azure storage account and container names.
Replicating the commands, the first one to get the Storage Account seems to have worked ok:
f = furl.furl(uri)
account_name = f.host.partition(".")[0]
Replicating the above manually seems to give the same answer for both, and it looks correct to me:
` >>> import furl
f_a = furl.fu...
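For anyone following along, here is a small self-contained version of that splitting logic, assuming an https://<account>.blob.core.windows.net/<container>/<path> style URI (the URI below is made up):
```python
import furl

# Made-up Azure Blob Storage URI, purely for illustration
uri = "https://myaccount.blob.core.windows.net/mycontainer/models/model.pt"

f = furl.furl(uri)
# host is "myaccount.blob.core.windows.net", so the storage account
# name is the first dot-separated component
account_name = f.host.partition(".")[0]
# the first path segment is the blob container name
container_name = f.path.segments[0]

print(account_name)    # -> myaccount
print(container_name)  # -> mycontainer
```
The real  _resolve_base_url()  may do more than this, but that's the gist of the account/container extraction as I read it.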
My bad you are correct, it is as you say.
We all remember the days of  dataset_v1.2.34_alpha_with_that_thingy_change_-2.zip
AgitatedDove14  in this remote session on the compute node, where I am manually importing the  clearml  sdk, what's the easiest way to confirm that the Azure credentials are being imported correctly?
From our discussions yesterday on the dockers, I assume that when the orchestration agent daemon is run with a given  clearml.conf , the docker run command uses various flags to pass certain files and environment variables from the host operating system of the co...
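For anyone else hitting this, the simplest check I could come up with from a Python prompt on the node is to try fetching a known blob through the SDK, so it exercises whatever Azure credentials were picked up from the mounted clearml.conf / environment (the URI below is just a placeholder):
```python
from clearml import StorageManager

# Placeholder URI pointing at a small blob that is known to exist.
# If the Azure credentials were imported correctly this downloads the file
# into the local cache and returns its path; otherwise it fails with an
# authentication/authorization error.
local_copy = StorageManager.get_local_copy(
    remote_url="azure://myaccount.blob.core.windows.net/mycontainer/some_small_file.txt"
)
print(local_copy)
```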
SuccessfulKoala55 I may have made some progress with this bug, but have stumbled onto another issue in getting the Triton service up and running.
See comments in the github issue.
I have rerun the serving example with my PyTorch job, but this time I have followed the MNIST Keras example.
I appended a GPU compute resource to the  default  queue and then executed the service on the default queue.
This resulted in a Triton serving engine container spinning up on the compute resource; however, it failed due to the previous issue with port conflicts:
` 2021-06-08 16:28:49
task f2fbb3218e8243be9f6ab37badbb4856 pulled from 2c28e5db27e24f348e1ff06ba93e80c5 by worker ecm-clear...
I am a bit confused, because I can see configuration sections for Azure storage in the clearml.conf files, but these are on the client PC and the clearml-agent compute nodes.
So do these parameters have to be set on the clients and compute nodes individually, or is this something that can be set on the server?
I was thinking that I could run this on the compute node, in the environment that the agent is executed from, but actually it is the environment inside the docker container that the Triton server is executing in.
Could I use the  clearml-agent build   command and the  Triton serving engine  task ID to create a docker container that I could then use interactively to run these tests?
` Starting Task Execution:
usage: train_clearml_pytorch_ignite_caltech_birds.py [-h] [--config FILE]
[--opts ...]
PyTorch Image Classification Trainer - Ed Morris (c) 2021
optional arguments:
-h, --help     show this help message and exit
--config FILE  Path and name of configuration file for training. Should be a
.yaml file.
--opts ...     Modify config options using the command-line 'KEY VALUE'
p...
Mr  AgitatedDove14  Good spot sir!
Sounds like a good candidate, I will test now and report back.
This might be a silly question, but in order to get the inference working, I am assuming that no specific inference script has to be written for handling the model?
This is what the clearml-serving package takes care of, correct?
Thanks  CostlyOstrich36 , you can also get access to the keys in the Azure Storage Explorer.
Looking at the  Properties  section gives the secure keys.
This job did download pre-trained weights, so the only difference between them is the local dataset cache.
Right, I am still a bit confused to be honest.
AgitatedDove14
So can you verify it can download the model?
Unfortunately it's still falling over, but then I got the same result for the credentials using both URI strings, the original, and the modified version, so it points to something else going on.
I note that the  StorageHelper.get()  method has a call which modifies the URI prior to it being passed to the function which gets the storage account and container name. However, when I run this locally, it doesn't seem to do a...
AgitatedDove14  Thanks for that.
I suppose the same would need to be done for any  client  PC running  clearml  from which you are submitting dataset upload jobs?
That is, if the dataset is local to my laptop, or on a development VM that is not in the clearml system, and from there I want to submit a copy of the dataset, then I would need to configure the storage section in the same way as well?
I assume the account name and key refers to the storage account credentials that you can f...
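As a concrete example of what I mean, something along these lines run from the laptop / dev VM (the dataset name, project and Azure URL are placeholders), which I assume would need the same azure storage section configured in that machine's clearml.conf:
```python
from clearml import Dataset

# Placeholder dataset/project names and destination container
ds = Dataset.create(dataset_name="caltech_birds", dataset_project="datasets")
ds.add_files(path="./data/caltech_birds")
# Push the files to Azure Blob Storage rather than the clearml-server default
ds.upload(output_url="azure://myaccount.blob.core.windows.net/clearml-data/datasets")
ds.finalize()
```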
SuccessfulKoala55
Good news!
It looks like pulling the new  clearml-server   version has solved the problem.
I can happily publish models.
Interestingly, I was able to publish models before using this server, so I must have inadvertently updated something that has caused a conflict.
Fixes and identified issues can be found in these github comments.
Closing the discussion here.
AgitatedDove14 ,
Often questions are asked at the beginning of a data science project like "how long will that take?" or "what are the chances it will work to this accuracy?".
To the uninitiated, these would seem like relatively innocent and easy to answer questions. If a person has a project management background, with more clearly defined technical tasks like software development or mechanical engineering, then often work packages and uncertainties relating to outcomes are m...
So I've been testing bits and pieces individually.
For example, I made a custom image for the VMSS nodes, which is based on Ubuntu and has multiple CUDA versions installed, as well as conda and docker pre-installed.
I've managed to test the setup script, so that it executes on a pristine node and results in a compute node being added to the relevant queue, but that's been executed manually by me, as I have the credentials to log on via SSH.
And I had to do things to get the clearml-server the ma...
AgitatedDove14
Just compared two uploads of the same dataset, one to Azure Blob and the other to local storage on clearml-server.
The local storage didn't report any statistics, so it might be confined to the cloud storage method, and specifically Azure.
I dip in and out of Docker, and that one gets me almost every time!
Ah ok, so it's the query string you use with the SAS box. Great.
I have managed to create a docker container from the Triton task and run it in interactive mode; however, I get a different set of errors, but I think these are related to the command line arguments I used to spin up the docker container, compared to the command used by the clearml orchestration system.
My simplified docker command was:  docker run -it --gpus all --ipc=host task_id_2cde61ae8b08463b90c3a0766fffbfe9
However, looking at the Triton inference server object logging, I can see there...
Hmmmm, I thought it logged it with the terminal results when it was uploading weights, but perhaps that's only the live version and the saved version is pruned? Or my memory is wrong.... it is Friday after all!
Can't find anymore reference to it, sorry.
The following code is the training script that was used to set up the experiment. This code has been executed on the server in a separate conda environment and verified to run fine (minus the clearml code).
` from __future__ import print_function, division
import os, pathlib
# Clear ML experiment
from clearml import Task, StorageManager, Dataset
# Local modules
from cub_tools.trainer import Ignite_Trainer
from cub_tools.args import get_parser
from cub_tools.config import get_cfg_defaults
#...
After finally getting the model to be recognized by the Triton server, it now fails with the attached error messages.
Any ideas  AgitatedDove14 ?
Ok I think I managed to create a docker image of the Triton instance server, just putting the kids to bed, will have a play afterwards.