
I don’t have a Scooby Doo what that pickle file is.
I was thinking that I could run it on the compute node in the environment that the agent executes from, but actually it is the environment inside the Docker container that the Triton server is executing in.
Could I use the `clearml-agent build` command and the Triton serving engine task ID to create a Docker container that I could then use interactively to run these tests?
My bad you are correct, it is as you say.
AgitatedDove14 Ok I can do that.
I was just thinking it through.
Would this be best if it were executed in the Triton execution environment?
AgitatedDove14 in this remote session on the compute node, where I am manually importing the `clearml` SDK, what's the easiest way to confirm that the Azure credentials are being imported correctly?
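For reference, one minimal way to test this (the account, container and blob names below are placeholders, not real ones) is to pull a known blob through `StorageManager`, which resolves credentials from `clearml.conf`:

```python
# Minimal credentials smoke test -- the account, container and file names
# below are placeholders; substitute a small blob that actually exists.
from clearml import StorageManager

local_path = StorageManager.get_local_copy(
    remote_url="azure://myaccount.blob.core.windows.net/mycontainer/smoke_test.txt"
)
print(local_path)  # a local cache path on success; None/exception if credentials fail
```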
I assume, from our discussions yesterday on the dockers, that when the orchestration agent daemon is run with a given `clearml.conf`, I can see that the `docker run` command has various flags being used to pass certain files and environment variables from the host operating system of the co...
Right, I am still a bit confused to be honest.
Yup, I can confirm that's the case.
I have just literally installed the latest commit via the master branch and it works.
Mr AgitatedDove14 Good spot sir!
Sounds like a good candidate, I will test now and report back.
We all remember the days of dataset_v1.2.34_alpha_with_that_thingy_change_-2.zip
It’s a PyTorch model trained with the Ignite framework, using one of the three well-known vision model packages: TIMM, PYTORCHCV or TORCHVISION.
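For context, creating such a backbone with TIMM looks roughly like this (the model name is an example choice, not necessarily the one used in the workshop; `num_classes=200` matches the CUB-200-2011 bird dataset mentioned below):

```python
# Illustrative backbone creation with timm; "resnet50" is an example choice,
# num_classes=200 matches the CUB-200-2011 bird species dataset.
import timm

model = timm.create_model("resnet50", pretrained=True, num_classes=200)
```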
Just ran a model which pulled the dataset from Azure Blob Storage, and that seemed to look correct.
```
2021-06-04 13:34:21,708 - clearml.storage - INFO - Downloading: 13.00MB / 550.10MB @ 32.59MBs from Birds%2FDatasets/cub200_2011_train_dataset.37a8f00931b04952a1500e3ada831022/artifacts/data/dataset.37a8f00931b04952a1500e3ada831022.zip
2021-06-04 13:34:21,754 - clearml.storage - INFO - Downloading: 21.00MB / 550.10MB @ 175.54MBs from Birds%2FDatasets/cub200_2011_train_dataset...
```
Like AnxiousSeal95 says, the clearml server will version a dataset for you and push it to a unified storage place, as well as make it diffable.
I’ve written a workshop on how to train image classifiers for the problem of bird species identification and recently I’ve adapted it to work with clearml.
There is an example workbook on how to upload a dataset to the clearml server, in this case a directory of images. See here: https://github.com/ecm200/caltech_birds/blob/master/notebooks/clearml_add...
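The notebook has the full flow; a minimal sketch of the same idea with the clearml `Dataset` API (project/dataset names, paths and the output URL here are placeholders, not the notebook's exact code):

```python
# Sketch only: version a directory of images as a clearml dataset.
from clearml import Dataset

ds = Dataset.create(dataset_name="cub200_2011_train_dataset",
                    dataset_project="Birds/Datasets")
ds.add_files(path="./images/train")  # the directory of images to include
ds.upload(output_url="azure://myaccount.blob.core.windows.net/datasets")
ds.finalize()  # freeze this version; later versions can be diffed against it
```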
You need to make sure the user is part of the `docker` group.
Follow these commands after installing Docker Engine, and don't forget to restart the terminal session for the changes to take full effect.
```
sudo groupadd docker
sudo usermod -aG docker ${USER}
```
Don't install Docker Engine as root, your sysadmin will have kittens!
AgitatedDove14
So can you verify it can download the model?
Unfortunately it's still falling over; I got the same result for the credentials using both URI strings (the original and the modified version), so it points to something else going on.
I note that the `StorageHelper.get()` method has a call which modifies the URI before it is passed to the function which gets the storage account and container name. However, when I run this locally, it doesn't seem to do a...
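(For anyone following along, this is roughly how I'm reproducing the lookup locally; `StorageHelper` is an internal clearml class and the URI below is a placeholder, so treat this as a debugging sketch only.)

```python
# Debugging sketch: resolve the helper for an azure URI the same way the SDK does.
from clearml.storage.helper import StorageHelper

uri = "azure://myaccount.blob.core.windows.net/mycontainer/models/model.pkl"
helper = StorageHelper.get(uri)  # applies any URI rewriting before resolving
print(helper)
```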
AgitatedDove14
Ok, after a huge configuration-file detour, we are now back to fixing genuine issues here.
To recap, in order to get the Triton container to run and to be able to connect to Azure Blob Storage, the following changes were made to the `launch_engine` method of the `ServingService` class:
For the task creation call:
The docker string was changed to remove the port specifications [to avoid the port conflicts error]. The addition of the packages argument was required, as the doc...
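In sketch form, the task creation call now looks something like this (project name, image tag and package list are illustrative assumptions, not the actual diff):

```python
# Hypothetical shape of the modified task-creation call in
# ServingService.launch_engine -- all values here are illustrative.
from clearml import Task

task = Task.create(
    project_name="serving",
    task_name="triton serving engine",
    docker="nvcr.io/nvidia/tritonserver:21.03-py3",  # docker string without -p port mappings
    packages=["clearml", "azure-storage-blob"],      # extra packages the container needs
)
```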
SuccessfulKoala55 WearyLeopard29 could this be a potential idea?
It appears the setup there is for apps on different ports, which seems to me to be exactly the clearml problem.
So could we extrapolate and put in an API app and a FILESERVER app description with the correct ports?
https://gist.github.com/apollolm/23cdf72bd7db523b4e1c
```
# the IP(s) on which your node server is running. I chose port 3000.
upstream app_geoforce {
    server 127.0.0.1:3000;
}

upstream app_pcodes {
    server 12...
```
Ohhhhhhhhhhhhhhhhhhhh... that makes sense.
SuccessfulKoala55 I am not that familiar with AWS. Is that essentially a port-forwarding service, where you have a secure endpoint that redirects to the actual server?
In my case it's a Tesla P40, which has 24 GB VRAM.
Crawls out from under the table and takes a deep breath
AgitatedDove14 you remember we talked about it being a bug or a stupid.....
Well, it's a stupid by me.... somehow I managed to propagate irregularities in the `clearml.conf` file such that it successfully loaded, but the expected nested structure was not there.
When the `get_local_copy()` method requested the model, it correctly got the Azure credentials; however, when the `StorageHelper` class tries to get the Azure cr...
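A quick way to catch this in future (clearml reads the file with pyhocon; the file path below is an assumption) is to parse `clearml.conf` directly and check the nested keys:

```python
# Sanity-check that clearml.conf parses into the expected nested structure.
from pyhocon import ConfigFactory

conf = ConfigFactory.parse_file("/home/username/clearml.conf")  # placeholder path
# Returns the default instead of the containers list if the section is malformed.
print(conf.get("sdk.azure.storage.containers", "MISSING"))
```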
AgitatedDove14 Thanks for that.
I suppose the same would need to be done for any client PC running clearml from which you are submitting dataset upload jobs?
That is, if the dataset is local to my laptop, or on a development VM that is not in the clearml system, but from there I want to submit a copy of the dataset, then I would need to configure the storage section in the same way as well?
I assume the account name and key refers to the storage account credentials that you can f...
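If it helps anyone checking the same thing, a minimal client-side test (account and container names are placeholders) would be a direct upload via `StorageManager`:

```python
# If the storage section on this client is configured correctly,
# a direct upload should succeed; names below are placeholders.
from clearml import StorageManager

StorageManager.upload_file(
    local_file="./dataset.zip",
    remote_url="azure://myaccount.blob.core.windows.net/mycontainer/dataset.zip",
)
```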
I have changed the configuration file created by Certbot to listen on port 8080 instead of port 80, however, when I restart the NGINX service, I get errors relating to bindings.
```
server {
    listen 8080 default_server;
    listen [::]:8080 ipv6only=on default_server;
```
Restarting the service results in the following errors:
```
● nginx.service - A high performance web server and a reverse proxy server
   Loaded: loaded (/lib/systemd/system/nginx.service; enabled; vendor preset: ...
```
SuccessfulKoala55
SUCCESS!!!
This appears to be working.
Set up the certificates using `sudo certbot --nginx`.
Then edit the default configuration file in `/etc/nginx/sites-available`:
```
server {
    listen 80;
    return 301 https://$host$request_uri;
}

server {
    listen 443 ssl;  # the 'ssl' parameter is needed here for TLS to work
    server_name your-domain-name;

    ssl_certificate /etc/letsencrypt/live/your-domain-name/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/your-domain-name/privkey.pem;
    ...
```
Thanks CostlyOstrich36, you can also get access to the keys in the Azure Storage Explorer.
Looking at the Properties section gives the secure keys.
EnviousStarfish54 interesting thoughts thank you for sharing.
We are looking at a hybrid platform like you, but have chosen Prefect for the pipeline orchestration, and we are considering what system to adopt for experiment and model tracking, and ease of deployment.
Oh it's a load balancer, so it does that and more.
But I suppose the point holds though, it provides an end-point for external locations, and then handles the routing to the correct resources.
Fixes and identified issues can be found in these GitHub comments.
Closing the discussion here.
WearyLeopard29 no I wasn’t able to do that although I didn’t explicitly try.
I was wondering if this was as high a security risk as the web portal?
Access is controlled by keys, whereas the web portal is not.
I admit I’m a data scientist, so any proper IT security person would probably end up a shivering wreck in the corner of the room if they saw some of my common security practises. I do try to be secure, but I am not sure how good I am at it.