
I am a bit confused because I can see configuration sections for Azure storage in the clearml.conf files, but these are on the client PC and the clearml-agent compute nodes.
So do these parameters have to be set on the clients and compute nodes individually, or is this something that can be set on the server?
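For reference, these are the sections I mean: a minimal sketch of the Azure storage part of a `clearml.conf`, with placeholder values (my assumption is that this currently lives in each file, rather than centrally on the server):
```
sdk {
    azure.storage {
        containers: [
            {
                account_name: "<storage-account-name>"
                account_key: "<storage-account-key>"
                # container_name: "<optionally limit to a single container>"
            }
        ]
    }
}
```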
AgitatedDove14 I would love to help the project.
I am just about to move house, which is stressful enough without a global pandemic(!), so until that's completed I won't commit to anything. However, once settled in the new place, and I have a bit more time, I would very much welcome contributing.
AgitatedDove14 I think the major issue is working out how to get the setup of the node dynamically passed to the VMSS, so that when it creates a node it does the following:
1. Provisions the correct environment for the `clearml-agent`.
2. Installs the `clearml-agent` and sets up the `clearml.conf` file with the access credentials for the server and file storage.
3. Executes the `clearml-agent` on the correct queue, ready for accepting jobs.
In Azure VMSS, there is a method called "Cust...
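Assuming that's the VMSS custom data mechanism, here is a rough sketch of how I imagine wiring it up from the autoscaler side. This is illustrative rather than tested: the names are placeholders, and it assumes the `azure-mgmt-compute` and `azure-identity` packages.
```
# Illustrative only: inject a startup script into a VMSS as custom data,
# so each newly created node installs and starts the clearml-agent on boot.
import base64
from azure.identity import DefaultAzureCredential
from azure.mgmt.compute import ComputeManagementClient

STARTUP_SCRIPT = """#!/bin/bash
python3 -m pip install clearml-agent
# write ~/clearml.conf with the server + storage credentials here (omitted)
clearml-agent daemon --queue gpu --detached
"""

client = ComputeManagementClient(DefaultAzureCredential(), "<subscription-id>")
vmss = client.virtual_machine_scale_sets.get("<resource-group>", "<vmss-name>")
# VMSS custom data must be base64-encoded; new instances receive it via cloud-init
vmss.virtual_machine_profile.os_profile.custom_data = base64.b64encode(
    STARTUP_SCRIPT.encode()).decode()
client.virtual_machine_scale_sets.begin_create_or_update(
    "<resource-group>", "<vmss-name>", vmss)
```
Note that custom data only applies to instances created after the update, which is fine here since the whole point is configuring nodes as the scale set spins them up.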
What I really like about ClearML is the potential for capturing development at an early stage, as it requires only minimal adjustment of code for it to be, at the very least, captured as an experiment, even if it is run locally on one's machine.
What we would ideally like is a system where development, training, and deployment are almost one and the same thing, to reduce the lead time from development code to production models. Removing as many translation layers as you can between the developmen...
Or even better, `dataset_v1.2.34_alpha_with_that_thingy_change_-2_copy_copy.zip`
AgitatedDove14
Ok, after a huge configuration-file detour, we are now back to fixing genuine issues here.
To recap, in order to get the Triton container to run and to be able to connect to Azure Blob Storage, the following changes were made to the `launch_engine` method of the `ServingService` class:
For the task creation call: the docker string was changed to remove the port specifications [to avoid the port conflicts error]. The addition of the packages argument was required, as the doc...
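To give a flavour of the shape of the change, the task creation ended up looking something like this. This is a reconstruction with placeholder image and package values, not the actual `ServingService` code:
```
from clearml import Task

# Reconstruction with placeholder values, not the actual ServingService code.
engine_task = Task.create(
    project_name="serving",
    task_name="triton serving engine",
    # docker string with the port mappings removed, to avoid the port conflict error
    docker="nvcr.io/nvidia/tritonserver:21.03-py3",
    # packages argument added so the container can talk to Azure Blob Storage
    packages=["clearml", "azure-storage-blob>=12.0.0"],
)
```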
AgitatedDove14 in this remote session on the compute node, where I am manually importing the `clearml` SDK, what's the easiest way to confirm that the Azure credentials are being imported correctly?
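The quickest check I can think of (a sketch, assuming the `StorageManager` API and a placeholder URI) would be to pull a known blob and see whether authentication succeeds:
```
from clearml import StorageManager

# If the Azure credentials in clearml.conf are being picked up, this should
# download the blob to the local cache; if not, it should fail to authenticate.
local_path = StorageManager.get_local_copy(
    remote_url="azure://<account>/<container>/path/to/some/known/file")
print(local_path)
```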
I assume from our discussions yesterday on the dockers, that when the orchestration agent daemon is run with a given `clearml.conf`, I can see that the docker run command has various flags being used to pass certain files and environment variables from the host operating system of the co...
I don’t have a scooby doo what that pickle file is.
Thanks for the last tip, "easy mistaker to maker"
So, AgitatedDove14 what I really like about the approach with ClearML is that you can genuinely bring the architecture into the development process early. That has a lot of desirable outcomes, including versioning and recording of experiments, dataset versioning etc. Also it would enforce a bit more structure in project development, if things are required to fit into a bit more of a defined box (or boxes). However, it also seems to be not too prescriptive, such that I would worry that a lot...
AgitatedDove14
So can you verify it can download the model?
Unfortunately it's still falling over, but then I got the same result for the credentials using both URI strings, the original, and the modified version, so it points to something else going on.
I note that the `StorageHelper.get()` method has a call which modifies the URI prior to it being passed to the function which gets the storage account and container name. However, when I run this locally, it doesn't seem to do a...
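For what it's worth, this is roughly how I'm poking at it locally. Note this uses what I understand to be an internal API, so the import path and the `azure://` URI shape are assumptions:
```
from clearml.storage.helper import StorageHelper  # internal API, import path assumed

uri = "azure://<account>.blob.core.windows.net/<container>/path/to/model.pkl"
helper = StorageHelper.get(uri)  # should resolve Azure credentials from clearml.conf
print(type(helper))
```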
AnxiousSeal95
I think I can definitely see value in that.
I found that once you go beyond the easy examples, where you are largely using datasets that are curated as part of a Python package, it took a bit of effort to get my head around the dataset tools.
Likewise with the deployment side of things, and the Triton inference engine, there are certain aspects of that which I am relatively new to, so to go from the simple Keras example, to getting a feeling that the tool will cover the use ...
AgitatedDove14 that started out a lot shorter, and I read it twice, but I think it answers your question..... 😉
AgitatedDove14
Just compared two uploads of the same dataset, one to Azure Blob and the other to local storage on clearml-server.
The local storage didn't report any statistics, so it might be confined to the cloud storage method, and specifically Azure.
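For the record, the comparison was essentially the same upload pointed at two different targets, something like this sketch (placeholder names; the `output_url` handling is my reading of the `Dataset` API):
```
from clearml import Dataset

# Same dataset, two storage targets (placeholder project and paths).
ds = Dataset.create(dataset_project="birds", dataset_name="cub200_2011")
ds.add_files("/path/to/images")

# Target 1: Azure Blob Storage
ds.upload(output_url="azure://<account>.blob.core.windows.net/<container>")
# Target 2 (separate run): default clearml-server file storage
# ds.upload()
ds.finalize()
```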
When I run the commands you suggested above on the compute node, but in the host system's conda environment (the one I installed to run the agent daemon from), I get the same issues as we appear to have seen when executing the Triton inference service.
```
(py38_clearml_serving_git_dev) edmorris@ecm-clearml-compute-gpu-002:~$ python
Python 3.8.10 (default, May 19 2021, 18:05:58)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
...
```
I should say, the company I am working for, Malvern Panalytical, is developing an internal MLOps capability, and we are starting to develop a containerized deployment system for developing, training, and deploying machine learning models. Right now we are at the early stages of development, and our current solution is based on using Azure MLOps, which I personally find very clunky.
So I have been tasked with investigating alternatives to replace the training and model deployment side of thing...
Right, I am still a bit confused to be honest.
Yes, there's an internal provisioning for it from the Azure VMSS.
However, that would mean passing back the hostname to the Autoscaler class.
Right now, as it's written, the `spin_up_worker` method doesn't update the class attributes. Following the AWS example, that is also the case there: I can see it merely takes the arguments given, such as the worker id, and constructs a node with those parameters, e.g. hostname etc.
Looking at the `supervisor` method of the base `AutoScaler` clas...
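One option I'm turning over is to have the subclass record the hostname itself. This is a sketch only: the import path and the `spin_up_worker` signature are assumed from the AWS example, and I haven't verified them against the base class:
```
from clearml.automation.auto_scaler import AutoScaler  # import path assumed


class AzureAutoScaler(AutoScaler):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # keep track of what we provisioned, since the base class doesn't
        self.worker_hostnames = {}

    def spin_up_worker(self, resource, worker_id_prefix, queue_name):
        # ...provision the VMSS instance here (omitted)...
        worker_id = f"{worker_id_prefix}:{resource}"
        hostname = "<hostname returned by the VMSS provisioning call>"
        self.worker_hostnames[worker_id] = hostname
```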
I think perhaps, as standard, the `docker` group is already created.
The bit that isn't done is making your user part of that group (e.g. `sudo usermod -aG docker $USER`).
Crawls out from under the table and takes a deep breath
AgitatedDove14 you remember we talked about it being a bug or a stupid.....
Well, it's a stupid by me.... somehow I managed to propagate irregularities in the `clearml.conf` file such that it successfully loaded, but the expected nested structure was not there.
When the `get_local_copy()` method requested the model, it correctly got the Azure credentials; however, when the `StorageHelper` class tries to get the Azure cr...
It's a PyTorch model trained with the Ignite framework, using one of the three well-known vision model packages: TIMM, PYTORCHCV, or TORCHVISION.
This was the code:
```
import os
import argparse
# ClearML modules
from clearml import Dataset

parser = argparse.ArgumentParser(description='CUB200 2011 ClearML data uploader - Ed Morris (c) 2021')
parser.add_argument(
    '--dataset-basedir',
    dest='dataset_basedir',
    type=str,
    help='The directory to the root of the dataset',
    default='/home/edmorris/projects/image_classification/caltech_birds/data/images')
parser.add_argument(
    '--clearml-project',
    dest='clearml_projec...
```
EnviousStarfish54 we are at the beginning phases of exploring potential solutions to MLOps, so I have only been playing with the tools, including the dataset side of things. However, I think that an integral part of capturing a model in its entirety is being able to make sure that you know what went into making it. So I see being able to version and diff datasets as just as important as the code, or the environment in which it is run.
I think so.
I am doing this with one hand tied behind my back at the moment, because I am waiting to get an Azure AD App and Services policy set up, to enable the autoscaler to authenticate with the Azure VMSS via the Python SDK.
If my memory serves me correctly, I think it happened on weights saving as well, let me just check an experiment log and see.
Good question, SuccessfulKoala55
My thoughts are orbiting around environment orchestration and having a bit more control over how an environment is created. I understand that the easiest form of the configuration is to implement it on the clearml-agent side and run a daemon with the configuration as required, whether that be using venvs or docker containers. Of course, this limits the deployment type to the queue that the daemon is listening to.
I was considering whether, by exposing the...
This might potentially be a silly question, but in order to get the inference working, I am assuming that no specific inference script has to be written for handling the model?
This is what the clearml-serving package takes care of, correct?