Starting Task Execution:
usage: train_clearml_pytorch_ignite_caltech_birds.py [-h] [--config FILE]
                                                     [--opts ...]

PyTorch Image Classification Trainer - Ed Morris (c) 2021

optional arguments:
  -h, --help     show this help message and exit
  --config FILE  Path and name of configuration file for training. Should be a
                 .yaml file.
  --opts ...     Modify config options using the command-line 'KEY VALUE'
                 p...
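As an aside, the '--opts ... KEY VALUE' interface in the help text above is the convention used by yacs-style configuration objects. A minimal sketch of how such an interface is typically wired up is below; it assumes a yacs CfgNode, and the TRAIN.* keys are hypothetical rather than taken from this trainer's actual config.

# Sketch only: assumes a yacs-style config; the key names are hypothetical and
# not taken from train_clearml_pytorch_ignite_caltech_birds.py.
from yacs.config import CfgNode as CN

_defaults = CN()
_defaults.TRAIN = CN()
_defaults.TRAIN.BATCH_SIZE = 16
_defaults.TRAIN.LR = 0.001

def load_config(config_file=None, opts=None):
    # Start from defaults, overlay the YAML file, then apply 'KEY VALUE' pairs.
    cfg = _defaults.clone()
    if config_file:
        cfg.merge_from_file(config_file)   # --config path/to/file.yaml
    if opts:
        cfg.merge_from_list(opts)          # --opts TRAIN.BATCH_SIZE 32
    return cfg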
This job did download pre-trained weights, so the only difference between them is the local dataset cache.
SuccessfulKoala55 A second queued job executed on the same node, but this time it didn't need to cache the dataset locally, as that had already been done by the previous experiment, and it hasn't had this issue.
All that being said, apart from the console reporting looking messy, it doesn't appear to have impacted the training, or indeed the metric collection, of the first experiment where it occurred.
Good question, SuccessfulKoala55
My thoughts are orbiting around environment orchestration and having a bit more control over how an environment is created. I understand that the simplest form of configuration is to implement it on the clearml-agent side and run a daemon with the required configuration, whether that be using venvs or docker containers. Of course, this limits the deployment type to the queue that the daemon is listening to.
I was considering whether, by exposing the...
Yes, there's an internal provisioning for it from the Azure VMSS.
However, that would mean passing the hostname back to the AutoScaler class.
Right now, as it's written, the spin_up_worker method doesn't update the class attributes. Following the AWS example, that is also the case there: I can see it merely takes the arguments given, such as the worker id, and constructs a node with those parameters, e.g. hostname etc.
Looking at the supervisor method of the base AutoScaler clas...
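To make the idea concrete, here is a rough sketch of what I mean by passing the hostname back; this is not the real clearml AutoScaler API, and the class, method and attribute names (AzureVMSSAutoScaler, provision_vmss_node, worker_hostnames) are hypothetical.

# Hypothetical sketch only: spin_up_worker records the hostname provisioned by
# the Azure VMSS on the instance, instead of discarding it, so the supervisor
# loop can reconcile workers against hostnames later.
class AzureVMSSAutoScaler:
    def __init__(self):
        self.worker_hostnames = {}   # worker_id -> hostname

    def provision_vmss_node(self, worker_id):
        # Placeholder for the Azure VMSS call that creates the VM and returns
        # the hostname it was provisioned with.
        return "vmss-node-{}".format(worker_id)

    def spin_up_worker(self, worker_id, queue):
        hostname = self.provision_vmss_node(worker_id)
        # Update class state rather than only passing arguments through.
        self.worker_hostnames[worker_id] = hostname
        return hostname

    def supervisor(self):
        # The supervisor loop can now see which hostname belongs to which worker.
        for worker_id, hostname in self.worker_hostnames.items():
            print("worker {} -> {}".format(worker_id, hostname))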
SuccessfulKoala55 However, this was the first time an experiment with this dataset was executed on this compute node. I have been doing a lot of trial and error with this setup to get the models training, so on my first compute node I had the data downloading locally quite early on, and I haven't seen the script have to download a local dataset cache since, as it was already done.
AgitatedDove14
So can you verify it can download the model?
Unfortunately it's still falling over, but then I got the same result for the credentials using both URI strings, the original and the modified version, so it points to something else going on.
I note that the StorageHelper.get() method has a call which modifies the URI before it is passed to the function that gets the storage account and container name. However, when I run this locally, it doesn't seem to do a...
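For illustration only (this is not the clearml StorageHelper code), the kind of parsing I mean looks roughly like the sketch below, assuming a URI of the form azure://<account>.blob.core.windows.net/<container>/<path>; any upstream rewrite of the URI would change what ends up in the account and container fields.

from urllib.parse import urlparse

def split_azure_uri(uri):
    # Illustrative assumption: azure://<account>.blob.core.windows.net/<container>/<path>
    parsed = urlparse(uri)
    account = parsed.netloc.split(".")[0]
    container, _, blob_path = parsed.path.lstrip("/").partition("/")
    return account, container, blob_path

# Hypothetical example:
# split_azure_uri("azure://myaccount.blob.core.windows.net/models/resnet50.pth")
# returns ("myaccount", "models", "resnet50.pth")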
I love the new design of the site.
When is clearml-deploy coming to the open source release?
Or is this a commercial only part?
This appears to confirm it as well.
https://github.com/pytorch/pytorch/issues/1158
Thanks AgitatedDove14 , you're very helpful.
In my case it's a Tesla P40, which has 24 GB VRAM.
Oh, so this applies to VRAM, not RAM?
Does "--ipc=host" make it a dynamic allocation then?
I believe the standard shared memory allocation for a Docker container is 64 MB, which is obviously not enough for training deep learning image classification networks, but I am unsure of the best way to fix the problem.
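For what it's worth, the 64 MB figure is Docker's default /dev/shm, which is host RAM-backed shared memory used by the PyTorch DataLoader workers rather than GPU VRAM. The two usual fixes are raising /dev/shm with --shm-size or sharing the host IPC namespace with --ipc=host; a sketch of both via the Docker SDK for Python follows, where the image name, command and size are only placeholders.

import docker

client = docker.from_env()

# Option 1: raise the container's /dev/shm above the 64 MB default so the
# DataLoader workers have room to exchange batches.
client.containers.run(
    "nvcr.io/nvidia/pytorch:23.10-py3",   # placeholder image
    command="python train.py",            # placeholder command
    shm_size="8g",
    detach=True,
)

# Option 2: share the host's IPC namespace instead, which removes the fixed
# /dev/shm cap entirely (equivalent to docker run --ipc=host).
client.containers.run(
    "nvcr.io/nvidia/pytorch:23.10-py3",
    command="python train.py",
    ipc_mode="host",
    detach=True,
)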
If I did that, I am pretty sure that's the last thing I'd ever do...... 🤣
Pffff security.
Data scientist be like....... 😀
Network infrastructure person be like ...... 😱