` Starting Task Execution:
usage: train_clearml_pytorch_ignite_caltech_birds.py [-h] [--config FILE]
                                                     [--opts ...]

PyTorch Image Classification Trainer - Ed Morris (c) 2021

optional arguments:
  -h, --help     show this help message and exit
  --config FILE  Path and name of configuration file for training. Should be a
                 .yaml file.
  --opts ...     Modify config options using the command-line 'KEY VALUE'
                 p...
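For context, a minimal sketch of an argparse setup that would produce help output like the above; the script name and option texts come from the output itself, while the completion of the truncated --opts help and the exact parser wiring are assumptions on my part:

```python
import argparse

def build_parser():
    # Hypothetical reconstruction of the CLI shown in the help output above.
    parser = argparse.ArgumentParser(
        description="PyTorch Image Classification Trainer - Ed Morris (c) 2021"
    )
    parser.add_argument(
        "--config", metavar="FILE",
        help="Path and name of configuration file for training. Should be a .yaml file.",
    )
    # '--opts ...' swallows the rest of the command line as KEY VALUE pairs (assumed ending)
    parser.add_argument(
        "--opts", nargs=argparse.REMAINDER, default=[],
        help="Modify config options using the command-line 'KEY VALUE' pairs.",
    )
    return parser

if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.config, args.opts)
```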
SuccessfulKoala55
I can see the issue you are referring to regarding the execution of the Triton docker image, however as far as I am aware, this was not something I explicitly specified. The ServingService.launch_service() method of the ServingService class in the clearml-serving package would appear to have both specified:
` def launch_engine(self, queue_name, queue_id=None, verbose=True):
        # type: (Optional[str], Optional[str], bool) -> None
        """
...
This appears to confirm it as well.
https://github.com/pytorch/pytorch/issues/1158
Thanks AgitatedDove14 , you're very helpful.
I believe the default shared memory allocation for a Docker container is 64 MB, which is obviously not enough for training deep learning image classification networks, but I am unsure of the best solution to fix the problem.
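For reference, the quickest way I know to confirm and raise that limit when launching a container by hand is the --shm-size flag (the 8g value here is just an arbitrary example); the same flag would need to reach whatever actually launches the training container:

```
# Show the default 64 MB /dev/shm, then the same container with a larger allocation
docker run --rm ubuntu:20.04 df -h /dev/shm
docker run --rm --shm-size=8g ubuntu:20.04 df -h /dev/shm
```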
Good question, SuccessfulKoala55
My thoughts are orbiting around environment orchestration and having a bit more control over how an environment is created. I understand that the easiest form of the configuration is to implement it on the clearml-agent side and run a daemon with the configuration as required, whether that be using venvs or docker containers. Of course, this limits the deployment type to the queue that the daemon is listening to.
I was considering if that by exposing the...
I have rerun the serving example with my PyTorch job, but this time I have followed the MNIST Keras example.
I appended a GPU compute resource to the default queue and then executed the service on the default queue.
This resulted in a Triton serving engine container spinning up on the compute resource, however it failed due to the previous issue with port conflicts:
` 2021-06-08 16:28:49
task f2fbb3218e8243be9f6ab37badbb4856 pulled from 2c28e5db27e24f348e1ff06ba93e80c5 by worker ecm-clear...
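A quick way to see what is already bound on the node, assuming the engine is trying to grab the Triton defaults (8000 HTTP, 8001 gRPC, 8002 metrics):

```
# List any listeners already occupying the Triton default ports
ss -ltnp | grep -E ':(8000|8001|8002)([[:space:]]|$)' || echo "ports are free"
```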
SuccessfulKoala55 I may have made some progress with this bug, but have stumbled onto another issue in getting the Triton service up and running.
See comments in the github issue.
AgitatedDove14 I would love to help the project.
I am just about to move house, which is stressful enough without a global pandemic(!), so until that's completed I won't commit to anything. However, once settled in the new place, and I have a bit more time, I would very much welcome contributing.
AgitatedDove14 Thanks for that.
I suppose the same would need to be done for any client PC running clearml from which you are submitting dataset upload jobs?
That is, if the dataset is local to my laptop, or on a development VM that is not in the clearml system, and from there I want to submit a copy of the dataset, then I would need to configure the storage section in the same way as well?
I assume the account name and key refer to the storage account credentials that you can f...
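For what it's worth, this is the shape of the storage section I'm referring to in clearml.conf on the submitting machine; the placeholder values (and my reading that the key is the storage account access key) are assumptions:

```
sdk {
    azure.storage {
        containers: [
            {
                account_name: "<storage-account-name>"
                account_key: "<storage-account-access-key>"
                container_name: "<blob-container-name>"
            }
        ]
    }
}
```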
I'll just take a screenshot from my company's daily standup of data scientists and software developers..... that'll be enough!
In my case it's a Tesla P40, which has 24 GB VRAM.
If I did that, I am pretty sure that's the last thing I'd ever do...... 🤣
I checked the apiserver.log file in /opt/clearml/logs and this appears to be the related error when I try to publish an experiment:
` [2021-06-07 13:43:40,239] [9] [ERROR] [clearml.service_repo] ValidationError (Task:8a4a13bad8334d8bb53d7edb61671ba9) (setup_shell_script.StringField only accepts string values: ['container'])
Traceback (most recent call last):
  File "/opt/clearml/apiserver/bll/task/task_operations.py", line 325, in publish_task
    raise ex
  File "/opt/clearml/a...
SuccessfulKoala55
Good news!
It looks like pulling the new clearml-server version has solved the problem.
I can happily publish models.
Interestingly, I was able to publish models before using this server, so I must have inadvertently updated something that has caused a conflict.
Pffff security.
Data scientist be like....... 😀
Network infrastructure person be like ...... 😱
This might be a silly question, but in order to get the inference working, I am assuming that no specific inference script has to be written for handling the model?
This is what the clearml-serving package takes care of, correct?
SuccessfulKoala55 A second queued job, which executed on the same node but this time didn't need to cache the dataset locally (that had already been done by the previous experiment), hasn't had this issue.
That all being said, apart from the console reporting looking messy, it doesn't appear to have impacted the training, or indeed the metric collection of the first experiment where it occurred.
This job did download pre-trained weights, so the only difference between them is the local dataset cache.
Hi SuccessfulKoala55
Thanks for the input.
I was actually about to grab the new docker-compose.yml and pull the new images.
Weirdly it was working before, so what's changed?
I don't believe I've updated the agents or the clearml SDK on the experiment submission VM either.
I will definitely update the server now, and report back.
FYI, I am training the model again, this time in a project which is not nested, just to rule out any funnies with regards to issues with nested projects.
Oh, so this applies to VRAM, not RAM?
Yup, I can confirm that's the case.
I have just literally installed the latest commit via the master branch and it works.
AgitatedDove14
Ok, after a huge configuration-file detour, we are now back to fixing genuine issues here.
To recap, in order to get the Triton container to run and to be able to connect to Azure Blob Storage, the following changes were made to the launch_engine method of the ServingService class:
For the task creation call:
The docker string was changed to remove the port specifications [to avoid the port conflict error]. The addition of the packages argument was required, as the doc...
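For illustration only (this is not the actual clearml-serving source), a sketch of roughly what such a task creation call could look like after the change; the project/task names, image tag and package list are all assumptions:

```python
from clearml import Task

# Hypothetical sketch of the engine task creation after the changes described above:
# no "-p 8000:8000"-style port mappings in the docker string, and the extra
# python packages the engine needs passed explicitly.
engine_task = Task.create(
    project_name="serving examples",                 # assumed
    task_name="triton serving engine",               # assumed
    docker="nvcr.io/nvidia/tritonserver:21.03-py3",  # image only, no port arguments
    packages=["clearml", "azure-storage-blob"],      # assumed package list
)
```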
Just another thought, couldn't this be caused by using a non-default location for clearml.conf?
I have a clearml.conf in the default location which is configured for the training agents, and I created a separate one for the inference service and put it in a subfolder of my home dir. The agent on the default queue to be used for inference serving was executed using clearml-agent daemon --config-file /path/to/clearml.conf
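Concretely, the two kinds of agents were started along these lines (queue names and paths here are just placeholders):

```
# Training agents, picking up the clearml.conf in the default location
clearml-agent daemon --queue gpu-training --docker

# Inference-serving agent on the default queue, pointed at the separate config
clearml-agent daemon --queue default --docker --config-file ~/serving/clearml.conf
```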
Ok I think I managed to create a docker image of the Triton instance server, just putting the kids to bed, will have a play afterwards.
After finally getting the model to be recognized by the Triton server, it now fails with the attached error messages.
Any ideas AgitatedDove14 ?