VivaciousPenguin66
Moderator
17 Questions, 106 Answers
  Active since 10 January 2023
  Last activity 8 months ago

Reputation: 0

Badges (1): 92 × Eureka!
0 Votes 1 Answer 202 Views
Silly question alert... Really simple one to start with. If I have more or less the default settings for a clearml-agent on a compute node, so therefo...
2 years ago
0 Votes 7 Answers 190 Views
///[Please note, all the below was executed on the command line of the compute node, not the server head node]/// I've been following the example on Keras, b...
2 years ago
0 Votes 2 Answers 183 Views
I was wondering, if I want to use Task.create() instead of Task.init() to create a new experiment object, I am aware that automatic logging will not be done....
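For context, here is a minimal sketch of the two creation paths the question contrasts (project and task names are hypothetical):

from clearml import Task

# Task.init() registers the running script as an experiment and switches on
# automatic logging (framework hooks, console output, git info, etc.).
task = Task.init(project_name="Demo", task_name="init example")

# Task.create() only creates a task entry; the running process is not
# instrumented, so any logging must be done explicitly.
task = Task.create(project_name="Demo", task_name="create example")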
2 years ago
0 Votes 4 Answers 209 Views
I have just installed the PyPI version of clearml-serving and I get the following error at the command line. clearml-serving --help clearml-serving - CLI for...
2 years ago
0 Votes 10 Answers 240 Views
When I set up my local virtual environment I use a combination of conda and pip. I use conda as my environment manager, and then use pip for packages that are...
2 years ago
0 Votes 4 Answers 196 Views
Are there any tips for how to set these boxes in the profile for access to Azure Blob Storage using SAS? I can create a Shared Access Signature (SAS) through the A...
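As the answers to this question note further down, the value that goes in the SAS box is the query-string portion of the SAS URL that Azure generates; a minimal sketch with a hypothetical URL:

from urllib.parse import urlsplit

# Hypothetical SAS URL, as copied from the Azure portal or Storage Explorer.
sas_url = "https://myaccount.blob.core.windows.net/mycontainer?sv=2020-08-04&ss=b&sig=abc123"

# The SAS token is everything after the '?'.
sas_token = urlsplit(sas_url).query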
2 years ago
0 Votes 1 Answer 261 Views
Question when using remote storage blobs (e.g. Azure). I am using it as an output_url location, and it is storing both datasets, and also experiment artefacts...
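For reference, a minimal sketch of pointing a task's output at remote storage; the parameter in Task.init() is output_uri, and the account and container shown here are hypothetical:

from clearml import Task

# Models and artifacts produced by this task are uploaded under output_uri.
task = Task.init(
    project_name="Demo",
    task_name="remote storage example",
    output_uri="azure://myaccount.blob.core.windows.net/mycontainer",
)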
2 years ago
0 Votes 15 Answers 205 Views
2 years ago
0 Votes 1 Answer 232 Views
Does anyone have an example of how to use the services queue to start a load balancer on Azure? Virtual Machine Scale Sets through the Azure Management Pytho...
2 years ago
0 Votes 2 Answers 235 Views
I have got experiments training PyTorch networks on a remote compute run by clearml-agent. I am using the Ignite framework to train image classification net...
2 years ago
0 Votes 6 Answers 181 Views
I have been successfully deploying and training a PyTorch CNN on a clearml-agent managed compute resource and have been testing some of the capabilities, includ...
2 years ago
0 Votes 8 Answers 256 Views
2 years ago
0 Votes 10 Answers 219 Views
This wasn't a big deal, but I noticed when pushing a dataset to the server, with cloud storage, that the upload information looked a bit bonkers in terms of ...
2 years ago
0 Votes 30 Answers 192 Views
With clearml-serving could someone explain to me what a config.pbtxt file is and its format? When executing a PyTorch model for serving I get an error pasted...
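For anyone landing here: config.pbtxt is Triton Inference Server's per-model configuration file, written in protobuf text format. A minimal, hypothetical example for a TorchScript image classifier (model name, dims, and class count are illustrative only), written out from Python:

# Hypothetical config.pbtxt contents for a TorchScript model on Triton.
config_pbtxt = """
name: "cub200_resnet34"
platform: "pytorch_libtorch"
input [
  {
    name: "INPUT__0"
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]
  }
]
output [
  {
    name: "OUTPUT__0"
    data_type: TYPE_FP32
    dims: [ 200 ]
  }
]
"""

with open("config.pbtxt", "w") as f:
    f.write(config_pbtxt.strip())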
2 years ago
0 Votes 30 Answers 189 Views
I buried this issue in another thread to do with deployment, but I was wondering if anyone else has had problems using clearml-serving package to serve a PyT...
2 years ago
0 Votes 17 Answers 324 Views
2 years ago
0 Votes 5 Answers 185 Views
I have set up a clearml-server running on an Azure VM instance and have used default parameters when it comes to specifying storage locations for data and arte...
2 years ago
0 I Buried This Issue In Another Thread To Do With Deployment, But I Was Wondering If Anyone Else Has Had Problems Using

Just another thought: could this be caused by using a non-default location for clearml.conf?

I have a clearml.conf in the default location which is configured for training agents, and I created a separate one for the inference service and put it in a subfolder of my home dir. The agent on the default queue to be used for inference serving was executed using clearml-agent daemon --config-file /path/to/clearml.conf

2 years ago
0 I Buried This Issue In Another Thread To Do With Deployment, But I Was Wondering If Anyone Else Has Had Problems Using

AgitatedDove14

Ok so I ran both variations and I got the same results.

>>> from clearml import StorageManager

uri_a = ' Birds%2FTraining/TRAIN [Network%3A resnet34, Library%3A torchvision] Ignite Train PyTorch CNN on CUB200.8611ada5be6f4bb6ba09cf730ecd2253/models/cub200_resnet34_ignite_best_model_0.pt'
uri_b = ' Birds%2FTraining/TRAIN [Network%3A resnet34, Library%3A torchvision] Ignite Train PyTorch CNN on CUB200.8611ada5be6f4bb6ba09cf730ecd2253/models/cub200_resnet34_...
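The %2F and %3A sequences in those URIs are URL-encoded '/' and ':' characters; a small illustrative sketch of what decoding yields (path shortened):

from urllib.parse import unquote

encoded = "Caltech Birds%2FTraining/TRAIN [Network%3A resnet34]"
print(unquote(encoded))  # -> Caltech Birds/Training/TRAIN [Network: resnet34]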

2 years ago
0 I Buried This Issue In Another Thread To Do With Deployment, But I Was Wondering If Anyone Else Has Had Problems Using

It’s a PyTorch model trained with the Ignite framework, using one of the three well-known vision model packages: TIMM, PYTORCHCV, or TORCHVISION.

2 years ago
0 I Buried This Issue In Another Thread To Do With Deployment, But I Was Wondering If Anyone Else Has Had Problems Using

AgitatedDove14 Ok I can do that.
I was just thinking it through.
Would this be best if it were executed in the Triton execution environment?

2 years ago
0 I Buried This Issue In Another Thread To Do With Deployment, But I Was Wondering If Anyone Else Has Had Problems Using

Ok I think I managed to create a docker image of the Triton instance server, just putting the kids to bed, will have a play afterwards.

2 years ago
0 I Buried This Issue In Another Thread To Do With Deployment, But I Was Wondering If Anyone Else Has Had Problems Using

So, moving on to the container name.
The original code has the following calls:

if not f.path.segments:
    raise ValueError(
        "URI {} is missing a container name (expected "
        "[https/azure]://<account-name>.../<container-name>)".format(uri)
    )
container = f.path.segments[0]
Repeating the same commands locally results in the following:

>>> f_a.path.segments
['artefacts', 'Caltech Birds%2FTraining', 'TRAIN...
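In other words, the check expects the container name to be the first path segment of the parsed URI. A rough standard-library approximation of that logic (the original parses with furl, so this is only a sketch):

from urllib.parse import urlparse

def container_from_uri(uri):
    # For azure://<account-name>.../<container-name>/..., the container should
    # be the first non-empty path segment; an empty path means a malformed URI.
    segments = [s for s in urlparse(uri).path.split("/") if s]
    if not segments:
        raise ValueError("URI {} is missing a container name".format(uri))
    return segments[0]

print(container_from_uri("azure://myaccount.blob.core.windows.net/artefacts/some/path"))
# -> artefacts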

2 years ago
0 Are There Any Tips For How To Set These Boxes In The Profile For Access To

Ah ok, so it's the query string you use with the SAS box. Great.

2 years ago
0 ///[Please Note, All The Below Was Executed On The Command Line Of The Compute Node,

SuccessfulKoala55
I can see the issue you are referring to regarding the execution of the Triton docker image; however, as far as I am aware, this was not something I explicitly specified. The ServingService.launch_service() method of the ServingService class from the clearml-serving package would appear to have specified both:

def launch_engine(self, queue_name, queue_id=None, verbose=True):
    # type: (Optional[str], Optional[str], bool) -> None
    """
...

2 years ago
0 ///[Please Note, All The Below Was Executed On The Command Line Of The Compute Node,

SuccessfulKoala55 I may have made some progress with this bug, but have stumbled onto another issue in getting the Triton service up and running.

See comments in the GitHub issue.

2 years ago
0 ///[Please Note, All The Below Was Executed On The Command Line Of The Compute Node,

I have rerun the serving example with my PyTorch job, but this time I have followed the MNIST Keras example.
I attached a GPU compute resource to the default queue and then executed the service on the default queue.
This resulted in a Triton serving engine container spinning up on the compute resource; however, it failed due to the previous issue with port conflicts:

2021-06-08 16:28:49
task f2fbb3218e8243be9f6ab37badbb4856 pulled from 2c28e5db27e24f348e1ff06ba93e80c5 by worker ecm-clear...

2 years ago
0 Are There Any Tips For How To Set These Boxes In The Profile For Access To

Thanks CostlyOstrich36, you can also access the keys in Azure Storage Explorer.
Looking at the Properties section gives the secure keys.

2 years ago
0 I Am Having An Issue Publishing A Completed Model Training. The Model Has Been Deployed On Remote Compute, Using A Docker Image, And The Datasets Have Been Served From An Azure Blob Storage Account. The Model Trains Successfully, And Completes, After The

Hi SuccessfulKoala55
Thanks for the input.
I was actually about to grab the new docker-compose.yml and pull the new images.
Weirdly it was working before, so what's changed?
I don't believe I've updated the agents or the clearml sdk on the experiment submission vm either.
I will definitely update the server now, and report back.

2 years ago
0 Greetings And Hello

I love the new design of the site.

When is clearml-deploy coming to the open source release?
Or is this a commercial only part?

2 years ago
0 With

AgitatedDove14 apologies, I re-read my previous message, and I think perhaps it came across as way more passive-aggressive than I was intending. Amazing how missing a few words from a sentence can change the entire meaning! 😀

What I meant to say was, it's going to be a busy few months for us whilst we move house, so I didn't want to say I'd contribute and then disappear for two months!

I've been working on an Azure load balancer example, heavily based on the AWS example. The load balanc...

2 years ago
0 With

I think so.
I am doing this with one hand tied behind my back at the moment, because I am waiting to get an Azure AD App and Services policy set up, to enable the autoscaler to authenticate with the Azure VMSS via the Python SDK.

2 years ago
0 With

What I really like about ClearML is the potential for capturing development at an early stage, as it requires only minimal adjustment of code for it to be, at the very least, captured as an experiment, even if it is run locally on one's machine.

What we would like ideally, is a system where development, training, and deployment are almost one and the same thing, to reduce the lead time from development code to production models. Removing as many translation layers as you can between the developmen...

2 years ago
0 With

So I've been testing bits and pieces individually.
For example, I made a custom image for the VMSS nodes, which is based on Ubuntu and has multiple CUDA versions installed, as well as conda and docker pre-installed.
I've managed to test the setup script, so that it executes on a pristine node and results in a compute node being added to the relevant queue, but that's been executed manually by me, as I have the credentials to log on via SSH.
And I had to do things to get the clearml-server the ma...

2 years ago
0 With

This is very cool, any reason for not using dockers for the multiple CUDA versions?

AgitatedDove14 my inexperience in using them much until recently. I can see how that is a better solution, and it's something I am actively trying to improve my understanding and use of.
I am now relatively comfortable with producing a Dockerfile for example, although I've not got as far as making any docker-compose related things yet.

2 years ago
0 With

So, AgitatedDove14 what I really like about the approach with ClearML is that you can genuinely bring the architecture into the development process early. That has a lot of desirable outcomes, including versioning and recording of experiments, dataset versioning etc. Also it would enforce a bit more structure in project development, if things are required to fit into a bit more of a defined box (or boxes). However, it also seems to be not too prescriptive, such that I would worry that a lot...

2 years ago
0 With

I think I failed in explaining myself. I meant instead of multiple CUDA versions installed on the same host/docker, wouldn't it make sense to just select a different out-of-the-box docker with the right CUDA, directly from the public nvidia dockerhub offering? (This is just another argument on the Task that you can adjust.) Wouldn't that be easier for users?

Absolutely aligned with you there AgitatedDove14 . I understood you correctly.
My default is to work with native VM images, a...

2 years ago
0 With

Absolutely AgitatedDove14 !

2 years ago
0 I Buried This Issue In Another Thread To Do With Deployment, But I Was Wondering If Anyone Else Has Had Problems Using

AgitatedDove14

Ok, after a huge configuration-file detour, we are now back to fixing genuine issues here.

To recap, in order to get the Triton container to run and to be able to connect to Azure Blob Storage, the following changes were made to the launch_engine method of the ServingService class:

For the task creation call:

The docker string was changed to remove the port specifications [to avoid the port-conflicts error]. The addition of the packages argument was required, as the doc...
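A hypothetical before/after of the docker string change described above (image tag, ports, and package list are illustrative, not the exact values from the code):

# Before: explicit port publishing, which clashed with ports already bound on
# the node (the port-conflicts error mentioned above).
docker_before = "nvcr.io/nvidia/tritonserver:21.03-py3 -p 8000:8000 -p 8001:8001 -p 8002:8002"

# After: no -p flags; extra Python dependencies are passed separately via the
# task's packages argument instead of being assumed present in the image.
docker_after = "nvcr.io/nvidia/tritonserver:21.03-py3"
packages = ["azure-storage-blob>=12.0.0"]  # hypothetical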

2 years ago
0 I Have Been Successfully Deploying And Training A Pytorch Cnn On A

SuccessfulKoala55 A second queued job, which executed on the same node but this time didn't need to cache the dataset locally (as that had been done by the previous experiment), hasn't had this issue.

That all being said, apart from the console reporting looking messy, it doesn't appear to have impacted the training, or indeed the metric collection of the first experiment where it occurred.

2 years ago
0 I Have Been Successfully Deploying And Training A Pytorch Cnn On A

SuccessfulKoala55 However, this was the first time an experiment with this dataset was executed on this compute node. I have been doing a lot of trial and error with this setup to get the models training, and so on my first compute node I had the data downloading locally quite early on, so I hadn't seen the script have to download a local dataset cache, as that had already been done.

2 years ago
0 I Have Been Successfully Deploying And Training A Pytorch Cnn On A

Starting Task Execution:

usage: train_clearml_pytorch_ignite_caltech_birds.py [-h] [--config FILE]
[--opts ...]

PyTorch Image Classification Trainer - Ed Morris (c) 2021

optional arguments:
-h, --help show this help message and exit
--config FILE Path and name of configuration file for training. Should be a
.yaml file.
--opts ... Modify config options using the command-line 'KEY VALUE'
p...

2 years ago
0 Hi, I Have A Few Questions Regards To

AnxiousSeal95 absolutely agree with you!

When you are put in a situation where a production model has failed, or is not performing as expected, and you as a company are deriving revenue off that service, you very quickly have to diagnose the severity of the problem and what is potentially causing it. As you clearly make out, the degrees of freedom which go into why a given model may behave differently include the code itself, the data, the pre-processing steps, the training...

2 years ago