VivaciousPenguin66
Moderator
17 Questions, 107 Answers
  Active since 10 January 2023
  Last activity 6 months ago

Reputation: 0
Badges: 1
93 × Eureka!
0 Votes 2 Answers 860 Views
I was wondering, if I want to use Task.create() instead of Task.init() to create a new experiment object, I am aware that automatic logging will not be done....
3 years ago
0 Votes 1 Answer 915 Views
Silly question alert...... Really simple one to start with. If I have more or less the default settings for a clearml-agent on a compute node, so therefo...
3 years ago
0 Votes 30 Answers 885 Views
I buried this issue in another thread to do with deployment, but I was wondering if anyone else has had problems using clearml-serving package to serve a PyT...
3 years ago
0 Votes 7 Answers 876 Views
///[Please note, all the below was executed on the command line of the compute node, not the server head node]/// I've been following the example on Keras, b...
3 years ago
0 Votes 1 Answer 1K Views
Question when using remote storage blobs (e.g. Azure). I am using it as an output_url location, and it is storing both datasets and also experiment artefacts...
3 years ago
0 Votes 10 Answers 920 Views
This wasn't a big deal, but I noticed when pushing a dataset to the server, with cloud storage, that the upload information looked a bit bonkers in terms of ...
3 years ago
0 Votes 8 Answers 1K Views
3 years ago
0 Votes 30 Answers 934 Views
With clearml-serving could someone explain to me what a config.pbtxt file is and its format? When executing a PyTorch model for serving I get an error pasted...
3 years ago
0 Votes 5 Answers 881 Views
Are there any tips for how to set these boxes in the profile for access to Azure Blob Storage using SAS? I can create a Shared Access Signature (SAS) through the A...
3 years ago
0 Votes 2 Answers 957 Views
I have got experiments training PyTorch networks on a remote compute run by clearml-agent . I am using the Ignite framework to train image classification net...
3 years ago
0 Votes 4 Answers 902 Views
I have just installed the PYPI version of clearml-serving and I get the following error at the command line. clearml-serving --help clearml-serving - CLI for...
3 years ago
0 Votes 6 Answers 907 Views
I have been successfully deploying and training a PyTorch CNN on a clearml-agent managed compute resource and have been testing some of the capabilities, includ...
3 years ago
0 Votes 10 Answers 1K Views
When I setup my local virtual environment I use a combination of Conda and pip. I use conda as my environment manager, and then use pip for packages that are...
3 years ago
0 Votes 15 Answers 977 Views
3 years ago
0 Votes 18 Answers 1K Views
3 years ago
0 Votes 1 Answer 1K Views
Does anyone have an example of how to use the services queue to start a load balancer on Azure? Virtual Machine Scale Sets through the Azure Management Pytho...
3 years ago
0 Votes 5 Answers 869 Views
I have set up a clearml-server running on an Azure VM instance and have used default parameters when it comes to specifying storage locations for data and arte...
3 years ago
0 ///[Please Note, All The Below Was Executed On The Command Line Of The Compute Node,

SuccessfulKoala55
I can see the issue you are referring to regarding the execution of the triton docker image; however, as far as I am aware, this was not something I explicitly specified. The ServingService.launch_service() method of the ServingService class from the clearml-serving package would appear to have both specified:

def launch_engine(self, queue_name, queue_id=None, verbose=True):
    # type: (Optional[str], Optional[str], bool) -> None
    """
...
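
A quick way to confirm where that default comes from, assuming the class lives at clearml_serving.serving_service (I have not double-checked the exact import path), is to dump the method's source from the installed package and look for the docker image string:

# Sketch: inspect the installed clearml-serving package to see where the
# triton docker image default is set. The import path is my assumption.
import inspect
from clearml_serving.serving_service import ServingService

print(inspect.signature(ServingService.launch_engine))
print(inspect.getsource(ServingService.launch_engine))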

3 years ago
0 ///[Please Note, All The Below Was Executed On The Command Line Of The Compute Node,

SuccessfulKoala55 I may have made some progress with this bug, but have stumbled onto another issue in getting the Triton service up and running.

See comments in the github issue.

3 years ago
0 With

AgitatedDove14 ,

Often questions are asked at the beginning of a data science project, like "how long will that take?" or "what are the chances it will work to this accuracy?".

To the uninitiated, these would seem like relatively innocent and easy-to-answer questions. If a person has a project management background with more clearly defined technical tasks, like software development or mechanical engineering, then often work packages and uncertainties relating to outcomes are m...

3 years ago
0 With

So, AgitatedDove14 what I really like about the approach with ClearML is that you can genuinely bring the architecture into the development process early. That has a lot of desirable outcomes, including versioning and recording of experiments, dataset versioning etc. Also it would enforce a bit more structure in project development, if things are required to fit into a bit more of a defined box (or boxes). However, it also seems to be not too prescriptive, such that I would worry that a lot...

3 years ago
0 With

AgitatedDove14 I would love to help the project.
I am just about to move house, which is stressful enough without a global pandemic(!), so until that's completed I won't commit to anything. However, once settled in the new place, and I have a bit more time, I would very much welcome contributing.

3 years ago
0 With

Absolutely AgitatedDove14 !

3 years ago
0 I Buried This Issue In Another Thread To Do With Deployment, But I Was Wondering If Anyone Else Has Had Problems Using

AgitatedDove14

So can you verify it can download the model?

Unfortunately it's still falling over, but then I got the same result for the credentials using both URI strings, the original, and the modified version, so it points to something else going on.

I note that the StorageHelper.get() method has a call which modifies the URI prior to it being passed to the function which gets the storage account and container name. However, when I run this locally, it doesn't seem to do a...

3 years ago
0 With

AgitatedDove14 apologies, I read my previous message and I think perhaps it came across as way more passive-aggressive than I was intending. Amazing how missing a few words from a sentence can change the entire meaning! 😀

What I meant to say was, it's going to be a busy few months for us whilst we move house, so I didn't want to say I'd contribute and then disappear for two months!

I've been working on an Azure load balancer example, heavily based on the AWS example. The load balanc...

3 years ago
0 With

I think so.
I am doing this with one hand tied behind my back at the moment because I am waiting to get an Azure AD App and Services policy set up, to enable the autoscaler to authenticate with the Azure VMSS via the Python SDK.
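
For what it is worth, the shape of what I expect to do once the AD App (service principal) exists is roughly the following; all the IDs and the resource group are placeholders and I have not been able to run this yet:

# Sketch of authenticating to the VMSS via the Azure Python SDK
# (azure-identity + azure-mgmt-compute); every value below is a placeholder.
from azure.identity import ClientSecretCredential
from azure.mgmt.compute import ComputeManagementClient

credential = ClientSecretCredential(
    tenant_id="<tenant-id>",
    client_id="<ad-app-client-id>",
    client_secret="<ad-app-client-secret>",
)
compute_client = ComputeManagementClient(credential, "<subscription-id>")

# List the scale sets the autoscaler would be allowed to manage
for vmss in compute_client.virtual_machine_scale_sets.list("<resource-group>"):
    print(vmss.name, vmss.sku.capacity)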

3 years ago
0 I Buried This Issue In Another Thread To Do With Deployment, But I Was Wondering If Anyone Else Has Had Problems Using

It's a PyTorch model trained with the Ignite framework, using one of the three well-known vision model packages: TIMM, PYTORCHCV or TORCHVISION.

3 years ago
0 With

Oh cool!
So when the agent fires up it gets the hostname, which you can then get from the API and pass back to take down a specific resource if it is deemed idle?
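
Something like this is what I had in mind for the "deemed idle" part, using the APIClient to list the registered workers; I am guessing at what the workers.get_all response exposes beyond the worker id, so treat it as a sketch:

# Sketch: list registered workers via the ClearML API; field names other than
# .id are my guess at what the response contains.
from clearml.backend_api.session.client import APIClient

client = APIClient()
for worker in client.workers.get_all():
    # the worker id typically embeds the hostname the agent reported on start-up
    print(worker.id, getattr(worker, "last_activity_time", None))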

3 years ago
0 With

What I really like about ClearML is the potential for capturing development at an early stage, as it requires only minimal adjustment of code for it to be, at the very least, captured as an experiment, even if it is run locally on one's machine.

What we would ideally like is a system where development, training, and deployment are almost one and the same thing, to reduce the lead time from development code to production models. Removing as many translation layers as you can between the developmen...

3 years ago
0 With

AgitatedDove14 that started out a lot shorter, and I read it twice, but I think it answers your question..... 😉

3 years ago
0 With

Oops, forgot this was a forum!

3 years ago
0 I Buried This Issue In Another Thread To Do With Deployment, But I Was Wondering If Anyone Else Has Had Problems Using

When I run the commands you suggested above on the compute node, but on the host system within the conda environment I installed to run the agent daemon from, I get the same issues we appear to have seen when executing the Triton inference service.

(py38_clearml_serving_git_dev) edmorris@ecm-clearml-compute-gpu-002:~$ python
Python 3.8.10 (default, May 19 2021, 18:05:58)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.

...

3 years ago
0 I Buried This Issue In Another Thread To Do With Deployment, But I Was Wondering If Anyone Else Has Had Problems Using

I have managed to create a docker container from the Triton task and run it in interactive mode; however, I get a different set of errors, but I think these are related to the command line arguments I used to spin up the docker container, compared to the command used by the clearml orchestration system.

My simplified docker command was: docker run -it --gpus all --ipc=host task_id_2cde61ae8b08463b90c3a0766fffbfe9

However, looking at the Triton inference server object logging, I can see there...
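
For comparison, the fuller command I was planning to try next publishes the standard Triton HTTP/gRPC/metrics ports (8000/8001/8002) on top of my simplified version above; whether this is any closer to what the orchestration layer actually passes is my assumption rather than something I have confirmed from its logs:

docker run -it --gpus all --ipc=host -p 8000:8000 -p 8001:8001 -p 8002:8002 task_id_2cde61ae8b08463b90c3a0766fffbfe9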

3 years ago
0 I Buried This Issue In Another Thread To Do With Deployment, But I Was Wondering If Anyone Else Has Had Problems Using

AgitatedDove14

Ok so I ran both variations and I got the same results.

>>> from clearml import StorageManager

uri_a = ' Birds%2FTraining/TRAIN [Network%3A resnet34, Library%3A torchvision] Ignite Train PyTorch CNN on CUB200.8611ada5be6f4bb6ba09cf730ecd2253/models/cub200_resnet34_ignite_best_model_0.pt'
uri_b = ' Birds%2FTraining/TRAIN [Network%3A resnet34, Library%3A torchvision] Ignite Train PyTorch CNN on CUB200.8611ada5be6f4bb6ba09cf730ecd2253/models/cub200_resnet34_...

3 years ago
0 I Buried This Issue In Another Thread To Do With Deployment, But I Was Wondering If Anyone Else Has Had Problems Using

Looking at the _resolve_base_url() method of the StorageHelper class I can see that it is using furl to handle the path splitting for getting at the Azure storage account and container names.

Replicating the commands, the first one to get the Storage Account seems to have worked ok:

f = furl.furl(uri)
account_name = f.host.partition(".")[0]

Replicating above manually seems to give the same answer for both and it looks correct to me:

>>> import furl

f_a = furl.fu...
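
To make that comparison easy to reproduce end-to-end, this is the minimal check I am running locally; the URI is a placeholder of the same azure form rather than my real artefact path:

# Sketch: pull the storage account and container names out of an azure URI with
# furl, mirroring what _resolve_base_url() appears to do. Placeholder URI only.
import furl

uri = "azure://<account-name>.blob.core.windows.net/<container-name>/models/model.pt"
f = furl.furl(uri)
account_name = f.host.partition(".")[0]  # -> "<account-name>"
container = f.path.segments[0]           # -> "<container-name>"
print(account_name, container)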

3 years ago
0 I Buried This Issue In Another Thread To Do With Deployment, But I Was Wondering If Anyone Else Has Had Problems Using

I was thinking that I could run this on the compute node in the environment that the agent is executed from, but actually it is the environment inside the docker container that the Triton server is executing in.

Could I use the clearml-agent build command and the Triton serving engine task ID to create a docker container that I could then use interactively to run these tests?
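
If that is a sensible route, I was thinking of something along these lines; the flags are from memory of the clearml-agent help output, so please correct me if the build sub-command does not take them:

clearml-agent build --id <triton-serving-task-id> --docker --target triton_debug_build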

3 years ago
0 I Buried This Issue In Another Thread To Do With Deployment, But I Was Wondering If Anyone Else Has Had Problems Using

Crawls out from under the table and takes a deep breath

AgitatedDove14 you remember we talked about it being a bug or a stupid.....

Well, it's a stupid by me.... somehow I managed to propagate irregularities in the clearml.conf file such that it successfully loaded, but the expected nested structure was not there.

When the get_local_copy() method requested the model, it correctly got the azure credentials; however, when the StorageHelper class tries to get the azure cr...

3 years ago
0 I Buried This Issue In Another Thread To Do With Deployment, But I Was Wondering If Anyone Else Has Had Problems Using

AgitatedDove14 in this remote session on the compute node, where I am manually importing the clearml sdk, what's the easiest way to confirm that the Azure credentials are being imported correctly?

From our discussions yesterday on the dockers, I understand that when the orchestration agent daemon is run with a given clearml.conf, the docker run command has various flags being used to pass certain files and environment variables from the host operating system of the co...
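
In the meantime, the crudest check I can think of is to import the SDK inside that environment and see whether a known blob can actually be fetched; the URI below is a placeholder rather than my real artefact path:

# Sketch: if the azure credentials from clearml.conf are being picked up, this
# should download the blob to the local cache and return its path; if not, it
# should fail with an authentication/configuration error.
from clearml import StorageManager

local_path = StorageManager.get_local_copy(
    remote_url="azure://<account-name>.blob.core.windows.net/<container-name>/path/to/known_file.pt"
)
print(local_path)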

3 years ago
0 I Buried This Issue In Another Thread To Do With Deployment, But I Was Wondering If Anyone Else Has Had Problems Using

Just another thought: couldn't this be caused by using a non-default location for clearml.conf?

I have a clearml.conf in the default location which is configured for training agents, and I created a separate one for the inference service and put it in a sub-folder of my home dir. The agent on the default queue to be used for inference serving was executed using clearml-agent daemon --config-file /path/to/clearml.conf
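
For completeness, the two invocations I am juggling look like this (paths and queue name are placeholders); my understanding, which may be wrong, is that the CLEARML_CONFIG_FILE environment variable would be an alternative way of pointing the agent at the non-default file:

clearml-agent daemon --queue default --config-file ~/serving/clearml.conf
CLEARML_CONFIG_FILE=~/serving/clearml.conf clearml-agent daemon --queue default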

3 years ago
0 I Buried This Issue In Another Thread To Do With Deployment, But I Was Wondering If Anyone Else Has Had Problems Using

After finally getting the model to be recognized by the Triton server, it now fails with the attached error messages.
Any ideas AgitatedDove14 ?

3 years ago
0 I Buried This Issue In Another Thread To Do With Deployment, But I Was Wondering If Anyone Else Has Had Problems Using

AgitatedDove14

Ok, after a huge configuration file detour, we are now back to fixing genuine issues here.

To recap, in order to get the Triton container to run and to be able to connect to Azure Blob Storage, the following changes were made to the launch_engine method of the ServingService class:

For the task creation call:

The docker string was changed to remove the port specifications [to avoid the port conflicts error]. The addition of the packages argument was required, as the doc...
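
To give a concrete flavour of the change (this is illustrative rather than a copy of my patched launch_engine; the project, name, image tag and package list are all placeholders):

# Sketch of the shape of the task creation call after the change: no -p port
# mappings in the docker string, and a packages argument so the container can
# talk to Azure Blob Storage. Values are placeholders.
from clearml import Task

task = Task.create(
    project_name="serving",
    task_name="triton serving engine",
    docker="nvcr.io/nvidia/tritonserver:<tag>",
    packages=["clearml", "azure-storage-blob"],
)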

3 years ago
0 I Buried This Issue In Another Thread To Do With Deployment, But I Was Wondering If Anyone Else Has Had Problems Using

So moving on to the container name.
The original code has the following calls:

if not f.path.segments:
    raise ValueError(
        "URI {} is missing a container name (expected "
        "[https/azure]://<account-name>.../<container-name>)".format(uri)
    )
container = f.path.segments[0]
Repeating the same commands locally results in the following:

>>> f_a.path.segments
['artefacts', 'Caltech Birds%2FTraining', 'TRAIN...

3 years ago