VivaciousPenguin66
Moderator
17 Questions, 107 Answers
  Active since 10 January 2023
  Last activity 7 months ago

Reputation

0

Badges 1

93 × Eureka!
0 Votes
18 Answers
1K Views
3 years ago
0 Votes
15 Answers
1K Views
3 years ago
0 Votes
6 Answers
945 Views
I have been successfully deploying and training a PyTorch CNN on a clearml-agent managed compute resource and have been testing some of the capabilities, includ...
3 years ago
0 Votes
1 Answers
944 Views
Silly question alert... Really simple one to start with. If I have more or less the default settings for a clearml-agent on a compute node, so therefo...
3 years ago
0 Votes
5 Answers
922 Views
Are there any tips for how to set these boxes in the profile for access to Azure Blob Storage using SAS? I can create a Shared Access Signature (SAS) through the A...
3 years ago
0 Votes
2 Answers
896 Views
I was wondering, if I want to use Task.create() instead of Task.init() to create a new experiment object, I am aware that automatic logging will not be done....
3 years ago
0 Votes
1 Answers
1K Views
Does anyone have an example of how to use the services queue to start a load balancer on Azure? Virtual Machine Scale Sets through the Azure Management Pytho...
3 years ago
0 Votes
10 Answers
1K Views
When I set up my local virtual environment I use a combination of Conda and pip. I use conda as my environment manager, and then use pip for packages that are...
3 years ago
0 Votes
30 Answers
934 Views
I buried this issue in another thread to do with deployment, but I was wondering if anyone else has had problems using clearml-serving package to serve a PyT...
3 years ago
0 Votes
2 Answers
1K Views
I have got experiments training PyTorch networks on a remote compute run by clearml-agent. I am using the Ignite framework to train image classification net...
3 years ago
0 Votes
7 Answers
920 Views
///[Please note, all the below was executed on the command line of the compute node, not the server head node]/// I've been following the example on Keras, b...
3 years ago
0 Votes
10 Answers
954 Views
This wasn't a big deal, but I noticed when pushing a dataset to the server, with cloud storage, that the upload information looked a bit bonkers in terms of ...
3 years ago
0 Votes
1 Answers
1K Views
Question when using remote storage blobs (e.g. Azure). I am using it as an output_url location, and it is storing both datasets, and also experiment artefacts...
3 years ago
0 Votes
4 Answers
938 Views
I have just installed the PYPI version of clearml-serving and I get the following error at the command line. clearml-serving --help clearml-serving - CLI for...
3 years ago
0 Votes
5 Answers
904 Views
I have set up a clearml-server running on an Azure VM instance and have used default parameters when it comes to specifying storage locations for data and arte...
3 years ago
0 Votes
8 Answers
1K Views
3 years ago
0 Votes
30 Answers
978 Views
With clearml-serving could someone explain to me what a config.pbtxt file is and its format? When executing a PyTorch model for serving I get an error pasted...
3 years ago
0 I Have Been Successfully Deploying And Training A Pytorch Cnn On A

` Starting Task Execution:

usage: train_clearml_pytorch_ignite_caltech_birds.py [-h] [--config FILE]
[--opts ...]

PyTorch Image Classification Trainer - Ed Morris (c) 2021

optional arguments:
-h, --help show this help message and exit
--config FILE Path and name of configuration file for training. Should be a
.yaml file.
--opts ... Modify config options using the command-line 'KEY VALUE'
p...

3 years ago
0 ///[Please Note, All The Below Was Executed On The Command Line Of The Compute Node,

SuccessfulKoala55
I can see the issue you are referring to regarding the execution of the triton docker image, however as far as I am aware, this was not something I explicitly specified. The ServingService.launch_service() method from the ServingService Class from the clearml-serving package would appear to have both specified:

` def launch_engine(self, queue_name, queue_id=None, verbose=True):
# type: (Optional[str], Optional[str], bool) -> None
"""
...

3 years ago
0 I Have Built A Custom Docker Image And Execution Script So That I Can Use Conda As The Package Manager When Installing Python Packages For Job Execution. Everything Is Working Fine In Terms Of Environment Installation, However, On Execution Of The Model T

I believe the standard shared allocation for a docker container is 64 MB, which is obviously not enough for training deep learning image classification networks, but I am unsure of the best solution to fix the problem.
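
A quick way to confirm whether shared memory is the limit is to check the size of /dev/shm from inside the running container. The sketch below is only illustrative: the helper name `shm_size_mb` is hypothetical, and it assumes a Linux container with shared memory mounted at /dev/shm.

```python
import shutil


def shm_size_mb(path="/dev/shm"):
    """Return the total size, in MB, of the filesystem mounted at `path`."""
    usage = shutil.disk_usage(path)
    return usage.total / (1024 * 1024)


if __name__ == "__main__":
    # Assumes a Linux container with shared memory mounted at /dev/shm.
    print(f"/dev/shm size: {shm_size_mb():.0f} MB")
```

If this reports roughly 64 MB, the container is probably running with Docker's default allocation. The usual remedies are Docker's `--shm-size` run option or `--ipc=host`, though which one suits a given agent setup is a judgment call.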

3 years ago
0 With

Absolutely, AgitatedDove14!

3 years ago
0 I Was Wondering, If I Want To Use

Good question, SuccessfulKoala55

My thoughts are orbiting around environment orchestration and having a bit more control over how an environment is created. I understand that the easiest form of the configuration is to implement it on the clearml-agent side and run a daemon with the configuration as required, whether that be using venvs or docker containers. Of course this limits the deployment type to the queue that the daemon is listening to.

I was considering if that by exposing the...

3 years ago
0 ///[Please Note, All The Below Was Executed On The Command Line Of The Compute Node,

I have rerun the serving example with my PyTorch job, but this time I have followed the MNIST Keras example.
I appended a GPU compute resource to the default queue and then executed the service on the default queue.
This resulted in a Triton serving engine container spinning up on the compute resource, however it failed due to the previous issue with port conflicts:

` 2021-06-08 16:28:49
task f2fbb3218e8243be9f6ab37badbb4856 pulled from 2c28e5db27e24f348e1ff06ba93e80c5 by worker ecm-clear...

3 years ago
0 ///[Please Note, All The Below Was Executed On The Command Line Of The Compute Node,

SuccessfulKoala55 I may have made some progress with this bug, but have stumbled onto another issue in getting the Triton service up and running.

See comments in the github issue.

3 years ago
0 With

AgitatedDove14 I would love to help the project.
I am just about to move house, which is stressful enough without a global pandemic(!), so until that's completed I won't commit to anything. However, once settled in the new place, and I have a bit more time, I would very much welcome contributing.

3 years ago
0 I Have Setup A

AgitatedDove14 Thanks for that.
I suppose the same would need to be done for any client PC running clearml from which you are submitting dataset upload jobs?

That is, if the dataset is local to my laptop, or on a development VM that is not in the clearml system, and I want to submit a copy of it from there, would I then need to configure the storage section in the same way as well?

I assume the account name and key refers to the storage account credentials that you can f...

3 years ago
0 I Have Just Installed The Pypi Version Of

Was that literally committed just 4 hours ago?

3 years ago
0 I Am Having An Issue Publishing A Completed Model Training. The Model Has Been Deployed On Remote Compute, Using A Docker Image, And The Datasets Have Been Served From An Azure Blob Storage Account. The Model Trains Successfully, And Completes, After The

I checked the apiserver.log file in /opt/clearml/logs and this appears to be the related error when I try to publish an experiment:

` [2021-06-07 13:43:40,239] [9] [ERROR] [clearml.service_repo] ValidationError (Task:8a4a13bad8334d8bb53d7edb61671ba9) (setup_shell_script.StringField only accepts string values: ['container'])
Traceback (most recent call last):
File "/opt/clearml/apiserver/bll/task/task_operations.py", line 325, in publish_task
raise ex
File "/opt/clearml/a...

3 years ago
0 I Am Having An Issue Publishing A Completed Model Training. The Model Has Been Deployed On Remote Compute, Using A Docker Image, And The Datasets Have Been Served From An Azure Blob Storage Account. The Model Trains Successfully, And Completes, After The

SuccessfulKoala55
Good news!
It looks like pulling the new clearml-server version has solved the problem.
I can happily publish models.

Interestingly, I was able to publish models before using this server, so I must have inadvertently updated something that has caused a conflict.

3 years ago
0 ///[Please Note, All The Below Was Executed On The Command Line Of The Compute Node,

This potentially might be a silly question, but in order to get the inference working, I am assuming that no specific inference script has to be written for handling the model?

This is what the clearml-serving package takes care of, correct?

3 years ago
0 I Have Been Successfully Deploying And Training A Pytorch Cnn On A

SuccessfulKoala55 A second queued job, which executed on the same node but did not need to cache the dataset locally this time because the previous experiment had already done so, hasn't had this issue.

That all being said, apart from the console reporting looking messy, it doesn't appear to have impacted the training, or indeed the metric collection of the first experiment where it occurred.

3 years ago
0 I Have Been Successfully Deploying And Training A Pytorch Cnn On A

This job did download pre-trained weights, so the only difference between them is the local dataset cache.

3 years ago
0 I Am Having An Issue Publishing A Completed Model Training. The Model Has Been Deployed On Remote Compute, Using A Docker Image, And The Datasets Have Been Served From An Azure Blob Storage Account. The Model Trains Successfully, And Completes, After The

Hi SuccessfulKoala55
Thanks for the input.
I was actually about to grab the new docker_compose.yml and pull the new images.
Weirdly it was working before, so what's changed?
I don't believe I've updated the agents or the clearml sdk on the experiment submission vm either.
I will definitely update the server now, and report back.

3 years ago
0 I Have Just Installed The Pypi Version Of

Yup, I can confirm that's the case.
I have just literally installed the latest commit via the master branch and it works.

3 years ago
0 I Buried This Issue In Another Thread To Do With Deployment, But I Was Wondering If Anyone Else Has Had Problems Using

AgitatedDove14

Ok, after a huge configuration file detour, we are now back to fixing genuine issues here.

To recap, in order to get the Triton container to run and to be able to connect to Azure Blob Storage, the following changes were made to the launch_engine method of the ServingService class:

For the task creation call:

The docker string was changed to remove the port specifications [to avoid the port conflicts error]. The addition of packages argument was required, as the doc...

3 years ago
0 I Buried This Issue In Another Thread To Do With Deployment, But I Was Wondering If Anyone Else Has Had Problems Using

Just another thought, this couldn’t be caused by using a non-default location for clearml.conf?

I have a clearml.conf in the default location which is configured for training agents and I created a separate one for the inference service and put it in a subfolder of my home dir. The agent on the default queue to be used for inference serving was executed using clearml-agent daemon --config-file /path/to/clearml.conf
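
To sanity-check which configuration file a process would fall back to, something like the following could be run in the same shell as the agent. This is a rough sketch, not ClearML's actual lookup logic: it assumes the `CLEARML_CONFIG_FILE` environment variable takes precedence over the default `~/clearml.conf`, it ignores any `--config-file` argument given on the command line, and the helper name `resolve_clearml_conf` is hypothetical.

```python
import os
from pathlib import Path


def resolve_clearml_conf(env=None):
    """Guess the clearml.conf path in use, under the assumptions stated above."""
    env = os.environ if env is None else env
    # Assumption: an explicit CLEARML_CONFIG_FILE overrides the default location.
    override = env.get("CLEARML_CONFIG_FILE")
    if override:
        return Path(override).expanduser()
    # Otherwise fall back to the default location in the home directory.
    return Path.home() / "clearml.conf"


if __name__ == "__main__":
    path = resolve_clearml_conf()
    print(f"config file: {path} (exists: {path.exists()})")
```

If the path printed is not the one intended for the inference service, the two configurations may be getting mixed up.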

3 years ago
0 I Buried This Issue In Another Thread To Do With Deployment, But I Was Wondering If Anyone Else Has Had Problems Using

Ok I think I managed to create a docker image of the Triton instance server, just putting the kids to bed, will have a play afterwards.

3 years ago
0 I Buried This Issue In Another Thread To Do With Deployment, But I Was Wondering If Anyone Else Has Had Problems Using

After finally getting the model to be recognized by the Triton server, it now fails with the attached error messages.
Any ideas, AgitatedDove14?

3 years ago