Hi @<1636175432829112320:profile|PlainSealion45>
- I used this initial model to create the endpoint with the model add command.
I think the initial model needs to be added with model auto-update
Not with model add
Basically, do not call model add - it is static, always serving the model ID specified (you can deploy new models by manually calling model add on the same endpoint with a different model ID, but again, that is manual)
To automatically have the m...
I can install pytorch just fine locally on the agent, when I do not use clearml(-agent)
My thinking is that the issue might be in the env file we are passing to conda; I can't find any other difference.
BTW:
@<1523701868901961728:profile|ReassuredTiger98> Can I send a specific wheel with more debug prints for you to check (basically it will print the conda env YAML it is using)?
but it is not possible to write to a private channel to which the bot has been added.
Is this a Slack limitation?
Correct (basically pip freeze results)
and of course: task.set_parameters_as_dict(params)
Guys FYI: params = task.get_parameters_as_dict()
HandsomeCrow5 Ideas on improvement are always welcome 🙂
Does it work if you remove the Task.init call?
Okay let me see if I can think of something...
Basically crashing on the assertion here?
https://github.com/ultralytics/yolov5/blob/d95978a562bec74eed1d42e370235937ab4e1d7a/train.py#L495
Could it be you are passing "Args/resume" True, but not specifying the checkpoint?
https://github.com/ultralytics/yolov5/blob/d95978a562bec74eed1d42e370235937ab4e1d7a/train.py#L452
I think I know what's going on:
https://github.com/ultralytics/yolov5/blob/d95978a562bec74eed1d42e370235937ab4e1d7a/train...
And what exactly is missing from the "installed packages"? Is "help_models" an additional wheel you have to install?
Just making sure here, but remember that if your original code did not have a git repo, the only thing that is "copied" to the trains-server is the initial script, so any accompanying scripts will be missing in the trains-agent environment
Hi ItchyJellyfish73
The behavior should not have changed.
"force_repo_requirements_txt" was always a "catch all option" to set a behavior for an agent, but should generally be avoided
That said, I think there was an issue with v1.0 (clearml-server) where clearing the "Installed Packages" did not actually clear it, but set it to empty instead.
That sounds like the issue you are describing.
Could you upgrade the clearml-server and test?
Maybe something similar to dockers
I like this approach, maybe we could also add --name so it is easier to identify them:
trains-agent daemon stop --gpus all
trains-agent daemon stop --cpu-only
trains-agent daemon stop --gpus 0
What do you think?
Also being able to separate their configuration files would be good (maybe there is a way and I don't know?)
This is already supported with --config-file, see trains-agent --help for details 🙂
EmbarrassedPeacock82 are you using keras/pytorch etc for serving (i.e. Triton) ?
yey 🙂 notice that when executed by the agent, the call to execute_remotely is skipped, and so is the if statement I added (since running_locally will return False when the process is executed by the agent)
2 and 3 - I want to manage access control over the REST API
Long story short, put a load-balancer in front of the entire thing (see the k8s setup), and have the load-balancer verify the JWT as authentication (this is usually the easiest)
1- Exactly, custom code
Yes, we need to add a custom example there (somehow forgotten)
Could you open an Issue for that?
in the meantime:
# Preprocess class must be named "Preprocess"
# No need to inherit or to implement all methods
class P...
RoundMosquito25 this is a good point. In theory it could be done; the question is which Bayesian optimization you are actually using.
Is it optuna (OptimizerOptuna) or OptimizerBOHB?
The main reason we need the above-mentioned functionality is that there are some experiments that need to run for a long time. Let's say weeks.
Good point!
We need to temporarily pause (kill or something else) the running HPO task and reassign the resource for other needs.
Oh I see now....
Later, when more important experiments have been completed, we can continue the HPO task from the same state.
Quick question: when you say the HPO Task, you mean the HPO controller logic Task...
Back to the error:
clearml_agent: ERROR: Failed getting token (error 401 from
): Unauthorized (invalid credentials) (failed to locate provided credentials)
See here:
https://github.com/allegroai/clearml-server/blob/3f2b96266bc51bfce680bd759c7fa9d635ae36d3/docker/docker-compose.yml#L131
You need to provide an access key so it can actually "talk" to the server next to it.
When I start the serving containers it can't retrieve the model:
Hi BrightRabbit75
I think you need to pass the credentials for your S3 account to the clearml-serving containers
Basically just add AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY to your docker compose:
https://github.com/allegroai/clearml-serving/blob/4b52103636bc7430d4a6666ee85fd126fcb49e2e/docker/docker-compose-triton-gpu.yml#L110
https://github.com/allegroai/clearml-serving/blob/4b52103636bc7430d4a6666e...
Thanks RipeGoose2 !
clearml logging starts from n+n (that's how it seems) for non-explicit
I have to say it looks like the expected behavior, I think.
Basically matching the TB, no?
It should be under script.diff:
'script': {'binary': '', 'repository': '', 'tag': '', 'branch': '', 'version_num': '', 'entry_point': '', 'working_dir': '', 'requirements': {'pip': ''}, 'diff': ''}
For some reason this is empty in your case, are you seeing it in the UI?
If you are querying the current task (i.e. running) it might not be there yet.
You can call this internal function that returns only after the repo detection is done:
task._wait_for_repo_detection()
Having the ability to pack jobs/tasks onto the same "resource" (underlying server/EC2 instance)
This is essentially a "queue". Basically a queue is a way to abstract a specific type of resource, so that you can achieve exactly what you described.
open up a streaming use case, wherein batch (offline) inference could be done directly inside of a ClearML pipeline in reaction to an event/trigger (like new data landing in your data lake).
Yes, that's exactly how clearml is designed, a...
IrritableOwl63 in the profile page, look at the bottom right corner
So if you set it, then all nodes will be provisioned with the same execution script.
This is okay in a way, since the actual "agent ID" is by default set based on the machine hostname, which I assume is unique?
I mean using Trains:
Logger.current_logger().report_confusion_matrix(...)
Would you have any suggestions about where I could look to debug? Maybe the docker logs of the web server?
Let me check, we had the same issue reported today. Let me double-check with the front-end people and get back to you