Hmm, #790 should be solved in 1.7.2
Yes, I always see the "model uploaded completed" for such stuck tasks. Any chance this is reproducible?
How many processes do you see running (e.g. ps -Af | grep python)?
What is the training framework? Is it multiprocess? How are you launching the process itself? Is it a Linux OS? Is it running inside a specific container?
Is there a quicker way to abort all running experiments in a project? I have over a thousand running anonymous data tasks in a specific project and I want to abort them before debugging them.
We are adding "select all" in the next UI version to do that as quickly as possible 🙂
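Until then, something along these lines with the SDK should do it (the project name and the status filter here are assumptions, adjust them to your setup):
```
# sketch: abort every still-running task in a project via the SDK
from clearml import Task

tasks = Task.get_tasks(
    project_name="my_project",                # placeholder name
    task_filter={"status": ["in_progress"]},  # only tasks still running
)
for t in tasks:
    t.mark_stopped()  # moves the task to stopped/aborted
```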
it overwrites the previous run?
It will overwrite the previous run if it is under 72h from the last execution and no artifact/model was created. You can control it with reuse_last_task_id=False passed to Task.init
The Task name itself is not unique in the system; think of it as a short description
Make sense ?
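For example (project/task names are placeholders):
```
# always create a brand-new task instead of overwriting the previous run
from clearml import Task

task = Task.init(
    project_name="my_project",
    task_name="my_experiment",  # not unique, think short description
    reuse_last_task_id=False,   # never reuse/overwrite the previous run
)
```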
Should I use update_weights_package?
Yes
BTW, the config.pbtxt should be passed when "registering" the endpoint with the CLI
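For the update_weights_package part, roughly this (paths/names are placeholders, a sketch rather than a full recipe):
```
# sketch: upload a whole weights folder as a single packaged model
from clearml import Task, OutputModel

task = Task.init(project_name="my_project", task_name="register model")
model = OutputModel(task=task)
# packages every file under the folder and uploads it as one artifact
model.update_weights_package(weights_path="./model_dir")
```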
Can you share the modified helm/yaml?
Did you run any specific migration script after the upgrade ?
How many apiserver instances do you have ?
How did you configure the elastic container? Is it booting?
Then you have to pass the .ssh folder into the remote server; probably the easiest is to have it in the "extra bash script"
TrickyFox41 are you saying that if you add Task.init in the code it works, but when you call "clearml-task" it does not? (in both cases editing the Args/overrides?)
I can, but that is not a configuration we would want to run with in production
Agreed, I just want to isolate the issue. I think this is the boto Python interface missing some configuration or environment variables
The address is valid. If I just go to the files server address in my browser, ...
@<1729309131241689088:profile|MistyFly99> what is the exact address of those files? (including the http prefix) and what is the address of the web application ?
At runtime; every time add_step needs to create a new Task to be enqueued
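i.e. roughly this shape (all names are placeholders); each add_step clones its base task into a new Task when the step is launched:
```
# sketch: every step is cloned from a template task at runtime
from clearml import PipelineController

pipe = PipelineController(name="my pipeline", project="my_project", version="1.0")
pipe.add_step(
    name="stage_train",
    base_task_project="my_project",
    base_task_name="train template",  # cloned into a new Task when enqueued
)
pipe.start()
```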
Closing the dataset doesn't work: dataset.close() raises AttributeError: 'Dataset' object has no attribute 'close'
Hi @<1523714677488488448:profile|NastyOtter17> could you send the full exception?
Are you suggesting just taking the read_and_process_file function out of the read_dataset method?
Yes 🙂
As for the second option, you mean create the task in the __init__ method of the NetCDFReader class?
correct
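Something like this hypothetical sketch (the NetCDFReader internals here are assumed, not your actual code):
```
# hypothetical: create the Task when the reader is constructed
from clearml import Task

class NetCDFReader:
    def __init__(self, project_name="my_project", task_name="netcdf reader"):
        # one Task per reader instance; names are placeholders
        self.task = Task.init(project_name=project_name, task_name=task_name)

    def read_dataset(self, path):
        # read_and_process_file would be called from here
        ...
```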
It would be a great idea to make the Task picklable.
Adding that to the next version's to-do list 😉
Yes, exactly like a Task (a pipeline is a type of task):
```
from clearml import Task

# pipeline_uid_here: the task ID of the pipeline to clone
cloned_pipeline = Task.clone(source_task=pipeline_uid_here)
# the queue name is an example, use whichever queue your agents listen on
Task.enqueue(cloned_pipeline, queue_name="default")
```
Yeah, you can ignore those; this is some Python GC stuff, and it seems to be related to the OS and Python version
For running the pipeline remotely I want the path to be like /Users/adityachaudhry/.clearml/cache/......
I'm not sure I follow; if you are getting a path with all your folders from get_local_copy, that's exactly what you are looking for, no?
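For reference, this is what I mean (dataset names are placeholders); get_local_copy downloads into the local cache and returns the folder path:
```
# sketch: fetch a dataset into the local clearml cache and get its path
from clearml import Dataset

ds = Dataset.get(dataset_project="my_project", dataset_name="my_dataset")
local_path = ds.get_local_copy()  # cache folder holding the full dataset structure
print(local_path)
```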
Seems like it is working (including seaborn)
PlainSquid19 yes, the link is only available in the actual paid product 😞
I don't think they have the documentation open yet...
My recommendation is to fill out the "contact us" form, you'll get a free online tour as well 😉
Thanks @<1523702652678967296:profile|DeliciousKoala34> I think I know what the issue is!
The container has 1.3.0a and you need 1.3.0, which is why it is re-downloading (I'll make sure the agent can sort it out, because this is Nvidia's version and in reality it should be a perfect match)
Hi TrickySheep9
Could you post the pipeline code here?
Also which clearml version are you using ?
Hi @<1691258563357315072:profile|ColorfulKitten60>
I think we need some context for this question 🙂
What's your clearml version (python and server) ?
It seems that once the job has completed, it doesn't accept any new reports...
completed can be forced (back into a reportable state), published cannot ...
What's the error you are getting ?
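If it helps, a rough sketch of forcing a completed (not published) task back into a reportable state (the task id is a placeholder):
```
# sketch: reopen a completed task so it accepts new reports again
from clearml import Task

task = Task.get_task(task_id="<task_id_here>")
task.mark_started(force=True)  # works for completed tasks, not published ones
task.get_logger().report_scalar("title", "series", value=1.0, iteration=0)
task.mark_completed()
```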
😂
I'm trying to create a task that is not in the repository root folder.
JuicyFox94 if the Task is not in a repo folder, you mean it is in a remote repository, right?
This means the repo should be in the form of "https://github.com/" or "ssh://"
It failed to deduce that this is a remote repository (maybe we can improve the auto-detection?!)
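e.g. you can also point the task at the repo explicitly (URL/branch/script below are placeholders):
```
# sketch: create a task that runs from a remote repository
from clearml import Task

task = Task.create(
    project_name="my_project",
    task_name="remote repo task",
    repo="https://github.com/user/project.git",  # or an ssh:// URL
    branch="main",
    script="src/train.py",
)
```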
Hi GrievingTurkey78
Can you test with the latest clearml-agent RC? (I remember a fix just for that)
```
pip install clearml-agent==1.2.0rc0
```
What do you have here in your docker compose?
ModelCheckpoint('best_model', save_best_only=True)
That worked for me now, what's the diff?
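For context, the kind of minimal run I'd expect this to cover (synthetic data, all names are placeholders; newer Keras versions want a file suffix on the checkpoint path); with Task.init active, clearml should pick up the checkpoint automatically:
```
# sketch: Keras ModelCheckpoint output auto-logged by clearml
import numpy as np
from tensorflow import keras
from clearml import Task

task = Task.init(project_name="my_project", task_name="keras checkpoint test")

x, y = np.random.rand(64, 4), np.random.rand(64, 1)
model = keras.Sequential([keras.layers.Dense(8, activation="relu"),
                          keras.layers.Dense(1)])
model.compile(optimizer="adam", loss="mse")
model.fit(x, y, validation_split=0.25, epochs=3,
          callbacks=[keras.callbacks.ModelCheckpoint("best_model.h5",  # older TF also accepts a bare path
                                                     save_best_only=True)])
```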
I have to admit, I haven't had the time 😞
Trying to get pip to be twice as fast 🤞
https://github.com/pypa/pip/pull/8215
Please keep pinging me, I would really like to follow up on it.
GreasyPenguin14 GrittyKangaroo27 the new release contains a fix, could you verify it solves the issue in your scenario as well? There is now a smart timeout to detect the inconsistent state, which means the close/exit procedure might be delayed (10 sec) instead of hanging in these specific rare scenarios.
Actually it doesn't matter (systemd and init.d are different ways to spin up services on different Linux distros); you can pick whichever seems more convenient for you and whichever is supported by the Linux you are running (in most cases both are) 🙂