It doesn't seem to be related to the upload. The upload itself finished... What's your Trains version?
PungentLouse55 could you test with 0.15.2rc0 and see if there is any difference?
PungentLouse55 hmmm
Do you have an idea on how we could quickly reproduce it?
And is the Executor actually running something, or is it just IO?
Well it seems we forgot that one :) I'll quickly make sure it is there.
As a quick solution (no need to upgrade): task.models["output"]._models.keys()
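A minimal sketch of using that workaround, assuming `task` is the Task whose output models you want to list (the task id below is a placeholder):
```python
from clearml import Task  # `from trains import Task` on older versions

# placeholder id, point this at the task whose output models you want to inspect
task = Task.get_task(task_id="aabbccdd")

# workaround: the private _models dict holds the output models keyed by name
print(list(task.models["output"]._models.keys()))
```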
let's call it an applicative project which has experiments, and an abstract/parent project, or some other name that groups applicative projects.
That was my way of thinking, the guys argued it will soon "deteriorate" into the first option :)
PompousBeetle71 that actually brings me to another question, how do you feel about "parent" experiment ?
Since this fix is all about synchronizing different processes, we wanted to be extra careful with the release. That said I think that what we have now should be quite stable. Plan is to have the RC available right after the weekend.
Hi VexedKangaroo32 , funny enough this is one of the fixes we will be releasing soon. There is a release scheduled for later this week, right after that I'll put here a link to an RC containing a fix to this exact issue.
BTW, VexedKangaroo32 are you using torch launch ?
Just one more question, do you have any idea about how I could change the x-axis label from "Iterations" to "Epochs"?
You mean in the UI (i.e. just the title) ? or are you actually reporting iterations instead of epochs? and if so is this auto connected to tensorboard or is it reported manually ?
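If it turns out you are reporting the scalars manually, a minimal sketch of driving the x-axis by epoch instead of iteration (the training function and names here are hypothetical):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="epoch reporting")  # placeholder names
logger = task.get_logger()

for epoch in range(10):
    loss = train_one_epoch()  # hypothetical training step returning a float
    # whatever you pass as `iteration` is what the scalar plot counts on the x-axis,
    # so passing the epoch index makes the plot advance once per epoch
    logger.report_scalar(title="loss", series="train", value=loss, iteration=epoch)
```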
Hi @<1637624975324090368:profile|ElatedBat21>
I think that what you want is:
Task.add_requirements("unsloth", "@ git+
")
task = Task.init(...)
after you do that, what are you seeing in the Task "Installed Packages" ?
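For reference, a minimal sketch of the intended pattern; the git URL below is a placeholder for whichever repository/branch you actually need:
```python
from clearml import Task

# call add_requirements *before* Task.init() so the requirement is recorded on the Task
Task.add_requirements("unsloth", "@ git+https://github.com/<org>/unsloth.git")  # placeholder URL
task = Task.init(project_name="examples", task_name="unsloth run")  # placeholder names
```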
Hi @<1543766544847212544:profile|SorePelican79>
You want the pipeline configuration itself, not the pipeline component, correct?
pipeline = Task.get_task(Task.current_task().parent)
conf_text = pipeline.get_configuration_object(name="config name")
conf_dict = pipeline.get_configuration_object_as_dict(name="config name")
still, it is a ChatGPT interface, correct?
Actually, no. And we will change the wording on the website so it is more intuitive to understand.
The idea is you actually train your own model (not chatgpt/openai) and use that model internally, which means everything is done inside your organisation, from data through training and ending with deployment. Does that make sense ?
- Triton server does not support saving models off to normal RAM for faster loading/unloading
Correct, the enterprise version also does not support RAM caching
Therefore, currently, we can deploy 100 models when only 5 can be concurrently loaded, but when they are unloaded/loaded (automatically by ClearML), it will take a few seconds because it is being read from the SSD, depending on the size.
Correct, there is also deserializing CPU time (imagine unpickling a 20GB file, this takes ...
Thanks SarcasticSparrow10 !
I'll reply on the GitHub issue later (for better visibility)
But my initial thoughts:
(1) I think this was suggested, and hopefully we will get to implementing it, I can definitely see the value. Meanwhile you can achieve some of the functionality with the experiment table and custom columns :)
(2) "Don't display the performance metric" -> isn't that important? what am I missing?
(3) Hmm you mean just extra columns?
(4) sounds like a bug
(5) is this a plotly issue?...
Hi WickedGoat98
"Failed uploading to //:8081/files_server:"
Seems like the problem. what do you have defined as files_server in the trains.conf
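For reference, a sketch of the relevant section in trains.conf with placeholder host values; the empty host in your error suggests files_server is missing or blank there:
```
api {
    api_server: "http://<server-ip>:8008"
    web_server: "http://<server-ip>:8080"
    # the files server the SDK uploads artifacts/debug samples to
    files_server: "http://<server-ip>:8081"
}
```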
Hmm, yes this fits the message. Which basically says that it gave up on analyzing the code because it ran out of time. Is the execution very short? Or the repo very large?
Hi VexedKangaroo32, there is now an RC with a fix: pip install trains==0.13.4rc0
Let me know if it solved the problem
Thanks VexedKangaroo32 , this is great news :)
My question is what should be the path to the requirements.txt file?
Is it relative to the repo base?
This is actually in runtime (i.e. when running the code), so relative to the working directory. Make sense ? (you can specify absolute path, probably something I would avoid in the code base though...)
SubstantialElk6 feel free to tweet at them about their very inaccurate comparison table :)
Hi StickyMonkey98
a very large number of running and pending tasks, and doing that kind of thing via the web-interface by clicking away one-by-one is not a viable solution.
Bulk operations are now supported, upgrade the clearml-server to 1.0.2 :)
Is it possible to fetch a list of tasks via Task.get_tasks,
Sure: Task.get_tasks(project_name='example', task_filter=dict(system_tags=['-archived']))
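And a small sketch of doing something with the returned list (id/name are standard Task properties):
```python
from clearml import Task

tasks = Task.get_tasks(
    project_name="example",
    task_filter=dict(system_tags=["-archived"]),  # skip archived tasks
)
for t in tasks:
    print(t.id, t.name)
```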
So in a simple "all-or-nothing"
Actually this is the only solution unless preemption is supported, i.e. aborting a running Task to free up an agent...
There is no "magic" solution for complex multi-node scheduling, even SLURM will essentially do the same ...
Hi UnsightlySeagull42
But now I need the hyperparameters in every python file.
You can always get the Task from anywhere: main_task = Task.current_task()
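A minimal sketch of that pattern, assuming the main script already called Task.init() and this is some other module running in the same process (module and function names are hypothetical):
```python
# utils.py - hypothetical helper module imported by the main script
from clearml import Task

def get_hyperparams():
    # Task.current_task() returns the Task created by Task.init() in the main script
    main_task = Task.current_task()
    # note: get_parameters() returns the values as strings
    return main_task.get_parameters()
```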
The problem is not really for the agents to wait (this is easily solved by additional high priority queue) the problem is will you have a "free" agent... you see my point ?
Hi ProudChicken98
task.connect(input) preserves the types based on the "input" dict types; on the flip side, get_parameters returns the string representation (as stored on the clearml-server).
Is there a specific reason for using get_parameters over connect?
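To make the difference concrete, a small sketch (project/parameter names and values are just illustrative):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="connect vs get_parameters")  # placeholder names

config = {"batch_size": 32, "lr": 0.001, "use_gpu": True}
task.connect(config)

# connect() keeps the original python types on the connected dict
print(type(config["batch_size"]), type(config["use_gpu"]))  # int, bool

# get_parameters() returns the server-side string representation
params = task.get_parameters()
print(type(params["General/batch_size"]))  # str (keys are prefixed by section, "General" by default)
```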