Are you also adding those metrics to the experiment table as extra columns ?
GreasyPenguin14 we never had trouble with Task.init
(or any other clearml call) while working with the PyCharm debugger, we use it quite extensively ...
Actually on a very similar setup...
Could you send the full log?
Or maybe a code snippet to reproduce this behavior ?
(We did notice they fixed a few issues with the debugger in 2020.3.3 so it's worth upgrading)
Hi @<1598487094601191424:profile|MysteriousCow84>
only one of them uses an already created venv from cache for this task. And the other node starts to re-create the same virtual environment.
Just to be clear, the second one is running, but it does not use the same venv as the other one (that is running in parallel), is that correct?
PompousParrot44 unfortunately not yet 😞
But the gist is :
MongoDB stores experiment data (i.e. execution parameters, git ref etc.)
ElasticSearch stores results (i.e. metrics, console logs, debug image links etc.)
Does that help?
PompousParrot44 I see what you mean, yes, multiple context switches might cause a bit of a decline in performance, not sure how much though ... The alternative of course is to set CPU affinity... Anyhow, if you do get there we can try to come up with something that makes sense, but at the end of the day there is no magic there 🙂
According to you the VPN shouldn't be a problem right?
Correct, as long as all parties are on the same VPN it should work; all the connections are plain HTTP, so it's basically trivial communication
Why does my task execution freeze after pip installation (running agent in foreground mode)?
Hi AdventurousButterfly15
Are you running in agent docker mode or venv mode ?
What do you mean by freeze? do you see anything on the Task console log in the UI? what's the host OS ?
Hmmm, can you view the settings? That's the only thing I can think of at the moment that would be different between your setup and the working one...
Also, is there a way for you to have the trains-server behind https (on your GCP)
Hi TightDog77
HTTPSConnectionPool(host='', port=443): Max retries exceeded with url: /upload/storage/v1/b/models/o?uploadType=resumable (Caused by SSLError(SSLError(1, '[SSL: DECRYPTION_FAILED_OR_BAD_RECORD_MAC] decryption failed or bad record mac (_ssl.c:2633)')))
This seems like a network error to GCP (basically the GCP python package throws it)
Are you always getting this error? is this something new ?
Thanks @doru! BTW if you are running code from outside the trains repo, do you still get the double package?
Nothing that can't be worked around but for automation I don't think creating a TriggerScheduler with an existing name should be allowed
DangerousDragonfly8 I think I understand, basically you are saying that the fact a user can create two triggers with the same name can create some confusion ?
It also sucks a bit that each TriggerScheduler will run in its own pod in kubernetes.
Actually this depends on how you spin it up, and you can actually spin a single services agent running multiple...
total size 5.34 GB, 1 chunked stored (average size 5.34 GB)
PanickyAnt52 The issue is that the Dataset will not break up files (it will package a large folder into multiple zip files, but it will not split a single file).
The upload itself is limited by the HTTP interface (i.e. 2GB file size limit)
I would just encode it into multiple Arrow files
does that make sense ?
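For example, something along these lines (just a rough sketch, assuming pyarrow plus the clearml Dataset API; the chunk size, file names and dataset/project names are placeholders):
```
import pyarrow as pa
import pyarrow.ipc as ipc
from clearml import Dataset

# split one big table into several smaller Arrow files, so no single file
# hits the HTTP upload size limit (the chunk size here is arbitrary)
table = pa.table({"x": list(range(1_000_000))})
rows_per_file = 100_000
for i, start in enumerate(range(0, table.num_rows, rows_per_file)):
    chunk = table.slice(start, rows_per_file)
    with pa.OSFile(f"data_part_{i:03d}.arrow", "wb") as sink:
        with ipc.new_file(sink, chunk.schema) as writer:
            writer.write_table(chunk)

# register the folder of Arrow files as a clearml Dataset
ds = Dataset.create(dataset_name="my_dataset", dataset_project="examples")
ds.add_files(path=".", wildcard="data_part_*.arrow")
ds.upload()
ds.finalize()
```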
Hi OutrageousSheep60
Do you mean something like:
https://github.com/allegroai/clearml/tree/master/examples/datasets
?
I have to admit, mounting it to a different drive is a good reason to bring this feature back. The reasoning was that it means the agent needs to make sure it manages them (e.g. multiple agents running on the same machine)
Awesome! Any chance you feel like contributing it? I'm sure people would be thrilled 🙂
Hi SucculentBeetle7
Sure check the latest implementation, it now has "start" and "start_remotely" 🙂
Hi RoundMosquito25
This is a bit old but probably a good start:
https://clear.ml/blog/stacking-up-against-the-competition/
tl;dr
ClearML advantages (at least a few I can think of)
Scales way better
Enables out-of-the-box experiment orchestration (i.e. remote execution etc.)
Data management
Nicer UI
Full RestAPI
Full MLOps platform
Model serving
Query-able model repository
Probably more 🙂
OddAlligator72 let's separate the two issues:
Continue reporting from a previous iteration
Retrieving a previously stored checkpoint
Now for the details:
Are you referring to a scenario where you execute your code manually (i.e. without the trains-agent) ?
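If it helps, this is roughly what I have in mind (a minimal sketch assuming the current clearml API; the task id, project and task names below are placeholders):
```
from clearml import Task

# continue reporting into an existing task instead of starting a new one
# (pass a specific task id, or True to continue the last task you ran)
task = Task.init(
    project_name="examples",
    task_name="resume training",
    continue_last_task="aabbccddeeff00112233445566778899",
)

# grab a checkpoint that a previous run registered as an output model
previous = Task.get_task(task_id="aabbccddeeff00112233445566778899")
checkpoint_path = previous.models["output"][-1].get_local_copy()
print(checkpoint_path)
```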
FYI: if you need to query stuff you can always look directly in the RestAPI:
https://github.com/allegroai/clearml/blob/master/clearml/backend_api/services/v2_9/projects.py
https://allegro.ai/clearml/docs/rst/references/clearml_api_ref/index.html
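For example, with the APIClient wrapper (a rough sketch; the project name is a placeholder, and the fields follow the REST API schema):
```
from clearml.backend_api.session.client import APIClient

client = APIClient()
# list projects matching a name, then the most recently updated tasks in the first one
projects = client.projects.get_all(name="examples")
if projects:
    tasks = client.tasks.get_all(project=[projects[0].id], order_by=["-last_update"])
    for t in tasks:
        print(t.id, t.name, t.status)
```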
StaleKangaroo85 check https://demoapp.trains.allegro.ai/projects/0e152d03acf94ae4bb1f3787e293a9f5/experiments/193ac2bced184c49a57658fceb4bd7f9/info-output/metrics/plots?columns=type&columns=name&columns=status&columns=project.name&columns=user.name&columns=started&columns=last_update&columns=last_iteration&order=last_update on the demo server, seems okay to me...
Hi GreasyPenguin14
Quick question, any reason not to use a 2D scatter ? or a histogram (or any other non time-series plot)?
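e.g. something like this with the Logger (a quick sketch; the titles, series names and random data are just placeholders):
```
import numpy as np
from clearml import Task

task = Task.init(project_name="examples", task_name="scatter and histogram")
logger = task.get_logger()

# 2D scatter: an Nx2 array of (x, y) points
points = np.random.randn(100, 2)
logger.report_scatter2d(
    title="my scatter", series="run A", iteration=0,
    scatter=points, xaxis="x", yaxis="y", mode="markers",
)

# histogram of a single series of values
values = np.random.randn(1000)
logger.report_histogram(
    title="my histogram", series="values", iteration=0, values=values,
)
```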
Hi PlainSquid19 could you add a bit more information? Are you running trains-agent? Is it in docker/venv mode? What are the trains/trains-agent/trains-server versions?
@<1523701523954012160:profile|ShallowCormorant89> can you verify it is reproducible in 1.9.3 ? because if it is I'd like to fix that 🙂
will it be possible for us to configure the "new run" button so that it always clones from a particular pipeline ?
What do you mean by "particular pipeline" ? by default it will clone the last successful one, and by right clicking a specific one you can run a copy of that one. what am I missing ?
Hi JumpyPig73
Funny enough this is being fixed as we speak 🙂
The main issue is that, as you mentioned, ClearML does not "detect" the exit code when os.exit() is called, and this is why it is "missing" the failed test (because, as mentioned, all exceptions are caught). This should be fixed in the next RC
Since I can't use the torchrun command (from my tests, clearml won't use it on the clearml-agent), I went with the
@<1556450111259676672:profile|PlainSeaurchin97> did you check this example?
None
PompousParrot44 the fundamental difference is that artifacts are uploaded manually (i.e. a user will specifically "ask" to upload an artifact), models are logged automatically and a user might not want them uploaded (imagine debugging sessions, or testing).
By adding the 'upload_uri' argument, you can specify to trains that you want all models to be automatically uploaded (not just logged).
Now here is the nice thing, when running using the trains-agent, you can have:
Always upload the mod...
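A minimal sketch of what I mean (assuming the current clearml API, where this is the output_uri argument of Task.init; the bucket path and names are placeholders):
```
from clearml import Task

# ask for automatically logged models (e.g. framework checkpoints) to also be
# uploaded, not just registered; the bucket path is a placeholder
task = Task.init(
    project_name="examples",
    task_name="train with auto-upload",
    output_uri="s3://my-bucket/models",  # or True to use the default files server
)
```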
BTW: get_tasks has project_name argument, I would just use it 🙂
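Something like this (the project name is a placeholder):
```
from clearml import Task

# filter on the server side instead of fetching everything and filtering locally
tasks = Task.get_tasks(project_name="examples")
for t in tasks:
    print(t.id, t.name)
```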