I see. TrickyFox41, try the following:
--args overrides="param=value"
Notice this will change the Args/overrides argument that will be parsed by Hydra to override its params.
Task.connect is "automagic", i.e. values go to the server when running in manual mode, and come from the server when running in agent mode;
set_parameter is one-way only and should be used to set an external Task's parameters.
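A minimal sketch of the difference (the project/task names and parameter values here are made up for illustration):
` from clearml import Task

task = Task.init(project_name="examples", task_name="connect-demo")

# connect() is bidirectional: running locally the dict values are logged
# to the server; running under an agent, values edited in the UI are
# written back into the dict before use
params = {"learning_rate": 0.001, "batch_size": 32}
task.connect(params)

# set_parameter() only pushes a value to the server (one-way)
task.set_parameter("General/epochs", 10) `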
Hi TrickySheep9
Long story short, clearml-session fully supports k8s (using k8s glue)
The --remote-gateway alongside ports mode will basically allow you to set up a k8s service so that every session registers with a specific port; k8s then does the ingress for you and routes the SSH connection to the pod itself, and everything else is tunneled over the original SSH connection.
Make sense ?
Hi HarebrainedOstrich43
I think I understand what's going on: in order for the pipeline logic to be "aware" of the pipeline component, it needs to be declared in the pipeline logic script file (or scope, if you will).
Try adding
from src.testagentcomponent import step_one
at the global scope of the pipeline script (not just inside the function), as in the sketch below.
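Something along these lines (a sketch only; the decorator arguments and pipeline setup are assumptions, src.testagentcomponent comes from your layout):
` # pipeline_logic.py
from clearml import PipelineDecorator

# module-level import, so the pipeline logic is "aware" of the component
from src.testagentcomponent import step_one

@PipelineDecorator.pipeline(name="demo-pipeline", project="examples", version="1.0")
def pipeline_logic():
    result = step_one()

if __name__ == "__main__":
    pipeline_logic() `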
I think they (DevOps) said something about next week, internal roll-out is this week (I think)
Makes sense, but this means that we are not able to tell clearml-agent where to save on a per-task basis?
The debug samples? or the artifacts/models?
Also, is it not possible to use multiple file servers? E.g. log tasks to different S3 buckets without changing clearml.conf
Yes, change the Task's output destination in the UI (or programmatically)
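For example, programmatically (a sketch; the bucket path is hypothetical):
` from clearml import Task

# per-task output destination: artifacts/models of this task go to S3
task = Task.init(
    project_name="examples",
    task_name="custom-output-destination",
    output_uri="s3://my-bucket/models",
) `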
This means that if something happens with the k8s node the pod runs on,
Actually if the pod crashed (the pod, not the Task) k8s should re-spin it, no?
I also experience that if a worker pod running a task is terminated, clearml does not fail/abort the task.
From the k8s perspective, if the task ended (failed/completed) it always returns with exit code 0, i.e. success, because the agent was able to spin the Task. We do not want Tasks with exceptions to litter the k8s with endless r...
Well, from 2 to 30 sec is a factor of 15; I think this is a good start
DeliciousBluewhale87 you can try:
` import sqlite3
import pandas as pd

conn = sqlite3.connect('test_database')
sql_query = pd.read_sql_query('''
    SELECT *
    FROM products
''', conn)
sql_query.to_csv(...) `
There seems to be a problem with multiprocessing: Although I stopped the task,
You mean you "aborted the task" from the UI?
- There is a memory leak somewhere, please see the screenshot of datadog memory consumption
I'm assuming from the leftover processes ?
Python 3.8/Pytorch 1.11/clearml-sdk 1.9.0/clearml-agent 1.4.1
From the log I see the agent is running in venv mode
Hmm please try with the latest clearml-agent (the others should not have any effect)
Most likely yes, but I don't see how ClearML would have an impact here; I am more inclined to think it is a PyTorch DataLoader issue, although I don't see why.
These are most certainly DataLoader processes. But clearml-agent, when killing the process, should also kill all subprocesses, and it might be that something prevents it from killing the subprocesses ...
Is this easily reproducible ? Can you verify it is still the case with the latest RC of clearml-agent ?
SarcasticSparrow10 how do I reproduce it?
I tried launching from a sub process that is a daemon and it worked. Are you using ProcessPool ?
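For reference, this is roughly what I tried (a sketch; the project/task names are made up):
` import multiprocessing as mp

from clearml import Task

def run_training():
    # the Task is created inside a daemon subprocess
    task = Task.init(project_name="examples", task_name="daemon-subprocess-test")
    # ... training code ...
    task.close()

if __name__ == "__main__":
    p = mp.Process(target=run_training, daemon=True)
    p.start()
    p.join() `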
LOL love that approach.
Basically here is what I'm thinking,
` from clearml import Task, InputModel, OutputModel

task = Task.init(...)

# run this part once
if task.running_locally():
    my_auxiliary_stuff = OutputModel()
    my_auxiliary_stuff.system_tags = ["DATA"]
    my_auxiliary_stuff.update_weights_package(weights_path="/path/to/additional/files")
    input_my_auxiliary = InputModel(model_id=my_auxiliary_stuff.id)
    task.connect(input_my_auxiliary, "my_auxiliary")

task.execute_remotely()
my_a...
Did you experience any drop in performance using forkserver?
No, seems to be working properly for me.
If yes, did you test the variant suggested in the PyTorch issue? If yes, did it solve the speed issue?
I haven't tested it, that said it seems like a generic optimization of the DataLoader
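If you want to try it, this is roughly the change (a sketch; the toy dataset and worker count are placeholders):
` import torch
from torch.utils.data import DataLoader, TensorDataset

# toy dataset just to make the example self-contained
my_dataset = TensorDataset(torch.randn(100, 3), torch.randint(0, 2, (100,)))

# use the 'forkserver' start method for DataLoader workers instead of the
# default 'fork', as suggested in the PyTorch issue
loader = DataLoader(
    my_dataset,
    batch_size=32,
    num_workers=4,
    multiprocessing_context="forkserver",
)

for x, y in loader:
    pass  # training step would go here `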
okay this points to an issue with the k8s glue, I think it somehow failed to launch the pod. Can you send me the log of the clearml-k8s-glue ?
(It would be nice to have all the PyPI releases tagged in GitHub, btw)
I wanted to say "we listen" ... and point to the tag, but for some reason it was not pushed, LOL.
Have to get glue setup, which I couldn't understand fully, so that's a different topic
I suggest using the apply-template setup (basically you provide a Job/Service template, and it uses that to set up k8s jobs based on the Tasks coming in from the specific queue).
Sounds good to me. DepressedChimpanzee34 any chance you can add a github feature request, so we do not forget to add it?
Quick update: 1.0.2 will be ready in an hour, apologies
Correct the serving Task ID is the clearml serving session. It is the instance that holds all the information of this specific setup and models
ThickFox50 I also have to point out that there is a free hosted server here: https://app.community.clear.ml
Hi PanickyMoth78
Hmm yes, I think the StorageManager (i.e. the google storage python client) also needs a JSON file with the credentials.
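For reference, usage looks roughly like this once the credentials JSON is configured (a sketch; the bucket and path are hypothetical):
` from clearml import StorageManager

# downloads the remote object to the local cache and returns the local path;
# assumes google storage credentials (service-account JSON) are configured
local_copy = StorageManager.get_local_copy(
    remote_url="gs://my-bucket/path/to/file.bin"
)
print(local_copy) `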
Let me check something
No worries, just found it. Thanks!
I'll make sure to followup on the GitHub issue for better visibility π
Hi SkinnyPanda43
Every "commit" is a new version, so sync changes you need to either create a new version (with parent version as the previous one), and sync the local folder (or manually add/remove files).
If you do not need to actually store the "current" version, you can just reset the Task, and sync it again.
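A minimal sketch of the first route (the project/dataset names and parent ID are placeholders):
` from clearml import Dataset

# new version on top of the previous one
ds = Dataset.create(
    dataset_project="examples",
    dataset_name="my-dataset",
    parent_datasets=["<previous_version_id>"],
)
# sync a local folder against this version (adds/removes changed files)
ds.sync_folder(local_path="/path/to/local/folder")
ds.upload()
ds.finalize() `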
wdyt?
Hi TrickyFox41
Hey since Hydra does not work with
clearml-task
It should, shouldn't it? What does not work?
Hmm TrickyRaccoon92, take a look at the cleanup service; I think you can hack it so that instead of deleting the artifacts, it will archive them somewhere (you can also change the filter, maybe only perform on experiments with a specific user tag).
What do you think?
https://github.com/allegroai/trains/blob/master/examples/services/cleanup/cleanup_service.py
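For instance, the filter/archive tweak could look something like this (a sketch; the tag name is hypothetical, and archiving here is done by adding the "archived" system tag):
` from clearml import Task

# pick up only finished experiments carrying a specific user tag
tasks = Task.get_tasks(
    task_filter={
        "status": ["completed", "failed"],
        "tags": ["cleanup-candidate"],
    }
)
for task in tasks:
    # archive instead of delete: add the "archived" system tag
    task.set_system_tags((task.get_system_tags() or []) + ["archived"]) `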