MysteriousBee56 what do you mean "delete a worker"?
stop the agent running remotely ?
There seems to be a problem with multiprocessing: Although I stopped the task,
You mean you "aborted the task" from the UI?
- There is a memory leak somewhere, please see the screenshot of datadog memory consumption
I'm assuming from the leftover processes?
Python 3.8/Pytorch 1.11/clearml-sdk 1.9.0/clearml-agent 1.4.1
From the log I see the agent is running in venv mode
Hmm please try with the latest clearml-agent (the others should not have any effect)
But I am starting to wonder whether it would be easier just changing sys.path on the scripts that use the sibling libs.
that depends, how would the sibling packages get to a remote machine ?
Right, so this "vault" design is built into the paid tiers of ClearML to achieve exactly that. Long story short, users can put their credentials/configs on the clearml-server and the agent (or the clients) will pull and merge them into the execution.
It's very cool and works really nice, but not part of the open source (or the SaaS tier).
What you could do is store these configurations on the Task itself (one way or another). Maybe for example have an empty definitions.py
file part of ...
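If that direction helps, here is a minimal sketch of the idea (the file name definitions.py, the project/task names and the configuration name are placeholders, not something ClearML mandates): the script registers a local file as a Task configuration, and when an agent re-executes the Task, connect_configuration() hands back the stored copy.
```python
from pathlib import Path
from clearml import Task

# placeholder project/task names
task = Task.init(project_name="examples", task_name="sibling-libs-config")

# Store the local definitions file on the Task itself.
# When running locally this returns the original path; when the Task is
# executed by an agent it returns a path to the copy stored on the server.
config_path = task.connect_configuration(
    configuration=Path("definitions.py"), name="definitions"
)
print("using configuration file:", config_path)
```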
Thanks StrongHorse8
Where do you think would be a good place to put a more advanced setup? Maybe we should add an entry for DevOps? Wdyt?
Oh!
I see this is using the colab as remote agent (i.e. to launch jobs on it),
[ColabKernelApp] CRITICAL | Bad config encountered during initialization: The 'kernel_class' trait of <__main__.ColabKernelApp object at 0x7fa41b29e5c0> instance must be a type, but 'google.colab._kernel.Kernel' could not be imported
Can you send the full log?
Hi @<1686547344096497664:profile|ContemplativeArcticwolf43>
In the 2nd 'Getting Started' tutorial,
Could you send a link to the specific notebook?
. But whenever a task is picked, it fails for the following
You mean after the Task.init call?
I think it is only in get_task
(and by default it is true)
I think query task does not filter the
Hi SarcasticSparrow10
I think the default search is any partial match, let me check if there is a way to do some regexp / wildcard
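For context, this is roughly what the name filter looks like from code (a sketch; the project/name values are made up, and task_name is matched as a pattern against the Task name):
```python
from clearml import Task

# task_name is matched as a partial / regexp-style pattern on the Task name
tasks = Task.get_tasks(project_name="examples", task_name="train")
for t in tasks:
    print(t.id, t.name)
```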
Actually what my service does is collect stdout/stderr from the Docker socket
That's exactly how the agent works, it cannot really filter it, it logs everything by default for full visibility ...
Hi ColossalAnt7
Try ctrl-F5 and refresh the page?!
It seems you are missing a few buttons 🙂
Are you inheriting from their docker file ?
LOL I keep typing clara without noticing (maybe it's the nvidia thing I keep thinking about)
Carla makes much more sense 🙂
ReassuredTiger98 quick update, the issue was located, next RC will already contain a fix.
In the meantime, you can avoid it by limiting the pip version:
https://github.com/allegroai/clearml-agent/blob/715f102f6d98a44131d5bee909ee779b456c6229/docs/clearml.conf#L67
pip_version: "<20.2"
SoreDragonfly16
btw: The difference between the two graphs is the ratio of the graph display, that's it 🙂
what format should I specify it
requirements.txt format e.g. ["package >= 1.2.3"]
Would this enforce that package on various components
This is a per component control, so you can have different packages / containers based on the component
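For illustration, a sketch with decorator-based pipelines (component names, packages and the "label" column are made up for this toy example), where each component declares its own packages / container:
```python
from clearml.automation.controller import PipelineDecorator

# each component gets its own package list (and optionally its own docker image)
@PipelineDecorator.component(packages=["pandas>=1.2"], docker="python:3.9-bullseye")
def preprocess(csv_path: str):
    import pandas as pd
    return pd.read_csv(csv_path).dropna()

@PipelineDecorator.component(packages=["scikit-learn"])
def train(df):
    from sklearn.linear_model import LogisticRegression
    # 'label' column is an assumption of this toy example
    model = LogisticRegression()
    model.fit(df.drop(columns=["label"]), df["label"])
    return model
```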
Would it then no longer capture import statements?
This is replacing the auto detected packages, but obviously this fails to detect your internal repo package, which is the main issue here.
How is "internal package" installed, in o...
available agent, i.e. not running anything else.
I mean how long would instance 1 wait until instance 2 of the experiment is up and running?
In other words, what happens if all the nodes/agents are working and we still "need" an additional instance.
This is basically like "pre-allocating" the nodes, only they wait in real-time until the additional node joins them.
Agent A pulls the 3-node Task, the Task clones itself (Task B) and enqueues it on the "very high priority queue", Task A waits until Task B is ru...
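A rough sketch of that clone-and-enqueue part from the SDK side (the queue name is just an example):
```python
from clearml import Task

# Task A (the currently running instance) clones itself and pushes the clone
# onto a high-priority queue, then waits for it to come up
task_a = Task.current_task()
task_b = Task.clone(source_task=task_a, name=task_a.name + " (node 2)")
Task.enqueue(task_b, queue_name="very_high_priority")  # example queue name
```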
OHH nice, I thought it was just some kind of job queue on up-and-running machines
It's much more than that, it's a way of life 🙂
But seriously now, it allows you to use any machine as part of your cluster, and send jobs for execution from the web UI (any machine, even just a standalone GPU machine under your desk, or any cloud GPU instance, or a mix of the two :)
Maybe I need to change something here:
apiserver.conf
Not sure, I'm still waiting on an answer... It...
(It would be nice to have all the Pypi releases tagged in github btw)
I wanted to say, we listen ... and point to the tag, but for some reason it was not pushed LOL.
'config.pbtxt' could not be inferred. please provide specific config.pbtxt definition.
This basically means there is no configuration on how to serve the model, i.e. the size/type of the lower (input) layer and the output layer.
You can either store the configuration on the creating Task, like is done here:
https://github.com/allegroai/clearml-serving/blob/b5f5d72046f878bd09505606ca1147d93a5df069/examples/keras/keras_mnist.py#L51
Or you can provide it as standalone file when registering the mo...
One option is definitely having a base image that has the things needed. Anything else? Thanks!
This is a bit complicated: to get the cache to kick in you have to mount an NFS volume into the pod as the cache (to create a persistent cache)
Basically, spin up an NFS pod to store the cache, and change the glue job template yaml to mount it into the pod (see the default cache folders:
/root/.cache/pip and /root/.clearml/pip-download-cache)
Make sense ?
Hi ItchyJellyfish73
You can always archive a Task/Model even when published
In the UI you can right-click and choose archive.
From code you need to add a system tag "archived":
from clearml import Task
t = Task.get_task(task_id='aabb')
t.set_system_tags(t.get_system_tags() + ['archived'])
And similarly for Model(model_id='aabb')
So, what I am referring to is the ability of a system to allow some rigor and robustness of tracking of experiments, and also enforcing some thoughts on how things might be deployed, early on in the development process, whilst not being overly prescriptive and cumbersome
I cannot agree more!!
VivaciousPenguin66 We are working on trying to better understand how to solve this very delicate balancing act and offer some sort of "JIRA" for ML.
If this is okay with you, once product pe...
Hi, I would like to understand how I can set the pip cache location for my agent,
ClumsyElephant70 by default the pip cache (and all other cache folders) are mounted back into the host itself ~/.clearml/
I'm assuming the idea is a shared cache, if this is the case, do:
docker_pip_cache = ~/my_shared_nfs/pip-cache
https://github.com/allegroai/clearml-agent/blob/e3e6a1dda81bee2dd20a64d09746568e415f1823/docs/clearml.conf#L139
In theory, one could go over previously executed tasks, and create a copy of a specific scalar metric.
ShallowCat10 does that make sense in your scenario ?
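Something along these lines, as a sketch (the task id and the "loss"/"train" title/series are placeholders; get_reported_scalars() returns a {title: {series: {"x": [...], "y": [...]}}} structure):
```python
from clearml import Task

# pull a scalar series from a previously executed task ...
source = Task.get_task(task_id="aabb")  # placeholder task id
scalars = source.get_reported_scalars()

# ... and re-report it on a new task
target = Task.init(project_name="examples", task_name="copied-scalars")
logger = target.get_logger()
series = scalars["loss"]["train"]  # placeholder title / series names
for x, y in zip(series["x"], series["y"]):
    logger.report_scalar(title="loss", series="train", value=y, iteration=int(x))
```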
but never executes/enqueues them (they are all in Draft mode).
All pipeline steps are not enqueued ?
Is the pipeline controller itself running?
Is this like a local minio?
What do you have under the sdk/aws/s3 section?
From the docs I think what's going on is that https://opennmt.net/OpenNMT-tf/package/opennmt.Runner.html#opennmt.Runner.train spins up a new subprocess, and the training itself happens in the subprocess.
If this is the case this will explain the lack of automagic, as the subprocess is lacking the "Task.init" call
wdyt, could that be the case ?
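If that is what happens, one hedged workaround is to report metrics explicitly from the process that does hold the Task (the project/task names and values below are placeholders), e.g. from a training callback or by parsing the subprocess output:
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="opennmt-train")  # placeholder names
logger = task.get_logger()

# manual reporting, since the automagic hooks are not active in the subprocess
logger.report_scalar(title="loss", series="train", value=0.42, iteration=100)
```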