
But these changes haven't necessarily been merged into main. The correct behavior would be to use the forked repo.
So I would expect the agent to pull from your fork, is that correct? Is that what you want to happen?
PungentLouse55 could you test again with the latest from GitHub? I think the issue should be solved:
pip install git+
Yes, it's a bit confusing. The gist of it is that we wanted the ability to have different configurations for different buckets.
Also, could you explain the difference between trigger.start() and trigger.start_remotely()?
start() will start the trigger process (the one "watching the changes") locally (this makes sense for debugging etc.)
start_remotely() will launch the trigger process on the "services" queue, where it should live forever 🙂
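For reference, a minimal sketch of the two modes, assuming the clearml.automation TriggerScheduler API (the actual trigger registration is omitted):

from clearml.automation import TriggerScheduler

trigger = TriggerScheduler()
# ... register triggers here, e.g. trigger.add_task_trigger(...) ...

# run the watcher process locally (handy for debugging):
trigger.start()

# or enqueue the watcher itself so an agent listening on the
# "services" queue keeps it alive forever:
trigger.start_remotely(queue='services')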
Okay so when I add trigger_on_tags, the repetition issue is resolved.
Nice!
This problem occurs when I'm scheduling a task. Copies of the task keep being put on the queue ...
CrookedWalrus33 I'm testing with the latest RC on a local minio and this is what I'm getting:
clearml.storage - INFO - Starting upload: /tmp/.clearml.upload_model_3by281j8.tmp => 10.99.0.188:9000/bucket/debug/PyTorch MNIST train.8b6edc440cde4469b82e6da17e74c952/models/mnist_cnn.tar
clearml.Task - INFO - Waiting to finish uploads
clearml.Task - INFO - Completed model upload to
MNIST train.8b6edc440cde4469b82e6da17e74c952/models/mnist_cnn.tar
clearml.Task - INFO - Finished uploading
e...
Task.current_task().connect(training_args, name='huggingface args')
And you should be able to change them when launching remotely 🙂
SmallDeer34 BTW: "set_parameters_as_dict" will replace all the arguments (and is one-way) ...
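A small sketch of the difference, assuming the Task API (the values here are hypothetical):

from clearml import Task

task = Task.current_task()
training_args = {'lr': 0.001, 'epochs': 10}  # hypothetical values

# connect() is two-way: when running remotely, UI overrides flow back
# into the dict
task.connect(training_args, name='huggingface args')

# set_parameters_as_dict() replaces ALL existing parameters and is
# one-way: nothing flows back into the local dict
task.set_parameters_as_dict(training_args)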
@<1523704157695905792:profile|VivaciousBadger56>
Is the idea here the following? You want to use inversion-of-control such that I provide a function f to a component that takes the above dict as an input. Then I can do whatever I like inside the function f and return a different dict as output. If the output dict of f changes, the component is rerun; otherwise, the old output of the component is used?
Yes exactly! This way you...
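Something like this hedged sketch, assuming PipelineDecorator component caching (the function f and the dicts are made up for illustration):

from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component(cache=True)
def f(config: dict) -> dict:
    # arbitrary user logic; the returned dict is the component's output
    return {'processed': sorted(config.items())}

# with cache=True, if the component's code and inputs are unchanged the
# previously stored output is reused instead of rerunning the step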
FYI: a hotfix for 1.3.0 (smoothing graphs) was just released, see v1.3.1
I am actually considering rolling back to 1.1.0,
Can you share why?
JitteryCoyote63 notice from the release notes of 1.2:
Important Note!
This release requires a MongoDB migration from previous versions. Please see
for more information.
I'm not sure you can downgrade that easily ...
TRAINS_WORKER_NAME=first_agent trains-agent --gpus 0
and
TRAINS_WORKER_NAME=second_agent trains-agent --gpus 0
LOL, that's the spirit! Making your team happy is key to success in adoption 🙂
BoredHedgehog47 if you are running it on K8s, the setup script runs before everything else, even before an agent appears on the machine. Unfortunately this means the output is not logged yet, hence the missing console lines (I think the next version of the glue will fix that).
In order to test you can do:
export TEST_ME
then inside your code you will be able to see it:
os.environ['TEST_ME']
Make sense?
I can't seem to find a difference between the two; why would matplotlib get listed while pandas does not... Is any other package missing?
BTW: as an immediate "hack", before your Task.init call add the following:
Task.add_requirements("pandas")
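For example, a minimal sketch (the project/task names are hypothetical):

from clearml import Task

# must be called BEFORE Task.init to affect the requirements analysis
Task.add_requirements("pandas")
task = Task.init(project_name="examples", task_name="pandas check")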
No it will not 🙂 the closer is closer to the actual print.
That said, I'm sure it would not be complicated to add.
But I have to wonder: this will really create a mess in the console log. So if someone wants it, it will be global (i.e. also in the visible console, not only in the backend). The case where the console on the machine itself is "clean" but the backend log is full of debug stuff is not clear to me.
If it cannot find the Task ID I'm guessing it is trying to connect to the demo server and not your server (i.e. configuration is missing)
Yes it does. I'm assuming each job is launched using a multiprocessing.Pool (which translates into a subprocess). Let me see if I can reproduce this behavior.
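Roughly this kind of minimal repro, under the Pool assumption above (the worker function and names are hypothetical):

import multiprocessing as mp
from clearml import Task

def job(i):
    # runs inside a pool subprocess; reporting should still be
    # attributed to the parent task
    return i * i

if __name__ == '__main__':
    task = Task.init(project_name='examples', task_name='pool repro')
    with mp.Pool(4) as pool:
        print(pool.map(job, range(8)))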
connect_configuration seems to take about the same amount of time, unfortunately!
I think it is a better solution. That said, from your description it sounds like the issue is the upload bandwidth (i.e. JSON-ing the dict itself), could that be it?
(and even 1000 entries seems like something that would end up as a ~1MB upload, which is not that much)
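A quick way to sanity-check that size estimate (the dict here is hypothetical):

import json

cfg = {'key_%d' % i: i for i in range(1000)}  # hypothetical 1000-entry dict
payload = json.dumps(cfg)
print('%.1f KB' % (len(payload.encode('utf-8')) / 1024))  # rough upload size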
EnviousStarfish54
and the 8 charts are actually identical
Are you plotting the same plot 8 times?
BTW: how did it get there?
Assuming the git repo looks something like:
.git
readme.txt
module
|
+---- script.py
The working directory should be "."
The script path should be: "-m module.script"
And under Configuration/Args, you should have:
args1 = value
args2 = another_value
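For completeness, a hedged sketch of what module/script.py might look like; argparse arguments are picked up automatically by Task.init, which is what populates Configuration/Args (the names here are hypothetical):

# module/script.py
import argparse
from clearml import Task

if __name__ == '__main__':
    task = Task.init(project_name='examples', task_name='module script')
    parser = argparse.ArgumentParser()
    parser.add_argument('--args1', default='value')
    parser.add_argument('--args2', default='another_value')
    args = parser.parse_args()
    print(args.args1, args.args2)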
Make sense?
Hi SubstantialElk6
Yes, you are correct, the glue only needs to change the yaml and it will work.
When you say "Dev end", what do you mean? I was thinking of adding additional glue for multi-node and just adding queues, for example add a 4-node queue and attach a glue to it, wdyt?
Regarding Horovod: Horovod spins up its own nodes, so integration with k8s is not trivial (regardless of ClearML). That said, I know they do have support for Horovod in the Enterprise edition, but I'm not sure ...
This is odd. I was running the example code from:
https://github.com/allegroai/clearml/blob/master/examples/pipeline/pipeline_from_decorator.py
It is stored inside a repo, but the steps that are created (i.e. checking the Task that is created) do not have any repo linked to them.
What's the difference ?
It can also work by running on multiple known nodes.
Horovod sits on top of OpenMPI, which needs SSH to open multiple nodes. I'm not sure how one would connect it without passing the SSH keys from one node to the other and making sure they can communicate directly. (Not saying it is not possible, just a few things to configure before it works; the Enterprise edition removes the need for the direct SSH connection between the nodes.)
How would I add a glue for multinode?
Basic...
SubstantialElk6 could you post the "Installed packages" section under Execution of this specific Task?
Hi @<1687643893996195840:profile|RoundCat60>
Are you running on AWS?
So assuming they are all on the same LB IP, you should do:
LB 8080 (https) -> instance 8080
LB 8008 (https) -> instance 8008
LB 8081 (https) -> instance 8081
It might also work with:
LB 443 (https) -> instance 8080
We're not using a load balancer at the moment.
The easiest way is to add an ELB and have Amazon add the HTTPS on top (basically a few clicks in their console).
Hmm, I would recommend passing it as an artifact, or returning its value from the decorated pipeline function. Wdyt?
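For example, a hedged sketch of both options with pipeline decorators (the step and pipeline names are hypothetical):

from clearml import Task
from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component()
def produce():
    data = {'some': 'result'}
    # option 1: store it explicitly as an artifact on the step's task
    Task.current_task().upload_artifact('data', data)
    # option 2: just return it; returned objects are passed between steps
    return data

@PipelineDecorator.pipeline(name='example', project='examples', version='0.1')
def run_pipeline():
    return produce()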