
So it was definitely related to the symlinks in some form.
Could it be that it actually deleted the cache? How many agents are running on the same machine?
Basically there are two options. One: spin up the clearml-k8s-glue as a k8s service.
This service takes ClearML jobs and creates k8s jobs on your cluster.
The second option is to spin up agents inside pods statically; then inside the pods the agent works in venv mode.
I know the enterprise edition has more sophisticated k8s integration, where the glue also retains the ClearML scheduling capabilities.
https://github.com/allegroai/clearml-agent/#kubernetes-integration-optional
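A rough sketch of option one, based on the examples/k8s_glue_example.py script in that repo (the K8sIntegration arguments vary between clearml-agent versions, so treat the values as placeholders):
` from clearml_agent.glue.k8s import K8sIntegration

k8s = K8sIntegration(
    namespace='clearml',   # k8s namespace the job pods are created in
    max_pods_limit=10,     # cap on concurrently running pods
)
# pull Tasks from this ClearML queue and create a k8s pod per Task
k8s.k8s_daemon('k8s_scheduler')  # queue name is hypothetical `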
Yes, looks like it. Is it possible?
Sounds odd...
What's the exact project/task name?
And what is the output_uri?
Because a pipeline is composed of multiple tasks, different tasks in the pipeline could run on different machines.
Yes!
Or more specifically, they could run on different queues; as you said in your other response, we could have a queue for smaller CPU-based instances, and another queue for larger GPU-based instances.
Exactly!
I like the idea of having a queue dedicated to CPU-based instances that has multiple agents running on it simultaneously. Like maybe four agents.
Th...
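For example, different pipeline components can target those queues. A sketch using the PipelineDecorator interface (queue names are hypothetical):
` from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component(execution_queue='cpu_queue')  # smaller CPU instances
def preprocess(n: int):
    return list(range(n))

@PipelineDecorator.component(execution_queue='gpu_queue')  # larger GPU instances
def train(data):
    return sum(data)

@PipelineDecorator.pipeline(
    name='demo pipeline', project='examples',
    pipeline_execution_queue='services',  # where the (mostly idle) pipeline logic runs
)
def pipeline_logic(n: int = 10):
    train(preprocess(n))

if __name__ == '__main__':
    pipeline_logic() `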
SmallDeer34 No worries, I'm happy to hear the issue disappeared 🙂
Having the ability to pack jobs/tasks onto the same "resource" (underlying server/EC2 instance)
This is essentially a "queue". Basically a queue is a way to abstract a specific type of resource, so that you can achieve exactly what you described.
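For instance, a minimal sketch of pushing work onto such a queue (project, task and queue names are hypothetical):
` from clearml import Task

# clone a template task and push the clone onto a named queue;
# any agent listening on that queue will pick it up
template = Task.get_task(project_name='examples', task_name='train')
cloned = Task.clone(source_task=template)
Task.enqueue(cloned, queue_name='cpu_queue') `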
open up a streaming use case, wherein batch (offline) inference could be done directly inside of a ClearML pipeline in reaction to an event/trigger (like new data landing in your data lake).
Yes, that's exactly how clearml is designed, a...
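A rough sketch of that trigger flow, assuming clearml's TriggerScheduler (the task ID and project name are placeholders):
` from clearml.automation import TriggerScheduler

scheduler = TriggerScheduler(pooling_frequency_minutes=3)
# when a new dataset version appears in the watched project,
# clone the pipeline task and enqueue it
scheduler.add_dataset_trigger(
    schedule_task_id='<pipeline-task-id>',
    schedule_queue='services',
    trigger_project='my_datasets',
)
scheduler.start() `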
Hi DeliciousBluewhale87
This is the latest clearml-serving (stable release at GTC at the end of the month)
https://github.com/allegroai/clearml-serving/tree/dev
Generally speaking, clearml-serving is a control plane: preprocessing and ML inference, with Nvidia Triton for DL inference (fully transparent).
It allows you to spin up an entire fully dynamic & scalable serving stack on top of a k8s cluster. Once you spin up the base containers, you can configure them live with a CLI, this includes adding new en...
Yes you can 🙂 (though not on the open-source version)
Okay I think I know what's going on (there is a race that for some reason on CoLab acts differently).
As a quick hack you can do the following:
` Task._report_subprocess_enabled = False
task = Task.init(...)
task.set_initial_iteration(0) `
Hi ElegantCoyote26
If there is, it will have to be using docker mode, but I do not think this is actually possible because this is not a feature of docker. It is possible to do on k8s, but that's a different level of integration 🙂
EDIT:
FYI we do support k8s integration
Feel free to add to the UI request list:
https://github.com/allegroai/trains/issues/81
but maybe hyperparam aborts in those cases?
From the hyperparameter optimizer's perspective it will be trying to optimize the global minimum, basically "ignoring" the last value reported. Does that make sense?
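A sketch of what that looks like with the HyperParameterOptimizer (template task ID and parameter name are hypothetical); the 'min_global' sign scores a trial by its best reported value rather than its last:
` from clearml.automation import HyperParameterOptimizer, UniformParameterRange

optimizer = HyperParameterOptimizer(
    base_task_id='<template-task-id>',
    hyper_parameters=[
        UniformParameterRange('General/lr', min_value=1e-4, max_value=1e-1),
    ],
    objective_metric_title='validation',
    objective_metric_series='loss',
    objective_metric_sign='min_global',  # best (global minimum) value wins
    execution_queue='default',
)
optimizer.start() `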
We already redesigned the implementation, so it should be quite easy to extend to GCP and Azure. What are you planning?
Can you verify this example is not working for you?
https://github.com/allegroai/clearml/blob/master/examples/frameworks/hydra/hydra_example.py
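Roughly, the linked example boils down to this (a sketch; the config name is hypothetical):
` import hydra
from omegaconf import DictConfig
from clearml import Task

@hydra.main(config_path='.', config_name='config')
def main(cfg: DictConfig) -> None:
    # ClearML picks up the Hydra configuration automatically
    task = Task.init(project_name='examples', task_name='hydra example')
    print(cfg)

if __name__ == '__main__':
    main() `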
But from your other answer, I think I'm understanding that you can have multiple agents on a single instance listening to the same queue.
Correct
So we could maybe initialize 4 instances of the agent on a single EC2 instance, which would allow us to handle a higher volume of small batches concurrently without tying up the entire instance.
Correct (that said I do not understand how come a single Task does not utilize the CPU, I was under the impression it is run...
Yes, that seems to be the case. That said, they should have different worker IDs, agent-0 and agent-1 ...
What's your trains-agent version?
Hi ObedientDolphin41
However, all of the pipeline's tasks are run on the same queue. Could I be missing something?
The pipeline Task itself is running on a dedicated queue (meaning agent/s), usually because the pipeline logic is mostly idling, whereas the components themselves are doing the actual compute.
Specifically you can control the pipeline logic queue with pipeline_execution_queue
https://github.com/allegroai/clearml/blob/7016138c849a4f8d0b4d296b319e0b23a1b7bd9e/clearm...
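With the PipelineController interface the same split looks roughly like this (queue and task names are hypothetical):
` from clearml import PipelineController

pipe = PipelineController(name='demo pipeline', project='examples', version='1.0')
pipe.add_step(
    name='train',
    base_task_project='examples',
    base_task_name='train task',
    execution_queue='gpu_queue',  # where this component runs
)
pipe.start(queue='services')      # where the pipeline logic itself runs `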
This smells like a driver/image issue on the instance VM
What are you getting if you add this inside your code?
os.system('nvidia-smi')
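If nvidia-smi works, a quick follow-up check (assuming PyTorch is installed in the image) can confirm the runtime sees CUDA:
` import torch

# both should be truthy if the driver and CUDA runtime match
print(torch.cuda.is_available())
print(torch.version.cuda) `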
So what is the difference?!
1e876021bbef49a291d66ac9a2270705
just make sure you reset it 🙂
How are you getting:
beautifulsoup4 @ file:///croot/beautifulsoup4-split_1681493039619/work
Is this what you had on the original manual execution? (i.e. not the one executed by the agent) - you can also look under the "org_pip" dropdown in the "installed packages" of the failed Task.
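If the file:/// line came from a conda-built environment, one hedged workaround is to pin the package explicitly before Task.init (the version spec here is hypothetical):
` from clearml import Task

# override the auto-detected requirement with a pip-installable pin;
# must be called before Task.init()
Task.add_requirements('beautifulsoup4', '>=4.11.0')
task = Task.init(project_name='examples', task_name='train') `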
You can see the class here:
https://github.com/allegroai/clearml/blob/9b962bae4b1ccc448e1807e1688fe193454c1da1/clearml/binding/frameworks/__init__.py#L52
Basically you do:
` def my_callback(load_or_save, model):
    # type: (str, WeightsFileHandler.ModelInfo) -> Optional[WeightsFileHandler.ModelInfo]
    assert load_or_save in ('load', 'save')
    # do something, e.g. inspect the model info to decide whether to register it
    skip = False  # replace with your filtering condition
    if skip:
        return None  # returning None skips this model
    return model

WeightsFileHandler.add_pre_callback(my_callback) `
RipeGoose2 models are automatically registered
i.e. added to the models artifactory, but it only points to where the files are stored
Only if you are passing the output_uri argument to Task.init will they actually be uploaded.
If you want to disable this behavior you can pass:
` Task.init(..., auto_connect_frameworks={'pytorch': False}) `
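For example (the bucket name is hypothetical):
` from clearml import Task

# registered model snapshots are uploaded to this destination
task = Task.init(
    project_name='examples',
    task_name='train',
    output_uri='s3://my-bucket/models',
) `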
Okay, good news: there is a fix. Bad news: the sync to GitHub will only be tomorrow.
Hmm, so there is a way to add callbacks (somewhat cumbersome, and we would love feedback) so you can filter them out.
What do you think, would that work?
Task deletion failed: unhashable type: 'dict'
Hi FlutteringWorm14, trying to figure out where this is coming from, give me a sec.
Hmm that is odd, but at least we have a workaround 🙂
What's the matplotlib backend ?
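You can check it with:
` import matplotlib
print(matplotlib.get_backend()) `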