AgitatedDove14
Moderator
0 When Use Gcp Bucket As Files_Server + Yolov5 Train For Now Its Upload The Model In The End To

so other processes can use it

This is why there is a model repository: you can query for the last model created, search by name or tag, or query the Task that created it and then, via the Task, get the model and its location.
This is a robust way to make sure your application code (the one using the model) gets stable models regardless of the training processes.
I would add a Tag to the model and then search based on the project and the tag, wdyt?
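
For illustration, a minimal sketch of that query flow, assuming the clearml Model.query_models API; the project name, tag, and the way the result is picked are placeholders:

```python
from clearml import Model

# Query the model repository for models in a given project carrying a given tag;
# "YOLOv5 Training" and "production" are placeholder names for this sketch.
models = Model.query_models(
    project_name="YOLOv5 Training",
    tags=["production"],
    only_published=False,
)

if models:
    model = models[0]  # pick the first match (check the ordering for your use case)
    print(model.name, model.url)
    weights_path = model.get_local_copy()  # download the weights for the consuming process
```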

one year ago
0 I Got An Interesting Question From My Devs. If They Wish To Do Distributed Training, Is Clearml K8S Glue Suitable For It? Local Multiple Gpu: Just A Matter Of Assigning More Than One Gpu In The Yaml File Sent To The K8S Glue. Question Is How To Make This

It can also work by running on multiple known nodes.

Horovod sits on top of OpenMPI, which needs SSH to reach the other nodes. I'm not sure how one would connect it without passing the SSH keys from one node to the other and making sure they can communicate directly. (Not saying it is not possible, just that there are a few things to configure before it works; the Enterprise edition removes the need for the direct SSH connection between the nodes.)

How would i add a glue for multinode?

Basic...

3 years ago
0 Hi! Is There A Simple Way To Visualize Tensors In Clearml? Something Like Tensorboard'S Tsne Or Pca...

FrustratingWalrus87 if you need an interactive one, I think there is currently no alternative to TB's t-SNE 🙂 it is truly great 🙂
That said, you can use Plotly for the graph:
https://plotly.com/python/t-sne-and-umap-projections/#project-data-into-3d-with-tsne-and-pxscatter3d
and report it to ClearML with Logger.report_plotly:
https://github.com/allegroai/clearml/blob/e9f8fc949db7f82b6a6f1c1ca64f94347196f4c0/examples/reporting/plotly_reporting.py#L20
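
For reference, a minimal sketch of that combination (the embedding tensor, project/task names, and plot titles are made up for illustration):

```python
import numpy as np
import plotly.express as px
from sklearn.manifold import TSNE
from clearml import Task

task = Task.init(project_name="examples", task_name="tsne plot")  # placeholder names

features = np.random.rand(500, 64)  # stand-in for your tensor of embeddings
projection = TSNE(n_components=3, random_state=0).fit_transform(features)

fig = px.scatter_3d(x=projection[:, 0], y=projection[:, 1], z=projection[:, 2])
task.get_logger().report_plotly(
    title="t-SNE projection", series="embeddings", figure=fig, iteration=0
)
```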

3 years ago
0 Hello,

Is this reproducible?

2 years ago
0 Hello,

Hi WickedElephant66

Setting the pipeline controller with pipeline_execution_queue as None

is actually launching the pipeline controller on your "dev" machine; I'm not sure why you have two of them?
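
For context, a minimal sketch of what that setting looks like with the PipelineDecorator interface; the project, queue, and step names below are illustrative:

```python
from clearml import PipelineDecorator

@PipelineDecorator.component(execution_queue="default", return_values=["doubled"])
def double(x: int):
    doubled = x * 2
    return doubled

# pipeline_execution_queue=None keeps the controller itself on the local ("dev") machine;
# the steps are still sent to their own execution queue ("default" here).
@PipelineDecorator.pipeline(
    name="example pipeline",   # placeholder name
    project="examples",        # placeholder project
    version="0.0.1",
    pipeline_execution_queue=None,
)
def pipeline_logic():
    print(double(21))

if __name__ == "__main__":
    pipeline_logic()
```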

2 years ago
0 I Have A Second Question As Well, Is It Possible To Disable Any Parts Of The Automagical Logging? In My Project I Use Both Config And Argparse. It Works By Giving Path To A Config File As A Console Argument And Then Allow The User To Adjust Values With Mo

Hi UnsightlyShark53 , just a quick FYI: you can also log the entire config file config.json ; it will be stored as the model configuration, and you can see it in the input/output models under the Artifacts tab.
See the example here; you can pass either the path to the configuration file, or the dictionary itself after you have loaded the json, whatever is more convenient :)
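
A minimal sketch of both variants, assuming the standard task.connect_configuration call; the file name and configuration names are placeholders:

```python
import json
from clearml import Task

task = Task.init(project_name="examples", task_name="config logging")  # placeholder names

# Option 1: pass the path; ClearML stores the file content as the configuration
task.connect_configuration("config.json", name="config file")

# Option 2: load the json yourself and pass the dictionary
with open("config.json") as f:
    config = json.load(f)
config = task.connect_configuration(config, name="config dict")
```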

4 years ago
0 I Have A Second Question As Well, Is It Possible To Disable Any Parts Of The Automagical Logging? In My Project I Use Both Config And Argparse. It Works By Giving Path To A Config File As A Console Argument And Then Allow The User To Adjust Values With Mo

UnsightlyShark53 see if this one solves the problem :)
BTW: the reasoning for the message is that when running the task with "trains-agent", if the parsing of the argparser happens before the Task is initialized, the patching code doesn't know whether it is supposed to override the values. But this scenario was fixed a long time ago, and I think the error message was mistakenly left behind...

4 years ago
0 Hi Team! Is There A Way To Make Clearml’S Aws Autoscaler And Queues Resource-Aware Please? I.E. If We Can Say, As We Enqueue Our Job, How Much Ram Or Gpu-Ram Or Even Gpus It Needs, Have The Scheduler/Autoscaler Dispatch The Job To Instances That Are Of Th

Having the ability to pack jobs/tasks onto the same "resource" (underlying server/EC2 instance)

This is essentially a "queue". Basically, a queue is a way to abstract a specific type of resource, so that you can achieve exactly what you described.

open up a streaming use case, wherein batch (offline) inference could be done directly inside of a ClearML pipeline in reaction to an event/trigger (like new data landing in your data lake).

Yes, that's exactly how clearml is designed, a...

one year ago
0 Hi We Just Got The Aws Autoscaler To Create A New Instance When You Enqueue A Task To The Relevant Queue. However, For Some Reason The Task Itself Is Never Run, It Stays In The Pending State. When Looking At The Worker Details, It Says "No Queues Curren

When looking at the worker details, it says "No queues currently assigned to this worker"

Yes, I think we should have better information there. The "AWS service" is not directly pulling jobs from any specific queue, hence nothing is shown there. It is "listening" to queues and launching machines, and those machines will be listening to the queue. I wonder if it is just easier to also make sure it is listed as "assigned" to those queues. wdyt?

one year ago
0 Hi Again. As I Am Running My Experiment From Server Using Agent, I Am Failing On The Point, Where The Arguments Of Argparse Are Processed. When Is The Agent Task Registered. I Am Getting None For Task.Current_Task() At The Begining Of My Script.

(i.e. importing the trains package is enough to patch the argparser; the arguments will only be logged once you call Task.init, before that they are just kept in memory)

4 years ago
0 Task Struck At

it was uploading fine for most of the day

What do you mean by uploading fine for most of the day? Are you suggesting the upload got stuck going to GS? Are you seeing the other metrics (scalars, console logs, etc.)?

one year ago
0 Hi Again. As I Am Running My Experiment From Server Using Agent, I Am Failing On The Point, Where The Arguments Of Argparse Are Processed. When Is The Agent Task Registered. I Am Getting None For Task.Current_Task() At The Begining Of My Script.

Hi WorriedParrot51
Assuming you run the code "manually" once (i.e. without the agent), then when you call Task.init it will register the argparser.
When running with the agent, the first time you call parse it will automatically override the argparse defaults with the values stored in the Task.
Make sense?

I am getting None for Task.current_task() at the beginning of my script.

Task.init() is doing the magic; only after this call will you have a current_task (either running manua...
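
In other words, a minimal sketch of the expected ordering, written against the current clearml package (the same applies to the older trains); project/task and argument names are placeholders:

```python
from argparse import ArgumentParser
from clearml import Task

print(Task.current_task())  # None: nothing has been initialized yet

task = Task.init(project_name="examples", task_name="argparse demo")  # placeholder names
print(Task.current_task())  # now returns the task

parser = ArgumentParser()
parser.add_argument("--batch-size", type=int, default=32)
# Run manually: the parsed values get logged on the Task.
# Run by the agent: parse_args() returns the values stored on the Task, not the defaults.
args = parser.parse_args()
```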

4 years ago
0 In My Requirement.Txt File I Have Modules Installed From The Same Repository, I.E., I Have Lines Such As:

Task.add_requirements does not handle it (traceback in the thread). Any suggestions?

Hmm, that is a good point, maybe we should fix that 🙂
I'm assuming someone already created this module? Or is it part of the repository?
(if it is part of the repository, then I assume this is executed from the git root)

2 years ago
0 In My Requirement.Txt File I Have Modules Installed From The Same Repository, I.E., I Have Lines Such As:

Understood. Can you try:
Task.add_requirements("-e path/to/folder/")
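
A minimal sketch of where that call would go (the folder path and project/task names are placeholders); note that Task.add_requirements has to be called before Task.init:

```python
from clearml import Task

# Register the editable/local package so the agent installs it the same way;
# this must be called before Task.init(). The path is a placeholder.
Task.add_requirements("-e path/to/folder/")

task = Task.init(project_name="examples", task_name="editable requirement")  # placeholder names
```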

2 years ago
0 Hi All—First Off, Thanks For Being Such A Helpful And Thorough Group Of People. I Learn A Ton Just Searching Through The Channel For Problems. I’M Seeing A Weird Issue. I Have A Conda Env On My Linux Machine, And I Can Successfully Run A Training Script

I think the main issue is running with python -m module.name --args , which is a bit different when trying to "understand" what the actual repository is.
Can you try running it from the repository folder (same command), just to see if it has any effect on the detected packages?

3 years ago
0 Hey, Trying To Figure Out How To Create An

BTW: what happens if you pass the same s3://bucket to Task.init output_uri? I assume you are getting the same access issue?
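
i.e. something along these lines (the bucket/prefix and project/task names are placeholders):

```python
from clearml import Task

# Route this task's model/artifact uploads to the same bucket, so the
# credentials path being exercised is identical to the failing one.
task = Task.init(
    project_name="examples",                # placeholder
    task_name="s3 output test",             # placeholder
    output_uri="s3://my-bucket/my-prefix",  # placeholder bucket/prefix
)
```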

one year ago
0 I Have A Second Question As Well, Is It Possible To Disable Any Parts Of The Automagical Logging? In My Project I Use Both Config And Argparse. It Works By Giving Path To A Config File As A Console Argument And Then Allow The User To Adjust Values With Mo

Hi UnsightlyShark53 , apologies for the delayed reply; Slack doesn't alert users unless you add @ , so things sometimes get lost :(
I think you pointed at the correct culprit...
Did you manage to overcome the circular include?
BTW, how could I reproduce it? It would be nice if we could solve it.

4 years ago
0 Hi Guys, I Am Having Some Trouble Running Some Training Scripts With The Agent Functionality:

Martin, if you want, feel free to add your answer in the stackoverflow so that I can mark it as a solution.

Will do 🙂 give me 5

2 years ago
0 I Got An Interesting Question From My Devs. If They Wish To Do Distributed Training, Is Clearml K8S Glue Suitable For It? Local Multiple Gpu: Just A Matter Of Assigning More Than One Gpu In The Yaml File Sent To The K8S Glue. Question Is How To Make This

Hi SubstantialElk6
Yes, you are correct, the glue only needs to change the yaml and it will work.
When you say "Dev end", what do you mean? I was thinking of adding an additional glue for multi-node and just adding queues, for example adding a 4-node queue and attaching a glue to it, wdyt?
Regarding Horovod: Horovod spins up its own nodes, so integration with k8s is not trivial (regardless of ClearML). That said, I know that they do have support for Horovod in the Enterprise edition, but I'm not sure ...

3 years ago
0 Task Struck At

I think this was the issue: None
And that caused TF binding to skip logging the scalars and from that point it broke the iteration numbering and so on.

one year ago
0 Hi Everyone! I'M Trying To Upload Roc Figure From Matplotlib To Clearml. Unfortunately Clearml Adds Invalid Legend Item To The Plot As You Can See On The Attached Image. Is There Any Way To Hide This Junk?

Hi SpicyOtter88
plt.plot([0, 1], [0, 1], 'r--', label='') cannot have a legend entry without a label, so it gives it an "anonymous" label; I think it should just get "unlabeled 0", wdyt?
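
As a workaround sketch in the meantime: give the line an explicit label so no anonymous legend entry is created (the labels, dummy ROC values, and project/task names below are illustrative):

```python
import numpy as np
import matplotlib.pyplot as plt
from clearml import Task

task = Task.init(project_name="examples", task_name="roc plot")  # placeholder names

fpr = np.linspace(0, 1, 50)  # dummy ROC values, just for the sketch
tpr = np.sqrt(fpr)

plt.plot([0, 1], [0, 1], "r--", label="chance")  # explicit label instead of label=''
plt.plot(fpr, tpr, label="model ROC")
plt.legend()
plt.show()  # auto-captured and reported to ClearML by the matplotlib binding
```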

one year ago
0 How Can I Add My Requirements.Txt File To The Pipeline Instead Of Each Tasks?

Yes, exactly like a Task (a pipeline is a type of Task):
'''
cloned_pipeline = Task.clone(pipeline_uid_here)
Task.enqueue(cloned_pipeline, queue_name=...)
'''

one year ago