CrookedWalrus33 can you send the entire log? (you can DM it to me)
```
from clearml.automation.parameters import LogUniformParameterRange
sampler = LogUniformParameterRange(name='test', min_value=-3.0, max_value=1.0, step_size=0.5)
sampler.to_list()
Out[2]:
[{'test': 1.0},
 {'test': 3.1622776601683795},
 {'test': 10.0},
 {'test': 31.622776601683793},
 {'test': 100.0},
 {'test': 316.22776601683796},
 {'test': 1000.0},
 {'test': 3162.2776601683795}]
```
For the on-prem you can check the k8s helm charts, it can spin agents for you (static agents).
For the GKE the best solution is the k8s glue:
https://github.com/allegroai/clearml-agent/blob/master/examples/k8s_glue_example.py
WackyRabbit7 this is funny, it is not ClearML providing this offering
some generic company grabbed the open-source and put it there, which they should not 🙂
MuddySquid7
are you saying that for some reason the models pick the artifacts ? Is that reproducible ? (they are two different things)
Can you see the df.pkl on the Models section of the Task (in the UI) ?
This should have worked with the latest clearml RC.
And you verified it is not working?
If I call explicitly `task.get_logger().report_scalar("test", str(parse_args.local_rank), 1., 0)`, this will log as expected one value per process, so reporting works
JitteryCoyote63 and do prints get logged as well (from all processes) ?
Apologies on the typo ;)
There is also a global "running_remotely" but it's not on the task
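For reference, a minimal sketch of checking it; the exact import path is from memory, so treat it as an assumption:
```python
# assumption: the global helper is importable from clearml.config
from clearml.config import running_remotely

if running_remotely():
    print("executing inside an agent")
else:
    print("running locally")
```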
okay that makes sense, if this is the case I would just use `clearml-agent execute --id <task_id here>` to continue the training Task.
Do notice you have to reload your last checkpoint from the Task's models/artifacts to continue 🙂 (see the sketch below)
Last question, what is the HPO optimization algorithm, is it just grid/random search or bohb/optuna? If it is the latter, how do you make it "continue"?
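For the checkpoint reload, something along these lines should work (a rough sketch; the task id, the framework-specific loading call, and the assumption that the checkpoint was registered as an output model are placeholders):
```python
# rough sketch: fetch the continued Task's latest output model and download it
from clearml import Task

task = Task.get_task(task_id="<task_id here>")
last_model = task.models["output"][-1]          # most recently registered checkpoint
checkpoint_path = last_model.get_local_copy()   # downloads (or returns the cached copy)
# then load it with your framework, e.g. torch.load(checkpoint_path)
```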
I see, actually what you should do is a fully custom endpoint,
- preprocessing -> download video
- processing -> extract frames and send them to Triton with gRPC (see below how)
- post processing, return a human readable answer
Regarding the processing itself, what you need is to take this function (copy paste):
None
have it as internal `_process...
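As a rough illustration of the processing step (frame extraction plus a Triton gRPC call via `tritonclient`), assuming a plain image model; the model name, input/output names and input shape are placeholders:
```python
import cv2
import numpy as np
import tritonclient.grpc as grpcclient

def process_video(video_path, triton_url="localhost:8001"):
    client = grpcclient.InferenceServerClient(url=triton_url)
    cap = cv2.VideoCapture(video_path)
    outputs = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # assume the model expects a single NCHW float32 image
        blob = cv2.resize(frame, (224, 224)).astype(np.float32)
        blob = np.transpose(blob, (2, 0, 1))[None, ...]
        infer_input = grpcclient.InferInput("input", list(blob.shape), "FP32")
        infer_input.set_data_from_numpy(blob)
        response = client.infer(model_name="my_model", inputs=[infer_input])
        outputs.append(response.as_numpy("output"))
    cap.release()
    return outputs
```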
GrievingTurkey78 Actually it is in progress, see the GitHub issue for details:
https://github.com/allegroai/trains/issues/219
Hi CheerfulGorilla72
I think more details are needed here :)
but I believe it should have worked with 0.14.1 as well
Correct
Eg, I'm creating a task using `clearml.Task.create`, often it doesn't get the git diff correctly,
ShakyJellyfish91 Task.create does not store any "git diff" automatically, is there a reason not to use Task.init?
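For comparison, a minimal sketch: Task.init auto-detects the repository, branch, commit and uncommitted changes of the calling script, which Task.create does not do (project/task names here are just examples):
```python
from clearml import Task

# auto-logs repo, branch, commit and the "git diff" of the current script
task = Task.init(project_name="examples", task_name="my training run")
```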
This only talks about bugs reporting and enhancement suggestions
I'll make sure this is fixed 🙂
LOL yes 🙂
just make sure it won't be part of the uncommitted changes of the AWS autoscaler 🙂
This is definitely a bug, in the superclass it should have the same condition (the issue is checking if you are trying to change the "main" task)
Thanks ApprehensiveFox95
I'll make sure we push a fix 🙂
Hi JitteryCoyote63
If you want to stop the Task, click Abort (Reset will not stop the task or restart it, it will just clear the outputs and let you edit the Task itself).
I think we witnessed something like that due to DataLoader multiprocessing issues, and I think the solution was to add `multiprocessing_context='forkserver'` to the DataLoader:
https://github.com/allegroai/clearml/issues/207#issuecomment-702422291
Could you verify?
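Something like this, assuming a standard PyTorch DataLoader (the dataset and worker count are placeholders):
```python
from torch.utils.data import DataLoader

loader = DataLoader(
    my_dataset,            # your torch Dataset instance
    batch_size=32,
    num_workers=4,
    multiprocessing_context="forkserver",  # avoid fork-related issues with the reporting threads
)
```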
Hi MinuteStork43
`Failed uploading: cannot schedule new futures after interpreter shutdown`
This is odd where / when exactly are you trying to upload it?
I see.
You can get the offline folder programmatically then copy the folder content (it's the same as the zip, and you can also pass a folder instead of a zip to the import function):
`task.get_offline_mode_folder()`
You can also have a soft link of the offline folder (if you are working on a linux machine):
`ln -s myoffline_folder ~/.trains/cache/offline`
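A rough sketch of the full offline round-trip (project/task names are just examples):
```python
from clearml import Task

# record everything locally instead of sending it to the server
Task.set_offline(offline_mode=True)
task = Task.init(project_name="examples", task_name="offline run")
offline_folder = task.get_offline_mode_folder()  # the folder holding the session
task.close()

# later, on a connected machine, import the folder (or its zipped copy)
Task.set_offline(offline_mode=False)
Task.import_offline_session(str(offline_folder))
```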
My question was about the automatically uploaded models, those that were uploaded by the clearml client.
So there is a way to add a callback, would that work?
https://github.com/allegroai/clearml/blob/cf7361e134554f4effd939ca67e8ecb2345bebff/clearml/binding/frameworks/__init__.py#L137
```
def callback(_, model_info):
    model_info.name = "my new name"
    return model_info
```
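If I remember correctly it is registered on the WeightsFileHandler; the exact registration call is from memory, so treat it as an assumption:
```python
# assumption: add_pre_callback hooks the callback in before each auto-logged model is stored
from clearml.binding.frameworks import WeightsFileHandler

def callback(_, model_info):
    model_info.name = "my new name"
    return model_info

WeightsFileHandler.add_pre_callback(callback)
```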
Could it be someone deleted the file? this is inside the temp venv folder but it should not get there
Yes 🙂 documentation is being worked on ... Anyhow we will be uploading a new documentation site soon (hopefully in a week or so), putting it all on GitHub so it will be easier for the community to edit and add more
Hi RoundMosquito25
The main problem here is that there is no way to know, before running the Task, how much memory it would need ... And without that parameter, maximizing GPU utilization is quite challenging. wdyt?
By default SSH server is not running in a lot of scenarios (k8s for example, Windows, MacOS)...