Well, as long as you’re using a single node, it should indeed alleviate the shard disk size limit, but I’m not sure ES will handle that too well. In any case, you can’t change that for existing indices - you can modify the mapping template and reindex the existing index (you’ll need to index into another name, delete the original, and create an alias with the original name, since the new index can’t be renamed...)
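For reference, a minimal sketch of that reindex-and-alias flow with the Python elasticsearch client - the endpoint and index names here are placeholders, not the actual ClearML index names:

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # placeholder ES endpoint

# Copy all documents from the old index into a new one created from the updated template
es.reindex(
    body={"source": {"index": "my_old_index"}, "dest": {"index": "my_new_index"}},
    wait_for_completion=True,
)

# Delete the original and point an alias with the original name at the new index,
# since the new index itself can't be renamed
es.indices.delete(index="my_old_index")
es.indices.put_alias(index="my_new_index", name="my_old_index")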
Ok thanks!
Well, as long as you use a single node, multiple shards offer no sca...
Could it be that the merge op takes up so much filesystem cache that the rest of the system becomes unresponsive?
The number of documents in the old and the new env is the same though 🤔 I really don’t understand where this extra used space comes from
Here is the data disk (/opt/clearml) on the left, and the OS disk on the right
it also happens without hitting F5 after some time (~hours)
Here is the console with some errors
Yes, I set:
auth {
    cookies {
        httponly: true
        secure: true
        domain: ".clearml.xyz.com"
        max_age: 99999999999
    }
}
It always worked for me this way
"Can only use wildcard queries on keyword and text fields - not on [iter] which is of type [long]"
If I want to resume a training on multiple GPUs, I will need to call this function in each process to send the weights to each GPU
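Something like this generic PyTorch DDP resume sketch is what I have in mind (just an illustration, not the exact function mentioned above - names are made up):

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def resume_on_each_rank(model, checkpoint_path, local_rank):
    # Runs in every process: load the checkpoint onto this process's own GPU
    dist.init_process_group(backend="nccl")
    device = torch.device(f"cuda:{local_rank}")
    state = torch.load(checkpoint_path, map_location=device)
    model.load_state_dict(state["model"])
    model.to(device)
    # Wrapping in DDP broadcasts rank 0's weights so all replicas start in sync
    return DDP(model, device_ids=[local_rank])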
SuccessfulKoala55 Thanks! If I understood correctly, setting index.number_of_shards = 2
(instead of 1) would create a second shard for the large index, splitting it into two shards? This https://stackoverflow.com/a/32256100 seems to say that it’s not possible to change this value after the index has been created - is that true?
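(For what it’s worth, ES does have a _split API for after-the-fact splitting - a rough sketch with the Python client, index names being placeholders: block writes on the source, then split into a new index with more shards.)

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # placeholder endpoint

# The source index must be made read-only before it can be split
es.indices.put_settings(index="my_big_index", body={"index.blocks.write": True})

# Split into a new index with 2 primary shards (must be a multiple of the original count)
es.indices.split(
    index="my_big_index",
    target="my_big_index_split",
    body={"settings": {"index.number_of_shards": 2}},
)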
Would adding an ILM (index lifecycle management) policy be an appropriate solution?
Ha nice, makes perfect sense thanks AgitatedDove14 !
AgitatedDove14 I made some progress:
In the clearml.conf of the agent, I set: sdk.development.report_use_subprocess = false
(because I had the feeling that Task._report_subprocess_enabled = False wasn’t taken into account). I’ve also set task.set_initial_iteration(0)
Now I was able to get the following graph after resuming -
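In code it boils down to something like this (project/task names are placeholders, and continue_last_task is just how I’m resuming the run):

from clearml import Task

# Resume the previously created task instead of starting a new one
task = Task.init(
    project_name="my_project",
    task_name="my_training",
    continue_last_task=True,
)

# Report scalars starting from iteration 0 rather than offsetting by the previous run
task.set_initial_iteration(0)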
SuccessfulKoala55
In the docker-compose file, the apiserver service has environment settings for the Elasticsearch host and port (CLEARML_ELASTIC_SERVICE_HOST and CLEARML_ELASTIC_SERVICE_PORT) - changing those will allow you to point the server to another ES service
The ES cluster is running on another machine - how can I set its IP in CLEARML_ELASTIC_SERVICE_HOST? Would I need to somehow add the host to the networks of the apiserver service? How can I do that?
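Something like this in the apiserver service of the docker-compose file is what I’m picturing (the IP is a placeholder for the external ES machine):

  apiserver:
    environment:
      CLEARML_ELASTIC_SERVICE_HOST: 10.0.0.12   # placeholder IP of the external ES machine
      CLEARML_ELASTIC_SERVICE_PORT: 9200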
ha sorry it’s actually the number of shards that increased
I am not sure I can do both operations at the same time (migration + splitting) - do you think it’s better to do the splitting first or the migration first?
Changing redis from version 6.2 to 6.2.11 fixed it, but I have new issues now 😄
Nevermind, the nvidia-smi command fails in that instance - the problem lies somewhere else
Still failing with the same error 😞
I now have a different question: when installing torch from wheel files, I am guaranteed to get the corresponding CUDA libraries and cuDNN bundled with it, right?
I am still confused though - on the Get Started page of the PyTorch website, when choosing "conda" the generated installation command includes cudatoolkit, while when choosing "pip" it only uses a wheel file.
Does that mean the wheel file contains cudatoolkit (the CUDA runtime)?
alright I am starting to get a better picture of this puzzle
From https://discuss.pytorch.org/t/please-help-me-understand-installation-for-cuda-on-linux/14217/4 it looks like my assumption is correct: there is no need for cudatoolkit to be installed, since the wheels already contain all the CUDA/cuDNN libraries required by torch
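A quick way to double-check what the pip wheel actually ships with (only the NVIDIA driver still has to come from the host):

import torch

print(torch.version.cuda)               # CUDA runtime version bundled with the wheel
print(torch.backends.cudnn.version())   # cuDNN version bundled with the wheel
print(torch.cuda.is_available())        # True only if a compatible NVIDIA driver is present on the host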
Thanks SuccessfulKoala55 for the answer! One follow-up question:
When I specify:
agent.package_manager.pip_version: '==20.2.3'
in the trains.conf, I get:
trains_agent: ERROR: Failed parsing /home/machine1/trains.conf (ParseException): Expected end of text, found '=' (at char 326), (line:7, col:37)
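For reference, this is the nested form I would expect in trains.conf (just my assumption on placement and quoting, using double quotes like the rest of the file):

agent {
    package_manager {
        # pin the pip version the agent installs into its virtualenvs
        pip_version: "==20.2.3"
    }
}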