Hi
I just updated our server to the latest version, but it seems to have broken all our running experiments.
The Scalars view is completely down; I just get this error when I open the Scalars tab:
Error 100 : General data error (ApiError(503, 'search_phase_execution_exception', '[clearml][172.20.0.4:9300][indices:data/read/search[phase/query]]'))
Our experiments also log these errors:
2024-09-05 04:54:16,047 - clearml.Metrics - ERROR - Action failed <401/30: events.add_batch/v1.0 (Invalid token (invalid jwt token): reason=Signature verification failed)>
2024-09-05 04:55:17,836 - clearml.Metrics - ERROR - Action failed <500/100: events.add_batch/v1.0 (General data error: err=500 document(s) failed to index., extra_info=[events-training_stats_scalar-d1bd92a3b039400cbafc60a7a5b1e52b][0] primary shard is not active Timeout: [1m], request: [BulkShardRequest [[events-training_stats_scalar-d1bd92a3b039400cbafc60a7a5b1e52b][0]] containing [500] requests and a refresh])>
I'm at a loss as to what they mean. It must be related to the server restart.
Help?
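Both the 503 `search_phase_execution_exception` and the "primary shard is not active" message come from Elasticsearch, so one way to check whether the indices survived the update is to query the cluster directly. The following is a minimal sketch, not a ClearML tool. It assumes the Elasticsearch HTTP API is reachable at `http://localhost:9200`. The error above shows the transport port 9300, while the REST API usually listens on 9200, so adjust the URL for your deployment. The helper names `fetch_json`, `problem_shards`, and `summarize` are hypothetical.

```python
import json
import urllib.request
from collections import Counter

# Assumption: Elasticsearch's REST API is reachable here (default HTTP port 9200).
# Adjust host/port, e.g. when running inside the docker-compose network.
ES_URL = "http://localhost:9200"


def fetch_json(path: str, base_url: str = ES_URL, timeout: float = 10.0):
    """GET an Elasticsearch REST endpoint and decode its JSON response."""
    with urllib.request.urlopen(base_url + path, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def problem_shards(shards: list[dict]) -> list[dict]:
    """Return primary shards whose state is not STARTED.

    `shards` is the list returned by GET /_cat/shards?format=json; each entry
    has "index", "shard", "prirep" ("p" = primary, "r" = replica) and "state".
    """
    return [
        s for s in shards
        if s.get("prirep") == "p" and s.get("state") != "STARTED"
    ]


def summarize(shards: list[dict]) -> Counter:
    """Count non-started primary shards per index prefix.

    The trailing task-id suffix is dropped, so
    'events-training_stats_scalar-<id>' is counted as 'events-training_stats_scalar'.
    """
    counts = Counter()
    for s in problem_shards(shards):
        prefix = s["index"].rsplit("-", 1)[0]
        counts[prefix] += 1
    return counts
```

A first check could be `fetch_json("/_cluster/health")["status"]`. A `red` status means at least one primary shard is unassigned, which matches the "primary shard is not active" error. Passing `fetch_json("/_cat/shards?format=json")` to `summarize` shows which event indices are affected, and `GET /_cluster/allocation/explain` reports why Elasticsearch could not allocate a shard, for example due to disk watermarks or a missing data path.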