
Hello, I'd like some help debugging why my ClearML server is crashing. I'm currently running my server inside GCP, and I'm not able to access it anymore, so I need to restart the server from outside. Has this ever happened to anyone before? What kind of logging should I be looking at to narrow down the problem? I would rule out connectivity issues since all my other VMs are working just fine. Also, the VM is running the GCP image you provide, and I only use the VM for that purpose. Thanks!

  
  
Posted 3 years ago

Answers 14


I should remark that it's been working OK nonstop for 5 months already, but yesterday and today I've been experiencing these crashes.

  
  
Posted 3 years ago

When you restart the server, can you access it?

  
  
Posted 3 years ago

yes

  
  
Posted 3 years ago

how can you even try to do comparisons if you can't access the WebApp?

  
  
Posted 3 years ago

but the reason I said the comparison could be an issue is that I'm not able to do comparisons of experiments

  
  
Posted 3 years ago

I doubt it

  
  
Posted 3 years ago

oh I meant now... so after the reboot everything goes back to "normal", except that I can't make the comparisons

  
  
Posted 3 years ago

Hi MuddySquid7 ,

I'm not able to access it anymore

Do you mean by SSH? If so, this is likely related to storage space on the system disk

  
  
Posted 3 years ago

I have 143 GB free in /

  
  
Posted 3 years ago

or I can make comparisons inside some projects but not others

  
  
Posted 3 years ago

This also fits what you described: perhaps the server has been running for a long time and the system disk was filled by ES data, files, and perhaps logs?

  
  
Posted 3 years ago
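
The disk-space hypothesis above can be checked quickly from the VM itself. Below is a minimal sketch using only the Python standard library; the function name `disk_report` and the 10% threshold are illustrative assumptions, not ClearML defaults, and the path to check should be whichever mount holds the server's data.

```python
import shutil


def disk_report(path="/", min_free_fraction=0.10):
    """Summarize disk usage for the filesystem containing `path`.

    `min_free_fraction` is an arbitrary illustrative threshold,
    not a value recommended by ClearML.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    free_fraction = usage.free / usage.total
    return {
        "path": path,
        "total_gb": usage.total / 1e9,
        "free_gb": usage.free / 1e9,
        "free_fraction": free_fraction,
        "low": free_fraction < min_free_fraction,
    }


if __name__ == "__main__":
    report = disk_report("/")
    print(
        f"{report['path']}: {report['free_gb']:.1f} GB free of "
        f"{report['total_gb']:.1f} GB (low={report['low']})"
    )
```

If the report shows plenty of free space, as with the 143 GB mentioned in this thread, a full system disk is probably not the cause and memory usage is the next thing to look at.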

could it be a memory issue triggered by the comparison of 3 experiments?

  
  
Posted 3 years ago

I can't access the WebApp, nor can I SSH into the server

  
  
Posted 3 years ago

So the issue is only SSH to the server, or the server not responding to the SDK and/or WebApp?

  
  
Posted 3 years ago