I have a problem that might not directly be ClearML related, but maybe someone here has an idea: I run a clearml-server on a machine with 128GB RAM, 32 cores and 2 GPUs. On the same machine I run 2 clearml-agent each with access to 1 GPU, 12 cores, an 48G

SuccessfulKoala55 I just had the issue again. The logs show nothing of interest. It looks like an OOM to me, but I will test this again with a much larger swap, so that the server only slows down instead of killing something. Unfortunately, the kernel logs do not show much either (maybe my server logging is misconfigured, I am no expert).
What is interesting, though, is that docker only showed my nginx, minio and docker-registry containers as exited, while all the clearml containers were still running. I restarted everything, and now the previously running experiments are shown as aborted. I checked the clearml-agents and I can clearly see that the tasks are still running (high GPU/CPU load and the processes are still alive). But then, after the clearml-agents reconnect to the server, the tasks stop (no more processes running). Super weird.
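One way to check the OOM suspicion for the exited containers is to look at the state Docker recorded for them. The following is only a rough sketch, not something from the ClearML tooling: the helper names `find_oom_suspects` and `inspect_all_containers` are made up for illustration, and it assumes the `docker` CLI is available on the host. It reads the JSON printed by `docker inspect` and lists containers whose `State.OOMKilled` flag is set or whose exit code is 137 (128 + SIGKILL). Neither signal is conclusive: the flag may not be set for every kind of kill, and exit code 137 can also come from a manual `docker kill`.

```python
import json
import subprocess
from typing import List


def find_oom_suspects(inspect_json: str) -> List[str]:
    """Return names of containers whose recorded state hints at an OOM kill.

    `inspect_json` is the JSON array printed by `docker inspect <ids...>`.
    """
    suspects = []
    for container in json.loads(inspect_json):
        state = container.get("State", {})
        # State.OOMKilled is Docker's own flag; exit code 137 means the
        # process was ended by SIGKILL, which is only a weaker hint.
        if state.get("OOMKilled") or state.get("ExitCode") == 137:
            # Docker reports names with a leading slash, e.g. "/nginx".
            suspects.append(container.get("Name", "").lstrip("/"))
    return suspects


def inspect_all_containers() -> str:
    """Collect `docker inspect` output for all containers, running or exited."""
    ids = subprocess.run(
        ["docker", "ps", "-aq"], capture_output=True, text=True, check=True
    ).stdout.split()
    if not ids:
        return "[]"
    return subprocess.run(
        ["docker", "inspect", *ids], capture_output=True, text=True, check=True
    ).stdout
```

Calling `find_oom_suspects(inspect_all_containers())` on the host would then list the suspicious containers by name. Note that a restarted container may no longer carry the state from before the restart, so this is best run before bringing everything back up.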

Posted 2 years ago
0 Answers