Hi! I have some agents on GCP. Lately I have been getting some experiments that simply stop running (no signs that the experiment crashed). Here is a plot that shows the resource monitoring. Any ideas on what could be causing this?

Posted one year ago

Answers 6

GrievingTurkey78, what framework are you working with? Can you provide some more information about your environment - Linux/Windows, pip/conda? Could you also provide a snippet of your code that I can run to try to reproduce the issue?

Posted one year ago

You can check the run time by switching to the 'wall time' axis 🙂

Posted one year ago

Hey CostlyOstrich36! I am using clearml==1.1.2 and clearml-agent==1.1.0 . Stopped isn't the right word; it's more like frozen, it just froze at an epoch. The console on the agent shows epoch 33, first batch, while the one on the server shows epoch 32, last batch. The experiment had been running for ~6 hours.

Posted one year ago

GrievingTurkey78 Hi!

What versions of clearml and clearml-agent are you using? Also, for how long were the experiments running?

It seems like the agent is still reporting iterations and usage for the experiment, so what do you mean by stopped?

Posted one year ago

I am using pytorch_lightning , I'll try to create a snippet I can share! Thanks 🙌

Posted one year ago

Yeah, I experienced the same issue. Training stops/freezes at the end of the 10th or 15th epoch. I'm using pytorch_lightning as well.

Posted one year ago