Answered
Hi. I'm Currently Working On Training A Model, Specifically Fine-Tuning From Segment Anything. I'm Using Remote Training In ClearML, And I Have Three Servers: 2 A30 And 1 A100. Interestingly, When Training On The

Hi. I'm currently training a model, specifically fine-tuning from Segment Anything. I'm using remote training in ClearML, and I have three servers: two with an A30 and one with an A100. Interestingly, when training on the A30, the IOU is quite good, around 0.9. However, when I train on the A100, the score is significantly lower, around 0.6.
I've conducted several tests to troubleshoot the issue:

  • I tried remote training on the CPU, but the scores on both A30 and A100 remained the same (good on A30 and bad on A100).
  • I also attempted training directly on A30 and A100 without using remote training. Surprisingly, the scores on both cards were the same and good (IOU ~0.9).

Any insights or suggestions on this matter would be greatly appreciated. Is there any issue with how ClearML utilizes the A100 card? Thank you.
    "This image depicts the plot of IOU metrics over training in A30 and A100"
  
  
Posted 3 months ago

Answers 2


Hi @SilkyHawk58, ClearML doesn't "utilize" the cards directly per se. ClearML enables your code to execute on remote machines (among many other things). However, it is your code that actually utilizes the card.

Makes sense?
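
Since the same code scores well when run directly on both cards, one thing worth comparing is the environment each run ends up in. Below is a minimal sketch, assuming the gap comes from differing package or CUDA versions rather than from the GPU itself (that is only a guess). The helper names `environment_fingerprint` and `diff_fingerprints` are hypothetical and not part of ClearML. Print the fingerprint at the start of the training script in a direct run and in an enqueued run, then compare the two.

```python
import importlib.metadata as md
import json
import os
import platform
import sys


def environment_fingerprint(packages=("torch", "torchvision", "numpy", "clearml")):
    """Collect settings that may differ between a direct run and an agent run."""
    info = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "cuda_visible_devices": os.environ.get("CUDA_VISIBLE_DEVICES"),
        "packages": {},
    }
    for name in packages:
        try:
            info["packages"][name] = md.version(name)
        except md.PackageNotFoundError:
            info["packages"][name] = None  # package not installed here

    # torch is optional so the sketch also runs on machines without it
    try:
        import torch
    except ImportError:
        return info
    info["torch_cuda"] = torch.version.cuda
    info["cudnn"] = torch.backends.cudnn.version()
    if torch.cuda.is_available():
        info["gpu"] = torch.cuda.get_device_name(0)
    return info


def diff_fingerprints(a, b):
    """Return {key: (a_value, b_value)} for every top-level key that differs."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


if __name__ == "__main__":
    # Printed to stdout so it shows up in each run's console output
    print(json.dumps(environment_fingerprint(), indent=2, sort_keys=True))
```

If the two fingerprints differ (for example a different torch or CUDA build on the A100 server), that is a more likely place to look than the card.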

  
  
Posted 3 months ago

Yeah, but why is there such a notable difference in IOU when training remotely on a server with an A30 card compared to another server with an A100 card? I simply enqueued the task to the agents.

  
  
Posted 3 months ago