Answered

Hi, I'm using an AWS EC2 instance to train my models with the ClearML autoscaler, but it says the CUDA device is not available. The code runs well on my local PC, and it ran well on ClearML with EC2 yesterday, but it suddenly doesn't work today. Is there any way to solve this?

  
  
Posted one month ago

Answers 10


Hi EnchantingPenguin77 , I don't see any errors related to CUDA in the log

  
  
Posted one month ago

CostlyOstrich36 sorry, I uploaded the wrong log. Here is the error:
RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.
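
For reference, the workaround that the error message itself suggests can be sketched as below. This is only a minimal sketch: `resolve_map_location` and `load_checkpoint` are hypothetical helper names, not part of ClearML or PyTorch, and it assumes the checkpoint is loaded with `torch.load`. It lets the load succeed on a machine without a visible GPU, but it does not explain why the GPU is missing on the EC2 instance.

```python
def resolve_map_location(cuda_available: bool) -> str:
    """Return the device string to pass as map_location (hypothetical helper)."""
    return "cuda" if cuda_available else "cpu"


def load_checkpoint(path):
    """Load a checkpoint onto whichever device is actually present."""
    # Imported here so resolve_map_location stays usable without PyTorch installed.
    import torch

    device = resolve_map_location(torch.cuda.is_available())
    if device == "cpu":
        # Make the fallback visible in the task log instead of failing on load.
        print("Warning: CUDA is not available, loading the checkpoint on CPU")
    return torch.load(path, map_location=torch.device(device))
```

With this in place the task would continue on CPU when no GPU is found, so the warning line in the log is the signal that the instance still has a GPU visibility problem.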

  
  
Posted one month ago

Screenshot of the AWS Autoscaler setup; CPU mode is NOT enabled
image

  
  
Posted one month ago

EnchantingPenguin77 , are you sure you added the correct log? I don't see any errors related to CUDA

  
  
Posted one month ago

Hi CostlyOstrich36 , any idea why this happens?

  
  
Posted one month ago

Hi CostlyOstrich36 , here is the configuration. When I clone the previous successful run, the GPU is sometimes found, but only at random. Also, I am unable to run multiple tasks at the same time, even when cloning the previous run

  
  
Posted 26 days ago

CostlyOstrich36 yes, at the end of the new file
image

  
  
Posted 29 days ago

And this issue happens randomly. I was able to run it again last night, but it failed again this morning

  
  
Posted 29 days ago

Can you add here the configuration of the autoscaler?

  
  
Posted 28 days ago

One thing I've changed is the AMI for the autoscaler: I switched it from Amazon Linux to Ubuntu Linux because my Docker file size exceeded the limit of Amazon Linux. I'm not sure whether this has anything to do with this issue
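
One way to check whether the AMI change matters is to log what the instance actually sees at the start of the task. The snippet below is a minimal sketch, not a ClearML feature: `gpu_diagnostics` is a hypothetical helper, and it assumes that `nvidia-smi` is on the PATH whenever the NVIDIA driver is installed. Whether a missing driver is really the cause here is not confirmed by anything in this thread.

```python
import shutil
import subprocess


def gpu_diagnostics():
    """Collect basic facts about GPU visibility on this machine (hypothetical helper)."""
    report = {}

    # nvidia-smi normally ships with the NVIDIA driver; if it is missing,
    # the image or container probably has no driver exposed to it.
    smi = shutil.which("nvidia-smi")
    report["nvidia_smi_found"] = smi is not None
    if smi is not None:
        try:
            result = subprocess.run(
                [smi, "-L"], capture_output=True, text=True, timeout=30
            )
            report["nvidia_smi_output"] = (result.stdout or result.stderr).strip()
        except (OSError, subprocess.TimeoutExpired) as exc:
            report["nvidia_smi_output"] = f"failed to run nvidia-smi: {exc}"

    # PyTorch is optional here so the check also works in a bare environment.
    try:
        import torch
    except ImportError:
        report["torch_installed"] = False
        return report

    report["torch_installed"] = True
    report["torch_cuda_build"] = torch.version.cuda  # None for CPU-only builds
    report["cuda_available"] = torch.cuda.is_available()
    report["device_count"] = torch.cuda.device_count()
    return report


if __name__ == "__main__":
    for key, value in gpu_diagnostics().items():
        print(f"{key}: {value}")
```

Comparing this output between a run where the GPU is found and one where it is not would show whether the driver, the container, or the PyTorch build differs between the two.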

  
  
Posted 25 days ago