Setting ignore_remote_overrides=True helps solve the issue, but obviously we can't use it as a solution. What might cause the lookup of parameter overrides in the backend to take so much time? Is it a network issue? Do we perhaps need to change the machine's network configuration?
			
		
		Hi, We Are Migrating From AWS To GCP Machines And We Experience Issues With
Hi, we are migrating from AWS to GCP machines and are experiencing issues with the task.connect function. On GCP machines spawned by the autoscaler (ClearML GCP autoscaler), task.connect takes a very long time to complete, for example:
Connected config: experiment_globals in 37.61 seconds
Connected config: data in 74.52 seconds
Connected config: augmentations in 50.19 seconds
Connected config: model in 64.26 seconds
Connected config: losses in 26.28 seconds
Connected config: trainer in 155.20 seconds
Connected config: datasets_config in 70.18 seconds
Connected config: model_architecture in 4191.67 seconds
Connected config: losses_config in 1254.73 seconds
Connected config: trainer_config in 58.90 seconds
As you can see, connecting our configurations takes about 100 minutes in total, and model_architecture alone takes over an hour.
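For anyone who wants to compare runs, the timings in the log above can be totalled with a short script. This is only a sketch: it assumes every line follows the exact "Connected config: <name> in <seconds> seconds" format shown in this post, and the helper name parse_durations is made up for illustration.

```python
import re

# Log lines copied verbatim from the post.
LOG = """\
Connected config: experiment_globals in 37.61 seconds
Connected config: data in 74.52 seconds
Connected config: augmentations in 50.19 seconds
Connected config: model in 64.26 seconds
Connected config: losses in 26.28 seconds
Connected config: trainer in 155.20 seconds
Connected config: datasets_config in 70.18 seconds
Connected config: model_architecture in 4191.67 seconds
Connected config: losses_config in 1254.73 seconds
Connected config: trainer_config in 58.90 seconds
"""

# Assumed line format: "Connected config: <name> in <seconds> seconds".
PATTERN = re.compile(r"^Connected config: (\S+) in ([\d.]+) seconds$")


def parse_durations(log_text):
    """Return a {config_name: seconds} dict for every matching log line."""
    durations = {}
    for line in log_text.splitlines():
        match = PATTERN.match(line.strip())
        if match:
            durations[match.group(1)] = float(match.group(2))
    return durations


durations = parse_durations(LOG)
total = sum(durations.values())
slowest = max(durations, key=durations.get)

print(f"total: {total:.2f} s ({total / 60:.1f} min)")
print(f"slowest: {slowest} ({durations[slowest]:.2f} s)")
```

On the numbers above, model_architecture and losses_config together account for roughly 91% of the total time, so those two configurations are probably the first place to look.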
On a GCP VM that we spawned manually from the same machine image, and not running in remote mode, we don't have this issue: task.connect completes in a few seconds.
We would love some ideas on what might cause this. What should we look at, and where?
The full log file is attached for your convenience.