I've seen parameters connect and task create in seconds, and other times it takes 4 minutes.
This might be your backend (clearml-server) replying slowly because of load?
Is there a way (at the class level) to control the retry logic on connecting to the API server?
The difference in the two screenshots is literally only the URLs in clearml.conf, and it went from 30s down to 2-3s.
Yes, that could be network. Also notice that there are automatic retries that are quiet; basically, if a request is dropped due to network issues / timeouts etc., it will automatically retry (but of course it will look slow from the outside because of the retries).
yup! that's what I was wondering if you'd help me find a way to change the timings of. Is there an option I can override to make the retry more aggressive?
you mean wait for less?
add to your clearml.conf:
api.http.retries.backoff_factor = 0.1
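If your clearml.conf is written in the nested form, the same setting would look roughly like this (just the block-style spelling of the line above):
api {
    http {
        retries {
            backoff_factor: 0.1
        }
    }
}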
thanks so much!
I've been running a bunch of tests with timers and seeing an absurd amount of variance. I've seen parameters connect and task create in seconds, and other times it takes 4 minutes.
Since I see timeout connection errors somewhat regularly, I'm wondering if perhaps I'm having networking errors. Is there a way (at the class level) to control the retry logic on connecting to the API server?
My operating theory is that some sort of backoff / timeout (e.g. 10s) is causing the high variance.
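For reference, the timing tests are basically just wrapping the calls, roughly like this (project/task names here are placeholders):
import time
from clearml import Task

t0 = time.time()
task = Task.init(project_name="timing-test", task_name="overhead-check")
print(f"Task.init: {time.time() - t0:.1f}s")

t0 = time.time()
task.connect({"param_a": 1, "param_b": 2})
print(f"connect: {time.time() - t0:.1f}s")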
To test this, I spun up a local instance pointing to localhost as well as the normal reverse-proxy, and found that localhost had "overhead times" that were completely reasonable - practically none at all.
The difference in the two screenshots is literally only the URLs in clearml.conf, and it went from 30s down to 2-3s.
(server has been destroyed already, not worried about the keys showing)
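(for context, the only thing that changed between the two runs is the host entries in the api section of clearml.conf - placeholder values shown here:
api {
    api_server: http://localhost:8008
    web_server: http://localhost:8080
    files_server: http://localhost:8081
}
vs. the same three keys pointed at the reverse-proxy URLs)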
I'm not familiar with this one, but I think you should be able to control it with:
CLEARML_AGENT__API__HTTP__RETRIES__BACKOFF_FACTOR
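e.g. something along these lines before starting the agent (the value is just an example):
export CLEARML_AGENT__API__HTTP__RETRIES__BACKOFF_FACTOR=0.1
clearml-agent daemon --queue default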
thank you very much.
for remote workers, would this env variable get parsed correctly? CLEARML_API_HTTP_RETRIES_BACKOFF_FACTOR=0.1
yup! that's what I was wondering if you'd help me find a way to change the timings of. Is there an option I can override to make the retry more aggressive?
I've definitely narrowed it down to the reverse proxy I'm behind. When I switch to a Cloudflare tunnel, the network overhead is <1s compared to localhost, and everything feels snappy!
But for security reasons, I need to keep using the reverse proxy, hence my question about configuring the silent clearml retries.
Hi @<1689446563463565312:profile|SmallTurkey79>
This call sets the requirements of an existing (already created) Task. Since the Task was just created, it waits for the automatic package detection before overriding them.
What you want is Task.force_requirements_env_freeze (notice this is class level, and it needs to be called before Task.init):
Task.force_requirements_env_freeze(requirements_file="requirements.txt")
task = Task.init(...)
thanks for the clarification. Is there any bypass? (a git diff + git rev-parse should take mere milliseconds)
I'm working out of a mono repo and am beginning to suspect it's a cause of slowness. Next week I'll try moving a pipeline over to a new repo to test if this theory holds any water.
Hi @<1689446563463565312:profile|SmallTurkey79>, the trigger for the detection is the call to Task.init(), so while calling set_packages() overrides any packages found, it will not prevent the detection from running.
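In other words, roughly (project/task names are placeholders):
task = Task.init(project_name="examples", task_name="mono-repo-test")  # package auto-detection is triggered here
task.set_packages("requirements.txt")  # overrides whatever was detected, but does not skip the detection itself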