Is it possible to increase the polling interval for k8s glue?

Answered

Is it possible to increase the polling interval for k8s glue? I believe it is currently 5 seconds. Would adding an argument for it help? I can open a PR if so.

  				
Posted 3 years ago by TrickySheep9

Answers 17

Nope, that doesn’t seem to be it. Will debug a bit more.

  				
Posted 3 years ago by TrickySheep9

Let me know :)

  				
Posted 3 years ago by AgitatedDove14

Good question 🙂

This is what I am seeing in the logs:

` No tasks in queue 9154efd8a1314550b1c7882981720861
No tasks in Queues, sleeping for 5.0 seconds
No tasks in queue 9154efd8a1314550b1c7882981720861
No tasks in Queues, sleeping for 5.0 seconds
No tasks in queue 9154efd8a1314550b1c7882981720861
No tasks in Queues, sleeping for 5.0 seconds
No tasks in queue 9154efd8a1314550b1c7882981720861
No tasks in Queues, sleeping for 5.0 seconds
K8S Glue pods monitor: Failed parsing kubectl output:

Ex: Expecting value: line 1 column 1 (char 0)
K8S Glue pods monitor: Failed parsing kubectl output:

Ex: Expecting value: line 1 column 1 (char 0)
K8S Glue pods monitor: Failed parsing kubectl output: `
This pattern repeats every minute or so: errors for a while, then normal output for a while. My guess is that EKS is throttling the requests. I need to figure out how to get the actual error.
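For context, `Expecting value: line 1 column 1 (char 0)` is the message Python's `json` module produces when it is asked to parse an empty or non-JSON string, which suggests kubectl wrote nothing usable to stdout on those iterations. A minimal sketch that reproduces the message follows. That the glue code calls `json.loads` at this point is an assumption based on the error text, and `parse_error` is a hypothetical helper.

```python
import json


def parse_error(text):
    """Return the JSON decode error message for text, or None if it parses."""
    try:
        json.loads(text)
    except json.JSONDecodeError as ex:
        return str(ex)
    return None


# An empty string, as you would get if kubectl wrote nothing to stdout
print(parse_error(""))
# Valid JSON parses without an error, so this prints None
print(parse_error('{"items": []}'))
```

If this is what is happening, the real cause is probably on kubectl's stderr or in its exit code rather than in the parser.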

  				
Posted 3 years ago by TrickySheep9

Ex: Expecting value: line 1 column 1 (char 0)
K8S Glue pods monitor: Failed parsing kubectl output:

Run with `--debug` as the first parameter.
Are you running the latest from the git repo?

  				
Posted 3 years ago by AgitatedDove14

Yep, you are right

  				
Posted 3 years ago by AgitatedDove14

This is the thread checking the state of the running pods (and updating the Task status, so you have visibility into the state of the pod inside the cluster before it starts running)

  				
Posted 3 years ago by AgitatedDove14

Like I said, it works, but then goes into the error loop

  				
Posted 3 years ago by TrickySheep9

No idea why it fails...

  				
Posted 3 years ago by AgitatedDove14

https://github.com/allegroai/clearml-agent/blob/aede6f4bac71c8fc56e7cf982318a48527953a3c/clearml_agent/glue/k8s.py#L217

  				
Posted 3 years ago by TrickySheep9

I am using the clearml-agent version from PyPI

  				
Posted 3 years ago by TrickySheep9

And then comes back again

  				
Posted 3 years ago by TrickySheep9

Since it’s already logging this, wouldn’t debug add nothing new?

  				
Posted 3 years ago by TrickySheep9

The 5 seconds is the sleep between two consecutive pulls when there are no jobs to process. Why would you want a longer interval between pulls?
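To make that behaviour concrete, here is a minimal sketch of a pull loop that sleeps only after an empty pull. It is not the clearml-agent implementation: `poll`, `fetch_task`, `handle_task`, `interval_sec`, and `max_polls` are hypothetical names, and the configurable interval stands in for the kind of argument being proposed.

```python
import time


def poll(fetch_task, handle_task, interval_sec=5.0, max_polls=None, sleep=time.sleep):
    """Pull tasks repeatedly, sleeping interval_sec only after an empty pull."""
    polls = 0
    while max_polls is None or polls < max_polls:
        polls += 1
        task = fetch_task()
        if task is None:
            # Nothing to process: wait before asking the queue again
            print("No tasks in Queues, sleeping for {:.1f} seconds".format(interval_sec))
            sleep(interval_sec)
        else:
            # A task was found: handle it and pull again immediately
            handle_task(task)
```

With a loop shaped like this, a larger `interval_sec` only reduces how often an idle queue is queried. It would not change how often a separate pod-monitoring thread calls kubectl.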

  				
Posted 3 years ago by AgitatedDove14

I saw that the debug param wasn’t adding any additional output for this.

  				
Posted 3 years ago by TrickySheep9

Planning to exec into the container and run it in a loop and see what happens

  				
Posted 3 years ago by TrickySheep9

(No objection to adding an argument, but I just wonder what the value would be.)

  				
Posted 3 years ago by AgitatedDove14

`kubectl get pods -n {namespace} -o=json`
What are you getting when running the above on your cluster?
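One way to run that command repeatedly and see why parsing fails is to capture stderr and the exit code alongside stdout. The sketch below is a debugging aid under stated assumptions, not part of clearml-agent: `run_and_parse` and `check_pods` are hypothetical helpers, and the namespace is whatever your cluster uses.

```python
import json
import subprocess


def run_and_parse(cmd):
    """Run cmd and return (parsed_json_or_None, returncode, stderr_text)."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    try:
        parsed = json.loads(proc.stdout)
    except json.JSONDecodeError:
        # Empty or non-JSON stdout: leave the diagnosis to returncode and stderr
        parsed = None
    return parsed, proc.returncode, proc.stderr.strip()


def check_pods(namespace):
    """Query the pods in namespace once and print what came back."""
    cmd = ["kubectl", "get", "pods", "-n", namespace, "-o=json"]
    parsed, code, err = run_and_parse(cmd)
    if parsed is None:
        print("kubectl returned no JSON (exit code {}): {}".format(code, err))
    else:
        print("kubectl returned {} pods".format(len(parsed.get("items", []))))
```

Calling `check_pods` with your namespace in a loop from inside the container should show whether the failing iterations come with a throttling or authentication message on stderr.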

  				
Posted 3 years ago by AgitatedDove14
