(no objection to adding an argument, but I just wonder what the value is)
kubectl get pods -n {namespace} -o=json
What are you getting when running the above on your cluster?
From what I saw, the debug param wasn’t adding anything extra for this?
5 seconds is the sleep between two consecutive polls when there are no jobs to process, why would you want a higher polling frequency?
Since it’s already logging this, debug wouldn’t add anything?
Planning to exec into the container and run it in a loop and see what happens
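A minimal sketch of what such a loop could look like, assuming Python 3.7+ and kubectl are both available inside the container. The helper names `check_pods_once` and `watch`, the attempt count and the interval are made up for illustration and are not taken from clearml-agent. When the output does not parse as JSON, it reports the exit code and stderr next to the parse error, which is the part the agent log does not show:

```python
import json
import subprocess
import time


def check_pods_once(cmd):
    """Run cmd once and return (ok, detail).

    ok is True when stdout parses as a JSON object; detail is then a pod count.
    Otherwise detail carries the exit code, stderr and the start of stdout.
    """
    proc = subprocess.run(cmd, capture_output=True, text=True)
    try:
        data = json.loads(proc.stdout)
    except ValueError as ex:  # json.JSONDecodeError is a subclass of ValueError
        return False, "parse error: {}; exit code: {}; stderr: {!r}; stdout: {!r}".format(
            ex, proc.returncode, proc.stderr.strip(), proc.stdout[:200]
        )
    if not isinstance(data, dict):
        return False, "unexpected JSON type: {}".format(type(data).__name__)
    return True, "{} pods".format(len(data.get("items", [])))


def watch(cmd, attempts=100, interval=5.0):
    """Call check_pods_once repeatedly, print only the failures, return their count."""
    failures = 0
    for _ in range(attempts):
        ok, detail = check_pods_once(cmd)
        if not ok:
            failures += 1
            print(time.strftime("%H:%M:%S"), detail)
        time.sleep(interval)
    return failures
```

Something like `watch(["kubectl", "get", "pods", "-n", "<your namespace>", "-o=json"])` would then run the same command as above. If the failures come with a non-zero exit code and a message on stderr, that message is probably the real error.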
I am using the clearml-agent version from PyPI
Like I said, it works, but goes into the error loop
Good question 🙂
this is what I am seeing in the logs:
` No tasks in queue 9154efd8a1314550b1c7882981720861
No tasks in Queues, sleeping for 5.0 seconds
No tasks in queue 9154efd8a1314550b1c7882981720861
No tasks in Queues, sleeping for 5.0 seconds
No tasks in queue 9154efd8a1314550b1c7882981720861
No tasks in Queues, sleeping for 5.0 seconds
No tasks in queue 9154efd8a1314550b1c7882981720861
No tasks in Queues, sleeping for 5.0 seconds
K8S Glue pods monitor: Failed parsing kubectl output:
Ex: Expecting value: line 1 column 1 (char 0)
K8S Glue pods monitor: Failed parsing kubectl output:
Ex: Expecting value: line 1 column 1 (char 0)
K8S Glue pods monitor: Failed parsing kubectl output: `
This pattern repeats after a minute or so. Errors for a while, normal output for a while. My guess is EKS is throttling. Need to see how I can get the actual error.
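For what it's worth, that exact message is what Python's `json` module raises when the text it is given is empty or starts with something that is not JSON. Assuming the pods monitor feeds kubectl's stdout straight into a JSON parser (a guess from the log format, not checked against the agent code), kubectl most likely printed nothing, or a plain-text error, on those calls. A small sketch, with `parse_error_for` as a made-up helper name:

```python
import json


def parse_error_for(text):
    """Return the JSON parse error message for text, or None if it parses."""
    try:
        json.loads(text)
    except json.JSONDecodeError as ex:
        return str(ex)
    return None


# Empty output and a plain-text error both fail at the very first character.
print(parse_error_for(""))
print(parse_error_for("Unable to connect to the server"))
# Both print: Expecting value: line 1 column 1 (char 0)
```

So the parse error alone can't tell throttling apart from any other failure. The kubectl exit code and stderr would be needed for that.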
Ex: Expecting value: line 1 column 1 (char 0)
K8S Glue pods monitor: Failed parsing kubectl output:
Run with --debug as the first parameter
Are you running the latest from the git repo?
This is the thread checking the state of the running pods (and updating the Task status, so you have visibility into the state of the pod inside the cluster before it starts running)
Nope, that doesn’t seem to be it. Will debug a bit more.