I am using the PyPI version of clearml-agent
I saw that the debug param wasn’t adding anything extra for this?
(no objection to adding an argument, but I just wonder what the value is)
Like I said, it works, but it goes into the error loop
Good question 🙂
this is what I am seeing in the logs:
` No tasks in queue 9154efd8a1314550b1c7882981720861
No tasks in Queues, sleeping for 5.0 seconds
No tasks in queue 9154efd8a1314550b1c7882981720861
No tasks in Queues, sleeping for 5.0 seconds
No tasks in queue 9154efd8a1314550b1c7882981720861
No tasks in Queues, sleeping for 5.0 seconds
No tasks in queue 9154efd8a1314550b1c7882981720861
No tasks in Queues, sleeping for 5.0 seconds
K8S Glue pods monitor: Failed parsing kubectl output:
Ex: Expecting value: line 1 column 1 (char 0)
K8S Glue pods monitor: Failed parsing kubectl output:
Ex: Expecting value: line 1 column 1 (char 0)
K8S Glue pods monitor: Failed parsing kubectl output: `
This pattern repeats after a minute or so: errors for a while, then normal output for a while. My guess is that EKS is throttling. I need to see how I can get the actual underlying error.
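For reference, "Expecting value: line 1 column 1 (char 0)" is the message Python's `json.loads` raises on empty input, so the monitor is most likely getting empty stdout from kubectl. A minimal sketch of how to surface the real error, assuming the command's stderr and exit code carry it. `run_and_parse` is a hypothetical helper for illustration, not part of clearml-agent:

```python
import json
import subprocess


def run_and_parse(cmd):
    """Run cmd and parse its stdout as JSON.

    Hypothetical helper: on a parse failure it returns the exit code and
    stderr, which is where the underlying error is expected to show up.
    """
    proc = subprocess.run(cmd, capture_output=True, text=True)
    try:
        return {"ok": True, "data": json.loads(proc.stdout)}
    except json.JSONDecodeError as ex:
        return {
            "ok": False,
            "error": str(ex),                # the message seen in the agent log
            "returncode": proc.returncode,   # non-zero if the command itself failed
            "stdout": proc.stdout[:200],     # first part of whatever was printed
            "stderr": proc.stderr.strip(),   # likely holds the real cause
        }


# Example call ("default" is a placeholder namespace):
# print(run_and_parse(["kubectl", "get", "pods", "-n", "default", "-o=json"]))
```

If `stderr` comes back empty as well, the problem is probably somewhere other than the kubectl call itself.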
Planning to exec into the container, run it in a loop, and see what happens
Since it’s already logging this, debug wouldn’t add anything?
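A rough sketch of what that loop could look like from inside the container, assuming Python is available there. `poll` is a hypothetical helper, and the attempt count and interval are arbitrary choices, not values taken from the agent:

```python
import json
import subprocess
import time


def poll(cmd, attempts=10, interval=5.0):
    """Run cmd repeatedly and collect the runs whose stdout is not valid JSON.

    Returns a list of (attempt_index, returncode, stderr) tuples.
    """
    failures = []
    for attempt in range(attempts):
        proc = subprocess.run(cmd, capture_output=True, text=True)
        try:
            json.loads(proc.stdout)
        except json.JSONDecodeError:
            failures.append((attempt, proc.returncode, proc.stderr.strip()))
        if attempt < attempts - 1:
            time.sleep(interval)  # wait before the next run
    return failures


# Example call ("default" is a placeholder namespace):
# for failure in poll(["kubectl", "get", "pods", "-n", "default", "-o=json"]):
#     print(failure)
```

If the failures cluster in time and the stderr mentions rate limits or timeouts, that would support the throttling guess.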
kubectl get pods -n {namespace} -o=JSON
What do you get when you run the above on your cluster?
This is the thread checking the state of the running pods (and updating the Task status, so you have visibility into the state of the pod inside the cluster before it starts running)
Ex: Expecting value: line 1 column 1 (char 0)
K8S Glue pods monitor: Failed parsing kubectl output:
Run with --debug as the first parameter
Are you running the latest from the git repo?
The 5 seconds is the sleep between two consecutive pulls when there are no jobs to process. Why would you increase it to a higher pull frequency?
Nope, that doesn’t seem to be it. Will debug a bit more.