Thanks JitteryCoyote63, this is very helpful!
Hi AgitatedDove14, I upgraded to 1.3.1 and the missing-logs bug in the console is still there… 😞
I made another recording so that you can understand what it is about:
I enqueue a task and the task starts. The logs shown in the console are very sparse. I scroll up and down to try to fetch the missing logs, without success. Then I download the logs, open the file, and there I see the full logs.
Meaning the REST API returns nothing, is that correct?
Yes, exactly. This is the response from the API server when I try to scroll down in the console to get more logs:
AgitatedDove14 SuccessfulKoala55 I just saw that clearml-server 1.4.0 was released, congrats 🚀 🙌 Was this bug fixed in the new version?
SuccessfulKoala55, this is not the exact corresponding request (I have refreshed the tab since then), but it is an events.get_task_log request with the following content:
Well, actually, I do see many errors like this in the browser console:
OK AgitatedDove14, SuccessfulKoala55, I made some progress in my investigation:
I can pinpoint exactly the change that introduced the bug: it is the one that changed the endpoint to "events.get_task_log", min_version="2.9".
In the Firefox console > Network, I can edit an events.get_task_log request and change the URL from …/api/v2.9/events.get_task_log to …/api/v2.8/events.get_task_log (so that the "events.get_task_log", min_version="1.7" endpoint is used), and then the endpoint returns all the logs. When the 2.9 endpoint is used, no logs are returned.
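The manual edit in the Network tab (swapping /api/v2.9/ for /api/v2.8/ in the request URL) could be scripted outside the browser with a small helper like the one below. This is a minimal sketch: the helper name pin_api_version and the example host localhost:8008 are assumptions for illustration, and the helper only rewrites the URL string; it does not send the request or handle authentication.

```python
import re

# Matches the version segment of an API URL such as ".../api/v2.9/events.get_task_log".
_VERSION_SEGMENT = re.compile(r"/api/v\d+\.\d+/")


def pin_api_version(url: str, version: str) -> str:
    """Return `url` with its /api/vX.Y/ segment replaced by /api/v<version>/.

    Hypothetical helper: it mirrors the manual URL edit made in the Firefox
    Network tab and does not perform any network call.
    """
    if not _VERSION_SEGMENT.search(url):
        raise ValueError(f"no /api/vX.Y/ segment in {url!r}")
    # A lambda replacement avoids any backslash/group-reference interpretation.
    return _VERSION_SEGMENT.sub(lambda _m: f"/api/v{version}/", url, count=1)


if __name__ == "__main__":
    # Assumed example host; substitute your own API server address.
    print(pin_api_version("http://localhost:8008/api/v2.9/events.get_task_log", "2.8"))
```

The rewritten URL would then be sent with the same JSON body and credentials as the original browser request.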
As a quick workaround, can you test with auto refresh (the top-right button with the pause sign that you have in the video)?
Unfortunately, that doesn’t work.
JitteryCoyote63 oh dear, let me see if we can reproduce it (version 1.4 is already in internal testing, and I want to verify this was fixed).
JitteryCoyote63 can you show the corresponding request in the Network tab?
Hi JitteryCoyote63
Is it possible to roll back from 1.2.0 to 1.1.0?
Not really: there was a DB migration, so an out-of-the-box downgrade is not supported.
That said, v1.3.1 is already out, with what seems like a fix:
So the new EventsIterator is responsible for the bug.
Is there an easy way to force the WebUI to always use the previous endpoint (v1.7)? In the v1.1.0 > v1.2.0 diff, I saw that the ES version was bumped to 7.16.2. I am using an external ES cluster that is still on 7.6.2. Could the incompatibility come from there? I’ll update the cluster to rule that out.
JitteryCoyote63 wait, are you saying that when you download the log it is complete, but in the UI it is missing?
So you mean 1.3.1 should fix this bug?
Yes, it should. See the release notes; there are a few "disappearing" UI fixes:
https://github.com/allegroai/clearml-server/releases/tag/v1.3.0
Super! I’ll give it a try and keep you updated here, thanks a lot for your efforts 🙏
AgitatedDove14 Yes, exactly! It is shown in the recording above.
Hi AppetizingMouse58 , I sent you the files in PM 🙂
I checked the server code diffs between 1.1.0 (when it was working) and 1.2.0 (when the bug appeared), and I saw many relevant changes that could have introduced this bug: https://github.com/allegroai/clearml-server/compare/1.1.1...1.2.0
It was so odd, I had to ask 🙂 Okay, let me see if we can reproduce it.
I don’t have any error messages in the browser console, just an empty array returned by events.get_task_log. This bug didn’t exist in version 1.1.0 and is quite annoying…
Hi SuccessfulKoala55, AgitatedDove14,
I updated to 1.4.0 (the Web UI shows: WebApp: 1.5.0-186 • Server: 1.5.0-186 • API: 2.18).
Unfortunately the bug is still there 😞
I don’t see errors in the console anymore though!
I had another look and modified an events.get_task_log request with a very old timestamp to try to retrieve all the logs, but it returned only the few logs already displayed in the console. So I think the problem doesn’t come from the WebUI but from the API endpoint, which doesn’t return all the logs. The events.download_task_log endpoint does return all the logs, though, so the logic differs between the two. I suspect that events.get_task_log does some caching, which is responsible for this bug.
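For reference, a request body like the one described above (an events.get_task_log call starting from a very old timestamp) might be built as follows. This is only a sketch: the helper name is hypothetical, and the field names task, batch_size, navigate_earlier and from_timestamp, along with their semantics, are assumptions about the 2.9 request schema that should be checked against the server's API reference.

```python
import json


def build_get_task_log_body(task_id: str, from_timestamp_ms: int = 0,
                            batch_size: int = 500,
                            navigate_earlier: bool = False) -> dict:
    """Hypothetical JSON body for POST .../api/v2.9/events.get_task_log.

    All field names and semantics below are assumptions, not confirmed API.
    """
    return {
        "task": task_id,
        "batch_size": batch_size,
        # Assumed meaning: False pages forward (towards later events) from from_timestamp.
        "navigate_earlier": navigate_earlier,
        # Epoch milliseconds; 0 plays the role of "a very old timestamp".
        "from_timestamp": from_timestamp_ms,
    }


if __name__ == "__main__":
    # "<task-id>" is a placeholder, not a real task ID.
    print(json.dumps(build_get_task_log_body("<task-id>"), indent=2))
```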
One way to fix this bug for me would be to have the WebUI call events.download_task_log instead of events.get_task_log, always clear the console, and dump the output directly into it, wdyt?
I am happy to help fix this in any way I can 😄
Hi JitteryCoyote63, you mentioned that downloading the task log brings all the events. It would be interesting to compare the events that appear in the downloaded log but not on the task log screen with those that are also shown on the screen. Can you please share the downloaded task log file, along with the request and response you get from events.get_task_log for the same task?
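One way to run the comparison suggested above is to diff the lines of the downloaded log against the lines shown on screen. A minimal sketch, assuming both logs have already been reduced to plain text lines in the same format (the downloaded file may add prefixes such as timestamps that would need stripping first); the function name is hypothetical.

```python
from collections import Counter


def missing_from_screen(downloaded: list[str], shown: list[str]) -> list[str]:
    """Return lines present in the downloaded log but absent from the UI.

    Uses multiset semantics so repeated lines are counted: a line that appears
    three times in the download but once on screen is reported twice.
    The order of the downloaded log is preserved.
    """
    remaining = Counter(shown)
    missing = []
    for line in downloaded:
        if remaining[line] > 0:
            # Matched by a line the UI did display; consume one occurrence.
            remaining[line] -= 1
        else:
            missing.append(line)
    return missing
```

Counting occurrences rather than using a plain set matters here, because training logs often repeat identical lines (e.g. progress messages), and a set-based diff would hide missing repeats.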
Hi JitteryCoyote63, although we didn't reproduce the issue exactly, we did find a potential issue that might cause this behavior, and we introduced some additional safeguards in the UI code.