Answered
Hi, I Would Like To Follow-Up In This

Hi, I would like to follow up on this thread: https://clearml.slack.com/archives/CTK20V944/p1646123127790389 It is happening on clearml server 1.2.0 (self-hosted on a single machine, with an external ES cluster). I could not find a solution so far.
Here is a recording of the bug, which happens as follows:
I reset the AWS autoscaler Task and send it to the services queue. It is picked up by the agent and run. As soon as I have sent it to the queue, I refresh the experiment by clicking on another experiment and then back on that experiment. No scalars and no console appear, not even the black console area.
I don’t have any error message in the browser console - Just an empty array returned on events.get_task_logs. This bug didn’t exist on version 1.1.0 and is quite annoying…

Would you have any idea of what it could be? I saw in the clearml-server code that between 1.1.0 and 1.2.0, the code responsible for fetching the logs has been changed and I suspect the bug comes from these changes, but I am not sure….

Is it possible to rollback from 1.2.0 to 1.1.0 otherwise?

  
  
Posted 2 years ago

Answers 26


Hi JitteryCoyote63

Is it possible to rollback from 1.2.0 to 1.1.0?

Not really. There was a DB migration, so an out-of-the-box downgrade is not really supported.
That said, v1.3.1 is already out, with what seems like a fix:
As a quick fix, can you test with auto refresh (see top right button with the pause sign you have on the video)

  
  
Posted 2 years ago

So you mean 1.3.1 should fix this bug?

Yes, it should. See the release notes; there are a few "disappearing" UI fixes:
https://github.com/allegroai/clearml-server/releases/tag/v1.3.0

  
  
Posted 2 years ago

As a quick fix, can you test with auto refresh (see top right button with the pause sign you have on the video)

That doesn’t work unfortunately

  
  
Posted 2 years ago

JitteryCoyote63 oh dear, let me see if we can reproduce (version 1.4 is already in internal testing, I want to verify this was fixed)

  
  
Posted 2 years ago

JitteryCoyote63 wait, are you saying that when you download the log it is full, but in the UI it is missing?

  
  
Posted 2 years ago

That said, v1.3.1 is already out, with what seems like a fix:

So you mean 1.3.1 should fix this bug?

  
  
Posted 2 years ago

it is shown in the recording above

It was so odd, I had to ask 🙂 okay let me see if we can reproduce

I don’t have any error message in the browser console - Just an empty array returned on events.get_task_logs. This bug didn’t exist on version 1.1.0 and is quite annoying…

meaning the REST API returns nothing, is that correct?

  
  
Posted 2 years ago

AgitatedDove14 Yes exactly! it is shown in the recording above

  
  
Posted 2 years ago

meaning the REST API returns nothing, is that correct?

Yes exactly, this is the response from the api server when I try to scroll down on the console to get more logs

  
  
Posted 2 years ago

Well actually I do see many errors like that in the browser console:

  
  
Posted 2 years ago

Thanks JitteryCoyote63, this is very helpful!

  
  
Posted 2 years ago

Another error that just popped up:

  
  
Posted 2 years ago

Hi JitteryCoyote63, although we didn't reproduce the issue exactly, we did find a potential issue that might be the cause of this behavior, and we introduced some additional safeguards in the UI code

  
  
Posted 2 years ago

SuccessfulKoala55, this is not the exact corresponding request (I have refreshed the tab since then), but the request is an events.get_task_logs, with the following content:

  
  
Posted 2 years ago

AgitatedDove14 SuccessfulKoala55 I just saw that clearml-server 1.4.0 was released, congrats 🚀 🙌 Was this bug fixed with this new version?

  
  
Posted 2 years ago

Super! I’ll give it a try and keep you updated here, thanks a lot for your efforts 🙏

  
  
Posted 2 years ago

JitteryCoyote63 can you show the corresponding request in the Network tab?

  
  
Posted 2 years ago

I am happy if I can be of any help to fix that 😄

  
  
Posted 2 years ago

Hi SuccessfulKoala55 , AgitatedDove14 ,
I updated to 1.4.0 (Web UI shows: WebApp: 1.5.0-186 • Server: 1.5.0-186 • API: 2.18 )
Unfortunately the bug is still there 😞
I don’t see errors in the console anymore though!

I had another look and modified an events.get_task_logs request with a very old timestamp to try to retrieve all logs. This returned only the few logs already displayed in the console. So I think the problem doesn’t come from the WebUI, but from the API endpoint, which doesn’t return all the logs. The endpoint events.download_task_log does return all the logs though, so there is a difference in logic between the two. I suspect that events.get_task_logs is doing some caching logic, which is responsible for this bug.

A way to fix this bug for me would be to have the WebUI call events.download_task_log instead of events.get_task_logs and always clear the console and dump the output directly in the console, wdyt?

  
  
Posted one year ago

Ok AgitatedDove14 SuccessfulKoala55 I made some progress in my investigation:
I can pinpoint exactly the change that introduced the bug: it is the one changing the endpoint "events.get_task_log", min_version="2.9"
In the Firefox console > Network, I can edit an events.get_task_log request and change the URL from …/api/v2.9/events.get_task_log to …/api/v2.8/events.get_task_log (to use the endpoint "events.get_task_log", min_version="1.7"), and then all the logs are returned by the endpoint. When using the 2.9 endpoint, no logs are returned
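
For anyone wanting to repeat this check outside the browser, here is a minimal sketch of the URL rewrite described above. It only does the string manipulation (swapping the /api/v2.9/ segment for /api/v2.8/). The helper name with_api_version and the host clearml.example.invalid are hypothetical; only the path shape comes from this thread.

```python
import re


def with_api_version(url: str, version: str) -> str:
    """Return `url` with its /api/v<major>.<minor>/ segment set to `version`.

    Hypothetical helper for illustration; not part of ClearML.
    """
    new_url, count = re.subn(
        r"/api/v\d+\.\d+/",
        lambda _match: f"/api/v{version}/",
        url,
        count=1,
    )
    if count == 0:
        raise ValueError(f"no /api/v<major>.<minor>/ segment in {url!r}")
    return new_url


# Placeholder host (assumption); the real server address is not given in the thread.
url_29 = "https://clearml.example.invalid/api/v2.9/events.get_task_log"
url_28 = with_api_version(url_29, "2.8")
print(url_28)
```

Sending the same request body to both URLs and comparing the number of returned events should show whether the difference really comes from the endpoint version.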

  
  
Posted one year ago

So the new EventsIterator is responsible for the bug.
Is there a way for me to easily force the WebUI to always use the previous endpoint (v1.7)? I saw in the diff v1.1.0 > v1.2.0 that the ES version was bumped to 7.16.2. I am using an external ES cluster, and its version is still 7.6.2. Could the incompatibility come from this? I’ll update the cluster to make sure that’s not the case

  
  
Posted one year ago

Hi JitteryCoyote63, you mentioned that download task logs returns all the events. It would be interesting to compare the events that are in the downloaded log but not in the task log screen with those that are returned to the screen too. Can you please share the downloaded task logs file, and the request and response that you get from events.get_task_log for the same task?
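
One way such a comparison could be done is sketched below. It assumes both sources have already been reduced to plain lists of message strings; how the messages are extracted from the downloaded file and from the API response is left out, since the exact formats are not shown in this thread. The function name missing_messages and the sample data are hypothetical.

```python
from collections import Counter
from typing import Iterable, List


def missing_messages(downloaded: Iterable[str], returned: Iterable[str]) -> List[str]:
    """List messages found in `downloaded` that `returned` does not account for.

    The two inputs are compared as multisets, so a line that appears twice in
    the download but once in the API response is reported once as missing.
    The result keeps the order of `downloaded`.
    """
    remaining = Counter(line.rstrip("\n") for line in returned)
    missing = []
    for line in downloaded:
        line = line.rstrip("\n")
        if remaining[line] > 0:
            remaining[line] -= 1  # matched by one returned message
        else:
            missing.append(line)
    return missing


# Made-up sample data, for illustration only.
downloaded = ["epoch 1", "epoch 2", "epoch 2", "epoch 3"]
returned = ["epoch 1", "epoch 2"]
print(missing_messages(downloaded, returned))
```

If the missing messages cluster around particular timestamps or a particular position in the log, that would help narrow down where the paged endpoint stops returning events.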

  
  
Posted one year ago

I checked the server code diffs between 1.1.0 (when it was working) and 1.2.0 (when the bug appeared) and I saw many relevant changes that could have introduced this bug: https://github.com/allegroai/clearml-server/compare/1.1.1...1.2.0

  
  
Posted one year ago

Hi AppetizingMouse58 , I sent you the files in PM 🙂

  
  
Posted one year ago

Thanks, looking into it :)

  
  
Posted one year ago

Hi AgitatedDove14 , I upgraded to 1.3.1 and the bug of missing logs in the console is still there… 😞
I made another recording so that you can understand what it is about:
I enqueue a task. The task starts, but the logs shown in the console are very sparse. I scroll up and down to try to fetch the missing logs, without success. I download the logs, open the file, and there I see the full logs

  
  
Posted 2 years ago