Hi CostlyOstrich36 , one more observation: when I don’t open the experiment in the WebUI before it is finished, I get all the logs correctly. It is only when I open the experiment in the WebUI while it is running that I don’t see all the logs.
So it looks like there is some caching effect: the logs are retrieved only once, when I open the experiment for the first time, and not (or only rarely) afterwards. Is that possible?
CostlyOstrich36 , this also happens with clearml-agent 1.1.1 on an AWS instance…
JitteryCoyote63 , if you hit F12 and open the developer console in the WebUI, you should see some calls going out to events.get_task_log. Can you take a peek and see if the logs are missing from there?
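You could also hit the same endpoint directly to take the browser out of the equation, something along these lines (rough sketch, assuming the Python APIClient wrapper and these parameter names, worth double-checking against the API reference):
```python
# Rough sketch: call events.get_task_log directly, bypassing the WebUI.
# Assumes the clearml Python APIClient wrapper and these parameter names.
from clearml.backend_api.session.client import APIClient

client = APIClient()
res = client.events.get_task_log(
    task="<task_id>",       # placeholder, use the ID of the affected task
    batch_size=1000,
    navigate_earlier=True,  # newest-to-oldest, like the console does
)
print(len(res.events))      # compare with what the console actually shows
```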
I am sorry that the information I can give is not very precise, but it’s the best I can do - Is this bug happening only to me?
I think it comes from the WebUI of clearml-server 1.2.0, because I didn’t change anything else
Sure, if you can post it here (or send it in private if you prefer), that would be great
Hi CostlyOstrich36 , this weekend I took a look at the diff against the previous version ( https://github.com/allegroai/clearml-server/compare/1.1.1...1.2.0# ) and I saw several changes related to the scrolling/logging:
apiserver/bll/event/log_events_iterator.py
apiserver/bll/event/events_iterator.py
apiserver/config/default/services/_mongo.conf
apiserver/database/model/base.py
apiserver/services/events.py
I suspect that one of these changes might be responsible for the bug I am facing, wdyt?
CostlyOstrich36 yes, when I scroll up, a new events.get_task_log call is fired and the response doesn’t contain any logs (but it should)
However, when downloading the log manually, it appears all the data is there?
JitteryCoyote63 , doesn't seem to happen to me. I'll try raising a clean server and see if this happens then. You're running with 1.2, correct?
No, they have different names - I will try to update both agents to the latest versions
Interesting! Do they happen to have the same machine name in UI?
CostlyOstrich36 Were you able to reproduce it? That’s rather annoying 😅
CostlyOstrich36 , actually this only happens for a single agent. The weird thing is that I have a machine with two GPUs, and I spawn two agents, one per GPU. Both have the same version. For one of them I can see all the logs, but not for the other
JitteryCoyote63 , if you go to a completed experiment, do you only see the package installation stage in the log?
What OS/ClearML-Agent are you running?
CostlyOstrich36 I updated both agents to 1.1.2 and still got the same problem unfortunately. Since I can download the full log file from the Web UI, I guess the agents are reporting correctly?
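For what it’s worth, I can also pull the full output through the SDK, so the data does seem to be stored server-side (quick sketch, assuming get_reported_console_output works the way I expect):
```python
# Quick sketch: fetch the reported console output through the SDK
# to confirm the full log is stored server-side.
from clearml import Task

task = Task.get_task(task_id="<task_id>")  # placeholder, the affected task
reports = task.get_reported_console_output(number_of_reports=10000)
print(sum(len(r.splitlines()) for r in reports))  # rough line count
```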
Could it be that Elasticsearch does not return all the requested logs when it is queried by the WebUI to display them in the console?
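One way I could check that would be to count the log events for the task directly in Elasticsearch, something like this (very rough sketch; the events-log-* index pattern and the task field are assumptions on my side and may not match the actual mapping):
```python
# Very rough sketch: count log events for a task directly in Elasticsearch.
# Index pattern and field name are assumptions and may differ per deployment.
import requests

resp = requests.get(
    "http://localhost:9200/events-log-*/_count",
    json={"query": {"term": {"task": "<task_id>"}}},  # placeholder task ID
)
print(resp.json().get("count"))
```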
Now that I think about it, I remember that the changelog of clearml-server 1.2.0 lists the following:
"Fix UI Workers & Queues and Experiment Table pages display mismatching experiment runtime values" #100
Could it be that the logic for querying logs was somehow affected by this change? I.e., the WebUI requests the logs with a wrong timestamp and hence does not get all of them? That would explain the current bug.
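If that is the case, reproducing the scroll-up request manually with an explicit timestamp should show it, e.g. (sketch only; I am assuming from_timestamp and navigate_earlier are the parameters the console uses, I haven’t captured the actual payload):
```python
# Sketch: mimic the WebUI's scroll-up request with an explicit timestamp
# to see whether the server returns the earlier batch of logs.
from clearml.backend_api.session.client import APIClient

client = APIClient()
oldest_visible_ts = 1638316800000  # example epoch-millis of the oldest visible line
res = client.events.get_task_log(
    task="<task_id>",            # placeholder task ID
    batch_size=1000,
    navigate_earlier=True,
    from_timestamp=oldest_visible_ts,
)
print(len(res.events))           # empty here would match what I see in the UI
```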
Or maybe something introduced with the changes to the log scrolling?
To me, it really looks like a bug in the way the WebUI fetches the logs and shows them in the console; I only got this problem after upgrading the clearml-server to 1.2.0
I am happy to send you a screen recording of the problem if that helps to understand it better