Hi MassiveHippopotamus56
Can you please open the browser developer tools, navigate to the Scalars tab of one of the experiments that shows the wrong iteration, and copy here the request payload and response for the events.scalar_metrics_iter_histogram call?
MassiveHippopotamus56
The "iteration" entry is actually the maximum reported iteration over all graphs; each graph can have a different max iteration. Does that make sense?
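To illustrate the point, here is a minimal sketch of how a task-level "iteration" relates to the per-graph maxima. The series names and values are made up for illustration and are not taken from this experiment; the helper names are hypothetical and not part of the ClearML API.

```python
# Hypothetical per-graph iteration lists; names and values are illustrative only.
reported_iterations = {
    "train/loss": [40000, 40100, 40200],
    "val/accuracy": [40000, 41000],
    "machine/gpu_usage": [0, 100, 300000],
}


def max_iteration_per_graph(series):
    """Return the largest reported iteration for each graph."""
    return {name: max(iters) for name, iters in series.items()}


def task_iteration(series):
    """Return the largest reported iteration over all graphs."""
    return max(max(iters) for iters in series.values())


per_graph = max_iteration_per_graph(reported_iterations)
overall = task_iteration(reported_iterations)

print(per_graph)
print(overall)
```

In this sketch a single graph with a large iteration count determines the overall value, even if every other graph stops much earlier.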
LazyFish41
Access-Control-Allow-Credentials: true
Access-Control-Allow-Origin: http://clearml.cloud.mobileye.com:8080
Connection: keep-alive
Content-Encoding: gzip
Content-Length: 429677
Content-Type: application/json
Date: Tue, 28 Jun 2022 06:45:46 GMT
Server: nginx/1.20.1
Vary: Accept-Encoding
Vary: Origin
Request Headers
Accept: application/json
Accept-Encoding: gzip, deflate
Accept-Language: en-US,en;q=0.9,he-IL;q=0.8,he;q=0.7
Connection: keep-alive
Content-Length: 56
Content-Type: application/json
Cookie: _gcl_au=1.1.687609999.1649947617; __metrk_uid=PvZb6JIt; _hjSessionUser_272887=eyJpZCI6IjZiNTMwMzVlLTk1YzEtNTYyYS05YWRmLTAzZjMxMjM3ZmQ5YyIsImNyZWF0ZWQiOjE2NDk5NDc2MTc1NjAsImV4aXN0aW5nIjpmYWxzZX0=; _uetvid=bbfd7200bc0111eca6ec8f62cfef517d; _fbp=fb.1.1649947617720.1704085307; __hstc=30567776.af7a4b2a159b4568265807e0a562010e.1649947617665.1649947617665.1649947617665.1; hubspotutk=af7a4b2a159b4568265807e0a562010e; _clck=1svuhbs|1|f0m|0; _ga_T1ZW6BFG7C=GS1.1.1649947616.1.0.1649947626.0; ajs_user_id=%2254d3d730db432927b1e06c5073a956b5b0822790%22; ajs_anonymous_id=%221d11b711-697b-4583-b1dd-fc266b017967%22; _ga_9N3Y21SBZF=GS1.1.1651562678.3.1.1651562679.0; clearml_token_basic=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJleHAiOjE2NTc3ODk5MDIsImlhdCI6MTY1NTE5NzkwMiwiaWRlbnRpdHkiOnsiY29tcGFueV9uYW1lIjoidHJhaW5zIiwidXNlciI6ImM1OTNkODgwNzhjMjQxNDI5Nzg1YTQ1MjhlMGYyYzdjIiwiY29tcGFueSI6ImQxYmQ5MmEzYjAzOTQwMGNiYWZjNjBhN2E1YjFlNTJiIiwidXNlcl9uYW1lIjoiT3JpIEtpc2hvbnkiLCJyb2xlIjoidXNlciJ9LCJhdXRoX3R5cGUiOiJCZWFyZXIiLCJlbnYiOiI8dW5rbm93bj4iLCJhcGlfdmVyc2lvbiI6IjIuMTciLCJzZXJ2ZXJfdmVyc2lvbiI6IjEuMy4wIiwic2VydmVyX2J1aWxkIjoiMTY1IiwiZmVhdHVyZV9zZXQiOiJiYXNpYyJ9.7fm20kXnWIN1w6ivUCP6qBEUK_bWsAUWFOnaBo0FsSM; _ga=GA1.2.1474697051.1649947617; _ga_GE4ZXDNGRN=GS1.1.1655975869.8.0.1655975869.0
Host: clearml.cloud.mobileye.com:8080
Origin: http://clearml.cloud.mobileye.com:8080
Referer: http://clearml.cloud.mobileye.com:8080/projects/7df742283b004957bcba936dbca1ec69/experiments/d45ecb5ad7084175bd83dd39777b10c5/output/metrics/scalar
User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36
X-Allegro-Client: Webapp-1.3.0-165
AgitatedDove14 this is not the case as all the scalars report the same iterations
MassiveHippopotamus56 could it be the machine statistics? (i.e. CPU/GPU etc.; these are considered scalars as well...)
they show iterations around 300k so I don't think they are relevant for this
AgitatedDove14 AppetizingMouse58 I want to clarify that this run used to show the correct iterations on the website, starting around 40k, and about a month later, for some reason, the initial iteration seems to have changed to 0
Hmm, I see your point. Just so I fully understand: you are not saying old experiments were changed, but that new experiments (running the same code-ish) have a totally different max iteration value. Is this correct?
No, an old experiment changed, nothing was rerun
as if the server has changed the way that it shows the data
Ohh, that is odd. I think the max iteration value is stored in the DB, so it would be strange if it changed after an update.
BTW: just making sure, could it be these Tasks were imported ? (i.e. offline execution + import)
Hi MassiveHippopotamus56 - just to make sure I understand - when you say "started from iteration" you mean in the UI Scalars section, the first samples appear with the x-axis showing some iteration number (which is not 0) like 40k?
Yes. This is what used to be shown, and was the expected behaviour because the run began from a previous checkpoint. And now the same exact experiment (not rerun) shows a different starting iteration for the scalars (now starting at 0).
I don't have a screenshot of how it looked before, but it looked exactly like this except that the iterations started around 40k.
This is how it looks now:
MassiveHippopotamus56 The data that you posted from the browser developer tools seems to come from the "Headers" tab. Can you please post the data from the "Payload" and "Response" tabs? That is where they are if you use Chrome; in other browsers the tabs may have different names.
Ok I think I got it.
Payload:
{task: "d45ecb5ad7084175bd83dd39777b10c5", key: "iter"}
Response is long so it is attached as a file
The data that you sent looks fine. It seems that you actually have these iterations in Elasticsearch. To check whether this is the case, please run the following command in a shell on your host. You should get the first 10 task events with the smallest iterations:
curl -XGET -H "Content-Type: application/json" 'localhost:9200/events-training_stats_scalar*/_search?pretty' -d '{ "query": { "term": {"task": "d45ecb5ad7084175bd83dd39777b10c5"} }, "sort": {"iter": "asc"} }'
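If you prefer to script the check, the following is a rough Python sketch of the same query. It assumes Elasticsearch is reachable at localhost:9200 and uses the index pattern from the command above; the function names are hypothetical, and the request is sent with POST, which the Elasticsearch search API accepts as well as GET.

```python
import json
import urllib.request

# Assumed address and index pattern, copied from the curl command above.
ES_SEARCH_URL = "http://localhost:9200/events-training_stats_scalar*/_search"


def build_query(task_id, size=10):
    """Build a search body: events of one task, smallest iterations first."""
    return {
        "size": size,
        "query": {"term": {"task": task_id}},
        "sort": {"iter": "asc"},
    }


def extract_iterations(response):
    """Pull the 'iter' field out of every hit in a search response."""
    return [hit["_source"]["iter"] for hit in response["hits"]["hits"]]


def fetch_smallest_iterations(task_id, url=ES_SEARCH_URL):
    """Send the query to Elasticsearch and return the smallest iterations."""
    body = json.dumps(build_query(task_id)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as reply:
        return extract_iterations(json.load(reply))


# Example call (requires a reachable Elasticsearch instance):
# print(fetch_smallest_iterations("d45ecb5ad7084175bd83dd39777b10c5"))
```

If the first value returned is 0 rather than something around 40k, the events with low iterations really are stored in Elasticsearch and the UI is only displaying what is there.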