Ah I see, it's based on a naming scheme, thanks. Sorry, I forgot to link the tutorial I was looking at: https://allegro.ai/docs/examples/frameworks/pytorch/pytorch_tensorboard/
The only change I made in the .yml file was:
`ports: - "8080:80"`
to
`ports: - "8082:80"`
I already had something running on 8080, but since it's the trains-apiserver and not the webserver, this shouldn't be an issue.
First I tried without `--build`, but got the same problem. `--build` just means that it will re-download all layers instead of using the ones already cached.
Exactly, so that remapping of port 8080 should not be the reason for this issue.
Ah my bad, it seems I had to run `docker-compose -f /opt/trains/docker-compose.yml pull` once. I quickly tried trains about half a year ago, so maybe it was still using the old images? However, I thought `--build` would take care of that.
Now it's working 🙂
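For anyone hitting the same thing: `docker-compose up --build` only rebuilds services that have a `build:` section, it does not fetch newer versions of prebuilt images, which is why the explicit `pull` was needed. A minimal sketch of the pull-then-start sequence in Python, assuming the compose file path from the command above; the `-d` (detached) flag and the helper names `compose_commands` and `refresh_stack` are my own additions for illustration:

```python
import subprocess

# Path taken from the command quoted in this thread.
COMPOSE_FILE = "/opt/trains/docker-compose.yml"


def compose_commands(compose_file=COMPOSE_FILE):
    """Return the commands in order: pull newer images, then start the stack."""
    base = ["docker-compose", "-f", compose_file]
    # "-d" (run detached) is an assumption; drop it to keep the logs attached.
    return [base + ["pull"], base + ["up", "-d"]]


def refresh_stack(compose_file=COMPOSE_FILE, run=subprocess.run):
    """Run each command in turn; check=True raises on a non-zero exit code."""
    for cmd in compose_commands(compose_file):
        run(cmd, check=True)
```

Calling `refresh_stack()` on the host should be roughly equivalent to running the two commands by hand.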
It's my colleague's experiment (with scikit-learn), so I'm not sure about the details.
It seems to be related to `trains-apiserver`, based on the log inside the Docker compose:
` trains-apiserver | [2020-11-10 04:40:14,133] [8] [ERROR] [trains.service_repo] Returned 500 for queues.get_next_task in 20ms, msg=General data error: err=('1 document(s) failed to index.', [{'index': {'_index': 'queue_metrics_d1bd92a3b039400cbafc60a7a5b1e52b_2020-11', '_type': '_doc', '_id': 'rkh0sHUBwyiZSyeZUAov', 'status': 403, 'error': {'type': 'cluster_block_exception', 'reason': 'index [queu...
AgitatedDove14 There is only an `events.out.tfevents.1604567610.system.30991.0` file.
If I open this with a text editor, most of it is unreadable, but I do find the letters "PNG" close to the name of the confusion matrix. So it looks like the image is encoded inside the TB log file?
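A quick way to check this without squinting at a text editor is to scan the event file for the fixed 8-byte signature that starts every PNG file. This is only a rough sketch: the helper name `find_png_offsets` is made up, and a hit just suggests that an encoded image is embedded in the file, it does not parse the TensorBoard record format.

```python
# Every PNG file begins with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def find_png_offsets(path):
    """Return the byte offsets at which a PNG signature occurs in the file."""
    with open(path, "rb") as fh:
        data = fh.read()
    offsets = []
    start = data.find(PNG_SIGNATURE)
    while start != -1:
        offsets.append(start)
        # Continue searching just past the previous hit.
        start = data.find(PNG_SIGNATURE, start + 1)
    return offsets
```

Running it on the `events.out.tfevents...` file and getting a non-empty list would support the idea that the confusion matrix was logged as an image rather than as plot data.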
So if I want it under plots, I would need to call e.g. `report_confusion_matrix`, right?
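Something along these lines is what I have in mind, as a rough sketch rather than a verified recipe. It assumes `logger` is the trains logger (e.g. from `Task.current_task().get_logger()`); the title, series name, and the helpers `confusion_counts` and `report_to_plots` are placeholders of mine. The matrix is built as nested lists to keep the sketch dependency-free, so it may need wrapping in `numpy.array` if the installed version expects an array.

```python
def confusion_counts(y_true, y_pred, labels):
    """Build a confusion matrix as nested lists: rows = true, columns = predicted."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for true_label, pred_label in zip(y_true, y_pred):
        matrix[index[true_label]][index[pred_label]] += 1
    return matrix


def report_to_plots(logger, y_true, y_pred, labels, iteration=0):
    """Report the matrix explicitly so it is logged as a plot, not an image."""
    matrix = confusion_counts(y_true, y_pred, labels)
    logger.report_confusion_matrix(
        title="Confusion matrix",  # placeholder title
        series="validation",       # placeholder series name
        matrix=matrix,
        iteration=iteration,
        xlabels=labels,
        ylabels=labels,
    )
    return matrix
```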
Aah, I couldn't find it under PLOTS, but indeed it's there under DEBUG SAMPLES.
AgitatedDove14 TB has the confusion matrix like this: