
Yes, that solved the errors; however, the two lines "could not detect iteration reporting" and, a few moments later, "reporting detected" still show up.
After running docker ps I saw that all the ports are still listed. I then changed the name of /opt/clearml back to /opt/trains, ran sudo docker-compose -f /opt/trains/docker-compose.yml down, and it did the trick.
The train_loss is in the second column from the left (the far-left column is the epoch number, 30-36).
I think you have the page cached - can you try reloading using Ctrl-F5?
Using Ctrl-F5 it redirects to the new ClearML login, but if I just enter the URL ending with :8080 it takes me to the old login.
Understood. If there is something I can tweak in the reporting, I couldn't find where to tweak it, since it is supposedly all controlled by the single line that activates the reporting: learn.callback_fns.append(partial(LearnerTensorboardWriter, base_dir=tboard_path, name=taskName))
Do you have any ideas about what options I have for changing how the train_loss is reported?
I see, I'll try to clear the cache
This is what I got:
{"meta":{"id":"7cd78b67e5384e739b9aec6cdc030e6d","trx":"7cd78b67e5384e739b9aec6cdc030e6d","endpoint":{"name":"projects.delete","requested_version":"2.20","actual_version":"1.0"},"result_code":400,"result_subcode":12,"result_msg":"Validation error (error for field 'project'. field is required!)","error_stack":null,"error_data":{}},"data":{}}
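The result_msg above says the projects.delete call was missing its required project field. Below is a minimal sketch of passing that field explicitly; the only things taken from the response are the endpoint name and the required field - the APIClient import path, the name-based lookup, and the project name are assumptions based on the trains/clearml SDK and worth verifying.

```python
# Hypothetical sketch: call projects.delete with the required "project" field.
# Assumes the trains SDK's APIClient (credentials are read from trains.conf);
# the project name below is a placeholder.
from trains.backend_api.session.client import APIClient

client = APIClient()

# Look up the project ID by name, then delete it by ID.
projects = client.projects.get_all(name="MultiClassLabeling")
if projects:
    client.projects.delete(project=projects[0].id)
```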
Turns out the step I missed (maybe it should be mentioned in the docs...) was configuring the Security Group for the EC2 machine to allow inbound connections to ports 8080, 8008, and 8081, and to limit the source to my IP (or my office IP) only.
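For anyone hitting the same thing, here is a rough boto3 sketch of that Security Group change; the group ID and the /32 source CIDR are placeholders, and the three ports are the ones listed above (web UI, API server, file server).

```python
# Rough sketch of the inbound rules, assuming boto3 and an existing security
# group attached to the EC2 instance; the group ID and CIDR are placeholders.
import boto3

ec2 = boto3.client("ec2")

MY_IP_CIDR = "203.0.113.7/32"   # limit the source to your own / office IP
PORTS = [8080, 8008, 8081]      # web UI, API server, file server

ec2.authorize_security_group_ingress(
    GroupId="sg-0123456789abcdef0",
    IpPermissions=[
        {
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            "IpRanges": [{"CidrIp": MY_IP_CIDR, "Description": "trains/clearml server"}],
        }
        for port in PORTS
    ],
)
```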
Thank you Martin for your fast response! Will do
When running the docker ps command, the output is empty.
So, inbound rules should allow custom TCP for the three ports 8080, 8008, and 8081? What about the outbound rules?
This is an error during training that points to an Elasticsearch error. It might also be the cause of the delete error - what do you think, SuccessfulKoala55?
The valid_loss and Accuracy show up in TensorBoard with the same values as on the terminal, but the train_loss shows up on a different scale and I can't figure out why. I did not change anything in the core files of torch, TensorBoard, or fastai, and I used the initialization the same way you showed (and as in the fastai docs): learn.callback_fns.append(partial(LearnerTensorboardWriter, base_dir=tboard_path, name=taskName))
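For reference, this is roughly what the wiring around that one line looks like; a sketch assuming fastai v1, where attach_tensorboard_writer, the log path, and the run name are illustrative names - only the callback_fns.append(...) call itself comes from the thread.

```python
# Sketch of the TensorBoard callback wiring, assuming fastai v1.
from functools import partial
from pathlib import Path

from fastai.basic_train import Learner
from fastai.callbacks.tensorboard import LearnerTensorboardWriter


def attach_tensorboard_writer(learn: Learner, tboard_path: Path, task_name: str) -> None:
    """Register the writer so metrics are logged to TensorBoard during fit()."""
    learn.callback_fns.append(
        partial(LearnerTensorboardWriter, base_dir=tboard_path, name=task_name)
    )
    # If I recall the fastai v1 signature correctly, the writer also accepts
    # loss_iters / hist_iters / stats_iters keyword arguments controlling how
    # often the per-iteration loss and histograms are written - that frequency
    # is one knob worth checking when the train_loss curve looks off.


# Usage (learn built elsewhere):
# attach_tensorboard_writer(learn, Path("/tmp/tboard"), "my_experiment")
```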
going to the server URL:8080 -> old trains login (working as usual if I enter my credentials) -> Ctrl-F5 -> switched to the new interface
Traceback (most recent call last):
  File "/home/ubuntu/MultiClassLabeling/myenv/lib/python3.6/site-packages/torch/utils/tensorboard/__init__.py", line 2, in <module>
    from tensorboard.summary.writer.record_writer import RecordWriter  # noqa F401
  File "/home/ubuntu/MultiClassLabeling/myenv/lib/python3.6/site-packages/trains/binding/import_bind.py", line 59, in __patched_import3
    level=level)
ModuleNotFoundError: No module named 'tensorboard'
During handling of the above exception, ...
After the upgrade, if I go to the URL ending with :8080 I get the old "trains" welcome page.
Good morning Alon, since you helped me so much getting tensorboard to show results yesterday, I'm hoping you can help me understand why some results I'm getting are strange: