No, I meant to change the way it is reported. I'm still interested in the train_loss graph, naturally 🙂, but it is obviously reporting something like the inverse of the train_loss: in the graph it is exploding, while in reality (as reported in the terminal) it is decaying to 9e-2.
Turns out the step I missed (maybe it should be mentioned in the docs...) was configuring the Security Group for the EC2 machine to allow inbound connections on ports 8080, 8008, and 8081, and to limit the source to my IP (or my office IP) only.
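The Security Group step above can be sketched in Python. This is a minimal illustration, not the documented ClearML procedure: the helper names `build_ingress_rules` and `apply_rules` are hypothetical, the addresses are placeholders from the documentation ranges, and the boto3 call is only defined, not executed, since it needs AWS credentials and a real group ID.

```python
import ipaddress

# Ports named in the message above.
CLEARML_PORTS = (8080, 8008, 8081)


def build_ingress_rules(sources, ports=CLEARML_PORTS):
    """Return one TCP ingress rule per port, limited to the given IPv4 sources.

    Each source may be a single address ("203.0.113.10") or a CIDR block
    ("198.51.100.0/24"); a bare address is normalised to a /32.
    """
    cidrs = [str(ipaddress.IPv4Network(src)) for src in sources]
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            "IpRanges": [{"CidrIp": cidr} for cidr in cidrs],
        }
        for port in ports
    ]


def apply_rules(group_id, rules, region="us-east-1"):
    """Apply the rules with boto3. Not called in this sketch."""
    import boto3  # imported lazily so the rest of the sketch runs without it

    ec2 = boto3.client("ec2", region_name=region)
    ec2.authorize_security_group_ingress(GroupId=group_id, IpPermissions=rules)


# Placeholder addresses: one home IP and one office range (assumptions).
rules = build_ingress_rules(["203.0.113.10", "198.51.100.0/24"])
for rule in rules:
    print(rule["FromPort"], [r["CidrIp"] for r in rule["IpRanges"]])
```

Listing both a single address and an office CIDR block is one way to keep the server closed to the world without locking it to a single machine.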
The valid_loss and accuracy show up on TensorBoard with the same values as on the terminal, but the train_loss shows up on a different scale and I can't figure out why. I did not change anything in the core files of torch, TensorBoard or fastai, and I used the initialization the same way you showed it and as it appears in the fastai docs, using learn.callback_fns.append(partial(LearnerTensorboardWriter, base_dir=tboard_path, name=taskName))
Yes, that solved the errors. However, the two lines "could not detect iteration reporting" and, a few moments later, "reporting detected" still show up.
Understood. If there is something I can tweak in the reporting, I couldn't find where to tweak it, since it should all come from the one line that activates the reporting: learn.callback_fns.append(partial(LearnerTensorboardWriter, base_dir=tboard_path, name=taskName)). Do you have any ideas about what options I have to change how the train_loss is reported?
can this give us a clue? I'm getting this error:
this is what I got:{"meta":{"id":"7cd78b67e5384e739b9aec6cdc030e6d","trx":"7cd78b67e5384e739b9aec6cdc030e6d","endpoint":{"name":"projects.delete","requested_version":"2.20","actual_version":"1.0"},"result_code":400,"result_subcode":12,"result_msg":"Validation error (error for field 'project'. field is required!)","error_stack":null,"error_data":{}},"data":{}}
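To read a response like the one above more easily, the relevant fields can be pulled out with the standard `json` module. This is only a sketch: the helper name `summarize_error` is hypothetical, and the body below is the quoted response trimmed to the fields being read.

```python
import json

# The response quoted above, trimmed to the fields used below.
raw = """{"meta":{"id":"7cd78b67e5384e739b9aec6cdc030e6d","endpoint":{"name":"projects.delete","requested_version":"2.20","actual_version":"1.0"},"result_code":400,"result_subcode":12,"result_msg":"Validation error (error for field 'project'. field is required!)"},"data":{}}"""


def summarize_error(body):
    """Return a one-line summary of the 'meta' section of a server response."""
    meta = json.loads(body)["meta"]
    endpoint = meta["endpoint"]
    return (
        f'{endpoint["name"]} '
        f'(requested v{endpoint["requested_version"]}, actual v{endpoint["actual_version"]}) '
        f'-> {meta["result_code"]}/{meta["result_subcode"]}: {meta["result_msg"]}'
    )


summary = summarize_error(raw)
print(summary)
```

Read this way, the message says the `projects.delete` request reached the server without the required `project` field, and that the endpoint answered with version 1.0 although 2.20 was requested.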
After the upgrade, if I go to the URL ending with :8080, I get the old "trains" welcome page
(and I didn't use the -f switch since it wasn't in the instructions, and I'm not all that familiar with Docker)
I see, I'll try to clear the cache
it will switch to the new one
So it means the old WebApp is still in cache...
meaning the browser or in the server?
I'll look at the security group. Any tips on how to configure it so that it isn't exposed to the entire world, but also not locked to me?
Thanks Jake, clearing the Cache did the trick! thank you so much for your assistance!
Good morning Alon, since you helped me so much getting tensorboard to show results yesterday, I'm hoping you can help me understand why some results I'm getting are strange:
(checking now, there are no Load Balancers in this region)
Tried with both Firefox and Chrome; the results are also similar across computers and operating systems (Ubuntu and Windows)
Thanks Jake for your help, it's highly appreciated. This is an AWS EC2 running the clearml-server AMI (region of EC2 is us-east-1)
it turns out that my docker-compose.yml wasn't in the environment path, so when I first ran the down command, it had no effect
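The compose file issue above is what the -f switch addresses: it tells docker-compose which file to use, whatever the current directory is. A minimal sketch, assuming the file lives at /opt/clearml/docker-compose.yml (an assumed location, adjust to wherever it actually is); the helper names are hypothetical and nothing is executed unless `run_compose` is called.

```python
import subprocess

# Assumed location of the compose file; not confirmed in this thread.
COMPOSE_FILE = "/opt/clearml/docker-compose.yml"


def compose_command(compose_file, *args):
    """Build a docker-compose command that names the compose file explicitly."""
    return ["docker-compose", "-f", compose_file, *args]


def run_compose(compose_file, *args):
    """Run the command and raise if docker-compose exits with an error."""
    subprocess.run(compose_command(compose_file, *args), check=True)


down_cmd = compose_command(COMPOSE_FILE, "down")
up_cmd = compose_command(COMPOSE_FILE, "up", "-d")
print(" ".join(down_cmd))
print(" ".join(up_cmd))
```

Using `check=True` means a failed command raises an error, which makes it easier to notice when a step such as `down` did not actually run.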




