
so, inbound rules should allow custom TCP for the three ports 8080, 8008, and 8081? What about the outbound rules?
Turns out the step I missed (maybe it should be mentioned in the docs...) was configuring the Security Group of the EC2 machine to allow inbound connections on ports 8080, 8008, and 8081, and to limit the source to my IP (or my office IP) only.
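For reference, a minimal boto3 sketch of those inbound rules restricted to a single source IP (the security-group ID, region, and CIDR below are placeholders, not values from this setup):

```python
# Minimal sketch: open the clearml-server web/API/file-server ports (8080, 8008, 8081)
# to a single source IP only. GroupId and the CIDR are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

MY_IP_CIDR = "203.0.113.45/32"  # replace with your own public IP (or an office CIDR)

ec2.authorize_security_group_ingress(
    GroupId="sg-0123456789abcdef0",  # the EC2 instance's security group
    IpPermissions=[
        {
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            "IpRanges": [{"CidrIp": MY_IP_CIDR, "Description": "clearml-server access"}],
        }
        for port in (8080, 8008, 8081)
    ],
)
# Outbound: the default security-group egress rule (allow all) can stay as-is;
# only the inbound rules above are needed for the server to be reachable.
```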
Yes, that solved the errors; however, the two lines "could not detect iteration reporting" and (a few moments later) "reporting detected" still show up.
I'll look at the security group. Any tips on how to configure it so that it isn't exposed to the entire world, but also not locked to me?
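One common middle ground (a hedged sketch, with placeholder group ID and CIDRs) is to swap the single /32 rule for an office network range, so colleagues can reach the server without opening it to 0.0.0.0/0:

```python
# Sketch: replace a single-IP (/32) rule with an office CIDR so the server is
# reachable from the whole office network but not from the entire internet.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
SG_ID = "sg-0123456789abcdef0"   # placeholder security-group ID
OLD_CIDR = "203.0.113.45/32"     # the single IP currently allowed
OFFICE_CIDR = "198.51.100.0/24"  # placeholder office network range

for port in (8080, 8008, 8081):
    ec2.revoke_security_group_ingress(
        GroupId=SG_ID,
        IpPermissions=[{
            "IpProtocol": "tcp", "FromPort": port, "ToPort": port,
            "IpRanges": [{"CidrIp": OLD_CIDR}],
        }],
    )
    ec2.authorize_security_group_ingress(
        GroupId=SG_ID,
        IpPermissions=[{
            "IpProtocol": "tcp", "FromPort": port, "ToPort": port,
            "IpRanges": [{"CidrIp": OFFICE_CIDR, "Description": "office access to clearml-server"}],
        }],
    )
```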
Understood. If there is something I can tweak in the reporting, I couldn't find where to tweak it, since it all seems to come down to the single activation line of the reporting: learn.callback_fns.append(partial(LearnerTensorboardWriter, base_dir=tboard_path, name=taskName))
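For context, a minimal sketch of the full setup around that activation line, assuming fastai v1 (the tboard_path and taskName values here are placeholders, and learn is an existing Learner):

```python
# Minimal sketch of the TensorBoard reporting activation, assuming fastai v1.
from functools import partial
from pathlib import Path

from fastai.callbacks.tensorboard import LearnerTensorboardWriter

tboard_path = Path("runs")   # placeholder: directory for the TensorBoard event files
taskName = "my_experiment"   # placeholder: run name shown in TensorBoard

# learn is an existing fastai Learner; the writer is attached once, before learn.fit(...)
learn.callback_fns.append(
    partial(LearnerTensorboardWriter, base_dir=tboard_path, name=taskName)
)
```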
Do you have any ideas about the options I have for changing how the train_loss is reported?
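One option (a hedged sketch, not the built-in writer's behaviour) is to log the training loss from a small custom callback instead, so you control exactly which value goes to TensorBoard; TrainLossWriter is a made-up name, and it assumes fastai v1 passes smooth_loss / iteration / train in the callback state:

```python
# Sketch: a custom fastai-v1 callback that writes the smoothed training loss
# (the value printed in the terminal) to TensorBoard.
from fastai.basic_train import LearnerCallback
from torch.utils.tensorboard import SummaryWriter

class TrainLossWriter(LearnerCallback):  # hypothetical helper, not part of fastai
    def __init__(self, learn, log_dir="runs/train_loss"):
        super().__init__(learn)
        self.writer = SummaryWriter(log_dir=log_dir)

    def on_batch_end(self, smooth_loss, iteration, train, **kwargs):
        # only log training batches, using the smoothed loss shown in the terminal
        if train:
            self.writer.add_scalar("loss/train_smooth", float(smooth_loss), iteration)

    def on_train_end(self, **kwargs):
        self.writer.close()

# attach it alongside (or instead of) LearnerTensorboardWriter:
# learn.callback_fns.append(TrainLossWriter)
```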
the "Payload" tab contains the project id info, so it shouldn't be the cause for the delete call fail
in the meantime, I got this error message, this time regarding Trains:
going to the server URL:8080 -> old trains login (working as usual if I enter my credentials) -> Ctrl-F5 -> switched to the new interface
Thanks Jake for your help, it's highly appreciated. This is an AWS EC2 running the clearml-server AMI (region of EC2 is us-east-1)
What's interesting is that SOMETIMES (rarely) it succeeds
Tried with both Firefox and Chrome; the results are similar, and also similar across computers and OSes (Ubuntu and Windows).
How do I check that? I didn't change anything in the settings other than the recommended defaults from the docs.
This is an error during training that points to an Elasticsearch error. It might also be the cause of the delete error; what do you think, SuccessfulKoala55?
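A quick way to check whether Elasticsearch itself is unhealthy is to query its health endpoints from the server machine (a sketch, assuming the default clearml-server deployment where Elasticsearch listens on localhost:9200 on the EC2 box):

```python
# Sketch: query Elasticsearch's health/indices endpoints from the clearml-server host.
# Assumes the default deployment exposes ES on localhost:9200.
import requests

ES = "http://localhost:9200"

health = requests.get(f"{ES}/_cluster/health").json()
print("cluster status:", health.get("status"))  # green / yellow / red

# list indices and their health; red indices often explain failed delete calls
print(requests.get(f"{ES}/_cat/indices?v&h=health,status,index,docs.count,store.size").text)
```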
(checking now, there are no Load Balancers in this region)
No, I meant changing the way it is reported. I'm still interested in the train_loss graph, naturally 🙂, but it is obviously reporting something that behaves like the inverse of the train_loss: in the graph it is exploding, while in reality (as reported in the terminal) it is decaying to 9e-2.
The valid_loss and Accuracy show up on TensorBoard with the same values as in the terminal, but the train_loss shows up on a different scale and I can't figure out why. I did not change anything in the core files of torch, TensorBoard, or fastai, and used the initialization the same way you showed (and as in the fastai docs): learn.callback_fns.append(partial(LearnerTensorboardWriter, base_dir=tboard_path, name=taskName))
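One possible explanation for the scale difference (hedged, based on my reading of fastai v1): the train_loss printed in the terminal is a debiased exponential moving average of the per-batch loss, while a per-batch scalar written straight to TensorBoard may be the raw value, which is much noisier and can look like it is exploding. A tiny sketch of that smoothing, with made-up loss values and fastai's default beta=0.98:

```python
# Sketch: how fastai v1's displayed train_loss is smoothed (debiased exponential
# moving average, beta=0.98 by default). The raw per-batch losses here are made up.
beta = 0.98
raw_losses = [5.0, 3.0, 9.0, 1.5, 0.7, 4.0, 0.3, 0.2]  # noisy per-batch values

mov_avg, smoothed = 0.0, []
for n, loss in enumerate(raw_losses, start=1):
    mov_avg = beta * mov_avg + (1 - beta) * loss
    smoothed.append(mov_avg / (1 - beta ** n))  # debias the early estimates

print("raw:     ", raw_losses)
print("smoothed:", [round(s, 3) for s in smoothed])
# The smoothed curve changes slowly and tracks the terminal read-out; the raw
# per-batch values are what can look "exploding" when plotted directly.
```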