Oh, I just realized that the mongo versions on ServerA and ServerB were mismatched.
The problem was resolved by updating the mongo image on ServerB to 4.0.23, the same as ServerA.
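For anyone hitting the same thing, a quick way to spot this kind of mismatch is to compare the image tags in the two docker-compose files before copying any data. This is only a minimal sketch: the function names are made up for illustration, it reads `image:` lines with a regex rather than a real YAML parser, and the sample tag for ServerB is a placeholder, not the version that was actually running.

```python
import re

# Matches lines such as "    image: mongo:4.0.23"
IMAGE_LINE = re.compile(r"^\s*image:\s*(\S+)\s*$", re.MULTILINE)


def image_tags(compose_text):
    """Map image name -> tag for every `image:` line in a compose file."""
    tags = {}
    for ref in IMAGE_LINE.findall(compose_text):
        ref = ref.strip("\"'")
        name, sep, tag = ref.rpartition(":")
        # No tag given (a colon followed by "/" is a registry port, not a tag)
        if not sep or "/" in tag:
            name, tag = ref, "latest"
        tags[name] = tag
    return tags


def mismatched_images(compose_a, compose_b):
    """Return {image: (tag_on_a, tag_on_b)} for images whose tags differ."""
    a, b = image_tags(compose_a), image_tags(compose_b)
    return {n: (a[n], b[n]) for n in a.keys() & b.keys() if a[n] != b[n]}


# Illustrative input only; the ServerB tag below is a placeholder.
server_a = "services:\n  mongo:\n    image: mongo:4.0.23\n"
server_b = "services:\n  mongo:\n    image: mongo:3.6\n"
print(mismatched_images(server_a, server_b))
```

In practice you would read both files from disk and pass their contents in; an empty result means the tags agree for every image the two files share.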
Hello, I followed the steps you mentioned in https://clearml.slack.com/archives/CTK20V944/p1659702067809619?thread_ts=1659694970.919069&cid=CTK20V944
The server now starts properly, but the ClearML UI doesn't show any of the experiments that I cloned from ServerA. Any suggestions? Thank you!
VictoriousPenguin97 I'm assuming both are the exact same server version?
VictoriousPenguin97 basically, spin down ServerA (this should flush all DBs), then copy /opt/clearml to the new server and spin it up with docker-compose. As long as the new server is on the same address as the previous one, everything should work out of the box.
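The sequence described above can be sketched as a small dry-run planner that only prints the commands instead of running them. The docker-compose invocations are the ones used in this thread; the copy step is an assumption, since the thread does not say how to copy the folder. The sketch uses rsync with `-a` and `--numeric-ids` (run as root so ownership is preserved), and `migration_plan` and the host labels are hypothetical names.

```python
import shlex


def migration_plan(src_host, data_root="/opt/clearml", compose_file="docker-compose.yml"):
    """Return (where_to_run, command) pairs for a ServerA -> ServerB move."""
    down = ["docker-compose", "-f", compose_file, "down"]
    up = ["docker-compose", "-f", compose_file, "up", "-d"]
    # Assumption: pull the whole tree with rsync, keeping numeric uid/gid as-is.
    copy = ["rsync", "-a", "--numeric-ids", f"{src_host}:{data_root}/", f"{data_root}/"]
    return [
        ("serverA", down),  # stop the old server so the databases flush to disk
        ("serverB", down),  # make sure nothing on the new server holds the data dirs
        ("serverB", copy),  # copy /opt/clearml across
        ("serverB", up),    # start the new server
    ]


# Dry run: print the plan, do not execute anything.
for host, cmd in migration_plan("serverA"):
    print(f"[{host}] {shlex.join(cmd)}")
```

Copying the whole /opt/clearml tree while both servers are down, rather than only /opt/clearml/data while ServerB is running, is the main difference from what was tried later in this thread.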
I mean migrating the data from serverA to serverB.
I just replaced ServerB's /opt/clearml/data with ServerA's /opt/clearml/data.
After that I ran docker-compose -f docker-compose.yml down and then docker-compose -f docker-compose.yml up -d
Then the elasticsearch container reported this error:
ElasticsearchException[failed to bind service]; nested: IOException[failed to test writes in data directory [/usr/share/elasticsearch/data/nodes/0/indices/mQ-x_DoZQ-iZ7OfIWGZ72g/_state] write permission is required]; nested: AccessDeniedException[/usr/share/elasticsearch/data/nodes/0/indices/mQ-x_DoZQ-iZ7OfIWGZ72g/_state/.es_temp_file];
clearml-elastic | Likely root cause: java.nio.file.AccessDeniedException: /usr/share/elasticsearch/data/nodes/0/indices/mQ-x_DoZQ-iZ7OfIWGZ72g/_state/.es_temp_file
clearml-elastic | at java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:90)
clearml-elastic | at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:106)
clearml-elastic | at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
clearml-elastic | at java.base/sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:219)
clearml-elastic | at java.base/java.nio.file.Files.newByteChannel(Files.java:380)
clearml-elastic | at java.base/java.nio.file.Files.createFile(Files.java:658)
clearml-elastic | at org.elasticsearch.env.NodeEnvironment.tryWriteTempFile(NodeEnvironment.java:1313)
clearml-elastic | at org.elasticsearch.env.NodeEnvironment.assertCanWrite(NodeEnvironment.java:1284)
clearml-elastic | at org.elasticsearch.env.NodeEnvironment.<init>(NodeEnvironment.java:342)
clearml-elastic | at org.elasticsearch.node.Node.<init>(Node.java:427)
clearml-elastic | at org.elasticsearch.node.Node.<init>(Node.java:309)
clearml-elastic | at org.elasticsearch.bootstrap.Bootstrap$5.<init>(Bootstrap.java:234)
clearml-elastic | at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:234)
clearml-elastic | at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:434)
clearml-elastic | at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:166)
clearml-elastic | at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:157)
clearml-elastic | at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:77)
clearml-elastic | at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:112)
clearml-elastic | at org.elasticsearch.cli.Command.main(Command.java:77)
clearml-elastic | at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:122)
clearml-elastic | at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:80)
clearml-elastic | For complete error details, refer to the log at /usr/share/elasticsearch/logs/clearml.log
I'm not entirely sure which steps you took and if you missed something. Elastic is complaining about permissions - Maybe you missed one of the steps?
Did I migrate the data correctly using the steps I took?
I already ran chmod 777 on /opt/clearml/data. Or are there other folders I need to grant permissions on?
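One thing worth checking here: chmod 777 without -R only changes the top-level directory, while the error above points at a file several levels down (under nodes/0/indices/...). A minimal sketch for finding which copied entries have the wrong owner is below. The uid 1000 in the usage comment is an assumption based on the `chown -R 1000:1000 /opt/clearml` step that, as far as I recall, the ClearML install page uses; please verify it against the docs for your version. `entries_not_owned_by` is a hypothetical helper name.

```python
import os


def entries_not_owned_by(root, uid):
    """Yield every path under root (root included) whose owner is not uid."""
    if os.lstat(root).st_uid != uid:
        yield root
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # lstat: report the link itself rather than following symlinks
            if os.lstat(path).st_uid != uid:
                yield path


# Example usage on the server (assumed uid 1000, see the note above):
# for path in entries_not_owned_by("/opt/clearml/data", 1000):
#     print(path)
```

If this lists anything, a recursive ownership fix on the whole tree is more likely to help than widening the mode bits on a single folder.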
As long as you adjust your docker-compose yaml file, it should be just fine.
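Adjusting the docker-compose file for a different base path mostly means changing the host side of each volume mapping. Here is a rough sketch of that rewrite, assuming the volumes are written as unquoted `- host_path:container_path` entries. The sample line is illustrative and not copied from the real ClearML compose file, and `rebase_volumes` is a made-up helper name.

```python
def rebase_volumes(compose_text, old_root="/opt/clearml", new_root="/some_path/clearml"):
    """Rewrite the host side of `- old_root/...:container_path` volume lines."""
    out = []
    for line in compose_text.splitlines():
        entry = line.lstrip()
        # Only touch list items whose host path is old_root or lives under it
        if entry.startswith(f"- {old_root}/") or entry.startswith(f"- {old_root}:"):
            line = line.replace(old_root, new_root, 1)
        out.append(line)
    trailing = "\n" if compose_text.endswith("\n") else ""
    return "\n".join(out) + trailing


# Illustrative volume entry (the host sub-folder name is an assumption).
sample = "      - /opt/clearml/data/elastic_7:/usr/share/elasticsearch/data\n"
print(rebase_volumes(sample), end="")
```

Only the first occurrence on each line is replaced, so the container-side path after the colon stays untouched. Quoted entries or the long-form volume syntax would need a real YAML parser instead.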
Is it OK if the paths on ServerA and ServerB are different?
For example, ServerA stores its files at /opt/clearml but ServerB stores them at /some_path/clearml
Looks like a permissions issue: nested: IOException[failed to test writes in data directory [/usr/share/elasticsearch/data/nodes/0/indices/mQ-x_DoZQ-iZ7OfIWGZ72g/_state] write permission is required]; nested
I've followed the installation steps mentioned on this page:
https://clear.ml/docs/latest/docs/deploying_clearml/clearml_server_linux_mac/
Then I replaced ServerB's /opt/clearml/data with ServerA's /opt/clearml/data.
And you have the exact same folder structure / content, and server A/B give a different set of experiments ?
(is serverB empty, meaning no experiments at all?)