CDH一个节点故障影响namenode启动

CDH某个节点磁盘故障,导致上面的角色都有问题。启动namenode时失败,日志报下面错误:

Error: recoverUnfinalizedSegments failed for required journal (JournalAndStream(mgr=QJM to [10.0.20.102:8485, 10.0.20.103:8485, 10.0.20.104:8485], stream=null))
java.io.IOException: Timed out waiting 120000ms for a quorum of nodes to respond.
at org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:137)
at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.createNewUniqueEpoch(QuorumJournalManager.java:197)
at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.recoverUnfinalizedSegments(QuorumJournalManager.java:436)
at org.apache.hadoop.hdfs.server.namenode.JournalSet6.apply(JournalSet.java:616) at org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:385) at org.apache.hadoop.hdfs.server.namenode.JournalSet.recoverUnfinalizedSegments(JournalSet.java:613) at org.apache.hadoop.hdfs.server.namenode.FSEditLog.recoverUnclosedStreams(FSEditLog.java:1603) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:1210) at org.apache.hadoop.hdfs.server.namenode.NameNode6.apply(JournalSet.java:616)atorg.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:385)atorg.apache.hadoop.hdfs.server.namenode.JournalSet.recoverUnfinalizedSegments(JournalSet.java:613)atorg.apache.hadoop.hdfs.server.namenode.FSEditLog.recoverUnclosedStreams(FSEditLog.java:1603)atorg.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:1210)atorg.apache.hadoop.hdfs.server.namenode.NameNodeNameNodeHAContext.startActiveServices(NameNode.java:1898)
at org.apache.hadoop.hdfs.server.namenode.ha.ActiveState.enterState(ActiveState.java:61)
at org.apache.hadoop.hdfs.server.namenode.ha.HAState.setStateInternal(HAState.java:64)
at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.setState(StandbyState.java:49)
at org.apache.hadoop.hdfs.server.namenode.NameNode.transitionToActive(NameNode.java:1756)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.transitionToActive(NameNodeRpcServer.java:1700)
at org.apache.hadoop.ha.protocolPB.HAServiceProtocolServerSideTranslatorPB.transitionToActive(HAServiceProtocolServerSideTranslatorPB.java:107)
at org.apache.hadoop.ha.proto.HAServiceProtocolProtosHAServiceProtocolServiceHAServiceProtocolService2.callBlockingMethod(HAServiceProtocolProtos.java:4460)
at org.apache.hadoop.ipc.ProtobufRpcEngineServerServerProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
at org.apache.hadoop.ipc.RPCServer.call(RPC.java:991) at org.apache.hadoop.ipc.ServerServer.call(RPC.java:991)atorg.apache.hadoop.ipc.ServerRpcCall.run(Server.java:869)
at org.apache.hadoop.ipc.ServerRpcCall.run(Server.java:815) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875) at org.apache.hadoop.ipc.ServerRpcCall.run(Server.java:815)atjava.security.AccessController.doPrivileged(NativeMethod)atjavax.security.auth.Subject.doAs(Subject.java:422)atorg.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)atorg.apache.hadoop.ipc.ServerHandler.run(Server.java:2675)

CDH重新启动namenode时报错,错误信息为:Error: recoverUnfinalizedSegments failed for required journal (JournalAndStream(mgr=QJM to [10.0.20.102:8485, 10.0.20.103:8485, 10.0.20.104:8485], stream=null)),而且提示等待120秒超时,无法响应。这与Hadoop分布式协议不一致有关,可能由于某个节点的欠缺或故障造成了段错误。建议检查所有节点的网络连接是否正常、端口是否打开,以及检查journal节点的状态。同时也需要检查在这个问题出现之前是否已经存在其他问题。如果仍然无法解决问题,建议向CDH官方技术支持组寻求帮助。

关注公众号“大模型全栈程序员”回复“小程序”获取1000个小程序打包源码。更多免费资源在http://www.gitweixin.com/?p=2627

发表评论

邮箱地址不会被公开。 必填项已用*标注