
Asked at 22:44
HBase HMaster shuts itself down shortly after startup
I have recently been learning HBase. After configuring it according to a tutorial, I start the master with hbase-daemon.sh start master, but the HMaster process exits after a short while. The log shows the following:
04:38:32,322 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase
04:38:32,322 ERROR org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: ZooKeeper exists failed after 3 retries
04:38:32,322 ERROR org.apache.hadoop.hbase.master.HMasterCommandLine: Failed to start master
java.lang.RuntimeException: Failed construction of Master: class org.apache.hadoop.hbase.master.HMaster
at org.apache.hadoop.hbase.master.HMaster.constructMaster(HMaster.java:2115)
at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:152)
at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:104)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:76)
at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2129)
Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:220)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.createAndFailSilent(ZKUtil.java:1111)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.createAndFailSilent(ZKUtil.java:1101)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.createAndFailSilent(ZKUtil.java:1085)
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.createBaseZNodes(ZooKeeperWatcher.java:164)
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.&lt;init&gt;(ZooKeeperWatcher.java:157)
at org.apache.hadoop.hbase.master.HMaster.&lt;init&gt;(HMaster.java:348)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at org.apache.hadoop.hbase.master.HMaster.constructMaster(HMaster.java:2110)
... 5 more
Could someone take a look at what is causing this? I have tried many fixes found online, but none of them has worked.
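Since every retry in the log fails with ConnectionLoss for /hbase, the first thing worth ruling out is basic connectivity from the master host to the ZooKeeper ensemble configured in hbase.zookeeper.quorum. Below is a minimal, hedged sketch; the quorum string is a placeholder for whatever your hbase-site.xml actually uses, and /hbase is only the default value of zookeeper.znode.parent:

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class ZkConnectivityCheck {
    public static void main(String[] args) throws Exception {
        // Assumption: replace with the value of hbase.zookeeper.quorum / clientPort from hbase-site.xml
        String quorum = "zk-host1:2181,zk-host2:2181,zk-host3:2181";
        final CountDownLatch connected = new CountDownLatch(1);
        ZooKeeper zk = new ZooKeeper(quorum, 30000, new Watcher() {
            @Override
            public void process(WatchedEvent event) {
                if (event.getState() == Watcher.Event.KeeperState.SyncConnected) {
                    connected.countDown();  // session established
                }
            }
        });
        try {
            if (!connected.await(30, TimeUnit.SECONDS)) {
                System.err.println("Could not connect to ZooKeeper at " + quorum);
                return;
            }
            // zookeeper.znode.parent defaults to /hbase
            Stat stat = zk.exists("/hbase", false);
            System.out.println(stat == null
                ? "/hbase znode does not exist yet (HBase creates it on its first successful start)"
                : "/hbase znode exists, mtime=" + stat.getMtime());
        } catch (KeeperException e) {
            System.err.println("ZooKeeper error: " + e.getMessage());
        } finally {
            zk.close();
        }
    }
}

If this fails to connect, the problem is in the ZooKeeper quorum or the network, not in HBase itself.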
Sorted by upvotes
1. Starting HBase
After running start-hbase.sh on the Namenode host, HMaster started but died a few seconds later.
The log showed the error:
[master:master:60000] catalog.CatalogTracker: Failed verificat......
I found a post describing the same problem; here it is for your reference:
Category: HADOOP
It has been a long time since I last wrote anything here; now that the busy stretch is over, today I am finally back on CU. First thing in the morning at work I started debugging a new test image, but as soon as it came up, the previously working distributed setup reported an error:
Zookeeper available but no active master location found
My first instinct was an HMaster problem, and sure enough, jps showed no HMaster process. Digging into the hbase-master log, I found the following error:
Could not obtain block: blk_number... ...file=/hbase/hbase.version
There are only two reasons a data block cannot be read: either the block does not exist, or we lack permission to read it. Checking HDFS, I found the /hbase directory and the hbase.version file, although the file was 0 KB. So my first thought was permissions, and I changed the permissions on /hbase:
%hadoop fs -chmod 777 /hbase
%hadoop fs -chmod -R 777 /hbase     (change the directory permissions recursively)
The result was the same. At that point I was sure the HMaster shutdown was not caused by the directory refusing access, so what was it? I had seen HMaster exit right after startup before, and back then formatting the namenode fixed it:
%hadoop namenode -format
That did not work this time either, so I started wondering whether duplicated, inconsistent HDFS state across the different nodes was preventing HMaster from starting. With that in mind, I deleted the HDFS data on the master and on every node and restarted HBase on the master; this time it worked, and HMaster no longer shut itself down.
At this point we need to repopulate a clean HDFS:
%rm -Rf reports/data
%hadoop fs -copyFromLocal reports /texaspete/templates/reports
%hadoop fs -put match/src/main/resources/regexes /texaspete/regexes
HMaster shutting itself down shortly after startup can have several causes. Based on my own experience, the following are worth trying (a small HDFS sanity check is sketched after this list):
1. Reformat the namenode, restart HMaster, and see whether the problem persists;
2. Check whether the HDFS permissions on the /hbase directory are set incorrectly;
3. Clear the HBase data on each node and restart HBase; my guess is that inconsistent state across nodes is what keeps HMaster from starting.
There are surely other cases as well; I will keep collecting and summarizing them as I learn more.
Daily Pick | How the HBase HMaster Works
The HMaster monitors all the RegionServers and usually runs on the namenode host.
According to the diagram in the original article, the HMaster is made up of six parts: External Interfaces, Executor Services, Zookeeper System Trackers, File System Interfaces, Chores, and Others.
1. External Interfaces
Interfaces for querying the state of the RegionServer cluster:
InfoServer: a Jetty-hosted web application, default port 60010, for viewing cluster status.
RpcServer: the RPC endpoint for communicating with the master; it supports both Writable and protobuf.
Master MXBean: exposes status metrics over JMX; good to know it exists, though it is rarely needed.
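As a small, hedged illustration, the InfoServer can be probed with a plain HTTP request to confirm the master is up; the hostname below is a placeholder, and 60010 is the pre-1.0 default master info port mentioned above:

import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

public class MasterInfoServerCheck {
    public static void main(String[] args) throws IOException {
        // Assumption: "master-host" is whatever host runs the HMaster
        URL url = new URL("http://master-host:60010/");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setConnectTimeout(5000);
        conn.setReadTimeout(5000);
        // HTTP 200 means the Jetty InfoServer is up and serving the master status UI
        System.out.println("HTTP " + conn.getResponseCode() + " from " + url);
        conn.disconnect();
    }
}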
2. Executor Services
These listen on the ZooKeeper event queue and hand each event to the appropriate handler:
Open Region Service: when the master detects (through ZooKeeper's watch mechanism) that a region has been opened successfully, an RS_ZK_REGION_OPENED event is delivered to this service, which triggers OpenRegionHandler.
Close Region Service: same as above, for the RS_ZK_REGION_CLOSED event.
Server Operations Service: when the master detects that a region needs to split, the work goes to SplitRegionHandler; and when a region server that does not host root or meta goes down, an M_SERVER_SHUTDOWN event is sent and handled by ServerShutdownHandler.
Meta Server Operation Service: handles the case where the server being shut down hosts the ROOT or META region.
Table Operation Service: handles table operations such as create, delete, disable, enable, and alter.
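The dispatch pattern here is an ordinary event-to-handler hand-off on top of thread pools. The plain-Java sketch below only shows the shape of it; the event names and handler bodies are illustrative stand-ins, not HBase's actual classes:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class EventDispatchSketch {
    // Illustrative stand-ins for RS_ZK_REGION_OPENED / RS_ZK_REGION_CLOSED
    enum EventType { REGION_OPENED, REGION_CLOSED }

    public static void main(String[] args) {
        ExecutorService openRegionService = Executors.newFixedThreadPool(3);
        ExecutorService closeRegionService = Executors.newFixedThreadPool(3);

        dispatch(EventType.REGION_OPENED, "region-abc", openRegionService, closeRegionService);
        dispatch(EventType.REGION_CLOSED, "region-abc", openRegionService, closeRegionService);

        openRegionService.shutdown();
        closeRegionService.shutdown();
    }

    static void dispatch(EventType type, String region,
                         ExecutorService openSvc, ExecutorService closeSvc) {
        switch (type) {
            case REGION_OPENED:
                // In HBase this would be an OpenRegionHandler submitted to the Open Region Service
                openSvc.submit(() -> System.out.println("handling OPENED for " + region));
                break;
            case REGION_CLOSED:
                closeSvc.submit(() -> System.out.println("handling CLOSED for " + region));
                break;
        }
    }
}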
3. Zookeeper System Trackers
The master and the region servers track specific events through ZooKeeper:
Active Master Manager: handles all master events, including master election.
Region Server Tracker: maintains the list of region servers registered under /hbase/rs; any server joining or leaving shows up under that znode.
Draining Server Tracker: a region server being decommissioned becomes a draining server, tracked under /hbase_root/draining.
Catalog Tracker: tracks -ROOT- and .META.
Cluster Status Tracker: whether the cluster has started up properly.
Assignment Manager: if the master dies, the regions that were in transition can be found here.
Root Region Tracker: tracks the location and state changes of the root region.
Load Balancer: decides whether regions should be moved between RegionServers.
Meta Node Tracker
Master Address Tracker
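The trackers are all variations on the same idea: keep a watch on a znode and refresh local state when it changes. A hedged sketch of what the Region Server Tracker does conceptually; the quorum string is a placeholder, and /hbase/rs is the default location mentioned above:

import java.util.List;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class RegionServerListSketch {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("zk-host1:2181", 30000, event -> { });
        Thread.sleep(2000);  // crude wait for the session; a real tracker blocks on SyncConnected
        Watcher childWatcher = new Watcher() {
            @Override
            public void process(WatchedEvent event) {
                // NodeChildrenChanged fires when a region server joins or leaves
                System.out.println("children of /hbase/rs changed: " + event.getType());
            }
        };
        List<String> servers = zk.getChildren("/hbase/rs", childWatcher);
        System.out.println("online region servers: " + servers);
        Thread.sleep(60000);  // keep the session open so the watch can fire
        zk.close();
    }
}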
4. File System Interfaces
Interfaces that talk to the underlying file system:
MasterFileSystem: an abstraction over the file system.
Log Cleaner: a background chore; by default two delegates run, TimeToLiveLogCleaner and ReplicationLogCleaner, and you can implement your own and add it through configuration.
HFile Cleaner: a built-in periodic task; by default TimeToLiveHFileCleaner runs, and you can likewise implement your own and add it through configuration.
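As a hedged illustration of the "add your own cleaner through configuration" point, the sketch below appends a hypothetical cleaner class to the master's log-cleaner plugin list. The property name hbase.master.logcleaner.plugins is assumed to be the key the stock cleaners are registered under in this era of HBase, and com.example.MyLogCleaner is purely a placeholder; in a real deployment the value would go into hbase-site.xml rather than be set in code:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class LogCleanerPluginConfig {
    public static void main(String[] args) {
        Configuration conf = HBaseConfiguration.create();  // loads hbase-site.xml from the classpath
        String key = "hbase.master.logcleaner.plugins";    // assumption: plugin list key for the log cleaner chore
        String existing = conf.get(key, "");
        // Append a hypothetical custom cleaner class to the plugin list
        conf.set(key, existing.isEmpty()
            ? "com.example.MyLogCleaner"
            : existing + ",com.example.MyLogCleaner");
        System.out.println(key + " = " + conf.get(key));
    }
}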
5. Chores
Background tasks that run automatically:
Balancer Chore: periodically runs the load balancer; it checks utilization against a configurable threshold (typically 10%) and migrates regions automatically when the cluster is out of balance.
Catalog Janitor Chore: scans the catalog, finds unused regions, and garbage-collects them.
Log Cleaner Chore: (see above).
HFile Cleaner Chore: (see above).
6. Others
Server Manager: maintains region server information, tracks online and dead servers, and handles region server startup, shutdown, and death.
Co-Processor Host: the generic framework for hooking into HBase services; you typically use it when implementing something like secondary indexes on top of HBase.
This chapter is the first one about the server side. The background material above is something I deliberately pulled off the web; readers who are interested may want to look through it first.
According to the comments on HMaster's run method, its startup performs the following steps:
1. Block until it becomes the active master
2. Finish the initialization work
3. Enter the main loop
4. Stop services and run cleanup
HMaster has no single point of failure, because several HMasters can be started at the same time and ZooKeeper's election picks one of them as the active master.
Let us first look at the "block until it becomes the active master" phase.
1. If it is not the master, it keeps sleeping. The isActiveMaster check is simply whether a master node exists in ZooKeeper; as long as it does not, it keeps waiting. Once the master znode can be created, the race for master begins: first come, first served.
2. It tries to add its own ServerName under the master znode; if that succeeds, it becomes the master and then removes itself from the backup-masters node.
3. If it does not become the master, it adds itself to the backup node and also checks the current master node; if that node holds its own name, something has gone wrong (the create was reported as failed yet the data says it succeeded), and in that case it simply waits until the current master dies.
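The election itself is the classic ephemeral-znode pattern: everyone races to create the same ephemeral node, the winner becomes the active master, and the losers watch that node and retry when it disappears. A minimal, hedged sketch of that pattern; the znode path and server name below are placeholders, and HBase's real implementation adds backup-master bookkeeping on top:

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class MasterElectionSketch {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("zk-host1:2181", 30000, event -> { });
        Thread.sleep(2000);  // crude wait for the session; real code blocks on SyncConnected
        String masterZnode = "/demo-master";          // placeholder standing in for /hbase/master
        byte[] myServerName = "host1,60000,1234567890".getBytes("UTF-8");
        try {
            // EPHEMERAL: the znode vanishes when this session dies, which is what lets a backup take over
            zk.create(masterZnode, myServerName, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
            System.out.println("I am the active master");
        } catch (KeeperException.NodeExistsException e) {
            // Someone else won the race: register as a backup and watch the master znode
            byte[] current = zk.getData(masterZnode, true, null);
            System.out.println("Backup master; current active master is " + new String(current, "UTF-8"));
        }
        Thread.sleep(10000);
        zk.close();
    }
}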
The initialization after becoming master is where the real work happens; everything before it is just a warm-up. I will not paste the instantiation code here, it is not very interesting; instead, here are the fields that get created, so you can get acquainted with them.
/** Handles all interaction between the master and HDFS */
private MasterFileSystem fileSystemManager;
/** Manages the Region Servers */
ServerManager serverManager;
/** Manages region assignment (the region nodes in ZooKeeper) */
AssignmentManager assignmentManager;
/** Responsible for load balancing */
private LoadBalancer balancer;
/** Reads the table definitions stored on HDFS */
private TableDescriptors tableDescriptors;
/** Table-level distributed lock, guards schema changes */
private TableLockManager tableLockManager;
/** Manages table snapshots */
private SnapshotManager snapshotManager;
All of these get instantiated; the exact order is not that important, so I will skip it. Then class is in session: the master waits for region servers to report in, and also makes a note of servers that are registered in ZooKeeper but have not reported to the master, without acting on them.
// Wait for region servers to register; at least one must be up
this.serverManager.waitForRegionServers(status);
// Check for servers registered in ZooKeeper but not yet registered with the master
for (ServerName sn : this.regionServerTracker.getOnlineServers()) {
  if (!this.serverManager.isServerOnline(sn)
      && serverManager.checkAlreadySameHostPortAndRecordNewServer(
          sn, ServerLoad.EMPTY_SERVERLOAD)) {
    LOG.info("Registered server found up in zk but who has not yet "
        + "reported in: " + sn);
  }
}
Preparation before assigning the META table: splitting META's logs
Okay, now for the main event: getting ready to assign the meta table. First, start a timer.
// Start the timeout monitor
if (!masterRecovery) {
  this.assignmentManager.startTimeOutMonitor();
}
Now the code: it finds the dead servers from the log files and then deals with them.
// Find the machines that died, from the WALs directory
Set<ServerName> previouslyFailedServers = this.fileSystemManager
    .getFailedServersFromLogFolders();
// Remove the regions that were being recovered during the previous run: clear every region node
// under ZooKeeper's recovering-regions
this.fileSystemManager.removeStaleRecoveringRegionsFromZK(previouslyFailedServers);
// Get the old location of the meta table
ServerName oldMetaServerLocation = this.catalogTracker.getMetaLocation();
// If meta lived on one of the servers that died, its logs have to be split out of the log files separately
if (oldMetaServerLocation != null && previouslyFailedServers.contains(oldMetaServerLocation)) {
  splitMetaLogBeforeAssignment(oldMetaServerLocation);
}
Stepping into getFailedServersFromLogFolders (F3 in the IDE):
// Walk the directories under WALs
FileStatus[] logFolders = FSUtils.listStatus(this.fs, logsDirPath, null);
// The set of servers that are currently online
Set<ServerName> onlineServers = ((HMaster) master).getServerManager().getOnlineServers().keySet();
for (FileStatus status : logFolders) {
  String sn = status.getPath().getName();
  // A directory name ending in -splitting means its logs are already being split
  if (sn.endsWith(HLog.SPLITTING_EXT)) {
    sn = sn.substring(0, sn.length() - HLog.SPLITTING_EXT.length());
  }
  // Turn the directory name back into a ServerName
  ServerName serverName = ServerName.parseServerName(sn);
  // If that ServerName is not among the online servers, consider it dead
  if (!onlineServers.contains(serverName)) {
    serverNames.add(serverName);
  } else {
    LOG.info("Log folder " + status.getPath() + " belongs to an existing region server");
  }
}
As the code shows, it looks under the WALs directory: each directory name embeds a ServerName, which is compared against the set of online servers, and any server that is not online is added to the result set, which is then returned. So the servers that come back from here are exactly the ones that ran into trouble.
Back to the logic above: once we have the failed servers, we read the META table's previous location out of ZooKeeper. If META was sitting on one of the dead servers, that is bad news and recovery has to start, beginning with pulling META's logs out of the pile. Let us step into splitMetaLogBeforeAssignment.
private void splitMetaLogBeforeAssignment(ServerName currentMetaServer) throws IOException {
  // This flag defaults to false
  if (this.distributedLogReplay) {
    // In log replay mode, we mark hbase:meta region as recovering in ZK
    Set<HRegionInfo> regions = new HashSet<HRegionInfo>();
    regions.add(HRegionInfo.FIRST_META_REGIONINFO);
    this.fileSystemManager.prepareLogReplay(currentMetaServer, regions);
  } else {
    // In recovered.edits mode: create recovered edits file for hbase:meta server
    this.fileSystemManager.splitMetaLog(currentMetaServer);
  }
}
So there are two modes: distributed log replay, which recovers through ZooKeeper, and recovered.edits mode, which recovers by creating recovered.edits files. The choice is controlled by the hbase.master.distributed.log.replay parameter, which defaults to false, meaning recovered.edits mode. As you can see, this method only prepares for recovery: in distributed mode it calls prepareLogReplay, otherwise it starts creating the recovered.edits files.
(a) In prepareLogReplay, the HRegionInfo.FIRST_META_REGIONINFO region is added under recovering-regions and marked as being recovered.
(b) Now for splitMetaLog. It splits the logs by calling the method below, using a filter to select META or non-META logs; META logs end in .meta.
public void splitLog(final Set<ServerName> serverNames, PathFilter filter) throws IOException {
  long splitTime = 0, splitLogSize = 0;
  // Rename the WAL directories that need splitting, appending .splitting to each
  List<Path> logDirs = getLogDirs(serverNames);
  // Record the dead servers in splitLogManager's deadWorkers list
  splitLogManager.handleDeadWorkers(serverNames);
  splitTime = EnvironmentEdgeManager.currentTimeMillis();
  // Split the logs
  splitLogSize = splitLogManager.splitLogDistributed(serverNames, logDirs, filter);
  splitTime = EnvironmentEdgeManager.currentTimeMillis() - splitTime;
  // Record the split results in the metrics
  if (this.metricsMasterFilesystem != null) {
    if (filter == META_FILTER) {
      this.metricsMasterFilesystem.addMetaWALSplit(splitTime, splitLogSize);
    } else {
      this.metricsMasterFilesystem.addSplit(splitTime, splitLogSize);
    }
  }
}
The comments above cover most of it, so let us go straight into splitLogDistributed.
public long splitLogDistributed(final Set<ServerName> serverNames, final List<Path> logDirs,
    PathFilter filter) throws IOException {
  // Read the log files
  FileStatus[] logfiles = getFileList(logDirs, filter);
  long totalSize = 0;
  // Task tracker: a batch holds N tasks, and the totals are counted over the batch
  TaskBatch batch = new TaskBatch();
  Boolean isMetaRecovery = (filter == null) ? null : false;
  for (FileStatus lf : logfiles) {
    totalSize += lf.getLen();
    // Path relative to the root directory
    String pathToLog = FSUtils.removeRootPath(lf.getPath(), conf);
    // Enqueue the file as a split task
    if (!enqueueSplitTask(pathToLog, batch)) {
      throw new IOException("duplicate log split scheduled for " + lf.getPath());
    }
  }
  // Wait for the tasks to finish
  waitForSplittingCompletion(batch, status);
  if (filter == MasterFileSystem.META_FILTER) {
    isMetaRecovery = true;
  }
  // Clear the recovering state, otherwise region servers refuse access to regions still marked as recovering
  this.removeRecoveringRegionsFromZK(serverNames, isMetaRecovery);
  if (batch.done != batch.installed) {
    // The number of completed tasks does not match the number started
    batch.isDead = true;
    String msg = "error or interrupted while splitting logs in "
        + logDirs + " Task = " + batch;
    throw new IOException(msg);
  }
  // Finally clean up the log directories
  for (Path logDir : logDirs) {
    try {
      if (fs.exists(logDir) && !fs.delete(logDir, false)) {
        LOG.warn("Unable to delete log src dir. Ignoring. " + logDir);
      }
    } catch (IOException ioe) {
      LOG.warn("Could not delete log dir " + logDir, ioe);
    }
  }
  status.markComplete("finished splitting " + totalSize + " bytes in " + batch.installed
      + " log files in " + logDirs);
  return totalSize;
}
Every file to be split is inserted into a split queue, and then we wait for the whole batch to finish. This part is a little convoluted: for each file a node is created under ZooKeeper's splitWALs directory, and the node name is not the plain relative path but the path run through URLEncoder.encode(s, "UTF-8").
Feeling dizzy yet? Does it really just create a znode and then do nothing? Of course not: every Region Server runs a SplitLogWorker that picks up and processes the logs listed under that directory. The actual split happens in HLogSplitter.splitLogFile, which I will not go into; it writes the recovered edits into a recovered.edits directory under the region.
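As a small, hedged illustration of how a WAL path turns into a ZooKeeper task-node name, the sketch below encodes and decodes a made-up path; the actual parent znode name differs between HBase versions:

import java.net.URLDecoder;
import java.net.URLEncoder;

public class SplitTaskNodeName {
    public static void main(String[] args) throws Exception {
        // Hypothetical WAL file path, relative to the HBase root directory
        String pathToLog = "WALs/host1,60020,1393897419626-splitting/host1%2C60020%2C1393897419626.1393897423022";
        String taskNode = URLEncoder.encode(pathToLog, "UTF-8");
        System.out.println("znode name: " + taskNode);
        // A SplitLogWorker decodes it back into the file it has to split
        System.out.println("decoded:    " + URLDecoder.decode(taskNode, "UTF-8"));
    }
}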
Assigning the META table
Now we start assigning the META table's region.
void assignMeta(MonitoredTask status)
    throws InterruptedException, IOException, KeeperException {
  // Work on meta region
  int assigned = 0;
  ServerName logReplayFailedMetaServer = null;
  // Record the state in RegionStates, indicating that this region is in transition
  assignmentManager.getRegionStates().createRegionState(HRegionInfo.FIRST_META_REGIONINFO);
  // Handle meta's first region and reassign it if it is already in transition
  boolean rit = this.assignmentManager
      .processRegionInTransitionAndBlockUntilAssigned(HRegionInfo.FIRST_META_REGIONINFO);
  // Wait until hbase:meta's location is visible in ZooKeeper
  boolean metaRegionLocation = this.catalogTracker.verifyMetaRegionLocation(timeout);
  if (!metaRegionLocation) {
    assigned++;
    if (!rit) {
      // The assignment did not succeed, so the preparation work has to be redone
      ServerName currentMetaServer = this.catalogTracker.getMetaLocation();
      if (currentMetaServer != null) {
        if (expireIfOnline(currentMetaServer)) {
          splitMetaLogBeforeAssignment(currentMetaServer);
          if (this.distributedLogReplay) {
            logReplayFailedMetaServer = currentMetaServer;
          }
        }
      }
      // Delete meta's location from ZooKeeper and assign it again
      assignmentManager.assignMeta();
    }
  }
  // Once assigned, update its state to online
  this.assignmentManager.regionOnline(HRegionInfo.FIRST_META_REGIONINFO,
      this.catalogTracker.getMetaLocation());
  // Enable the meta table in ZooKeeper
  enableMeta(TableName.META_TABLE_NAME);
  // Start the server shutdown handler
  enableServerShutdownHandler(assigned != 0);
  if (logReplayFailedMetaServer != null) {
    // Note: this is not a repeat; it is the meta log split that distributed log replay mode still needs.
    // Look back at where this variable gets assigned to see why.
    this.fileSystemManager.splitMetaLog(logReplayFailedMetaServer);
  }
}
After a long trek we end up in the method below, which sends an RPC to a randomly chosen Region Server asking it to open the region.
public RegionOpeningState sendRegionOpen(final ServerName server,
    HRegionInfo region, int versionOfOfflineNode, List<ServerName> favoredNodes)
    throws IOException {
  AdminService.BlockingInterface admin = getRsAdmin(server);
  if (admin == null) {
    return RegionOpeningState.FAILED_OPENING;
  }
  // Build the OpenRegion request
  OpenRegionRequest request =
      RequestConverter.buildOpenRegionRequest(region, versionOfOfflineNode, favoredNodes);
  try {
    // Call openRegion on the chosen Region Server
    OpenRegionResponse response = admin.openRegion(null, request);
    return ResponseConverter.getRegionOpeningState(response);
  } catch (ServiceException se) {
    throw ProtobufUtil.getRemoteException(se);
  }
}
Once that is done, distributed log replay mode still has one more piece of work; recovered.edits mode already did it earlier:
// Get the meta region servers that are still being recovered
Set<ServerName> previouslyFailedMetaRSs = getPreviouselyFailedMetaServersFromZK();
if (this.distributedLogReplay && (!previouslyFailedMetaRSs.isEmpty())) {
  previouslyFailedMetaRSs.addAll(previouslyFailedServers);
  this.fileSystemManager.splitMetaLog(previouslyFailedMetaRSs);
}
Assigning user regions
What follows is cleanup: dealing with the remaining failed servers, fixing any incorrect region states, and assigning all user regions.
// Meta has been recovered; now deal with the other failed servers
for (ServerName tmpServer : previouslyFailedServers) {
  this.serverManager.processDeadServer(tmpServer, true);
}
// On failover, fix the problematic region states in the assignmentManager;
// on a fresh start, assign all user regions
this.assignmentManager.joinCluster();
All region assignment is done by the assignmentManager. Inside joinCluster, the last statement of processDeadServersAndRegionsInTransition calls assignAllUserRegions, which is buried quite deep. For regions that were already assigned, the master by default keeps the previous assignment when it starts up rather than moving them around; this is controlled by the hbase.master.startup.retainassign parameter, which defaults to true. Assigning user regions works essentially the same way as assigning the meta table.
That is basically all the work the HMaster does during startup.
Views: 17428 | Replies: 14
How I solved the problem of the HBase HMaster process disappearing shortly after startup
1. After installing HBase, I found that once I ran start-hbase.sh, the HMaster process always disappeared after a little while:
hbase(main):004:0> create 't1', {NAME => 'f1', VERSIONS => 5}
ERROR: Can't get master address from ZooKeeper; znode data == null
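That error means the client reached ZooKeeper but found nothing under the master znode, i.e. no HMaster has successfully registered. A hedged sketch for checking this directly; the quorum string is a placeholder, /hbase/master is the default znode, and its contents are a serialized ServerName, so the sketch only checks whether anything is there:

import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class MasterZnodeCheck {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("zk-host1:2181", 30000, event -> { });
        Thread.sleep(2000);  // crude wait for the session to come up
        Stat stat = zk.exists("/hbase/master", false);
        if (stat == null) {
            System.out.println("/hbase/master does not exist: no active HMaster is registered");
        } else {
            byte[] data = zk.getData("/hbase/master", false, null);
            System.out.println("/hbase/master holds " + (data == null ? 0 : data.length)
                + " bytes of serialized master address data");
        }
        zk.close();
    }
}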
2. The ZooKeeper log on node 1 shows:
09:16:49,263 DEBUG [main] hbase.HBaseConfiguration: preRegister called. Server=com.sun.jmx.mbeanserver.JmxMBeanServer@175cb80, name=log4j:logger=org.apache.hadoop.hbase.HBaseConfiguration
09:16:49,264 DEBUG [main] hadoop.hbase: preRegister called. Server=com.sun.jmx.mbeanserver.JmxMBeanServer@175cb80, name=log4j:logger=org.apache.hadoop.hbase
09:16:49,264 INFO&&[main] quorum.QuorumPeerMain: Starting quorum peer
09:16:49,293 INFO&&[main] server.NIOServerCnxnFactory: binding to port 0.0.0.0/0.0.0.0:2181
09:16:49,534 INFO&&[main] quorum.QuorumPeer: tickTime set to 3000
09:16:49,534 INFO&&[main] quorum.QuorumPeer: minSessionTimeout set to -1
09:16:49,534 INFO&&[main] quorum.QuorumPeer: maxSessionTimeout set to 90000
09:16:49,534 INFO&&[main] quorum.QuorumPeer: initLimit set to 10
09:16:50,114 INFO&&[main] persistence.FileSnap: Reading snapshot /home/hadoop/tmp/hbase/zookeeper/version-2/snapshot.
09:16:50,537 INFO&&[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.NIOServerCnxnFactory: Accepted socket connection from /192.168.1.3:35467
09:16:50,594 INFO&&[Thread-1] quorum.QuorumCnxManager: My election bind port: 0.0.0.0/0.0.0.0:3888
09:16:50,598 WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.NIOServerCnxn: Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
09:16:50,598 INFO&&[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.NIOServerCnxn: Closed socket connection for client /192.168.1.3:35467 (no session established for client)
09:16:50,669 INFO&&[QuorumPeer[myid=0]/0:0:0:0:0:0:0:0:2181] quorum.QuorumPeer: LOOKING
09:16:50,673 INFO&&[QuorumPeer[myid=0]/0:0:0:0:0:0:0:0:2181] quorum.FastLeaderElection: New election. My id =&&0, proposed zxid=0xe
09:16:50,680 INFO&&[WorkerReceiver[myid=0]] quorum.FastLeaderElection: Notification: 0 (n.leader), 0xe (n.zxid), 0x1 (n.round), LOOKING (n.state), 0 (n.sid), 0x6 (n.peerEPoch), LOOKING (my state)
09:16:50,748 WARN&&[WorkerSender[myid=0]] quorum.QuorumCnxManager: Cannot open channel to 1 at election address /192.168.1.4:3888
java.net.ConnectException: Connection refused
The ZooKeeper log on node 2:
md64:/usr/lib64:/lib64:/lib:/usr/lib
09:16:59,684 INFO&&[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181] server.ZooKeeperServer: Server environment:java.io.tmpdir=/tmp
09:16:59,684 INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181] server.ZooKeeperServer: Server environment:java.compiler=&lt;NA&gt;
09:16:59,685 INFO&&[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181] server.ZooKeeperServer: Server environments.name=
09:16:59,685 INFO&&[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181] server.ZooKeeperServer: Server environment:os.arch=amd64
09:16:59,685 INFO&&[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181] server.ZooKeeperServer: Server environment:os.version=2.6.32-358.el6.x86_64
09:16:59,685 INFO&&[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181] server.ZooKeeperServer: Server environment:user.name=hadoop
09:16:59,685 INFO&&[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181] server.ZooKeeperServer: Server environment:user.home=/home/hadoop
09:16:59,685 INFO&&[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181] server.ZooKeeperServer: Server environment:user.dir=/home/hadoop/hbase-0.98.0-hadoop1
09:16:59,706 INFO&&[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181] server.ZooKeeperServer: Created server with tickTime 3000 minSessionTimeout 6000 maxSessionTimeout 90000 datadir /home/hadoop/tmp/hbase/zookeeper/version-2 snapdir /home/hadoop/tmp/hbase/zookeeper/version-2
09:16:59,715 INFO&&[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181] quorum.Learner: FOLLOWING - LEADER ELECTION TOOK - 518
09:16:59,813 INFO&&[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181] quorum.Learner: Getting a diff from the leader 0xe
09:16:59,834 INFO&&[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181] persistence.FileTxnSnapLog: Snapshotting: 0xe to /home/hadoop/tmp/hbase/zookeeper/version-2/snapshot.e
09:16:59,908 INFO&&[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181] persistence.FileTxnSnapLog: Snapshotting: 0xe to /home/hadoop/tmp/hbase/zookeeper/version-2/snapshot.e
09:17:00,985 INFO&&[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.NIOServerCnxnFactory: Accepted socket connection from /192.168.1.4:43783
09:17:00,993 INFO&&[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.ZooKeeperServer: Client attempting to establish new session at /192.168.1.4:43783
09:17:00,997 WARN&&[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181] quorum.Learner: Got zxid 0x expected 0x1
09:17:00,998 INFO&&[SyncThread:1] persistence.FileTxnLog: Creating new log file: log.
09:17:01,097 INFO&&[CommitProcessor:1] server.ZooKeeperServer: Established session 0x0000 with negotiated timeout 90000 for client /192.168.1.4:43783
09:17:01,561 INFO&&[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.NIOServerCnxnFactory: Accepted socket connection from /192.168.1.3:54568
09:17:01,563 INFO&&[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.ZooKeeperServer: Client attempting to establish new session at /192.168.1.3:54568
09:17:01,571 INFO&&[CommitProcessor:1] server.ZooKeeperServer: Established session 0x0001 with negotiated timeout 90000 for client /192.168.1.3:54568
09:17:01,995 INFO&&[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.NIOServerCnxnFactory: Accepted socket connection from /192.168.1.5:55593
09:17:01,998 INFO&&[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.ZooKeeperServer: Client attempting to establish new session at /192.168.1.5:55593
09:17:02,013 INFO&&[CommitProcessor:1] server.ZooKeeperServer: Established session 0x0002 with negotiated timeout 90000 for client /192.168.1.5:55593
09:18:39,080 INFO&&[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.NIOServerCnxn: Closed socket connection for client /192.168.1.3:54568 which had sessionid 0x0001
The ZooKeeper log on node 3:
2014-03-04 09:51:03,814 INFO  [main] quorum.QuorumPeerConfig: Defaulting to majority quorums
09:51:04,739 DEBUG [main] util.Bytes: preRegister called. Server=com.sun.jmx.mbeanserver.JmxMBeanServer@4ff93efa, name=log4j:logger=org.apache.hadoop.hbase.util.Bytes
09:51:04,743 DEBUG [main] util.VersionInfo: preRegister called. Server=com.sun.jmx.mbeanserver.JmxMBeanServer@4ff93efa, name=log4j:logger=org.apache.hadoop.hbase.util.VersionInfo
09:51:04,745 DEBUG [main] zookeeper.ZKConfig: preRegister called. Server=com.sun.jmx.mbeanserver.JmxMBeanServer@4ff93efa, name=log4j:logger=org.apache.hadoop.hbase.zookeeper.ZKConfig
09:51:04,745 DEBUG [main] hbase.HBaseConfiguration: preRegister called. Server=com.sun.jmx.mbeanserver.JmxMBeanServer@4ff93efa, name=log4j:logger=org.apache.hadoop.hbase.HBaseConfiguration
09:51:04,746 DEBUG [main] hadoop.hbase: preRegister called. Server=com.sun.jmx.mbeanserver.JmxMBeanServer@4ff93efa, name=log4j:logger=org.apache.hadoop.hbase
09:51:04,746 INFO&&[main] quorum.QuorumPeerMain: Starting quorum peer
09:51:04,861 INFO&&[main] server.NIOServerCnxnFactory: binding to port 0.0.0.0/0.0.0.0:2181
09:51:05,430 INFO&&[main] quorum.QuorumPeer: tickTime set to 3000
09:51:05,432 INFO&&[main] quorum.QuorumPeer: minSessionTimeout set to -1
09:51:05,433 INFO&&[main] quorum.QuorumPeer: maxSessionTimeout set to 90000
09:51:05,433 INFO&&[main] quorum.QuorumPeer: initLimit set to 10
09:51:05,641 INFO&&[main] persistence.FileSnap: Reading snapshot /home/hadoop/tmp/hbase/zookeeper/version-2/snapshot.0
09:51:06,181 INFO&&[Thread-1] quorum.QuorumCnxManager: My election bind port: 0.0.0.0/0.0.0.0:3888
09:51:06,226 INFO&&[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181] quorum.QuorumPeer: LOOKING
09:51:06,236 INFO&&[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181] quorum.FastLeaderElection: New election. My id =&&2, proposed zxid=0x
09:51:06,567 WARN&&[WorkerSender[myid=2]] quorum.QuorumCnxManager: Cannot open channel to 0 at election address backup01/192.168.1.3:3888
java.net.ConnectException: Connection refused
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:579)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:354)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:327)
        at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:393)
        at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:365)
        at java.lang.Thread.run(Thread.java:744)
09:51:07,759 WARN&&[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181] quorum.QuorumCnxManager: Cannot open channel to 0 at election address backup01/192.168.1.3:3888
java.net.ConnectException: Connection refused
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:579)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:354)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:388)
        at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:765)
        at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:716)
09:51:07,776 INFO&&[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181] quorum.FastLeaderElection: Notification time out: 400
09:51:07,790 WARN&&[WorkerSender[myid=2]] quorum.QuorumCnxManager: Cannot open channel to 1 at election address backup02/192.168.1.4:3888
java.net.ConnectException: Connection refused
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:579)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:354)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:327)
        at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:393)
        at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:365)
        at java.lang.Thread.run(Thread.java:744)
09:51:07,795 INFO&&[WorkerReceiver[myid=2]] quorum.FastLeaderElection: Notification: 2 (n.leader), 0x (n.zxid), 0x1 (n.round), LOOKING (n.state), 2 (n.sid), 0x1 (n.peerEPoch), LOOKING (my state)
09:51:08,229 WARN&&[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181] quorum.QuorumCnxManager: Cannot open channel to 0 at election address backup01/192.168.1.3:3888
java.net.ConnectException: Connection refused
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:579)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:354)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:388)
        at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:765)
        at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:716)
09:51:08,348 WARN&&[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181] quorum.QuorumCnxManager: Cannot open channel to 1 at election address backup02/192.168.1.4:3888
java.net.ConnectException: Connection refused
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:579)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:354)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:388)
3. Checking the Hadoop datanode log:
10:31:12,129 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG:& &host = backup02/192.168.1.4
STARTUP_MSG:& &args = []
STARTUP_MSG:& &version = 1.2.1
STARTUP_MSG:& &build =
-r 1503152; compiled by 'mattf' on Mon Jul 22 15:23:09 PDT 2013
STARTUP_MSG:& &java = 1.7.0_45
************************************************************/
10:31:16,281 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
10:31:16,600 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source MetricsSystem,sub=Stats registered.
10:31:16,658 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
10:31:16,658 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode metrics system started
10:31:18,917 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ugi registered.
10:31:27,163 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Incompatible namespaceIDs in /home/hadoop/tmp/dfs/data: namenode namespaceID = ; datanode namespaceID =
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:232)
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:147)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:414)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.&lt;init&gt;(DataNode.java:321)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1712)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1651)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1669)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1795)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1812)
10:31:27,221 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
/************************************************************
The namenode log on node 1:
2014-03-05 09:15:05,376 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hadoop cause:java.io.IOException: File /home/hadoop/tmp/mapred/ could only be replicated to 0 nodes, instead of 1
09:15:05,378 INFO org.apache.hadoop.ipc.Server: IPC Server handler 6 on 9000, call addBlock(/home/hadoop/tmp/mapred/, DFSClient_NONMAPREDUCE_-, null) from 192.168.1.3:37809: error: java.io.IOException: File /home/hadoop/tmp/mapred/ could only be replicated to 0 nodes, instead of 1
java.io.IOException: File /home/hadoop/tmp/mapred/ could only be replicated to 0 nodes, instead of 1
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1920)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:783)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:587)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1432)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1428)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1426)
09:15:06,454 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hadoop cause:java.io.IOException: File /home/hadoop/tmp/mapred/ could only be replicated to 0 nodes, instead of 1
At this point it was clear that the real problem was in Hadoop. The datanode log on node 2 says:
Incompatible namespaceIDs in /home/hadoop/tmp/dfs/data: namenode namespaceID =
while the namenode log on node 1 says: java.io.IOException: File /home/hadoop/tmp/mapred/ could only be replicated to 0 nodes, instead of 1
Then I remembered that I had reformatted the namenode earlier; the old version information on the datanodes was probably never removed, which would explain the mismatch.
4. I cleaned out the relevant directories and, to be safe, reformatted the namenode again.
5. After restarting Hadoop and HBase, everything ran normally.
6. hbase(main):003:0> create 't1', {NAME => 'f1', VERSIONS => 5}
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/hbase-0.98.0-hadoop1/lib/slf4j-log4j12-1.6.4.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/hadoop-1.2.1/lib/slf4j-log4j12-1.4.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See
for an explanation.
0 row(s) in 11.1950 seconds
=> Hbase::Table - t1
hbase(main):004:0> list
TABLE
t1
1 row(s) in 0.5330 seconds
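Before restarting HBase after this kind of namenode reformat, it is worth confirming that HDFS can actually place block replicas again (the "could only be replicated to 0 nodes" error above means no datanode was accepting blocks). A hedged sketch using the Hadoop FileSystem API; the test path is arbitrary:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();        // reads core-site.xml / hdfs-site.xml from the classpath
        FileSystem fs = FileSystem.get(conf);
        Path probe = new Path("/tmp/hdfs-write-check");  // arbitrary test path
        FSDataOutputStream out = fs.create(probe, true);
        out.writeBytes("hello hdfs");                    // fails fast if no datanode can take the block
        out.close();
        System.out.println("write ok, length=" + fs.getFileStatus(probe).getLen());
        fs.delete(probe, false);
        fs.close();
    }
}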
Thanks, I'll study this.
A lot of people run into this and file it under HBase, but it is really a problem we hit in Hadoop's HDFS layer.
What does formatting the namenode actually do?
Can HBase run without deploying ZooKeeper separately?
Thanks for sharing; bookmarking this for reference.
神云kg wrote:
Can HBase run without deploying ZooKeeper separately?

HBase ships with its own ZooKeeper; you just enable it in the configuration file, so there is no need to deploy it separately.