
In a hadoop cluster, snappy is a solid compression tool: compared with gzip it is much faster at both compression and decompression and easier on the CPU, though its compression ratio is lower. Which is better isn't the question here; each has its uses.
On hadoop 2.X it's best to build the snappy .so files from source: prebuilt .so files from elsewhere may throw "not support" errors. The steps below build the snappy .so files for hadoop 2.X.
Here the cluster runs hadoop-2.6.0-cdh5.9.0 on CentOS 7.2.
Prerequisites:
Java 7 (Java 8 triggers a build error)
Install Java and configure PATH:
export JAVA_HOME=/usr/java/default
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
export PATH=$JAVA_HOME/bin:$PATH:$JRE_HOME/lib
If several JDK versions are installed, point the /usr/java/default symlink above at the Java 7 directory.
# source /etc/profile
# java -version
java version "1.7.0_XXX"
Java(TM) SE Runtime Environment (build 1.7.X_XXX-XXX)
Java HotSpot(TM) 64-Bit Server VM (build XXX.XXX-XXX, mixed mode)
Install some base packages via yum:
yum -y install gcc gcc-c++ libtool cmake maven zlib-devel
Unpack and install the base packages
Download the following tarballs:
hadoop-2.6.0-cdh5.9.0-src.tar.gz (the binary package hadoop-2.6.0-cdh5.9.0.tar.gz also contains the src source tree, so it works too)
snappy-1.1.1.tar.gz
protobuf-2.5.0.tar.gz (use version 2.5.0; the latest versions are not supported)
Install snappy
# tar xf snappy-1.1.1.tar.gz
# cd snappy-1.1.1
# ./configure
# make && make install
Check that snappy installed correctly:
# ll /usr/local/lib/ | grep snappy
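With the default prefix, the listing should show the snappy libraries; roughly the following (the exact version suffixes depend on the release, so treat this as an illustration):
libsnappy.a
libsnappy.la
libsnappy.so -> libsnappy.so.1.2.0
libsnappy.so.1 -> libsnappy.so.1.2.0
libsnappy.so.1.2.0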
Install protobuf
# tar xf protobuf-2.5.0.tar.gz
# cd protobuf-2.5.0
# ./configure
# make && make install
# protoc --version
libprotoc 2.5.0
Build the hadoop native libraries (including snappy)
# tar xf hadoop-2.6.0-cdh5.9.0-src.tar.gz
# cd hadoop-2.6.0-cdh5.9.0
# mvn package -DskipTests -Pdist,native -Dtar -Dsnappy.lib=/usr/local/lib -Dbundle.snappy
The mvn build above downloads everything it needs from the upstream repositories, which takes a long time; about 20 hours in my case. Not being very familiar with mvn yet, I tried several ways to speed up the downloads (including changing the download source, though perhaps I misconfigured it, since I got missing-package errors) without success.
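A common workaround is to route Maven's downloads through a nearby mirror in ~/.m2/settings.xml. A minimal sketch; the Aliyun mirror URL is just one example, and this only mirrors Maven Central (the CDH artifacts come from Cloudera's own repository, which is left untouched):
cat > ~/.m2/settings.xml <<'EOF'
<settings>
  <mirrors>
    <!-- Serve Maven Central requests from a faster mirror -->
    <mirror>
      <id>central-mirror</id>
      <mirrorOf>central</mirrorOf>
      <url>https://maven.aliyun.com/repository/central</url>
    </mirror>
  </mirrors>
</settings>
EOF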
After a successful build, the snappy .so files land in:
hadoop-2.6.0-cdh5.9.0/hadoop-dist/target/hadoop-2.6.0-cdh5.9.0/lib/native
Copy the files in this directory to lib/native under the hadoop installation and to lib/native/Linux-amd64-64 under the hbase installation (create the directories if they don't exist). Every node needs the copies.
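A minimal distribution sketch, assuming passwordless ssh and hypothetical install paths /opt/hadoop and /opt/hbase with hypothetical hostnames (adjust all of these to your layout):
#! /bin/bash
# Push the freshly built native libraries to every node.
SRC=hadoop-dist/target/hadoop-2.6.0-cdh5.9.0/lib/native
for node in node1 node2 node3; do
  ssh "$node" "mkdir -p /opt/hadoop/lib/native /opt/hbase/lib/native/Linux-amd64-64"
  scp "$SRC"/* "$node":/opt/hadoop/lib/native/
  scp "$SRC"/* "$node":/opt/hbase/lib/native/Linux-amd64-64/
done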
Edit the configuration:
$ cat core-site.xml
<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
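Note that io.compression.codecs is a comma-separated list; if your core-site.xml already declares other codecs, append SnappyCodec to the existing value rather than replacing it. A sketch with the commonly present codecs (check against your own file):
<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.SnappyCodec</value>
</property>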
Restart hadoop and hbase.
Test whether the installation succeeded:
$ hadoop checknative -a
hadoop:  true ...../hadoop-2.6.0-cdh5.9.0/lib/native/libhadoop.so
zlib:    true /usr/local/lib/libz.so.1
snappy:  true ...../hadoop-2.6.0-cdh5.9.0/lib/native/libsnappy.so.1
lz4:     true revision:10301
openssl: true /lib64/libcrypto.so
$ hbase org.apache.hadoop.hbase.util.CompressionTest /tmp/crq snappy
....................................................
$ hbase org.apache.hadoop.hbase.util.CompressionTest hdfs:
....................................................
The hostname in the hdfs:// URL above should normally be the namenode's hostname.
At this point, changing a table's compression to snappy in hbase can leave the table stuck in the RIT (region-in-transition) state with no automatic recovery. The reason is that hbase (version 1.2) ships with the relevant setting disabled by default (the official docs mention this); it must be enabled in the config file:
$ cat hbase-site.xml
<property>
  <name>hbase.block.data.cachecompressed</name>
  <value>true</value>
</property>
Restart hbase.
Using snappy in hbase
Create a table with snappy enabled:
$ echo "create 'snappyTest', {NAME => 'f', COMPRESSION => 'SNAPPY'}" | hbase shell
Change an existing table's compression:
$ echo "disable 'snappyTest2'" | hbase shell
$ echo "desc 'snappyTest2'" | hbase shell
$ echo "alter 'snappyTest2', {NAME => 'f', COMPRESSION => 'SNAPPY'}" | hbase shell
$ echo "enable 'snappyTest2'" | hbase shell
$ echo "major_compact 'snappyTest2'" | hbase shell
With that, snappy is essentially installed and working on the hadoop cluster.
In a quick test today the compression ratio came out around 25%. That's short of gzip and similar tools, but given snappy's CPU efficiency and its compression/decompression speed, it's a satisfying ratio; I'm content with it at least.
Related software versions:
jdk-6u32-linux-x64.bin
tar xzvf hadoop-1.0.2.tar.gz
tar xzvf hbase-0.92.1-security.tar.gz
Note: newer releases exist by now; using a newer hadoop and hbase is recommended.
Environment: SUSE 10, 64-bit machines
0. Set the hostname of each cluster machine;
1. Create a hadoop user, e.g. hadoop;
2. Configure ssh (the simple way: generate an ssh key pair on one machine, then copy the authorization file to the other machines; a sketch follows);
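A minimal sketch of that ssh setup, run as the hadoop user on the master; node1/node2 are hypothetical hostnames:
#! /bin/bash
# Generate a passwordless key pair on the master.
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
# Distribute the authorization file to the other nodes.
for node in node1 node2; do
  scp ~/.ssh/authorized_keys "$node":~/.ssh/
done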
3. Install java, hadoop, and hbase.
4. Configure hadoop: core-site.xml, hdfs-site.xml, mapred-site.xml, taskcontroller.cfg, masters, slaves, hadoop-env.sh, /etc/hosts,
environment variables (/etc/profile):
# hadoop env
export JAVA_HOME=/usr/share/java
export JRE_HOME=$JAVA_HOME/jre
export HADOOP_CONF=/home/oicq/hadoop/conf
export HADOOP_HOME=/home/oicq/hadoop/hadoop
export PATH=$PATH:/usr/share/java/bin:/home/oicq/hadoop/hadoop/bin:/home/oicq/hadoop/hbase/bin
5. Install snappy:
#! /bin/bash
tar xzvf snappy-1.0.5.tar.gz
cd snappy-1.0.5
./configure
make install
cp .libs/libsnappy.* ../hadoop/lib/native/Linux-amd64-64/
6. Build the native library (needed only because the stock packages don't support SUSE)
Switch to the ./hadoop/src/native directory and run the following script:
#! /bin/bash
export JAVA_HOME=/usr/share/java
export HADOOP_NATIVE_SRCDIR=/home/oicq/hadoop/hadoop/src/native
export JVM_DATA_MODEL=64
export OS_NAME=Linux
export OS_ARCH=amd64
chmod 755 configure
./configure CFLAGS="-DHADOOP_SNAPPY_LIBRARY"
touch src/org/apache/hadoop/io/compress/snappy/org_apache_hadoop_io_compress_snappy_SnappyCompressor.h
touch src/org/apache/hadoop/io/compress/snappy/org_apache_hadoop_io_compress_snappy_SnappyDecompressor.h
touch src/org/apache/hadoop/io/compress/zlib/org_apache_hadoop_io_compress_zlib_ZlibCompressor.h
touch src/org/apache/hadoop/io/compress/zlib/org_apache_hadoop_io_compress_zlib_ZlibDecompressor.h
touch src/org/apache/hadoop/security/org_apache_hadoop_security_JniBasedUnixGroupsMapping.h
touch src/org/apache/hadoop/security/org_apache_hadoop_security_JniBasedUnixGroupsNetgroupMapping.h
touch src/org/apache/hadoop/security/org_apache_hadoop_io_nativeio_NativeIO.h
make clean
make
cp ./.libs/libhadoop.* /lib/native/Linux-amd64-64/
Note: hadoop has shipped the snappy integration hooks since 0.92, but the default native-library build leaves them off; they must be enabled explicitly. The ./configure CFLAGS="-DHADOOP_SNAPPY_LIBRARY" step above is crucial.
7. Configure hbase: hbase-env.sh, hbase-site.xml, regionservers;
8. (Optional) Enable hadoop-metrics.properties.
The steps above are the local compile-and-install route, for when 1) online installation is impossible, or 2) the OS is not a hadoop-supported system.
Introduction to the HBase Export tool
The Export tool dumps an hbase table's data to hdfs:
hbase org.apache.hadoop.hbase.mapreduce.Export
Below, table sunwg01 serves as the test:
hbase org.apache.hadoop.hbase.mapreduce.Export sunwg01 /test/sunwg01
This exports table sunwg01's data to /test/sunwg01.
[hadoop@sunwg ~]$ hadoop fs -ls /test/sunwg01
Found 3 items
drwxr-xr-x   - hadoop supergroup          0 /test/sunwg01/_logs
drwxr-xr-x   - hadoop supergroup          0 /test/sunwg01/_temporary
-rw-r--r--   1 hadoop supergroup        318 /test/sunwg01/part-m-00000
View the file contents; since the file is in SequenceFile format, use -text:
[hadoop@sunwg ~]$ hadoop fs -text /test/sunwg01/part-m-00000
12/04/05 09:33:59 WARN util.NativeCodeLoader: Unable to load
native-hadoop library for your platform… using builtin-java classes
where applicable
12/04/05 09:33:59 WARN snappy.LoadSnappy: Snappy native library not loaded
72 31 keyvalues={r1/f1:k1/4/Put/vlen=3}
72 32 keyvalues={r2/f1:k1/7/Put/vlen=3}
72 33 keyvalues={r3/f1:k1/0/Put/vlen=3}
72 34 keyvalues={r4/f1:k1/9/Put/vlen=3}
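For the reverse direction, the companion Import tool reads such an export back into a table. A sketch; the target table (sunwg02 here is a hypothetical name) must already exist with the same column families:
hbase org.apache.hadoop.hbase.mapreduce.Import sunwg02 /test/sunwg01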
1. Snappy installation environment:
gcc c++, autoconf, automake, libtool, Java 6 with JAVA_HOME set, Maven 3
gcc must be version 4.4.
2. Download Snappy 1.1.1
3. Build Snappy
./configure
make install
4. Build Hadoop Snappy from source
svn checkout http://hadoop-/svn/trunk hadoop-snappy
mvn package [-Dsnappy.prefix=SNAPPY_INSTALLATION_DIR]
Note: if snappy was installed to the default prefix in step 3 (i.e. /usr/local/lib), the [-Dsnappy.prefix=SNAPPY_INSTALLATION_DIR] part can be omitted, or written as -Dsnappy.prefix=/usr/local/
5. Copy the files
Unpack hadoop-snappy-0.0.1-SNAPSHOT.tar.gz from target/ produced in step 4, then copy the lib files:
cp -r /home/hadoopuser/snappy-hadoop/target/hadoop-snappy-0.0.1-SNAPSHOT/lib/native/Linux-amd64-64/* $HADOOP_HOME/lib/native/Linux-amd64-64/
Also copy hadoop-snappy-0.0.1-SNAPSHOT.jar from target/ into $HADOOP_HOME/lib, and set:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HADOOP_HOME/lib/native/Linux-amd64-64/:/usr/local/lib/
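To make this setting survive daemon restarts, the same export is commonly appended to hadoop-env.sh as well (the conf path below assumes the usual $HADOOP_HOME/conf layout):
echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HADOOP_HOME/lib/native/Linux-amd64-64/:/usr/local/lib/' >> $HADOOP_HOME/conf/hadoop-env.sh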
6. Configure mapred-site.xml. The compression-related options in this file are:
<property>
  <name>mapred.output.compress</name>
  <value>false</value>
  <description>Should the job outputs be compressed?
  </description>
</property>
<property>
  <name>mapred.output.compression.type</name>
  <value>RECORD</value>
  <description>If the job outputs are to compressed as SequenceFiles, how should
  they be compressed? Should be one of NONE, RECORD or BLOCK.
  </description>
</property>
<property>
  <name>mapred.output.compression.codec</name>
  <value>org.apache.hadoop.io.compress.DefaultCodec</value>
  <description>If the job outputs are compressed, how should they be compressed?
  </description>
</property>
<property>
  <name>mapred.compress.map.output</name>
  <value>false</value>
  <description>Should the outputs of the maps be compressed before being
  sent across the network. Uses SequenceFile compression.
  </description>
</property>
<property>
  <name>mapred.map.output.compression.codec</name>
  <value>org.apache.hadoop.io.compress.DefaultCodec</value>
  <description>If the map outputs are compressed, how should they be
  compressed?
  </description>
</property>
Configure whichever of these you need. For easy verification, we configure only the map part:
<property>
  <name>mapred.compress.map.output</name>
  <value>true</value>
</property>
<property>
  <name>mapred.map.output.compression.codec</name>
  <value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
Restart hadoop. To verify, upload a text file with some words to hdfs and run the wordcount example; if the map phase completes 100%, the hadoop snappy install succeeded. Hadoop doesn't provide a CompressionTest class the way HBase does (or at least I didn't find one), so this is the available test. The HBase Snappy configuration is detailed next.
bin/hadoop jar hadoop-examples-1.2.1.jar wordcount /input /output1
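To double-check that the maps really used snappy, one option is to grep the task logs for the native-library load message (the same line that appears in the CompressionTest output below); the log location assumes the default $HADOOP_HOME/logs layout and may differ on your setup:
grep -r "Snappy native library loaded" $HADOOP_HOME/logs/userlogs/ | head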
7. Configure HBase for Snappy
Copy the files:
cp -r $HADOOP_HOME/lib/native/Linux-amd64-64/ $HBASE_HOME/lib/native/
Configure hbase-env.sh:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HADOOP_HOME/lib/native/Linux-amd64-64/:/usr/local/lib/
export HBASE_LIBRARY_PATH=$HBASE_LIBRARY_PATH:$HBASE_HOME/lib/native/Linux-amd64-64/:/usr/local/lib/
Restart HBase.
Verify the installation:
hbase org.apache.hadoop.hbase.util.CompressionTest hdfs://10.200.102.239:9000/output1/part-r-00000 snappy
The command's output:
14/01/15 16:41:55 INFO util.ChecksumType: Checksum can use java.util.zip.CRC32
14/01/15 16:41:55 DEBUG util.FSUtils: Creating file=hdfs://10.200.102.239:9000/output1/part-r-00000 with permission=rwxrwxrwx
14/01/15 16:41:55 INFO util.FSUtils: FileSystem doesn't support getDefaultReplication
14/01/15 16:41:55 INFO util.FSUtils: FileSystem doesn't support getDefaultBlockSize
14/01/15 16:41:55 ERROR metrics.SchemaMetrics: Inconsistent configuration. Previous configuration for using table name in metrics: true, new configuration: false
14/01/15 16:41:55 WARN snappy.LoadSnappy: Snappy native library is available
14/01/15 16:41:55 INFO util.NativeCodeLoader: Loaded the native-hadoop library
14/01/15 16:41:55 INFO snappy.LoadSnappy: Snappy native library loaded
14/01/15 16:41:55 INFO compress.CodecPool: Got brand-new compressor
14/01/15 16:41:55 DEBUG hfile.HFileWriterV2: Initialized with CacheConfig:disabled
14/01/15 16:41:56 INFO compress.CodecPool: Got brand-new decompressor
Next, create and operate on a table with Snappy compression:
[hadoopuser@RDCMaster ~]# hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.94.2, r1395367, Sun Oct  7 19:11:01 UTC 2012
hbase(main):001:0> create 'tsnappy', { NAME => 'f', COMPRESSION => 'snappy'}
0 row(s) in 10.6590 seconds
// describe the table
hbase(main):002:0> describe 'tsnappy'
DESCRIPTION                                                                           ENABLED
 {NAME => 'tsnappy', FAMILIES => [{NAME => 'f', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'NONE', REPLICATION_ true
 SCOPE => '0', VERSIONS => '3', COMPRESSION => 'SNAPPY', MIN_VERSIONS => '0', TTL => '', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', IN_MEMORY => 'false', ENCODE_ON_DISK => 'true', BLOCKCACHE => 'true'}]}
1 row(s) in 0.2140 seconds
hbase(main):003:0> put 'tsnappy', 'row1', 'f:col1', 'value'
0 row(s) in 0.5190 seconds
// scan the data
hbase(main):004:0> scan 'tsnappy'
ROW                              COLUMN+CELL
 row1                            column=f:col1, timestamp=0, value=value
1 row(s) in 0.0860 seconds
hbase(main):005:0>
All of the steps above executed successfully, which means Snappy is configured and working on both Hadoop and HBase.