java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z Reported When Using Snappy Compression
Symptom
When an application uses Snappy compression, an UnsatisfiedLinkError is reported, as shown below:
14/09/18 20:59:50 WARN TaskSetManager: Lost task 0.0 in stage 1.0 (TID 0, vm-183): java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z
        org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy(Native Method)
        org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:63)
        org.apache.hadoop.io.compress.SnappyCodec.getDecompressorType(SnappyCodec.java:190)
        org.apache.hadoop.io.compress.CodecPool.getDecompressor(CodecPool.java:176)
        org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1915)
        org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1810)
        org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1759)
        com.spark.common.format.ProtobufFileInputFormat$ProtobufSequenceFileRecordReader.initialize(ProtobufFileInputFormat.java:76)
        org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:117)
        org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:103)
        org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:65)
        org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
        org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
        org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:86)
        ...
Or the following error is reported:
Error: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 7, node3): java.lang.RuntimeException: native snappy library not available: this version of libhadoop was built without snappy support.
        at org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:65)
        at org.apache.hadoop.io.compress.SnappyCodec.getDecompressorType(SnappyCodec.java:193)
        at org.apache.hadoop.io.compress.CodecPool.getDecompressor(CodecPool.java:178)
        at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1918)
        at org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1813)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1762)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1776)
        at org.apache.hadoop.mapred.SequenceFileRecordReader.<init>(SequenceFileRecordReader.java:49)
        at org.apache.hadoop.mapred.SequenceFileInputFormat.getRecordReader(SequenceFileInputFormat.java:64)
        at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:239)
        at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:216)
        at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:101)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
        ...
Possible Cause
When Spark uses Snappy, it checks whether a native implementation is available to call; in this case the check finds none.
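The failed check can be reproduced outside Spark. The following is a minimal sketch (a hypothetical standalone check, not part of the product) that calls the same org.apache.hadoop.util.NativeCodeLoader methods that appear in the stack trace; it only needs hadoop-common on the classpath:

import org.apache.hadoop.util.NativeCodeLoader;

public class SnappyNativeCheck {
    public static void main(String[] args) {
        // True only if libhadoop was found on java.library.path and loaded.
        boolean hadoopLoaded = NativeCodeLoader.isNativeCodeLoaded();
        System.out.println("libhadoop loaded: " + hadoopLoaded);
        if (hadoopLoaded) {
            // The native method named in the error message. Calling it when
            // libhadoop is not loaded throws the same UnsatisfiedLinkError.
            System.out.println("snappy built in: " + NativeCodeLoader.buildSupportsSnappy());
        }
    }
}

Running it with -Djava.library.path=$HADOOP_HOME/lib/native should print true for both values on a correctly configured node.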
Troubleshooting Approach
Spark depends on data stored in HDFS and on YARN for computation. Since the application uses Snappy compression, the most likely cause is that the native Snappy compression code cannot be found.
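A quick way to confirm this on a specific node is the hadoop checknative command (assuming the Hadoop client is installed on that node), which reports whether libhadoop was loaded and whether Snappy support was compiled into it:

hadoop checknative -a

If the snappy entry is reported as false, apply the procedure below.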
Procedure
- Open the "$Spark_Client/conf/spark-defaults.conf" configuration file on the Spark client.
- Add the following properties, pointing the executor and driver library paths at the directory that contains the Hadoop native libraries:
spark.executor.extraLibraryPath=$HADOOP_HOME/lib/native
spark.yarn.cluster.driver.extraLibraryPath=$HADOOP_HOME/lib/native
spark.driver.extraLibraryPath=$HADOOP_HOME/lib/native
Note:
- Replace $HADOOP_HOME/lib/native with the actual path of the native libraries in your environment.
- Alternatively, you can add -Djava.library.path=$HADOOP_HOME/lib/native to the spark.executor.extraJavaOptions and spark.driver.extraJavaOptions parameters; in that case, a space is required after the equals sign of those parameters.
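If editing spark-defaults.conf is not desirable, the same library path can also be supplied per application on the spark-submit command line. A sketch, assuming the standard spark-submit options (the application class and JAR names are placeholders):

spark-submit \
  --conf spark.executor.extraLibraryPath=$HADOOP_HOME/lib/native \
  --conf spark.driver.extraLibraryPath=$HADOOP_HOME/lib/native \
  --class com.example.YourApp \
  yourapp.jar

After the configuration takes effect, rerun the failing job; reading the Snappy-compressed files should no longer report the errors above.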
Reference
None.