java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z when using Snappy compression

Symptom

When an application uses Snappy compression, an UnsatisfiedLinkError is reported, as shown below:

14/09/18 20:59:50 WARN TaskSetManager: Lost task 0.0 in stage 1.0 (TID 0, vm-183): java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z
org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy(Native Method)
org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:63)
org.apache.hadoop.io.compress.SnappyCodec.getDecompressorType(SnappyCodec.java:190)
org.apache.hadoop.io.compress.CodecPool.getDecompressor(CodecPool.java:176)
org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1915)
org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1810)
org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1759)
com.spark.common.format.ProtobufFileInputFormat$ProtobufSequenceFileRecordReader.initialize(ProtobufFileInputFormat.java:76)
org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:117)
org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:103)
org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:65)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:86)
.........

Alternatively, the following error may be reported:

Error: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, 
most recent failure: Lost task 0.3 in stage 1.0 (TID 7, node3): java.lang.RuntimeException: native snappy library not available: this version of libhadoop was built without snappy support.
         at org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:65)
         at org.apache.hadoop.io.compress.SnappyCodec.getDecompressorType(SnappyCodec.java:193)
         at org.apache.hadoop.io.compress.CodecPool.getDecompressor(CodecPool.java:178)
         at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1918)
         at org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1813)
         at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1762)
         at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1776)
         at org.apache.hadoop.mapred.SequenceFileRecordReader.<init>(SequenceFileRecordReader.java:49)
         at org.apache.hadoop.mapred.SequenceFileInputFormat.getRecordReader(SequenceFileInputFormat.java:64)
         at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:239)
         at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:216)
         at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:101)
         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
         at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
         at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
.........                 

Possible Cause

When Spark uses the Snappy codec, Hadoop checks whether a native Snappy implementation is available to call; in this case, none is found.

Troubleshooting Approach

Spark reads its data from HDFS and relies on YARN for computation. Since the application uses Snappy compression, the error most likely means that the native Snappy library cannot be found at runtime.
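
A quick way to confirm this on a cluster node is Hadoop's native-library check; the library path in the sample output below is only illustrative and depends on the environment:

    # List the native libraries that this libhadoop build can load
    hadoop checknative -a
    # A healthy environment prints a line similar to:
    #   snappy:  true /usr/lib/hadoop/lib/native/libsnappy.so.1
    # "snappy: false" means the native Snappy library cannot be loaded,
    # which matches the errors shown above.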

Procedure

  1. On the Spark client, open the "$Spark_Client/conf/spark-defaults.conf" configuration file.
  2. Add -Djava.library.path=$HADOOP_HOME/lib/native to the spark.executor.extraJavaOptions and spark.driver.extraJavaOptions parameters, for example:
     spark.executor.extraJavaOptions = -Djava.library.path=$HADOOP_HOME/lib/native
     spark.yarn.cluster.driver.extraJavaOptions = -Djava.library.path=$HADOOP_HOME/lib/native
     spark.driver.extraJavaOptions = -Djava.library.path=$HADOOP_HOME/lib/native
     The same option can also be supplied at submit time; see the sketch after this list.
     Notes:
    • The value of java.library.path must match the actual native library path in your environment.
    • A space is required after the equal sign of the spark.executor.extraJavaOptions and spark.driver.extraJavaOptions parameters.
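
If editing spark-defaults.conf is not convenient, the same java.library.path setting can be passed directly on the spark-submit command line. The application class and jar name below are placeholders for illustration; use your environment's actual native library path:

    # Pass java.library.path to both the driver and executor JVMs at submit time
    # (com.example.MyApp and myapp.jar are placeholders)
    spark-submit \
      --conf "spark.driver.extraJavaOptions=-Djava.library.path=$HADOOP_HOME/lib/native" \
      --conf "spark.executor.extraJavaOptions=-Djava.library.path=$HADOOP_HOME/lib/native" \
      --class com.example.MyApp myapp.jar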

Reference Information

None.
