Big Data Development

Java, April 24, 2020

Fixing a Hadoop DataNode that fails to start (java.net.BindException: Address already in use)

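After being started, the DataNode service shut down again almost immediately. No DataNode log could be found on the operating system (possibly the log file was deleted automatically after the failed start), but fortunately the error log can be viewed in the management UI:

Opening the error entry shows the following:

The DataNode port is 50010, yet netstat -ntulp | grep 50010 shows nothing on that port. (The -l flag restricts the output to listening sockets, so a leftover non-listening connection on the port would not be listed.)

Analysis: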

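Cause: after an application crashes, it can leave a lingering socket behind. To reuse that socket early, the SO_REUSEADDR flag has to be set on the socket when binding to it, and the HDFS DataNode does not do that. The workaround is to bind to the port with an application that does set SO_REUSEADDR and then stop that application. The netcat tool can be used for this.

Solution: install the nc tool, use nc to occupy port 50010, then stop nc. After that, the DataNode started normally.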
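The bind-then-release step can also be sketched in Java instead of netcat. This is only an illustration under stated assumptions: the class name ReuseAddrBinder and the method bindAndRelease are hypothetical, and whether the trick frees the port still depends on what is holding it.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Hypothetical helper, not part of Hadoop: binds to a port with
// SO_REUSEADDR set and releases it again.
final class ReuseAddrBinder {

    /** Returns true if the port could be bound with SO_REUSEADDR set. */
    static boolean bindAndRelease(int port) {
        // Create the socket unbound so the option can be set before bind().
        try (ServerSocket socket = new ServerSocket()) {
            socket.setReuseAddress(true);
            socket.bind(new InetSocketAddress(port));
            return true; // try-with-resources closes the socket right away
        } catch (IOException e) {
            // Usually a java.net.BindException when the port is still held.
            return false;
        }
    }

    public static void main(String[] args) {
        // 50010 is the DataNode port from the post; pass another port as an argument.
        int port = args.length > 0 ? Integer.parseInt(args[0]) : 50010;
        System.out.println("bind to " + port + (bindAndRelease(port) ? " succeeded" : " failed"));
    }
}
```

If the bind succeeds, the port has been claimed and released once, which is the same sequence as starting and stopping nc.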

Reference: http://www.nosql.se/2013/10/hadoop-hdfs-datanode-java-net-bindexception-address-already-in-use/

Quoted text:

    After an application crashes it might leave a lingering socket, so to reuse that socket early you need to set the socket flag SO_REUSEADDR when attempting to bind to it to be allowed to reuse it. The HDFS datanode doesn’t do that, and I didn’t want to restart the HBase regionserver (which was locking the socket with a connection it hadn’t realized was dead). The solution was to bind to the port with an application that sets SO_REUSEADDR and then stop that application, I used netcat for that:

    # nc -l 50010

2017-02-17 20:54:52,250 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Shutdown complete.
2017-02-17 20:54:52,251 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in secureMain
java.net.BindException: Address already in use
	at sun.nio.ch.Net.bind0(Native Method)
	at sun.nio.ch.Net.bind(Net.java:444)
	at sun.nio.ch.Net.bind(Net.java:436)
	at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214)
	at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
	at com.cloudera.io.netty.channel.socket.nio.NioServerSocketChannel.doBind(NioServerSocketChannel.java:125)
	at com.cloudera.io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:475)
	at com.cloudera.io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:1021)
	at com.cloudera.io.netty.channel.AbstractChannelHandlerContext.invokeBind(AbstractChannelHandlerContext.java:455)
	at com.cloudera.io.netty.channel.AbstractChannelHandlerContext.bind(AbstractChannelHandlerContext.java:440)
	at com.cloudera.io.netty.channel.DefaultChannelPipeline.bind(DefaultChannelPipeline.java:844)
	at com.cloudera.io.netty.channel.AbstractChannel.bind(AbstractChannel.java:194)
	at com.cloudera.io.netty.bootstrap.AbstractBootstrap$2.run(AbstractBootstrap.java:340)
	at com.cloudera.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:380)
	at com.cloudera.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
	at com.cloudera.io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
	at com.cloudera.io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
	at java.lang.Thread.run(Thread.java:745)
2017-02-17 20:54:52,262 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
2017-02-17 20:54:52,264 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at cdh1/192.168.5.78

Author: east

Java, March 2, 2020

Spring Boot utility for exporting Excel files

try {
    // Write the workbook into an in-memory buffer and take its bytes
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    workbook.write(out);
    byte[] content = out.toByteArray();

    // reset() clears any buffered data, status code and headers already set on the response
    response.reset();
    // CORS headers (not JSONP) so that pages from other origins can trigger the download.
    // Note: browsers reject "*" as the allowed origin when credentials are enabled.
    response.addHeader("Access-Control-Allow-Origin", "*");
    response.addHeader("Access-Control-Allow-Methods", "GET,POST,PUT,DELETE,OPTIONS");
    response.addHeader("Access-Control-Allow-Headers", "WWW-Authenticate,Authorization,Set-Cookie," +
            "X-Requested-With,Accept-Version,Content-Length,Content-Type,Date,X-Api-Version,name");
    response.addHeader("Access-Control-Allow-Credentials", "true");
    // Alternative generic content type:
    // response.setContentType("application/octet-stream");

    // An ASCII file name needs no re-encoding; a Chinese file name is re-encoded
    // (gb2312 bytes read back as iso-8859-1) so that it is not garbled in the header
    response.setHeader("Content-Disposition", "attachment;filename=" +
            new String((fileName + ".xls").getBytes("gb2312"), "iso-8859-1"));
    response.setHeader("Content-Length", "" + content.length);
    response.setContentType("application/vnd.ms-excel;charset=UTF-8");

    // The content is already in memory, so write it to the response in one call
    ServletOutputStream outputStream = response.getOutputStream();
    outputStream.write(content);
    outputStream.flush();
    outputStream.close();
} catch (IOException ex) {
    ex.printStackTrace();
}
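The file name handling above depends on the gb2312/iso-8859-1 round trip. A different technique, shown here only as a hedged sketch, is the filename* parameter from RFC 5987/6266 with a percent-encoded UTF-8 name. The class DownloadHeaders and the method contentDisposition are hypothetical names, and support for filename* depends on the browser.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper: builds a Content-Disposition value using the
// RFC 5987 "filename*" form instead of re-encoding to iso-8859-1.
final class DownloadHeaders {

    static String contentDisposition(String baseName) {
        try {
            String encoded = URLEncoder.encode(baseName + ".xls", "UTF-8")
                    // URLEncoder does form encoding, where a space becomes "+";
                    // a header parameter needs "%20" instead. A literal "+" in
                    // the name has already become "%2B", so this only hits spaces.
                    .replace("+", "%20");
            return "attachment; filename*=UTF-8''" + encoded;
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is available on every JVM, so this is not expected to happen.
            throw new IllegalStateException(e);
        }
    }
}
```

Usage would be response.setHeader("Content-Disposition", DownloadHeaders.contentDisposition(fileName)); in place of the two-line setHeader call in the snippet.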

Author: east

spring, February 6, 2020

Kelly formula source code

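The Kelly formula is:

position = (odds * pwin - q) / odds

  odds = payoff odds (odds = expected profit ÷ possible loss; a $2 profit against a $1 loss gives odds of 2)

  pwin = probability of winning (in a coin toss, heads and tails each have a 50% probability)

  q = probability of losing (that is, 1 - pwin; also 50% in the coin-toss bet)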
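As a worked example with the numbers used above (odds = 2, pwin = 0.5, so q = 0.5), and taking the payoff odds as the denominator, which is what the kelly method in the source code divides by, the formula evaluates as follows. The symbol f for the position is a label introduced here.

```latex
f = \frac{\text{odds} \cdot p_{\text{win}} - q}{\text{odds}}
  = \frac{2 \times 0.5 - 0.5}{2}
  = 0.25
```

So for a fair coin with 2-to-1 odds, the formula suggests staking 25% of capital.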


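public class KellyUtil {

    /**
     * Kelly formula: (odds * pwin - q) / odds, with q = 1 - pwin.
     * @param pwin  probability of winning
     * @param odds  payoff odds
     * @return fraction of capital to stake
     */
    public static double kelly(double pwin, double odds) {
        return (odds * pwin + pwin - 1) / odds;
    }

    /**
     * Kelly formula divided by a loss rate.
     * @param pwin      probability of winning
     * @param odds      payoff odds
     * @param lossRate  presumably the fraction of the stake lost on a losing bet
     *                  (the original leaves this parameter undocumented)
     * @return fraction of capital to stake
     */
    public static double kellyV2(double pwin, double odds, double lossRate) {
        return (odds * pwin + pwin - 1) / (odds * lossRate);
    }

    /**
     * Buffett-style position sizing: the Kelly formula with odds fixed at 1.
     * @param pwin  probability of winning
     * @return fraction of capital to stake
     */
    public static double buffett(double pwin) {
        return 2 * pwin - 1;
    }

    public static void main(String[] args) {
        double position = kelly(0.5, 3.0);
        // double position = kellyV2(0.5, 3.0, 1.5);
        System.out.println("Position: " + position * 100 + "%");
    }
}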

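Author: east

spring, January 14, 2020

A clean way to add local libs dependencies in Maven

1. First create a libs folder and put the external dependency jars in it.

2. Reference the external jar in the pom file: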

<dependency>
    <groupId>org.codehaus.stax2</groupId>
    <artifactId>stax2</artifactId>
    <version>3.1.4</version>
    <scope>system</scope>
    <systemPath>${project.basedir}/libs/stax2-api-3.1.4.jar</systemPath>
</dependency>

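The groupId, artifactId and version can be chosen freely. The scope is system, which is similar to provided except that the jar is supplied locally. This approach has one drawback: because the scope is system, the jar is available only at compile time and is not packaged when the project is built into a jar or war (for example during install).

To fix this, set the includeSystemScope parameter on the Spring Boot packaging plugin in the pom: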

<build>
    <plugins>
        <plugin>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-maven-plugin</artifactId>
            <configuration>
                <includeSystemScope>true</includeSystemScope>
            </configuration>
        </plugin>
    </plugins>
</build>

Author: east

Hbase, January 1, 2020

A utility class for common HBase operations


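import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HbaseUtil {

    private static Configuration conf = null;

    static {
        setConf();
    }

    // Load the HBase settings from <working directory>/conf/conf.xml on top of the defaults
    private static void setConf() {
        conf = HBaseConfiguration.create();
        String userDir = System.getProperty("user.dir") + File.separator + "conf" + File.separator;
        Path hconf_path = new Path(userDir + "conf.xml");
        conf.addResource(hconf_path);
    }

    public static Connection getConn() throws IOException {
        return ConnectionFactory.createConnection(conf);
    }

    /**
     * Closes the scanner, the table and the connection, in that order.
     * The connection is closed even if closing the table fails.
     */
    private static void closeSource(Table table, Connection conn, ResultScanner scanner) {
        try {
            if (scanner != null) scanner.close();
            if (table != null) table.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
        try {
            if (conn != null) conn.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    /**
     * Track query: scans a table by MAC address, start time and end time.
     * The rows are read into a list before the resources are closed; the original
     * version returned the ResultScanner after closing it in the finally block,
     * so the caller received a scanner that could no longer be read.
     * @param tableName  table to scan
     * @param mac        MAC address, used as the rowkey prefix
     * @param startTime  start of the range (inclusive)
     * @param endTime    end of the range (exclusive)
     * @return the matching rows
     * @throws IOException if the scan fails
     */
    public static List<Result> scan(String tableName, String mac, long startTime, long endTime) throws IOException {
        Connection conn = null;
        Table table = null;
        ResultScanner scanner = null;
        try {
            conn = HbaseUtil.getConn();
            table = conn.getTable(TableName.valueOf(tableName));
            Scan scan = new Scan();

            // Rowkeys are the MAC address followed by the timestamp
            byte[] startRow = Bytes.toBytes(mac + startTime);
            byte[] endRow = Bytes.toBytes(mac + endTime);

            // On HBase 2.x, withStartRow/withStopRow replace these deprecated setters
            scan.setStartRow(startRow);
            scan.setStopRow(endRow);

            scanner = table.getScanner(scan);
            List<Result> results = new ArrayList<>();
            for (Result result : scanner) {
                results.add(result);
            }
            return results;
        } finally {
            closeSource(table, conn, scanner);
        }
    }
}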
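The scan above builds its start and stop rows by concatenating the MAC address with the timestamp as decimal text. HBase compares rowkeys byte by byte, so such keys sort in time order only when all timestamps have the same number of digits. The sketch below shows one way to guarantee that with zero padding. The class RowKeys and the method build are hypothetical, and the padded format only helps if the rows were written with the same format.

```java
import java.util.Locale;

// Hypothetical helper: builds "mac + zero-padded timestamp" rowkeys so that
// lexicographic order matches numeric order of the timestamps.
final class RowKeys {

    /** Number of decimal digits in Long.MAX_VALUE. */
    private static final int WIDTH = 19;

    static String build(String mac, long timestamp) {
        if (timestamp < 0) {
            throw new IllegalArgumentException("timestamp must not be negative");
        }
        // Locale.ROOT keeps the digits ASCII regardless of the default locale.
        return mac + String.format(Locale.ROOT, "%0" + WIDTH + "d", timestamp);
    }
}
```

Without padding, the key for timestamp 999 sorts after the key for 1000 because '9' is greater than '1'. With padding the order is the numeric one.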

Author: east

Hbase, January 1, 2020

How to use RowFilter, the HBase rowkey filter

RowFilter filters rows by rowkey. The comparison operators are:

Operator	Description
LESS	less than
LESS_OR_EQUAL	less than or equal to
EQUAL	equal to
NOT_EQUAL	not equal to
GREATER_OR_EQUAL	greater than or equal to
GREATER	greater than
NO_OP	excludes everything

Comparator	Description
BinaryComparator	Compares using Bytes.compareTo()
BinaryPrefixComparator	Similar to BinaryComparator, but compares only the leading bytes (the prefix)
NullComparator	Does not compare against an actual value but whether a given one is null, or not null.
BitComparator	Performs a bitwise comparison, providing a BitwiseOp class with AND, OR, and XOR operators.
RegexStringComparator	Matches against a regular expression
SubstringComparator	Treats the data as a string and tests it with contains()
Select rows whose rowkey ends with 01:
Filter filter = new RowFilter(CompareFilter.CompareOp.EQUAL, new RegexStringComparator(".*01$"));

Select rows whose rowkey contains 201407:
Filter filter = new RowFilter(CompareFilter.CompareOp.EQUAL, new SubstringComparator("201407"));

Select rows whose rowkey starts with 123:
Filter filter = new RowFilter(CompareFilter.CompareOp.EQUAL, new BinaryPrefixComparator("123".getBytes()));
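To get a feel for which rowkeys each of the three filters would select, the matching can be approximated in plain Java without a cluster. This is an emulation with String and regex methods, not the HBase API; the real comparators work on bytes on the region server. The class RowKeyMatchDemo, its methods and the sample keys are all made up for illustration.

```java
import java.util.regex.Pattern;

// Plain-Java approximation of the three comparators used above.
// Nothing here calls HBase; the names are hypothetical.
final class RowKeyMatchDemo {

    /** Approximates RegexStringComparator: the pattern is searched for in the key. */
    static boolean regexMatch(String rowKey, String regex) {
        return Pattern.compile(regex).matcher(rowKey).find();
    }

    /** Approximates SubstringComparator: a case-insensitive contains() test. */
    static boolean substringMatch(String rowKey, String part) {
        return rowKey.toLowerCase().contains(part.toLowerCase());
    }

    /** Approximates BinaryPrefixComparator with EQUAL: the key starts with the prefix. */
    static boolean prefixMatch(String rowKey, String prefix) {
        return rowKey.startsWith(prefix);
    }

    public static void main(String[] args) {
        String[] sampleKeys = {"1232014070101", "4562014080102", "1239999999901"};
        for (String key : sampleKeys) {
            System.out.println(key
                    + " endsWith01=" + regexMatch(key, ".*01$")
                    + " contains201407=" + substringMatch(key, "201407")
                    + " startsWith123=" + prefixMatch(key, "123"));
        }
    }
}
```

In a real scan the chosen filter would be attached with scan.setFilter(filter) before calling table.getScanner(scan).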

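Author: east

Big Data Development, July 15, 2019

Roundup of open-source big data projects, 2019

Telecom big data project
Uses call records to show how to process and analyze big data, with the results finally presented as charts.

GitHub: https://github.com/LittleLawson/ChinaTelecom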

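Spark-based movie recommendation system

Similar to the Douban site in China: users can browse and search movie information on the project's movie site, and the site gives real-time movie recommendations (Spark) based on each user's browsing history, comments, and likes.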

https://github.com/LuckyZXL2016/Movie_Recommend

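Hands-on big data project: real-time statistical analysis of news topics

A complete hands-on big data project that analyzes the topics users search for, both in real time and offline, and presents the results in a polished front-end interface. Frameworks used: Flume + Kafka + HBase + Hive + Spark (SQL, Structured Streaming) + MySQL + SpringMVC + MyBatis + WebSocket + AngularJS + ECharts.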

https://github.com/LuckyZXL2016/Movie_Recommend

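Commercial big data analysis based on WiFi probes

A WiFi probe is a sniffer that records nearby MAC addresses. The collected MAC addresses can be analyzed to obtain information such as nearby foot traffic, the number of people entering a store, and dwell time.
This system is built around Spark + Hadoop and implements a big data analysis system based on WiFi probes.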

https://github.com/wanghan0501/WiFiProbeAnalysis

Author: east
