Flink example: reporting female netizens whose online shopping time exceeds 2 hours


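Scenario

Suppose a user has log files that record how long netizens stayed on a shopping website over a weekend. The business requirement is to develop a Flink DataStream application that does the following:

Report, in real time, the female netizens whose total online shopping time exceeds 2 hours.

Note:

The DataStream application can run in both Windows and Linux environments.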
The log files for the two weekend days have three comma-separated columns: name, gender, and the length of the visit in minutes.

log1.txt: Saturday log.

LiuYang,female,20
YuanJing,male,10
GuoYijun,male,5
CaiXuyu,female,50
Liyuan,male,20
FangBo,female,50
LiuYang,female,20
YuanJing,male,10
GuoYijun,male,50
CaiXuyu,female,50
FangBo,female,60

log2.txt: Sunday log.

LiuYang,female,20
YuanJing,male,10
CaiXuyu,female,50
FangBo,female,50
GuoYijun,male,5
CaiXuyu,female,50
Liyuan,male,20
CaiXuyu,female,50
FangBo,female,50
LiuYang,female,20
YuanJing,male,10
FangBo,female,50
GuoYijun,male,50
CaiXuyu,female,50
FangBo,female,60
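The three-column format can be checked before a line enters the pipeline. The following is a minimal sketch in plain Java, with no Flink dependency; the class LogLineParser and its Entry type are hypothetical names introduced here for illustration and are not part of the sample project. It returns an empty Optional for malformed lines instead of relying on an assert.

```java
import java.util.Optional;

// Hypothetical helper (not part of the original sample): parses one
// "name,gender,minutes" line and rejects malformed input.
class LogLineParser {
    /** Immutable view of one parsed log line. */
    static final class Entry {
        final String name;
        final String gender;
        final int minutes;

        Entry(String name, String gender, int minutes) {
            this.name = name;
            this.gender = gender;
            this.minutes = minutes;
        }
    }

    /** Returns the parsed entry, or Optional.empty() if the line is malformed. */
    static Optional<Entry> parse(String line) {
        String[] parts = line.trim().split(",");
        if (parts.length != 3) {
            return Optional.empty();
        }
        try {
            int minutes = Integer.parseInt(parts[2].trim());
            return Optional.of(new Entry(parts[0].trim(), parts[1].trim(), minutes));
        } catch (NumberFormatException e) {
            // The third column is not an integer number of minutes
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        Entry e = parse("LiuYang,female,20").get();
        System.out.println(e.name + " " + e.gender + " " + e.minutes);
        System.out.println(parse("broken line").isPresent());
    }
}
```

Whether to drop or log malformed lines is a design choice; the sample code further down assumes every line is well formed.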

Data planning

The data for the DataStream sample project is stored in text files.

Place log1.txt and log2.txt under some path, for example "/opt/log1.txt" and "/opt/log2.txt".

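Development approach

Report the female netizens whose total online shopping time over the weekend, as recorded in the log files, exceeds 2 hours.

The work breaks down into four parts:

Read the text data into a DataStream and parse each line into a UserRecord.
Filter for the records of female netizens.
Key the stream by name and gender (keyBy) and sum each woman's time within a time window.
Filter for the users whose summed time exceeds the threshold and output the result.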
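The four parts above can be cross-checked without Flink as an ordinary batch computation over the two sample logs. This is a sketch for verifying the expected result, not the streaming implementation; the class WeekendShoppingTotals and the method femalesOver are hypothetical names, and the sketch ignores windowing entirely.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical batch cross-check of the four steps; not the Flink job itself.
class WeekendShoppingTotals {
    static final List<String> SATURDAY = Arrays.asList(
        "LiuYang,female,20", "YuanJing,male,10", "GuoYijun,male,5",
        "CaiXuyu,female,50", "Liyuan,male,20", "FangBo,female,50",
        "LiuYang,female,20", "YuanJing,male,10", "GuoYijun,male,50",
        "CaiXuyu,female,50", "FangBo,female,60");

    static final List<String> SUNDAY = Arrays.asList(
        "LiuYang,female,20", "YuanJing,male,10", "CaiXuyu,female,50",
        "FangBo,female,50", "GuoYijun,male,5", "CaiXuyu,female,50",
        "Liyuan,male,20", "CaiXuyu,female,50", "FangBo,female,50",
        "LiuYang,female,20", "YuanJing,male,10", "FangBo,female,50",
        "GuoYijun,male,50", "CaiXuyu,female,50", "FangBo,female,60");

    /** Returns name -> total minutes for female users above the threshold. */
    static Map<String, Integer> femalesOver(List<String> lines, int thresholdMinutes) {
        TreeMap<String, Integer> totals = lines.stream()
            .map(line -> line.split(","))                    // step 1: parse each line
            .filter(fields -> fields[1].equals("female"))    // step 2: keep female users
            .collect(Collectors.groupingBy(                  // step 3: sum minutes per name
                fields -> fields[0],
                TreeMap::new,
                Collectors.summingInt(fields -> Integer.parseInt(fields[2]))));
        // step 4: keep only totals strictly above the threshold
        totals.values().removeIf(total -> total <= thresholdMinutes);
        return totals;
    }

    public static void main(String[] args) {
        List<String> weekend = new ArrayList<>(SATURDAY);
        weekend.addAll(SUNDAY);
        System.out.println(femalesOver(weekend, 120)); // prints {CaiXuyu=300, FangBo=320}
    }
}
```

With both days combined, LiuYang totals 80 minutes and is filtered out. On Saturday's log alone no one passes the 120-minute threshold, because the highest total is FangBo's 110.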

Function description

Report the female netizens whose online shopping time exceeds 2 hours, and print the result directly.

Java version:

package com.huawei.bigdata.flink.examples;

import org.apache.flink.api.common.functions.FilterFunction;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.utils.ParameterTool;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.AssignerWithPunctuatedWatermarks;
import org.apache.flink.streaming.api.watermark.Watermark;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

// Arguments:
//   <filePath>   paths of the text files to read, separated by commas.
//   <windowTime> width of the aggregation window, in minutes.
// Note: this sample uses the legacy Flink 1.x time-characteristic and
// watermark-assigner APIs, which are deprecated in later releases.
public class FlinkStreamJavaExample {
    public static void main(String[] args) throws Exception {
        // Print a reference command for running the job with flink run
        System.out.println("use command as: ");
        System.out.println("./bin/flink run --class com.huawei.bigdata.flink.examples.FlinkStreamJavaExample /opt/test.jar --filePath /opt/log1.txt,/opt/log2.txt --windowTime 2");
        System.out.println("******************************************************************************************");
        System.out.println("<filePath> is for text file to read data, use comma to separate");
        System.out.println("<windowTime> is the width of the window, time as minutes");
        System.out.println("******************************************************************************************");

        // Read the file paths, separated by commas
        final String[] filePaths = ParameterTool.fromArgs(args).get("filePath", "/opt/log1.txt,/opt/log2.txt").split(",");
        assert filePaths.length > 0;

        // Window width in minutes; the default of 2 minutes per window is
        // long enough to read all the data in the sample files
        final int windowTime = ParameterTool.fromArgs(args).getInt("windowTime", 2);

        // Build the execution environment and process windows by event time
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
        env.setParallelism(1);

        // Read every file and union the resulting streams
        DataStream<String> unionStream = env.readTextFile(filePaths[0]);
        for (int i = 1; i < filePaths.length; i++) {
            unionStream = unionStream.union(env.readTextFile(filePaths[i]));
        }

        // Transformation pipeline: parse, filter, key, window, sum, filter, print
        unionStream.map(new MapFunction<String, UserRecord>() {
            @Override
            public UserRecord map(String value) throws Exception {
                return getRecord(value);
            }
        }).assignTimestampsAndWatermarks(
            new Record2TimestampExtractor()
        ).filter(new FilterFunction<UserRecord>() {
            @Override
            public boolean filter(UserRecord value) throws Exception {
                return "female".equals(value.sexy);
            }
        }).keyBy(
            new UserRecordSelector()
        ).window(
            TumblingEventTimeWindows.of(Time.minutes(windowTime))
        ).reduce(new ReduceFunction<UserRecord>() {
            @Override
            public UserRecord reduce(UserRecord value1, UserRecord value2) throws Exception {
                // Return a new record rather than mutating an input
                return new UserRecord(value1.name, value1.sexy, value1.shoppingTime + value2.shoppingTime);
            }
        }).filter(new FilterFunction<UserRecord>() {
            @Override
            public boolean filter(UserRecord value) throws Exception {
                // Keep totals above 2 hours (120 minutes)
                return value.shoppingTime > 120;
            }
        }).print();

        // Calling execute triggers the job
        env.execute("FemaleInfoCollectionPrint java");
    }

    // Key used by keyBy to group records: (name, gender)
    private static class UserRecordSelector implements KeySelector<UserRecord, Tuple2<String, String>> {
        @Override
        public Tuple2<String, String> getKey(UserRecord value) throws Exception {
            return Tuple2.of(value.name, value.sexy);
        }
    }

    // Parse one line of text into a UserRecord
    private static UserRecord getRecord(String line) {
        String[] elems = line.split(",");
        assert elems.length == 3;
        return new UserRecord(elems[0], elems[1], Integer.parseInt(elems[2]));
    }

    // UserRecord data structure; toString is overridden for printing
    public static class UserRecord {
        private String name;
        private String sexy;
        private int shoppingTime;

        public UserRecord(String n, String s, int t) {
            name = n;
            sexy = s;
            shoppingTime = t;
        }

        @Override
        public String toString() {
            return "name: " + name + "  sexy: " + sexy + "  shoppingTime: " + shoppingTime;
        }
    }

    // Implements AssignerWithPunctuatedWatermarks to set the event time and the watermark
    private static class Record2TimestampExtractor implements AssignerWithPunctuatedWatermarks<UserRecord> {
        // Tag each element with a timestamp. The log lines carry no timestamp
        // of their own, so the wall-clock time at which the element is read is used.
        @Override
        public long extractTimestamp(UserRecord element, long previousTimestamp) {
            return System.currentTimeMillis();
        }

        // Emit a watermark after every element; the window fires once the
        // watermark passes the end of the window
        @Override
        public Watermark checkAndGetNextWatermark(UserRecord element, long extractedTimestamp) {
            return new Watermark(extractedTimestamp - 1);
        }
    }
}
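Because the sample stamps each record with System.currentTimeMillis(), the window a record lands in depends on when it was read, not on anything in the log. The sketch below shows the start-of-window arithmetic for epoch-aligned tumbling windows with zero offset and non-negative timestamps, which is how such windows are understood to be aligned here. The class TumblingWindowAlignment is a hypothetical name and does not call Flink.

```java
// Hypothetical illustration of tumbling-window alignment; no Flink dependency.
class TumblingWindowAlignment {
    /** Start of the epoch-aligned tumbling window that contains the timestamp. */
    static long windowStart(long timestampMillis, long windowSizeMillis) {
        return timestampMillis - (timestampMillis % windowSizeMillis);
    }

    public static void main(String[] args) {
        long twoMinutes = 2 * 60 * 1000L;
        // Two records read one millisecond apart, on either side of a boundary
        long justBefore = 119_999L;
        long justAfter = 120_000L;
        System.out.println(windowStart(justBefore, twoMinutes)); // prints 0
        System.out.println(windowStart(justAfter, twoMinutes));  // prints 120000
    }
}
```

If a run happens to straddle such a boundary, a user's records are summed in two separate windows, and each partial sum is compared with the 120-minute threshold on its own. A larger windowTime makes this less likely but does not rule it out.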

Scala version:

package com.huawei.bigdata.flink.examples

import org.apache.flink.api.java.utils.ParameterTool
import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.functions.AssignerWithPunctuatedWatermarks
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.watermark.Watermark
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows
import org.apache.flink.streaming.api.windowing.time.Time

// Arguments:
//   filePath   paths of the text files to read, separated by commas.
//   windowTime width of the aggregation window, in minutes.
object FlinkStreamScalaExample {
  def main(args: Array[String]): Unit = {
    // Print a reference command for running the job with flink run
    System.out.println("use command as: ")
    System.out.println("./bin/flink run --class com.huawei.bigdata.flink.examples.FlinkStreamScalaExample /opt/test.jar --filePath /opt/log1.txt,/opt/log2.txt --windowTime 2")
    System.out.println("******************************************************************************************")
    System.out.println("<filePath> is for text file to read data, use comma to separate")
    System.out.println("<windowTime> is the width of the window, time as minutes")
    System.out.println("******************************************************************************************")

    // Read the file paths, separated by commas
    val filePaths = ParameterTool.fromArgs(args).get("filePath",
      "/opt/log1.txt,/opt/log2.txt").split(",").map(_.trim)
    assert(filePaths.length > 0)

    // Window width in minutes; the default of 2 minutes per window is
    // long enough to read all the data in the sample files
    val windowTime = ParameterTool.fromArgs(args).getInt("windowTime", 2)

    // Build the execution environment and process windows by event time
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
    env.setParallelism(1)

    // Read every file and union the resulting streams
    val unionStream = if (filePaths.length > 1) {
      val firstStream = env.readTextFile(filePaths.apply(0))
      firstStream.union(filePaths.drop(1).map(it => env.readTextFile(it)): _*)
    } else {
      env.readTextFile(filePaths.apply(0))
    }

    // Transformation pipeline: parse, filter, key, window, sum, filter, print
    unionStream.map(getRecord(_))
      .assignTimestampsAndWatermarks(new Record2TimestampExtractor)
      .filter(_.sexy == "female")
      .keyBy("name", "sexy")
      .window(TumblingEventTimeWindows.of(Time.minutes(windowTime)))
      .reduce((e1, e2) => UserRecord(e1.name, e1.sexy, e1.shoppingTime + e2.shoppingTime))
      .filter(_.shoppingTime > 120).print()

    // Calling execute triggers the job
    env.execute("FemaleInfoCollectionPrint scala")
  }

  // Parse one line of text into a UserRecord
  def getRecord(line: String): UserRecord = {
    val elems = line.split(",")
    assert(elems.length == 3)
    val name = elems(0)
    val sexy = elems(1)
    val time = elems(2).toInt
    UserRecord(name, sexy, time)
  }

  // UserRecord data structure
  case class UserRecord(name: String, sexy: String, shoppingTime: Int)

  // Extends AssignerWithPunctuatedWatermarks to set the event time and the watermark
  private class Record2TimestampExtractor extends AssignerWithPunctuatedWatermarks[UserRecord] {
    // Tag each element with a timestamp. The log lines carry no timestamp
    // of their own, so the wall-clock time at which the element is read is used.
    override def extractTimestamp(element: UserRecord, previousTimestamp: Long): Long = {
      System.currentTimeMillis()
    }

    // Emit a watermark after every element; the window fires once the
    // watermark passes the end of the window
    override def checkAndGetNextWatermark(lastElement: UserRecord,
                                          extractedTimestamp: Long): Watermark = {
      new Watermark(extractedTimestamp - 1)
    }
  }
}
