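Java, November 8, 2020

Disposition and Alarm Flow

The superior/subordinate concept is the same as for subscribe and notify: a disposition corresponds to a subscription, and an alarm corresponds to a notification.

Disposition-alarm

The disposition-alarm flow is shown in Figure 5:

Figure 5: Disposition-alarm flow

Step 1: The disposer (superior) sends an HTTP POST request to /VIID/Dispositions on the disposed party (subordinate).

Step 2: The disposed party (subordinate) returns a response to the disposer (superior) indicating whether the disposition succeeded.

After the disposition succeeds, the disposed party (subordinate) raises an alarm whenever it detects information matching the disposition.

Step 3: The disposed party (subordinate) sends an HTTP POST request to /VIID/DispositionNotifications on the disposer (superior).

Step 4: The disposer (superior) returns a response.

Step 5: Each time new matching information is produced, Steps 3 and 4 are repeated, and so on in a loop.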

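Cancelling a disposition

If the disposer wants to cancel a disposition, the flow is as shown in Figure 6:

Figure 6: Disposition cancellation flow

Step 1: The disposer (superior) sends an HTTP PUT request to /VIID/Dispositions/<ID> on the disposed party (subordinate).

Step 2: The disposed party (subordinate) returns a response to the disposer (superior) indicating whether the cancellation succeeded.

Step 3: After the cancellation succeeds, the corresponding alarm flow stops.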

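Alarm interfaces

Batch alarm

1. Interface overview

URI	/VIID/DispositionNotifications
Method	Query string	Message body	Response
POST	None	<DispositionNotificationList>	<ResponseStatusList>
Remarks

2. Message body parameters

For the message body structure see C.18; for field definitions see A.18.

3. Response

For the structure see C.25; for field definitions see A.26.

4. Example

URI	/VIID/DispositionNotifications
Request method	POST
Request headers	See Request parameters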
请求体	{ “DispositionNotificationListObject”: { “DispositionNotificationObject”: [ { “NotificationID”: “003301020002022017121416234045581”, “DispositionID”: “003301020001012017121416234045581”, “Title”: “test”, “TriggerTime”: “20170920121314”, “CntObjectID”: “330203000011900000010220171218150200000010201928”, “MotorVehicleObject”: { “MotorVehicleID”: “330203000011900000010220171218150200000010201928”, “InfoKind”: 1, “SourceID”: “33020300001190000001022017122111111100001”, “DeviceID”: “65010000001190000001”, “StorageUrl1”: “http://localhost:80/1.jpg”, “StorageUrl2”: “http://localhost:80/2.jpg”, “StorageUrl3”: “http://localhost:80/3.jpg”, “StorageUrl4”: “http://localhost:80/4.jpg”, “StorageUrl5”: “http://localhost:80/5.jpg”, “LeftTopX”: 1, “LeftTopY”: 2, “RightBtmX”: 3, “RightBtmY”: 4, “MarkTime”: “20171102101241”, “DisappearTime”: “20171102101241”, “LaneNo”: 1, “HasPlate”: “1”, “PlateClass”: “99”, “PlateColor”: “1”, “PlateNo”: “京A66666”, “PlateNoAttach”: “京A88888”, “PlateDescribe”: “这是一个车牌号的描述”, “IsDecked”: “1”, “IsAltered”: “1”, “IsCovered”: “1”, “Speed”: 80, “Direction”: “9”, “DrivingStatusCode”: “01”, “VehicleClass”: “X99”, “VehicleBrand”: “0”, “VehicleModel”: “这是车辆型号描述”, “VehicleStyles”: “2000年”, “VehicleLength”: 12345, “VehicleColor”: “1”, “VehicleColorDepth”: “1”, “VehicleHood”: “这是车前盖的描述”, “VehicleTrunk”: “这是车后盖的描述”, “VehicleWheel”: “这是车轮的描述”, “WheelPrintedPattern”: “11”, “VehicleWindow”: “这是车窗的描述”, “VehicleRoof”: “这是车顶的描述”, “VehicleDoor”: “这是车门的描述”, “SideOfVehicle”: “这是车侧的描述”, “CarOfVehicle”: “这是车厢的描述”, “RearviewMirror”: “这是后视镜的描述”, “VehicleChassis”: “这是底盘的描述”, “VehicleShielding”: “这是遮挡的描述”, “FilmColor”: “1”, “HitMarkInfo”: “1”, “VehicleBodyDesc”: “这是车身的描述”, “VehicleFrontItem”: “1”, “DescOfFrontItem”: “这是车前部物品描述”, “VehicleRearItem”: “1”, “DescOfRearItem”: “这是车后部物品描述”, “PassTime”: “20171218150122”, “NameOfPassedRoad”: “这是经过道路的名称描述”, “IsSuspicious”: “1”, “Sunvisor”: 1, “SafetyBelt”: 1, “Calling”: 1, “PlateReliability”: “80”, “PlateCharReliability”: 
“苏-80,B-90,1-90,2-88,3-90,4-67,5-87”, “BrandReliability”: “88” } } ] } }
Response body	{ "ResponseStatusListObject": { "ResponseStatusObject": [ { "RequestURL": "http://localhost:8080/VIID/DispositionNotifications", "StatusCode": 0, "StatusString": "正常", "Id": "003301020002022017121416234045581", "LocalTime": "20171222155008" } ] } }
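The time fields in the example above (TriggerTime, PassTime, LocalTime and so on) all appear as 14-digit strings such as 20171222155008. A minimal sketch of parsing and formatting them, assuming the layout is yyyyMMddHHmmss; this layout is inferred from the sample values only, and ViidTime is a hypothetical helper, not part of the interface:

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

final class ViidTime {
    // Assumed layout, inferred from sample values such as "20171222155008".
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyyMMddHHmmss");

    // Parses a 14-digit time string into a LocalDateTime.
    static LocalDateTime parse(String value) {
        return LocalDateTime.parse(value, FORMAT);
    }

    // Formats a LocalDateTime back into the 14-digit form.
    static String format(LocalDateTime time) {
        return time.format(FORMAT);
    }
}
```

The sketch carries no time zone; which zone the values refer to should be checked against the field definitions in Appendix A.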

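Alarm query

1. Interface overview

URI	/VIID/DispositionNotifications
Method	Query string	Message body	Response
GET	Key-value pairs of DispositionNotification attributes	None	<DispositionNotificationList>
Remarks

2. Query string

The query consists of key-value pairs of DispositionNotification attributes; for attribute definitions see A.18.

3. Response

For the structure see C.18; for field definitions see A.18.

4. Example

URI	/VIID/DispositionNotifications
Request method	GET
Request headers	See Request parameters
Query parameters	?NotificationID=003301020002022017121416234045581&DispositionID=003301020001012017121416234045581&CntObjectID=330203000011900000010220171218150200000010201928&…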
响应体	{ “DispositionNotificationListObject”: { “DispositionNotificationObject”: [ { “NotificationID”: “003301020002022017121416234045581”, “DispositionID”: “003301020001012017121416234045581”, “Title”: “test”, “TriggerTime”: “20170920121314”, “CntObjectID”: “330203000011900000010220171218150200000010201928”, “MotorVehicleObject”: { “MotorVehicleID”: “330203000011900000010220171218150200000010201928”, “InfoKind”: 1, “SourceID”: “33020300001190000001022017122111111100001”, “DeviceID”: “65010000001190000001”, “StorageUrl1”: “http://192.168.1.1:80/1.jpg”, “StorageUrl2”: “http://192.168.1.1:80/2.jpg”, “StorageUrl3”: “http://192.168.1.1:80/3.jpg”, “StorageUrl4”: “http://192.168.1.1:80/4.jpg”, “StorageUrl5”: “http://192.168.1.1:80/5.jpg”, “LeftTopX”: 1, “LeftTopY”: 2, “RightBtmX”: 3, “RightBtmY”: 4, “MarkTime”: “20171102101241”, “DisappearTime”: “20171102101241”, “LaneNo”: 1, “HasPlate”: “1”, “PlateClass”: “99”, “PlateColor”: “1”, “PlateNo”: “京A66666”, “PlateNoAttach”: “京A88888”, “PlateDescribe”: “这是一个车牌号的描述”, “IsDecked”: “1”, “IsAltered”: “1”, “IsCovered”: “1”, “Speed”: 80, “Direction”: “9”, “DrivingStatusCode”: “01”, “VehicleClass”: “X99”, “VehicleBrand”: “0”, “VehicleModel”: “这是车辆型号描述”, “VehicleStyles”: “2000年”, “VehicleLength”: 12345, “VehicleColor”: “1”, “VehicleColorDepth”: “1”, “VehicleHood”: “这是车前盖的描述”, “VehicleTrunk”: “这是车后盖的描述”, “VehicleWheel”: “这是车轮的描述”, “WheelPrintedPattern”: “11”, “VehicleWindow”: “这是车窗的描述”, “VehicleRoof”: “这是车顶的描述”, “VehicleDoor”: “这是车门的描述”, “SideOfVehicle”: “这是车侧的描述”, “CarOfVehicle”: “这是车厢的描述”, “RearviewMirror”: “这是后视镜的描述”, “VehicleChassis”: “这是底盘的描述”, “VehicleShielding”: “这是遮挡的描述”, “FilmColor”: “1”, “HitMarkInfo”: “1”, “VehicleBodyDesc”: “这是车身的描述”, “VehicleFrontItem”: “1”, “DescOfFrontItem”: “这是车前部物品描述”, “VehicleRearItem”: “1”, “DescOfRearItem”: “这是车后部物品描述”, “PassTime”: “20171218150200”, “NameOfPassedRoad”: “这是经过道路的名称描述”, “IsSuspicious”: “1”, “Sunvisor”: 1, “SafetyBelt”: 1, “Calling”: 1, “PlateReliability”: “80”, “PlateCharReliability”: 
“苏-80,B-90,1-90,2-88,3-90,4-67,5-87”, “BrandReliability”: “88”, “SubImageList”: { “SubImageInfoObject”: [ { “ImageID”: “33020300001190000001022017122111111100001”, “EventSort”: 4, “DeviceID”: “55220299011190000253”, “StoragePath”: “http://10.33.6.108:9080/testx_108_20170908/a2421c4fde6d4a74ac923e8470d6e7fa.jpg”, “Type”: “01”, “FileFormat”: “Jpeg”, “ShotTime”: “20170925032455”, “Width”: 437, “Height”: 350, “Data”: “/9j/4AAQSkZJRgABAQEAYABgAAD/2wBDAAMCAgMCAgMDAwMEAwMEBQgFBQQEBQoHBwYIDAoMDAsKCwsNDhIQDQ4RDgsLEBYQERMUFR…UVDA8XGBYUGBIUFRT/2wBDAQMEBAUEBQkFBQkUDQsNFBQUFBVt7IDwSOM1pKehagf/9k=” } ] } } } ] } }

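Delete alarms

1. Interface overview

URI	/VIID/DispositionNotifications
Method	Query string	Message body	Response
DELETE	Key IDList; the value is a string of IDs separated by half-width commas (",")	None	<ResponseStatusList>
Remarks

2. Request parameters

IDList=<DispositionNotificationID>,<DispositionNotificationID>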
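A minimal sketch of assembling the IDList query string for a DELETE request. IdListQuery is a hypothetical helper name; the only behaviour taken from the text is that the IDs are joined with half-width commas under the IDList key:

```java
import java.util.List;

final class IdListQuery {
    // Joins the IDs with a half-width comma and prefixes the IDList key.
    static String build(List<String> ids) {
        if (ids.isEmpty()) {
            throw new IllegalArgumentException("at least one ID is required");
        }
        return "?IDList=" + String.join(",", ids);
    }
}
```

The result is appended to the resource URI, for example /VIID/DispositionNotifications followed by the returned string.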

3. Response

For the structure see C.25; for field definitions see A.26.

4. Example

URI	/VIID/DispositionNotifications
Request method	DELETE
Request headers	See Request parameters
Request parameters	?IDList=003301020002022017121416234045581,303301020002022017121417234045582
Response body	{ "ResponseStatusListObject": { "ResponseStatusObject": [ { "RequestURL": "http://localhost:8080/VIID/DispositionNotifications", "StatusCode": 0, "StatusString": "正常", "LocalTime": "20171223101737" }, { "RequestURL": "http://localhost:8080/VIID/DispositionNotifications", "StatusCode": 0, "StatusString": "正常", "LocalTime": "20171223101737" } ] } }

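Author: east

Java, November 8, 2020

Subscribe and Notify Flow (required reading)

Subscribe-notify is the core business of the view library interface, and the feature used most often by Dahua and other vendors. This section describes one complete subscribe-notify flow.

Superior and subordinate

Subscribe-notify is essentially data transfer. For example, if A wants to obtain B's motor vehicle data through the view library, A is the superior and B is the subordinate. If A wants to obtain C's motor vehicle data indirectly through B, then A is B's superior and B is C's superior; this is cross-level subscribe/notify.

Subscribe-notify

The subscribe-notify flow is shown in Figure 2:

Figure 2: Subscribe-notify flow

Step 1: The subscriber (superior) sends an HTTP POST request to /VIID/Subscribes on the subscribed party (subordinate).

Step 2: The subscribed party (subordinate) returns a response to the subscriber (superior) indicating whether the subscription succeeded.

After the subscription succeeds, the subscribed party (subordinate) sends notifications whenever it has data matching the subscription.

Step 3: The subscribed party (subordinate) sends an HTTP POST request to /VIID/SubscribeNotifications on the subscriber (superior).

Step 4: The subscriber (superior) returns a response.

Step 5: The subscribed party (subordinate) repeats Steps 3 and 4 only after it has received a correct result from the subscriber (superior) in Step 4, and so on in a loop.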
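The loop in Steps 3 to 5 can be sketched as follows. This is a simplified illustration, not the actual implementation: the Sender interface is a hypothetical stand-in for the HTTP POST to /VIID/SubscribeNotifications, a "correct result" is assumed to mean StatusCode 0 (as in the response examples), and what happens to unacknowledged notifications (retry, discard) is not specified in the text, so the sketch simply stops:

```java
import java.util.List;

final class NotificationLoop {
    /** Hypothetical transport: posts one notification body and returns the StatusCode of the reply. */
    interface Sender {
        int post(String notificationJson);
    }

    /**
     * Sends notifications in order and moves on to the next one only after
     * the previous one was answered with StatusCode 0 (assumed to mean success).
     * Returns how many notifications were acknowledged; the rest stay pending.
     */
    static int deliver(List<String> pending, Sender sender) {
        int acknowledged = 0;
        for (String body : pending) {
            if (sender.post(body) != 0) {
                break; // no correct reply in Step 4: do not repeat Step 3
            }
            acknowledged++;
        }
        return acknowledged;
    }
}
```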

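Cancelling a subscription

If the subscriber wants to cancel a subscription, the flow is as shown in Figure 3:

Figure 3: Subscription cancellation flow

Step 1: The subscriber (superior) sends an HTTP PUT request to /VIID/Subscribes/<ID> on the subscribed party (subordinate), writing the cancelling organization, the cancelling person, the cancellation time, and the cancellation reason.

Step 2: The subscribed party (subordinate) returns a response to the subscriber (superior) indicating whether the cancellation succeeded.

Step 3: After the cancellation succeeds, the corresponding notification flow stops.

Subscription interfaces

Batch subscribe

1. Interface overview

URI	/VIID/Subscribes
Method	Query string	Message body	Response
POST	None	<SubscribeList>	<ResponseStatusList>
Remarks

2. Message body parameters

For the message body structure see C.19; for field definitions see A.19.

3. Response

For the structure see C.25; for field definitions see A.26.

4. Example

URI	/VIID/Subscribes
Request method	POST
Request headers	See Request parameters
Request body	{ "SubscribeListObject": { "SubscribeObject": [ { "SubscribeID": "330101020001032017113010580006371", "ApplicantName": "admin", "ApplicantOrg": "11111", "BeginTime": "20171201000000", "EndTime": "20191230000000", "OperateType": "0", "Reason": "测试", "ReceiveAddr": "http://172.6.3.107:80/VIID/SubscribeNotifications", "ResourceURI": "00330102015030000004", "SubscribeDetail": "13", "Title": "过车" } ] } }
Response body	{ "ResponseStatusListObject": { "ResponseStatusObject": [ { "RequestURL": "http://localhost:8080/VIID/Subscribes", "StatusCode": 0, "StatusString": "正常", "Id": "330101020001032017113010580006371", "LocalTime": "20171220204451" } ] } }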

Query subscriptions

1. Interface overview

URI	/VIID/Subscribes
Method	Query string	Message body	Response
GET	Key-value pairs of Subscribe attributes	None	<SubscribeList>
Remarks

2. Query string

The query consists of key-value pairs of Subscribe attributes; for attribute definitions see A.19.

3. Response

For the structure see C.19; for field definitions see A.19.

4. Example

URI	/VIID/Subscribes
Request method	GET
Request headers	See Request parameters
Query parameters	?SubscribeID=330101020001032017113010580006371&ApplicantName=admin&ApplicantOrg=11111&BeginTime=20171201000000&EndTime=20191230000000&…
Response body	{ "SubscribeListObject": { "SubscribeObject": [ { "SubscribeID": "330101020001032017113010580006371", "ApplicantName": "admin", "ApplicantOrg": "11111", "BeginTime": "20171201000000", "EndTime": "20191230000000", "OperateType": "0", "Reason": "测试", "ReceiveAddr": "http://172.6.3.107:80/VIID/SubscribeNotifications", "ResourceURI": "00330102015030000004", "SubscribeDetail": "13", "Title": "过车" } ] } }

Modify subscriptions

1. Interface overview

URI	/VIID/Subscribes
Method	Query string	Message body	Response
PUT	None	<SubscribeList>	<ResponseStatusList>
Remarks	SubscribeID is required in the message body; otherwise the operation has no effect

2. Message body parameters

For the message body structure see C.19; for field definitions see A.19.

3. Response

For the structure see C.25; for field definitions see A.26.

4. Example

URI	/VIID/Subscribes
Request method	PUT
Request headers	See Request parameters
Request body	{ "SubscribeListObject": { "SubscribeObject": [ { "SubscribeID": "330101020001032017113010580006371", "ApplicantName": "admin", "ApplicantOrg": "11111", "BeginTime": "20171201000000", "EndTime": "20191230000000", "OperateType": "0", "Reason": "上下级汇聚", "ReceiveAddr": "http://172.6.3.107:80/VIID/SubscribeNotifications", "ResourceURI": "00330102015030000004", "SubscribeDetail": "13", "Title": "过车测试" } ] } }
Response body	{ "ResponseStatusListObject": { "ResponseStatusObject": [ { "RequestURL": "http://localhost:8080/VIID/Subscribes", "StatusCode": 0, "StatusString": "正常", "Id": "330101020001032017113010580006371", "LocalTime": "20171220204451" } ] } }

Delete subscriptions

1. Interface overview

URI	/VIID/Subscribes
Method	Request parameters	Message body	Response
DELETE	Key IDList; the value is a string of IDs separated by half-width commas (",")	None	<ResponseStatusList>
Remarks

2. Request parameters

IDList=<SubscribeID>,<SubscribeID>

3. Response

For the structure see C.25; for field definitions see A.26.

4. Example

URI	/VIID/Subscribes
Request method	DELETE
Request headers	See Request parameters
Request parameters	?IDList=330101020001032017113010580006371,330101020001032017113010580006372
Response body	{ "ResponseStatusListObject": { "ResponseStatusObject": [ { "RequestURL": "http://localhost:8080/VIID/Subscribes", "StatusCode": 0, "StatusString": "正常", "Id": "330101020001032017113010580006371", "LocalTime": "20171220204451" }, { "RequestURL": "http://localhost:8080/VIID/Subscribes", "StatusCode": 0, "StatusString": "正常", "Id": "330101020001032017113010580006372", "LocalTime": "20171220204451" } ] } }

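Cancel a subscription

1. Interface overview

URI	/VIID/Subscribes/<ID>
Method	Request parameters	Message body	Response
PUT	None	<Subscribe>	<ResponseStatus>
Remarks	The PUT updates the Subscribe object, writing the cancelling organization, the cancelling person, the cancellation time, and the cancellation reason.

2. Message body parameters

The cancelling organization, cancelling person, cancellation time, and cancellation reason of the subscribe object (SubscribeCancelOrg, SubscribeCancelPerson, CancelTime, CancelReason); for field definitions see A.19.

3. Response

For the structure see C.25; for field definitions see A.26.

4. Example

URI	/VIID/Subscribes/330101020001032017113010580006371
Request method	PUT
Request headers	See Request parameters
Message body	{ "SubscribeObject": { "SubscribeCancelOrg": "省公安厅", "SubscribeCancelPerson": "admin", "CancelTime": "20171201000000", "CancelReason": "服务到期" } }
Response body	{ "ResponseStatusObject": { "RequestURL": "http://localhost:8080/VIID/Subscribes/330101020001032017113010580006371", "StatusCode": 0, "StatusString": "正常", "Id": "330101020001032017113010580006371", "LocalTime": "20171220204451" } }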

Notification interfaces

Subscription notification

1. Interface overview

URI	/VIID/SubscribeNotifications
Method	Query string	Message body	Response
POST	None	<SubscribeNotificationList>	<ResponseStatusList>
Remarks

2. Message body parameters

For the message body structure see C.20; for field definitions see A.20.

3. Response

For the structure see C.25; for field definitions see A.26.

4. Example

URI	/VIID/SubscribeNotifications
Request method	POST
Request headers	See Request parameters
Request body	{ "SubscribeNotificationListObject": { "SubscribeNotificationObject": [ { "NotificationID": "650100010000042017040112010100001", "Title": "通知主题", "SubscribeID": "650100010000032017040112010100001", "TriggerTime": "20171102101205", "InfoIDs": "650100000013200000010120170330120000000010100001", "DeviceList": { "APEObject": [ { "Name": "这是一个采集设备", "Port": 8888, "Password": "p@ssword", "Model": "这是采集设备的型号", "ApeID": "65010000001190000001", "MonitorAreaDesc": "监控区域说明", "IPAddr": "192.168.1.1", "IPV6Addr": "fe80::69fd:1871:b9ba:24e7%13", "Longitude": 56.654321, "Latitude": 56.123456, "PlaceCode": "650100", "OrgCode": "650100010000", "CapDirection": 1, "MonitorDirect": "1", "IsOnline": "1", "OwnerApsID": "65010000001200000001", "UserId": "Administrator", "Place": "新疆乌鲁木齐" } ] } } ] } }
Response body	{ "ResponseStatusListObject": { "ResponseStatusObject": [ { "RequestURL": "http://localhost:8080/VIID/SubscribeNotifications", "StatusCode": 0, "StatusString": "正常", "Id": "330101020001032017113010580006371", "LocalTime": "20171220204451" } ] } }

Query notifications

1. Interface overview

URI	/VIID/SubscribeNotifications
Method	Query string	Message body	Response
GET	Key-value pairs of SubscribeNotification attributes	None	<SubscribeNotificationList>
Remarks

2. Query string

The query consists of key-value pairs of SubscribeNotification attributes; for attribute definitions see A.20.

3. Response

For the structure see C.20; for field definitions see A.20.

4. Example

URI	/VIID/SubscribeNotifications
Request method	GET
Request headers	See Request parameters
Query parameters	?NotificationID=650100010000042017040112010100001&Title=通知主题&…
Response body	{ "SubscribeNotificationListObject": { "SubscribeNotificationObject": [ { "NotificationID": "650100010000042017040112010100001", "Title": "通知主题", "SubscribeID": "650100010000032017040112010100001", "TriggerTime": "20171102101205", "InfoIDs": "650100000013200000010120170330120000000010100001", "DeviceList": { "APEObject": [ { "Name": "这是一个采集设备", "Port": 8888, "Password": "p@ssword", "Model": "这是采集设备的型号", "ApeID": "65010000001190000001", "MonitorAreaDesc": "监控区域说明", "IPAddr": "192.168.1.1", "IPV6Addr": "fe80::69fd:1871:b9ba:24e7%13", "Longitude": 56.654321, "Latitude": 56.123456, "PlaceCode": "650100", "OrgCode": "650100010000", "CapDirection": 1, "MonitorDirect": "1", "IsOnline": "1", "OwnerApsID": "65010000001200000001", "UserId": "Administrator", "Place": "新疆乌鲁木齐" } ] } } ] } }

Delete notifications

1. Interface overview

URI	/VIID/SubscribeNotifications
Method	Request parameters	Message body	Response
DELETE	Key IDList; the value is a string of IDs separated by half-width commas (",")	None	<ResponseStatusList>
Remarks

2. Request parameters

IDList=<NotificationID>,<NotificationID>

3. Response

For the structure see C.25; for field definitions see A.26.

4. Example

URI	/VIID/SubscribeNotifications
Request method	DELETE
Request headers	See Request parameters
Request parameters	?IDList=650100010000042017040112010100001,650100010000042017040112010100002
Response body	{ "ResponseStatusListObject": { "ResponseStatusObject": [ { "RequestURL": "http://localhost:8080/VIID/SubscribeNotifications", "StatusCode": 0, "StatusString": "正常", "Id": "650100010000042017040112010100001", "LocalTime": "20171220204451" }, { "RequestURL": "http://localhost:8080/VIID/SubscribeNotifications", "StatusCode": 0, "StatusString": "正常", "Id": "650100010000042017040112010100002", "LocalTime": "20171220204451" } ] } }

C.19 Subscribe object

// Subscribe object

</sequence>

</complexType>

// Subscribe object list

</sequence>

</complexType>

C.20 Notification object

// Notification object

</sequence>

</complexType>

// Notification object list

</sequence>

</complexType>

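Author: east

Java, November 8, 2020

Calling Conventions (required reading)

All interfaces (referred to below as the API) are accessed over HTTP/REST and are uniquely identified by a URI. For the URI of each interface, see the API overview.

Request structure

The view library API is called by sending a request to the address of the view library server and adding the request parameters described for each interface. A view library request consists of the following parts:

1. Service address

The HTTP URL has the following form:

<Protocol>://<Hostname>:<Port><URI>(?P1=v1&p2=v2…&pn=vn)

Where: Protocol is HTTP; Hostname is the host name, IP address, or domain name of the view library server; Port is the port number; URI is the resource URI; (?P1=v1&p2=v2…&pn=vn) is the query string. Each resource defines its required or optional query string parameters, which appear as name/value pairs.

2. Request method

The HTTP request methods of the view library API are GET, PUT, POST, and DELETE.

Choose the method according to the description of the interface being called.

Notes:

1. If the request method of the interface is POST or PUT, the request content must be encoded with the specified encoding, and all content is taken from the message body.

2. If the request method of the interface is GET, all request parameter values must be URL-encoded.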
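A minimal sketch of assembling a request URL in the form described above, with URL-encoded query values. ViidUrl is a hypothetical helper name, and the choice of UTF-8 for the encoding is an assumption (it matches the charset=utf-8 used in the request headers):

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.StringJoiner;

final class ViidUrl {
    // Builds http://<host>:<port><uri>?k1=v1&k2=v2 with URL-encoded values.
    static String build(String host, int port, String uri, Map<String, String> query) {
        String base = "http://" + host + ":" + port + uri;
        if (query.isEmpty()) {
            return base;
        }
        StringJoiner joined = new StringJoiner("&", "?", "");
        for (Map.Entry<String, String> entry : query.entrySet()) {
            // Only the values are encoded, as the note on GET requests requires.
            joined.add(entry.getKey() + "="
                    + URLEncoder.encode(entry.getValue(), StandardCharsets.UTF_8));
        }
        return base + joined;
    }
}
```

URLEncoder applies form encoding, so a space becomes "+" rather than "%20"; whether the server accepts both forms is not stated in the text and should be verified.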

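3. Request parameters

Every view library API request must specify request header parameters. The parameters are listed below.

Table 1 Request parameters

No.	Parameter	Description	Required	Remarks
1	Content-Type	Media type of the message body	Yes/No	Required for POST and PUT requests that carry a message body; supports application/json or xml;charset=utf-8
2	Accept	Media types the requester can accept	No	Supports application/json or xml;charset=utf-8; the default response is application/json;charset=utf-8
3	User-Identify	System code of the requester	Yes/No	Used to verify the identity of the caller; see Register, Keepalive, and Deregister
Note: parameter names and values are case-sensitive.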
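A minimal sketch of building the request headers from Table 1. ViidHeaders is a hypothetical helper; the sketch assumes JSON is used for both the request and the response, and treats User-Identify as optional by passing null before registration:

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ViidHeaders {
    /**
     * Builds the header map for one request.
     * hasBody: true for POST/PUT requests that carry a message body.
     * deviceId: the registered system code, or null if not registered yet.
     */
    static Map<String, String> build(boolean hasBody, String deviceId) {
        Map<String, String> headers = new LinkedHashMap<>();
        if (hasBody) {
            headers.put("Content-Type", "application/json;charset=utf-8");
        }
        headers.put("Accept", "application/json;charset=utf-8");
        if (deviceId != null) {
            headers.put("User-Identify", deviceId);
        }
        return headers;
    }
}
```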

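Response

A REST HTTP response keeps the standard HTTP protocol content; the result (the response message body) is defined by the view library API.

The result returned by the view library API depends on the request method, as summarized in the table below:

Table 2 Results by request type

No.	Request method	Description	Result
1	GET	Query the attributes of a single target object	Single target result
2	GET	Query multiple objects that match the conditions	Collection of targets
3	GET	Query error	ResponseStatus
4	POST	Submit a single target object	ResponseStatus
5	POST	Submit a collection of target objects	List<ResponseStatus>
6	PUT	Modify a single target object	ResponseStatus
7	PUT	Modify a collection of target objects	List<ResponseStatus>
8	DELETE	Delete a single target object	ResponseStatus
9	DELETE	Delete a collection of target objects	List<ResponseStatus>

For the attributes of ResponseStatus (the response status object), see A.26 Response status object.

For the attributes of other objects, see Appendix A.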

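Register, Keepalive, and Deregister

This subsection applies if the registration and authentication mechanism is enabled in the deployment environment.

Register

Access to the view library API requires identity verification. The identity is carried in the User-Identify request header, so a new requester must first register with the view library and can perform further operations only after registration succeeds. The registration flow is as follows:

Note: in both requests that the integrating program sends to the view library in the flow diagram above, the request body should be a JSON string whose key is DeviceID and whose value is the identifier of the requesting system, for example {"RegisterObject":{"DeviceID":"33010299011190000253"}}. After registration succeeds, every request must carry the User-Identify request header, whose value is the DeviceID value (33010299011190000253). For details, see the Register interface.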
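A minimal sketch of building the registration request with the JDK HTTP client types (Java 11 or later). RegisterRequest is a hypothetical helper; the path /VIID/System/Register comes from the Register interface overview, the base URL is a placeholder, and any authentication headers exchanged between the two requests of the registration flow are not covered because the text does not describe them:

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class RegisterRequest {
    // Builds the JSON body {"RegisterObject":{"DeviceID":"<deviceId>"}}.
    static String body(String deviceId) {
        return "{\"RegisterObject\":{\"DeviceID\":\"" + deviceId + "\"}}";
    }

    // Builds (but does not send) the POST request to /VIID/System/Register.
    static HttpRequest build(String baseUrl, String deviceId) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/VIID/System/Register"))
                .header("Content-Type", "application/json;charset=utf-8")
                .POST(HttpRequest.BodyPublishers.ofString(body(deviceId)))
                .build();
    }
}
```

The request can then be sent with java.net.http.HttpClient; after a successful registration, later requests add the User-Identify header with the same DeviceID value.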

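Keepalive

After registration succeeds, the registration is valid for a limited period (typically 5 minutes). If no new request is received within that period, the registration expires and the next access requires registering again. Any successful request triggers the keepalive mechanism and resets the validity period.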
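The validity rule above can be sketched as a small bookkeeping class that a client might use to decide whether it needs to register again. RegistrationLease is a hypothetical name, the validity period is passed in because 5 minutes is only described as typical, and time is supplied by the caller in milliseconds so the logic stays testable:

```java
final class RegistrationLease {
    private final long validityMillis;
    private long lastSuccessMillis;

    RegistrationLease(long validityMillis, long registeredAtMillis) {
        this.validityMillis = validityMillis;
        this.lastSuccessMillis = registeredAtMillis;
    }

    // Any successful request resets the validity period.
    void onSuccessfulRequest(long nowMillis) {
        lastSuccessMillis = nowMillis;
    }

    // False once the validity period has elapsed without a successful request.
    boolean isValid(long nowMillis) {
        return nowMillis - lastSuccessMillis < validityMillis;
    }
}
```

A client would typically send a Keepalive request shortly before isValid turns false, and register again if it already has.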

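Deregister

When the requester no longer needs to communicate with the view library, it can call the deregister interface to revoke its registration.

Register

1. Interface overview

URI	/VIID/System/Register
Method	Query string	Message body	Response
POST	None	<Register>	<ResponseStatus>
Remarks

2. Message body parameters

For the message body structure see C.26; for field definitions see A.27.

3. Response

For the structure see C.25; for field definitions see A.26.

4. Example

URI	/VIID/System/Register
Request method	POST
Request headers	See Request parameters; for other request headers see the registration flow diagram
Request body	{"RegisterObject":{"DeviceID":"33010299011190000253"}}
Response body	{ "ResponseStatusObject": { "RequestURL": "http://localhost:8080/VIID/Register", "StatusCode": 0, "StatusString": "正常", "Id": "33010299011190000253", "LocalTime": "20171220204451" } }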

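Deregister

1. Interface overview

URI	/VIID/System/UnRegister
Method	Query string	Message body	Response
POST	None	<UnRegister>	<ResponseStatus>
Remarks

2. Message body parameters

For the message body structure see C.28; for field definitions see A.29.

3. Response

For the structure see C.25; for field definitions see A.26.

4. Example

URI	/VIID/System/UnRegister
Request method	POST
Request headers	See Request parameters
Request body	{"DeviceID":"33010299011190000253"}
Response body	{ "ResponseStatusObject": [ { "RequestURL": "http://localhost:8080/VIID/UnRegister", "StatusCode": 0, "StatusString": "正常", "Id": "33010299011190000253", "LocalTime": "20171220204451" } ] }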

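Keepalive

1. Interface overview

URI	/VIID/System/Keepalive
Method	Query string	Message body	Response
POST	None	<Keepalive>	<ResponseStatus>
Remarks

2. Message body parameters

For the message body structure see C.27; for field definitions see A.28.

3. Response

For the structure see C.25; for field definitions see A.26.

4. Example

URI	/VIID/System/Keepalive
Request method	POST
Request headers	See Request parameters
Request body	{"DeviceID":"33010299011190000253"}
Response body	{ "ResponseStatusObject": [ { "RequestURL": "http://localhost:8080/VIID/Keepalive", "StatusCode": 0, "StatusString": "正常", "Id": "33010299011190000253", "LocalTime": "20171220204451" } ] }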

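C.25 Response status object

// Response status object

</sequence>

</complexType>

// Response status object list

</sequence>

</complexType>

C.26 Register object

// Register object

</sequence>

</complexType>

C.27 Keepalive object

// Keepalive object

</sequence>

</complexType>

C.28 Deregister object

// Deregister object

</sequence>

</complexType>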

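Author: east

Flink, October 26, 2020

A Flink Program That Produces Data to Kafka and Consumes It

Scenario

Assume that a Flink job receives one message record per second.

To meet the business requirement, the Flink application outputs each message in real time with a prefix added.

Data planning

The data of the Flink sample project is stored in Kafka. The application sends data to Kafka (this requires a user with Kafka permissions) and receives data from Kafka.

Make sure the cluster is installed, including HDFS, Yarn, Flink, and Kafka.
Create the topic. The command format is:

bin/kafka-topics.sh --create --zookeeper {zkQuorum}/kafka --partitions {partitionNum} --replication-factor {replicationNum} --topic {Topic}

Table 1 Parameter description

Parameter	Description
{zkQuorum}	ZooKeeper cluster information, in the format IP:port.
{partitionNum}	Number of partitions of the topic.
{replicationNum}	Number of replicas of each partition of the topic.
{Topic}	Topic name.

Example: run the command in the Kafka client directory. Here the ZooKeeper cluster IP:port list is 10.96.101.32:24002,10.96.101.251:24002,10.96.101.177:24002,10.91.8.160:24002 and the topic name is topic1.

bin/kafka-topics.sh --create --zookeeper 10.96.101.32:24002,10.96.101.251:24002,10.96.101.177:24002,10.91.8.160:24002/kafka --partitions 5 --replication-factor 1 --topic topic1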

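Development approach

Start the Flink Kafka Producer application to send data to Kafka.
Start the Flink Kafka Consumer application to receive data from Kafka, using the same topic as the producer.
Add a prefix to the message content and print it.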
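Before looking at the Flink code, the data transformation itself can be illustrated in plain Java without any Flink or Kafka dependency. This is only a sketch of what flows through the pipeline: PrefixPipelineSketch is a hypothetical class, and the message text "element-N" and the prefix "Flink says " are taken from the sample code in this article:

```java
import java.util.ArrayList;
import java.util.List;

final class PrefixPipelineSketch {
    // Mirrors the producer side: messages "element-0", "element-1", ...
    static List<String> generate(int count) {
        List<String> messages = new ArrayList<>();
        for (long i = 0; i < count; i++) {
            messages.add("element-" + i);
        }
        return messages;
    }

    // Mirrors the consumer side: prepend a prefix to every message.
    static List<String> addPrefix(List<String> messages, String prefix) {
        List<String> out = new ArrayList<>();
        for (String message : messages) {
            out.add(prefix + message);
        }
        return out;
    }
}
```

In the real job the producer emits one message per second and Kafka sits between the two steps; the sketch leaves out both.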

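Java sample code

Function

In a Flink application, call the interfaces of the flink-connector-kafka module to produce and consume data.

Sample code

The main logic of the producer and the consumer is listed below for demonstration.

For the complete code, see com.huawei.bigdata.flink.examples.WriteIntoKafka and com.huawei.bigdata.flink.examples.ReadFromKafka.

// Producer code
public class WriteIntoKafka {
  public static void main(String[] args) throws Exception {
    // Print a reference command for running the job with flink run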
    System.out.println("use command as: ");
    System.out.println("./bin/flink run --class com.huawei.bigdata.flink.examples.WriteIntoKafka" +
        " /opt/test.jar --topic topic-test --bootstrap.servers 10.91.8.218:21005");
    System.out.println("******************************************************************************************");
    System.out.println("<topic> is the kafka topic name");
    System.out.println("<bootstrap.servers> is the ip:port list of brokers");
    System.out.println("******************************************************************************************");

    // Create the execution environment
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    // Set the parallelism
    env.setParallelism(1);
    // Parse the run-time arguments
    ParameterTool paraTool = ParameterTool.fromArgs(args);
    // Build the stream graph: write the data generated by the custom source to Kafka
    DataStream<String> messageStream = env.addSource(new SimpleStringGenerator());
    messageStream.addSink(new FlinkKafkaProducer010<>(paraTool.get("topic"),
        new SimpleStringSchema(),
        paraTool.getProperties()));
    // Call execute to trigger execution
    env.execute();
  }

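  // Custom source that emits one message every second
  public static class SimpleStringGenerator implements SourceFunction<String> {
    private static final long serialVersionUID = 2174904787118597072L;
    // volatile: cancel() is called from a different thread than run()
    volatile boolean running = true;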
    long i = 0;

    @Override
    public void run(SourceContext<String> ctx) throws Exception {
      while (running) {
        ctx.collect("element-" + (i++));
        Thread.sleep(1000);
      }
    }

    @Override
    public void cancel() {
      running = false;
    }
  }
}

// Consumer code
public class ReadFromKafka {
  public static void main(String[] args) throws Exception {
    // Print a reference command for running the job with flink run
    System.out.println("use command as: ");
    System.out.println("./bin/flink run --class com.huawei.bigdata.flink.examples.ReadFromKafka" +
        " /opt/test.jar --topic topic-test --bootstrap.servers 10.91.8.218:21005");
    System.out.println("******************************************************************************************");
    System.out.println("<topic> is the kafka topic name");
    System.out.println("<bootstrap.servers> is the ip:port list of brokers");
    System.out.println("******************************************************************************************");

    // Create the execution environment
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    // Set the parallelism
    env.setParallelism(1);
    // Parse the run-time arguments
    ParameterTool paraTool = ParameterTool.fromArgs(args);
    // Build the stream graph: read data from Kafka and print each record on its own line
    DataStream<String> messageStream = env.addSource(new FlinkKafkaConsumer010<>(paraTool.get("topic"),
        new SimpleStringSchema(),
        paraTool.getProperties()));
    messageStream.rebalance().map(new MapFunction<String, String>() {
      @Override
      public String map(String s) throws Exception {
        return "Flink says " + s + System.getProperty("line.separator");
      }
    }).print();
    // Call execute to trigger execution
    env.execute();
  }
}

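Scala sample code

Function

In a Flink application, call the interfaces of the flink-connector-kafka module to produce and consume data.

Sample code

The main logic of the producer and the consumer is listed below for demonstration. For the complete code, see com.huawei.bigdata.flink.examples.WriteIntoKafka and com.huawei.bigdata.flink.examples.ReadFromKafka.

// Producer code
object WriteIntoKafka {
  def main(args: Array[String]) {
    // Print a reference command for running the job with flink run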
    System.out.println("use command as: ")
    System.out.println("./bin/flink run --class com.huawei.bigdata.flink.examples.WriteIntoKafka" +
      " /opt/test.jar --topic topic-test --bootstrap.servers 10.91.8.218:21005")
    System.out.println("******************************************************************************************")
    System.out.println("<topic> is the kafka topic name")
    System.out.println("<bootstrap.servers> is the ip:port list of brokers")
    System.out.println("******************************************************************************************")

    // Create the execution environment
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    // Set the parallelism
    env.setParallelism(1)
    // Parse the run-time arguments
    val paraTool = ParameterTool.fromArgs(args)
    // Build the stream graph: write the data generated by the custom source to Kafka
    val messageStream: DataStream[String] = env.addSource(new SimpleStringGenerator)
    messageStream.addSink(new FlinkKafkaProducer010(
      paraTool.get("topic"), new SimpleStringSchema, paraTool.getProperties))
    // Call execute to trigger execution
    env.execute
  }
}

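// Custom source that emits one message every second
class SimpleStringGenerator extends SourceFunction[String] {
  // @volatile: cancel() is called from a different thread than run()
  @volatile var running = true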
  var i = 0

  override def run(ctx: SourceContext[String]) {
    while (running) {
      ctx.collect("element-" + i)
      i += 1
      Thread.sleep(1000)
    }
  }

  override def cancel() {
    running = false
  }
}

// Consumer code
object ReadFromKafka {
  def main(args: Array[String]) {
    // Print a reference command for running the job with flink run
    System.out.println("use command as: ")
    System.out.println("./bin/flink run --class com.huawei.bigdata.flink.examples.ReadFromKafka" +
      " /opt/test.jar --topic topic-test --bootstrap.servers 10.91.8.218:21005")
    System.out.println("******************************************************************************************")
    System.out.println("<topic> is the kafka topic name")
    System.out.println("<bootstrap.servers> is the ip:port list of brokers")
    System.out.println("******************************************************************************************")

    // Create the execution environment
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    // Set the parallelism
    env.setParallelism(1)
    // Parse the run-time arguments
    val paraTool = ParameterTool.fromArgs(args)
    // Build the stream graph: read data from Kafka and print each record on its own line
    val messageStream = env.addSource(new FlinkKafkaConsumer010(
      paraTool.get("topic"), new SimpleStringSchema, paraTool.getProperties))
    messageStream
      .map(s => "Flink says " + s + System.getProperty("line.separator")).print()
    // Call execute to trigger execution
    env.execute()
  }
}

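Author: east

Flink, October 26, 2020

Example: Spark Streaming Reads Data from Kafka and Writes It to HBase

Java sample code

Function

In a Spark application, Spark Streaming calls the Kafka interface to obtain data, analyzes it, finds the corresponding HBase table record, and writes the result back to the HBase table.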
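The update rule applied to each record can be illustrated in plain Java, with an in-memory Map standing in for the HBase table. This is only a sketch of the arithmetic used in the sample code in this article (new cid = row key + old cid); CidUpdateSketch is a hypothetical class, and unlike the real code it processes messages one at a time rather than with one batched get and put:

```java
import java.util.List;
import java.util.Map;

final class CidUpdateSketch {
    /**
     * For every message that matches an existing row key, replaces the stored
     * "cid" value with (row key + old cid). Messages without a matching row
     * are skipped, as in the sample where empty results are ignored.
     */
    static void apply(List<String> messages, Map<String, String> table) {
        for (String row : messages) {
            String oldCid = table.get(row);
            if (oldCid != null) {
                int result = Integer.parseInt(row) + Integer.parseInt(oldCid);
                table.put(row, String.valueOf(result));
            }
        }
    }
}
```

Both the message and the stored value must be integer strings, which is also an implicit requirement of the sample code.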

Sample code

The following snippet is for demonstration only. For the complete code, see com.huawei.bigdata.spark.examples.SparkOnStreamingToHbase.

/**
 * Runs a Spark Streaming job: reads rows from HBase table1 by the message value,
 * combines the two values, and writes the result back to HBase table1.
 */
public class SparkOnStreamingToHbase {
  public static void main(String[] args) throws Exception {
    if (args.length < 3) {
      printUsage();
    }

    String checkPointDir = args[0];
    String topics = args[1];
    final String brokers = args[2];

    Duration batchDuration = Durations.seconds(5);
    SparkConf sparkConf = new SparkConf().setAppName("SparkOnStreamingToHbase");
    JavaStreamingContext jssc = new JavaStreamingContext(sparkConf, batchDuration);

    // Set the Spark Streaming checkpoint directory
    if (!"nocp".equals(checkPointDir)) {
      jssc.checkpoint(checkPointDir);
    }

    final String columnFamily = "cf";
    HashMap<String, String> kafkaParams = new HashMap<String, String>();
    kafkaParams.put("metadata.broker.list", brokers);

    String[] topicArr = topics.split(",");
    Set<String> topicSet = new HashSet<String>(Arrays.asList(topicArr));

    // Create the Kafka stream directly from the brokers and topics,
    // receive data from Kafka, and generate the corresponding DStream
    JavaDStream<String> lines = KafkaUtils.createDirectStream(jssc, String.class, String.class,
      StringDecoder.class, StringDecoder.class, kafkaParams, topicSet).map(
      new Function<Tuple2<String, String>, String>() {
        public String call(Tuple2<String, String> tuple2) {
          // map(_._1) is the message key, map(_._2) is the message value
          return tuple2._2();
        }
      }
    );

    lines.foreachRDD(
      new Function<JavaRDD<String>, Void>() {
        public Void call(JavaRDD<String> rdd) throws Exception {
          rdd.foreachPartition(
            new VoidFunction<Iterator<String>>() {
              public void call(Iterator<String> iterator) throws Exception {
                hBaseWriter(iterator, columnFamily);
              }
            }
          );
          return null;
        }
      }
    );

    jssc.start();
    jssc.awaitTermination();
  }

  /**
   * Writes data on the executor side.
   * @param iterator  the messages
   * @param columnFamily  the column family to update
   */
  private static void hBaseWriter(Iterator<String> iterator, String columnFamily) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    Connection connection = null;
    Table table = null;
    try {
      connection = ConnectionFactory.createConnection(conf);
      table = connection.getTable(TableName.valueOf("table1"));
      List<Get> rowList = new ArrayList<Get>();
      while (iterator.hasNext()) {
        Get get = new Get(iterator.next().getBytes());
        rowList.add(get);
      }
      // Read the data from table1
      Result[] resultDataBuffer = table.get(rowList);
      // Prepare the data to write to table1
      List<Put> putList = new ArrayList<Put>();
      for (int i = 0; i < resultDataBuffer.length; i++) {
        String row = new String(rowList.get(i).getRow());
        Result resultData = resultDataBuffer[i];
        if (!resultData.isEmpty()) {
          // Get the old value by column family and column
          String aCid = Bytes.toString(resultData.getValue(columnFamily.getBytes(), "cid".getBytes()));
          Put put = new Put(Bytes.toBytes(row));
          // Compute the result
          int resultValue = Integer.valueOf(row) + Integer.valueOf(aCid);
          put.addColumn(Bytes.toBytes(columnFamily), Bytes.toBytes("cid"), Bytes.toBytes(String.valueOf(resultValue)));
          putList.add(put);
        }
      }
      if (putList.size() > 0) {
        table.put(putList);
      }
    } catch (IOException e) {
      e.printStackTrace();
    } finally {
      if (table != null) {
        try {
          table.close();
        } catch (IOException e) {
          e.printStackTrace();
        }
      }
      if (connection != null) {
        try {
          // Close the HBase connection.
          connection.close();
        } catch (IOException e) {
          e.printStackTrace();
        }
      }
    }
  }

  private static void printUsage() {
    System.out.println("Usage: {checkPointDir} {topic} {brokerList}");
    System.exit(1);
  }
}

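Scala sample code

Function

In a Spark application, Spark Streaming calls the Kafka interface to obtain data, analyzes it, finds the corresponding HBase table record, and writes the result back to the HBase table.

Sample code

The following snippet is for demonstration only. For the complete code, see com.huawei.bigdata.spark.examples.SparkOnStreamingToHbase.

/**
  * Runs a Spark Streaming job: reads rows from HBase table1 by the message value,
  * combines the two values, and writes the result back to HBase table1.
  */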
object SparkOnStreamingToHbase {
  def main(args: Array[String]) {
    if (args.length < 3) {
      printUsage
    }

    val Array(checkPointDir, topics, brokers) = args
    val sparkConf = new SparkConf().setAppName("SparkOnStreamingToHbase")
    val ssc = new StreamingContext(sparkConf, Seconds(5))

    // Set the Spark Streaming checkpoint directory
    if (!"nocp".equals(checkPointDir)) {
      ssc.checkpoint(checkPointDir)
    }

    val columnFamily = "cf"
    val kafkaParams = Map[String, String](
      "metadata.broker.list" -> brokers
    )

    val topicArr = topics.split(",")
    val topicSet = topicArr.toSet
    // map(_._1) is the message key, map(_._2) is the message value
    val lines = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](ssc, kafkaParams, topicSet).map(_._2)
    lines.foreachRDD(rdd => {
      // Each partition is processed on an executor
      rdd.foreachPartition(iterator => hBaseWriter(iterator, columnFamily))
    })

    ssc.start()
    ssc.awaitTermination()
  }

  
  /**
   * Writes data on the executor side.
   * @param iterator  the messages
   * @param columnFamily  the column family to update
   */
  def hBaseWriter(iterator: Iterator[String], columnFamily: String): Unit = {
    val conf = HBaseConfiguration.create()
    var table: Table = null
    var connection: Connection = null
    try {
      connection = ConnectionFactory.createConnection(conf)
      table = connection.getTable(TableName.valueOf("table1"))
      val iteratorArray = iterator.toArray
      val rowList = new util.ArrayList[Get]()
      for (row <- iteratorArray) {
        val get = new Get(row.getBytes)
        rowList.add(get)
      }
      // 获取table1的数据
      val resultDataBuffer = table.get(rowList)
      // 设置table1的数据
      val putList = new util.ArrayList[Put]()
      for (i <- 0 until iteratorArray.size) {
        val row = iteratorArray(i)
        val resultData = resultDataBuffer(i)
        if (!resultData.isEmpty) {
          // 根据列簇和列，获取旧值
          val aCid = Bytes.toString(resultData.getValue(columnFamily.getBytes, "cid".getBytes))
          val put = new Put(Bytes.toBytes(row))
          // Compute the new value
          val resultValue = row.toInt + aCid.toInt
          put.addColumn(Bytes.toBytes(columnFamily), Bytes.toBytes("cid"), Bytes.toBytes(resultValue.toString))
          putList.add(put)
        }
      }
      if (putList.size() > 0) {
        table.put(putList)
      }
    } catch {
      case e: IOException =>
        e.printStackTrace();
    } finally {
      if (table != null) {
        try {
          table.close()
        } catch {
          case e: IOException =>
            e.printStackTrace();
        }
      }
      if (connection != null) {
        try {
          // Close the HBase connection.
          connection.close()
        } catch {
          case e: IOException =>
            e.printStackTrace()
        }
      }
    }
  }
  

  private def printUsage(): Unit = {
    System.out.println("Usage: {checkPointDir} {topic} {brokerList}")
    System.exit(1)
  }
}
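
The read-modify-write logic inside hBaseWriter can be hard to follow between the connection handling. Here is a minimal Python sketch of just that logic, with a plain dict standing in for table1. The names apply_batch and table1 (as a dict) are assumptions made for illustration and are not part of the HBase API.

```python
def apply_batch(table, messages, column="cid"):
    """Mimic hBaseWriter: for every message whose row key already exists,
    add the numeric key to the stored value and write it back as a string."""
    # Batch "get": look up all row keys first (None when the row is absent).
    results = [table.get(row) for row in messages]
    puts = {}
    for row, result in zip(messages, results):
        if result:  # skip missing or empty rows, like !resultData.isEmpty
            old_value = int(result[column])
            puts[row] = {column: str(int(row) + old_value)}
    # Batch "put": apply all updates at once.
    table.update(puts)
    return len(puts)


# Hypothetical contents of table1: row key -> {column: value}.
table1 = {"1": {"cid": "100"}, "2": {"cid": "200"}}
updated = apply_batch(table1, ["1", "3"])
print(updated, table1)
```

Row "1" exists, so its cid becomes 1 + 100; row "3" does not exist and is skipped, just as the Scala code skips empty results.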

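Author: east

HBase October 26, 2020

HBase Filter Usage Example

Using Filters

Overview

HBase Filters are used to filter data during Scan and Get operations. You set filter conditions, for example on the RowKey, the column name, or the column value.

Sample Code

The following snippet is from the testSingleColumnValueFilter method of the "HBaseSample" class in the com.huawei.bigdata.hbase.examples package.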

public void testSingleColumnValueFilter() {
    LOG.info("Entering testSingleColumnValueFilter.");
    Table table = null;
    ResultScanner rScanner = null;
    try {
        table = conn.getTable(tableName);
        Scan scan = new Scan();
        scan.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"));

        // Set the filter criteria.
        SingleColumnValueFilter filter = new SingleColumnValueFilter(
            Bytes.toBytes("info"), Bytes.toBytes("name"), CompareOp.EQUAL,
            Bytes.toBytes("Xu Bing"));
        scan.setFilter(filter);

        // Submit a scan request.
        rScanner = table.getScanner(scan);

        // Print query results.
        for (Result r = rScanner.next(); r != null; r = rScanner.next()) {
            for (Cell cell : r.rawCells()) {
                LOG.info(Bytes.toString(CellUtil.cloneRow(cell)) + ":"
                    + Bytes.toString(CellUtil.cloneFamily(cell)) + ","
                    + Bytes.toString(CellUtil.cloneQualifier(cell)) + ","
                    + Bytes.toString(CellUtil.cloneValue(cell)));
            }
        }
        LOG.info("Single column value filter successfully.");
    } catch (IOException e) {
        LOG.error("Single column value filter failed ", e);
    } finally {
        if (rScanner != null) {
            // Close the scanner object.
            rScanner.close();
        }
        if (table != null) {
            try {
                // Close the HTable object.
                table.close();
            } catch (IOException e) {
                LOG.error("Close table failed ", e);
            }
        }
    }
    LOG.info("Exiting testSingleColumnValueFilter.");
}
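
To make the matching rule easier to picture, here is a small in-memory Python sketch of what an EQUAL comparison on a single column does. It is an analogy, not the HBase API: the function name, the dict layout, and the filter_if_missing flag are assumptions made for illustration. The flag models HBase's documented behaviour that rows lacking the column pass the filter unless setFilterIfMissing(true) is called.

```python
def single_column_value_filter(rows, family, qualifier, value,
                               filter_if_missing=False):
    """Return the row keys an EQUAL comparison on family:qualifier lets through."""
    matched = []
    for row_key, cells in rows.items():
        cell = cells.get((family, qualifier))
        if cell is None:
            # Column absent: the row passes unless filter_if_missing is set.
            if not filter_if_missing:
                matched.append(row_key)
        elif cell == value:
            matched.append(row_key)
    return matched


# Hypothetical rows: row key -> {(family, qualifier): value}.
rows = {
    "001": {("info", "name"): "Xu Bing"},
    "002": {("info", "name"): "Zhang San"},
    "003": {("info", "age"): "30"},  # no info:name column
}
default_result = single_column_value_filter(rows, "info", "name", "Xu Bing")
strict_result = single_column_value_filter(rows, "info", "name", "Xu Bing",
                                           filter_if_missing=True)
print(default_result, strict_result)
```

In the Java sample the scan is also restricted with addColumn to info:name, which narrows what is returned independently of the filter.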

Notes

Secondary indexes currently do not support using an object of the SubstringComparator class as a Filter comparator.

For example, the following usage is currently not supported:

Scan scan = new Scan();
FilterList filterList = new FilterList(FilterList.Operator.MUST_PASS_ALL);
filterList.addFilter(new SingleColumnValueFilter(
    Bytes.toBytes(columnFamily), Bytes.toBytes(qualifier),
    CompareOp.EQUAL, new SubstringComparator(substring)));
scan.setFilter(filterList);

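Author: east

Hive October 26, 2020

Hive User-Defined Functions

User-Defined Functions

When Hive's built-in functions do not meet your needs, you can write user-defined functions (UDFs) to plug in your own processing code and use it in queries.

By implementation, UDFs fall into the following types:

Ordinary UDF: operates on a single data row and produces a single data row as output.
User-defined aggregating function (UDAF): accepts multiple input rows and produces one output row.
User-defined table-generating function (UDTF): operates on a single input row and produces multiple output rows.

By usage, UDFs fall into the following types:

Temporary function: available only in the current session; it must be re-created after the session restarts.
Permanent function: available across multiple sessions without being re-created each time.

The following uses AddDoublesUDF as an example to show how to write and use a UDF:

Function Description

AddDoublesUDF adds two or more floating-point numbers. The example shows how to write and use a UDF.

Note:

An ordinary UDF must extend "org.apache.hadoop.hive.ql.exec.UDF".
An ordinary UDF must implement at least one evaluate() method; evaluate can be overloaded.
Developing a custom function requires adding the hive-exec-1.3.0.jar dependency to the project; the jar is available in the Hive installation directory.

Sample Code

The following is the UDF sample code: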

package com.huawei.bigdata.hive.example.udf;

import org.apache.hadoop.hive.ql.exec.UDF;

public class AddDoublesUDF extends UDF {
    public Double evaluate(Double... a) {
        Double total = 0.0;
        // Processing logic: add up every non-null argument.
        for (int i = 0; i < a.length; i++) {
            if (a[i] != null) {
                total += a[i];
            }
        }
        return total;
    }
}
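
For readers who want to check the expected results without a Hive cluster, here is a minimal Python sketch of the same evaluate logic. The function name add_doubles is an assumption for illustration; None stands in for SQL NULL.

```python
def add_doubles(*values):
    """Sum the arguments, skipping None (SQL NULL), as evaluate(Double... a) does."""
    total = 0.0
    for value in values:
        if value is not None:
            total += value
    return total


print(add_doubles(1, 2, 3))         # all values present
print(add_doubles(1.5, None, 2.5))  # the None argument is ignored
print(add_doubles())                # no arguments gives the initial total
```

Note that NULL arguments are skipped rather than turning the whole result into NULL, which differs from how the built-in + operator usually treats NULL in SQL.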

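How to Use

Package the program above as AddDoublesUDF.jar and upload it to a directory on HDFS (for example "/user/hive_examples_jars/"). Both the user who creates the function and the users who call it need read permission on the file. Example commands:
hdfs dfs -put ./hive_examples_jars /user/hive_examples_jars
hdfs dfs -chmod 777 /user/hive_examples_jars

Log in to the beeline client as a user with admin privileges and run the following commands:
kinit {Hive service user}
beeline
set role admin;

Define the function in Hive Server. The following statement creates a permanent function:
CREATE FUNCTION addDoubles AS 'com.huawei.bigdata.hive.example.udf.AddDoublesUDF' using jar 'hdfs://hacluster/user/hive_examples_jars/AddDoublesUDF.jar';
Here addDoubles is the alias of the function, used in SELECT queries.
The following statement creates a temporary function:
CREATE TEMPORARY FUNCTION addDoubles AS 'com.huawei.bigdata.hive.example.udf.AddDoublesUDF' using jar 'hdfs://hacluster/user/hive_examples_jars/AddDoublesUDF.jar';
- addDoubles is the alias of the function, used in SELECT queries.
- The keyword TEMPORARY means the function is defined and usable only in the current Hive Server session.

Use the function in Hive Server by running the SQL statement:
SELECT addDoubles(1,2,3);
Note: if using the function after reconnecting the client raises [Error 10011], run the reload function; command and then use the function again.

Delete the function in Hive Server by running the SQL statement:
DROP FUNCTION addDoubles;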

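Author: east

Flink October 26, 2020

Flink Tuning Tips

Data Skew

When data is skewed (one portion of the data is far larger than the rest), task execution times vary widely even though no GC (Garbage Collection) is taking place.

Redesign the key: a finer-grained key makes task sizes more even.
Change the parallelism.
Call the rebalance operation to distribute data evenly across partitions.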
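
One common way to get a finer-grained key is to append a small suffix ("salt") to it. The Python sketch below shows the effect on partition load. It is not Flink code: the toy partitioner (byte sum modulo the partition count), the function names, and the sample keys are all assumptions chosen so the result is easy to verify by hand.

```python
from collections import Counter


def partition_of(key, num_partitions):
    # Toy partitioner (an assumption for this sketch): byte sum modulo partitions.
    return sum(key.encode("utf-8")) % num_partitions


def partition_loads(keys, num_partitions):
    loads = Counter(partition_of(k, num_partitions) for k in keys)
    return [loads.get(p, 0) for p in range(num_partitions)]


def salt_keys(keys, num_salts):
    # Append a round-robin suffix so one hot key becomes num_salts finer keys.
    return [f"{key}#{i % num_salts}" for i, key in enumerate(keys)]


# 90 records share the key "hot"; 10 records have distinct keys.
keys = ["hot"] * 90 + [f"k{i}" for i in range(10)]
before = partition_loads(keys, 4)
after = partition_loads(salt_keys(keys, 4), 4)
print("before salting:", before)
print("after salting: ", after)
```

With salted keys, an aggregation needs a second step that strips the suffix and merges the partial results per original key.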

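Buffer Timeout Setting

While tasks run, data is exchanged over the network. The timeout of the buffers used to pass data between servers can be set with setBufferTimeout.
With "setBufferTimeout(-1)", a buffer is flushed only once it is full, which maximizes throughput. With "setBufferTimeout(0)", data is flushed as soon as it is received, which minimizes latency. With a value greater than 0, the buffer times out after that interval and is then flushed. For example:
env.setBufferTimeout(timeoutMillis);
env.generateSequence(1,10).map(new MyMapper()).setBufferTimeout(timeoutMillis);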
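
The three settings can be compared with a small simulation. The Python sketch below is a simplified model, not Flink's implementation: the function simulate_flushes, the capacity parameter, and the rule that the timeout is checked when the next record arrives are assumptions made to keep the example short.

```python
def simulate_flushes(arrivals, capacity, timeout):
    """Group (timestamp_ms, record) arrivals into the batches a buffer would flush."""
    batches, buf, first_ts = [], [], None
    for ts, record in arrivals:
        # timeout > 0: flush buffered records that have waited long enough.
        if timeout > 0 and buf and ts - first_ts >= timeout:
            batches.append(buf)
            buf = []
        if not buf:
            first_ts = ts
        buf.append(record)
        # timeout == 0 flushes immediately; otherwise flush when the buffer is full.
        if timeout == 0 or len(buf) >= capacity:
            batches.append(buf)
            buf = []
    if buf:
        batches.append(buf)  # flush whatever is left at the end of the input
    return batches


arrivals = [(0, "a"), (1, "b"), (2, "c"), (3, "d"), (4, "e"), (5, "f")]
wait_until_full = simulate_flushes(arrivals, capacity=4, timeout=-1)
flush_at_once = simulate_flushes(arrivals, capacity=4, timeout=0)
flush_on_timeout = simulate_flushes(arrivals, capacity=4, timeout=2)
print(wait_until_full)
print(flush_at_once)
print(flush_on_timeout)
```

Fewer, larger batches favour throughput, while many small batches favour latency. In the -1 case the trailing partial batch appears only because the simulated input ends; a real stream would keep waiting for the buffer to fill.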

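Author: east

Spark October 26, 2020

Spark Streaming Tuning Tips

Spark Streaming Tuning

Scenario

Streaming is a mini-batch stream processing framework whose main characteristics are second-level latency and high throughput. The goal of Streaming tuning is therefore to raise throughput while keeping latency at the second level, processing as much data as possible per unit of time.

Note:

This section applies to scenarios where the input data source is Kafka.

Procedure

A simple stream processing system consists of three components: data source + receiver + processor. Here the data source is Kafka, the receiver is the Kafka source receiver in Streaming, and the processor is Streaming.

Tuning Streaming therefore means optimizing the performance of all three components.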

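Data source tuning
In real deployments the data source persists data to local disk for fault tolerance, whereas Streaming computes entirely in memory, so the data source can easily become the biggest bottleneck of the streaming system. Kafka performance can be tuned in the following ways:
- Use Kafka 0.8.2 or later, which provides the new Producer API with asynchronous mode.
- Configure multiple Broker directories, set multiple I/O threads, and give each Topic a reasonable number of Partitions.
For details, see the "performance tuning" part of the Kafka open source documentation: http://kafka.apache.org/documentation.html

Receiver tuning
Streaming ships receivers for many data sources, such as Kafka, Flume, MQTT, and ZeroMQ. Kafka has the most receiver types and the most mature set of receivers. Kafka offers three receiver API modes:
- KafkaReceiver: receives Kafka data directly; data may be lost if the process fails.
- ReliableKafkaReceiver: records the offsets of received data through ZooKeeper.
- DirectKafka: reads the data of each Kafka Partition directly through RDDs, with high data reliability.
By design DirectKafka should perform best, and in practical tests it does outperform the other two APIs considerably. DirectKafka is therefore the recommended API for implementing the receiver. The receiver acts as a Kafka consumer; for its configuration tuning, see the Kafka open source documentation: http://kafka.apache.org/documentation.html

Processor tuning
Streaming runs on Spark underneath, so most Spark tuning measures also apply to Streaming, for example:
- Data serialization
- Memory configuration
- Parallelism settings
- Using the External Shuffle Service to improve performance
Note: keep one thing in mind when tuning Spark Streaming performance: the harder you push for performance, the lower the overall reliability of Streaming becomes. For example, setting "spark.streaming.receiver.writeAheadLog.enable" to "false" noticeably reduces disk operations and improves performance, but without the WAL mechanism data can be lost during failure recovery. Configuration items that guarantee data reliability must therefore not be disabled in production.

Log archive tuning
The parameter "spark.eventLog.group.size" groups an application's JobHistory logs by a specified number of jobs. Each group gets its own log file, which avoids a single oversized log that JobHistory cannot read when an application runs for a long time. A value of "0" disables grouping. Most Spark Streaming jobs are small and generated quickly, which causes frequent grouping and produces many small log files that consume disk I/O. Increase this value, for example to "1000" or more.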
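
As a small illustration of how the two settings named above might be passed to a job, the Python sketch below renders them as spark-submit "--conf key=value" arguments. The helper name to_conf_args and the chosen values are assumptions; whether "spark.eventLog.group.size" is available depends on the Spark distribution in use.

```python
def to_conf_args(settings):
    """Render a settings dict as spark-submit '--conf key=value' arguments."""
    args = []
    for key, value in sorted(settings.items()):
        args += ["--conf", f"{key}={value}"]
    return args


# Values follow the advice above: keep the WAL on, enlarge the log group size.
settings = {
    "spark.streaming.receiver.writeAheadLog.enable": "true",
    "spark.eventLog.group.size": "1000",
}
conf_args = to_conf_args(settings)
print(" ".join(conf_args))
```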

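Author: east

Spark October 26, 2020

Spark Core Tuning Tips

Use mapPartitions to compute results per partition

If the per-record overhead is too high, for example:

rdd.map { x => val conn = getDBConn; conn.write(x.toString); conn.close }

use mapPartitions to compute the result once per partition instead, for example:

rdd.mapPartitions { records =>
  val conn = getDBConn; records.foreach(item => conn.write(item.toString))
  conn.close; Iterator.empty // mapPartitions must return an iterator
}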
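
The saving comes from how many connections are opened. The Python sketch below counts them for both styles, using nested lists as partitions. FakeConnection and the two writer functions are hypothetical stand-ins, not Spark or database APIs.

```python
class FakeConnection:
    """Stand-in for a database connection; counts how many are opened."""
    opened = 0

    def __init__(self):
        FakeConnection.opened += 1
        self.written = []

    def write(self, record):
        self.written.append(record)

    def close(self):
        pass


def write_per_record(partitions):
    for records in partitions:
        for record in records:
            conn = FakeConnection()  # one connection per record
            conn.write(str(record))
            conn.close()


def write_per_partition(partitions):
    for records in partitions:
        conn = FakeConnection()      # one connection per partition
        for record in records:
            conn.write(str(record))
        conn.close()


partitions = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]

FakeConnection.opened = 0
write_per_record(partitions)
per_record_opens = FakeConnection.opened

FakeConnection.opened = 0
write_per_partition(partitions)
per_partition_opens = FakeConnection.opened

print(per_record_opens, per_partition_opens)
```

Nine records in three partitions need nine connections in the per-record style but only three in the per-partition style.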

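mapPartitions also allows more flexible data handling. For example, to find the TopN of a very large dataset when N is not large, first use mapPartitions to compute the TopN of each partition, collect the results locally, and then sort them to take the final TopN. This is far more efficient than sorting the full dataset to take the TopN.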
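
The two-step TopN can be sketched in plain Python, with nested lists standing in for RDD partitions. The function names are assumptions for illustration; heapq.nlargest plays the role of the per-partition and final selection.

```python
import heapq


def top_n_per_partition(partitions, n):
    # Step 1 (the mapPartitions step): each partition keeps only its own top n.
    return [heapq.nlargest(n, part) for part in partitions]


def top_n(partitions, n):
    # Step 2 (the collect step): merge the small candidate lists, take the top n.
    candidates = [x for part in top_n_per_partition(partitions, n) for x in part]
    return heapq.nlargest(n, candidates)


partitions = [[5, 1, 9, 3], [8, 2, 7], [4, 10, 6]]
result = top_n(partitions, 3)
print(result)
```

Only n values per partition travel to the driver, so the final sort handles at most n times the number of partitions instead of the full dataset.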

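Use coalesce to adjust the number of partitions

coalesce adjusts the number of partitions. The coalesce function takes two parameters:

coalesce(numPartitions: Int, shuffle: Boolean = false)

When shuffle is true, the function behaves like repartition(numPartitions: Int) and repartitions the data through a Shuffle. When shuffle is false, it simply merges several partitions of the parent RDD into the same task for computation; in that case, if numPartitions is greater than the number of partitions of the parent RDD, the partitioning is left unchanged.

Consider the coalesce operator in the following scenarios:

When the preceding operations include many filters, use coalesce to reduce the number of tasks that run on empty data. Use coalesce(numPartitions, false) here, with numPartitions smaller than the number of parent RDD partitions.
When the number of input partitions is so large that the program cannot run properly.
When the number of tasks is so large that Shuffle pressure makes the program hang, or Linux resource limits are hit. The data then needs to be repartitioned with coalesce(numPartitions, true).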
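
The difference between the two modes can be modelled on nested lists. This Python sketch is a simplification: real Spark groups parent partitions with locality in mind and shuffles by hashing, whereas the contiguous grouping and round-robin redistribution here are assumptions made for illustration.

```python
def coalesce(partitions, num_partitions, shuffle=False):
    if shuffle:
        # Redistribute every record round-robin, like a repartition.
        records = [r for part in partitions for r in part]
        out = [[] for _ in range(num_partitions)]
        for i, record in enumerate(records):
            out[i % num_partitions].append(record)
        return out
    if num_partitions >= len(partitions):
        # Without a shuffle the partition count cannot grow.
        return partitions
    # Merge whole parent partitions into num_partitions groups.
    out = [[] for _ in range(num_partitions)]
    for i, part in enumerate(partitions):
        out[i * num_partitions // len(partitions)].extend(part)
    return out


# After heavy filtering, many parent partitions are empty.
parents = [[1], [], [2, 3], [], [], [4]]
merged = coalesce(parents, 2)
unchanged = coalesce(parents, 8)
reshuffled = coalesce(parents, 3, shuffle=True)
print(merged, len(unchanged), reshuffled)
```

Without a shuffle the merged partitions stay uneven (three records against one); with a shuffle the records are spread out, at the cost of moving data.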

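localDir configuration

Spark's Shuffle writes to local disk. Shuffle is the performance bottleneck of Spark, and I/O is the bottleneck of Shuffle. Configuring multiple disks lets data be written to disk in parallel. If a node has several disks mounted, configure one Spark localDir on each disk; this spreads out the Shuffle files and improves disk I/O efficiency. If there is only one disk, configuring multiple directories on it brings little improvement.

Collect only small data

collect is not suitable for large volumes of data.

collect sends the data from the Executors to the Driver, so make sure the Driver has enough memory before calling it, otherwise the Driver process may hit an OutOfMemory error. When the data size is unknown, write the data to HDFS with operations such as saveAsTextFile instead. Use collect only when the data size can be roughly estimated and the driver has ample memory.

Use reduceByKey

reduceByKey performs local aggregation on the Map side, which makes the Shuffle smoother, whereas Shuffle operations such as groupByKey do not aggregate on the Map side. Prefer reduceByKey wherever it applies, and avoid implementations such as groupByKey().map(x=>(x._1,x._2.size)).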
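
The effect of map-side aggregation on shuffle volume can be counted directly. In this Python sketch, nested lists stand in for map-side partitions and the function names are assumptions for illustration; it counts how many (key, count) pairs would cross the shuffle in each style.

```python
from collections import Counter


def shuffled_without_combine(partitions):
    # groupByKey-style: every (key, 1) pair crosses the shuffle.
    return [(key, 1) for part in partitions for key in part]


def shuffled_with_combine(partitions):
    # reduceByKey-style: each map-side partition pre-aggregates its own keys.
    out = []
    for part in partitions:
        out.extend(Counter(part).items())
    return out


def reduce_side(pairs):
    totals = Counter()
    for key, count in pairs:
        totals[key] += count
    return dict(totals)


partitions = [["a", "a", "b", "a"], ["b", "b", "a"], ["a", "a"]]
plain = shuffled_without_combine(partitions)
combined = shuffled_with_combine(partitions)
print(len(plain), len(combined), reduce_side(combined))
```

Both styles give the same counts per key, but the pre-aggregated version sends five pairs through the shuffle instead of nine.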

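Broadcast a map instead of an array

When every record needs a table lookup and the table is passed from the Driver by broadcast, prefer a set/map over an Iterator as the data structure: lookups in a Set/Map are close to O(1), while an Iterator is O(n).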
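
The two lookup styles return the same answers and differ only in cost per lookup. The Python sketch below contrasts a linear scan over pairs with a dict lookup. The names lookup_in_list and enrich and the sample table are assumptions for illustration, and no broadcast mechanism is modelled.

```python
def lookup_in_list(pairs, key):
    # Linear scan: O(n) comparisons per lookup.
    for k, v in pairs:
        if k == key:
            return v
    return None


def enrich(records, table_lookup):
    # Attach the looked-up value to every record.
    return [(r, table_lookup(r)) for r in records]


pairs = [(i, f"name-{i}") for i in range(1000)]  # array-like lookup table
as_map = dict(pairs)                             # hash-based lookup table

records = [3, 999, 1234]
via_list = enrich(records, lambda k: lookup_in_list(pairs, k))
via_map = enrich(records, as_map.get)
print(via_map)
```

Looking up key 999 in the list walks through a thousand entries, while the dict finds it through a single hash probe; across millions of records this dominates the per-record cost.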

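Data Skew

When data is skewed (one portion of the data is far larger than the rest), task execution times vary widely even though no GC (Garbage Collection) is taking place.

Redesign the key: a finer-grained key makes task sizes more even.
Change the parallelism.

Optimize data structures

Store data by column so that reads scan only the columns they need.
When using Hash Shuffle, set spark.shuffle.consolidateFiles to true to merge the shuffle intermediate files. This reduces the number of shuffle files and file I/O operations and improves performance. The final number of files then depends on the number of reduce tasks rather than on map tasks × reduce tasks.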

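Author: east

python, Artificial Intelligence, Data Mining October 8, 2020

Polynomial Regression Implemented in Python

Polynomial regression here builds on the previous post, "Implementing Linear Regression in Python Source Code with Plotting". The goal is to implement the polynomial below.

It can be implemented with matrix multiplication.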
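
Judging from to_matrix, f, and E in the code that follows, the model is presumably a quadratic in the standardized input. Under that assumption, the polynomial, its matrix form, the objective, and the update rule can be written as below; the symbols x^{(i)}, y^{(i)}, and \eta (for ETA) are notation chosen here, not taken from the post.

```latex
% Model: quadratic polynomial with parameters theta_0, theta_1, theta_2
f_\theta(x) = \theta_0 + \theta_1 x + \theta_2 x^2

% Matrix form: one row [1, x, x^2] per training sample (what to_matrix builds)
X = \begin{bmatrix}
      1 & x^{(1)} & (x^{(1)})^2 \\
      \vdots & \vdots & \vdots \\
      1 & x^{(n)} & (x^{(n)})^2
    \end{bmatrix},
\qquad
\theta = \begin{bmatrix} \theta_0 \\ \theta_1 \\ \theta_2 \end{bmatrix},
\qquad
f_\theta(X) = X\theta

% Objective (function E) and gradient-descent update (loop body)
E(\theta) = \frac{1}{2} \sum_{i=1}^{n} \left( y^{(i)} - f_\theta(x^{(i)}) \right)^2,
\qquad
\theta := \theta - \eta \, X^{\top} (X\theta - y)
```

The update matches the line theta = theta - ETA * np.dot(f(X) - train_y, X), since np.dot of the residual vector with X gives the same components as X^T (X\theta - y).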

The code is as follows:

import numpy as np
import matplotlib.pyplot as plt

# Load the training data
train = np.loadtxt('click.csv', delimiter=',', dtype='int', skiprows=1)
train_x = train[:,0]
train_y = train[:,1]

# Standardize
mu = train_x.mean()
sigma = train_x.std()
def standardize(x):
    return (x - mu) / sigma

train_z = standardize(train_x)

# Initialize the parameters
theta = np.random.rand(3)

# Build the training data matrix with columns [1, x, x^2]
def to_matrix(x):
    return np.vstack([np.ones(x.size), x, x ** 2]).T

X = to_matrix(train_z)

# Prediction function
def f(x):
    return np.dot(x, theta)

# Objective function
def E(x, y):
    return 0.5 * np.sum((y - f(x)) ** 2)

# Learning rate
ETA = 1e-3

# Change in error between iterations
diff = 1

# Number of updates
count = 0

# Repeat the parameter update until the change in error falls below 0.01
error = E(X, train_y)
while diff > 1e-2:
    # Update the parameters
    theta = theta - ETA * np.dot(f(X) - train_y, X)

    # Compute the change from the previous error
    current_error = E(X, train_y)
    diff = error - current_error
    error = current_error

    # Print a log line
    count += 1
    log = 'Iteration {}: theta = {}, diff = {:.4f}'
    print(log.format(count, theta, diff))

# Plot to check the fit
x = np.linspace(-3, 3, 100)
plt.plot(train_z, train_y, 'o')
plt.plot(x, f(to_matrix(x)))
plt.show()

The final output looks like this:

Author: east

Request header name	Value
Content-Type	application/json;charset=utf-8
Accept	application/json;charset=utf-8

Category archive: Big Data Development