HBase – Page 2 – gitweixin

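Hbase March 1, 2021

HBase: Querying with a Secondary Index

Overview

For user tables that have a secondary index, you can query data with a Filter. Such queries perform better than the same queries against a user table without a secondary index.

Secondary indexes are used according to the following rules: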

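When single-column indexes have been created on one or more columns:
- When a query filters on these columns, the index is used to speed up the query regardless of whether the conditions are combined with AND or OR. For example: Filter_Condition(IndexCol1) AND/OR Filter_Condition(IndexCol2)
- When a query filters on "indexed column AND non-indexed column", the index is used to speed up the query. For example: Filter_Condition(IndexCol1) AND Filter_Condition(IndexCol2) AND Filter_Condition(NonIndexCol1)
- When a query filters on "indexed column OR non-indexed column", the index is not used and the query gains nothing from it. For example: Filter_Condition(IndexCol1) AND/OR Filter_Condition(IndexCol2) OR Filter_Condition(NonIndexCol1)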
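One way to read the single-column rules above is as a recursive check over the filter expression: a condition on an indexed column can use the index, an AND can use it if at least one operand can, and an OR can use it only if every operand can. The sketch below models only that reading in plain Java. IndexUsage, Expr, col, and, and or are hypothetical names chosen for illustration, and this is not how HIndex itself decides.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical model of the single-column index rules; not HIndex code. */
final class IndexUsage {

    /** A filter expression that reports whether an index can serve it. */
    interface Expr {
        boolean canUseIndex(Set<String> indexedColumns);
    }

    /** Leaf: a filter condition on one column. */
    static Expr col(final String name) {
        return indexed -> indexed.contains(name);
    }

    /** AND: a single indexed operand is enough to narrow the scan. */
    static Expr and(final Expr... operands) {
        return indexed -> Arrays.stream(operands).anyMatch(e -> e.canUseIndex(indexed));
    }

    /** OR: every operand must be indexed, otherwise all rows must be read anyway. */
    static Expr or(final Expr... operands) {
        return indexed -> Arrays.stream(operands).allMatch(e -> e.canUseIndex(indexed));
    }

    public static void main(String[] args) {
        Set<String> indexed = new HashSet<>(Arrays.asList("IndexCol1", "IndexCol2"));

        Expr bothIndexed = or(col("IndexCol1"), col("IndexCol2"));
        Expr andNonIndexed = and(col("IndexCol1"), col("IndexCol2"), col("NonIndexCol1"));
        Expr orNonIndexed = or(and(col("IndexCol1"), col("IndexCol2")), col("NonIndexCol1"));

        System.out.println(bothIndexed.canUseIndex(indexed));   // true
        System.out.println(andNonIndexed.canUseIndex(indexed)); // true
        System.out.println(orNonIndexed.canUseIndex(indexed));  // false
    }
}
```

The three expressions in main correspond, in order, to the three bullet points above.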
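When a composite index has been created on multiple columns:
- When the columns used in the query are some or all of the columns of the composite index, and they appear in the same order as in the index, the index is used to speed up the query. For example, with a composite index on C1, C2, and C3 (IndexCol1, IndexCol2, and IndexCol3 below), the index takes effect for:
  Filter_Condition(IndexCol1) AND Filter_Condition(IndexCol2) AND Filter_Condition(IndexCol3)
  Filter_Condition(IndexCol1) AND Filter_Condition(IndexCol2)
  Filter_Condition(IndexCol1)
  The index does not take effect for:
  Filter_Condition(IndexCol2) AND Filter_Condition(IndexCol3)
  Filter_Condition(IndexCol1) AND Filter_Condition(IndexCol3)
  Filter_Condition(IndexCol2)
  Filter_Condition(IndexCol3)
- When a query filters on "indexed column AND non-indexed column", the index is used to speed up the query. For example:
  Filter_Condition(IndexCol1) AND Filter_Condition(NonIndexCol1)
  Filter_Condition(IndexCol1) AND Filter_Condition(IndexCol2) AND Filter_Condition(NonIndexCol1)
- When a query filters on "indexed column OR non-indexed column", the index is not used and the query gains nothing from it. For example:
  Filter_Condition(IndexCol1) OR Filter_Condition(NonIndexCol1)
  (Filter_Condition(IndexCol1) AND Filter_Condition(IndexCol2)) OR (Filter_Condition(NonIndexCol1))
- When a range query spans several columns, only the last column of the composite index may specify a value range; the columns before it must use "=". For example, with a composite index on C1, C2, and C3, a range can be set only on C3, so the filter condition takes the form "C1=XXX, C2=XXX, C3=value range".
For user tables with a secondary index, Filter-based queries on single-column and composite indexes return the same results as queries without an index, and they perform better than queries on user tables without a secondary index.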
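The composite-index rules above amount to a leading-prefix check on the filtered columns, plus a constraint on where a range condition may sit. The following plain-Java sketch models both under that reading. CompositeIndexRule and its methods are hypothetical names, and the code illustrates the documented rules rather than reproducing HIndex internals.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical model of the composite-index rules; not HIndex code. */
final class CompositeIndexRule {

    /** True when the filtered columns are a non-empty leading prefix of the index columns. */
    static boolean isLeadingPrefix(List<String> indexColumns, List<String> filterColumns) {
        if (filterColumns.isEmpty() || filterColumns.size() > indexColumns.size()) {
            return false;
        }
        return indexColumns.subList(0, filterColumns.size()).equals(filterColumns);
    }

    /**
     * Takes one operator per index column, in index order. Every column before
     * the last one must use "=", so only the last column may carry a range.
     */
    static boolean isValidRangeQuery(List<String> operators) {
        for (int i = 0; i < operators.size() - 1; i++) {
            if (!"=".equals(operators.get(i))) {
                return false;
            }
        }
        return !operators.isEmpty();
    }

    public static void main(String[] args) {
        List<String> index = Arrays.asList("C1", "C2", "C3");
        System.out.println(isLeadingPrefix(index, Arrays.asList("C1", "C2")));   // true
        System.out.println(isLeadingPrefix(index, Arrays.asList("C2", "C3")));   // false
        System.out.println(isLeadingPrefix(index, Arrays.asList("C1", "C3")));   // false
        System.out.println(isValidRangeQuery(Arrays.asList("=", "=", "range"))); // true
        System.out.println(isValidRangeQuery(Arrays.asList("=", "range", "="))); // false
    }
}
```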

Code Sample

The following snippet is from the testScanDataByIndex method of the HBaseSample class in the com.huawei.hadoop.hbase.example package:

Example: finding data with a secondary index

  public void testScanDataByIndex() {
    LOG.info("Entering testScanDataByIndex.");
    Table table = null;
    ResultScanner scanner = null;
    try {
      table = conn.getTable(tableName);
      
      // Create a filter for indexed column.
      Filter filter = new SingleColumnValueFilter(Bytes.toBytes("info"), Bytes.toBytes("name"),
          CompareOp.EQUAL, Bytes.toBytes("Li Gang"));
      Scan scan = new Scan();
      scan.setFilter(filter);
      scanner = table.getScanner(scan);
      LOG.info("Scan indexed data.");
      
      for (Result result : scanner) {
        for (Cell cell : result.rawCells()) {
          LOG.info(Bytes.toString(CellUtil.cloneRow(cell)) + ":"
              + Bytes.toString(CellUtil.cloneFamily(cell)) + ","
              + Bytes.toString(CellUtil.cloneQualifier(cell)) + ","
              + Bytes.toString(CellUtil.cloneValue(cell)));
        }
      }
      LOG.info("Scan data by index successfully.");
    } catch (IOException e) {
      LOG.error("Scan data by index failed.", e);
    } finally {
      if (scanner != null) {
        // Close the scanner object.
        scanner.close();
      }
      try {
        if (table != null) {
          table.close();
        }
      } catch (IOException e) {
        LOG.error("Close table failed.", e);
      }
    }
    
    LOG.info("Exiting testScanDataByIndex.");
  }

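Notes

A secondary index must be created on the name field beforehand.

HBase: Creating a Secondary Index

Overview

HBase secondary indexes are normally managed by calling methods of org.apache.hadoop.hbase.hindex.client.HIndexAdmin, which provides the methods for creating an index.

Note:

A secondary index cannot be modified. To change one, delete the old index and create a new one.

Code Sample

The following snippet is from the createIndex method of the HBaseSample class in the com.huawei.bigdata.hbase.examples package.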

  public void createIndex() {
    LOG.info("Entering createIndex.");
    String indexName = "index_name";

    // Create the index specification. The class and factory names used here
    // follow the enableIndex and Note [1] snippets in this article.
    TableIndices tableIndices = new TableIndices();
    HIndexSpecification iSpec = new HIndexSpecification(indexName);
    iSpec.addIndexColumn(new HColumnDescriptor("info"), "name", ValueType.String); // Note [1]
    tableIndices.addIndex(iSpec);

    HIndexAdmin iAdmin = null;
    Admin admin = null;
    try {
      admin = conn.getAdmin();
      iAdmin = HIndexClient.newHIndexAdmin(admin);
      // Add the index to the table.
      iAdmin.addIndices(tableName, tableIndices);
      LOG.info("Create index successfully.");
    } catch (IOException e) {
      LOG.error("Create index failed ", e);
    } finally {
      if (admin != null) {
        try {
          admin.close();
        } catch (IOException e) {
          LOG.error("Close admin failed ", e);
        }
      }
      if (iAdmin != null) {
        try {
          // Close the HIndexAdmin object.
          iAdmin.close();
        } catch (IOException e) {
          LOG.error("Close admin failed ", e);
        }
      }
    }
    LOG.info("Exiting createIndex.");
  }

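A newly created secondary index is disabled by default. To enable a specific secondary index, refer to the following snippet, which is from the enableIndex method of the HBaseSample class in the com.huawei.bigdata.hbase.examples package.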

  public void enableIndex() {
    LOG.info("Entering enableIndex.");

    // Name of the index to be enabled
    String indexName = "index_name";

    List<String> indexNameList = new ArrayList<String>();
    indexNameList.add(indexName);
    HIndexAdmin iAdmin = null;
    try {
      iAdmin = HIndexClient.newHIndexAdmin(conn.getAdmin());
      // Alternately, enable the specified indices
      iAdmin.enableIndices(tableName, indexNameList);
      System.out.println("Successfully enable indices " + indexNameList + " of the table " + tableName);
    } catch (IOException e) {
      System.out.println("Failed to enable indices " + indexNameList + " of the table " + tableName + "." + e);
    } finally {
      if (iAdmin != null) {
        try {
          iAdmin.close();
        } catch (IOException e) {
          LOG.error("Close admin failed ", e);
        }
      }
    }
  }

Notes

Note [1]: creating a composite index

HBase supports creating a secondary index on multiple fields, for example on the columns name and age.

HIndexSpecification iSpecUnite = new HIndexSpecification(indexName);
iSpecUnite.addIndexColumn(new HColumnDescriptor("info"), "name", ValueType.String);
iSpecUnite.addIndexColumn(new HColumnDescriptor("info"), "age", ValueType.String);

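HBase Full-Text Index Support

Overview

Use the createTable method of an org.apache.luna.client.LunaAdmin object to create a table together with its index, specifying the table name, the column family name, the index creation request, and the directory that contains the mapping file. You can also use addCollection to add an index to an existing table. To query, obtain a Table object through the getTable method of the org.apache.luna.client.LunaAdmin object and run a scan on it.

Note:

Table column names and column family names must not contain special characters; they may consist of letters, digits, and underscores.

Limitations of HBase tables with a full-text index:

1. Multiple instances are not supported.

2. Disaster recovery backup and restore are not supported.

3. Deleting rows or column families is not supported.

4. Queries on the Solr side are not strongly consistent.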
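The naming rule in the note above (letters, digits, and underscores only) is easy to check on the client before a table is created. The helper below is a minimal sketch of such a check. LunaNameCheck and isValidName are hypothetical names and not part of the Luna client API, and the check assumes that "letters" means ASCII letters.

```java
import java.util.regex.Pattern;

/** Hypothetical helper; the rule comes from the note above, the names do not. */
final class LunaNameCheck {

    // One or more ASCII letters, digits, or underscores (assumed reading of the rule).
    private static final Pattern VALID_NAME = Pattern.compile("[A-Za-z0-9_]+");

    /** Returns true when the column or column family name contains no special characters. */
    static boolean isValidName(String name) {
        return name != null && VALID_NAME.matcher(name).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidName("user_info_2021")); // true
        System.out.println(isValidName("user-info"));      // false
        System.out.println(isValidName(""));               // false
    }
}
```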

Code Sample

The following snippet is from the testFullTextScan method of the LunaSample class in the com.huawei.bigdata.hbase.examples package.

  public static void testFullTextScan() throws Exception {
    /**
     * Create create request of Solr. Specify collection name, confset name,
     * number of shards, and number of replication factor.
     */
    Create create = new Create();
    create.setCollectionName(COLLECTION_NAME);
    create.setConfigName(CONFSET_NAME);
    create.setNumShards(NUM_OF_SHARDS);
    create.setReplicationFactor(NUM_OF_REPLICATIONFACTOR);
    /**
     * Create mapping. Specify index fields(mandatory) and non-index
     * fields(optional).
     */
    List<ColumnField> indexedFields = new ArrayList<ColumnField>();
    indexedFields.add(new ColumnField("name", "f:n"));
    indexedFields.add(new ColumnField("cat", "f:t"));
    indexedFields.add(new ColumnField("features", "f:d"));
    Mapping mapping = new Mapping(indexedFields);
    /**
     * Create table descriptor of HBase.
     */
    HTableDescriptor desc = new HTableDescriptor(HBASE_TABLE);
    desc.addFamily(new HColumnDescriptor(TABLE_FAMILY));
    /**
     * Create table and collection at the same time.
     */
    LunaAdmin admin = null;
    try {
      admin = new AdminSingleton().getAdmin();
      admin.deleteTable(HBASE_TABLE);
      if (!admin.tableExists(HBASE_TABLE)) {
        admin.createTable(desc, Bytes.toByteArrays(new String[] { "0", "1", "2", "3", "4" }),
            create, mapping);
      }
      /**
       * Put data.
       */
      Table table = admin.getTable(HBASE_TABLE);
      int i = 0;
      while (i < 5) {
        byte[] row = Bytes.toBytes(i + "+sohrowkey");
        Put put = new Put(row);
        put.addColumn(TABLE_FAMILY, Bytes.toBytes("n"), Bytes.toBytes("ZhangSan" + i));
        put.addColumn(TABLE_FAMILY, Bytes.toBytes("t"), Bytes.toBytes("CO" + i));
        put.addColumn(TABLE_FAMILY, Bytes.toBytes("d"), Bytes.toBytes("Male, Leader of M.O" + i));
        table.put(put);
        i++;
      }

      /**
       * Scan table.
       */
      Scan scan = new Scan();
      SolrQuery query = new SolrQuery();
      query.setQuery("name:ZhangSan1 AND cat:CO1");
      Filter filter = new FullTextFilter(query, COLLECTION_NAME);
      scan.setFilter(filter);
      ResultScanner scanner = table.getScanner(scan);
      LOG.info("-----------------records----------------");
      for (Result r = scanner.next(); r != null; r = scanner.next()) {
        for (Cell cell : r.rawCells()) {
          LOG.info(Bytes.toString(CellUtil.cloneRow(cell)) + ":"
              + Bytes.toString(CellUtil.cloneFamily(cell)) + ","
              + Bytes.toString(CellUtil.cloneQualifier(cell)) + ","
              + Bytes.toString(CellUtil.cloneValue(cell)));
        }
      }
      LOG.info("-------------------end------------------");
      /**
       * Delete collection.
       */
      admin.deleteCollection(HBASE_TABLE, COLLECTION_NAME);

      /**
       * Delete table.
       */
      admin.deleteTable(HBASE_TABLE);
    } catch (IOException e) {
      e.printStackTrace();
    } finally {
      /**
       * When everything done, close LunaAdmin.
       */
      if (admin != null) {
        admin.close();
      }
    }
  }

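Explanation

(1) Create the index request.

(2) Create the table descriptor.

(3) Obtain the LunaAdmin object. LunaAdmin provides operations for creating a table with an index, adding an index, checking whether a table exists, checking whether an index exists, deleting an index, and deleting a table.

(4) Call the LunaAdmin method that creates the table.

(5) Insert data into the table.

(6) Build the full-text query condition, set the FullTextFilter, and run the query.

(7) Delete the index.

(8) Delete the table.

(9) Close the admin resource.

Notes

The table and the index to be created must not already exist.
You must use LunaAdmin to obtain the Table object for scan operations.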

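Author: east

Hbase October 26, 2020

HBase: An Example of Using Filters

Overview

HBase Filters are mainly used to filter data during Scan and Get operations. You apply them by setting filter conditions, for example on the RowKey, the column name, or the column value.

Code Sample

The following snippet is from the testSingleColumnValueFilter method of the HBaseSample class in the com.huawei.bigdata.hbase.examples package.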

  public void testSingleColumnValueFilter() {
    LOG.info("Entering testSingleColumnValueFilter.");
    Table table = null;
    ResultScanner rScanner = null;
    try {
      table = conn.getTable(tableName);
      Scan scan = new Scan();
      scan.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"));
      // Set the filter criteria.
      SingleColumnValueFilter filter = new SingleColumnValueFilter(
          Bytes.toBytes("info"), Bytes.toBytes("name"), CompareOp.EQUAL,
          Bytes.toBytes("Xu Bing"));
      scan.setFilter(filter);
      // Submit a scan request.
      rScanner = table.getScanner(scan);
      // Print query results.
      for (Result r = rScanner.next(); r != null; r = rScanner.next()) {
        for (Cell cell : r.rawCells()) {
          LOG.info(Bytes.toString(CellUtil.cloneRow(cell)) + ":"
              + Bytes.toString(CellUtil.cloneFamily(cell)) + ","
              + Bytes.toString(CellUtil.cloneQualifier(cell)) + ","
              + Bytes.toString(CellUtil.cloneValue(cell)));
        }
      }
      LOG.info("Single column value filter successfully.");
    } catch (IOException e) {
      LOG.error("Single column value filter failed ", e);
    } finally {
      if (rScanner != null) {
        // Close the scanner object.
        rScanner.close();
      }
      if (table != null) {
        try {
          // Close the HTable object.
          table.close();
        } catch (IOException e) {
          LOG.error("Close table failed ", e);
        }
      }
    }
    LOG.info("Exiting testSingleColumnValueFilter.");
  }

Notes

Secondary indexes currently do not support using an object of the SubstringComparator class as the comparator of a Filter.

For example, the usage shown in the following sample is currently not supported:

Scan scan = new Scan();
FilterList filterList = new FilterList(FilterList.Operator.MUST_PASS_ALL);
filterList.addFilter(new SingleColumnValueFilter(
    Bytes.toBytes(columnFamily), Bytes.toBytes(qualifier),
    CompareOp.EQUAL, new SubstringComparator(substring)));
scan.setFilter(filterList);

Author: east

Hbase January 1, 2020

A Utility Class for Common HBase Operations


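import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HbaseUtil {

    private static Configuration conf = null;

    static {
        setConf();
    }

    // Load the cluster settings from <working directory>/conf/conf.xml.
    private static void setConf() {
        conf = HBaseConfiguration.create();
        String userDir = System.getProperty("user.dir") + File.separator + "conf" + File.separator;
        Path hconf_path = new Path(userDir + "conf.xml");
        conf.addResource(hconf_path);
    }

    public static Connection getConn() throws IOException {
        return ConnectionFactory.createConnection(conf);
    }

    /**
     * Closes the scanner, the table, and the connection, in that order.
     * @param table   table to close, may be null
     * @param conn    connection to close, may be null
     * @param scanner scanner to close, may be null
     */
    private static void closeSource(Table table, Connection conn, ResultScanner scanner) {
        try {
            if (scanner != null) scanner.close();
            if (table != null) table.close();
            if (conn != null) conn.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    /**
     * Track query: looks up rows by table name, MAC address, start time, and end time.
     * The rows are read into a list before the scanner is closed, because a
     * ResultScanner cannot be used after its table and connection are closed.
     * @param tableName name of the table to scan
     * @param mac       MAC address that prefixes the row key
     * @param startTime start time (inclusive)
     * @param endTime   end time (exclusive, because the stop row is exclusive)
     * @return the matching rows
     * @throws IOException if the scan fails
     */
    public static List<Result> scan(String tableName, String mac, long startTime, long endTime) throws IOException {
        Connection conn = null;
        Table table = null;
        ResultScanner scanner = null;
        List<Result> results = new ArrayList<Result>();
        try {
            conn = HbaseUtil.getConn();
            table = conn.getTable(TableName.valueOf(tableName));
            Scan scan = new Scan();

            // Row keys are assumed to have the form mac + timestamp.
            scan.setStartRow(Bytes.toBytes(mac + startTime));
            scan.setStopRow(Bytes.toBytes(mac + endTime));

            scanner = table.getScanner(scan);
            for (Result result : scanner) {
                results.add(result);
            }
        } finally {
            closeSource(table, conn, scanner);
        }
        return results;
    }
}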
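The scan above relies on HBase ordering row keys byte by byte. When the timestamp is appended as a plain decimal string, that ordering matches numeric ordering only if all timestamps have the same number of digits. The sketch below shows the pitfall and one common remedy, a fixed-width zero-padded timestamp. TrackRowKey, of, and compareBytes are hypothetical names, and the "mac + timestamp" layout with non-negative timestamps is an assumption taken from the utility class, not a recommendation from the original post.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

/** Hypothetical row key helper for a "mac + timestamp" layout. */
final class TrackRowKey {

    /** Builds mac + zero-padded timestamp so that byte order equals numeric order. */
    static String of(String mac, long timestamp) {
        // 19 digits is enough for any non-negative long.
        return mac + String.format(Locale.ROOT, "%019d", timestamp);
    }

    /** Unsigned byte-wise comparison, the same kind of ordering HBase applies to row keys. */
    static int compareBytes(String a, String b) {
        byte[] x = a.getBytes(StandardCharsets.UTF_8);
        byte[] y = b.getBytes(StandardCharsets.UTF_8);
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int d = (x[i] & 0xff) - (y[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return x.length - y.length;
    }

    public static void main(String[] args) {
        String mac = "AA-BB-CC-DD-EE-FF";
        // Unpadded: the key for 9 sorts after the key for 10, although 9 < 10.
        System.out.println(compareBytes(mac + 9L, mac + 10L) > 0);       // true
        // Padded: the key order matches the numeric order.
        System.out.println(compareBytes(of(mac, 9L), of(mac, 10L)) < 0); // true
    }
}
```

With millisecond timestamps of equal length the unpadded form happens to work, which is why the problem can stay hidden until timestamps of different lengths are mixed.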

Author: east

Hbase January 1, 2020

How to Use the HBase Row Key Filter RowFilter

RowFilter filters rows by rowkey. The comparison operators are as follows:

Operator	Description
LESS	Less than
LESS_OR_EQUAL	Less than or equal to
EQUAL	Equal to
NOT_EQUAL	Not equal to
GREATER_OR_EQUAL	Greater than or equal to
GREATER	Greater than
NO_OP	No operation (excludes everything)

Comparator	Description
BinaryComparator	Compares using Bytes.compareTo().
BinaryPrefixComparator	Similar to BinaryComparator, but compares only the leading bytes (prefix match).
NullComparator	Does not compare against an actual value but whether a given one is null, or not null.
BitComparator	Performs a bitwise comparison, providing a BitwiseOp class with AND, OR, and XOR operators.
RegexStringComparator	Matches the value against a regular expression.
SubstringComparator	Treats the value as a string and tests it with contains().

Select rows whose rowkey ends with 01:
Filter filter = new RowFilter(CompareFilter.CompareOp.EQUAL, new RegexStringComparator(".*01$"));

Select rows whose rowkey contains 201407:
Filter filter = new RowFilter(CompareFilter.CompareOp.EQUAL, new SubstringComparator("201407"));

Select rows whose rowkey starts with 123:
Filter filter = new RowFilter(CompareFilter.CompareOp.EQUAL, new BinaryPrefixComparator("123".getBytes()));
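To see which row keys the three filters above would keep without a running cluster, the matching logic can be approximated in plain Java. The sketch below does not use the HBase classes. RowKeyMatch and its methods are hypothetical names, and details such as case handling and character sets in the real comparators are deliberately left out.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

/** Plain-Java approximation of three RowFilter comparators; not the HBase classes. */
final class RowKeyMatch {

    /** Approximates RegexStringComparator with EQUAL: the pattern is searched in the key. */
    static boolean regex(String rowKey, String pattern) {
        return Pattern.compile(pattern).matcher(rowKey).find();
    }

    /** Approximates SubstringComparator with EQUAL: a containment test. */
    static boolean substring(String rowKey, String part) {
        return rowKey.contains(part);
    }

    /** Approximates BinaryPrefixComparator with EQUAL: only the leading bytes are compared. */
    static boolean prefix(String rowKey, String prefix) {
        byte[] key = rowKey.getBytes(StandardCharsets.UTF_8);
        byte[] p = prefix.getBytes(StandardCharsets.UTF_8);
        if (p.length > key.length) {
            return false;
        }
        for (int i = 0; i < p.length; i++) {
            if (key[i] != p[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(regex("20140701", ".*01$"));     // true
        System.out.println(regex("20140710", ".*01$"));     // false
        System.out.println(substring("a20140701", "201407")); // true
        System.out.println(prefix("12345", "123"));         // true
        System.out.println(prefix("012345", "123"));        // false
    }
}
```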

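Author: east

Hbase December 14, 2018

Working with HBase through the Java API

The example uses a fully distributed cluster with a standalone ZooKeeper ensemble rather than the ZooKeeper instance bundled with HBase.

The Maven dependencies are as follows: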

<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-client</artifactId>
    <version>1.4.8</version>
</dependency>

<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-server</artifactId>
    <version>1.4.8</version>
</dependency>

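Inserting and querying data in HBase. The code below is Scala that calls the Java API, and it assumes the table "test" with column family "cf1" already exists: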

import java.text.SimpleDateFormat
import java.util

import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client._
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp
import org.apache.hadoop.hbase.filter.{FilterList, SingleColumnValueFilter}
import org.apache.hadoop.hbase.util.Bytes

/**
  * Working with HBase through the Java API
  */
object HBaseTool {
  val zkQuorum = "192.168.0.219"
  val port = "2181"
  val table = "test"
  val cf = "cf1"
  val config = HBaseConfiguration.create()
  config.set("hbase.zookeeper.property.clientPort", port)
  config.set("hbase.zookeeper.quorum", zkQuorum)
  // Optional: clients normally locate the master through ZooKeeper.
  // 60000 is the legacy default master port; HBase 1.0 and later default to 16000.
  config.set("hbase.master", zkQuorum + ":60000")

  def putData(rowKey:String, cf:String = cf, kv:Seq[(String,String)]): Put ={
    val put = new Put(Bytes.toBytes(rowKey))
    kv.foreach{ kv =>
      put.addColumn(Bytes.toBytes(cf), Bytes.toBytes(kv._1), Bytes.toBytes(kv._2))
    }
    put
  }

  def getData(rowKey:String, qualifier:String =null): Get={
    val get = new Get(Bytes.toBytes(rowKey))
    if(qualifier != null)
      get.addColumn(Bytes.toBytes(cf), Bytes.toBytes(qualifier))
    get
  }

  def getScan(startRow:String, stopRow:String, filters:Map[String,String]= null, columns:Seq[String]= null): Scan= {
    var filterList: FilterList = null
    val scan = new Scan()
      .setStartRow(Bytes.toBytes(startRow))
      .setStopRow(Bytes.toBytes(stopRow))

    if(filters != null){
      filterList = getFilters(filters)
      if(filterList.getFilters.size() > 0)
        scan.setFilter(filterList)
    }

    if(columns != null) {
      columns.foreach{ column =>
        scan.addColumn(Bytes.toBytes(cf), Bytes.toBytes(column))
      }
    }

    scan
  }

  def getFilters(kv:Map[String,String]): FilterList = {
    val filterList = new FilterList()
    kv.toSeq.foreach{ kv =>
      val filter = new SingleColumnValueFilter(
        Bytes.toBytes(cf),
        Bytes.toBytes(kv._1),
        CompareOp.EQUAL,
        Bytes.toBytes(kv._2)
      )
      filter.setFilterIfMissing(true)
      filterList.addFilter(filter)
    }
    filterList
  }

  def main(args: Array[String]): Unit = {
    val rowKey = new SimpleDateFormat("yyyy-MM-dd").format(System.currentTimeMillis())
    val testSchema = Seq("id","name","age")
    val testData = Seq("10001","Jack","22")
    /**
      * 1. Insert data into HBase
      */
    val hTable: HTable = new HTable(config, TableName.valueOf(table))
    hTable.setAutoFlush(false)
    hTable.setWriteBufferSize(10 * 1024 * 1024)
    // Write the row
    hTable.put(putData(rowKey, cf=cf, testSchema zip testData))

    hTable.flushCommits()

    /**
      * 2. Query HBase with Get
      */
    val result: Result = hTable.get(getData(rowKey))
    for(kv <- result.raw()) {
      println("key="+Bytes.toString(kv.getQualifier)+", value="+Bytes.toString(kv.getValue))
    }

    val result1: Result = hTable.get(getData(rowKey, testSchema.toList(2)))
    for(kv <- result1.raw()) {
      println("value="+Bytes.toString(kv.getValue))
    }

    /**
      * 3. Query HBase with Scan
      */
    val scan = getScan(rowKey, rowKey)
    val resultScan: ResultScanner = hTable.getScanner(scan)
    val ite: util.Iterator[Result] = resultScan.iterator()
    while(ite.hasNext) {
      val result = ite.next()
      for(kv <- result.raw()) {
        println("rowKey="+Bytes.toString(kv.getRow)+", key="+Bytes.toString(kv.getQualifier)+", value="+Bytes.toString(kv.getValue))
      }
    }

    // Release the scanner before closing the table
    resultScan.close()
    hTable.close()
  }



}

Author: east
