Apache HBase is a column-oriented NoSQL database based on Google's Bigtable paper. It runs on top of Hadoop and thus supports fast storage and retrieval of data.

An HBase schema is different from a SQL schema, in which a table has a fixed number of columns that each hold data of a specific type and size. In HBase all data is stored as bytes, and a table is divided into column families, each of which can hold any number of columns. The schema should be defined so that data that is accessed together sits in the same column family. HBase stores the data of a column family together, so this design lets it retrieve that data faster.
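The layout described above can be pictured with a small Python sketch. This is only a toy in-memory model, not the HBase API: the nested dictionary, the helper `read_family`, and the sample row are all made up for illustration.

```python
# Toy in-memory model of an HBase-style table (not the HBase API).
# Layout: row key -> column family -> column name -> value, everything as bytes.
table = {
    b"user1": {
        b"info": {b"name": b"Alice", b"age": b"30"},
        b"contact": {b"email": b"alice@example.com"},
    },
}


def read_family(table, row_key, family):
    """Return every column of one column family for a row in a single lookup."""
    return table[row_key][family]


info = read_family(table, b"user1", b"info")
# Values come back as raw bytes; the application decides how to decode them.
age = int(info[b"age"].decode("utf-8"))
print(info, age)
```

Columns that are read together (`name` and `age` here) live in the same family, so one lookup returns both.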

Being a NoSQL database, Apache HBase has a flexible column schema, i.e. we specify the column name each time we insert data, so the set of columns can grow every time a user inserts data into HBase. HBase also does not store a null value for a column whose data is not specified, which saves space.
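A rough way to see both points is the sketch below. It is a toy model in plain Python, not the HBase API; the `put` helper and the sample rows are assumptions made for the example.

```python
# Toy model (not the HBase API): each row stores only the columns it was given.
rows = {}


def put(row_key, column, value):
    """Insert one column value; the column name is chosen at write time."""
    rows.setdefault(row_key, {})[column] = value


put("row1", "info:name", "Alice")
put("row1", "info:city", "Pune")
put("row2", "info:name", "Bob")
put("row2", "info:phone", "555-0100")  # a new column appears with this insert

all_columns = sorted({col for cols in rows.values() for col in cols})
stored_cells = sum(len(cols) for cols in rows.values())
print(all_columns)   # three distinct columns across the table
print(stored_cells)  # 4 cells stored, not 2 rows x 3 columns = 6
```

The two missing cells (`info:phone` for `row1`, `info:city` for `row2`) are simply absent rather than stored as nulls.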

HBase has an unusual property called versions, i.e. every value in HBase is stored with the timestamp at which it was written, so a column can keep several timestamped versions of its data.
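Versioning can be sketched roughly as follows. Again this is a toy model and not the HBase API; the `put` and `get` helpers, the `max_versions` parameter, and the integer timestamps are illustrative assumptions.

```python
# Toy model (not the HBase API): a cell keeps a list of (timestamp, value) versions.
cell_versions = {}


def put(row_key, column, value, timestamp):
    """Add a new version of a cell, tagged with the time it was written."""
    versions = cell_versions.setdefault((row_key, column), [])
    versions.append((timestamp, value))
    versions.sort(reverse=True)  # newest first


def get(row_key, column, max_versions=1):
    """Return up to max_versions (timestamp, value) pairs, newest first."""
    return cell_versions.get((row_key, column), [])[:max_versions]


put("row1", "info:city", "Pune", timestamp=100)
put("row1", "info:city", "Mumbai", timestamp=200)

latest = get("row1", "info:city")
history = get("row1", "info:city", max_versions=2)
print(latest)
print(history)
```

A plain read returns only the newest value, while asking for more versions exposes the older writes as well.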

The HBase shell has the following important commands.

  1. create
create '<t_n>', '<c_f1>', '<c_f2>', ...

The most important point about the above syntax is that the table name and the column family names are enclosed in single quotes, and that only column families are specified at the time of table creation.

  2. put

put '<t_n>', '<row-id>', '<c_f>:<col_n>', '<value>'

The above command is used to insert data into HBase. It differs starkly from a SQL database, where one of the columns is the primary key and the values of all columns are inserted at once. In HBase the row-id plays the role of the primary key, and we insert one column value at a time.

  3. get

get '<t_n>', '<row-id>'

The above command is used to get the row with the given row-id.

  4. scan

scan '<t_n>', { COLUMNS => ['<cf_1>', '<cf_2>', ...], FILTER => "<filter>" }

It returns all the records, with their columns, that pass the specified filter.

  5. alter

alter '<t_n>', <prop>

  6. delete

delete '<t_n>', '<row-id>', '<cf_1>:<col_n>'
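
To get a feel for what these commands do, here is a minimal Python sketch that mimics create, put, get, scan and delete on an in-memory table. It is not the HBase API or the shell itself: the `ToyTable` class, its method signatures, and the sample data are all hypothetical and exist only for illustration.

```python
# Toy in-memory stand-in for the shell commands above (not the HBase API).
class ToyTable:
    def __init__(self, name, *families):
        # Like `create`: only column families are fixed at creation time.
        self.name = name
        self.families = set(families)
        self.rows = {}

    def put(self, row_id, column, value):
        # Like `put`: writes one 'family:column' cell per call.
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows.setdefault(row_id, {})[column] = value

    def get(self, row_id):
        # Like `get`: returns all stored cells of one row.
        return dict(self.rows.get(row_id, {}))

    def scan(self, families=None, value_filter=None):
        # Like `scan`: walks rows in row-id order, keeping cells that
        # belong to the requested families and pass the filter.
        result = {}
        for row_id in sorted(self.rows):
            cells = {
                col: val
                for col, val in self.rows[row_id].items()
                if (families is None or col.split(":", 1)[0] in families)
                and (value_filter is None or value_filter(val))
            }
            if cells:
                result[row_id] = cells
        return result

    def delete(self, row_id, column):
        # Like `delete`: removes a single cell from a row.
        self.rows.get(row_id, {}).pop(column, None)


users = ToyTable("users", "info", "contact")
users.put("row1", "info:name", "Alice")
users.put("row1", "contact:email", "alice@example.com")
users.put("row2", "info:name", "Bob")

row1 = users.get("row1")
names = users.scan(families={"info"}, value_filter=lambda v: v.startswith("A"))
users.delete("row1", "contact:email")
row1_after = users.get("row1")
print(row1, names, row1_after)
```

Note how `put` touches one cell at a time and how `delete` removes a single cell while leaving the rest of the row in place.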