Wednesday, May 29, 2013

HBase

A great hello-world tutorial on getting started with HBase and Hadoop can be found here.
This is my summary of the post, with notes:
Installing the SSH server:
sudo apt-get install openssh-server

Create the Hadoop user:
sudo addgroup hadoop
sudo adduser --ingroup hadoop huser

Generate the user's public key:
# Log in as the Hadoop user
sudo -i -u huser
# Create the Hadoop user's key pair (empty passphrase)
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
# Append the generated public key to ~/.ssh/authorized_keys
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

Setting Up HDFS
# Create a directory to hold the HDFS files
mkdir /home/huser/my_hdfs_folder

Update the Hadoop config with the HDFS directory
Note: hadoop.tmp.dir is used as the base for temporary directories locally, and also in HDFS.
The following configuration (these properties normally go in core-site.xml) sets the created directory as the HDFS directory.

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/huser/my_hdfs_folder</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://ubuntu:8020</value>
  </property>
</configuration>

# Format HDFS
/usr/local/hadoop/bin/hadoop namenode -format
# Start the single-node Hadoop instance
/usr/local/hadoop/bin/start-all.sh

Check that the Hadoop instance is alive at the following URL:
http://ubuntu:50070/dfshealth.jsp


Setting up HBase
HBase needs a directory inside HDFS.
Create it with the hadoop fs -mkdir command, for example:
/usr/local/hadoop/bin/hadoop fs -mkdir myHbase
Because the path is relative, HDFS creates it under the current user's home directory, here /user/huser/myHbase.
Point to the new HDFS directory in the HBase site configuration file, hbase-site.xml:

<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://ubuntu:8020/user/huser/myHbase</value>
    <description></description>
  </property>
  <property>
    <name>hbase.master</name>
    <value>ubuntu:60000</value>
    <description></description>
  </property>
</configuration>

Start the HBase DB
/usr/local/hbase/bin/start-hbase.sh
Check that it is alive:
http://ubuntu:60010/master-status
Starting the Shell
/usr/local/hbase/bin/hbase shell

Create and update a simple table
# Create a new table named myBlogs with a column family named BlogText
create 'myBlogs','BlogText'
# Insert some data
put 'myBlogs','Ruby','BlogText:1','About ruby bla bla.'
put 'myBlogs','Ruby','BlogText:2','about {|X| bla bal.'
put 'myBlogs','Ruby','BlogText:3','for loops.'
put 'myBlogs','Python','BlogText:1','iter tools .'
The following code queries the HBase table created above:

package my.learn.hbase;

import java.util.NavigableMap;
import java.util.NavigableSet;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTableFactory;
import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseReadMyBlogsData {

  // The table name
  public static final byte[] TABLE_MY_BLOGS = Bytes.toBytes("myBlogs");
  // The column family
  public static final byte[] BLOG_TEXT_FAMILY = Bytes.toBytes("BlogText");

  public static void main(String[] args) throws Exception {
    new HBaseReadMyBlogsData().showTheBlogsText();
  }

  private void showTheBlogsText() throws Exception {

    // Loads the hbase-site.xml config found on the classpath
    Configuration config = HBaseConfiguration.create();
    // Factory for creating HTable instances
    HTableFactory factory = new HTableFactory();

    // Throws an exception if HBase is not reachable
    HBaseAdmin.checkHBaseAvailable(config);

    // Link to the table
    HTableInterface table = factory.createHTableInterface(config,
        TABLE_MY_BLOGS);
    try {
      // A Scan with no restrictions retrieves every row in the table
      Scan scan = new Scan();
      ResultScanner rs = table.getScanner(scan);
      try {
        // Loop through each retrieved row
        for (Result r = rs.next(); r != null; r = rs.next()) {
          // Print out the row key
          System.out.println("Key: " + Bytes.toString(r.getRow()));

          // For each row, loop over its qualifiers;
          // for the "Ruby" row these are 1, 2 and 3
          NavigableMap<byte[], byte[]> familyMap = r
              .getFamilyMap(BLOG_TEXT_FAMILY);
          // The sorted set of qualifier keys
          NavigableSet<byte[]> keySet = familyMap.navigableKeySet();

          // Print out the value stored under each qualifier
          for (byte[] key : keySet) {
            System.out.println("\t Qualifier: " + Bytes.toString(key)
                + ", Value: "
                + Bytes.toString(r.getValue(BLOG_TEXT_FAMILY, key)));
          }
        }
      } finally {
        rs.close();
      }
    } finally {
      table.close();
    }
  }
}

Notes:
HBaseAdmin provides an interface for managing HBase table metadata, plus general administrative functions such as creating, dropping, listing, enabling and disabling tables.
HBaseAdmin can also be used to add and drop a table's column families.
