Essay / IMPROVING DATA LOCALITY AND AVAILABILITY IN HBASE...
The optimal allocation of regions to region servers is calculated by monitoring the locality index of each region, and the master should make an overall decision when assigning regions to a region server. A maximum-flow, minimum-cost algorithm can be followed in this situation for optimal performance. The minimum-cost maximum-flow problem asks, for a given flow network, for a maximum flow whose total traversal cost is as small as possible. It combines maximum flow (sending as much flow as possible from source to sink) with shortest path (reaching the sink from the source at minimum cost). In this document, the cost represents the performance resources consumed, which should be minimized, and the maximum flow should be directed toward the region server with the highest locality index for a region. Region allocation should therefore preserve data locality while also keeping performance in mind.

4.2 Optimization of locality index

There are three main factors by which we can compare the locality index in an HBase cluster:

1. Graph: Consider a bipartite graph with the set of regions on the left side and the set of region servers on the right side. Each node in the region set is connected to a source node. Each node in the region server set is connected to a sink node, so that both sides are connected in the same way.

2. Capacity: Capacity is the amount of flow that the connection between any two nodes can carry. The capacity between a source node and a region node is one, and the capacity between a region node and a region server node is also one. The purpose of assigning a capacity of one is to indicate that each region can be assigned to only one region server. Neve...... middle of article......

In this article we have considered the case where data locality is lost when a region server restarts and regions are then allocated randomly after the HMaster has checked the META table. The four violations listed above are more than enough reasons for data locality to be lost. The region locality index method records a locality index for each region, which the HMaster can look up in the META table when it reassigns that region. The region server with the highest locality index for a region is assigned that region, and as a result data locality can be preserved even after a cluster restart or after a region server fails due to overload. Future work is expected to focus on preserving data locality in almost all scenarios, so that remote reads and writes are avoided more often.
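
As an illustration of the minimum-cost maximum-flow formulation from Section 4.2, the following is a minimal sketch rather than the method actually used by the HMaster. It assumes that a locality index between 0 and 1 is already known for every region and region server pair, and it uses a plain successive-shortest-paths search chosen here for brevity. The function name `assign_regions`, the `max_per_server` limit on the edges into the sink, and the cost scaling `(1 - locality) * 1000` are all hypothetical choices made for this example and do not come from HBase.

```python
from collections import deque


def assign_regions(locality, servers, max_per_server):
    """Map each region to a server via min-cost max-flow (illustrative only).

    locality: {region: {server: locality index in [0, 1]}}  (assumed input)
    """
    regions = list(locality)
    source, sink = "source", "sink"
    # node -> list of edges [target, capacity, cost, index of reverse edge]
    graph = {source: [], sink: []}

    def add_edge(u, v, cap, cost):
        graph.setdefault(u, []).append([v, cap, cost, len(graph.setdefault(v, []))])
        graph[v].append([u, 0, -cost, len(graph[u]) - 1])

    for r in regions:
        add_edge(source, ("r", r), 1, 0)  # capacity 1: a region is placed once
        for s in servers:
            # Assumed cost model: higher locality index means lower cost.
            cost = round((1.0 - locality[r].get(s, 0.0)) * 1000)
            add_edge(("r", r), ("s", s), 1, cost)
    for s in servers:
        add_edge(("s", s), sink, max_per_server, 0)

    while True:
        # Queue-based Bellman-Ford: cheapest path in the residual graph.
        dist, prev = {source: 0}, {}
        queue, queued = deque([source]), {source}
        while queue:
            u = queue.popleft()
            queued.discard(u)
            for i, (v, cap, cost, _) in enumerate(graph[u]):
                if cap > 0 and dist[u] + cost < dist.get(v, float("inf")):
                    dist[v] = dist[u] + cost
                    prev[v] = (u, i)
                    if v not in queued:
                        queued.add(v)
                        queue.append(v)
        if sink not in dist:
            break  # no augmenting path left: the flow is maximum
        # Each path carries one unit, since source->region edges have capacity 1.
        v = sink
        while v != source:
            u, i = prev[v]
            graph[u][i][1] -= 1
            graph[v][graph[u][i][3]][1] += 1
            v = u

    assignment = {}
    for r in regions:
        for v, cap, _, _ in graph[("r", r)]:
            if v != source and cap == 0:  # saturated region->server edge
                assignment[r] = v[1]
    return assignment


if __name__ == "__main__":
    example = {
        "region_a": {"rs1": 0.9, "rs2": 0.2},
        "region_b": {"rs1": 0.8, "rs2": 0.1},
        "region_c": {"rs1": 0.3, "rs2": 0.7},
    }
    print(assign_regions(example, ["rs1", "rs2"], max_per_server=2))
```

With a limit of two regions per server, each region in the example lands on the server where its locality index is highest. Lowering `max_per_server` forces a trade-off, and the search then picks the assignment with the smallest total cost, which is the sense in which locality is preserved while performance is kept in mind.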