Understanding B-TREE INDEXING…..

Posted: March 19, 2012 in Uncategorized

The essence of our technique is quite simple. Rows
are assigned tag values in the order in which they are
added to the table. Note that tag values identify rows in
a table, not records in an individual partition or in an
individual index. Each tag value appears precisely once
in each index. All vertical partitions are stored in B-tree
format with the tag value as the leading key. The important
novel aspect is how storage of this leading key is
reduced to practically zero.
The essence of our technique is that in each B-tree
page, the page header stores the lowest tag value among
all B-tree entries on that page, and the actual tag value
for each individual B-tree entry is calculated by adding
this value and the slot number of the entry within the
page. There is no need to store the tag value in the individual
B-tree entries; only a single tag value is required
per page. If a page contains tens, hundreds, or even
thousands of B-tree entries, the overhead for storing the
minimal tag value is practically zero for each individual
record. If the size of the row identifier is 4 or 8 bytes
and the size of a B-tree node is 8 KB, the per-page row
identifier imposes an overhead of 0.1% or less.
If all the records in a page have consecutive tag
values, this method not only solves the storage problem
but also reduces “search” for a particular key value in
the index to a little bit of arithmetic followed by a direct
access to the desired B-tree entry. Thus, the access performance
in leaf pages of these B-trees can be even better
than that achieved with interpolation search or in
hash indexes.
If records in a page do not have consecutive tag
values, the proposed method does not work, at least not
immediately. There are multiple ways to design for possible
gaps in the sequence of tags. One way is to prohibit
and avoid gaps, e.g., by means of a strict requirement
that rows in the table are only appended at the end
or they are deleted only in the order in which they were
added. Gaps in the tag sequence within one index usually
imply that the rows in the table lack consecutive tag
values and that the same problem exists in all B-tree
representing vertical partitions. Alternatively, gaps may
occur due to missing values or Null values.
A second way for dealing with gaps in the sequence
of tags is to retain ghost records in the B-tree pages.
During deletion of a single row in the table and the corresponding
records in all the table’s indexes, the B-tree be considered, including concurrency control and recovery,
bulk insertions and deletions, online index creation,
index creation with allocation-only logging, verification
to guard against corruption due to hardware or
software faults, etc.

Additional applications
In the discussions above, a single table was partitioned
vertically and tags assigned per table. Alternatively,
a table may be partitioned in multiple steps. The
first step groups columns into subsets and sort order
defined for each subset. Tags are assigned based on this
sort order. The second step partitions each subset into
storage structures, e.g., B-trees on tag columns with the
compression feature described earlier. Thus, in this storage
architecture, the earlier discussions apply not to a
traditional logical table or view but to each vertical partition
of such a table or view.
In more traditional database settings, there are some
real-world business processes in which sequential numbers
or identifiers are common, important, or even legally
required. For example, orders, invoices, cheques,
etc. fall into this category. For databases that describe
these real-world objects, indexes that map real-world
identifiers to additional information can benefit from the
compression method described. Even if there are large
gaps in the overall sequence, e.g., in vehicle identification
numbers, data in many index pages will be short
coherent sequences that may benefit. For small gaps,
ghost slots as described above might be sufficient.
In other words to maximize effective scan bandwidth,
column stores should be 100% full. Thus,
changes should be initially retained elsewhere using
techniques like differential files [SL 76] and then applied
in bulk. For efficient capture, e.g., during bulk
loading, the changes should likely be in row format
rather than column format [SAB 05].
A recent study of master-detail clustering within Btree
indexes [G 07] found that the proposed compression
method applies even there. In other words, multiple
columns can be stored in a single B-tree in a columnoriented
format rather than the traditional row-oriented
form. Each column is assigned to key range within the
B-tree such that the individual columns are concatenated.
Each record’s key consists of column identifier
and row identifier. After the column identifier has been
truncated using prefix truncation [BU 77], the row identifier
can be truncated using the presented design. Even
recent insertions and deletions in row format can be
represented in the same B-tree in yet another key range.
Finally, independent of the scan performance in relational
data warehousing environments, vertical partitioning
and columnar storage using B-trees as described
automatically turns row-level locking into column-level
locking.

Ref: http://sigmod.acm.org

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s