( 출처 : 연세대학교 데이터베이스 시스템 수업 (CSI6541) 강의자료 )

Chapter 9. Indexing and Hashing

Contents

Basic Concepts
Ordered Indices
..

(1) Basic Concepts

Indexing mechanisms

used to speed up access to desired data

Search Key :

set of attributes used to look up records in a file

Index file …

consists of records (called index entries) of the form search-key & pointer
typically much smaller than the original file

2 basic kinds of indices

(1) ordered indices
- search keys are stored in sorted order
(2) hash indices
- search keys are distributed uniformly across “buckets” using a “hash function”

(2) Ordered Indicies

( stored in sorted order of search key value )

If the file containing the records is sequentially ordered…

(1) primary index

( = also called clustering index )
- index whose search key also defines the sequential order of the file
(2) secondary index

( = also called non-clustering index )
- index whose search key specifies an order different from the sequential order of the file

Indexed sequential file :

sequential file with a primary index

Primary Index : Dense Index Files

data의 “모든 key가 index에서 표현”
Dense Index : “key들을 data의 순서와 동일하게 유지”
Index : key & pointer만을 보유

\(\rightarrow\) data 자체보다 훨씬 적은 공간 소모

Primary Index : Sparse Index Files

(Dense Index) 모든 key를 가지고 있기엔, data가 너무 많음

\(\rightarrow\) 이를 보완하기 위한 것이 Sparse Index
data (X) data block (O) 마다 1개의 key-pointer 쌍을 가짐
장 & 단 ( vs. Dense Index )
- 장) 훨씬 적은 공간을 사용
- 단) 주어진 record의 key를 찾는데 더 많은 시간 걸림

Primary Index : Multilevel Index Files

Secondary Index

(3) B\(^{+}\)-Tree Index Files

( = alternative to indexed-sequential files )

Indexed-sequential files의 단점

Performance degrades as file grows

( many overflow blocks get created for index files )
Periodic reorganization of entire index file is required

B\(^{+}\)-Tree Index Files의 장점

automatically reorganizes itself with small and local changes
Reorganization of entire file is not required to maintain performance

B\(^{+}\)-Tree Index Files의 단점

extra insertion and deletion overhead, space overhead

B\(^{+}\)-Tree Index Files의 특징

All paths from root to leaf are of the same length
Each node that is not a root or a leaf has between \(\lceil\mathrm{n} / 2\rceil\) and \(n\) children
A leaf node that is not a root has between \(n\lceil(n-1) / 2\rceil\) and \(n-1\) values
Root must have at least two children

Node의 구조

\(K_i\) : search-key
\(P_i\) : pointers …
- to children ( for non-leaf nodes )
- to records ( for leaf nodes )
\(K_i\)s in a node are ordered as…
- \(K_1 < \cdots K_{n-1}\).

Leaf Nodes

( For \(i\) = 1, 2, …, n-1 ) \(P_i\) either points to …
- (1) a file record with \(K_i\)
- (2) a bucket of pointers to file records
( \(L_i\) : leaf node \(i\) ) if \(i < j\) , \(L_i\)’s search-key < \(L_j\)’s search-key

Non-Leaf Nodes

form a multi-level sparse index on the leaf nodes.
For a non-leaf node with \(m\) pointers …
- (1) All the search-keys in the subtree to which \(P_1\) points < \(K_1\)
- (2) All the search-keys in the subtree to which \(P_m\) points \(\geq\) \(K_{m-1}\)
- (3) ( For \(2 \leq i \leq m-1\) )
  - \(K_{i-1}\) \(\leq\) All the search-keys in the subtree to which \(P_i\) points <\(K_{i}\)

Queries On B\(^+\)-Trees

B\(^+\)-Tree Insertion

filled from bottom
Perform a search to find a target leaf node for the new entry
- (a) if the leaf node is NOT FULL
  - add the entry
- (b) if the leaf node is FULL
  - b-1) split the node into 2 parts
  - b-2)
    - 1st half entries are stored in one node
    - 2nd half entries are moved to a new node
  - b-3) The first entry of a new node is copied to the parent of the leaf
If a non-leaf node overflows:
- b-1) SAME
- b-2) SAME
- b-3) The first entry of a new node is moved to the parent of the node

B\(^+\)-Tree Deletion

The target entry is searched &deleted at the leaf node
If underflow occurs after deletion,

\(\rightarrow\) distribute the entries from the node left to it
If distribution is not possible from left,

\(\rightarrow\) distribute from the node right to it
If distribution is not possible from left or from right,

\(\rightarrow\) merge the node with left and right to it

(4) B-Tree Index Files

B-Tree vs B\(^+\)-Tree

Details

allows search-key values to appear only once

( = eliminates redundant storage of search keys )
search keys in non-leaf nodes appear no where else in the B-tree
- an additional pointer field for each search key in a non-leaf node must be included

Advantages of B-Tree indices:

use less tree nodes than a corresponding B+-Tree
Sometimes possible to find search-key value before reaching leaf node

Disadvantages of B-Tree indices:

Non-leaf nodes are larger ( keeps data in it )

-> typically have greater depth than corresponding B+-Tree
Insertion and deletion more complicated than in B+-Trees
Implementation is harder than B+-Trees

Twitter Facebook LinkedIn

(Chapter 9) Indexing and Hashing

Seunghan Lee