( 출처 : 연세대학교 데이터베이스 시스템 수업 (CSI6541) 강의자료 )

Chapter 10. Indexing and Hashing (2)

Contents

Static Hashing
Dynamic Hashing
Comparison of Ordered Indexing and Hashing
Multiple-Key Access
Bitmap Index

(1) Static Hashing

( https://huzz.tistory.com/60 )

Bucket?

unit of storage containing one or more records
( typically a disk block )

In hash file organization…

obtain the bucket directly from its search-key value using a “hash function”

Records with different search-key values may be mapped to the SAME bucket

$\rightarrow$ $\therefore$ entire bucket has to be searched sequentially

Hash Organization

( Key : dpet_name )

Hash Indices

Hash can be used for both..

(1) file organization
(2) index-structure creation

Hash Index organizes the search keys

Deficiencies of Static Hashing

Static Hashing

hash function maps search-key values to a fixed set of bucket addresses

Databases grow with time

$\rightarrow$ If initial number of buckets is too small, performance will degrade due to too many overflows
What if LARGE space at the first place ?
- significant amount of space will be wasted initially
- If database shrinks, space will be wasted
Option : re-organization of the index file with a NEW hash function

$\rightarrow$ verey expensive

$\rightarrow$ solved by using techniques that allow the number of buckets to be modified “dynamically”

Blog Summary

Introduction

Bucket : 여러 개의 records를 저장
Hash function ( = h ) : search key K를 받아 bucket address B를 계산

Static hashing

h(K) 계산해서 bucket을 찾아간 후,
해당 bucket 안에 있는 모든 records에 대해 일치하는 records가 있는지 검사

Worst Possible Hash Function

case : h(K)값이 모두 한 값으로 몰리게 되어 모든 records가 한 bucket에 저장
즉, 저장되어있는 모든 records를 전수 조사하는 꼴

$\rightarrow$ 좋은 hash function을 선택해야!

좋은 hash function의 2가지 조건

1. Uniform: 가능한 모든 search key에 대해서 각 bucket으로 할당되는 key들의 개수가 균일해야 한다.
2. Random: 평균적으로, 각 bucket은 거의 비슷한 수의 key들이 할당되어야!
- ex) string 길이, 알파벳 순 등외적 인 조건들을 이용한 함수 (X),
  
  Random하게 계산되는 함수 (O)

Handling of Bucket Overflows

Bucket : 처음에 지정해준 size 만큼만 저장

$\rightarrow$ 계속 insert하다보면 언젠가 bucket이 꽉 차게 될 것 ( = Bucket Overflow )
해결책 :
- (1) closed hashing
- (2) open hashing

1. Closed hashing:

b1이 꽉 찼다면, 새로운 bucket b2을 만들고 b1 뒤에 b2를 연결
bucket들을 linked list 같이 다루는 방식을 overflow chaining이라 함

2. Open hashing:

( 위처럼 뒤에 연결하는 것이 아니라 ) 일련의 규칙에 의해 다른 bucket에 record를 저장
ex) 선형(또는 제곱)으로 몇 칸을 더 뒤로 간다든가(linear/quadratic probing) hash function을 한 번 더 적용한다든가 등

Open hashing ( vs. Closed hashing )

장점 ) 새로운 공간을 덜 사용
단점) deletion이 빈번히 일어나는 DB에 부적절

$\rightarrow$ 일반적으로 closed hashing을 선호

(2) Dynamic Hashing

( https://huzz.tistory.com/61 )

Details :

Suitable for DB that grows and shrinks in size
Allows the hash function to be modified dynamically
Ex) Extendable Hashing

Extendable Hashing

Hash function : generates values over a large range
use only a prefix of the hash function to index into a bucket address table

( 아래 참조하기 )

Blog Summary

Drawback of Static Hashing

Current size에 맞게 hash function을 구성

$\rightarrow$ data 증가에 따른 성능 저하
Anticipated size에 맞게 hash function을 구성

$\rightarrow$ Anticipated size에 도달하기 전까지는 wasted space가 발생
Size에 맞게 hash function을 변경하고 reorganization을 수행

$\rightarrow$ TOO EXPENSIVE & reorganization 동안 data access 불가

Solution : Extendable hashing ( dynamic hashing의 기법 )

extendable : 확장 가능한
hash function h를 잘 선택해야!
- (1) unformity & randomness를 잘 만족
- (2) hash value가 32bit의 binary integer로 나타나게끔
  - hash value와 정확히 1:1로 대응하는 bucket (X)
  - 적당히 32bit의 prefix에 해당하는 n비트만을 사용 (O)
    - n비트 이용 시 : 총 $2^n$개의 bucket
    - 이 prefix bits들과 bucket들의 주소를 연결해주는 bucket address table을 함께 만들게됨
lookup(search), insertion, deletion은 모두 간단하게 구현
요약 :
- (1) Search key를 hash function의 input으로 넣음
- (2) output 값의 앞 n비트만을 마치 hash value인 것처럼 사용
문제 : bucket overflow 시?
- 간단하게 bucket을 추가로 만들고, 사용하는 prefix 비트 수를 n 대신 n+1로

Example

https://www.youtube.com/watch?v=Bxfo2LwOIj4 참고하기

Extendable Hashing ( vs. Other Schemes )

Pros

Hash performance does not degrade with growth of file
Minimal space overhead

Cons

Extra level of indirection to find desired record
Bucket address table may itself become very big
Changing size of bucket address table is an expensive operation

(3) Comparison of Ordered Indexing and Hashing

(1) conditioning “=” $\rightarrow$ Hashing

(2) conditioning “>”, “<” $\rightarrow$ Ordered Indexing

(4) Multiple-Key Access

ex)

SELECT	ID
FROM	instructor
WHERE	dept_name = “Finance” and  salary = 80000

Possible strategies ( using indices on single attributes )

Use index on dept_name to find all instructors in the Finance department & test salary = 80000
Use index on salary to find all instructors with salary of $$80000 & test dept_name = “Finance”
Use dept_name index to find pointers to all records pertaining to the Finance department.

Similarly use index on salary.

Take intersection of both sets of pointers obtained

index on combined search-key (dept_name, salary)

fetch only records that satisfy both conditions
Using separate indices is less efficient
- may fetch many records that satisfy only one of the conditions
Efficient ex ) where dept_name = “Finance” and salary < 80000
Inefficient ex ) where dept_name < “Finance” and salary = 80000

(5) Bitmap Index

( Bitmap = an array of bits )

special type of index designed for efficient querying on multiple keys
records in a relation are assumed to be numbered sequentially
Applicable on attributes that take on a relatively small number of distinct values
- ex) gender, country, state, …
Given a number n it must be easy to retrieve record n

simplest form

a bitmap index on an attribute has a bitmap for each value of the attribute

Bitmap has as many bits as records ( = nunique )
1 if the record has the value v for the attribute, and is 0 o.w

Details

Queries are answered using bitmap operations

AND, OR, NOT

Each operation takes two bitmaps of the same size

ex) Males with income level L1
- 10010 AND 10100 = 10000

Twitter Facebook LinkedIn

(Chapter 11) Indexing and Hashing (2)

Seunghan Lee

Chapter 10. Indexing and Hashing (2)

(1) Static Hashing

Hash Organization

Hash Indices

Deficiencies of Static Hashing

Blog Summary

(2) Dynamic Hashing

Blog Summary

Example

Extendable Hashing ( vs. Other Schemes )

(3) Comparison of Ordered Indexing and Hashing

(4) Multiple-Key Access

index on combined search-key (dept_name, salary)

(5) Bitmap Index

simplest form

Details

You May Also Enjoy