數(shù)據(jù)庫管理系統(tǒng)概述英文版課件:9 Introduction to Indexing and Hash Index_第1頁
數(shù)據(jù)庫管理系統(tǒng)概述英文版課件:9 Introduction to Indexing and Hash Index_第2頁
數(shù)據(jù)庫管理系統(tǒng)概述英文版課件:9 Introduction to Indexing and Hash Index_第3頁
數(shù)據(jù)庫管理系統(tǒng)概述英文版課件:9 Introduction to Indexing and Hash Index_第4頁
數(shù)據(jù)庫管理系統(tǒng)概述英文版課件:9 Introduction to Indexing and Hash Index_第5頁
已閱讀5頁,還剩54頁未讀, 繼續(xù)免費閱讀

下載本文檔

版權說明:本文檔由用戶提供并上傳,收益歸屬內容提供方,若內容存在侵權,請進行舉報或認領

文檔簡介

1、COMP2311COMP231File and Index StructureCOMP2312OutlineDisks and FilesIndexes and DatabasesHash FilesCOMP2313Disks and FilesDBMS stores information on (“hard”) disks.A disk is a sequence of bytes, each has a disk address.READ: transfer data from disk to main memory (RAM).WRITE: transfer data from RAM

2、 to disk.Data are stored and retrieved in units called disk blocks or pages.Each page has a fixed size, say 512 bytes. It contains a sequence of records.In many (not always) cases, records in a page have the same size, say 100 bytes.COMP2314Why Not Store Everything in Main Memory?Costs too much. RAM

3、 is much more expensive than disk.Main memory is volatile. We want data to be saved between runs. Typical storage hierarchy:Main memory (RAM) for currently used data.Disk for the main database (secondary storage).Tapes for archiving older versions of the data (tertiary storage).COMP2315Components of

4、 a DiskPlattersSpindleDisk headArm movementArm assemblyTracksSectorBlock size is a multiple of sector size (which is fixed).COMP2316File OrganizationDatabase is stored as a collection of files. Each file is a sequence of records. A record is a sequence of fields.one file one tableone record one tupl

5、ea record/tuple has a fixed lengthEasy to implement but limited by the file systemCOMP2317Fixed-Length RecordsSimple approach:store record i starting from byte n(i - 1), where n is the size of each record.Record access is simple but records may cross disk blocks.When record i is deleted, how do you

6、handle the released space?Shifting records:move records i+1,n to i, n-1move record n to ilink all free records on a free listiCOMP2318Sequential File OrganizationSuitable for application that require sequential processing of the entire file The records in the file are ordered by a search-keyCOMP2319

7、Sequential File Organization (cont.)Deletion use pointer chainsInsertion must locate the position in the file where the record is to be recordif there is free space insert thereif no free space, insert the record in an overflow blockIn either case, pointer chain must be updatedNeed to reorganize the

8、 file from time to time to restore sequential orderCOMP23110Clustering File OrganizationGood for queries involving depositor customer, and for queries involving one single customer and his accountsBad for queries involving only customerResult in variable size recordsSimple file structure stores each

9、 relation in a separate file can instead store several relations in one file using a clustering file organizationE.g., clustering organization of customer and depositor.1 clustercustomerdepositorCOMP23111OutlineDisks and FilesIndexes and DatabasesHash FilesHierarchy structure12Indexes and DatabasesA

10、 table(conceptual)文件綱要Index 1(Ordered indices, B-tree, hash)Index 2(Ordered indices, B-tree, hash)Records are physicallystored in a hash tablebased on a selected keyCOMP23113Basic ConceptsIndexing mechanisms speed up access to desired dataE.g. If you know the call number of a book, you can go direct

11、ly to the book shelve in the library; otherwise you have to search through all the book shelvesSearch key attributes used to look up records in a file.An index file consists of records (called index entries) of the format:關鍵字 物理內存COMP23114Basic ConceptsIndex files are typically much smaller than the

12、 original file Two basic kinds of indices:Ordered indices: search keys are stored in sorted orderHash indices: search keys are distributed uniformly across “buckets” using a “hash function”.COMP23115Index Evaluation MetricsHow are indexing techniques evaluated?What access operations are supported ef

13、ficiently? E.g.,records with a specified value in an attribute (equality queries)or records with an attribute value falling in a specified range of values (range queries)Access timeInsertion timeDeletion timeSpace overhead (size of indexes / size of data records)COMP23116Ordered IndicesIn an ordered

14、 index, index entries are stored based on the search key value. E.g., author catalog in library.Indexes are mostly ordered indexes except those based on hash filesCOMP23117Primary index: in a sequentially ordered file, the index whose search key specifies the sequential order of the file.Also called

15、 clustering indexThe search key of a primary index is usually but not necessarily the primary key.Secondary index: an index whose search key specifies an order different from the sequential order of the file. Also called non-clustering index.COMP23118Index Structure Design: Dense Index FilesEvery se

16、arch-key value in the data file is indexed.Data recordsDense indexCOMP23119Sparse Index FilesNot all of the search-key values are indexedAdv: reduce index size Disadv: slowerRecords in table are sorted by primary key valuesSearch is done in two steps: (i) find the largest possible key (ii) search th

17、e record forwardCOMP23120Example of Sparse Index FilesTo locate a record with search-key value K we:find index record with largest search-key value Ksearch file sequentially starting at the record to which index record pointsLess space and less maintenance overhead for insertion and deletions.Genera

18、lly slower than dense index for locating records.Good Tradeoff: sparse index with an index entry for every block in file. The disk must read a block of data into main memory anyway, so searching within the block cost little.COMP23121Multilevel IndexIf primary index does not fit in memory, access bec

19、omes expensive. (Why?)To reduce number of disk accesses to index records, treat primary index kept on disk as a sequential file and construct a sparse index on it.Outer index a sparse index of primary indexInner index the primary index fileIf even outer index is too large to fit in main memory, yet

20、another level of index can be created, and so on (multi-level indices).Indices at all levels must be updated on insertion or deletion from the file.COMP23122Multilevel Index (Cont.)IndexBlock 0IndexBlock 1Inner indexouter indexDatablock 0Datablock 1Datablock 2Datablock 3COMP23123Index Update: Deleti

21、onIf deleted record was the only record in the file with its particular search-key value, the search-key is deleted from the index also.Single-level index deletion:Dense indices similar to file record deletionCOMP23124Index Update: Deletion in Sparse IndicesSparse indices if an entry for the search

22、key exists in the index, it is deleted by replacing the entry in the index with the next search-key value in the file (in search-key order). If the next search-key value already has an index entry, the entry is deleted instead of being replaced.DowntownCOMP23125Index Update: InsertionSingle-level in

23、dex insertion:Perform a lookup using the search-key value appearing in the record to be inserted.Dense indices - if the search-key value does not appear in the index, insert it.Sparse indices - if index stores an entry for each block of the file, no change needs to be made to the index unless a new

24、block is created. In this case, the first search-key value appearing in the new block is inserted into the index.Multilevel insertion (as well as deletion) algorithms are simple extensions of the single-level algorithms.COMP23126Secondary IndicesYou can organize a file (say, into indexed sequential

25、file) based on one attribute only (e.g., emp#), but it is not adequate in practiceFrequently, one wants to find all the records whose values in a certain field (which is not the search-key of the primary index) satisfy some condition.Example 1: if the account database stored sequentially by account

26、number, we may want to find all accounts in a particular branch.Example 2: as above, but where we want to find all accounts with a specified balance or range of balances.We can have a secondary index with an index record for each search-key value: an index record points to a bucket that contains poi

27、nters to all the actual records with that particular search-key value.COMP23127Secondary Index on Balance Field of AccountNote: the file is already indexed basedon branch-nameSecondary IndexCOMP23128Primary and Secondary IndicesSecondary indices have to be dense because records are not sorted by the

28、 secondary index valuesIndices offer substantial benefits when searching for records, always select some attributes to index and re-examine the selection periodicallyWhen a file is modified, every index on the file must be updated. Updating indices imposes overhead on database modificationSequential

29、 scan using primary index is efficient, but a sequential scan using a secondary index is expensive (each record access may fetch a new block from disk.)COMP23129OutlineDisks and FilesIndexes and DatabasesHash FilesCOMP23130Hash FilesIn previous data structure course, you may learn hash table, which

30、is main-memory residentA hash method includes a hash function and a collision handling mechanismIn main memory, the unit of access is based on byte or word sizesIn disk-based hash file, the unit of access is based on disk block size (from 512 to 2048 bytes, depending on OS)COMP23131Hash FilesStatic

31、HashingDynamic HashingCOMP23132Static External HashingBucket 0Bucket iA hash file consists of M buckets of the same size: bucket0, bucket1,. bucketM-l Collisions occur when a new record hashes to a bucket that is already full. A new bucket is created and chained to the overflowed bucketTo reduce ove

32、rflow records, a hash file is typically kept 70-80% fullRecord to beinserted withkey Ki = h(K)Hashfunctionh()OverflowbucketCOMP23133Static External Hashing PropertiesThe hash function h() should distribute the records uniformly among the buckets; otherwise, search time will increase due to many over

33、flow records An ideal hash function will assign roughly the same number of records to each bucket irrespective of the actual distribution of search-key values in the fileThe number of buckets M must be fixed when the file is created; not good when the file size changes widelyAs a design criteria, yo

34、u need to determine the load factorCOMP23134Static External Hashing ExampleHash file organization of account file, using branch-name as key.An example of a hash function defined on a set of charaters in a file organization:There are 10 bucketsTake the ASCII code of each character as an integerSum up

35、 the binary representations of the characters and then applies modulo 10 to the sum to get the bucket numberCOMP23135Example of hash File OrganizationBucket 4Bucket 5Bucket 7Bucket 8COMP23136Hash IndicesHashing can be used not only for file organization, but also for index-structure creation. A hash

36、 index organizes the search keys, with their associated record pointers, into a hash file structure.Hash indices are always secondary indiceshash index(e.g., on salary)data file (e.g., hash or index sequential on emp#)COMP23137Example of hash IndexBucket 0Bucket 1Bucket 2Bucket 3Bucket 4Bucket 5Buck

37、et 6Primary indexUsed as asecondaryindexOverflowbucketHash Index on account-numberCOMP23138Hash FilesStatic HashingDynamic HashingCOMP23139Dynamic HashingGood for database that grows and shrinks in sizeAllows the hash function to be modified dynamicallyWhen the hash function takes modulo 10 in the p

38、revious example, the number of buckets is fixed to 10The trick is how to change the hash function so that the number of buckets can changeAnd at the same time, without the need of rehashing the existing records!Imagine if you change modulo 10 to modulo 13, then every existing record has to be rehash

39、ed not a good ideaCOMP23140Extendable HashingExtendable hashing - one form of dynamic hashingHashing function generates values over a large range - typically b-bit integers, with b = 32.At any time, use only a prefix of the b-bit integers to index into a table of bucket addresses. Let the length of

40、the prefix be i bits, 0 i 32Initially i = 1, meaning that it can index at most 2 bucketsWhen the 2 buckets are full, we can use 2 bits (i = 2), meaning that we can now index at most 4 buckets, and so on and so forth.i grows and shrinks as the size of the database grows and shrinks.Actual number of b

41、uckets is ij (more than one pointer to bucket j)allocate a new bucket z, and set ij and iz to the old ij+1.make the second half of the bucket address table entries pointing to j to point to zremove and reinsert each record in bucket j.recompute new bucket for Kj and insert record in the bucket (furt

42、her splitting is required if the bucket is still full)1222could be any number 1COMP23150Split in Extendable hash StructureTo split a bucket j when inserting record with search-key value Kj;If i = ij (only one pointer to bucket j)increment i and double the size of the bucket address table.Replace eac

43、h entry in the table by two entries that point to the same bucket.Re-compute new bucket address table entry for Kj, now i ij, so use the first case above. i0=2i1=2i2=2i3=2new recordCOMP23151Example: Use of Extendable Hash StructureBranch-nameh(branch-name)Brighton0010 1101 1111 1011 0010 1100 0011 0

44、000Downtown1010 0011 1010 0000 1100 0110 1001 1111Mianus1100 0111 1110 1101 1011 1111 0011 1010Perryridge1111 0001 0010 0100 1001 0011 0110 1101Redwood 0011 0101 1010 0110 1100 1001 1110 1011Round hill1101 1000 0011 1111 1001 1100 0000 0001Initial Hash structure, Bucket size=2Bucket 0hash address ta

45、ble0COMP23152Insert:0010 1101 1111 1011 0010 1100 0011 0000no bit is needed from the hash value (i=0)Brighton, A-217, 750Brighton, A-217, 7500COMP23153Insert:1010 0011 1010 0000 1100 0110 1001 1111no bit is needed from the hash value (i=0)Downtown, A-101, 500Brighton, A-217, 7500Downtown, A-101, 500Insert:bucket full, split records according to first bit (i=1)Downtown, A-101, 600COMP2315411Brighton0010 1101 1111 1011 0010 1100 0011 0000Downtown1010 0011 1010 0000 1100 0110 1001 1111Brighton, A-217, 750Downtown, A-101, 500Downtown, A-101, 600Insert:1

溫馨提示

  • 1. 本站所有資源如無特殊說明,都需要本地電腦安裝OFFICE2007和PDF閱讀器。圖紙軟件為CAD,CAXA,PROE,UG,SolidWorks等.壓縮文件請下載最新的WinRAR軟件解壓。
  • 2. 本站的文檔不包含任何第三方提供的附件圖紙等,如果需要附件,請聯(lián)系上傳者。文件的所有權益歸上傳用戶所有。
  • 3. 本站RAR壓縮包中若帶圖紙,網(wǎng)頁內容里面會有圖紙預覽,若沒有圖紙預覽就沒有圖紙。
  • 4. 未經權益所有人同意不得將文件中的內容挪作商業(yè)或盈利用途。
  • 5. 人人文庫網(wǎng)僅提供信息存儲空間,僅對用戶上傳內容的表現(xiàn)方式做保護處理,對用戶上傳分享的文檔內容本身不做任何修改或編輯,并不能對任何下載內容負責。
  • 6. 下載文件中如有侵權或不適當內容,請與我們聯(lián)系,我們立即糾正。
  • 7. 本站不保證下載資源的準確性、安全性和完整性, 同時也不承擔用戶因使用這些下載資源對自己和他人造成任何形式的傷害或損失。

評論

0/150

提交評論