




版權(quán)說明:本文檔由用戶提供并上傳,收益歸屬內(nèi)容提供方,若內(nèi)容存在侵權(quán),請(qǐng)進(jìn)行舉報(bào)或認(rèn)領(lǐng)
文檔簡介
1、Chapter 13: Query ProcessingChapter 13: Query ProcessingOverview Measures of Query CostSelection Operation Sorting Join Operation Other OperationsEvaluation of ExpressionsBasic Steps in Query Processing1.Parsing and translation 語法分析和翻譯2.Optimization3.Evaluation 求值Basic Steps in Query Processing (Con
2、t.)1. Parsing and translationtranslate the query into its internal form. This is then translated into relational algebra.Parser checks syntax, verifies relations3. EvaluationThe query-execution engine takes a query-evaluation plan, executes that plan, and returns the answers to the query.Basic Steps
3、 in Query Processing : 2.OptimizationA relational algebra expression may have many equivalent expressionsE.g., balance2500(balance(account) = balance(balance2500(account)Each relational algebra operation can be evaluated using one of several different algorithmsCorrespondingly, a relational-algebra
4、expression can be evaluated in many ways. Annotated expression (帶注釋的表達(dá)式)specifying detailed evaluation strategy is called an evaluation-plan. 求值計(jì)劃E.g., can use an index on balance to find accounts with balance v; do not use indexA7 (secondary index, comparison). For A V(r) use index to find first in
5、dex entry v and scan index sequentially from there, to find pointers to records.For AV (r) just scan leaf pages of index finding pointers to records, till first entry v 直接訪問B+樹的葉子層In either case, retrieve records that are pointed torequires an I/O for each record Linear file scan may be cheaperImple
6、mentation of Complex SelectionsConjunction: 1 2. . . n(r) A8 (conjunctive selection using one index). Select a combination of i and algorithms A1 through A7 that results in the least cost for i (r). Test other conditions on tuple after fetching it into memory buffer.A9 (conjunctive selection using m
7、ultiple-key index). Use appropriate composite (multiple-key) index if available.A10 (conjunctive selection by intersection of identifiers). Requires indices with record pointers. Use corresponding index for each condition, and take intersection of all the obtained sets of record pointers. Then fetch
8、 records from fileAlgorithms for Complex SelectionsDisjunction:1 2 . . . n (r). A11 (disjunctive selection by union of identifiers). if all conditions have available indices, Use corresponding index for each condition, and take union of all the obtained sets of record pointers. Then fetch records fr
9、om fileOtherwise use linear scan.Negation: (r)Use linear scan on fileIf very few records satisfy , and an index is applicable to Find satisfying records using index and fetch from fileSortingWe may build an index on the relation, and then use the index to read the relation in sorted order. 邏輯有序,物理無序
10、 May lead to one disk block access for each tuple. To make the relation physically sorted, we employ technique external sort-merge. External Sort-MergeCreate initial sorted runs 初始?xì)w并段. Let i be 0 initially. Repeatedly do the following till the end of the relation: (a) Read M blocks of relation into
11、memory (b) Sort the in-memory blocks (c) Write sorted data to run Ri; increment i.Let the final value of i be NMerge the runs (next slide).Let M denote memory size (in pages). External Sort-Merge (Cont.)Merge the runs (N-way merge). We assume (for now) that N M. Use N blocks of memory to buffer inpu
12、t runs, and 1 block to buffer output. Read the first block of each run into its buffer pagerepeatSelect the first record (in sort order) among all buffer pagesWrite the record to the output buffer. If the output buffer is full write it to disk.Delete the record from its input buffer page.If the buff
13、er page becomes empty then read the next block (if any) of the run into the buffer. until all input buffer pages are empty:External Sort-Merge (Cont.)If N M, several merge passes are required.In each pass, contiguous groups of M - 1 runs are merged. A pass reduces the number of runs by a factor of M
14、 -1, and creates runs longer by the same factor. E.g. If M=11, and there are 90 runs, one pass reduces the number of runs to 9, each 10 times the size of the initial runsRepeated passes are performed till all runs have been merged into one.Example: External Sorting Using Sort-MergeExternal Merge Sor
15、t (Cont.)Cost analysis:Initial stage (run generation): read br blocks, and write br blocks number of block transfers= 2 br merge stage: Total number of merge passes required: 歸并樹的高度= logM1(br/M) in each pass read br blocks, and write br blocks, except the final pass, we dont count write cost (the ou
16、tput of an operation may be sent to the parent operation without being written to disk) number of block transfers= br (2 logM1(br / M) -1) total number of block transfers for external sorting: 2 br + br (2 logM1(br / M) -1) External Merge Sort (Cont.)Cost of seeksDuring Initial stage : one seek to r
17、ead each run and one seek to write each run 2 br / MDuring the merge phaseread/write bb blocks at a time (內(nèi)存工作塊大小)Need 2 br / bb seeks for each merge pass, 因?yàn)橐惶俗x寫r各一次 except the final one which does not require a writeTotal number of seeks: 2 br / M + br / bb (2 logM1(br / M) -1)Join OperationSevera
18、l different algorithms to implement joinsNested-loop joinBlock nested-loop joinIndexed nested-loop joinMerge-joinHash-joinChoice based on cost estimateExamples use the following informationNumber of records of customer: 10,000 depositor: 5000Number of blocks of customer: 400 depositor: 100Nested-Loo
19、p Join嵌套循環(huán)連接To compute the theta join r sfor each tuple tr in r do beginfor each tuple ts in s do begintest pair (tr,ts) to see if they satisfy the join condition if they do, add tr ts to the result.endendr is called the outer relation and s the inner relation of the join.Requires no indices and can
20、 be used with any kind of join condition.Expensive since it examines every pair of tuples in the two relations. Nested-Loop Join (Cont.)In the worst case, if there is enough memory only to hold one block of each relation, the estimated cost is nr bs (遍歷s, nr次)+ br (遍歷r一次)block transfers, plus nr + b
21、r seeksIf the smaller relation fits entirely in memory, use that as the inner relation. Reduces cost to br + bs block transfers and 2 seeksAssuming worst case memory availability cost estimate iswith depositor as outer relation:5000 400 + 100 = 2,000,100 block transfers,5000 + 100 = 5100 seeks with
22、customer as the outer relation 10000 100 + 400 = 1,000,400 block transfers and 10,400 seeksIf smaller relation (depositor) fits entirely in memory, the cost estimate will be 500 block transfers. Block nested-loops algorithm (next slide) is preferable.Block Nested-Loop JoinVariant of nested-loop join
23、 in which every block of inner relation is paired with every block of outer relation.for each block Br of r do beginfor each block Bs of s do beginfor each tuple tr in Br do beginfor each tuple ts in Bs do beginCheck if (tr,ts) satisfy the join condition if they do, add tr ts to the result.endendend
24、endBlock Nested-Loop Join (Cont.)Worst case estimate: br bs + br block transfers + 2 * br seeksEach block in the inner relation s is read once for each block in the outer relation (instead of once for each tuple in the outer relationBest case: br + bs block transfers + 2 seeks. 與前者相同Improvements to
25、nested loop and block nested loop algorithms:In block nested-loop, use M - 2 disk blocks as blocking unit for outer relations, where M = memory size in blocks; use remaining two blocks to buffer inner relation and output Cost = br / (M-2) bs + br block transfers + 2 br / (M-2) seeks可行性非常強(qiáng)的改進(jìn)If equi-
26、join attribute forms a key or inner relation, stop inner loop on first matchScan inner loop forward and backward alternately, to make use of the blocks remaining in buffer (with LRU replacement)Use index on inner relation if available (next slide)Indexed Nested-Loop JoinIndex lookups can replace fil
27、e scans ifjoin is an equi-join or natural join andan index is available on the inner relations join attributeCan construct an index just to compute a join.For each tuple tr in the outer relation r, use the index to look up tuples in s that satisfy the join condition with tuple tr.Worst case: buffer
28、has space for only one page of r, and, for each tuple in r, we perform an index lookup on s.Cost of the join: br (tT + tS) + nr cWhere c is the cost of traversing index and fetching all matching s tuples for one tuple or rc can be estimated as cost of a single selection on s using the join condition
29、.If indices are available on join attributes of both r and s,use the relation with fewer tuples as the outer relation.Example of Nested-Loop Join CostsCompute depositor customer, with depositor as the outer relation.Examples use the following informationNumber of records of customer: 10,000 deposito
30、r: 5000Number of blocks of customer: 400 depositor: 100Cost of block nested loops join 公式的直接應(yīng)用 400*100 + 100 = 40,100 block transfers + 2 * 100 = 200 seeks assuming worst case memory may be significantly less with more memory Cost of indexed nested loops join公式的直接應(yīng)用 100 + 5000 * 5 = 25,100 block tra
31、nsfers and seeks.CPU cost likely to be less than that for block nested loops join Merge-Join 簡單講解Sort both relations on their join attribute (if not already sorted on the join attributes).Merge the sorted relations to join themJoin step is similar to the merge stage of the sort-merge algorithm. Main
32、 difference is handling of duplicate values in join attribute every pair with same value on join attribute must be matchedDetailed algorithm in bookMerge-Join (Cont.)Can be used only for equi-joins and natural joins每個(gè)關(guān)系被讀取一次, Each block needs to be read only once (assuming all tuples for any given v
33、alue of the join attributes fit in memory)設(shè)DBS為每個(gè)關(guān)系開辟一個(gè)大小為bb的緩沖區(qū) , the cost of merge join is: br + bs block transfers + br / bb + bs / bb seeks+ the cost of sorting if relations are unsorted.hybrid merge-join: If one relation r is sorted, and the other s has a secondary B+-tree index on the join att
34、ributeMerge the sorted relation with the leaf entries of the B+-tree, 將s的索引與r進(jìn)行歸并 . Sort the result on the addresses of the unsorted relations tuplesScan the unsorted relation in physical address order and merge with previous result, to replace addresses by the actual tuplesSequential scan more effi
35、cient than random lookupHash-JoinApplicable for equi-joins and natural joins.A hash function h is used to partition tuples of both relations h maps JoinAttrs 公共屬性 values to 0, 1, ., n, where JoinAttrs denotes the common attributes of r and s used in the natural join. r0, r1, . . ., rn denote partiti
36、ons of r tuplesEach tuple tr r is put in partition ri where i = h(tr JoinAttrs).s0, s1. . ., sn denotes partitions of s tuplesEach tuple ts s is put in partition si, where i = h(ts JoinAttrs).Note: In book, ri is denoted as Hri, si is denoted as Hsi and n is denoted as nh. Hash-Join (Cont.)Hash-Join
37、 (Cont.)r tuples in ri need only to be compared with s tuples in si Need not be compared with s tuples in any other partition, since:an r tuple and an s tuple that satisfy the join condition will have the same value for the join attributes.If that value is hashed to some value i, the r tuple has to
38、be in ri and the s tuple in si.Hash-Join Algorithm1.Partition the relation s using hashing function h. When partitioning a relation, one block of memory is reserved as the output buffer for each partition.2.Partition r similarly.3.For each i:(a)Load si into memory and build an in-memory hash index o
39、n it using the join attribute. This hash index uses a different hash function than the earlier one h.(b)Read the tuples in ri from the disk one by one. For each tuple tr locate each matching tuple ts in si using the in-memory hash index. Output the concatenation of their attributes.The hash-join of
40、r and s is computed as follows.Relation s is called the build input 建立輸入 and r is called the probe input 探索輸入.Hash-Join algorithm (Cont.) 劃分的要點(diǎn): 1. 每個(gè)劃分的大小c, 內(nèi)存M, 有cM 成立 2. 關(guān)系s的大小bs, 劃分?jǐn)?shù)目n, 有 c *n M*nbs 成立1. 當(dāng)Mn+1時(shí) , Rarely required: e.g., recursive partitioning not needed for relations of 1GB or le
41、ss with memory size of 2MB, with block size of 4KB. Handling of Overflows 簡單了解The value n and the hash function h is chosen such that each si should fit in memory.Typically n is chosen as bs/M * f where f is a “fudge factor”, 避讓因子 typically around 1.2The probe relation partitions si need not fit in
42、memory Handling of Overflows 簡單了解Partitioning is said to be skewed 偏斜 if some partitions have significantly more tuples than some othersHash-table overflow occurs in partition si if si does not fit in memory. Reasons could beMany tuples in s with same value for join attributesBad hash functionOverfl
43、ow resolution can be done in build phasePartition si is further partitioned using different hash function. Partition ri must be similarly partitioned.Overflow avoidance performs partitioning carefully to avoid overflows during build phaseE.g. partition build relation into many partitions, then combi
44、ne themBoth approaches fail with large numbers of duplicatesFallback option: use block nested loops join on overflowed partitionsCost of Hash-JoinIf recursive partitioning is not required: cost of hash join is 1. 劃分階段 : 設(shè) bb =內(nèi)存工作塊大小, 2(br + bs) block transfers 2( br / bb + bs / bb) seeks 2.建立和探測(cè)階段:
45、 Complex JoinsJoin with a conjunctive condition:r 1 2. n sJoin with a disjunctive condition r 1 2 . n s Other Operations 大致了解Duplicate elimination can be implemented via hashing or sorting.On sorting duplicates will come adjacent to each other, and all but one set of duplicates can be deleted. Optim
46、ization: duplicates can be deleted during run generation as well as at intermediate merge steps in external sort-merge.Hashing is similar duplicates will come into the same bucket.Projection:perform projection on each tuple followed by duplicate elimination. Other Operations : Aggregation大致了解Aggrega
47、tion can be implemented in a manner similar to duplicate elimination.Sorting or hashing can be used to bring tuples in the same group together, and then the aggregate functions can be applied on each group. Optimization: combine tuples in the same group during run generation and intermediate merges,
48、 by computing partial aggregate valuesFor count, min, max, sum: keep aggregate values on tuples found so far in the group. When combining partial aggregate for count, add up the aggregatesFor avg, keep sum and count, and divide sum by count at the endOther Operations : Set Operations大致了解Set operatio
49、ns (, and ): can either use variant of merge-join after sorting, or variant of hash-join.E.g., Set operations using hashing:Partition both relations using the same hash functionProcess each partition i as follows. Using a different hashing function, build an in-memory hash index on ri.Process si as
50、followsr s: Add tuples in si to the hash index if they are not already in it. At end of si add the tuples in the hash index to the result.r s: output tuples in si to the result if they are already there in the hash index r s: for each tuple in si, if it is there in the hash index, delete it from the
51、 index. At end of si add remaining tuples in the hash index to the result. Other Operations : Outer Join大致了解Outer join can be computed either as A join followed by addition of null-padded non-participating tuples.by modifying the join algorithms.Modifying merge join to compute r sIn r s, non partici
52、pating tuples are those in r R(r s)Modify merge-join to compute r s: During merging, for every tuple tr from r that do not match any tuple in s, output tr padded with nulls.Right outer-join and full outer-join can be computed similarly.Modifying hash join to compute r sIf r is probe relation, output
53、 non-matching r tuples padded with nullsIf r is build relation, when probing keep track of which r tuples matched s tuples. At end of si output non-matched r tuples padded with nulls Evaluation of Expressions表達(dá)式求值So far: we have seen algorithms for individual operationsAlternatives for evaluating an
54、 entire expression treeMaterialization實(shí)體化: generate results of an expression whose inputs are relations or are already computed, materialize (store) it on disk Pipelining流水線: pass on tuples to parent operations even as an operation is being executedWe study above alternatives in more detail方法1: Mate
55、rializationMaterialized evaluation: evaluate one operation at a time, starting at the lowest-level. Use intermediate results materialized into temporary relations to evaluate next-level operations.E.g., in figure below, compute and storethen compute the store its join with customer, and finally comp
56、ute the projections on customer-name. Materialization (Cont.)Materialized evaluation is always applicableCost of writing results to disk and reading them back can be quite highOur cost formulas for operations ignore cost of writing results to disk, soOverall cost = Sum of costs of individual operati
57、ons + cost of writing intermediate results to diskDouble buffering: use two output buffers for each operation, when one is full write it to disk while the other is getting filledAllows overlap of disk writes with computation and reduces execution time方法2: Pipelining 流水線Pipelined evaluation : evaluat
58、e several operations simultaneously, passing the results of one operation on to the next.E.g., dont store result of join, pass tuples directly to projection. Much cheaper than materialization: no need to store a temporary relation to disk.Pipelining may not always be possible e.g., sort, hash-join.
59、Pipelines can be executed in two ways: demand driven 需求驅(qū)動(dòng) and producer driven 生產(chǎn)驅(qū)動(dòng)Pipelining (Cont.)In demand driven or lazy evaluation 需求從上向下傳遞system repeatedly requests next tuple from top level operationEach operation requests next tuple from children operations as required, in order to output it
60、s next tupleIn between calls, operation has to maintain “state” so it knows what to return nextIn producer-driven or eager pipelining 產(chǎn)品從下向上推送Operators produce tuples eagerly and pass them up to their parentsBuffer maintained between operators, child puts tuples in buffer, parent removes tuples from
溫馨提示
- 1. 本站所有資源如無特殊說明,都需要本地電腦安裝OFFICE2007和PDF閱讀器。圖紙軟件為CAD,CAXA,PROE,UG,SolidWorks等.壓縮文件請(qǐng)下載最新的WinRAR軟件解壓。
- 2. 本站的文檔不包含任何第三方提供的附件圖紙等,如果需要附件,請(qǐng)聯(lián)系上傳者。文件的所有權(quán)益歸上傳用戶所有。
- 3. 本站RAR壓縮包中若帶圖紙,網(wǎng)頁內(nèi)容里面會(huì)有圖紙預(yù)覽,若沒有圖紙預(yù)覽就沒有圖紙。
- 4. 未經(jīng)權(quán)益所有人同意不得將文件中的內(nèi)容挪作商業(yè)或盈利用途。
- 5. 人人文庫網(wǎng)僅提供信息存儲(chǔ)空間,僅對(duì)用戶上傳內(nèi)容的表現(xiàn)方式做保護(hù)處理,對(duì)用戶上傳分享的文檔內(nèi)容本身不做任何修改或編輯,并不能對(duì)任何下載內(nèi)容負(fù)責(zé)。
- 6. 下載文件中如有侵權(quán)或不適當(dāng)內(nèi)容,請(qǐng)與我們聯(lián)系,我們立即糾正。
- 7. 本站不保證下載資源的準(zhǔn)確性、安全性和完整性, 同時(shí)也不承擔(dān)用戶因使用這些下載資源對(duì)自己和他人造成任何形式的傷害或損失。
最新文檔
- 理賠業(yè)務(wù)風(fēng)險(xiǎn)培訓(xùn)效果評(píng)估靈活性風(fēng)險(xiǎn)基礎(chǔ)知識(shí)點(diǎn)歸納
- 構(gòu)建思政課程教學(xué)評(píng)價(jià)體系的現(xiàn)狀及總體形勢(shì)
- 色彩氛圍營造基礎(chǔ)知識(shí)點(diǎn)歸納
- 構(gòu)建出版業(yè)融合發(fā)展的現(xiàn)狀及總體形勢(shì)
- 跨界合作在健美操創(chuàng)新中的應(yīng)用
- 抽水蓄能產(chǎn)業(yè)創(chuàng)新驅(qū)動(dòng)的核心路徑
- 2025不銹鋼型材采購協(xié)議合同
- 智聯(lián)車市場(chǎng)策略
- 答辯成功實(shí)戰(zhàn)手冊(cè)
- 研究生之路全攻略
- 揚(yáng)塵污染控制工作臺(tái)帳(揚(yáng)塵防治全套資料)
- 2021年英語專業(yè)四級(jí)TEM4考試真題和答案
- 各科室臨床路徑(衛(wèi)生部)2022
- 學(xué)習(xí)宣傳貫徹反有組織犯罪法工作經(jīng)驗(yàn)材料
- 大學(xué)生德育論文范文3000字
- 美術(shù)作品使用授權(quán)書.docx
- 金屬軋制工藝學(xué)1軋制過程基本參數(shù)
- 低壓電纜頭制作安裝施工工藝標(biāo)準(zhǔn)
- 初中英語語法講解PPT課件(共210頁)
- 排骨架檢驗(yàn)標(biāo)準(zhǔn)_圖文
- 工程變更申請(qǐng)表(ECR)
評(píng)論
0/150
提交評(píng)論