在優化器轉換系列博客的第一部分,我們討論了子查詢展開。我們解釋了子查詢如何被評估,以及EXISTS和ANY子查詢的展開。
這是該系列的第二篇,我們將會討論NOT EXISTS和出現在WHERE子句中的單行子查詢的展開。我要再一次感謝Rafi ---- 優化器的高級開發人員之一,他為本博文提供了內容。
快速回顧子查詢展開的意義
子查詢展開是一種優化,它轉換子查詢為外連接查詢,從而允許優化器在訪問路徑,連接方法和連接順序的選擇時考慮子查詢表。展開要么合并子查詢為外部查詢,要么轉換為內聯視圖(譯者注:出現在FROM子句中的子查詢)
NOT EXISTS 子查詢
反連接總是用來展開NOT EXISTS和ALL子查詢。我們使用如下的非標準語法來表示反連接:T1.x A= T2.y,T1是反連接中的左表,T2是右表。反連接的語義是:只要T1.x在T2.y中的任一個值中找到匹配,則拒絕T1表中的這一行。只有當T1.x未匹配到T2.y中的任一個值時,才會返回T1表中的這一行。
考慮查詢F,它是一個包含兩個表的NOT EXISTS子查詢。
F.
SELECT C.cust_last_name, C.country_id
FROM customers C
WHERE NOT EXISTS (SELECT 1
FROM sales S, products P
WHERE P.prod_id = S.prod_id and
P.prod_min_price > 90 and
S.cust_id = C.cust_id);
F中的子查詢可以使用反連接展開;然而,子查詢中的表是內連接,SALES和PRODUCTS表必須在反連接前被執行。為了強制連接的順序,產生了內聯視圖。該展開產生了查詢G;這時,內聯視圖成為了反連接中的右表。
G.
SELECT C.cust_last_name, C.country_id
FROM customers C,
(SELECT S.cust_id AS s_cust_id
FROM sales S, products P
WHERE P.prod_id = S.prod_id and P.prod_min_price > 90) VW
WHERE C.cust_id A= VW.s_cust_id;
G的執行計劃如XG所示。在三種連接方法中(即,嵌套循環連接,HASH連接和排序合并連接),HASH連接被優化器選中來完成反連接。
XG.
explain_plan_for_G.png

單行聚合子查詢
考慮查詢H,它包含一個聚合的,相關的和單行的子查詢。
H.
SELECT C.cust_last_name, C.cust_income_level
FROM customers C
WHERE C.cust_credit_limit < (SELECT SUM(S.amount_sold)
FROM sales S
WHERE S.cust_id = C.cust_id);
對多個等值相關的謂詞做聚合,類似于在子查詢列上做GROUP-BY的聚合,然后與外層表在GROUP-BY列上做連接。H中的子查詢是以消除相關并轉換為GROUP-BY視圖的方式來展開的,其與外層表CUSTOMER是內連接。相關條件和連接條件都被轉換為了連接謂詞。該轉換如I所示:
I.
SELECT C.cust_last_name, C.cust_income_level
FROM customers C,
SELECT SUM(S.amount_sold) AS sum_amt, S.cust_id
FROM sales S
GROUP BY S.cust_id) VW
WHERE C.cust_credit_limit < VW.sum_amt and
C.cust_id = VW.cust_id;
I的執行計劃如XI所示:
XI.
explain_plan_for_I.png

展開的有效性
每一個子查詢,在被展開前,要通過一系列的有效性檢查。優化器決定展開或者決定不展開一個子查詢,可以通過指定適當的HINT來跳過,但是有效性不可以被跳過。因為這將不能保證展開在語義上是等價的。
下面,我們列舉了一些目前會使子查詢展開無效的檢查。注意,這個檢查清單并不代表全部。
- 與非父表發生相關的子查詢;例如,子查詢SQ3被SQ2所包含(SQ2是SQ3的父親),SQ2又被SQ1所包含,而SQ3與SQ1中涉及的列是相關的。
- GROUP-BY子查詢是相關的;這種情況下,展開隱含是在GROUP-BY之后做連接,但特定兩個操作次序的改變,并不總是對的。
- 連接和相關條件格式不佳(例如,它們的本地列與相關列出現在謂詞的同一側)并且子查詢需要生成內聯視圖,當這樣的謂詞出現時,是不允許分離視圖列和外層表列的。
- 對于反意連接(譯者注:即OR邏輯)子查詢,出現在連接條件或相關條件的外層表列是不同的。
使用視圖合并轉換,Oracle可以合并展開時生成的Group-by或distinct內聯視圖,因此,當預期會有一個視圖出現時,但在執行計劃中可能卻并沒有出現任何視圖。
總結
在這些博文中,我們嘗試通過簡單的示例查詢,來演示不同類型子查詢展開背后的基礎理論。Oracle可以處理更復雜的查詢–在一層或多層中有多個子查詢的語句,有多個表,相關或連接條件包含非等謂詞和表達式,子查詢包含集合操作符,帶有GROUP-BY和COUNT聚合函數的子查詢,ALL子查詢在連接條件列上含有可為空的列,以及反意連接子查詢。
如果子查詢的展開,并不需要生成內聯視圖,那么子查詢總是展開的。因為這種展開通過允許更多的連接方法和連接次序,從而提供更有效率的執行計劃。如果子查詢中相關謂詞中的本地列(例如查詢A中的S.cust_id)上有索引,那么子查詢會評估為類似于基于索引的嵌套循環連接。因此,在某些情況下,不展開也許會比生成內聯視圖,僅允許與外層表做排序合并和HASH連接的展開更有效率。故而,Oracle中,對于生成內聯視圖的子查詢展開,是在基于成本的查詢轉換框架下,基于成本進行的。
再次感謝Rafi在本博文中的貢獻。請關注我們在該系列中的下一篇博文,是有關視圖合并的。
原文鏈接:https://blogs.oracle.com/optimizer/post/optimizer-transformations-subquery-unesting-part-2
原文內容:
Optimizer Transformations: Subquery Unesting part 2
January 1, 2020 | 4 minute read
Maria Colgan
Distinguished Product Manager
In Part one of our blog series on Optimizer transformations we discussed Subquery unesting. We explained how subqueries are evaluated, and the unnesting of EXISTS and ANY subqueries.
Here in part two of our series we will discuss the unnesting of NOT EXISTS and single-row subqueries, which appear in the WHERE clause. Once again I need to give credit to Rafi one of the senior Optimizer developers for the content of this blog!
Quick recap on what we mean by Subquery Unesting
Subquery unnesting is an optimization that converts a subquery into a join in the outer query and allows the optimizer to consider subquery table(s) during access path, join method, and join order selection. Unnesting either merges the subquery into the body of the outer query or turns it into an inline view.
NOT EXISTS Subqueries
Anti-join is always used for unnesting NOT EXISTS and ALL subqueries. We represent anti-join by the following non-standard syntax: T1.x A= T2.y, where T1 is the left table and T2 is the right table of the anti-join. The semantics of anti-join is the following: A row of T1 is rejected as soon as T1.x finds a match with any value of T2.y. A row of T1 is returned, only if T1.x does not match with any value of T2.y.
Consider query F, which has a NOT EXISTS subquery containing two tables.
F.
SELECT C.cust_last_name, C.country_id
FROM customers C
WHERE NOT EXISTS (SELECT 1
FROM sales S, products P
WHERE P.prod_id = S.prod_id and
P.prod_min_price > 90 and
S.cust_id = C.cust_id);
The subquery in F can be unnested by using an anti-join; however, the inner join of the tables in the subquery, sales and products must take place before the anti-join is performed. An inline view is generated in order to enforce the join order. This unnesting produces query G; here the inline view becomes the right table of anti-join.
G.
SELECT C.cust_last_name, C.country_id
FROM customers C,
(SELECT S.cust_id AS s_cust_id
FROM sales S, products P
WHERE P.prod_id = S.prod_id and P.prod_min_price > 90) VW
WHERE C.cust_id A= VW.s_cust_id;
The execution plan of G is shown as XG. Of the three join methods (i.e., nested-loop, hash and sort-merge), the hash method was selected by the optimizer to do the anti-join.
XG.
explain_plan_for_G.png

Single-Row Aggregated Subqueries
Consider query H that contains an aggregated, correlated, single-row subquery.
H.
SELECT C.cust_last_name, C.cust_income_level
FROM customers C
WHERE C.cust_credit_limit < (SELECT SUM(S.amount_sold)
FROM sales S
WHERE S.cust_id = C.cust_id);
Doing aggregation for multiple values of equality correlation predicate is like doing aggregation and group-by on the local column, and then joining the view with the outer table on the group-by columns. The subquery in H is unnested by decorrelating it and converting it into a group-by view, which is inner joined with the outer table, customer; here both the correlating and connecting conditions have been turned into join predicates. The transformed query is shown as I.
I.
SELECT C.cust_last_name, C.cust_income_level
FROM customers C,
SELECT SUM(S.amount_sold) AS sum_amt, S.cust_id
FROM sales S
GROUP BY S.cust_id) VW
WHERE C.cust_credit_limit < VW.sum_amt and
C.cust_id = VW.cust_id;
XI shows the execution plan of I.
XI.
explain_plan_for_I.png

Validity of Unnesting
Every subquery, before it can be unnested, goes through a set of validity checks. The optimizer decisions to unnest or not to unnest a subquery can be overridden by specifying an appropriate hint, but the validity requirements cannot be, since unnesting in such cases would not guarantee a semantically equivalent query.
In the following, we enumerate some important checks that currently invalidate subquery unnesting. Note that this list of checks is by no means exhaustive.
-
Subqueries that are correlated to non-parent; for example, subquery SQ3 is contained by SQ2 (parent of SQ3) and SQ2 in turn is contained by SQ1 and SQ3 is correlated to tables defined in SQ1.
-
A group-by subquery is correlated; in this case, unnesting implies doing join after group-by. Changing the given order of the two operations may not be always legal.
-
Connecting or correlating conditions are not well-formed (e.g., they contains a mix of local and correlated columns on either side of the predicate) and the subquery requires inline view generation, as predicates of this kind do not allow separating out view columns and outer table columns.
-
For disjunctive subqueries, the outer columns in the connecting or correlating conditions are not the same.
Using view-merging transformation, Oracle may merge the group-by or distinct inline view generated during unnesting, and therefore the execution plan may not show any view even when a view is expected.
Summary
In these posts we have tried to illustrate the basic ideas behind unnesting of different types of subquery by showing simple example queries. Oracle can handle far more complex queries - query statements with multiple subqueries at one or more levels, multiple tables, correlated and connecting conditions containing inequality predicates and expressions, subqueries that contain set operators, subqueries with group-by and COUNT aggregate function, ALL subqueries containing nullable columns in its connecting condition, and subqueries in disjunction.
If unnesting of a subquery does not require generation of an inline view, then the subquery is always unnested, because this unnesting provides a more efficient execution plan by allowing more join methods and join orders. If the local column (e.g., S.cust_id in A) in the correlating predicate of the subquery has an index on it, then the subquery evaluation becomes akin to doing index-based nested-loop join; and thus, in some cases, not unnesting may be more efficient than unnesting that generates an inline view and enables only sort-merge and hash join of the view with outer tables. Therefore, in Oracle, subquery unnesting which generates inline views, is done based on cost under the cost-based query transformation framework.
Once again many thanks to Rafi for all his work on this blog post. Watch out for our next post in this series on view merging.




