介紹
本貼涵蓋了Oracle Database 12c Release 2中是如何為分區表改進增量統計信息的內容。如果你還不是很熟悉增量統計信息語境下的synopses,那請在閱讀前,查看(本系列的)第1和第2部分。
從Oracle Database 12c Release 2開始,采用了新的synopsis格式,其比早期版本中使用的格式有更顯著的壓縮。為了方便描述,我將稱Oracle Database 12c Release 2的格式為新格式,而之前的格式為舊格式。
如果你之前了解過增量統計信息維護,但因為SYSAUX中額外的空間使用而決定不使用的話,則這個增強是特別適合的。Oracle Database 12c Release 2解決了這個問題,并且,在多種場景下,降低了管理synopses所需的系統資源量。
一種synopsis的新類型
synopsis是存儲在一對數據字典表(SYS. WRI $ _OPTSTAT_SYNOPSIS$ 和SYS.WRI $ _OPTSTAT_SYNOPSIS_HEAD $ )中的元數據。該元數據由Oracle自動管理,所以,通常并不需要知道其底層實現。盡管如此,我還是會介紹一些細節,因為這將幫助我們理解其為何變化得如此顯著。
Oracle Database 12c Release 2中的Synopses目前使用(默認情況下)稱為HyperLogLog (HLL)的算法創建。這是計算表列值中大致唯一值數量的最先進算法。盡管它是近似值,但它仍然是非常精確的,典型的誤差率小于2%。在Oracle Database 12c Release 2之前,synopses由存儲在 WRI $ _OPTSTAT_SYNOPSIS $ 表中的行構成。如果有大量的分區和表列,并且列上包含有大量的唯一值,則表中的行數可能是非常大的。新格式的synopses不在該表中存儲行,取而代之的是部分額外的哈希數據被存儲到WRI $ _OPTSTAT_SYNOPSIS_HEAD $ 表中(使用SPARE2列)
新的synopses能小多少?好吧,正如你可能已經猜到的,顧問們的說法:“視情況而定”。如前所述,synopses使用的空間是分區數、列數和列上唯一值數量的函數。我可以給你舉一個例子:在我們的測試系統有一個8TB的測試表,有84個分區。舊格式的synopses的大小是大約160MB,而新格式的synopses只有6MB。我在Github上創建了一個示例(見下文),用于產生大量的synopsis數據。在這個示例中,新格式的synopses幾乎不占用任何空間,而老格式的synopses則占用了大約160MB(在SYSAUX中)。
如何使用
為了使用新格式的synopses,你需要做什么嗎?什么都不需要!如果你選擇使用增量統計信息,會默認使用。而且,相較于早期版本,需要你在Oracle Database 12c Release 2中做的,并沒有什么不同。
Synopses控制
你可以使用DBMS_STATS中的,名為APPROXIMATE_NDV_ALGORITHM偏好參數來控制創建的類型。
默認值是REPEAT 或 HYPERLOGLOG:如果一個表正在使用舊格式的synopses,那么它將繼續這樣使用舊格式,而如果一個表正在使用新格式的synopses,那么也將繼續使用新格式。
沒有什么理由使用默認值之外的值,除非你正在升級數據庫到Oracle Database 12c Release 2。如果是這種情況,你可能要考慮其選項。下面會談及。
升級
如果你正在升級一個使用增量統計的數據庫,那么你會希望遷移到使用新格式的Synopses。你要怎么做呢?最初需要注意的是那些同一張表中,即有舊格式,又有新格式synopses的表分區。好消息是你可以控制何時,以及如何從一種類型的synopses過渡到另一種。
有一個名為INCREMENTAL_STALENESS的DBMS_STATS的偏好參數,它控制是否允許在從舊格式過渡到新格式時,表中可以擁有不同類型synopses的分區。讓我們來看一下,在升級到Oracle Database 12c Release 2后,不同的場景下是如何處理的。從“非常保守”(即,保持舊行為)到“積極”(即,立即利用新特性)是有一系列選擇的。下表描述了從最保守的場景案例到最激進的場景案例。
| 使用場景 | 動作 |
|---|---|
| 最初,你希望對所有表繼續使用舊格式的synopses。我們建議使用新格式的,如果你愿意的話,也可以晚些再使用它們。HLL之前使用的算法稱為自適應采樣 | EXEC DBMS_STATS.SET_TABLE_PREFS(‘table_owner’,‘table-name’, ‘APPROXIMATE_NDV_ALGORITHM’, ‘ADAPTIVE SAMPLING’) |
| 你希望使用舊格式的表繼續使用舊格式,新創建的增量管理的表使用新格式的synopses。沒有synopses的增量管理的表,收集統計信息時,會使用新格式的synopses,而使用老格式的增量管理的表,則會繼續使用它。 | 無須操作。這是缺省的行為。 APPROXIMATE_NDV_ALGORYTHM 的缺省值,是REPEAT 或 HYPERLOGLOG. |
| 你有一些非常大的分區表。它們正使用舊格式的synopses,而你希望逐步地用新的替換掉舊的 | 舊格式的synopses不會立即被新格式的替代,而新分區將會采用新格式。雖然混合模式會產生不太準確的統計信息,但其優點是不需要在前臺重新收集所有表的統計信息。自動統計信息收集任務,將逐步的收集老格式分區上的統計信息,并生成新格式的synopses.最終,所有的分區均會使用新格式,統計信息也會更準確。 EXEC DBMS_STATS.SET_TABLE_PREFS (table_owner, table_name,‘APPROXIMATE_NDV_ALGORITHM’,‘HYPERLOGLOG’) 注意:INCREMENTAL_STALENESS 偏好參數必須是 ALLOW_MIXED_FORMAT, 但是,并不需要顯式設置它,因為這就是默認值(除非你修改了它)。 |
| 你有時間重新收集所有統計信息。增量管理的表使用的是舊格式,而你希望立即用新格式替換掉舊格式。 | 如果你有一個窗口期用來完成分區表統計信息的重新收集,那么這是建議的方法。 EXEC DBMS_STATS.SET_TABLE_PREFS(table_owner,table_name, ‘APPROXIMATE_NDV_ALGORITHM’, ‘HYPERLOGLOG’) 你還需要指定不希望新舊格式混合存在于同一個表上: EXEC DBMS_STATS.SET_TABLE_PREFS (table_owner, table_name, ‘INCREMENTAL_STALENESS’, ‘NULL’) 你要仔細一些,偏好參數的值要設為’NULL’ (有引號) ,而不是NULL (無引號)。 NULL (無引號)會設置偏好參數為其默認值,在本例中是ALLOW_MIXED_FORMAT。一旦這些偏好參數設置后,你就需要重新收集表的統計信息了。 |
記住,你也可以在庫級,全局級和SCHEMA級設置DBMS_STATS的偏好參數(比如APPROXIMATE_NDV_ALGORITHM) ,就像在表級上設置那樣(如上例所示)。
總結
Oracle Database 12c Release 2中的synopsis格式比之前的格式更緊實。如果你的數據庫非常大,希望在維護統計信息良好精確度的同時,還能節省大量SYSAUX中的空間。希望維護synopses的系統開銷也會下降(比如,交換分區時)。有關此腳本和一些示例腳本的更多信息,請查看GitHub。
如果您對本文或GitHub中的腳本有任何評論,請在下面繼續。
原文鏈接: https://blogs.oracle.com/optimizer/post/efficient-statistics-maintenance-for-partitioned-tables-using-incremental-statistics-part-3
Efficient Statistics Maintenance for Partitioned Tables Using Incremental Statistics – Part 3
March 23, 2017 | 6 minute read
Nigel Bayliss
Product Manager
Introduction
This post covers how Oracle has improved incremental statistics for partitioned tables in Oracle Database 12c Release 2. If you’re not already familiar with synopses in the context of incremental statistics then take a look at Part 1 and Part 2 before you read on.
Beginning with Oracle Database 12c Release 2 there’s a new synopsis format that’s significantly more compact than the format used in earlier releases. For brevity, I’ll refer to the Oracle Database 12c Release 2 format as new and the previous format as old.
This enhancement is particularity relevant if you looked at incremental statistics maintenance in the past but decided not to use it because of the additional space usage in SYSAUX. Oracle Database 12c Release 2 resolves this issue and, in many cases, reduces the amount of system resource required to manage synopses.
A New Type of Synopsis
A synopsis is metadata stored in a couple of tables in the data dictionary (SYS.WRI and SYS.WRI). The metadata is managed automatically by the Oracle Database, so there’s generally no reason to be aware of the underlying implementation. Nevertheless, I’ll cover some of the details here because it will help you to see why the change is so significant.
Synopses in Oracle Database 12c Release 2 are now created (by default) using an algorithm called HyperLogLog (HLL). This is a state-of-the-art algorithm that calculates the approximate number of distinct values for table column values. Even though it is an approximation, it is nevertheless very accurate with a typical error rate of less than 2%. Prior to Oracle Database 12c Release 2, synopses consisted of rows stored in the WRI table. The number of rows in this table can be very large if there are a large number of partitions and table columns, and if the columns contain a large number of distinct values. New-style synopses do not store rows in this table. Instead, some additional (and compact) hash data is stored in the WRI table (in the SPARE2 column).
How much smaller are the new synopses? Well, as you’ve probably guessed, it’s the consultants’ answer: “It Depends”. As outlined above, the space used by synopses is a function of the number of partitions, columns and distinct values in columns. I can give you an example from one of our test systems containing an 8TB test table with 84-partitions. The total size of the old-style synopses was around 160MB and the new-style synopses totaled only 6MB. The example I created in GitHub (see below) was contrived to generate a particularly large amount of synopsis data. In the example I’ve given, new-style synopses take up virtually no space at all and the old-style synopses take up about 160MB (in SYSAUX).
How to Use Them
What do you need to do to use new-style synopses? Nothing! They are used by default if you choose to use incremental statistics and you don’t need to do anything different in Oracle Database 12c Release 2 compared to earlier releases.
Controlling Synopses
You can control the type of that will be created using a DBMS_STATS preference called APPROXIMATE_NDV_ALGORITHM.
The default is REPEAT OR HYPERLOGLOG: if a table is using old-style synopses then it will continue to do so, and tables using new-style synopses will continue to use those!
There’s no reason to use anything other than the default unless you are upgrading a database to Oracle Database 12c Release 2. If this is the case then you might want to consider the options. That’s covered next.
Upgrading
If you are upgrading a database that’s using incremental statistics, then you will want to migrate to using the new-style synopses. How do you go about doing that? It’s worth noting from the outset that it’s possible to have partitions with old-style and new-style synopses in the same table. Also, the good news is that you can control when and how to transition from one type of synopses to the other.
There is a DBMS_STATS preference called INCREMENTAL_STALENESS. It the controls whether or not you want to allow partitions within an individual table to have different types of synopses during the transition period from old-style to new-style. Let’s look at the different scenarios and how to proceed after you have upgraded to Oracle Database 12c Release 2. There is a spectrum of choice from “very conservative” (i.e., maintaining old behaviors) to “aggressive” (i.e., taking advantage of new features immediately). The chart below describes the different scenarios from the most conservative cases to most aggressive cases.
| Use-case | Action |
|---|---|
| Initially,you want to continue to use old-format synopses for all tables.We recommend that you use the new-style synopses, but can choose to use them later on if you prefer. The algorithm used prior to HLL is called adaptive sampling. | EXEC DBMS_STATS.SET_TABLE_PREFS(‘table_owner’,‘table-name’, ‘APPROXIMATE_NDV_ALGORITHM’, ‘ADAPTIVE SAMPLING’) |
| You want tables using old-style synopses to continue to use them.Newly created incrementally-managed tables will use new-style synopses. Incrementally-managed tables without synopses will use new-style when statistics are gathered.Incrementally-managed tables with old-style synopses will continue to use them. | No action. This is the default behavior.The APPROXIMATE_NDV_ALGORYTHM is, by default, REPEAT OR HYPERLOGLOG. |
| You have some very large partitioned tables. They are using old-style synopses and you want to gradually replace the old with the new. | Old-format synopses are not immediately replaced and new partitions will have synopses in the new format. Mixed formats will yield less accurate statistics but the advantage is that there is no need to re-gather all table statistics in the foreground. The statistics auto job will gradually re-gather statistics on partitions with old format synopses and generate new format synopses. Eventually, new format synopses will be used for all partitions and statistics will be accurate. EXEC DBMS_STATS.SET_TABLE_PREFS (table_owner, table_name,‘APPROXIMATE_NDV_ALGORITHM’,‘HYPERLOGLOG’)Note that INCREMENTAL_STALENESS preference must have the value ALLOW_MIXED_FORMAT, but it does not need to be set explicity (unless you’ve changed it) because it is the default setting。 |
| You have time to re-gather all statistics. Incrementally managed tables are using old-style synopses and you want to replace the old-style with the new immediately. | If you have a window of time to completely re-gather statistics for partitioned tables, then this is the recommended approach. EXEC DBMS_STATS.SET_TABLE_PREFS(table_owner,table_name, ‘APPROXIMATE_NDV_ALGORITHM’, ‘HYPERLOGLOG’) You also need to specify that you don’t want a mix of old synopses and new synopses in the same table: EXEC DBMS_STATS.SET_TABLE_PREFS (table_owner, table_name, ‘INCREMENTAL_STALENESS’, ‘NULL’)You need to take some care here. The preference value should be set to ‘NULL’ (in quotes) and not NULL (without quotes). NULL (without quotes) sets a preference to its default value, which in this case is ALLOW_MIXED_FORMAT.Once these preferences are set you will need to re-gather the table’s statistics. |
Remember that you can also set DBMS_STATS preferences (such as APPROXIMATE_NDV_ALGORITHM) at the database, global and schema-level as well as at the table level (as per the examples above).
Summary
The synopsis format in Oracle Database 12c Release 2 is much more compact than the previous format. If your database is very large, expect to save a lot of space in SYSAUX while maintaining very good accuracy for your statistics. You can expect the system overhead required to manage synopses to drop too (for example, when you exchange partitions). For more on this and some example scripts, take a look at GitHub.
If you have comments on this post or the scripts in GitHub, please go ahead below.




