PostgreSQL 与 Oracle 访问分区表执行计划差异

2022-11-08,,,,

熟悉Oracle 的DBA都知道,Oracle 访问分区表时,对于没有提供分区条件的,也就是在无法使用分区剪枝情况下,优化器会根据全局的统计信息制定执行计划,该执行计划针对所有分区适用。在分析利弊之前,我们先来看个例子:

一、Oracle

构建数据:

create table part_tab01(part_key char(1),state char(1),desc_content varchar(4000))
partition by range(part_key)
(
partition part_0 values less than(1),
partition part_1 values less than(2)
); insert into part_tab01 select '0','0',rpad('a',1000,'a') from dba_objects where rownum<10001;
insert into part_tab01 select '1','1',rpad('a',1000,'a') from dba_objects where rownum<10001;
insert into part_tab01 select * from part_tab01;
insert into part_tab01 select * from part_tab01;
insert into part_tab01 select * from part_tab01;
insert into part_tab01 select * from part_tab01;
insert into part_tab01 select * from part_tab01;
insert into part_tab01 select * from part_tab01;
insert into part_tab01 select '1','0',rpad('a',1000,'a') from dba_objects where rownum<11;
insert into part_tab01 select '0','1',rpad('a',1000,'a') from dba_objects where rownum<11; create index idx_part_tab01_state on part_tab01(state) local;

从数据的分布可以得出结论,最优的访问方法:对于不同的分区,访问不同的state 值,应采用不同的表访问方法。

执行计划:

SQL> select * from part_tab01 where state='1';

640010 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 4116343635 --------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
--------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 640K| 613M| 49576 (1)| 00:00:02 | | |
| 1 | PARTITION RANGE ALL| | 640K| 613M| 49576 (1)| 00:00:02 | 1 | 2 |
|* 2 | TABLE ACCESS FULL | PART_TAB01 | 640K| 613M| 49576 (1)| 00:00:02 | 1 | 2 |
-------------------------------------------------------------------------------------------------- SQL> select * from part_tab01 where state='1' and part_key='0'; 10 rows selected. Execution Plan
----------------------------------------------------------
Plan hash value: 1952449058 -----------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
-----------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10 | 10050 | 5 (0)| 00:00:01 | | |
| 1 | PARTITION RANGE SINGLE | | 10 | 10050 | 5 (0)| 00:00:01 | 1 | 1 |
|* 2 | TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| PART_TAB01 | 10 | 10050 | 5 (0)| 00:00:01 | 1 | 1 |
|* 3 | INDEX RANGE SCAN | IDX_PART_TAB01_STATE | 10 | | 3 (0)| 00:00:01 | 1 | 1 |
----------------------------------------------------------------------------------------------------------------------------------- SQL> select * from part_tab01 where state='1' and part_key='1'; 640000 rows selected. Execution Plan
----------------------------------------------------------
Plan hash value: 4278184147 -----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
-----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 640K| 613M| 24793 (1)| 00:00:01 | | |
| 1 | PARTITION RANGE SINGLE| | 640K| 613M| 24793 (1)| 00:00:01 | 2 | 2 |
|* 2 | TABLE ACCESS FULL | PART_TAB01 | 640K| 613M| 24793 (1)| 00:00:01 | 2 | 2 |
-----------------------------------------------------------------------------------------------------

可以看到,在没有分区条件的情况下,Oracle 是针对全表采用统一的执行。实际针对该SQL,最好的访问方法应该是:part_0 全表,part_1 索引

二、PostgreSQL 执行计划

构建数据:

create table part_tab01(part_key char(1),state char(1),desc_content text)
partition by range(part_key)
(
partition part_0 values less than(1),
partition part_1 values less than(2)
); insert into part_tab01 select '0','0',repeat('a',1000) from generate_series(1,1000000);
insert into part_tab01 select '0','1',repeat('b',1000) from generate_series(1,10);
insert into part_tab01 select '1','1',repeat('a',1000) from generate_series(1,1000000);
insert into part_tab01 select '1','0',repeat('b',1000) from generate_series(1,10); create index idx_part_tab01_state on part_tab01(state);

执行计划:针对不同分区,有不同的执行计划。

test=# explain analyze select * from part_tab01 where state='1';
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------
Append (cost=0.42..160363.43 rows=1000000 width=1008) (actual time=0.022..484.005 rows=1000010 loops=1)
-> Index Scan using part_tab01_part_0_state_idx on part_tab01_part_0 (cost=0.42..4.44 rows=1 width=1008) (actual time=0.022..0.024 rows=10 loops=1)
Index Cond: (state = '1'::bpchar)
-> Seq Scan on part_tab01_part_1 (cost=0.00..155358.99 rows=999999 width=1008) (actual time=0.011..424.713 rows=1000000 loops=1)
Filter: (state = '1'::bpchar)
Rows Removed by Filter: 10
Planning Time: 0.293 ms
Execution Time: 515.549 ms
(8 rows) test=# explain analyze select * from part_tab01 where state='0';
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------
Append (cost=0.00..160363.68 rows=1000014 width=1008) (actual time=0.022..517.127 rows=1000010 loops=1)
-> Seq Scan on part_tab01_part_0 (cost=0.00..155359.16 rows=1000013 width=1008) (actual time=0.022..451.523 rows=1000000 loops=1)
Filter: (state = '0'::bpchar)
Rows Removed by Filter: 10
-> Index Scan using part_tab01_part_1_state_idx on part_tab01_part_1 (cost=0.42..4.44 rows=1 width=1008) (actual time=0.032..0.035 rows=10 loops=1)
Index Cond: (state = '0'::bpchar)
Planning Time: 0.090 ms
Execution Time: 547.486 ms
(8 rows)

  

三、结论

从本例可以看出,在不同分区数据分布不同的场景下,PostgreSQL针对不同分区有独立的执行计划是更优方法。现实中典型的场景,如:按时间分区的工单表,历史分区可能大部分工单是结束状态,而当前分区工单可能大部分是非结束状态。

PostgreSQL 与 Oracle 访问分区表执行计划差异的相关教程结束。

《PostgreSQL 与 Oracle 访问分区表执行计划差异.doc》

下载本文的Word格式文档,以方便收藏与打印。