分布式数据库系统原理中的查询过程
一、定义与说明
1 优化
为了减少优化消耗,查询过程的优化的目标为得到一个较优解,而不是最优解.其中解为一个消耗较小的关系运算序列或操作序列.
2片段
片段是指被分散并保存在不同位置的(全局)关系的子集(局部关系集合)
二、查询优化过程
1 查询语句分解
1.1 解析查询语句 1.2 判断查询语句语义正确性 1.3 消除重复出现的谓词 1.4 将查询语句重构为较优的关系运算序列
2 数据定位
2.1 将分布式(全局)查询映射为片段查询 2.2 简化和重构碎片上的查询,优化片段查询
3 全局查询优化
3.1 结合片段上的信息(包括所查询的属性元素数等)进行优化,得到较优片段查询操作与通讯操作的序列。其中很重要的一方面则是联结操作的执行顺序。很多现代分布式数据库都使用半联结的方式进行优化。
3.2 半联结:只将表中那些将会实际参与联结的元组进行传输,并在站点上对这些元组和相关表进行联结操作。
4 局部查询优化
每个站点上针对本地查询进行优化。
通过Oracle执行计划推测优化策略
一、测试前提:
1 如何查看Oracle执行计划
第一步:登入sql/plus 执行命令(无先后顺序)
set timeon; (说明:打开时间显示)
set autotracetraceonly; (说明:打开自动分析统计,不显示SQL语句的运行结果)。 第二步:输入你要查看的sql执行
第三步:查看结果
2 用户、库表说明
2.1 用户1
用户名:XUQIU@ORG004
密码:XUQIU
2.2 用户2
用户名:需求分析定制网站@ ORACLEDB_192.168.2.115
密码:XUQIU
2.3用户3
用户名:PADISTEST@ORAGS7_192.168.2.7
密码:apple
2.3 库表说明
用户1、用户2所对应的库都是需求库且具有相同的库表结构,且每个对应的表中具有相同的数据;用户3所对应的库为仿真库。
二、测试过程
1 登录SQL/PLUS、开启执行计划并创建DBLINK
1.1 SQL/PLUS记录
Microsoft Windows XP [版本 5.1.2600]
(C) 版权所有 1985-2001 Microsoft Corp.
C:\Documents and Settings\Administrator>sqlplus
SQL*Plus: Release 10.2.0.1.0 - Production on 星期二 1月 18 14:45:04 2011
Copyright (c) 1982, 2005, Oracle. All rights reserved.
请输入用户名: xuqiu@org004
输入口令:
连接到:
Oracle9i Enterprise Edition Release 9.2.0.1.0 - Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.1.0 - Production
SQL> set time on;
14:45:35 SQL> set autotracetraceonly;
14:45:46 SQL> create database link mopishv0link connect to "需求分析定制网站" identified by "XUQIU" using'ORAC
LEDB_192.168.2.115';
数据库链接已创建。
14:47:52 SQL> select * from WORK_INFO@mopishv0link;
已选择1161行。
执行计划
----------------------------------------------------------
0 SELECT STATEMENT (REMOTE) Optimizer=CHOOSE
1 0 TABLE ACCESS (FULL) OF 'WORK_INFO' ORACLEDB .US.ORAC
LE.COM
统计信息
----------------------------------------------------------
0 recursive calls
1 db block gets
0 consistent gets
0 physical reads
212 redo size
172850 bytes sent via SQL*Net to client
1219 bytes received via SQL*Net from client
158 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1161 rows processed
15:35:37 SQL> create database link mopishv1link connect to "PADISTEST" identified by "apple" using'ORAGS7_192.168.2.7';
数据库链接已创建。
16:44:09 SQL> select * from ACTIVITY@mopishv1link;
已选择452行。
执行计划
----------------------------------------------------------
0 SELECT STATEMENT (REMOTE) Optimizer=CHOOSE
1 0 TABLE ACCESS (FULL) OF 'ACTIVITY' ORAGS7.U
S.ORACLE .COM 2有条件查询本地库表
2.1 SQL/PLUS记录
14:48:18 SQL> select work_id from WORK_RULE_INFO where RULE_LEV='操作级';
执行计划
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
1 0 TABLE ACCESS (FULL) OF 'WORK_RULE_INFO'
统计信息
----------------------------------------------------------
0 recursive calls
0 db block gets
32 consistent gets
0 physical reads
0 redo size
378 bytes sent via SQL*Net to client
372 bytes received via SQL*Net from client
4 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
14:49:43 SQL>
2.2 记录分析
由TABLE ACCESS (FULL) OF 'WORK_RULE_INFO'我们可以看到在对本地查询的时候虽然有条件限制,但也没有做过多的优化而是直接访问整个WORK_RULE_INFO表。
3无条件查询远程库表
3.1 SQL/PLUS记录
14:49:43 SQL> select work_id from WORK_RULE_INFO@mopishv0link;
已选择1161行。
执行计划
----------------------------------------------------------
0 SELECT STATEMENT (REMOTE) Optimizer=CHOOSE
1 0 TABLE ACCESS (FULL) OF 'WORK_RULE_INFO' ORACLEDB .US.ORAC LE.COM
统计信息
----------------------------------------------------------
0 recursive calls
0 db block gets
0 consistent gets
0 physical reads
0 redo size
18183 bytes sent via SQL*Net to client
1219 bytes received via SQL*Net from client
158 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1161 rows processed
14:53:57 SQL>
3.2 记录分析
与3.1比较可以发现,访问远程库表时,会在操作后加入远程数据库的全局数据库名(Global Database Name)表示操作发生在远程数据库服务上。
4不同库中的不同表的表间查询
4.1 SQL/PLUS记录1
15:02:39 SQL> select WORK_RULE_INFO.OP_AV_BE_TIME1
WORK_RULE_INFO,WORK_INFO@mopishv0link where WORK_INFO.
WORK_ID=WORK_RULE_INFO.WORK_ID and RULE_LEV='操作级';
执行计划
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
1 0 NESTED LOOPS from
2 1 REMOTE* MOPISHV0
LINK.US.
ORACLE.C
OM
3 1 TABLE ACCESS (BY INDEX ROWID) OF 'WORK_RULE_INFO'
4 3 INDEX (UNIQUE SCAN) OF 'WORK_RULE_INFO_PK' (UNIQUE)
2 SERIAL_FROM_REMOTE SELECT "WORK_ID" FROM "WORK_INFO"
"WORK_INFO
"
统计信息
----------------------------------------------------------
0 recursive calls
0 db block gets
3118 consistent gets
0 physical reads
0 redo size
390 bytes sent via SQL*Net to client
372 bytes received via SQL*Net from client
4 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
4.2 SQL/PLUS记录2
15:09:21 SQL> select WORK_RULE_INFO.OP_AV_BE_TIME1 from
WORK_RULE_INFO,WORK_INFO@mopishv0link where RULE_LEV='
操作级' and WORK_INFO.WORK_ID=WORK_RULE_INFO.WORK_ID;
执行计划
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
1 0 NESTED LOOPS
2
MOPISHV0
LINK.US. 1 REMOTE*
ORACLE.C OM
3 1 TABLE ACCESS (BY INDEX ROWID) OF 'WORK_RULE_INFO'
4 3 INDEX (UNIQUE SCAN) OF 'WORK_RULE_INFO_PK' (UNIQUE)
2 SERIAL_FROM_REMOTE SELECT "WORK_ID" FROM "WORK_INFO" "WORK_INFO
"
统计信息
----------------------------------------------------------
0 recursive calls
0 db block gets
3118 consistent gets
0 physical reads
0 redo size
390 bytes sent via SQL*Net to client
372 bytes received via SQL*Net from client
4 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
4.2 SQL/PLUS记录3
15:10:12 SQL> select WORK_RULE_INFO.OP_AV_BE_TIME1 from WORK_RULE_INFO,WORK_INFO@mopishv0link where RULE_LEV='
操作级' and WORK_INFO.WORK_ID=WORK_RULE_INFO.WORK_ID and RULE_LEV='操作级';
执行计划
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
1 0 NESTED LOOPS
2 1
MOPISHV0
LINK.US.
ORACLE.C OM REMOTE*
3 1 TABLE ACCESS (BY INDEX ROWID) OF 'WORK_RULE_INFO'
4 3 INDEX (UNIQUE SCAN) OF 'WORK_RULE_INFO_PK' (UNIQUE)
2 SERIAL_FROM_REMOTE SELECT "WORK_ID" FROM "WORK_INFO" "WORK_INFO
"
统计信息
----------------------------------------------------------
0 recursive calls
0 db block gets
3118 consistent gets
0 physical reads
0 redo size
390 bytes sent via SQL*Net to client
372 bytes received via SQL*Net from client
4 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
15:24:25 SQL>
4.4记录分析 1
通过对比4.1、4.2与4.3我们可以发现条件的颠倒与重复都对执行计划没有影响,这说明在联结之前Oracle会对查询进行优化。这点符合分布式数据库系统原理中所描述的查询语句分解的特点。
4.5 SQL/PLUS记录4
15:24:25 SQL> select WORK_INFO.WORK_NAME
WORK_RULE_INFO,WORK_INFO@mopishv0link where WORK_INFO.WORK_ID=WO
RK_RULE_INFO.WORK_ID;
已选择1161行。
执行计划
---------------------------------------------------------- from
0 SELECT STATEMENT Optimizer=CHOOSE
1 0 NESTED LOOPS
2
MOPISHV0
LINK.US. 1 REMOTE*
ORACLE.C OM
3 1 INDEX (UNIQUE SCAN) OF 'WORK_RULE_INFO_PK' (UNIQUE)
2 SERIAL_FROM_REMOTE SELECT "WORK_ID","WORK_NAME" FROM "WORK_INFO
" "WORK_INFO"
统计信息
----------------------------------------------------------
0 recursive calls
0 db block gets
1240 consistent gets
0 physical reads
0 redo size
32377 bytes sent via SQL*Net to client
1219 bytes received via SQL*Net from client
158 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1161 rows processed
15:35:37 SQL>
4.6记录分析 2
通过对比4.1、4.2、4.3与4.5可以发现,Oracle只从远程库表中查询参与联结的元组,而不是查询所有列。这与分布式数据库系统原理中所描述的半联结相符。
4.7 SQL/PLUS记录5
16:57:00 SQL> select WORK_INFO.WORK_NAME,FLOWNAMES2.FLOW_NAME from
and WORK_RULE_INFO,WORK_INFO@mopishv0link,FLOWN AMES2@mopishv1link where WORK_INFO.WORK_ID=WORK_RULE_INFO.WORK_ID
FLOWNAMES2.WORK_ID=WORK_RULE_INFO.WORK_I
D and trim(COMPANY_CODE)='高法';
已选择327行。
执行计划
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
1 0 MERGE JOIN
2 1 SORT (JOIN)
3 2 NESTED LOOPS
4
MOPISHV1 3 REMOTE*
LINK.US.
ORACLE.C
OM
5 3 INDEX (UNIQUE SCAN) OF 'WORK_RULE_INFO_PK' (UNIQUE)
6 1 SORT (JOIN)
7 6 REMOTE* MOPISHV0
LINK.US.
ORACLE.C
OM
4 SERIAL_FROM_REMOTE
"COMPANY_CODE","FLOW_NAME","WORK_ID" SELECT
FROM "FLOWNAMES2" "FLOWNAMES2" WHERE
7 SERIAL_FROM_REMOTE SELECT "WORK_ID","WORK_NAME" FROM
"WORK_INFO
" "WORK_INFO"
统计信息
----------------------------------------------------------
0 recursive calls
0 db block gets
329 consistent gets
0 physical reads
0 redo size
27050 bytes sent via SQL*Net to client
603 bytes received via SQL*Net from client
46 SQL*Net roundtrips to/from client
2 sorts (memory)
0 sorts (disk)
327 rows processed
17:30:08 SQL>
4.8 SQL/PLUS记录6
17:31:33 SQL> select WORK_INFO.WORK_NAME,FLOWNAMES2.FLOW_NAME from
and WORK_RULE_INFO,WORK_INFO@mopishv0link,FLOWN AMES2@mopishv1link where WORK_INFO.WORK_ID=WORK_RULE_INFO.WORK_ID
FLOWNAMES2.WORK_ID=WORK_RULE_INFO.WORK_I
D and WORK_INFO.SUBSYS_ID='55259';
执行计划
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
1 0 MERGE JOIN
2 1 SORT (JOIN)
3 2 NESTED LOOPS
4 3
MOPISHV1
LINK.US.
ORACLE.C OM
5 3 INDEX (UNIQUE SCAN) OF 'WORK_RULE_INFO_PK' (UNIQUE)
6 1 SORT (JOIN)
7
MOPISHV0 6 REMOTE* REMOTE*
LINK.US.
ORACLE.C OM
4 SERIAL_FROM_REMOTE SELECT "FLOW_NAME","WORK_ID" FROM "FLOWNAMES
2" "FLOWNAMES2"
7 SERIAL_FROM_REMOTE SELECT "WORK_ID","WORK_NAME","SUBSYS_ID" FRO
M "WORK_INFO" "WORK_INFO" WHERE "SUB
统计信息
----------------------------------------------------------
0 recursive calls
0 db block gets
329 consistent gets
0 physical reads
0 redo size
814 bytes sent via SQL*Net to client
372 bytes received via SQL*Net from client
4 SQL*Net roundtrips to/from client
2 sorts (memory)
0 sorts (disk)
5 rows processed
17:32:12 SQL>
4.9记录分析 3
通过对比4.7与4.8可以发现:Oracle并没有优先执行可以减少数据传输与联结记录数的查询。例如4.8中并没有优先执行WORK_INFO上的查询。而从WHERE字句可以发现,Oracle并没有利用之前查询所得到的结果。这有两种可能:
1 优化得到的是较优的结果而不是最优的,因此没有利用之前查询所得的结果。 2 Oracle的优化不会动态利用查询所得到的结果。
4.10 SQL/PLUS记录7
17:30:08 SQL> select WORK_INFO.WORK_NAME,FLOWNAMES2.FLOW_NAME from
where WORK_RULE_INFO@mopishv0link,WORK_INFO@mopis hv0link,FLOWNAMES2@mopishv1link
WORK_INFO.WORK_ID=WORK_RULE_INFO.WORK_ID and FLOWNAMES2.WORK_ID=WORK_RUL E_INFO.WORK_ID and trim(COMPANY_CODE)='高法';
已选择327行。
执行计划
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
1 0 MERGE JOIN
2 1 MERGE JOIN
3 2 SORT (JOIN)
4
MOPISHV1 3 REMOTE*
LINK.US.
ORACLE.C OM
5 2 SORT (JOIN)
6
MOPISHV0 5 REMOTE*
LINK.US.
ORACLE.C OM
7 1 SORT (JOIN)
8
MOPISHV0 7 REMOTE*
LINK.US.
ORACLE.C OM
4 SERIAL_FROM_REMOTE SELECT "COMPANY_CODE","FLOW_NAME","WORK_ID"
FROM "FLOWNAMES2" "FLOWNAMES2" WHERE
6 SERIAL_FROM_REMOTE SELECT "WORK_ID" FROM "WORK_RULE_INFO" "WORK
_RULE_INFO"
8 SERIAL_FROM_REMOTE SELECT "WORK_ID","WORK_NAME" FROM "WORK_INFO
" "WORK_INFO"
统计信息
----------------------------------------------------------
0 recursive calls
0 db block gets
0 consistent gets
0 physical reads
0 redo size
27050 bytes sent via SQL*Net to client
603 bytes received via SQL*Net from client
46 SQL*Net roundtrips to/from client
3 sorts (memory)
0 sorts (disk)
327 rows processed
19:10:46 SQL>
4.11记录分析 4
由4.10可以看出Oracle并没有对可以在同一站点进行内部联结的查询进行优化。原因与记录分析3相同,有两种可能:
1 优化得到的是较优的结果而不是最优的,因此没有利用之前查询所得的结果。 2 Oracle的优化不会动态利用查询所得到的结果。
5 Oracle查询提示driving_site中的启发
提示driving_site用于分布式查询中,指定数据集发送到那个数据库上执行。在某些情况下可以大大提高SQL的性能。
这说明Oracle默认是在本地进行联结操作的,只有通过提示才能进行修改,Oracle并没有对联结操作的位置进行优化。并且由于SQL语句中本身带有定位信息,因此在这里推测分布式数据库系统原理中所描述的数据定位可能被大幅度简化,进而推测Oracle在DBLINK的查询过程中也并没有过多的获取片段上的信息(包括所查询的属性元素数等)。
三、总结
通过以上测试与分布式数据库系统原理中对查询过程的描述,推测Oracle会对不同库中的不同表的表间查询进行优化,但优化程度有限。不同数据库的表间查询步骤如下: 1查询语句分解,优化查询操作序列。
2 将查询条件发送给远程数据库(可能有多个)。
3 远程数据库对参与联结的元组进行查询并将结果发送给进行联结操作的站点。 4 进行联结操作,将结果传送给客户端。