上个月参加了腾讯云DB王者数据库挑战赛,在初赛730人中排名进入前60不算满意,牛人真是多,感谢腾讯云给了相互切磋、相互学习、相互提高的机会。 比赛成绩查看: https://docs.qq.com/sheet/DU2pyckt6R3NsWkNH

顺便附上决赛题目和选手心得以供学习借鉴:

表结构:
create table region  (
       regionkey  integer not null,
       name       varchar(25) not null,
       comment    varchar(152),
       primary key(regionkey)
);
create table nation  (
       nationkey  integer not null,
       name       varchar(25) not null,
       regionkey  integer not null,
       comment    varchar(152),
       primary key(nationkey)
);
create table customer (
        custkey     integer not null,
        name        varchar(25) CHARACTER SET utf8 COLLATE utf8_bin not null,
        address     varchar(40) not null,
        nationkey   integer not null,
        phone       varchar(15) not null,
        acctbal     decimal(15,2)   not null,
        mktsegment  varchar(10) not null,
        comment     varchar(117) not null,
        primary key(custkey)
    );

create table orders  (
        orderkey       integer not null,
        custkey        integer not null,
        orderstatus    varchar(1) not null,
        totalprice     decimal(15,2) not null,
        orderdate      date not null,
        orderpriority  varchar(15) not null,
        clerk          varchar(15) not null,
        shippriority   integer not null,
        comment        varchar(79) not null,
        primary key(orderkey)
    );

待优化SQL语句1:
select c.custkey, c.phone,sum(o.totalprice) totalprice
   from nation n
   inner join customer c on c.nationkey = n.nationkey
   inner join orders o on o.clerk = c.name
   where n.name = "CHINA" and c.mktsegment ="HOUSEHOLD" and c.phone like "28-520%"
group by c.custkey, c.phone;
优化思路:
这是3表连接查询,所有第一考虑的方向是驱动表选择:
group by在5.7版本时会有一个隐藏排序,考虑用c表做驱动表,group by的字段来做索引,避免结果排序;
where中c表有mktsegment,phone;如果能全部包含在复合索引中,有可能实现using index不用回表;
from中c表有nationkey,name;如果能全部包含在复合索引中,有可能实现using index不用回表;
复合索引(custkey,phone,mktsegment,nationkey,name)字段过多,len长度过长;
综合考虑c表的写入性能,仅使用primary key(custkey,phone);
好处有:避免结果排序;利用索引中的phone字段,从引擎层排除一部分rows,在服务层再去除mktsegment不符的rows
被驱动表n的nationkey需要索引,本来就是主键不需要新建;o表的clerk需要索引,增加索引.
语句不用修改;索引alter语句:
ALTER TABLE customer DROP  PRIMARY key, ADD PRIMARY key (custkey,phone);
ALTER TABLE orders ADD index (clerk);


待优化SQL语句2:
select *
   from (
       select custkey, orderdate, sum(totalprice) as totalprice
       from orders
       group by custkey, orderdate
    ) t
where orderdate= "2019-08-01";
优化思路:
只有一个表,子查询后再where筛选;应该将选择和投影一次完成;
和第一条语句一样有个隐含排序的group by,考虑用(custkey,orderdate)复合索引;
如果将select中的totalprice字段,也加到索引中,可以实现using index不用回表;
但是为了考虑写入性能,不使用len太长,字段过多的复合索引;最终使用复合索引(custkey,orderdate)
优化后语句:
select custkey, orderdate, sum(totalprice) as totalprice
    from orders
    where orderdate= "2019-08-01"
    group by custkey, orderdate;
索引alter语句:
ALTER TABLE orders ADD key (custkey,orderdate);

待优化SQL语句3:
select c.custkey, sum(o.totalprice)totalprice
   from customer c
   left join orders o on o.custkey = c.custkey
   where c.phone like "33-64%" and c.name like concat("Customer#00003", "%")
group by c.custkey;
优化思路:
两表连接,选择驱动表;隐含排序的group by;where中的c表字段phone,name;
选择c表驱动,考虑复合索引(custkey,phone,name),综合考虑写性能使用(custkey,phone),在优化第一条语句时,已经建立;
被驱动表o的custkey字段需要索引,在优化第二条语句时,已经是复合索引的最左字段,不用新建;


待优化SQL语句4:
select c.custkey, c.phone
   from nation n
   inner join customer c on c.nationkey = n.nationkey
   where n.name = "CHINA" and exists (
       select 1 from orders o where o.custkey = c.custkey and o.orderdate ="1998-08-11"
    );
优化思路:
exists的子查询,可以用join distinct来实现,效率更高;
select c.custkey, c.phone
   from nation n
   inner join customer c 
   on c.nationkey = n.nationkey
   inner join (select distinct custkey from orders where orderdate ="1998-08-11") o 
   on o.custkey = c.custkey
   where n.name = "CHINA";
变子查询为<derived>表连接;3表连接,选择驱动表;
这时有两个选择:
a:where 字段有<derived>表的orderdate,可以使用auto_key(custkey,orderdate)索引,<derived>来做驱动表;
被驱动表c表custkey需要索引,优化第一条语句时,已是复合索引的最左字段;n表nationkey需要索引,本是主键,不用新建;
不需要新建索引;在测试时出现c,n表不使用主键,为什么?强制join顺序,需要强制使用索引;修改语句如下:
select c.custkey, c.phone
   from 
   (select distinct custkey from orders where orderdate ="1998-08-11") o 
   straight_join
   customer c force index(primary) on  o.custkey = c.custkey 
   straight_join
   nation n force index(primary) on c.nationkey = n.nationkey and n.name = "CHINA";
b:where字段有n表的name,可以使用index(name)索引,需要新建,n表来做驱动表;
被驱动表c表nationkey需要新建索引;<derived>custkey需要索引,已有auto_key(custkey,orderdate),不用新建;
语句使用第一步优化后的即可;索引alter语句如下:
alter table customer add index(nationkey);
alter table nation add index(name);
n表index(name)长度102,还多了一个c表的index(nationkey),现在只有表结构,没有数据,具体是a优,还是b优,待大家评议;
我本人倾向于选择a,如果执行时间不是太慢,我觉得还是不要建太多索引;
但是为什么需要force index()才能执行想要的执行计划?是没有数据吗?
b因为减少了n表的rows肯定是select更快,但是多了两个索引,影响n表和c表写入效率的;