Thursday, 27 December 2012

Step by Step Implementing SQL Baselines for bad Query


Here’s a scenario Where query after ages of running fine, and no changes to the code, a query suddenly starts running for magnitudes longer than it used to.  In this instance  Query Didn't performed well since last couple of hours since plan has been changed.  I can say Manual is better than Grid control in creation of base lines  ( Grid control is very tedious job to create & implement baselines ).  

In one of my database Cost of query was also very  considerable low ( highest  cost of the query is 99 ) .  Here is step by step to Implementing SQL Baseline's :-

BAD SQL ID:-  XXXXXXXXXXXXX

##################################################################################################################
Step 1) Using Kerry Osborne’s script to look at the plan_hash_value over time from AWR, it was clear that the CBO had picked a new, bad, explain plan :-
##################################################################################################################

set lines 155
col execs for 999,999,999
col avg_etime for 999,999.999
col avg_lio for 999,999,999.9
col begin_interval_time for a30
col node for 99999
break on plan_hash_value on startup_time skip 1
select ss.snap_id, ss.instance_number node, begin_interval_time, sql_id, plan_hash_value,
nvl(executions_delta,0) execs,
(elapsed_time_delta/decode(nvl(executions_delta,0),0,1,executions_delta))/1000000 avg_etime,
(buffer_gets_delta/decode(nvl(buffer_gets_delta,0),0,1,executions_delta)) avg_lio
from DBA_HIST_SQLSTAT S, DBA_HIST_SNAPSHOT SS
where sql_id = nvl('&sql_id','XXXXXXXXX')
and ss.snap_id = S.snap_id
and ss.instance_number = S.instance_number
and executions_delta > 0
order by 1, 2, 3
/


In above script just replace the  sql id with bad sql id  according to situation



OUTPUT :-
-------------



From above out we can clearly see the Currently  above sql having the two PLAN's currently using bad plan compare to previous plan.


PLAN_HASH_VALUES from above output:-

2339166878 ====> is Bad plan by watching above
4248587209 =====> Good one better than above one

So we knew the sql_id, and we knew the plan_hash_value of the plan which we wanted the CBO to use. But how to do this?

Back to Kerry Osborne again, and his article about SQL Plan Baselines. He (and others) write in detail about how and what SQL Plan Baselines are, but in essence it lets you tell Oracle which plan to use (or optionally, prefer) for a given sql_id.

Since the desired plan_hash_value was no longer in the cursor cache, we could get it back from AWR, loaded in via a SQL Tuning Set. Here’s the code with in-line comments explaining the function of each block:-

####################################################################################################################################################################
Step 2)  Set up a SQL Baseline using known-good plan, sourced from AWR snapshots.  In this example, sql_id is XXXXXXXXX and the plan_hash_value of the good plan that we want to force is 4248587209                   #
####################################################################################################################################################################

-- Create SQL Tuning Set (STS)
BEGIN
  DBMS_SQLTUNE.CREATE_SQLSET(
    sqlset_name => 'MySTS01',
    description => 'SQL Tuning Set for loading plan into SQL Plan Baseline');
END;


-- Populate STS from AWR, using a time duration when the desired plan was used
--  List out snapshot times using :   SELECT SNAP_ID, BEGIN_INTERVAL_TIME, END_INTERVAL_TIME FROM dba_hist_snapshot ORDER BY END_INTERVAL_TIME DESC;
--  Specify the sql_id in the basic_filter (other predicates are available, see above output)


DECLARE
  cur sys_refcursor;
BEGIN
  OPEN cur FOR
    SELECT VALUE(P)
    FROM TABLE(
       dbms_sqltune.select_workload_repository(begin_snap=>20615, end_snap=>20623,basic_filter=>'sql_id = ''XXXXXXXXXXXXXXXXXX''',attribute_list=>'ALL')
              ) p;
     DBMS_SQLTUNE.LOAD_SQLSET( sqlset_name=> 'MySTS01', populate_cursor=>cur);
  CLOSE cur;
END;
/


This query is using good plan between  20615 & 20623 snapshots from above step 1 pasted output.


-- List out SQL Tuning Set contents to check we got what we wanted

SELECT
  first_load_time          ,
  executions as execs              ,
  parsing_schema_name      ,
  elapsed_time  / 1000000 as elapsed_time_secs  ,
  cpu_time / 1000000 as cpu_time_secs           ,
  buffer_gets              ,
  disk_reads               ,
  direct_writes            ,
  rows_processed           ,
  fetches                  ,
  optimizer_cost           ,
  sql_plan                ,
  plan_hash_value          ,
  sql_id                   ,
  sql_text
   FROM TABLE(DBMS_SQLTUNE.SELECT_SQLSET(sqlset_name => 'MySTS01')
             );


-- Here MySTS01 is Tuning set which was we created for this query.

######################################
Step 3) Lets Pin good plan to this query                    #
######################################


-- List out the Baselines to see what's there
SELECT * FROM dba_sql_plan_baselines ;


-- Load desired plan from STS as SQL Plan Baseline
-- Filter explicitly for the plan_hash_value here if you want

DECLARE
my_plans pls_integer;
BEGIN
  my_plans := DBMS_SPM.LOAD_PLANS_FROM_SQLSET(
    sqlset_name => 'MySTS01',
    basic_filter=>'plan_hash_value = ''4248587209'''
    );
END;
/

-- List out the Baselines
SELECT * FROM dba_sql_plan_baselines ;

########################################################################################################################
Step 4)  Flush the cursor cache after loading the baseline to make sure it gets picked up on next execution of the sql_id.                                                                                                                     #
###########################################################################################################

Since from above first step we dealt with single query we will try to flush that query from cursor cache as below.


1) If a single SQL statement has to be flushed out of the object library cache, the first step is to find the address of the handle and the hash value of the cursor that has to go away. Name of the object [to be purged] is the concatenation of the ADDRESS and HASH_VALUE columns from the GV$SQLAREA view. Here is an example:

select INST_ID,ADDRESS, HASH_VALUE,OLD_HASH_VALUE,PLAN_HASH_VALUE from gV$SQLAREA where SQL_ID like 'XXXXXXXXXXXXXXX';


   INST_ID ADDRESS          HASH_VALUE OLD_HASH_VALUE PLAN_HASH_VALUE
---------- ---------------- ---------- -------------- ---------------
         4 0000000419923CB8 3373908091     2718803809      2339166878




2) The syntax for the PURGE the PLAN_HASH_VALUE procedure is shown below.


procedure PURGE (
        name VARCHAR2,
        flag CHAR DEFAULT 'P',
        heaps NUMBER DEFAULT 1)


SQL> exec DBMS_SHARED_POOL.PURGE ('0000000419923CB8, 3373908091', 'C');

PL/SQL procedure successfully completed.

SQL> select ADDRESS, HASH_VALUE from V$SQLAREA where SQL_ID like' XXXXXXXXXX ';

no rows selected  ( Since we purged the plan its wont return the any values If return we need to flush the shared pool to clear the all plans from cursor cache )

####################################################################
Step 6) Check the query Got picked up New plan which was we pinned through Baselines:-
####################################################################





In above out you can see New plan was picked up  which was we pinned through baselines.

If you execute below statement also will show you plan has been picked in awr or not :-
----------------------------------------------------------------------------------------------------------
set lines 155
col execs for 999,999,999
col avg_etime for 999,999.999
col avg_lio for 999,999,999.9
col begin_interval_time for a30
col node for 99999
break on plan_hash_value on startup_time skip 1
select ss.snap_id, ss.instance_number node, begin_interval_time, sql_id, plan_hash_value,
nvl(executions_delta,0) execs,
(elapsed_time_delta/decode(nvl(executions_delta,0),0,1,executions_delta))/1000000 avg_etime,
(buffer_gets_delta/decode(nvl(buffer_gets_delta,0),0,1,executions_delta)) avg_lio
from DBA_HIST_SQLSTAT S, DBA_HIST_SNAPSHOT SS
where sql_id = nvl('&sql_id','XXXXXXXXXXX')
and ss.snap_id = S.snap_id
and ss.instance_number = S.instance_number
and executions_delta > 0
order by 1, 2, 3
/





In above at around 5:00 PM New plan was picked up due to we forced trough baselines.



Complete References:-

https://rnm1978.wordpress.com/2011/06/28/oracle-11g-how-to-force-a-sql_id-to-use-a-plan_hash_value-using-sql-baselines/ 
http://kerryosborne.oracle-guy.com/2009/04/oracle-11g-sql-plan-management-sql-plan-baselines/                                              
http://kerryosborne.oracle-guy.com/scripts/awr_plan_change.sql 

 

Disclaimer: If you feel this content related to your content  please feel to contact me to remove this information.  I am sharing the knowledge with people whoever in the need which is learning daily basis.

2 comments: