Monday, July 7, 2014

Change Data Capture using Streams

I found this dusty script from my old shelves which enables change data capture feature enabled in oracle database 11.2.0.4 enterprise edition running RHEL 6.2 64bit.

I remember this script is for keeping a product table changes in a CDC table for a java process to read the CDC$xxx table. This consumer process reads the changed records is developed by our talented java development group and is beyond the scope of this post as can be understood.

Here, i will share how i enabled my oracle database change data capture feature.

There is some preparation steps in order to start the change data capture. Most of them are like to streams preparations scripts which makes sense hence streams is the underlying option if this feature.



--prepare the publisher user

--publisher user needs the following privileges and quotas

grant execute_catalog_role to dba_planet;
grant select_catalog_role to dba_planet;
grant execute on dbms_cdc_publish to dba_planet;
grant create sequence to dba_planet;
grant dba to dba_planet;
exec dbms_streams_auth.grant_admin_privilege('dba_planet');

alter user dba_planet quota unlimited on system;
alter user dba_planet quota unlimited on sysaux;


--prepare the database

--supplemental logging should be enabled in database level

alter database force logging;
alter database add supplemental log data;


--check database level supplemental logging level of the database

select SUPPLEMENTAL_LOG_DATA_MIN,
       SUPPLEMENTAL_LOG_DATA_PK,
       SUPPLEMENTAL_LOG_DATA_UI from v$database;


--prepare the instance

--depending on the configuration there will some process overhead when enabling the CDC
--so instance parameters should be reviewed

select name, value, isspecified
  from v$spparameter
  where name in
    ('compatible',
     'java_pool_size',
     'job_queue_processes',
     'parallel_max_servers',
     'processes',
     'sessions',
     'streams_pool_size',
     'undo_retention');


--minimum value of the undo_retention parameter should be 7200sec

alter system set undo_retention=7200 scope=both sid='*';



--prepare source tables

--create necessary log groups
--every column to be change logged should be supplemental logging enabled
--I preferred to create individual log groups for every column I will track

alter table sch_dmall.product add supplemental log data (all) columns;
alter table sch_dmall.product add supplemental log data (primary key) columns;
alter table sch_dmall.product add supplemental log data (foreign key) columns;
alter table sch_dmall.product add supplemental log data (unique) columns;

alter table sch_dmall.product add supplemental log group lg_product1 (id, deleted, seller_id,lastmodifieddate,category_id,dis_price,price,title,subtitle,productstatus,salestatus, urlwords) always;
alter table sch_dmall.product add supplemental log group lg_product2 (id) always;
alter table sch_dmall.product add supplemental log group lg_product3 (deleted) always;
alter table sch_dmall.product add supplemental log group lg_product4 (seller_id) always;
alter table sch_dmall.product add supplemental log group lg_product5 (lastmodifieddate) always;
alter table sch_dmall.product add supplemental log group lg_product6 (category_id) always;
alter table sch_dmall.product add supplemental log group lg_product7 (dis_price) always;
alter table sch_dmall.product add supplemental log group lg_product8 (price) always;
alter table sch_dmall.product add supplemental log group lg_product9 (title) always;
alter table sch_dmall.product add supplemental log group lg_product10 (subtitle) always;
alter table sch_dmall.product add supplemental log group lg_product11 (productstatus) always;
alter table sch_dmall.product add supplemental log group lg_product12 (salestatus) always;
alter table sch_dmall.product add supplemental log group lg_product13 (urlwords) always;



it begins with the famous streams starting point which is prepare_table_instantiation procedure to log the current SCN of the table to be change captured. This procedure will let CDC processes to know from which starting SCN point to start the Change Data Capture.

begin
--dbms_capture_adm.build;
dbms_capture_adm.prepare_table_instantiation(table_name => 'sch_dmall.product');
end;
/




Now we should create the change set to associate with the CDC table. One change set can include one or more CDC tables to process.

--if exists, first drop the change set
exec dbms_cdc_publish.drop_change_set('CS_PRODUCT');

--create the change set
begin
  dbms_cdc_publish.create_change_set(
    change_set_name => 'CS_PRODUCT',
    description => 'Change set for test',
    change_source_name => 'HOTLOG_SOURCE',
    stop_on_ddl => 'n',
    begin_date => sysdate,
    end_date => null);
end;
/



After successfully create the change set, it is time to create the change table. This important because columns to be change captured and the source table will be defined here by the column_type_list and source_table parameters.

--if exists, first drop the change set
exec dbms_cdc_publish.drop_change_table('SCH_DMALL','PRODUCT_CDC','Y');

--create the change table
begin
  dbms_cdc_publish.create_change_table(
  owner             => 'sch_dmall',
  change_table_name => 'product_cdc',
  change_set_name   => 'CS_PRODUCT',
  source_schema     => 'SCH_DMALL',
  source_table      => 'PRODUCT',
  column_type_list => 'ID NUMBER, DELETED NUMBER, SELLER_ID NUMBER, LASTMODIFIEDDATE DATE, CATEGORY_ID NUMBER, ' || 
                      'DIS_PRICE NUMBER, PRICE NUMBER, TITLE VARCHAR2(255), SUBTITLE VARCHAR2(255), ' || 
                      'PRODUCTSTATUS VARCHAR2(255), SALESTATUS VARCHAR2(255), URLWORDS VARCHAR2(255) ',
  capture_values  =>'both',
  rs_id           => 'y',
  row_id          => 'y',
  user_id         => 'y',
  timestamp       => 'y',
  object_id       => 'n',
  source_colmap   => 'n',
  target_colmap   => 'y',
  options_string  => 'TABLESPACE TS_DMALL_DATA');
end;
/

After all to start the CDC processes.

begin
  dbms_cdc_publish.alter_change_set(
    change_set_name => 'CS_PRODUCT',
    enable_capture => 'y');
end;
/

When examining the product_cdc table where the changes are collected, there will be some columns ending with '$' sign will exist. These columns are for holding the streams CDC processes internal information (called control columns) which are very helpful for identifying the row information. Especially the 'operation$' and 'commit_timestamp$' columns;

operation$ : D (delete), UN (updated after image), UO (updated before image),  Inserted
commit_timestamp$ : commit timestamp

Before handing the data over the development teams, creating a view can be a good idea. In this case I have created the following view for the development team to consume. Cheers, Ergem.


create or replace view sch_dmall.v_product_cdc as
select
  case
    when operation$='I' then 'I'
    when operation$='UN' and deleted=0 then 'U'
    when operation$='UN' and deleted=1 then 'D'
    when operation$='D' then 'D'
    else operation$
  end as operation,
  id as product_id,
  deleted,
  seller_id,
  category_id,
  dis_price,
  price,
  title,
  subtitle,
  productstatus,
  salestatus,
  cscn$ id,
  commit_timestamp$ create_date
from sch_dmall.product_cdc
where operation$ in ('D','I','UN');



References;
http://docs.oracle.com/cd/E11882_01/server.112/e25554/cdc.htm#DWHSG016