Create a new default schema in PostgreSQL, customize it and pool the data directly from Oracle without intermediate export files by using oracle_fdw (foreign data wrapper).


Current state of this solution

Works as expected, but very slow because one of the biggest tables asset_feature have a CLOB column and OCI library doesn't allow PostgreSQL to prefetch the rows. For example, a pull of a 40 Million rows table takes 3 hours. There are different workarounds that are described in our internal PostgreSQL mailing list.

Install PostgreSQL

For more information, see PostgreSQL installation.

Setup and test Oracle Instant Client

oracle_fdw is a PostgreSQL extention that need Oracle libraries. There are two optoins for you to take these libraries, download them form Oracle Website or get them from our FileServer. There is a third option to use the libraries from the Oracle DB itself ($ORACLE_HOME), but we shall not describe it here.

Option 1: Download directly from Oracle

Select your OS version here. Then download and unzip the following files in ~postgres/ (the files shall create it's own subdirectory). Don't unzip them on Mac, because you shall get different directories. All files from every zip should be in the same directory.

  • instantclient-basic-linux.x64-12.1.0.2.0.zip - Instant Client Package - Basic

  • instantclient-sqlplus-linux.x64-12.1.0.2.0.zip - Instant Client Package - SQL*Plus

  • instantclient-sdk-linux.x64-12.1.0.2.0.zip - Instant Client Package - SDK

Option 2: Copy the package from our internal FileServer

Execute the following command:

rsync -avx /Volumes/Files/censhare/censhare-3rdParty/sqlplus/instantclient_12_1_linux/ postgres@PG-host:instantclient_12_1/
CODE

Setup and test connectivity to Oracle

Modify and add the following variables in ~postgres/.pgsql_profile and ~postgres/.bash_profile:

export ORACLE_SID=corpus
export ORACLE_HOME=/var/lib/pgsql/instantclient_12_1
export PATH=/usr/pgsql-9.6/bin:$ORACLE_HOME:$PATH
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$ORACLE_HOME
CODE

Test connectivity to Oracle:

postgres@pg-host $ sqlplus system/password@oracle-host:1521/corpus

# if get an error mesage for a missing library, install it like this:
# yum install libaio.x86_64
CODE

Modify environment inside the PostgreSQL startup script:

# as root, search for PGDATA in that file and add the Oracle variables under the [Service] section of the file.
vi /usr/lib/systemd/system/postgresql-9.6.service

Environment=ORACLE_SID=corpus
Environment=ORACLE_HOME=/var/lib/pgsql/instantclient_12_1
Environment=LD_LIBRARY_PATH=/var/lib/pgsql/instantclient_12_1

systemctl daemon-reload
systemctl restart postgresql-9.6
CODE

Stop censhare process and prepare the Oracle database

sqlplus corpus/passwrod
-- truncate table messages;
-- consider this optional step
create table seqtmp(seqname varchar(100), lastnum integer, incr integer);
insert into seqtmp select sequence_name,last_number,increment_by from user_sequences; commit;


-- optional: truncate table corpus.message;
CODE

Setup oracle_fdw and connect PostgreSQL to Oracle

Follow the basic installation instructions

# as root, install needed libraries
yum install postgresql96-devel.x86_64
chown -R postgres:postgres /usr/pgsql*

# as postgres make soft link first
cd /var/lib/pgsql/instantclient_12_1
ln -s libclntsh.so.12.1 libclntsh.so
wget https://github.com/laurenz/oracle_fdw/archive/ORACLE_FDW_2_1_0.tar.gz
tar -xvzf ORACLE_FDW_2_1_0.tar.gz && rm ORACLE_FDW_2_0_0.tar.gz
cd oracle_fdw-ORACLE_FDW_2_1_0
make
make install
CODE

Sometimes, the follwing error occurs during the make install command: 

-bash-4.2$ make install
/usr/bin/mkdir -p '/usr/pgsql-9.6/lib'
/usr/bin/mkdir -p '/usr/pgsql-9.6/share/extension'
/usr/bin/mkdir -p '/usr/pgsql-9.6/share/extension'
/usr/bin/mkdir -p '/usr/pgsql-9.6/doc/extension'
/usr/bin/install -c -m 755 oracle_fdw.so '/usr/pgsql9.6/lib/oracle_fdw.so'
/usr/bin/install: cannot create regular file ‘/usr/pgsql-9.6/lib/oracle_fdw.so’: Permission deniedmake: *** [install-lib-shared] Error 1
CODE

To resolve the error, do the following: 

-bash-4.2$ pg_config --pgxs /usr/pgsql-9.6/lib/pgxs/src/makefiles/pgxs.mk
CODE

Make sure that your pg_config path is in $PATHTo do this, use $PATH with the make install command, or change the permission of the pgsql-9.6 folder from "root" user to "postgres:postgres": 

[root@censhare-db usr]# chown -R postgres:postgres /usr/pgsql*
CODE

Now the make install command should be run smoothly and show the following result: 

Switch to "postgres user"

-bash-4.2$ cd oracle_fdw-ORACLE_FDW_2_0_0/
-bash-4.2$ make install /usr/bin/mkdir -p '/usr/pgsql-10/lib'
/usr/bin/mkdir -p '/usr/pgsql-10/share/extension'
/usr/bin/mkdir -p '/usr/pgsql-10/share/extension'
/usr/bin/mkdir -p '/usr/pgsql-10/doc/extension'
/usr/bin/install -c -m 755 oracle_fdw.so '/usr/pgsql-10/lib/oracle_fdw.so'
/usr/bin/install -c -m 644 .//oracle_fdw.control '/usr/pgsql-10/share/extension/'
/usr/bin/install -c -m 644 .//oracle_fdw--1.1.sql .//oracle_fdw--1.0--1.1.sql '/usr/pgsql-10/share/extension/'
/usr/bin/install -c -m 644 .//README.oracle_fdw '/usr/pgsql-10/doc/extension/'
-bash-4.2$
CODE

The following command connects the PostgreSQL daatbase to the Oracle database, and imports the foreign schema definitions:

psql -d postgres
\i postgresql-create-user.sql
-- this script can be found under css/database/ on the app server and usualy have these two lines + few type cast definitinos
-- CREATE USER corpus PASSWORD 'corpus';
-- CREATE DATABASE corpus OWNER corpus TEMPLATE template0 ENCODING 'UTF8';
\c corpus postgres
CREATE EXTENSION oracle_fdw;
CREATE SERVER oradb FOREIGN DATA WRAPPER oracle_fdw OPTIONS (dbserver '//IP.OF.ORACLE.SERVER:1521/corpus.censhare');
GRANT USAGE ON FOREIGN SERVER oradb TO corpus;
\c corpus corpus
CREATE USER MAPPING FOR corpus SERVER oradb OPTIONS (user 'corpus', password 'password');
CREATE SCHEMA corpus;
IMPORT FOREIGN SCHEMA "CORPUS" FROM SERVER oradb INTO CORPUS;

-- in case that you would like to DROP and recreate it:
-- psql -d postgres
-- drop database corpus;
-- drop user corpus;
CODE

Create the new default PostgreSQL schema

In our software release, PostgreSQL and Oracle have different schemas (data types, constraints, index options, stored procedures etc.). After the data migration, PostgreSQL should have the same schema as our release plus customer specific columns (language, db-schema.patch etc.). For that reason, we create here an empty default PostgreSQL schema taken from the release for that specific censhare server version. 

$stringEscapeUtils.escapeHtml($body)
CODE


Check Oracle for custom TABLES and create them in PostgreSQL

During our tests, we found that PostgreSQL sees Oracle views as tables, so you have to check for custom views with connect to corpus schema with sqlplus and exclude them from all scripts below.

SQL> select view_name from user_views;
CODE

Now check for tables in Oracle that are not defined in db-schema.patch. You can create them like this: 

psql -d corpus -U corpus
set search_path='public'; 
select 'create table '|| table_name ||' (like corpus.'||table_name||');' from (
SELECT table_name FROM information_schema.tables WHERE table_schema='corpus' and table_name not in ('plan_table', 'plan_view','timetrackerinterface')
EXCEPT
SELECT table_name FROM information_schema.tables WHERE table_schema='public'
) as t1;
... copy and paste the output with 10-20 lines at a time.
CODE

Create custom language COLUMNS in PostgreSQL

This step checks for columns in Oracle that don't exist in PostgreSQL. We usually see here the custom language columns, but it could be everything, so executing postgresql-add-missing-languages.sql is not a solution:

psql -d corpus -U corpus

set search_path='public';

create temporary table columnstmp as (
SELECT table_name, column_name FROM information_schema.columns WHERE table_schema='corpus' and table_name not in ('plan_table', 'plan_view', 'timetrackerinterface')
EXCEPT
SELECT table_name, column_name FROM information_schema.columns WHERE table_schema='public' order by 1,2
);

DO $$
DECLARE
c record;
crow record;
ctype varchar(50);

BEGIN
	FOR c IN SELECT * from columnstmp
	LOOP
		SELECT *
		INTO crow
		FROM information_schema.columns WHERE table_schema='corpus' and table_name=c.table_name and 	column_name=c.column_name;

		IF crow.data_type = 'character varying' THEN ctype := crow.data_type || '(' || 	crow.character_maximum_length || ') ';
		ELSE ctype := crow.data_type;
		END IF;
		-- RAISE NOTICE '% % %', c.table_name, c.column_name, ctype ;
		EXECUTE 'ALTER TABLE ' || c.table_name || ' ADD COLUMN "' || c.column_name || '" ' || ctype;
	END LOOP;
END$$;
CODE

Here is a small exception for versions <5.7 because current_timestamp is a reserved word in postgresql:

# only for <= v5.7
set search_path='public';
alter table stored_transformation drop column current_timestamp;
CODE

Final check to see if there is any difference in the schemas:

SELECT table_name, column_name FROM information_schema.columns WHERE table_schema='public' and column_name not in ('rowid')
EXCEPT
SELECT table_name, column_name FROM information_schema.columns WHERE table_schema='corpus' order by 1,2;
SELECT table_name, column_name FROM information_schema.columns WHERE table_schema='corpus' and table_name not in ('plan_table','plan_view', 'timetrackerinterface')
EXCEPT
SELECT table_name, column_name FROM information_schema.columns WHERE table_schema='public' order by 1,2;
CODE

Let PostgreSQL pool the data from Oracle

For big databasese, you should extend the undo retention in Oracle, otherwise your import shall fail with error message ORA-01555:

oracle@oracle-host $ sqlplus / as sysdba
SQL> alter system set undo_retention=30000;
CODE

Changing the prefetch size from default 200 to the maximum of 10240 will reduce the import time. Please note that some of the biggest tables, like asset_feature, have CLOB columns for which OCI doesn't allow prefetching at all.

DO $$
DECLARE
c record;
BEGIN
	FOR c IN SELECT table_name FROM information_schema.tables WHERE table_schema='public'
	LOOP
		EXECUTE 'alter foreign table corpus.' || c.table_name || ' OPTIONS (prefetch ''10240'');';
	END LOOP;
END$$;
CODE

Shutdown censhare before executing that step and start the process in a screen session, because it's very very slow:

psql -d corpus -U corpus

CREATE OR REPLACE FUNCTION get_columns(varchar) RETURNS varchar AS $$
select string_agg('"'||column_name||'"',',') FROM information_schema.columns
WHERE table_name = $1 and table_schema='public' AND column_name NOT IN('rowid');
$$ LANGUAGE SQL;

set search_path='public','corpus';

create temp table columns_list as
select table_name, get_columns(table_name) FROM information_schema.tables WHERE table_schema='public';

DO $$
DECLARE
c record;
BEGIN
	FOR c IN SELECT * FROM columns_list order by table_name desc
	LOOP
		RAISE NOTICE 'BEGIN processing table %', c.table_name;
			EXECUTE 'insert into '|| c.table_name ||' (' || c.get_columns || ') select '|| 	c.get_columns || ' FROM corpus.'|| c.table_name;
		EXECUTE 'analyze '|| c.table_name; RAISE NOTICE 'END processing table %', c.table_name;
	END LOOP;
END$$;
CODE

Here is a workaround for the case when Oracle have null characters (U+0000) in some of it's text data rows. You execute it in a separate terminal session and then repeat the DO block above.

# If PostgreSQL returns this error for some column ERROR:
# invalid byte sequence for encoding "UTF8": 0x00 try this with sqlplus:

SQL> SELECT string2 FROM asset_info WHERE instr(string2, unistr('\0000')) > 0;
SQL> UPDATE asset_info SET string2 = replace(string2, unistr('\0000')) WHERE instr(string2, unistr('\0000')) > 0;

SQL> SELECT title FROM message WHERE instr(title, unistr('\0000')) > 0;
SQL> UPDATE message SET title = replace(title, unistr('\0000')) WHERE instr(title, unistr('\0000')) > 0;
SQL> COMMIT;
CODE

Create default constraints, indexes etc. in PostgreSQL

Few steps above, we created only the default tables needed to import the data. Now we have to create all other objects in PostgreSQL:

postgres@pg-host $ psql -d corpus -U corpus
set search_path='public';
\i postgresql-create-db-REST.sql
CODE

Set the sequences values as they are in Oracle:

set search_path='public';

DO $$
DECLARE
c record;
BEGIN
 -- create custom sequences
    FOR c IN SELECT * FROM corpus.seqtmp where seqname not in (SELECT upper(relname) FROM pg_class    
    WHERE relkind = 'S')
    LOOP
     EXECUTE 'CREATE SEQUENCE ' || c.seqname || ' START ' || c.lastnum || ' INCREMENT ' || c.incr;
    END LOOP;
 -- set the values
    FOR c IN SELECT * from corpus.seqtmp
    LOOP
     EXECUTE 'SELECT setval('''|| c.seqname ||''',' || c.lastnum || ')';
    END LOOP;
END$$;
CODE

Revert back important changes that you made

Revert back the undo_retention in Oracle

oracle@oracle-host $ sqlplus / as sysdba
SQL> alter system set undo_retention=3600;
CODE

Drop temporary table "seqtmp" which we have created in Oracle. Connect to sqlplus & drop it.

drop table corpus.seqtmp;
CODE

Rename back this column in PostgreSQL and drop the oracle mapping.

# only for versions <=5.7
postgres@postgresql-host $ psql -d corpus -U corpus
alter table stored_transformation add column curr_timestamp TIMESTAMP(3);
drop function get_columns(varchar);

# Additional command, if we have to rename the column "current_timestamp"
alter table stored_transformation rename column "current_timestamp" to curr_timestamp;
psql -d corpus -U corpus
DROP USER MAPPING FOR corpus SERVER oradb;
\c corpus postgres
DROP SERVER oradb cascade;
DROP schema corpus;
CODE