Migration Oracle → PostgreSQL

Create a new default schema in PostgreSQL, customize it and pull the data directly from Oracle without intermediate export files by using oracle_fdw (foreign data wrapper).

Current state of these migration instructions

The solution shown in this article works as expected but is very slow. The reason is that asset_feature, one of the largest tables in the database, has a CLOB column, and the OCI library does not allow PostgreSQL to prefetch rows from tables with such columns. For example, pulling a table with 40 million rows can take three hours. Different workarounds are described in our internal PostgreSQL mailing list.
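
To put this figure into perspective, the following sketch converts it into an approximate throughput and uses it to estimate the duration for another table. The helper name estimate_minutes and the 10-million-row example are hypothetical, and the estimate assumes that the throughput stays constant, which may not hold for tables without CLOB columns.

```shell
#!/bin/sh
# Rough throughput derived from the example above: 40 million rows in three hours.
ROWS=40000000
SECONDS_TAKEN=$((3 * 3600))
ROWS_PER_SECOND=$((ROWS / SECONDS_TAKEN))   # integer division

# Hypothetical helper: estimated minutes for a table with the given row count,
# assuming the same throughput as in the example.
estimate_minutes() {
    echo $(( $1 / ROWS_PER_SECOND / 60 ))
}

echo "approx. rows per second: $ROWS_PER_SECOND"
echo "approx. minutes for 10 million rows: $(estimate_minutes 10000000)"
```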

Install PostgreSQL

For more information, see (2023.1) PostgreSQL installation.

Set up and test Oracle Instant Client

oracle_fdw is a PostgreSQL extension that needs Oracle libraries. There are two options to get these libraries:

Download the Oracle libraries from the Oracle website.
Download the Oracle libraries from the internal censhare FileServer.

There is a third option to use the libraries from Oracle DB itself ($ORACLE_HOME). This is not described here.

Option 1: Download directly from Oracle

Select your OS version here. Then download and unzip the following files in ~postgres/. The files should create their own subdirectory. Do not unzip files on Mac because this creates different directories. All files from every zip file should be in the same directory. There are three zip files:

instantclient-basic-linux.x64-12.1.0.2.0.zip - Instant Client Package - Basic
instantclient-sqlplus-linux.x64-12.1.0.2.0.zip - Instant Client Package - SQL*Plus
instantclient-sdk-linux.x64-12.1.0.2.0.zip - Instant Client Package - SDK

Option 2: Copy the package from the internal censhare FileServer

Execute the following command:

CODE

rsync -avx /Volumes/Files/censhare/censhare-3rdParty/sqlplus/instantclient_12_1_linux/ postgres@PG-host:instantclient_12_1/

Set up and test connectivity to Oracle

Modify and add the following variables in ~postgres/.pgsql_profile and ~postgres/.bash_profile:

CODE

export ORACLE_SID=corpus
export ORACLE_HOME=/var/lib/pgsql/instantclient_12_1
export PATH=/usr/pgsql-9.6/bin:$ORACLE_HOME:$PATH
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$ORACLE_HOME

Test connectivity to Oracle:

CODE

postgres@pg-host $ sqlplus system/password@oracle-host:1521/corpus

# if you get an error message about a missing library, install it like this:
# yum install libaio.x86_64
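
If sqlplus does not start at all, it can help to confirm first that the files from all three zip archives ended up in the same directory. The following is a minimal sketch; the helper name check_instantclient is hypothetical, and the list of file names is an assumption based on the three 12.1 Linux packages named above.

```shell
#!/bin/sh
# Hypothetical helper: report which expected Instant Client files are missing.
# The file list is an assumption: one representative file per package
# (Basic, SQL*Plus, SDK).
check_instantclient() {
    dir=$1
    missing=0
    for f in libclntsh.so.12.1 sqlplus sdk/include/oci.h; do
        if [ ! -e "$dir/$f" ]; then
            echo "missing: $dir/$f"
            missing=1
        fi
    done
    return $missing
}

# Usage: check the directory that ORACLE_HOME points to
# (the default path is the one used in this article).
if check_instantclient "${ORACLE_HOME:-/var/lib/pgsql/instantclient_12_1}"; then
    echo "Instant Client files look complete"
fi
```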

Modify the environment inside the PostgreSQL startup script:

CODE

# as root, search for PGDATA in that file and add the Oracle variables under the [Service] section of the file.
vi /usr/lib/systemd/system/postgresql-9.6.service

Environment=ORACLE_SID=corpus
Environment=ORACLE_HOME=/var/lib/pgsql/instantclient_12_1
Environment=LD_LIBRARY_PATH=/var/lib/pgsql/instantclient_12_1

systemctl daemon-reload
systemctl restart postgresql-9.6
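
A typo in one of the Environment lines is easy to miss, and it may only show up later when oracle_fdw is loaded. The following sketch, built around a hypothetical helper named check_unit_env, checks that all three variables are present in a unit file.

```shell
#!/bin/sh
# Hypothetical helper: confirm that the three Oracle variables are set
# as Environment= lines in the given systemd unit file.
check_unit_env() {
    unit_file=$1
    status=0
    for var in ORACLE_SID ORACLE_HOME LD_LIBRARY_PATH; do
        if ! grep -q "^Environment=$var=" "$unit_file"; then
            echo "not set in $unit_file: $var"
            status=1
        fi
    done
    return $status
}

# Usage, with the unit file path from this article:
# check_unit_env /usr/lib/systemd/system/postgresql-9.6.service
```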

Stop the censhare process and prepare the Oracle database

CODE

sqlplus corpus/password
-- save the current sequence values so that they can be set in PostgreSQL later
create table seqtmp(seqname varchar(100), lastnum integer, incr integer);
insert into seqtmp select sequence_name,last_number,increment_by from user_sequences;
commit;

-- optional: truncate table corpus.message;

Set up oracle_fdw and connect PostgreSQL to Oracle

Follow the basic installation instructions for oracle_fdw in the Foreign Data Wrapper for Oracle repository.

Install the required libraries:

CODE

# as root, install needed libraries
yum install postgresql96-devel.x86_64
chown -R postgres:postgres /usr/pgsql*

# as postgres, create the soft link first
cd /var/lib/pgsql/instantclient_12_1
ln -s libclntsh.so.12.1 libclntsh.so
wget https://github.com/laurenz/oracle_fdw/archive/ORACLE_FDW_2_4_0.tar.gz
tar -xvzf ORACLE_FDW_2_4_0.tar.gz && rm ORACLE_FDW_2_4_0.tar.gz
cd oracle_fdw-ORACLE_FDW_2_4_0
make
make install

Sometimes, the following error occurs during the make install command:

CODE

-bash-4.2$ make install
/usr/bin/mkdir -p '/usr/pgsql-9.6/lib'
/usr/bin/mkdir -p '/usr/pgsql-9.6/share/extension'
/usr/bin/mkdir -p '/usr/pgsql-9.6/share/extension'
/usr/bin/mkdir -p '/usr/pgsql-9.6/doc/extension'
/usr/bin/install -c -m 755 oracle_fdw.so '/usr/pgsql-9.6/lib/oracle_fdw.so'
/usr/bin/install: cannot create regular file ‘/usr/pgsql-9.6/lib/oracle_fdw.so’: Permission denied
make: *** [install-lib-shared] Error 1

To resolve the error, do the following:

CODE

-bash-4.2$ pg_config --pgxs
/usr/pgsql-9.6/lib/pgxs/src/makefiles/pgxs.mk

Make sure that the directory that contains pg_config is in $PATH when you run the make install command. If the installation still fails with the permission error, change the ownership of the pgsql-9.6 directory from root to postgres:postgres:

CODE

[root@censhare-db usr]# chown -R postgres:postgres /usr/pgsql*

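Now, the make install command should run without errors. The following sample output was captured with oracle_fdw 2.0.0 on PostgreSQL 10, so the version numbers in the paths differ from the ones used above: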

CODE

# switch to the postgres user

-bash-4.2$ cd oracle_fdw-ORACLE_FDW_2_0_0/
-bash-4.2$ make install 
/usr/bin/mkdir -p '/usr/pgsql-10/lib'
/usr/bin/mkdir -p '/usr/pgsql-10/share/extension'
/usr/bin/mkdir -p '/usr/pgsql-10/share/extension'
/usr/bin/mkdir -p '/usr/pgsql-10/doc/extension'
/usr/bin/install -c -m 755 oracle_fdw.so '/usr/pgsql-10/lib/oracle_fdw.so'
/usr/bin/install -c -m 644 .//oracle_fdw.control '/usr/pgsql-10/share/extension/'
/usr/bin/install -c -m 644 .//oracle_fdw--1.1.sql .//oracle_fdw--1.0--1.1.sql '/usr/pgsql-10/share/extension/'
/usr/bin/install -c -m 644 .//README.oracle_fdw '/usr/pgsql-10/doc/extension/'
-bash-4.2$

The following commands connect the PostgreSQL database to the Oracle database and import the foreign schema definitions:

CODE

psql -d postgres
\i postgresql-create-user.sql
-- this script can be found under css/database/ on the app server and usually contains these two lines plus a few type cast definitions
-- CREATE USER corpus PASSWORD 'corpus';
-- CREATE DATABASE corpus OWNER corpus TEMPLATE template0 ENCODING 'UTF8';
\c corpus postgres
CREATE EXTENSION oracle_fdw;
CREATE SERVER oradb FOREIGN DATA WRAPPER oracle_fdw OPTIONS (dbserver '//IP.OF.ORACLE.SERVER:1521/corpus.censhare');
GRANT USAGE ON FOREIGN SERVER oradb TO corpus;
\c corpus corpus
CREATE USER MAPPING FOR corpus SERVER oradb OPTIONS (user 'corpus', password 'password');
CREATE SCHEMA corpus;
IMPORT FOREIGN SCHEMA "CORPUS" FROM SERVER oradb INTO CORPUS;

-- in case that you would like to DROP and recreate it:
-- psql -d postgres
-- drop database corpus;
-- drop user corpus;

Create the new default PostgreSQL schema

In the censhare release, PostgreSQL and Oracle have different schemas (data types, constraints, index options, stored procedures, etc.). After the data migration, PostgreSQL should have the same schema as the release and additional customer-specific columns (language, db-schema.patch, etc.). For that reason, we create here an empty default PostgreSQL schema that is taken from the release for that specific censhare Server version.

CODE

# First copy the schema definition file from this specific censhare server version.
you@yourMac$ scp corpus@cs-server:css/database/postgresql-create-db.sql /tmp/
you@yourMac$ scp /tmp/postgresql-create-db.sql postgres@pg-host:

## Then separate the TABLES structure from everything else
perl -ne 'print if /^CREATE TABLE/../^\);/' postgresql-create-db.sql > postgresql-create-db-TABLES.sql
perl -ne 'print if not /^CREATE TABLE/../^\);/' postgresql-create-db.sql > postgresql-create-db-REST.sql

## Finally import only the TABLES
postgres@pg-host $ psql -d corpus -U corpus
set search_path='public';
\i postgresql-create-db-TABLES.sql
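
Because the two perl commands use complementary conditions, every line of the original file should end up in exactly one of the two output files. A quick sanity check, sketched here with a hypothetical helper named check_split, compares the line counts before the TABLES file is imported.

```shell
#!/bin/sh
# Hypothetical helper: verify that the TABLES file and the REST file together
# contain exactly as many lines as the original schema file.
check_split() {
    total=$(( $(wc -l < "$1") ))
    tables=$(( $(wc -l < "$2") ))
    rest=$(( $(wc -l < "$3") ))
    if [ $((tables + rest)) -eq "$total" ]; then
        echo "split is complete: $tables + $rest = $total lines"
    else
        echo "line counts differ: $tables + $rest != $total" >&2
        return 1
    fi
}

# Usage, with the file names from the commands above:
# check_split postgresql-create-db.sql postgresql-create-db-TABLES.sql postgresql-create-db-REST.sql
```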

Connect to corpus schema with sqlplus
Exclude the custom views from all scripts below.

Search for custom views:

CODE

SQL> select view_name from user_views;

Now, check for tables in Oracle that are not defined in db-schema.patch. You can create them like this:

CODE

psql -d corpus -U corpus
set search_path='public'; 
select 'create table '|| table_name ||' (like corpus.'||table_name||');' from (
SELECT table_name FROM information_schema.tables WHERE table_schema='corpus' and table_name not in ('plan_table', 'plan_view','timetrackerinterface')
EXCEPT
SELECT table_name FROM information_schema.tables WHERE table_schema='public'
) as t1;
-- Copy and paste the generated statements back into psql, 10-20 lines at a time.

Create custom language COLUMNS in PostgreSQL

This step checks for columns in Oracle that do not exist in PostgreSQL. These are usually the custom language columns, but they can be any other custom columns. Therefore, executing postgresql-add-missing-languages.sql alone is not sufficient.

Check for non-existing columns:

CODE

psql -d corpus -U corpus

set search_path='public';

create temporary table columnstmp as (
SELECT table_name, column_name FROM information_schema.columns WHERE table_schema='corpus' and table_name not in ('plan_table', 'plan_view', 'timetrackerinterface')
EXCEPT
SELECT table_name, column_name FROM information_schema.columns WHERE table_schema='public' order by 1,2
);

DO $$
DECLARE
c record;
crow record;
ctype varchar(50);

BEGIN
	FOR c IN SELECT * from columnstmp
	LOOP
		SELECT *
		INTO crow
		FROM information_schema.columns WHERE table_schema='corpus' and table_name=c.table_name and column_name=c.column_name;

		IF crow.data_type = 'character varying' THEN ctype := crow.data_type || '(' || crow.character_maximum_length || ') ';
		ELSE ctype := crow.data_type;
		END IF;
		-- RAISE NOTICE '% % %', c.table_name, c.column_name, ctype ;
		EXECUTE 'ALTER TABLE ' || c.table_name || ' ADD COLUMN "' || c.column_name || '" ' || ctype;
	END LOOP;
END$$;

Versions <= 5.7 need a small exception here because current_timestamp is a reserved word in PostgreSQL:

CODE

-- only for versions <= 5.7
-- the column name must be quoted because it is a reserved word
set search_path='public';
alter table stored_transformation drop column "current_timestamp";

Final check to see if there is any difference in the schemas:

CODE

SELECT table_name, column_name FROM information_schema.columns WHERE table_schema='public' and column_name not in ('rowid')
EXCEPT
SELECT table_name, column_name FROM information_schema.columns WHERE table_schema='corpus' order by 1,2;
SELECT table_name, column_name FROM information_schema.columns WHERE table_schema='corpus' and table_name not in ('plan_table','plan_view', 'timetrackerinterface')
EXCEPT
SELECT table_name, column_name FROM information_schema.columns WHERE table_schema='public' order by 1,2;

Let PostgreSQL pull the data from Oracle

For large databases, you should extend the undo retention in Oracle. Otherwise, your import might fail with the error message ORA-01555.

CODE

oracle@oracle-host $ sqlplus / as sysdba
SQL> alter system set undo_retention=30000;

Changing the prefetch size from the default of 200 to the maximum of 10240 reduces the import time. Note that some of the largest tables, such as asset_feature, have CLOB columns. For these, OCI does not allow prefetching at all.

CODE

DO $$
DECLARE
c record;
BEGIN
	FOR c IN SELECT table_name FROM information_schema.tables WHERE table_schema='public'
	LOOP
		EXECUTE 'alter foreign table corpus.' || c.table_name || ' OPTIONS (prefetch ''10240'');';
	END LOOP;
END$$;
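
The prefetch option controls how many rows oracle_fdw fetches per round trip to Oracle, so a larger value means fewer round trips for the same table. As a rough illustration, assuming the 40-million-row table mentioned earlier in this article and a hypothetical helper named round_trips:

```shell
#!/bin/sh
# Hypothetical helper: number of fetch round trips needed to scan a table,
# rounded up to the next whole round trip.
round_trips() {
    rows=$1
    prefetch=$2
    echo $(( (rows + prefetch - 1) / prefetch ))
}

echo "prefetch 200:   $(round_trips 40000000 200) round trips"
echo "prefetch 10240: $(round_trips 40000000 10240) round trips"
```

This only counts round trips. It does not apply to tables with CLOB columns, for which prefetching is not available.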

Shut down the censhare Server before executing the following step. Start the process in a screen session because it is very slow:

CODE

psql -d corpus -U corpus

CREATE OR REPLACE FUNCTION get_columns(varchar) RETURNS varchar AS $$
select string_agg('"'||column_name||'"',',') FROM information_schema.columns
WHERE table_name = $1 and table_schema='public' AND column_name NOT IN('rowid');
$$ LANGUAGE SQL;

set search_path='public','corpus';

create temp table columns_list as
select table_name, get_columns(table_name) FROM information_schema.tables WHERE table_schema='public';

DO $$
DECLARE
c record;
BEGIN
	FOR c IN SELECT * FROM columns_list order by table_name desc
	LOOP
		RAISE NOTICE 'BEGIN processing table %', c.table_name;
		EXECUTE 'insert into '|| c.table_name ||' (' || c.get_columns || ') select '|| c.get_columns || ' FROM corpus.'|| c.table_name;
		EXECUTE 'analyze '|| c.table_name;
		RAISE NOTICE 'END processing table %', c.table_name;
	END LOOP;
END$$;

Here is a workaround for the case that Oracle has null characters (U+0000) in some of its text data. Execute it in a separate terminal session and then repeat the DO block above.

CODE

-- If PostgreSQL returns the following error for some column, run the statements below in sqlplus:
-- ERROR: invalid byte sequence for encoding "UTF8": 0x00

SQL> SELECT string2 FROM asset_info WHERE instr(string2, unistr('\0000')) > 0;
SQL> UPDATE asset_info SET string2 = replace(string2, unistr('\0000')) WHERE instr(string2, unistr('\0000')) > 0;

SQL> SELECT title FROM message WHERE instr(title, unistr('\0000')) > 0;
SQL> UPDATE message SET title = replace(title, unistr('\0000')) WHERE instr(title, unistr('\0000')) > 0;
SQL> COMMIT;
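
The statements for asset_info and message follow the same pattern, so the statements for any other affected column can be generated. The following is a minimal sketch; the helper name nul_fix_sql is hypothetical, and the table and column names are whatever PostgreSQL reports in the context of its error message.

```shell
#!/bin/sh
# Hypothetical helper: print the Oracle statements that find and remove
# U+0000 characters from one text column, following the pattern above.
nul_fix_sql() {
    table=$1
    column=$2
    cat <<EOF
SELECT $column FROM $table WHERE instr($column, unistr('\0000')) > 0;
UPDATE $table SET $column = replace($column, unistr('\0000')) WHERE instr($column, unistr('\0000')) > 0;
COMMIT;
EOF
}

# Example: generate the statements for asset_info.string2 and review them
# before pasting them into sqlplus.
nul_fix_sql asset_info string2
```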

Create default constraints, indexes etc. in PostgreSQL

In the steps above, we created only the default tables that are required to import the data. Now, we must create all other objects in PostgreSQL:

CODE

postgres@pg-host $ psql -d corpus -U corpus
set search_path='public';
\i postgresql-create-db-REST.sql

Set the sequence values as they are in Oracle:

CODE

set search_path='public';

DO $$
DECLARE
c record;
BEGIN
 -- create custom sequences
    FOR c IN SELECT * FROM corpus.seqtmp where seqname not in (SELECT upper(relname) FROM pg_class    
    WHERE relkind = 'S')
    LOOP
     EXECUTE 'CREATE SEQUENCE ' || c.seqname || ' START ' || c.lastnum || ' INCREMENT ' || c.incr;
    END LOOP;
 -- set the values
    FOR c IN SELECT * from corpus.seqtmp
    LOOP
     EXECUTE 'SELECT setval('''|| c.seqname ||''',' || c.lastnum || ')';
    END LOOP;
END$$;

Revert important changes that you made

Revert the undo_retention in Oracle:

CODE

oracle@oracle-host $ sqlplus / as sysdba
SQL> alter system set undo_retention=3600;

Drop the temporary table seqtmp that we created in Oracle. Connect with sqlplus and drop it:

CODE

drop table corpus.seqtmp;

Restore the renamed column in PostgreSQL and drop the Oracle mapping:

CODE

postgres@postgresql-host $ psql -d corpus -U corpus
-- only for versions <= 5.7
alter table stored_transformation add column curr_timestamp TIMESTAMP(3);
-- additional command, if we have to rename the column "current_timestamp"
alter table stored_transformation rename column "current_timestamp" to curr_timestamp;

-- drop the helper function and the Oracle mapping
drop function get_columns(varchar);
DROP USER MAPPING FOR corpus SERVER oradb;
\c corpus postgres
DROP SERVER oradb cascade;
DROP schema corpus;

Troubleshooting

Non-UTF8 compliant special characters

Error pattern:

CODE

NOTICE: 2023-05-18 05:25:43.227575+00 BEGIN processing table asset 
ERROR: invalid byte sequence for encoding "UTF8": 0xfc 
CONTEXT: converting column "annotation" for foreign table scan of "asset", row 1577357 
SQL statement "insert into asset ("id","version","currversion","content_version","id_extern","id_org","name","type","application","state","deletion","checked_out_by","modified_by","wf_target", 
... 
,"first_actual_paging","first_target_paging","first_paging","usn","tcn" FROM corpus.asset" 
PL/pgSQL function inline_code_block line 8 at EXECUTE

Cause:
The Oracle database (source database) of the customer probably contains emojis or other special characters that are not valid UTF-8.
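
The byte 0xfc from the error pattern above illustrates the problem. On its own it is not a valid UTF-8 sequence, but in ISO-8859-1 it encodes the umlaut ü. The following sketch shows both facts with iconv. That the affected data was originally stored in ISO-8859-1 is only an assumption, and the sketch requires that iconv is installed.

```shell
#!/bin/sh
# The single byte 0xfc (octal 374) is checked as UTF-8 input.
# iconv exits with a non-zero status if the input is not valid.
if printf '\374' | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; then
    echo "0xfc is valid UTF-8"
else
    echo "0xfc is not valid UTF-8"
fi

# Interpreted as ISO-8859-1 (an assumption about the original encoding),
# the same byte converts to the two-byte UTF-8 sequence for "ü".
printf '\374' | iconv -f ISO-8859-1 -t UTF-8
echo
```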

Solution:
Check for non-UTF8 compliant special characters in the ASSET and ASSET_FEATURE tables:

CODE

sqlplus / as sysdba

In the asset table:

CODE

SQL> SELECT * FROM ASSET WHERE CONVERT(ANNOTATION, 'UTF8') != ANNOTATION;

In the asset_feature table:

CODE

SQL> SELECT * FROM ASSET_FEATURE WHERE CONVERT(VALUE_STRING, 'UTF8') != VALUE_STRING;

When you connect to the source database with DBeaver or a command-line tool, these queries return the assets that contain special characters in VALUE_STRING of ASSET_FEATURE or in ANNOTATION of ASSET.

Remedy:

Clean the affected rows manually in the database, for example with DBeaver. In most cases, emojis or umlauts that are not valid UTF-8 cause the error.
Edit VALUE_STRING of ASSET_FEATURE or ANNOTATION of ASSET directly: double-click the value to start editing.
Afterwards, save or commit the changes.
Then, execute the SELECT statements again. They must return an empty result. The database is then clean and you can migrate.