Migration Oracle → PostgreSQL
Create a new default schema in PostgreSQL, customize it and pull the data directly from Oracle without intermediate export files by using oracle_fdw (foreign data wrapper).
Current state of this migration instructions
The solution shown in this article works as expected but is very slow. This happens because the asset_feature table, one of the largest tables in the database has a CLOB column. The OCI library does not allow PostgreSQL to prefetch the rows. For example, a pull of a 40 Million rows table can take three hours. There are different workarounds that are described in our internal PostgreSQL mailing list.
Install PostgreSQL
For more information, see (2023.1) PostgreSQL installation.
Setup and test Oracle Instant Client
oracle_fdw is a PostgreSQL extension that needs Oracle libraries. There are two options to get these libraries:
- Download the Oracle libraries from the Oracle website.
- Download the Oraclelibraries from the internal censhare FileServer.
There is a third option to use the libraries from Oracle DB itself ($ORACLE_HOME). This is not described here.
Option 1: Download directly from Oracle
Select your OS version here. Then download and unzip the following files in ~postgres/. The files should create their own subdirectory. Do not unzip files on Mac because this creates different directories. All files from every zip file should be in the same directory. There are three zip files:
instantclient-basic-linux.x64-12.1.0.2.0.zip - Instant Client Package - Basic
instantclient-sqlplus-linux.x64-12.1.0.2.0.zip - Instant Client Package - SQL*Plus
instantclient-sdk-linux.x64-12.1.0.2.0.zip - Instant Client Package - SDK
Option 2: Copy the package from the internal censhare FileServer
Execute the following command:
rsync -avx /Volumes/Files/censhare/censhare-3rdParty/sqlplus/instantclient_12_1_linux/ postgres@PG-host:instantclient_12_1/
Setup and test connectivity to Oracle
Modify and add the following variables in ~postgres/.pgsql_profile and ~postgres/.bash_profile:
export ORACLE_SID=corpus
export ORACLE_HOME=/var/lib/pgsql/instantclient_12_1
export PATH=/usr/pgsql-9.6/bin:$ORACLE_HOME:$PATH
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$ORACLE_HOME
Test connectivity to Oracle:
postgres@pg-host $ sqlplus system/password@oracle-host:1521/corpus
# if get an error message for a missing library, install it like this:
# yum install libaio.x86_64
Modify the environment inside the PostgreSQL startup script:
# as root, search for PGDATA in that file and add the Oracle variables under the [Service] section of the file.
vi /usr/lib/systemd/system/postgresql-9.6.service
Environment=ORACLE_SID=corpus
Environment=ORACLE_HOME=/var/lib/pgsql/instantclient_12_1
Environment=LD_LIBRARY_PATH=/var/lib/pgsql/instantclient_12_1
systemctl daemon-reload
systemctl restart postgresql-9.6
Stop the censhare process and prepare the Oracle database
sqlplus corpus/password
-- truncate table messages;
-- consider this optional step
create table seqtmp(seqname varchar(100), lastnum integer, incr integer);
insert into seqtmp select sequence_name,last_number,increment_by from user_sequences; commit;
-- optional: truncate table corpus.message;
Setup oracle_fdw and connect PostgreSQL to Oracle
Follow the basic installation instructions for oracle_fdw in the Foreign Data Wrapper for Oracle repository.
Install the required libraries:
# as root, install needed libraries
yum install postgresql96-devel.x86_64
chown -R postgres:postgres /usr/pgsql*
# as postgres make soft link first
cd /var/lib/pgsql/instantclient_12_1
ln -s libclntsh.so.12.1 libclntsh.so
wget https://github.com/laurenz/oracle_fdw/archive/ORACLE_FDW_2_4_0.tar.gz
tar -xvzf ORACLE_FDW_2_4_0.tar.gz && rm ORACLE_FDW_2_4_0.tar.gz
cd oracle_fdw-ORACLE_FDW_2_4_0
make
make install
Sometimes, the following error occurs during the make install command:
-bash-4.2$ make install
/usr/bin/mkdir -p '/usr/pgsql-9.6/lib'
/usr/bin/mkdir -p '/usr/pgsql-9.6/share/extension'
/usr/bin/mkdir -p '/usr/pgsql-9.6/share/extension'
/usr/bin/mkdir -p '/usr/pgsql-9.6/doc/extension'
/usr/bin/install -c -m 755 oracle_fdw.so '/usr/pgsql9.6/lib/oracle_fdw.so'
/usr/bin/install: cannot create regular file ‘/usr/pgsql-9.6/lib/oracle_fdw.so’: Permission deniedmake: *** [install-lib-shared] Error 1
To resolve the error, do the following:
-bash-4.2$ pg_config --pgxs
/usr/pgsql-9.6/lib/pgxs/src/makefiles/pgxs.mk
Make sure that your pg_config path is in $PATH. To ensure this, use $PATH with the make install command, or change the permission of the pgsql-9.6 folder from "root" user to "postgres:postgres":
[root@censhare-db usr]# chown -R postgres:postgres /usr/pgsql*
Now, the make install command should run smoothly and show the following result:
Switch to "postgres user"
-bash-4.2$ cd oracle_fdw-ORACLE_FDW_2_0_0/
-bash-4.2$ make install
/usr/bin/mkdir -p '/usr/pgsql-10/lib'
/usr/bin/mkdir -p '/usr/pgsql-10/share/extension'
/usr/bin/mkdir -p '/usr/pgsql-10/share/extension'
/usr/bin/mkdir -p '/usr/pgsql-10/doc/extension'
/usr/bin/install -c -m 755 oracle_fdw.so '/usr/pgsql-10/lib/oracle_fdw.so'
/usr/bin/install -c -m 644 .//oracle_fdw.control '/usr/pgsql-10/share/extension/'
/usr/bin/install -c -m 644 .//oracle_fdw--1.1.sql .//oracle_fdw--1.0--1.1.sql '/usr/pgsql-10/share/extension/'
/usr/bin/install -c -m 644 .//README.oracle_fdw '/usr/pgsql-10/doc/extension/'
-bash-4.2$
The following command connects the PostgreSQL database to the Oracle database, and imports the foreign schema definitions:
psql -d postgres
\i postgresql-create-user.sql
-- this script can be found under css/database/ on the app server and usualy have these two lines + few type cast definitions
-- CREATE USER corpus PASSWORD 'corpus';
-- CREATE DATABASE corpus OWNER corpus TEMPLATE template0 ENCODING 'UTF8';
\c corpus postgres
CREATE EXTENSION oracle_fdw;
CREATE SERVER oradb FOREIGN DATA WRAPPER oracle_fdw OPTIONS (dbserver '//IP.OF.ORACLE.SERVER:1521/corpus.censhare');
GRANT USAGE ON FOREIGN SERVER oradb TO corpus;
\c corpus corpus
CREATE USER MAPPING FOR corpus SERVER oradb OPTIONS (user 'corpus', password 'password');
CREATE SCHEMA corpus;
IMPORT FOREIGN SCHEMA "CORPUS" FROM SERVER oradb INTO CORPUS;
-- in case that you would like to DROP and recreate it:
-- psql -d postgres
-- drop database corpus;
-- drop user corpus;
Create the new default PostgreSQL schema
In the censhare release, PostgreSQL and Oracle have different schemas (data types, constraints, index options, stored procedures, etc.). After the data migration, PostgreSQL should have the same schema as the release and additional customer-specific columns (language, db-schema.patch, etc.). For that reason, we create here an empty default PostgreSQL schema that is taken from the release for that specific censhare Server version.
# First copy the schema definition file from this specific censhare server version.
you@yourMac$ scp corpus@cs-srver:css/database/postgresql-create-db.sql /tmp/
you@yourMac$ scp /tmp/postgresql-create-db.sql postgres@pg-host:
## Then separate the TABLES structure from everything else
perl -ne 'print if /^CREATE TABLE/../^\);/' postgresql-create-db.sql > postgresql-create-db-TABLES.sql
perl -ne 'print if not /^CREATE TABLE/../^\);/' postgresql-create-db.sql > postgresql-create-db-REST.sql
## Finally import only the TABLES
postgres@pg-host $ psql -d corpus -U corpus
set search_path='public';
\i postgresql-create-db-TABLES.sql
Search for custom views:
SQL> select view_name from user_views;
Now, check for tables in Oracle that are not defined in db-schema.patch. You can create them like this:
psql -d corpus -U corpus
set search_path='public';
select 'create table '|| table_name ||' (like corpus.'||table_name||');' from (
SELECT table_name FROM information_schema.tables WHERE table_schema='corpus' and table_name not in ('plan_table', 'plan_view','timetrackerinterface')
EXCEPT
SELECT table_name FROM information_schema.tables WHERE table_schema='public'
) as t1;
... copy and paste the output with 10-20 lines at a time.
Create custom language COLUMNS in PostgreSQL
This step checks for columns in Oracle that do not exist in PostgreSQL. We usually see here the custom language columns but it could be everything. So, it is not a solution to execute postgresql-add-missing-languages.sql.
Check for non-existing columns:
psql -d corpus -U corpus
set search_path='public';
create temporary table columnstmp as (
SELECT table_name, column_name FROM information_schema.columns WHERE table_schema='corpus' and table_name not in ('plan_table', 'plan_view', 'timetrackerinterface')
EXCEPT
SELECT table_name, column_name FROM information_schema.columns WHERE table_schema='public' order by 1,2
);
DO $$
DECLARE
c record;
crow record;
ctype varchar(50);
BEGIN
FOR c IN SELECT * from columnstmp
LOOP
SELECT *
INTO crow
FROM information_schema.columns WHERE table_schema='corpus' and table_name=c.table_name and column_name=c.column_name;
IF crow.data_type = 'character varying' THEN ctype := crow.data_type || '(' || crow.character_maximum_length || ') ';
ELSE ctype := crow.data_type;
END IF;
-- RAISE NOTICE '% % %', c.table_name, c.column_name, ctype ;
EXECUTE 'ALTER TABLE ' || c.table_name || ' ADD COLUMN "' || c.column_name || '" ' || ctype;
END LOOP;
END$$;
Here is a small exception for versions <5.7 because current_timestamp is a reserved word in postgresql:
# only for <= v5.7
set search_path='public';
alter table stored_transformation drop column current_timestamp;
Final check to see if there is any difference in the schemas:
SELECT table_name, column_name FROM information_schema.columns WHERE table_schema='public' and column_name not in ('rowid')
EXCEPT
SELECT table_name, column_name FROM information_schema.columns WHERE table_schema='corpus' order by 1,2;
SELECT table_name, column_name FROM information_schema.columns WHERE table_schema='corpus' and table_name not in ('plan_table','plan_view', 'timetrackerinterface')
EXCEPT
SELECT table_name, column_name FROM information_schema.columns WHERE table_schema='public' order by 1,2;
Let PostgreSQL pull the data from Oracle
For large databases, you should extend the undo retention in Oracle. Otherwise, your import might fail with the error message ORA-01555.
oracle@oracle-host $ sqlplus / as sysdba
SQL> alter system set undo_retention=30000;
Changing the prefetch size from default 200 to the maximum of 10240 reduces the import time. Note that some of the largest tables, such as asset_feature, have CLOB columns. For these, OCI does not allow prefetching at all.
DO $$
DECLARE
c record;
BEGIN
FOR c IN SELECT table_name FROM information_schema.tables WHERE table_schema='public'
LOOP
EXECUTE 'alter foreign table corpus.' || c.table_name || ' OPTIONS (prefetch ''10240'');';
END LOOP;
END$$;
Shutdown censhare Server before executing the following step. Start the process in a screen session because it is very very slow:
psql -d corpus -U corpus
CREATE OR REPLACE FUNCTION get_columns(varchar) RETURNS varchar AS $$
select string_agg('"'||column_name||'"',',') FROM information_schema.columns
WHERE table_name = $1 and table_schema='public' AND column_name NOT IN('rowid');
$$ LANGUAGE SQL;
set search_path='public','corpus';
create temp table columns_list as
select table_name, get_columns(table_name) FROM information_schema.tables WHERE table_schema='public';
DO $$
DECLARE
c record;
BEGIN
FOR c IN SELECT * FROM columns_list order by table_name desc
LOOP
RAISE NOTICE 'BEGIN processing table %', c.table_name;
EXECUTE 'insert into '|| c.table_name ||' (' || c.get_columns || ') select '|| c.get_columns || ' FROM corpus.'|| c.table_name;
EXECUTE 'analyze '|| c.table_name; RAISE NOTICE 'END processing table %', c.table_name;
END LOOP;
END$$;
Here is a workaround for the case that Oracle has null characters (U+0000) in some of its text data rows. You execute it in a separate terminal session and then repeat the DO block above.
# If PostgreSQL returns this error for some column ERROR:
# invalid byte sequence for encoding "UTF8": 0x00 try this with sqlplus:
SQL> SELECT string2 FROM asset_info WHERE instr(string2, unistr('\0000')) > 0;
SQL> UPDATE asset_info SET string2 = replace(string2, unistr('\0000')) WHERE instr(string2, unistr('\0000')) > 0;
SQL> SELECT title FROM message WHERE instr(title, unistr('\0000')) > 0;
SQL> UPDATE message SET title = replace(title, unistr('\0000')) WHERE instr(title, unistr('\0000')) > 0;
SQL> COMMIT;
Create default constraints, indexes etc. in PostgreSQL
Few steps above, we have only created the default tables that are required to import the data. Now, we muist create all other objects in PostgreSQL:
postgres@pg-host $ psql -d corpus -U corpus
set search_path='public';
\i postgresql-create-db-REST.sql
Set the sequences values as they are in Oracle:
set search_path='public';
DO $$
DECLARE
c record;
BEGIN
-- create custom sequences
FOR c IN SELECT * FROM corpus.seqtmp where seqname not in (SELECT upper(relname) FROM pg_class
WHERE relkind = 'S')
LOOP
EXECUTE 'CREATE SEQUENCE ' || c.seqname || ' START ' || c.lastnum || ' INCREMENT ' || c.incr;
END LOOP;
-- set the values
FOR c IN SELECT * from corpus.seqtmp
LOOP
EXECUTE 'SELECT setval('''|| c.seqname ||''',' || c.lastnum || ')';
END LOOP;
END$$;
Revert back important changes that you made
Revert back the undo_retention in Oracle
oracle@oracle-host $ sqlplus / as sysdba
SQL> alter system set undo_retention=3600;
Drop temporary table "seqtmp" which we have created in Oracle. Connect to sqlplus and drop it:
drop table corpus.seqtmp;
Rename back this column in PostgreSQL and drop the oracle mapping:
# only for versions <=5.7
postgres@postgresql-host $ psql -d corpus -U corpus
alter table stored_transformation add column curr_timestamp TIMESTAMP(3);
drop function get_columns(varchar);
# Additional command, if we have to rename the column "current_timestamp"
alter table stored_transformation rename column "current_timestamp" to curr_timestamp;
psql -d corpus -U corpus
DROP USER MAPPING FOR corpus SERVER oradb;
\c corpus postgres
DROP SERVER oradb cascade;
DROP schema corpus;
Troubleshooting
Non-UTF8 compliant special characters
Error pattern:
NOTICE: 2023-05-18 05:25:43.227575+00 BEGIN processing table asset
ERROR: invalid byte sequence for encoding "UTF8": 0xfc
CONTEXT: converting column "annotation" for foreign table scan of "asset", row 1577357
SQL statement "insert into asset ("id","version","currversion","content_version","id_extern","id_org","name","type","application","state","deletion","checked_out_by","modified_by","wf_target",
...
,"first_actual_paging","first_target_paging","first_paging","usn","tcn" FROM corpus.asset"
PL/pgSQL function inline_code_block line 8 at EXECUTE
Cause:
It might be that the client has emojis or non-UTF8 compliant special characters in the Oracle database (source database).
Solution:
Check for non-UTF8 compliant special characters in the ASSET and ASSET_FEATURE tables:
sqlplus / as sysdba
In the asset table:
SQL> SELECT * FROM ASSET WHERE CONVERT(ANNOTATION, 'UTF8') != ANNOTATION;
In the asset_feature table:
SQL> SELECT * FROM ASSET_FEATURE WHERE CONVERT(VALUE_STRING, 'UTF8') != VALUE_STRING
When connecting to the source database using DBeaver or Commandline Tool, you can find the following situation: There are asset IDs which contain the special characters in VALUE_STRING of ASSET_FEATURE or just in ANNOTATION of ASSET.
Remedy:
- Manual cleaning of the respective lines in the database (program DBeaver for example). Mostly emojis or really non-compliant umlauts cause the error.
- Directly edit VALUE_STRING of ASSET_FEATURE or ANNOTATION of ASSET: Double click to start.
- Afterwards "Save" or commit the made changes.
- Then, execute the SELECT commands again. There must be an empty result only. Now, the DB is clean and you can migrate.