Menu Close

Resolving In-Doubt Transactions

Before starting, here is a PDF version of the document that I created years ago: Resolving In-Doubt Transactions

Distributed transactions perform DML on multiple databases which is a bit more complicated task because the database must coordinate the consitency in those seperate or even perhaps between different DBMSs (like Oracle – MS SQL). To ensure the transaction atomicity, Oracle implements a 2-phase commit mechanism through which the distributed transactions undergo some phases like prepare, commit, forget, etc. This phases constitute the hand-shake mechanism of the distributed transaciton.

However, sometimes things may go wrong (due to some network, system problem or even a reconfiguration of the underlying objects) and one of the phases fails while others are ok. Here, we say that the transaction becomes in-doubt. Normallly this problem should be handled by the RECO process itself, but in some cases this cannot br performed.

Why RECO cannot perform in some cases?

One of the databases involved in the distributed transaction might be unreachable (netowrk, system issues etc.) while the RECO was trying to resolve the problem (even when retrying to recover). (UNSTUCK)

The lookup tables of the “2-phase commit” mechanism might become inconsistent with the transaction itself. (STUCK)

1.HANDLING UNSTUCK TRANSACTIONS

Hopefully, there is no inconsistency between the lookup tables and the transaction and the following code resolves the problem.

To View the pending transactions:

SQL> SELECT LOCAL_TRAN_ID, GLOBAL_TRAN_ID, STATE, MIXED, COMMIT# FROM DBA_2PC_PENDING;
	
-- Returns: 96.22.163456 TESTDB.723a8559.96.22.163456	prepared	no	56759427464

‘96.22.163456’ id of the pending distributed transaction, which we will use in the following commands.

If the state is “prepared” the transaction can be forced to rollback or commit:

-- To Force Rollback
SQL> ROLLBACK FORCE  '96.22.163456'

-- To Force Commit
SQL> COMMIT FORCE  '96.22.163456'

Note 1: If the command hangs, go to the “Handling Stuck DBA_2PC_PENDING” section.
Note 2: If the state of the transaction is “collecting” and you execute the above command, you may see an error like:

ERROR at line 1:
ORA-02058: no prepared transaction found with ID 96.22.163456

Then, you have to run:

SQL> EXECUTE DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('96.22.163456');

2. HANDLING STUCK TRANSACTIONS

Our ultimate goal is not seeing the transaction in X$KTUXE table; and ensuring that the dictionary tables like PENDING_TRANS$ to be consistent with this information.

Stuck transactions can be examined under the below conditions:

2.1. Condition 1: DBA_2PC_PENDING view have entries about our transaction but there is no transaction in reality

The condition is that; when we issue select to the dictionary views like the DBA_2PC_PENDING, PENDING_TRANS$, etc. we see the transaction, but the transaction does not exist in X$KTUXE view.
The actual transaction entry view is X$KTUXE ([K]ernel [T]ransaction [U]ndo Transa[X]tion [E]ntry) where the columns correspond to the following sections of the transaction id:

KTUXEUSN.KTUXESLT.KTUXESQN = 96.22.163456   (The concat of KTUXEUSN, KTUXESLT and KTUXESQN gives us the transacion number)

KTUXEUSN=96
KTUXESLT=22
KTUXESQN=163456

Therefore, the condition1 holds when DBA_2PC_PENDING has the entry but X$KTUXE does not.

SQL> SELECT LOCAL_TRAN_ID, GLOBAL_TRAN_ID, STATE, MIXED, COMMIT# FROM DBA_2PC_PENDING;
	-- Returns: 96.22.163456 TESTDB.723a8559.96.22.163456	prepared	no	56759427464
SQL> SELECT * FROM X$KTUXE WHERE KTUXEUSN=96 AND KTUXESLT=22 AND KTUXESQN =163456;
	-- Returns: No Rows

Solution 1 to Condition 1: If the state of the transaction (in DBA_2PC_PENDING) is committed, rollback forced or commit forced then it can be cleaned by:

SQL> EXECUTE DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('96.22.163456'); 

Solution 2 to Condition 1: If the state of the transaction is prepared, we have to clean manually as follows:

SQL> DELETE FROM SYS.PENDING_TRANS$ WHERE LOCAL_TRAN_ID = '96.22.163456'; 
SQL> DELETE FROM SYS.PENDING_SESSIONS$ WHERE LOCAL_TRAN_ID ='96.22.163456' ; 
SQL> DELETE FROM SYS.PENDING_SUB_SESSIONS$ WHERE LOCAL_TRAN_ID = '96.22.163456';
SQL> COMMIT;

2.2. Condition 2: DBA_2PC_PENDING view does NOT have entries about our transaction but there IS A transaction.

This is something like a orphan transaction that the dictionary is not aware of.
Trying to force commit or rollback this transaction may result in error like below, since the dictionary is not aware:

SQL> ROLLBACK FORCE '96.22.163456'
-- ORA-02058: no prepared transaction found with ID 96.22.163456

Solution to Condition 2: What we need to do at this point is; recovering our transaction from being an orphan by inserting some dummy records into dictionay tables (so the views…) and then force a rollback or commit: You do not have to change the parameters in the insert command other than the transaction id.

SQL> ALTER SYSTEM DISABLE DISTRIBUTED RECOVERY;
	
SQL> INSERT INTO PENDING_TRANS$ (LOCAL_TRAN_ID, GLOBAL_TRAN_FMT, GLOBAL_ORACLE_ID, STATE, STATUS, SESSION_VECTOR, RECO_VECTOR, TYPE#, FAIL_TIME,RECO_TIME)
	VALUES
	(
		 '96.22.163456', 
		 299354,
		 'XXXXXXX.12345.1.2.3',
		 'prepared','P',
		 hextoraw( '00000001' ),
		 hextoraw( '00000000' ),
		 0, sysdate, sysdate
	);
	
SQL> INSERT INTO PENDING_SESSIONS$
	VALUES
	(
	 '96.22.163456', 
	 1, hextoraw('05004F003A1500000104'),
	 'C', 0, 30258592, '',
	 146
	);

SQL>  COMMIT;

Now, we should be able to rollback or commit.

SQL> ROLLBACK FORCE '96.22.163456' 
or
SQL> COMMIT FORCE '96.22.163456'

Lastly, we remove the dummy entry from the dictionary:

SQL> ALTER SYSTEM ENABLE DISTRIBUTED RECOVERY;
	
SQL> ALTER SYSTEM SET "_smu_debug_mode" = 4;
	
SQL> COMMIT;
	
SQL> EXECUTE DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('96.22.163456');
	
SQL> ALTER SYSTEM SET "_smu_debug_mode" = 0;
	
SQL> COMMIT;

Check to see whether the transaction has gone:

SQL> SELECT * FROM X$KTUXE WHERE KTUXEUSN=96 AND KTUXESLT=22 AND KTUXESQN =163456;
	-- Returns: No Rows	

2.3. Condition 3: DBA_2PC_PENDING has entry and there is a transaction but COMMIT or ROLLBACK HANGS!

In the situation, where COMMIT FORCE or ROLLBACK FORCE hangs,
Trying to purge the transaction will give an error like:

SQL> EXECUTE DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('96.22.163456');
	-- ERROR at line 1:
	-- ORA-06510: PL/SQL: unhandled user-defined exception
	-- ORA-06512: at "SYS.DBMS_TRANSACTION", line 94
	-- ORA-06512: at line 1

Solution to Condition 3: The solution is the combination of Cond1 and Cond2:

First, delete the dictionary entries:

SQL> DELETE FROM SYS.PENDING_TRANS$ WHERE LOCAL_TRAN_ID = '96.22.163456'; 
	
SQL> DELETE FROM SYS.PENDING_SESSIONS$ WHERE LOCAL_TRAN_ID ='96.22.163456' ; 
	
SQL> DELETE FROM SYS.PENDING_SUB_SESSIONS$ WHERE LOCAL_TRAN_ID = '96.22.163456';
	
SQL> COMMIT;

Then, insert dummy record, force commit and finally purge the transaction:

SQL> ALTER SYSTEM DISABLE DISTRIBUTED RECOVERY;
	
SQL> INSERT INTO PENDING_TRANS$ (LOCAL_TRAN_ID, GLOBAL_TRAN_FMT, GLOBAL_ORACLE_ID,
STATE, STATUS, SESSION_VECTOR, RECO_VECTOR, TYPE#, FAIL_TIME,RECO_TIME)
	VALUES
	(
		 '96.22.163456',
		 306206,
		 'XXXXXXX.12345.1.2.3',
		 'prepared','P',
		 hextoraw( '00000001' ),
		 hextoraw( '00000000' ),
		 0, sysdate, sysdate
	);
	 
SQL> INSERT INTO PENDING_SESSIONS$
	VALUES
	(
		 '96.22.163456',
		 1, hextoraw('05004F003A1500000104'),
		 'C', 0, 30258592, '',
		 146
	);

SQL> COMMIT;
	
SQL> COMMIT FORCE '96.22.163456';
	
SQL> EXECUTE DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('96.22.163456');
Posted in Patching & Upgrading, Troubleshooting

Related Posts