Recover from unusable dictionary indexes in 10g using 11g


Some time ago I was contacted by a company where someone had written a script to move tables to another tablespace. Unfortunately, they had made an error and included SYS and SYSTEM tables.
Luckily, Oracle does not allow you to move base dictionary tables, throwing an ORA-00701 error.

ERROR at line 1:
ORA-00701: object necessary for warmstarting database cannot be altered

Anyway, some tables were moved out of the SYSTEM tablespace, and this left a trail of unusable dictionary indexes.
This resulted in a database throwing various errors, and when the database was restarted, it simply gave up.

SQL*Plus: Release 10.2.0.5.0 - Production on Tue Feb 12 17:24:03 2013

Copyright (c) 1982, 2010, Oracle.  All Rights Reserved.

Connected to an idle instance.

SQL> startup
ORACLE instance started.

Total System Global Area 1015021568 bytes
Fixed Size                  2101168 bytes
Variable Size             243269712 bytes
Database Buffers          734003200 bytes
Redo Buffers               35647488 bytes
Database mounted.
ORA-01092: ORACLE instance terminated. Disconnection forced

Alert.log snippet :
Tue Feb 12 17:24:12 GMT+01:00 2013
ALTER DATABASE OPEN
Tue Feb 12 17:24:12 GMT+01:00 2013
Thread 1 opened at log sequence 18
  Current log# 3 seq# 18 mem# 0: /redoctl1/TEST/redo03.rdo
Successful open of redo thread 1
Tue Feb 12 17:24:12 GMT+01:00 2013
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Tue Feb 12 17:24:12 GMT+01:00 2013
SMON: enabling cache recovery
Tue Feb 12 17:24:12 GMT+01:00 2013
Errors in file /home/app/oracle/admin/TEST/udump/test_ora_48496640.trc:
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 1
ORA-01502: index 'SYS.I_DEPENDENCY1' or partition of such index is in unusable state
Tue Feb 12 17:24:12 GMT+01:00 2013
Error 704 happened during db open, shutting down database
USER: terminating instance due to error 704
Instance terminated by USER, pid = 48496640
ORA-1092 signalled during: ALTER DATABASE OPEN...

Trace file snippet :
*** ACTION NAME:() 2013-02-12 17:24:12.414
*** MODULE NAME:(sqlplus@ab00s123 (TNS V1-V3)) 2013-02-12 17:24:12.414
*** SERVICE NAME:(SYS$USERS) 2013-02-12 17:24:12.414
*** SESSION ID:(152.3) 2013-02-12 17:24:12.414
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 1
ORA-01502: index 'SYS.I_DEPENDENCY1' or partition of such index is in unusable state


The customer was on 10.1.0.3; I'm reproducing the case on a 10.2.0.5 database for the purpose of this blog post.
The database had already been down for 4 days and an SR had already been raised with Oracle.
Their first solution was to restore from backup - which was a bit tricky because, as usual in these situations, there was no backup (they thought there was, but there wasn't).

At first I focused on getting the data out with DUDE, which took about 4 hours for about 40GB, producing lots of DMPs and DDL scripts.
Then a buddy of mine would take over and do the dirty work of rebuilding a new database and reloading all data based on DUDE's output.

During the unload I glanced at the SR and noticed that the Oracle support engineer had made an effort to revive the crashed database by opening the 10g database using an 11g instance in upgrade mode.
At first that sounded strange to me, as it would introduce extra complexity because of mismatching dictionaries.

Once I had unloaded all data with DUDE (and most of the pressure was gone), I tried to open the database using an 11.2.x instance, making sure the COMPATIBLE parameter was set to 10.1.0.3 (otherwise you can't start your db with 10.1.0.3 anymore).
At least there was a cold backup of the database from just after the table moves, so I could play around a bit.

sqlplus / as sysdba

SQL*Plus: Release 11.2.0.3.0 Production on Tue Feb 12 17:36:33 2013

Copyright (c) 1982, 2011, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup upgrade pfile=/tmp/init.ora
ORA-32006: BACKGROUND_DUMP_DEST initialization parameter has been deprecated
ORA-32006: USER_DUMP_DEST initialization parameter has been deprecated
ORACLE instance started.

Total System Global Area 1010479104 bytes
Fixed Size                  2227776 bytes
Variable Size             260047296 bytes
Database Buffers          713031680 bytes
Redo Buffers               35172352 bytes
Database mounted.
Database opened.

Interestingly enough, the database opened using the 11g instance in upgrade mode!

The support engineer then suggested updating ind$ and setting the flag column to 1024 (0x400) for SYS.I_DEPENDENCY1.
Looking at dcore.bsq or sql.bsq, this means disabling an index the quick and dirty way (/* index is disabled : 0x400 */).
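For reference, the suggested hack boils down to something like this (just a sketch of what was described in the SR - the obj# lookup is my own illustration, and as you'll read next, it ended badly, so don't try this on a database you still care about):

-- mark SYS.I_DEPENDENCY1 as disabled (0x400) straight in ind$ - quick, dirty and, as it turned out, fatal
update sys.ind$
   set flags = 1024
 where obj# = (select obj# from sys.obj$ where name = 'I_DEPENDENCY1' and owner# = 0) ;
commit ;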

This update made the instance crash after bouncing it - game over. So that was clearly not the way to go.
I thought it could have something to do with the many differences between the 10g and 11g dictionaries, so in my next attempt I tried the following steps:
- open the system datafile in a hex editor
- look up SYS.I_DEPENDENCY1 in obj$ (based on the data object id and info produced by DUDE)
- mark SYS.I_DEPENDENCY1 as deleted in obj$
- see what happens if I fire up the db using a 10g instance
Unfortunately, I had forgotten to edit the block header (mainly the nrow value and the row directory), resulting in yet another burning instance.
I did not take another attempt as it took too long to restore the db and my job was done.

But after thinking about it later that evening I reckoned, why not try to rebuild the indexes in 11g ?
So I tried one more time :

After starting the 10g db with the 11g software and issuing a rebuild on SYS.I_DEPENDENCY1, this was the error returned:
SQL> alter index I_DEPENDENCY1 rebuild ;
alter index I_DEPENDENCY1 rebuild
*
ERROR at line 1:
ORA-00604: error occurred at recursive SQL level 1
ORA-00904: "DEFMAXSIZE": invalid identifier

The question is - where is DEFMAXSIZE coming from?
So I traced my session, and here's the recursive SQL that's issued during a rebuild:

=====================
PARSE ERROR #4577143416:len=453 dep=1 uid=0 oct=3 lid=0 tim=5543224986472 err=904
select parttype, partcnt, partkeycols, flags, defts#, defpctfree, defpctused, definitrans, defmaxtrans,
deftiniexts, defextsize, defminexts, defmaxexts, defextpct, deflists, defgroups, deflogging, spare1,
mod(spare2, 256) subparttype, mod(trunc(spare2/256), 256) subpartkeycols, mod(trunc(spare2/65536), 65536) defsubpartcnt,
mod(trunc(spare2/4294967296), 256) defhscflags, mod(spare3, 256) interval_dty, rowid, defmaxsize
from partobj$ where obj# = :1
=====================

Now remember - this is an 11g instance running a 10g database with its original dictionary. When comparing dictionaries (10g's sql.bsq versus 11g's dpart.bsq), the column DEFMAXSIZE did not exist in 10g.
So instead of introducing the complexity of running the migrate scripts, I decided to just add the column and see what happens :

SQL>  alter table partobj$ add defmaxsize number ;

Table altered.

SQL> alter index I_DEPENDENCY1 rebuild ;

Index altered.

Bonus!
During the actual case the above would also fail - it would complain about index I_WRI$_OPTSTAT_IND_OBJ#_ST being unusable.
This index is related to the set of tables responsible for storing object statistics (index statistics in this case) - it seems logical that when you rebuild an index, Oracle adds some stats to these tables.
If the related index is itself unusable, it's only logical that it complains.
So in fact I had to rebuild that index first:

SQL> alter index I_WRI$_OPTSTAT_IND_OBJ#_ST rebuild ;

Index altered.

And now I was able to rebuild all indexes without issues.
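For the record, finding the dictionary indexes that still needed a rebuild is just a matter of something like this (a sketch - not necessarily the exact statement I used; partitioned indexes would need dba_ind_partitions as well):

SQL> select index_name from dba_indexes where owner = 'SYS' and status = 'UNUSABLE' ;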

We could now start the db back up with the Oracle 10g software:

SQL*Plus: Release 10.2.0.5.0 - Production on Tue Feb 12 17:48:00 2013

Copyright (c) 1982, 2010, Oracle.  All Rights Reserved.

Connected to an idle instance.

SQL> startup
ORACLE instance started.

Total System Global Area 1015021568 bytes
Fixed Size                  2101168 bytes
Variable Size             243269712 bytes
Database Buffers          734003200 bytes
Redo Buffers               35647488 bytes
Database mounted.
Database opened.

Double BONUS !

Now all that was left was to recompile the invalid objects :

SQL> select count(*) from dba_objects where status='INVALID' and owner='SYS' ;

  COUNT(*)
----------
       496


SQL> @?/rdbms/admin/utlrp

SQL> select count(*) from dba_objects where status='INVALID' and owner='SYS' ;

  COUNT(*)
----------
         0

And voila - the db is back among the living !

Again, when I reproduced this on my test kit, I had no invalid objects left after utlrp ran the first time. On the actual database, however, I was left with invalid KU$ views, which are related to the metadata generation needed for the exp/expdp utilities.
I noticed this when I tried to do a full export :

About to export the entire database ...
. exporting tablespace definitions
. exporting profiles
. exporting user definitions
. exporting roles
. exporting resource costs
. exporting rollback segment definitions
. exporting database links
. exporting sequence numbers
. exporting directory aliases
. exporting context namespaces
. exporting foreign function library names
. exporting PUBLIC type synonyms
. exporting private type synonyms
. exporting object type definitions
. exporting system procedural objects and actions
. exporting pre-schema procedural objects and actions
. exporting cluster definitions
EXP-00056: ORACLE error 24324 encountered
ORA-24324: service handle not initialized
EXP-00056: ORACLE error 24324 encountered
ORA-24324: service handle not initialized
EXP-00056: ORACLE error 24324 encountered
ORA-24324: service handle not initialized
EXP-00000: Export terminated unsuccessfully

Data Pump as well as exp fell over halfway through DDL generation.
So the last step was to regenerate these views:

-- re-install DataPump types and views
@?/rdbms/admin/catdph.sql
-- re-install tde_library packages
@?/rdbms/admin/prvtdtde.plb
-- re-install DataPump packages
@?/rdbms/admin/catdpb.sql
-- re-install DBMS DataPump objects
@?/rdbms/admin/dbmspump.sql
-- recompile invalid objects
@?/rdbms/admin/utlrp.sql

And finally we can do a full export and regenerate the database !
Final note - I also tested this starting on 11gR2 (moving SYS tables in 11gR2) - and it seems 11gR2, although bitching and moaning, does not crash after a restart and allows you to rebuild the indexes.
PS - if you were thinking about using SKIP_UNUSABLE_INDEXES - that didn't work on the dictionary indexes either ;-)
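(For completeness, that would have been something along these lines - a sketch, and it simply didn't help on dictionary indexes:)

SQL> alter session set skip_unusable_indexes = true ;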
 
One more note - it's not possible to disable normal B-tree indexes (alter index ... disable) - only function-based indexes.

FRM-92095: Oracle Jnitiator version too low - patch fix for Forms 10.1.2.3

It seems pigs can fly.

 

I must admit I had a few doubts whether there would ever be a patch to fix the FRM-92095 error on Forms 10g when running the Java 1.7.x plugin.

But here we are - almost a year later and there is a patch for Forms 10g. The patch has apparently been available since November 21, 2012. I only picked up on it because of this blog post by Steve Chan stating that Forms 10.1.2.3.2 bundle patch 2 is now certified with EBS 12.

Bundle patch 2 is patch number 14825718 and one of the bug fixes is :

#  11782681 - APPS6  FORMS DO NOT LAUNCH WITH BETA JRE 1.7

The patch is basically an opatch which copies new libraries and jar files to your system, including a new frmall.jar. It's available for all platforms and is only about 7.5MB.

 

So a big high five to Oracle - let's all install it so we can get rid of Java 1.6.x and get on with our lives!

 

ORA600 Oracle News Aggregator decommissioned

I've decided today to decommission my oracle news aggregator.

I started it a couple of years ago when orablogs.com stopped its RSS aggregation. I was a big fan of orablogs, but I was too late in offering the owner a price for the domain. Otherwise I would have continued orablogs.com.

As it happened - I was too late, and some other guy snapped up the domain to make a quick buck. If I remember correctly, he was asking around $6000 for the domain.

So - I decided to start my own RSS aggregator under the ora600 site.

I only added feeds that actually interested me (*no* Oracle EBS stuff!!!)

 

At the time, a nice side effect was the extra traffic and higher google ranking.

However, lately, the site's ranking is actually being hurt by the aggregator as google seems to be punishing aggregators because of duplicate content.

Anyway, with all the smartphone apps and news reader websites, aggregator sites have almost become obsolete.

And if you're interested in a pure Oracle RDBMS RSS aggregator - have a look at the oaktable aggregator

 

So with a couple of swift delete and truncate SQL commands, I purged around 20,000 blog entries.

Once I issued the commit, I had the idea of maybe setting up a separate site for the RSS entries ... but oh well ... too late.

 

It was a good run ! 

FRM-92095: Oracle Jnitiator version too low - please install version 1.1.8.2 or higher

A customer of mine who deploys Oracle Forms 10g on the wild wild web ran into a little problem.
They don't have control over the Java plugin installed on the client machines, and with the latest Java 1.7 update
they encountered the following Forms error:

FRM-92095: Oracle Jnitiator version too low - please install version 1.1.8.2 or higher

Doh - we're running Java 1.7, not Oracle's obscure JInitiator.

This seems to be a known issue on Metalink - see
FRM-92095 Error When Running Forms Using JRE 7 (JRE 1.7) [ID 1348436.1]
A bug has been filed: Bug 11782681 APPS6: FORMS DO NOT LAUNCH WITH BETA JRE 1.7

There's no patch yet (not even for the latest 11g Forms). And because Forms 10.1.2.x premier support
ended in December 2011 ... God only knows if there will ever be a back-port. So for now - we're on our own!

However ... why does the Forms applet think it is running on Oracle JInitiator?
Could it be that Java's vendor system property changed from 'Sun Microsystems Inc' to 'Oracle Corporation'?

A little test program will show :
class test
{
    public static void main(String args[])
    {
        System.out.println("java.version =" + System.getProperty("java.version")) ;
        System.out.println("java.vendor =" + System.getProperty("java.vendor")) ;
    }
}

On java 1.6:
java test
java.version =1.6.0_30
java.vendor =Sun Microsystems Inc.
On java 1.7:
java.version =1.7.0_02
java.vendor =Oracle Corporation

The vendor property for Java 1.6 and below is 'Sun Microsystems Inc.' - starting from 1.7 it is 'Oracle Corporation'.
Oracle JInitiator also used the vendor string 'Oracle Corporation'.
So Forms now thinks it is running on an old JInitiator instead of the latest Java 1.7 plugin.

I used a bootstrapper applet which sets the vendor property back to 'Sun Microsystems Inc.' like this:
- System.setProperty("java.vendor", "Sun Microsystems Inc.") ;
Then the applet (stub) loads Oracle Forms' main engine class:
- Class oforms = Class.forName("oracle.forms.engine.Main");
- init and start the Forms applet

I can't share the code or the class as the IP belongs to my employer, but this fixed the problem and we can now run Forms on Java 1.7!
The great part is that we don't touch the original frmall.jar, which is important when 3000 end users would otherwise need a new jar downloaded over limited bandwidth!
Then I went into a creative overdrive :-)
In the spirit of the 80's and 90's Atari and Amiga demo scene, I added an intro with a plasma effect.
Update - see also a very elegant workaround by Didier Marcelis in the comments (add -Djava.vendor="Sun Microsystems Inc.")
Update - if you are interested in the Forms bootstrapper (without plasma intro) - send me a mail.
Check out the flash screen capture I've made of the application loading :-)))
Update - it looks like there is a patch for forms 10g (bundle patch 10.1.2.3.2) that will solve the problem !
The patch number is 14825718 - and can be downloaded here.
One of the fixed bugs is 
#  11782681 - APPS6  FORMS DO NOT LAUNCH WITH BETA JRE 1.7
I have not yet installed and tested it - so if someone has - please leave a comment !

Oracle ASS - one time it will save yours !


For those who don't know Oracle ASS - it's an awk script hidden in Oracle's LTOM (aka Lite Onboard Monitoring).
You can download it from Metalink - see note ID 352363.1.

It formats system state dumps - I use it now and then to turn massive trace files into something I can actually use.

A couple of months ago I had to investigate why a 2-node RAC on Windows kept crashing.
The dba managing it didn't really bother to check the system state dump - instead they just opened a call with Oracle and uploaded the trace files.
As usual, the dba blamed the developers - 'it's the app - it's the app'. The developers couldn't see anything wrong, the relationship started to turn sour, and I had to defuse the situation.
I wasn't allowed access to the system, but I was able to have a look at the trace files - or as the dba called them, 'a very complex logfile'.

Fair enough - the system state dump was several megabytes large, but luckily I had Oracle ASS ;-)
Within minutes I kind of figured out what the problem was. 

Just looking at some of the wait events from the output below:
SGA: allocation forcing component growth
cursor: pin S wait on X
row cache lock

This looks like an SGA resize operation, during which library cache locks arise.
An SGA resize operation can hang the database for brief moments, especially when the shared pool needs to be resized.
However, in this case, the instance was crashing.

With no access to the system, I told the dba to have a look at v$sga_resize_ops - if there are excessive resize operations, the SGA might simply be sized too small.
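Something along these lines shows what got resized, when and why (a sketch - check the exact column names on your version):

SQL> select component, oper_type, parameter, initial_size, final_size, status, end_time
  2  from v$sga_resize_ops order by end_time ;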
I also got hold of the alert logs - as the instances were frequently crashing, I could see the startup messages in the alert.log, showing me the SGA settings.
And I kid you not - these 2 nodes were configured with SGA_TARGET=256M !!! I was surprised this thing even started!
Bumping up sga_target basically solved all issues.

This puppy had been running in production for a month with lots of issues, and yet the dba hadn't even bothered to check the most basic settings.
Instead, they just blamed the app, opened a call with Oracle, got some general advice and were done with it. Amazing...


awk -f ass109.awk tracefile.trc > out

Starting Systemstate 1
..............................................................................
.................................................................
Ass.Awk Version 1.0.9 - Processing xyz_ora_4948.trc

System State 1
~~~~~~~~~~~~~~~~
1:                                      
2:  waiting for 'pmon timer'            
3:  waiting for 'DIAG idle wait'        
4:  waiting for 'rdbms ipc message'     
5:  waiting for 'rdbms ipc message'    [Latch received-location:] 
6:  waiting for 'ges remote message'    
7:  waiting for 'gcs remote message'    
8:  waiting for 'gcs remote message'    
9:  waiting for 'gcs remote message'    
10: waiting for 'gcs remote message'    
11: waiting for 'gcs remote message'    
12: waiting for 'gcs remote message'    
13: last wait for 'rdbms ipc message'   
14: waiting for 'rdbms ipc message'     
15: waiting for 'rdbms ipc message'     
16: waiting for 'rdbms ipc message'     
17: waiting for 'rdbms ipc message'     
18: waiting for 'rdbms ipc message'     
19: waiting for 'SGA: allocation forcing component growth' 
20: waiting for 'enq: DR - contention' [Enqueue DR-00000000-00000000] 
21: waiting for 'SGA: allocation forcing component growth' 
22: waiting for 'SGA: allocation forcing component growth' 
23: waiting for 'SGA: allocation forcing component growth' 
24: waiting for 'rdbms ipc message'     
25: waiting for 'SGA: allocation forcing component growth' 
     Cmd: Select
26: waiting for 'ASM background timer'  
27: waiting for 'rdbms ipc message'     
28: waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
29: waiting for 'rdbms ipc message'     
30: waiting for 'rdbms ipc message'     
31: waiting for 'Streams AQ: qmn coordinator idle wait' 
32: waiting for 'cursor: pin S wait on X' 
     Cmd: Select
33: waiting for 'PX Deq: Parse Reply'   
34: waiting for 'SQL*Net message from client' 
35: waiting for 'PX Deq: Parse Reply'   
     Cmd: Select
36: waiting for 'cursor: pin S wait on X' 
     Cmd: Select
37: waiting for 'cursor: pin S wait on X' 
     Cmd: Select
38: waiting for 'cursor: pin S wait on X' 
     Cmd: Select
39: waiting for 'cursor: pin S wait on X' 
     Cmd: Select
40: waiting for 'SGA: allocation forcing component growth' 
     Cmd: Select
41: waiting for 'Streams AQ: qmn slave idle wait' 
42: waiting for 'PX Deq: Parse Reply'   
     Cmd: Select
43: for 'Streams AQ: waiting for time management or cleanup tasks' 
44: waiting for 'PX Deq: Execution Msg' 
     Cmd: Select
45: waiting for 'PX Deq: Parse Reply'   
     Cmd: Select
46: waiting for 'PX Deq: Execution Msg' 
     Cmd: Select
47: waiting for 'PX Deq: Parse Reply'   
     Cmd: Select
48: waiting for 'enq: PS - contention' [Enqueue PS-00000002-00000E0C] 
     Cmd: Select
49: waiting for 'class slave wait'      
50: waiting for 'SGA: allocation forcing component growth' 
     Cmd: PL/SQL Execute
51:                                     
52: waiting for 'PX Deq: Parse Reply'   
53: waiting for 'SGA: allocation forcing component growth' 
     Cmd: PL/SQL Execute
54: waiting for 'PX Deq: Execution Msg' 
     Cmd: Select
55: waiting for 'PX Deq: Parse Reply'   
     Cmd: Select
56: waiting for 'PX Deq: Parse Reply'   
57: waiting for 'SGA: allocation forcing component growth' 
58: waiting for 'PX Deq: Execution Msg' 
     Cmd: Select
59: waiting for 'SGA: allocation forcing component growth' 
     Cmd: Select
60: waiting for 'SGA: allocation forcing component growth' 
     Cmd: Select
61: waiting for 'PX Deq: Execution Msg' 
     Cmd: Select
62: waiting for 'PX Deq: Parse Reply'   
63: waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
64: waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
65: waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
66: waiting for 'PX Deq: Parse Reply'   
67: waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
68: waiting for 'SGA: allocation forcing component growth' 
69: last wait for 'ksdxexeotherwait'   [Rcache object=000007FF88BFD200,] 
70: waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
71: waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
72: waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
73: waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
74: waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
75: waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
76: waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
77: waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
78: waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
79: waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
80: waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
81: waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
82: waiting for 'row cache lock'       [Rcache object=000007FF88AC2658,] 
83: waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
84: waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
85: waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
86: waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
87: waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
88: waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
89: waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
90: waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
91: waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
92: waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
93: waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
94: waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
95: waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
96: waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
97: waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
98: waiting for 'row cache lock'       [Rcache object=000007FF88AC2658,] 
99: waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
100:waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
101:waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
102:waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
103:waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
104:waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
105:waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
106:waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
107:waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
108:waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
109:waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
110:waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
111:waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
112:waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
113:waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
114:waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
115:waiting for 'row cache lock'       [Rcache object=000007FF88AC2658,] 
116:waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
117:waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
118:waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
119:waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
120:waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
121:waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
122:waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
123:waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
124:waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
125:waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
126:waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
127:waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
128:waiting for 'row cache lock'       [Rcache object=000007FF88AC2658,] 
129:waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
130:waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
131:waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
132:waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
133:waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
134:waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
135:waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
136:waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
137:waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
138:waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
139:waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
140:waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
141:waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
142:waiting for 'row cache lock'       [Rcache object=000007FF88BFD200,] 
143:waiting for 'row cache lock'       [Rcache object=000007FF88AC2658,] 
Blockers
~~~~~~~~

Above is a list of all the processes. If they are waiting for a resource
then it will be given in square brackets. Below is a summary of the
waited upon resources, together with the holder of that resource.
Notes:
~~~~~
o A process id of '???' implies that the holder was not found in the
  systemstate.

                    Resource Holder State
    Latch received-location:    ??? Blocker
Enqueue DR-00000000-00000000    ??? Blocker
Rcache object=000007FF88BFD200,    57: waiting for 'SGA: allocation forcing component growth'
Enqueue PS-00000002-00000E0C    48: Self-Deadlock
Rcache object=000007FF88AC2658,    68: waiting for 'SGA: allocation forcing component growth'

Query Co-Ordinator to Query Slave Mapping
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
QC=  33:  [Count=1]
           Communicates with Slave 0 (hex) on instance 2 (PS-02-0E0E)
Slave     Info Msg    State        From          To  Type  Status     Mode  Err
QC=  35:  [Count=2]
           Communicates with Slave 0 (hex) on instance 2 (PS-02-0E0F)
           Communicates with Slave 0 (hex) on instance 1 (PS-01-0E0F)
Slave     Info Msg    State        From          To  Type  Status     Mode  Err
  44: PS-01-0E0F   1   00000,         35:         44:  DTA,     FRE     STRE    0
QC=  42:  [Count=2]
           Communicates with Slave 0 (hex) on instance 2 (PS-02-0E0D)
           Communicates with Slave 0 (hex) on instance 1 (PS-01-0E0E)
Slave     Info Msg    State        From          To  Type  Status     Mode  Err
  46: PS-01-0E0E   1   00000,         42:         46:  DTA,     FRE     STRE    0
QC=  45:  [Count=2]
           Communicates with Slave 0 (hex) on instance 2 (PS-02-0E07)
           Communicates with Slave 0 (hex) on instance 1 (PS-01-0E09)
Slave     Info Msg    State        From          To  Type  Status     Mode  Err
  61: PS-01-0E09   1   00000,         45:         61:  DTA,     FRE     STRE    0
QC=  47:  [Count=2]
           Communicates with Slave 0 (hex) on instance 2 (PS-02-0E0B)
           Communicates with Slave 0 (hex) on instance 1 (PS-01-0E0C)
Slave     Info Msg    State        From          To  Type  Status     Mode  Err
  54: PS-01-0E0C   1   00000,         47:         54:  DTA,     FRE     STRE    0
QC=  48:  [Count=2]
           Communicates with Slave 0 (hex) on instance 2 (PS-02-0E0C)
           Communicates with Slave 0 (hex) on instance 1 (PS-01-0E0D)
Slave     Info Msg    State        From          To  Type  Status     Mode  Err
QC=  52:  [Count=1]
           Communicates with Slave 0 (hex) on instance 2 (PS-02-0E0A)
Slave     Info Msg    State        From          To  Type  Status     Mode  Err
QC=  55:  [Count=2]
           Communicates with Slave 0 (hex) on instance 2 (PS-02-0E08)
           Communicates with Slave 0 (hex) on instance 1 (PS-01-0E0A)
Slave     Info Msg    State        From          To  Type  Status     Mode  Err
  58: PS-01-0E0A   1   00000,         55:         58:  DTA,     FRE     STRE    0
QC=  56:  [Count=1]
           Communicates with Slave 0 (hex) on instance 2 (PS-02-0E05)
Slave     Info Msg    State        From          To  Type  Status     Mode  Err
QC=  62:  [Count=1]
           Communicates with Slave 0 (hex) on instance 2 (PS-02-0E06)
Slave     Info Msg    State        From          To  Type  Status     Mode  Err
QC=  66:  [Count=1]
           Communicates with Slave 0 (hex) on instance 2 (PS-02-0E09)
Slave     Info Msg    State        From          To  Type  Status     Mode  Err
                         ------------------------
STATUS Key:
  DEQ = buffer has been dequeued
  EML = buffer on emergency message list
  ENQ = buffer has been enqueued
  FLST= buffer is on SGA freelist
  FRE = buffer is free (unused)
  GEB = buffer has been gotten for enqueuing
  GDB = dequeued buffer has been gotten 
  INV = buffer is invalid (non-existent)
  QUE = buffer on queue message list
  RCV = buffer has been received 
  NOFL= not on freelist (just removed)
                              ------------------------

Object Names
~~~~~~~~~~~~
Latch received-location:              last post received-location: kjmdrms
Enqueue DR-00000000-00000000                              
Rcache object=000007FF88BFD200,                              
Enqueue PS-00000002-00000E0C                              
Rcache object=000007FF88AC2658,                              


53697 Lines Processed.

Are you the smartest 2011

It's been a long time since I've written something here, but I reckon the following is worth a post.

For those of you who don't know what Oracle 'Are you the smartest' is (or ayts for short), have a look at this post I wrote a couple of years back.
Basically it's a competition for Oracle partners in the Benelux - I entered for the first time in 2008 and since then I've won it twice and come second once.
Have a look at this year's topics (http://www.ayts.be/)
All Benelux based Oracle partners can enter, and there is a winner per partner. 
This year, AXI was the smartest partner in Benelux, meaning, on average, we had the highest scores on the exams... again.
Anyhow - I've won it this year for AXI - and the first prize is a trip to Oracle HQ, followed by a quick stop in Vegas. Now that's a cool incentive!
 

Have a look at Marcel's blog for a more detailed overview of the trip.

I've been to SF and Vegas numerous times, but it never gets old ! 
Although this year, I must admit, I was very tired before I even started the trip.
I became a father 6 months ago and that sort of thing starts to work on your sleep pattern.
So I was looking forward to some rest, but these were 4 days packed full of action and that wore me out even more.
Also, the group was quite large (25 people), and people who know me know I don't thrive that well in big groups.
This meant that by the end of the trip I hadn't talked to all the people I wanted to talk to!

The trip was a blast though - I teamed up with Geert Depaep (the guy from DBA-Village) as a room-mate. He also won in 2008, the same year I last won.
Two highlights stand out:
- the visit to Oracle HQ, where we had 2 presentations.
One from Wim Coeckaerts on Oracle virtualisation and one from Jeff Davis on SOA.
I can't really say much about it since we had to sign an NDA, other than that OracleVM 3.0 does exist.
(it has been in beta so long, I started to wonder if it actually existed!)
Also, have a look at Oracle Virtual Assembly Builder.
I had never heard about this tool, but it looks pretty cool if you want to deploy VM images and customize them.
(photo: Wim Coeckaerts)

- my poker session at the Venetian
I didn't enter a tournament (although those deep stack tournaments look mighty tempting) - just played cash games for 4 hours straight.
Most of the guys went to the V club after an hour of gambling - apparently I was the only one not there ... but when I'm at the tables
I lose track of time and interest in anything else ... and I made some money as a bonus.

Anyway, overall excellent trip, a great incentive from Oracle, and a big thanks to Yves Van Looy.

Now if only I could get that song they were playing at the Paris Chateau Nightclub out of my head.




Memory Footprint For Dedicated Server Processes More Than Doubled After 11g Upgrade On AIX Platform

Another one I categorize under 'I should write this down before I forget'.
Connor McDonald - Oracle geek and entertainer extraordinaire - shared this with me last week. Apparently, he was having some memory issues after upgrading to 11gR2 under AIX. As we run 80% of our databases under AIX, I thought I should make a mental note about it (some may call it a blog entry).
For those with Metalink access - look up note ID 1246995.1, titled 'Memory footprint for dedicated server processes more than doubled after 11g upgrade on AIX platform'.
 

The symptoms are :
  • excessive memory utilization for dedicated server processes after upgrading to 11.2.0.1 or 11.2.0.2 on AIX
  • svmon on an Oracle OS process id shows a USLA (User-Space Loader Assistant) heap of about 7M bytes
 
 
Example 
svmon -P <PID>
outputs
  • 11.2.0.1.0 -> 7M bytes (=USLA x 4k page size)
  • 11.1.0.7.0 -> 60KB 
  • 10.2.0.4.0 -> 420KB

Example on 11.2.0.1 (unpatched)
 svmon -P 1368120 | grep USLA
 10e811d  9fffffff clnt USLA text,/dev/hd2:8227      s     17     0    -       -
 10d89b9  80020014 work USLA heap                   sm      0     0 1895    1895 ---> x4K = 7Mb
 
Example on 11.2.0.1 (patched)
  c291c2  80020014 work USLA heap                   sm     31     0    0      31 ---> x4K = 124K
 
 
 
This is apparently a bug
Bug 10211065.-P Base Bug 9796810
Abstract: MEMORY FOOTPRINT FOR DEDICATED SERVER PROCESSES MORE THAN DOUBLED AFTER 11G UGP
 
 
Bug 9796810.-P Base Bug 10190759
Abstract: 11G SERVER PROCESSES CONSUMING MUCH MORE MEMORY THAT 10G OR 9I
 
 
Bug 10190759.-P
Abstract:  PROCESSES CONSUMING ADDITIONAL MEMORY DUE TO 'USLA HEAP'

 
 
The good news is - there's an 'interim' patch - number 10190759.
It basically changes the makefile so that it excludes certain options (-bexpfull and -brtllib) related to 11g's online patching feature.
It's not really an Oracle problem - it seems related to a bug in the runtime loader on AIX.
 
 

_memory_imm_mode_without_autosga - no really ! don't resize my sga ! I mean it !

Wow - I need a mental note for this one ! (hence the blogpost)

So you've disabled Automatic Memory Management (AMM/ASMM) on your new 11.2 instance - memory_target and sga_target are set to zero ... because ... the dba knows best!
You've configured large pages, locked the SGA in memory, manually configured the buffer cache, etc. - you've got the luxury of finally having a machine with gigs and gigs of memory.

And then one day you find out that your buffer cache is 256Mb in size, the large pool is now 20Gb (coming from 1Gb) and the alert file is screaming ORA-4031, ORA-4031 !!!

How did that happen !!!!!! We didn't use automatic memory management ! You scream and yell - who touched the instance parameters..... no fool would set a large pool of 20Gb and a buffer cache of 256Mb manually.
Looking in the audit logs reveals nobody touched the instance ... it seems AMM/ASMM *is* at play - how can that be if we disabled it ?

You open an SR with support and everything becomes clear:
- we just ignore your settings and do AMM anyway - if you don't want that, set this underscore parameter:
alter system set "_memory_imm_mode_without_autosga"=FALSE scope=both;
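If you first want to check what the parameter is currently set to, the usual underscore-parameter query does the trick (a sketch - you need to be connected as SYS to see x$):

select p.ksppinm name, v.ksppstvl value
from x$ksppi p, x$ksppcv v
where p.indx = v.indx
and p.ksppinm = '_memory_imm_mode_without_autosga' ;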

Quote - "This is expected behavior in 11.2 for immediate memory allocation requests, which added this as a new feature when automatic memory management was disabled."
Stupid me !
 
note on metalink "SGA Re-Sizes Occurring Despite AMM/ASMM Being Disabled (MEMORY_TARGET/SGA_TARGET=0) [ID 1269139.1]"

Unloading history - old Oracle7 dictionary

When I saw Jonathan's post it reminded me of the work I did last weekend.
Friday evening I got 2 datafiles - one SYSTEM and one DATA datafile.

I only knew the platform, but I had sent out a request for more information. You can quite easily find out endianness and block sizes ... but the Oracle version is a bit different.
With this one, I was quite certain they were Oracle 7 datafiles. For one thing, when DUDE scanned the file headers it decoded the file numbers as being a multiple of 2. SYSTEM turned out to be file#=16. I'll explain later.
Secondly, setting DUDE's VERSION parameter to '7' allowed me to unload the dictionary without too much trouble.

However, when I wanted to unload data from the actual datafile, no data object ids were found in obj$ or tab$ !!! How could that be!
So I started to investigate the dictionary - looking at the SYSTEM tablespace with a hex editor.
The blocks were fine; however, there was a huge gap in the object ids being used, jumping from 3000 to 41000.
Hmm ... a serious part of the dictionary was missing - maybe completely overwritten.

I then got word back that this was an Oracle 8.0.x database and, even more surprisingly, the database was up and running except for one datafile that had been offlined (the one they needed data from).
So, the dictionary had to be good ! 

I was too puzzled at the time - it was only when I took a step back and thought about it for a while that everything came together.
This was an Oracle 8.0.x database which was once an Oracle 7.3.4, migrated to 8.0.x using Oracle's 'mig' utility. 'mig' doesn't exist anymore - its functionality has basically been taken over by 'startup migrate' and the upgrade scripts.
Back in the day you had to run the mig utility on your database, which would upgrade your dictionary using the 'migrate.bsq' script. No startup migrate there!
To migrate the dictionary, new dictionary tables were created. For example, if you had obj$, tab$ and col$, the mig utility would create obj_mig$, tab_mig$ and col_mig$, migrate the data from the old dictionary to the new dictionary, and basically switch names. That means that the object ids (and of course data object ids) of the base dictionary tables are totally different from those of a normal 8.0 database.
In this case, they would be in the 41000 range, because that was the range of object ids of the most recently created objects/segments.
The old dictionary would then be dropped ... but orphaned extents might sit untouched for years in your system tablespace.
And that's exactly what happened when I unloaded the dictionary with VERSION='7'.
DUDE found the leftovers of the old dictionary. However, DUDE should have unloaded the new dictionary!

So how does Oracle find the dictionary ?
Well, in the tablespace header block of the SYSTEM tablespace, there's a pointer pointing to the bootstrap$ table. The bootstrap$ table contains ddl for the base dictionary tables (and indexes) including data objectid's (and table numbers for clustered tables).
(you'll also find the ddl for the base dictionary in the sql.bsq script which is used when the database is created)
Here's an example from bootstrap$:
 17","2407","CREATE TABLE OBJ$ ( OBJ# NUMBER NOT NULL, DATAOBJ# NUMBER, OWNER# NUMBER NOT NULL, NAME VARCHAR2(30) NOT NULL, NAMESPACE NUMBER NOT NULL, SUBNAME VARCHAR2(30), TYPE# NUMBER NOT NULL, CTIME DATE NOT NULL, MTIME DATE NOT NULL, STIME DATE NOT NULL, STATUS NUMBER NOT NULL, REMOTEOWNER VARCHAR2(30), LINKNAME VARCHAR2(128), FLAGS NUMBER, OID$ RAW(16), SPARE1 NUMBER, SPARE2 NUMBER, SPARE3 NUMBER, SPARE4 VARCHAR2(1000), SPARE5 VARCHAR2(1000), SPARE6 DATE) pctfree 10 pctused 40 initrans 1 maxtrans 255 storage ( initial 10240 next 126976 minextents 1 maxextents 121 pctincrease 50 objno 2407 extents ( file 40 block 247))"

You'll notice that it contains some key pointers for the described table, like file# (40), offset (247) and dataobjectid (2407). For tables part of a clustered table, it will also contain the table number within the cluster.

So once we have the bootstrap$ segment, we know the location of the dictionary!
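On a healthy, open database you can simply have a look at it yourself (a sketch):

SQL> select line#, obj#, sql_text from bootstrap$ order by line# ;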

DUDE actually has the ability to 'search' for the bootstrap$ segment based on its characteristics.
It does not, however, parse the bootstrap ddl for object ids and table numbers - but once you have the output for bootstrap$, it's quite straightforward to punch in DUDE's bootstrap parameters, like:
  • BOOTSTRAP_FILE_OID
  • BOOTSTRAP_OBJ_OID
  • BOOTSTRAP_COBJ_OID
  • BOOTSTRAP_CFILEBLOCK_OID
  • BOOTSTRAP_CTS_OID
  • BOOTSTRAP_CUSER_OID
  • BOOTSTRAP_TAB_TABNO
  • BOOTSTRAP_IND_TABNO
  • BOOTSTRAP_ICOL_TABNO
  • BOOTSTRAP_COL_TABNO
  • BOOTSTRAP_LOB_TABNO
  • BOOTSTRAP_SEG_TABNO
  • BOOTSTRAP_TS_TABNO
  • BOOTSTRAP_USER_TABNO 

These parameters are explained in the DUDE primer here.

Anyway, once I set these parameters, I was home free, unloading a complete Oracle8 dictionary !!!


So what about the SYSTEM datafile having a file number larger than 1?

In Oracle 6 only 5 to 6 bits were used for the file number, so a maximum of 2^5-1 (31) or 2^6-1 (63) datafiles could be used (database wide).
In Oracle 7 this changed to 10 bits or 2^10-1 (1023) datafiles (database wide). However, because of backward compatibility with Oracle 6, an encoding scheme was introduced splitting up the 10 bits for the file number into 6 and 4 bits and wrapping them around. It really depends on the platform - on Intel Windows and IBM AIX, for example, I've seen an 8/2 split.
SVRMGR> select dump(chartorowid('00000000.0000.0001')) from dual ;
DUMP(CHARTOROWID('0000000
-------------------------
Typ=69 Len=6: 1,0,0,0,0,0
1 row selected.
SVRMGR> select dump(chartorowid('00000000.0000.ffff')) from dual ;
DUMP(CHARTOROWID('00000000.00
-----------------------------
Typ=69 Len=6: 255,192,0,0,0,0
1 row selected.
This means that the first file# is :
00000001 00000000 00000000 00000000 -> file# 1
And the maximum file# is :
11111111 11000000 00000000 00000000 -> file# 1023
So the 10bits encoding scheme is like this :
LLLL LLLL HH
Where L is the low order bits
And H the high order bits
Now let’s open DUDE on a series of these datafiles :
DUDE> Initialising ...
DUDE> Init : creating filenumber map ...
DUDE> Scanning tablespace SYSTEM : BLOCKSIZE = 2048
DUDE> File : G:\sys1orcl.ora resolves to number : 4
DUDE> File : G:\sys2.ora resolves to number : 40
You'll notice that sys1orcl.ora, which is basically the first file of the database, has file# equal to 4. And we know that sys2.ora had file# equal to 10. How's that possible?
File# 1 = 0000 0001 00 (LLLL LLLL HH) EQUALS 4 in Oracle 8 DBA format
File# 10 = 0000 1010 00 (LLLL LLLL HH) EQUALS 40 in Oracle 8 DBA format
It's clear that using the Oracle 8 DBA format encoding on the Oracle 7 wrapped DBA format results in different file numbers. Basically, the file number shifted 2 bits to the left (or x2x2). This is of course platform specific, but if the first file of SYSTEM has a file number that is a multiple of 2, you probably have a migrated database.
So what happened when Oracle 8.0 came along and introduced 2^10-1 or 1023 datafiles per tablespace?
Well - the DBA format stayed the same. However, the file numbers became relative to the tablespace. So 2 datafiles of the same database could potentially have the same file number, but belong to 2 different tablespaces!
What happened to the Oracle 7 (absolute) file numbers when the database was migrated to Oracle 8? Surely the mig utility didn't update the DBA in every block?
Let’s check out an Oracle 7 database :
SVRMGR> desc file$
Column Name                    Null?    Type
------------------------------ -------- ----
FILE#                          NOT NULL NUMBER
STATUS$                        NOT NULL NUMBER
BLOCKS                         NOT NULL NUMBER
TS#                            NOT NULL NUMBER
SVRMGR> select file#, ts# from file$ ;
     FILE#        TS#
---------- ----------
         1          0
         2          1
         3          2
         4          3
         5          8
         6          9
         7         10
         8          7
         9         11
        10          0
10 rows selected.
Ok – looks logical – we see that tablespace TS#=0 or SYSTEM has 2 datafiles with file#=1 and file#=10.
Let’s do the same after a migration to 8.0 :
SVRMGR> desc file$
Column Name                    Null?    Type
------------------------------ -------- ----
FILE#                          NOT NULL NUMBER
STATUS$                        NOT NULL NUMBER
BLOCKS                         NOT NULL NUMBER
TS#                                     NUMBER
RELFILE#                                NUMBER
MAXEXTEND                               NUMBER
INC                                     NUMBER
CRSCNWRP                                NUMBER
CRSCNBAS                                NUMBER
OWNERINSTANCE                           VARCHAR2(30)
SPARE1                                  NUMBER
SPARE2                                  NUMBER
SPARE3                                  VARCHAR2(1000)
SPARE4                                  DATE
SVRMGR> select file#,ts# from file$ ;
     FILE#        TS#
---------- ----------
         1          0
         2          1
         3          2
         4          3
         5          8
         6          9
         7         10
         8          7
         9         11
        10          0
10 rows selected.
So - the file# for the datafiles stayed the same. But we can see an added column in file$ - relfile#:
SVRMGR> select file#,relfile#,ts# from file$ ;
     FILE#   RELFILE#        TS#
---------- ---------- ----------
         1          4          0
         2          8          1
         3         12          2
         4         16          3
         5         20          8
         6         24          9
         7         28         10
         8         32          7
         9         36         11
        10         40          0
10 rows selected.
Here we can clearly see the 2-bit shift to the left - the Oracle 7 absolute file number became an Oracle 8 relative file number.
So the mig utility did not have to:
  • update the DBA in every block
  • update the row addresses in chained and migrated rows

More info on the topic can be found on Metalink - see note 122926.1 ...
I think I'll have to lie down now ;-)

ORA600 - 5years later

This summer is ORA600's 5th anniversary - and DUDE's 10th !
 
So I thought it would be appropriate to look back and reflect on all the crazy and weird things we've gone through.
 
As many of you know, DUDE started out as jDUL on SourceForge. I started on it while I was working in South Africa - me and my buddy Kugendran Naidoo came up with all kinds of crazy ideas whilst enjoying a smoke in the smoking room. Wow, remember the days when you could still smoke in the office?
 
I can't say the 'Oracle community' was very appreciative of the idea of a DUL-like tool. Back in the day, the Oracle community, for me, consisted mostly of the cdos newsgroup (comp.databases.oracle.server). To be honest, I found cdos quite a hostile place - a lot of flames, rants, newbie bashing etc ... all in all sending out negative vibes. Thinking about it, it's just the opposite of the Oracle-l mailing list (http://www.freelists.org/archive/oracle-l). I'm not surprised cdos is now full of spam messages and almost completely abandoned. If it wasn't for cdos ... I might have gone through with open-sourcing jDUL/DUDE!
 
Anyway - flash forward to the summer of 2005 - after a meeting with Mogens Norgaard from Miracle AS, ORA600 is born. (see also here). That particular summer was quite fruitful as I spent most of my time implementing new features. To give you an idea - in 2001 the source code was 131Kb in size and 4393 lines long. By the end of 2005 it was 12390 lines of code and 446Kb in size. The current version is 1049Kb and 27817 lines long. I must admit, a lot of that is code documentation !
 
Although Miracle AS worked as an incubator, Mogens introduced me to Daniel Fink in 2007 at a Miracle Scotland event in Edinburgh. Daniel was interested in representing DUDE in the US. Mogens didn't mind. Dan is a great guy to work with - very professional! After that I started collaborating with different companies around the globe, like Pythian, Evdbt Inc (Tim Gorman), NRG Consulting and HBTec. I really think local support is very important for potential customers, even if it squeezes my profit margins.
 
I have to admit - I have encountered some amazing corruption cases over the last 5 years. Some were solved fast and swiftly - others have cost me several years of my life ;-)
Here are some recovery cases I will never forget:
 
1. The one with the mixed data in a LONG column
 
This case was *by far* the most stressful recovery I have done. The customer was a newspaper publisher and they needed their database online before Sunday 2PM. Having a deadline is always stressful, but this database contained a table with a LONG column that was on average a gigabyte large. Normally that's not a problem, but at that time I had a bug in my DMP API related to Java's garbage collector that caused the unload to slow down ... It actually appeared to be hanging, as there were no error messages being generated. I didn't know about the bug right away, so I investigated the situation up to the point where I opened several blocks in a hex editor, following the pointers of chained rows manually (did I mention the deadline?). What I saw was that the column contained readable text and then suddenly what looked like binary data. I immediately thought some blocks were partially corrupted and what I was seeing was garbage. Quite often on Windows platforms you'll see file corruption where files are partially overwritten by junk in multiples of 512 bytes, spread all over. It's like someone takes a machine gun and puts some holes in the files. Worst case, the corruption is in the middle of a block, so header and tail are fine and the corruption goes almost unnoticed - until you try to trail the row directory to the row headers and suddenly hit junk. It gets even worse if, by coincidence, the header is identified as a chained row and thus you try to read the next dba ... which of course will point to some non-existing datafile and offset!!!
Anyway, that's what it looked like at first glance ... until I actually started to put things together with a hex editor (try doing that at 3AM!). The blocks were fine, the row directory was valid, the row headers were ok, and the chained row headers were pointing to valid row pieces. This led me to the conclusion that the data was valid, just extremely weird!
At that point I focused on my DMP API and found out the garbage collector was doing too much work, cleaning up after processing 1Gb columns. 
Once I fixed that, the unload was blazing fast - and I made the deadline… but boy… what an adrenaline rush !!!
 
 
2. The one with the even larger column
 
DUDE is written in Java - most JVMs are/were 32bit. Although DUDE is multithreaded (it has producer and consumer threads) and some platforms turn threads into procs, I have to work with a 2Gb memory limitation in mind. At one point I had a case where DNA strands were being stored in CLOBs. These were all around 2Gb large. Needless to say, I ran into some memory related issues ;-) At that point I had to react quickly - and these things always happen at night. The solution was to write on-disk data structures and thus avoid the 2Gb limit. Performance is of course affected by this - but I can safely say that DUDE will unload very large LOBs on a 32bit platform.
 
 
3. The one without a system tablespace and just toooo many tables
 
This one I will also never forget. Losing your system tablespace, and thus your dictionary, is a disaster. But no worries - DUDE can identify tables and datatypes using a heuristic algorithm. However, table names, column names, logical column order ... that's all stored in the dictionary. So recovery from such a situation involves intimate knowledge of your data, and a lot of guessing to map the flat files to the correct tables. I've done it a number of times with a hundred or so tables, always with the developer by my side. It's time-consuming and very hard. One time we had a 'lost system tablespace' case. The db in itself was only about 68Gb, but it was a BAAN ERP system. That means thousands (5000+) of tables!
It took a team of 8 BAAN consultants 2 weeks to rebuild the database! I specially changed the code for them so that DUDE would generate BAAN-specific flat files for easy loading through BAAN's loader tool. This is where the DUDE license really pays off for the customer - one fixed price no matter how long you need it!!!
 
 
4. The 'we don't use backups' case… but we're happy with that
 
One time I got a call from a company who had had a SAN crash. They had a 4TB data warehouse sitting on it, and apparently were unable to restore the database. Because of the size of the DB I suggested going with the license as, clearly, the unload would take some time. A license is fixed price for as long as the unload takes, and 4TB ... well, it can take a while! The next day I get a call back - 'hey, this thing really works - we've got another database on that SAN we would like to see unloaded' - me: 'what's the size?' - answer: 'about 4TB' ... That's not all! A year later I got a call from the same company, different branch. 'We have a db crash - we've used your tool before - we would like to use it again. The DB size is 2TB!'. So in 2 years, 3 databases and a total of 10TB recovered for the same customer ... my favorite customer ... ever! They were also the first one to use table compression - so I can safely say my block compression algorithm has been tested on multiple terabytes ;-)
 
 
 As you can see - recovering data can be really stressful, but we take great pride in what we do, and if it means we have to change the code for your specific needs, we'll do it and make it work !
 
The ORA600 vehicle also made it easier to network and meet new people - which is a bonus, as I don't like large crowds. I can definitely say OOW is not for me - but UKOUG is. It's one of the best conferences out there and it has allowed me to meet all sorts of interesting new people (you know who you are!). Well, I tasted HOTSOS first, which was really great ... but I don't like long flights either ... and I definitely don't like all-male dance parties ... Give me an English pub anytime ;-)
Here's a picture of an infamous evening - you'll recognize a certain Scotsman in the back, a couple of Danes and a couple of Finns ... oh, and me!
 
 
 
Anyway - because of the anniversary I'll be giving away the very last ORA600 polo shirts!!!
 
(to be honest, I have been remodeling the house to accommodate a nursery as I will become a father in December ... so everything has got to go!!!)
 
 
 
I have 5x size medium (M), 1x extra large (XL) and 3x extra extra large (XXL) - the first to respond to dude@ora600.be will get one !!! (don't forget to mention your size and address!!!)
 
Update - all polo shirts are gone !!! 
 
Up to the next five years ! 





Howto's
See DUDE primer for info

Get Support

Europe

Belgium :
Kurt Van Meerbeeck
ORA600 bvba
E-mail
dude@ora600.be
Cell : +32 495 580714

Denmark :
Michael Møller
Service & Support Manager
Miracle AS
E-mail :
hra@miracleas.dk
Cell: +45 53 74 71 27


North America

USA :
Tim Gorman
Evdbt Inc
E-mail
tim@evdbt.com
Cell : +1 303 885 4526

Canada :
Pythian
E-mail
dude@pythian.com
Contact


Latin America

Brazil :
HBtec
E-mail
dude@hbtec.com.br
Cell : +55 47 88497639
Contact


Africa

South Africa :
Kugendran Naidoo
NRG Consulting
E-mail
k@nrgc.co.za
Cell : +27 82 7799275


East Asia Pacific

Australia
Andre Araujo
Pythian Australia
E-mail
dude@pythian.com
Cell : +61 2 9191 7427 ext. 1270