I have a client application out there that kicks off a number of threads. The agent is failing to open some databases -- in a consistent and predictable but otherwise nonsensical pattern. When we spotted the pattern, it looked like we had it figured out. Let's assume the databases are db01, db02, db03, ... db50. The agent would consistently fail to open db03, db06, and so on -- but not in an every-third-database pattern. Here is the pattern of skipped databases: 3, 6, 9, 14, 17, 20, 23, 25, 28, 31, 34, 36, 39, 42, 45, 47, and 50.
I've narrowed the code down to what you see below. Each database is spawned off on its own thread as its own process. The very first thing the thread does is call session.getDatabase(). As you'll see, I've wrapped that call in its own try/catch block. No exception is thrown. The database object is not null, but neither is it open. This is consistent with a failure to open the database when you've left the optional "create on fail" argument either false or unset (as is the case here).
So when we saw the consistent debugging reports:
Unable to open db03
Unable to open db06
Unable to open db09
We thought we had it. Some stupid problem with those databases' ACLs. Nope, no cigar. When we tried searching JUST the failed databases, can you guess which ones failed? In searching just the db03, db06, and db09 databases, two worked and one didn't: db03 and db06 worked fine and db09 failed.
In other words, regardless of which database order you use, the third, sixth, ninth, fourteenth, and so on are the ones which fail!
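To make the spacing easier to see, here is a minimal sketch that computes the gap between each failing position and the one before it. The class and method names are made up for illustration; the only real data is the list of skipped positions given above.

```java
// Sketch only: FailureGaps and gaps() are illustrative names, not part of the agent.
class FailureGaps {
    // 1-based positions at which the open failed, copied from the list above.
    static final int[] FAILED =
        {3, 6, 9, 14, 17, 20, 23, 25, 28, 31, 34, 36, 39, 42, 45, 47, 50};

    // Distance between each failing position and the previous one.
    static int[] gaps(int[] positions) {
        int[] out = new int[positions.length - 1];
        for (int i = 1; i < positions.length; i++) {
            out[i - 1] = positions[i] - positions[i - 1];
        }
        return out;
    }

    public static void main(String[] args) {
        int[] g = gaps(FAILED);
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < g.length; i++) {
            if (i > 0) sb.append(' ');
            sb.append(g[i]);
        }
        System.out.println(sb); // 3 3 5 3 3 3 2 3 3 3 2 3 3 3 2 3
    }
}
```

After the jump to 14, the gaps repeat as 3, 3, 3, 2, so the failures recur every 11 positions. That regularity fits a cause tied to position in the run rather than to any particular database.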
Consider these additional facts:
1. There is a variable number of concurrent threads. The startup process will kick off one new thread for each database until that limit is reached, at which time it will wait in a loop at low priority until the number of running threads drops (a database is finished) and kick off the next. Other than the file name of the database, the objects are identical.
2. Varying the maximum number of threads -- initially set at 5, dropped to 2, and set as high as 10 -- had no impact at all on the result.
3. The Domino server is 6.5.5 with Fix Pack 1 -- running on Win32.
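For reference, the throttling described in point 1 can be sketched roughly as follows. This is not the agent's actual startup code: the real loop polls at low priority, whereas this sketch swaps in wait()/notifyAll() to block until a slot frees up. ThreadThrottle, runAll, and the sleep standing in for the per-database work are all hypothetical.

```java
// Sketch only: caps the number of concurrently running worker threads.
class ThreadThrottle {
    private final int maxThreads;
    private int running = 0;
    private int peak = 0; // highest concurrency seen, kept only for checking

    ThreadThrottle(int maxThreads) {
        this.maxThreads = maxThreads;
    }

    // Blocks until a slot is free, then claims it.
    synchronized void acquire() throws InterruptedException {
        while (running >= maxThreads) {
            wait();
        }
        running++;
        if (running > peak) peak = running;
    }

    // Called by a worker when its database is finished.
    synchronized void release() {
        running--;
        notifyAll();
    }

    synchronized int peak() {
        return peak;
    }

    // Starts one thread per database name, never more than maxThreads at once,
    // and returns the peak concurrency once every worker has finished.
    static int runAll(String[] dbNames, int maxThreads) {
        final ThreadThrottle throttle = new ThreadThrottle(maxThreads);
        Thread[] workers = new Thread[dbNames.length];
        try {
            for (int i = 0; i < dbNames.length; i++) {
                throttle.acquire(); // waits here while the limit is reached
                workers[i] = new Thread(new Runnable() {
                    public void run() {
                        try {
                            Thread.sleep(5); // placeholder for the real per-database work
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        } finally {
                            throttle.release();
                        }
                    }
                }, dbNames[i]);
                workers[i].start();
            }
            for (int i = 0; i < workers.length; i++) {
                workers[i].join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers");
        }
        return throttle.peak();
    }
}
```

Nothing in a scheme like this gives a worker any knowledge of its position in the queue, which is why the position-based failures are so puzzling.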
Here are the important bits of code which run when the new process is started:
    Database sdb = null;
    try {
        sdb = session.getDatabase(serverName, dbName);
    } catch (Exception e) {
        if (debug) logItem(" X5 " + this.getName() + ": " + e.toString());
    }
    if (sdb != null) {
        if (sdb.isOpen()) {
            /* <--- code removed ---> */
        } else {
            logItem("Unable to open " + dbName);
        } // end of check for sdb.isOpen()
    } else {
        logItem("Unable to access " + serverName + "!!" + dbName);
    } // end of check for null database
There is literally no code here being executed in any kind of unique way. This is the first bit of code a new db object runs when its thread is started. Each db object is on its own and is unaware of its order in the process or of any other processing threads. Nothing here suggests a pattern.
I believe there is some outside limiting factor at play here.
What an intriguing problem!! :-)
Given that you ruled out the obvious issues with ACL and access in general I'm
not going to go there. I'm instead thinking of problems with allocating memory
or similar or other resource bottlenecks inside the underlying DLLs that might
trick your code. Are the problems only apparent when running on the server? I
guess it would be worth a try to run the code outside of the Domino server as a
standalone program and/or to see how the code fares if you use the CORBA
classes instead of the DLL versions.
This may be obvious but you do remember to initialize the Domino environment
per thread, right (by extending NotesThread or similar)?
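To be concrete about what I mean by per-thread initialization, here is a rough sketch of the pattern. So that it compiles without Notes.jar, the Domino calls are hidden behind a made-up ThreadRuntime interface; in a real agent init() would be NotesThread.sinitThread() and term() would be NotesThread.stermThread(), or you would extend NotesThread and put the work in runNotes() so the pairing is handled for you. DbWorker, CountingRuntime, and process() are hypothetical names as well.

```java
// Stand-in for the Domino runtime so this sketch compiles on its own.
// Assumption: in real code init() -> NotesThread.sinitThread(),
//             term() -> NotesThread.stermThread().
interface ThreadRuntime {
    void init();
    void term();
}

// Hypothetical helper that only counts calls, to check init/term are paired.
class CountingRuntime implements ThreadRuntime {
    private int inits = 0;
    private int terms = 0;

    public synchronized void init() { inits++; }
    public synchronized void term() { terms++; }

    synchronized int inits() { return inits; }
    synchronized int terms() { return terms; }
}

// One worker per database; each worker initializes and terminates
// the runtime on its own thread.
class DbWorker extends Thread {
    private final ThreadRuntime runtime;
    private final String dbName;

    DbWorker(ThreadRuntime runtime, String dbName) {
        super(dbName);
        this.runtime = runtime;
        this.dbName = dbName;
    }

    public void run() {
        runtime.init(); // must run on this thread, before any Domino call
        try {
            process(dbName);
        } finally {
            runtime.term(); // always paired with init(), on the same thread
        }
    }

    // Placeholder for session.getDatabase(...) and the real work.
    void process(String name) {
    }
}
```

The point of the try/finally is that the termination call runs even if the database work throws, so no thread leaves the runtime half torn down.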
Let me know.
/lekkim