Sometimes you chase the rabbit too far down the hole. In my never-ending quest for high performance, I built an in-memory index that works very much like a sorted view column: it's very quick to find an entry, and it even supports partial matches. In fact, the constructor accepts a regular expression used to split each entry into parts, which sets the granularity of partial matches. We tend to think of partial matching in terms of any letter, but in my case splitting on periods was good enough and produces a far smaller index.
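For anyone who wants to picture the idea, here is a rough sketch in Java. It is not my actual code: the class name PartIndex, the methods add and find, and the choice of a TreeMap keyed by every suffix that starts at a part boundary are all assumptions made for illustration. The constructor takes the splitting regular expression, and a lookup is a prefix scan over the sorted keys.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: a sorted in-memory index with partial matches
// at part boundaries. Names and layout are illustrative assumptions.
class PartIndex {
    private final Pattern delimiter;
    // Sorted keys: the full entry plus every suffix that begins right
    // after a delimiter match. Each key maps to the entries it came from.
    private final TreeMap<String, Set<String>> index = new TreeMap<>();

    PartIndex(String delimiterRegex) {
        this.delimiter = Pattern.compile(delimiterRegex);
    }

    void add(String entry) {
        put(entry, entry);
        Matcher m = delimiter.matcher(entry);
        while (m.find()) {
            // Skip an empty suffix when the entry ends with a delimiter.
            if (m.end() < entry.length()) {
                put(entry.substring(m.end()), entry);
            }
        }
    }

    private void put(String key, String entry) {
        index.computeIfAbsent(key, k -> new TreeSet<>()).add(entry);
    }

    // Returns, in sorted order, every entry that has a part-boundary
    // suffix starting with the given prefix.
    List<String> find(String prefix) {
        Set<String> hits = new TreeSet<>();
        // Keys sharing a prefix are contiguous in sorted order, so start
        // at the first key >= prefix and stop at the first non-match.
        for (Map.Entry<String, Set<String>> e
                : index.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break;
            }
            hits.addAll(e.getValue());
        }
        return new ArrayList<>(hits);
    }
}
```

With new PartIndex("\\."), adding "com.example.mail" stores three keys ("com.example.mail", "example.mail", and "mail"), so find("example") and find("mail") both hit while find("xample") does not. Indexing at every letter instead would store one key per character of the entry, which is where the size difference comes from.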
As you can guess, it won't scale. Oh, sure, it scales enough for anything I can currently see putting in it, but go not too much further out and you run out of heap space. So it has to go off to disk. Now I find myself halfway through the paper process of designing a fast on-disk structure to match the index, and I realize it's just not worth it. It's a great tool, but it's tech that needs to be shelved so I can finish this project. Maybe it will reappear, reincarnated as my own full-text index or something.
Did you have a look at Apache's Jakarta Commons? They have all kinds of stuff
on caching, collections, and other things we tend to reinvent.
:-) stw