BI in Action Blog
August 20, 2007
Will Search Replace Query?
I recently had the opportunity to speak with John Tredennick, CEO of Catalyst Repository Systems, an operation that manages and provides millions of documents to law firms and corporations that seek relevant information for pending cases or regulatory reviews. As Tredennick puts it, regulatory reviews and corporate law cases now require "truckloads" of documents, versus a single briefcase stuffed with documents just a couple of decades ago. Catalyst uses both search technology and more traditional database queries as part of its information access capabilities and, according to Tredennick, has found a happy middle ground between the two approaches.

Lately, there has been a lot of attention and excitement around the search approach to data access as a cheaper and faster alternative to traditional queries. A few months back, I heard executives from ING and Merrill Lynch talk about their use of search as a faster and more cost-effective option than databases. (Original post here.)

For example, Edward Longo, VP of information technology for retirement services at ING, felt that running SQL queries in batch mode against databases would be too slow for providing data access to six million retirement plan participants within 40,000 retirement plans serviced by the company. This methodology kept their systems busy until 4 a.m. each day, and the workload was growing, he said. The company's data mart needed to store 500 GB of data, covering 400 million transactions over 18 months of history. The load time for all this information was seven to 10 hours, he said. With search technology in place, the company was able to increase its levels of aggregation from four within the relational databases to 140 available aggregations.

At Merrill Lynch, a single search and discovery portal was employed to replace SQL-based queries against multiple silos of data across various units and services around the globe.
"SQL was not ideal for searching; it was too slow," said Zach Friedland, vice president of enterprise data solutions for Merrill Lynch. The firm's EDS Search portal (for Enterprise Data Solutions) links against messaging across the enterprise. "We have no data warehouse at all here, since we're processing the same messages that we use to send to our systems," he said. Friedland's team did, however, build a data warehouse off of the navigators used for the search and discovery portal. "We built a warehouse off of the search engine, which is the reverse of the way it's usually done," he said.

Is the traditional relational database dead, then? "No," said ING's Longo. "Not for companies with lots of legacy systems." Most enterprises rely on relational databases for enterprise information management and access, and this will remain the case for a long time to come. But they now have an alternative that opens up new avenues of information access where none were possible before.

Tredennick says he has seen the pendulum swing back and forth between database query and search for some time now, and agrees that both approaches are needed for managing and making sense of the terabytes upon petabytes of data now out there.

Database queries can be pretty powerful, he explains. "A modern search engine has fields, but they're not really set up to do a lot of the things that the database fields do. Databases are incredibly well-adapted to handle immediate index changes to fielded data. And frankly, information from field searches and field displays quite well."

However, databases can't handle "big wide-ranging searches that involve a lot of components in no particular order, and mixing text and fields," he continues. "And databases are extremely slow for that. We saw situations where databases, if you ran field searches and you had things tuned, would bring results back in 0.2-0.3 seconds. And you could even throw a text search at them, if that was tuned, and that would come back. But the minute you started mixing them -- fields and text -- response time slows to two or three minutes, and performance degrades substantially."

Even so, in a business that needs to serve up documents, "you need a database as a container, to hold the fields," Tredennick says. "In our world, users have to know the exact count of documents, and have it sorted, maybe by date range, control numbers, or some other criteria. Google sorts by relevancy, but anyone can do that."

Posted by joemckendrick in Decision Support
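
To make the "middle ground" Tredennick describes a little more concrete, here is a minimal Python sketch of mixing the two approaches: a small inverted index handles the free-text part of a query, while the fielded records act as the "container" that supplies date filtering, sorting, and an exact hit count. The sample documents, field names, and function names are all hypothetical illustrations, not Catalyst's actual system or data.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sample records: fielded data plus free text.
DOCS = [
    {"id": 1, "control_no": "C-003", "date": date(2006, 5, 1), "text": "merger agreement draft"},
    {"id": 2, "control_no": "C-001", "date": date(2005, 11, 15), "text": "merger press release"},
    {"id": 3, "control_no": "C-002", "date": date(2006, 5, 1), "text": "lease agreement final"},
    {"id": 4, "control_no": "C-004", "date": date(2007, 2, 10), "text": "merger agreement signed"},
]


def build_text_index(docs):
    """Search side: map each lowercase word to the ids of documents containing it."""
    index = defaultdict(set)
    for doc in docs:
        for word in doc["text"].lower().split():
            index[word].add(doc["id"])
    return index


def hybrid_search(docs, index, terms, start, end):
    """Combine a text match (all terms present) with a fielded date-range filter.

    Returns the exact number of hits and the hits sorted by date, then
    control number, rather than by relevancy.
    """
    term_sets = [index.get(term.lower(), set()) for term in terms]
    matching_ids = set.intersection(*term_sets) if term_sets else set()
    hits = [d for d in docs if d["id"] in matching_ids and start <= d["date"] <= end]
    hits.sort(key=lambda d: (d["date"], d["control_no"]))
    return len(hits), hits


if __name__ == "__main__":
    idx = build_text_index(DOCS)
    count, hits = hybrid_search(
        DOCS, idx, ["merger", "agreement"], date(2006, 1, 1), date(2007, 12, 31)
    )
    print(count, [d["control_no"] for d in hits])
```

In this toy version the text lookup narrows the candidate set first, and the fielded filter and sort run only on those candidates. That is one plausible way to avoid the slow mixed queries described above, though a real system would involve far more than this sketch shows.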















