BI in Action Blog

August 29, 2007
Will Search Change the Way We BI?

"Can search deliver on the promise of ubiquitous BI? And, if not ubiquitous BI, how or why will search change the manner in which organizations generate and consume BI?"

To answer these questions, Steve Swoyer recently spoke with some leading voices in the BI space. He concludes that the jury is still out, but that search is showing potential as a BI front-end tool. "Search vendors," he writes in a TDWI report, "cite the can’t-miss Web search model on which the technology is based and say that enterprise search has a proven usability track record."

Search isn't likely to replace more established reporting, BI, and enterprise data warehouse approaches, but it can enhance them or fill in gaps where information is difficult to access. Jill Dyche, a partner with and co-founder of BI and DW consultancy Baseline, and one of the most respected authorities on all things BI, said that she is seeing forward-thinking companies tap search technologies to complement existing BI and data warehouse implementations. Search tools also help clear up confusion and redundancy in larger organizations, which may have thousands of documents and hundreds of database instances scattered about.

Plus, there's the search appliance. Dyche is quoted as observing that "most of our clients embarking on search are using it to track and manage their reports using search appliance technology. Hyperion—now Oracle—does this very well, using a Google search appliance. By exposing the metadata from BI tools, the search appliance can find reports and other documents and make them available to anyone with Web access."

Steve cites an example of search-BI query fusion: "A more advanced BI search use case involves indexing reports across multiple BI platforms." Often, end-users need to search multiple systems one at a time in an attempt to gather all relevant reports.

However, Steve adds, because so many vendors offer so many BI platforms, it may be a challenge to find a search engine that can support all of their formats.
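To make the cross-platform indexing idea concrete, here is a minimal sketch of a keyword index built from report metadata. It is not how any particular search appliance works; the record fields (`id`, `title`, `description`) and the platform names used with it are hypothetical, and real products would add ranking, security trimming, and format-specific connectors.

```python
from collections import defaultdict


def build_index(reports):
    """Map each lowercase word in a report's metadata to the report ids using it.

    `reports` is a list of dicts with hypothetical keys: id, title, description.
    """
    index = defaultdict(set)
    for report in reports:
        text = f"{report['title']} {report['description']}"
        for token in text.lower().split():
            index[token].add(report["id"])
    return index


def search(index, query):
    """Return ids of reports whose metadata contains every word in the query."""
    tokens = query.lower().split()
    if not tokens:
        return set()
    # Start from the first word's matches, then narrow with each further word.
    matches = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        matches &= index.get(token, set())
    return matches
```

Because the ids carry a platform prefix in this sketch, one query can return reports from several BI systems at once instead of forcing the user to search each system in turn.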

Posted by joemckendrick

August 27, 2007
BI Leader: BI Will Incorporate More Web 2.0, Social Networking, SOA

Business intelligence interfaces will borrow more form and function from the consumer space, and as a result, be simpler to use while delivering more sophisticated results.

That's the view of Don Campbell, vice president of product innovation and technology at Cognos. One of my colleagues over at ZDNet, Larry Dignan, just reported on a chat he had with Campbell, who talked about the changing face of business intelligence.

“The technology at home is creeping into the enterprise,” Campbell said. "At home we all Google to find information. At work we have no idea where things are." Increased simplicity in BI interfaces will help move BI out to a larger base of users, beyond analysts and numbers wonks. For many employees, BI will run, unseen, in the background of applications. Or it may appear as a simple search result, or as part of a social networking experience.

To emphasize this point, Larry Dignan observes that Cognos has been hooking into search providers such as Google, Yahoo, Autonomy, Fast and IBM. "Under this concept, BI data would surface through a simple text box. You type in third quarter revenue and you’d get a chart, just like Yahoo or Google gives you the weather. Search on 'raincoats in Milan' and BI should return product specific sales by region via a simple search box."

SOA is also a trend reshaping BI systems, Campbell said. Cognos' latest platform, Cognos 8, ditched a legacy infrastructure in favor of a service-oriented architecture. As a result, Cognos can now tap into various systems, an important point given that the company is agnostic when it comes to enterprise applications. Cognos can also use SOA to tack on new features. “Our footprint is much smaller now,” Campbell said.

What's next for BI systems? GPS is one exciting area of application, Campbell said. The other is "learning from the user." Campbell noted that today's BI tools "spit out information without much input from users." In the future, he said, BI systems will incorporate user tags and commentary. “The next generation of BI will be more comfortable understanding unstructured data,” he said. “Unstructured data will be as important as structured data.”

Posted by joemckendrick

August 24, 2007
Big Data -- Why Size Doesn't Necessarily Matter

A couple of months back, I had the opportunity to talk with Alex Spinelli, CTO of TheStreet.com, about the ways his company leverages the massive amounts of content and data surging through its servers each day. Jim "Mad Money" Cramer alone is probably enough to fill a few servers. (A posting of the Database Trends & Applications article this interview originally appeared in can be found here.)

Spinelli told me that TheStreet.com maintains more than a terabyte’s worth of data in various forms - articles, alert data, company data, and trading data. While the online information service’s data store totals well into the terabyte range, the company’s data managers prefer to keep its data in a distributed format.

“We addressed the problem of having lots and lots of information on lots and lots of data servers by cutting it up into smaller segments,” Spinelli said. “Each one is a bit more manageable and allows us to have a bit more flexibility, since the information is specialized.”

What to do with all this data? Just a few years ago, the largest databases were only just starting to top the 1TB mark. Now, a terabyte is almost commonplace, said Richard Winter, president of Winter Corp. In fact, a terabyte equals “only a few disk drives these days.”

The ability to effectively manage these large stores of data leads to greater business agility, Spinelli said. That’s why he prefers to maintain large data stores in a distributed fashion, linked by grid and clustering technologies. “The landscape changes quickly for businesses such as ours,” he said. “The ability to be agile and flexible is one of the most important things I can deliver to my business.”

TheStreet.com is deploying grid and clustering solutions to avoid building “a giant huge database running on big subsystems,” Spinelli explained. “We can actually be very smart about building out a modular database that scales horizontally and lets us still slice and dice as we need to, and be very flexible and agile, but have it all within the same management systems. That will enable us to very quickly move in different directions.”

Posted by joemckendrick

August 20, 2007
Will Search Replace Query?

I recently had the opportunity to speak with John Tredennick, CEO of Catalyst Repository Systems, an operation that manages and provides millions of documents to law firms and corporations that seek relevant information for pending cases or regulatory reviews. As Tredennick puts it, regulatory reviews and corporate law cases now require "truckloads" of documents, versus a single briefcase stuffed with documents just a couple of decades ago.

Catalyst invokes both search technology and more traditional database queries as part of its information access capabilities, and as Tredennick puts it, has found a happy middle ground between the two approaches.

Lately, as a matter of fact, there's been a lot of attention and excitement brewing around the search approach to data access as a cheaper and faster alternative to traditional queries. A few months back, I heard executives from ING and Merrill Lynch talk about their employment of search as a faster and more cost-effective alternative to databases. (Original post here.)

For example, Edward Longo, VP of information technology for retirement services at ING, felt that running SQL queries in batch mode against databases would be too slow for providing data access to six million retirement plan participants within 40,000 retirement plans serviced by the company. This methodology kept their systems busy until 4 am each morning, and the workload was growing, he said. The company's data mart needed to store 500 GB of data, representing 400 million transactions over 18 months of history. The load time for all this information was seven to 10 hours, he said. With search technology in place, the company was able to increase its levels of aggregation from four levels within the relational databases to 140 available aggregations.

At Merrill Lynch, a single search and discovery portal was employed to replace SQL-based queries to multiple silos of data across various units and services across the globe. “SQL was not ideal for searching; it was too slow,” said Zach Friedland, vice president of enterprise data solutions for Merrill Lynch. The firm’s EDS Search portal (for Enterprise Data Solutions) links against messaging across the enterprise. “We have no data warehouse at all here, since we’re processing the same messages that we use to send to our systems,” he said. Friedland’s team did, however, build a data warehouse off of the navigators used for the search and discovery portal. “We built a warehouse off of the search engine, which is the reverse of the way it’s usually done,” he said.

Is the traditional relational database dead, then? “No,” said ING’s Longo. “Not for companies with lots of legacy systems.” Most enterprises rely on relational databases for enterprise information management and access, and this will remain the case for a long time to come. But they now have an alternative that will open up new avenues of information and access where it has not been possible before.

Tredennick says he has seen the pendulum swing back and forth between database query and search for some time now, and agrees that both approaches are needed for managing and making sense of the terabytes upon petabytes of data now out there.

Database queries can be pretty powerful, he explains. "A modern search engine has fields, but they're not really set up to do a lot of the things that the database fields do. Databases are incredibly well-adapted to handle immediate index changes to fielded data. And frankly, information from field searches and field displays quite well."

However, databases can’t handle "big wide-ranging searches that involve a lot of components in no particular order, and mixing text and fields," he continues. "And databases are extremely slow for that. We saw situations where databases, if you ran field searches and you had things tuned, would bring results back in 0.2-0.3 seconds. And you could even throw a text search at them, if that was tuned, and that would come back. But the minute you started mixing them -- fields and text -- response time slows to two or three minutes, and performance degrades substantially."

Still, in a business that needs to serve up documents, "you need a database as a container, to hold the fields," Tredennick says. "In our world, users have to know the exact count of documents, and have it sorted, maybe by date range, control numbers, or some other criteria. Google sorts by relevancy, but anyone can do that."

Posted by joemckendrick

August 19, 2007
DataMirror Reflects IBM's Integration Ambition

"Mirror, mirror on the wall, who is the fairest data integration vendor of them all?"

A narcissistic IBM might well be asking itself that question following its recent $162m acquisition of DataMirror.

But wait a sec. Didn't IBM buy Ascential to cover all the data integration bases? Apparently not. One area where Ascential fell short was real-time data replication and change data capture, areas in which DataMirror excels.

DataMirror's core Transformation Server software does a lot more than just move high volumes of data directly between relational databases, message queues, and other data stores. It also detects changes in data sources (additions, updates or deletions) and manages the replication, thereby enabling the changed information to be delivered at the actual moment when the change has been made. Being able to track data changes as they occur and respond accordingly is becoming important for dynamic data warehouses, particularly those operating in high volume, quick-sale retail environments.
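For readers who want to see the shape of change data capture, here is a minimal sketch. It is a deliberate simplification and not DataMirror's method: it compares two in-memory snapshots of a table keyed by primary key, whereas production tools typically capture changes as they happen rather than by diffing. The function names and row layout are hypothetical.

```python
def capture_changes(before, after):
    """Return (operation, key, row) tuples describing how `before` became `after`.

    Both arguments are dicts mapping a primary key to a row (any comparable value).
    """
    changes = []
    for key in sorted(after.keys() - before.keys()):
        changes.append(("insert", key, after[key]))
    for key in sorted(before.keys() & after.keys()):
        if before[key] != after[key]:
            changes.append(("update", key, after[key]))
    for key in sorted(before.keys() - after.keys()):
        changes.append(("delete", key, None))
    return changes


def apply_changes(target, changes):
    """Replay captured changes against a replica so it matches the source."""
    for operation, key, row in changes:
        if operation == "delete":
            target.pop(key, None)
        else:
            # Inserts and updates both end with the new row stored under the key.
            target[key] = row
    return target
```

The point of the pattern is that only the additions, updates, and deletions travel to the replica, not the whole table.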

DataMirror also hands IBM a robust offering complete with heterogeneous data support in real time. Of course, that plays nicely to IBM's "On Demand" computing vision -- that of enabling the operational, real-time, event-aware enterprise. Data replication, by its very nature, works well in real time. The technology has its roots in high-availability database applications for reporting, backup, migration, and consolidation, as well as disaster recovery. All of these require mirrored copies of every database transaction to be maintained in real time, so that the mirror can ensure uninterrupted operation in the event of a server outage.

Given the diversity of its user base, it's understandable why IBM plans to continue to offer DataMirror's data integration, auditing, high availability, and data replication software as stand-alone products. But over time they will also become absorbed into its flagship Information Server, which is fast becoming a homestead for IBM's broad portfolio of homegrown and acquired data integration software.

IBM has yet to detail its specific plans about integration. But there are several technical considerations to take into account:

One of the main strengths of DataMirror is its database support. Hence, IBM should make sure that DataMirror's neutrality is maintained, even though it has a vested interest in promoting its own DB2 database system. DataMirror already has close partnerships with Teradata, Netezza, and Oracle. Interestingly, DataMirror was quick to pledge support for Oracle's new 11g database less than a week after the acquisition was announced. IBM also intends to continue DataMirror's partnerships with other vendors, notably BEA Systems, Business Objects, Microsoft, and Oracle. Keeping such relationships alive is critical to ensuring that IBM can continue to offer heterogeneous data integration.

Given the sheer breadth of IBM's product portfolio, any technology acquisition is bound to result in some degree of overlap. That's also the case with DataMirror, but not to an extent that creates a significant amount of redundancy. A quick look at IBM's integration portfolio shows some overlap with IBM's Q Replication and Data Propagator products as well as the rudimentary replication capabilities built into DB2. But DataMirror's technology is functionally superior and more broadly applicable to the diverse range of real-time data processing environments envisaged by IBM's Information On Demand strategy.

DataMirror seems a good fit with IBM's broader vision for real-time integration. For example, there are obvious links between DataMirror's real-time software and IBM's own real-time and batch-oriented ETL, business intelligence, enterprise service bus, and MQSeries message queue integration technologies.

Besides the technology intellectual property that IBM gains, the company also brings on board DataMirror's considerable technical and marketing expertise. So far IBM has not imposed a hiring freeze on its DataMirror division, which suggests the company is still thinking about growing the business. Historically, IBM has not had a great track record in shepherding acquired skills. But its recent acquisition of customer data integration firm DWL seems to have gone smoothly from an organizational perspective.

So by acquiring DataMirror IBM gets its paws on a very mature and advanced change data capture and replication tool that is still modestly priced against some of its competitors. The move shouldn't really come as a surprise as IBM had partnered closely with DataMirror to provide its own data integration customers with those core competencies. But over time it has become clear that IBM needed to own this kind of technology as more and more of its data integration customers demanded it.

A big chunk of DataMirror's 2,200-strong customer base has come from IBM referrals. But IBM didn't make the purchase to buy in customers. It was more interested in DataMirror's technology to shore up a functional weakness in its own data integration platform. Competitively, DataMirror now positions IBM more strongly against Ascential's arch-nemesis Informatica, particularly in the real-time ETL space for enabling operational BI and even event-aware analytics.

It's more than likely that DataMirror's software will eventually find a permanent resting place in IBM's new Information Server, a platform it is investing heavily in right now. And because IBM has laid out grand plans for Information Server that extend beyond data replication, it probably isn't done buying smaller software firms and technologies to round out the platform. IBM's customer base is quite diverse, and it's likely that it will need multiple products even of the same type, including replication.

From a bird's-eye view, the acquisition of DataMirror is also part of a general push by IBM to expand its software holdings and revenue. It could well be a precursor to many more similar technology-focused acquisitions that IBM makes in the second half of this year. IBM has already spent around $1bn on acquisitions this year alone. It still has roughly $4bn to spend before it meets its stated objective of keeping up with the 13 companies it acquired in 2006. DataMirror is just the type of acquisition that reflects a shift away from lower-margin computer hardware to more profitable software sales. In a nutshell, software is making most of IBM's money right now.

Posted by madansheina

August 17, 2007
By the way...

I have begun a multi-entry discourse/diatribe about getting back to basics and first principles as the foundation of a truly effective strategy for business process management (BPM). I believe this argument applies equally well, if not more so, to BI efforts. So I humbly yet eagerly encourage you to visit the "BPM in Action" blog, and check out my entries on "BPM Back to Basics." And feel free to comment liberally!

Posted by mdortch

August 06, 2007
To SaaS or Not to SaaS? That is the Question for BI Vendors

Speaking at the recent Pacific Northwest BI Summit, Claudia Imhoff weighed the advantages and disadvantages of delivering business intelligence through the Software as a Service model. (Podcast available here as an audio download.)

Claudia observes that in most cases at present, BI over SaaS is mainly operational versus strategic, and the growth of SaaS-delivered BI in a strategic sense is an open question.

She states that there are four primary advantages to BI vendors (and ultimately to end-user customers) in going the SaaS route:

For one, the vendor has to support one platform and one version of their software, versus multiple OSes, platforms, and versions. "That’s a pretty impactful thing in a software business," Claudia says. "The decrease in development costs can be significant."

In addition, SaaS deployments provide the vendor "real insight into how customers are using their software. What features are they actually using, and which ones do they never touch? They get to see every move, every feature, every function that is being used by their customers ... any vendor would kill to have that kind of information."

In addition, Claudia continues, the software can remain light and agile. "Vendors don’t get trapped into this feature-bloat thing," she observes. "In a traditional model, a software vendor has to keep coming up with new features and new functionality so they can sell another version."

Posted by joemckendrick

August 02, 2007
BI as a Service: An Idea Whose Time Has Come?

Hamilton Beach-Proctor Silex, a major consumer appliance manufacturer, facilitates its market research through SaaS-delivered applications provided by statistical tools vendor SPSS.

A couple of months ago, I had the opportunity to speak with Tracy Trawick, consumer insights manager for Hamilton Beach, who talked about her company's mixed use of SaaS and on-site business intelligence tools.

Hamilton Beach's market research department primarily relies on on-site software, but began tapping into SPSS's SaaS-based platform to cover a growing workload that end-users simply don't have time to sort through. "All the data manipulation on the back end is supported by SPSS," Trawick said. "We're using it for very template-based iterative projects so we don't use up a lot of programming time. I don't have the time to learn the logic and the programming tools."

Trawick uses the SPSS SaaS platform to create and launch consumer market research surveys, and manipulate the data as it comes in. However, she still employs SPSS on-site software for more complicated projects.

Does the BI as a service trend have legs? Our blogmate here at BI in Action, Madan Sheina, recently took a hard look at the trend. Can it work? Or, as Madan put it, is it wheat or chaff?

While SaaS is still a nascent trend in BI, it is gaining in popularity, especially among small to medium-size businesses, Madan said. However, BI SaaS vendors face Microsoft in the market. Plus, Madan said, "moving to SaaS is not as easy as simply having your BI software hosted." Data security is a huge issue, he notes. Customization is another. "Vendors offering their BI solutions as SaaS must also overcome the significant hurdles of data privacy, reliability of service, working out commissions for channel partners, and grappling with a new sales model."

Another ebizQ colleague, James Taylor, also talked about BI and analytics delivered via SaaS in a recent post. James notes that decision management can be delivered via SaaS, and that his company, Fair Isaac, offers such capabilities. "SaaS BI is interesting as a way to provide support for those decisions that could not be automated 100%," he says.

Posted by joemckendrick


Oracle 11g -- Not "The BI Release" You Were Hoping For

So Oracle has announced its next-generation database platform, Oracle 11g, which packs in nearly 500 enhancements and new features, promising improved performance, accelerated change management, higher scalability, easier administration and reduced cost. But what's in it for business intelligence and data warehousing?

Well basically there are three areas that might put a smile on the faces of Oracle data warehousing gurus:
• OLAP-Based Materialized Views
• Advanced Partitioning
• Accelerated Query Performance

SQL-Flavored OLAP
The biggest new BI feature in 11g is undoubtedly embedded support for online analytic processing (OLAP) cube-based management of materialized data views -- composite slices of data in multidimensional cubes that are accessible by standard SQL commands.

Put simply, Oracle has embedded an OLAP engine into 11g to store and efficiently manage millions of these materialized views.

So why are materialized views important? One reason is performance. Materialized views are a sort of pre-fetching used to speed multidimensional queries -- for example, to calculate sales across products, regions or customers -- by presenting logically pre-aggregated data sets to users.

Oracle uses an OLAP cube to store millions of materialized views so that they can be managed more quickly and efficiently. First it uses OLAP cubes as a transparent performance accelerator inside the relational database system itself. Then it offers the core manageability features of 11g to track data changes in the underlying data sources so that those changes are incrementally refreshed (usually daily or nightly) to the materialized views stored in the cubes.
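The pre-aggregation and incremental refresh idea can be sketched in a few lines. This is only an illustration of the general technique, not Oracle's implementation: the `SalesSummary` class and the (product, region, amount) row layout are hypothetical, and the sketch handles newly inserted rows only, not updates or deletes in the source.

```python
from collections import defaultdict


class SalesSummary:
    """A toy 'materialized view': sales totals pre-aggregated by product and region."""

    def __init__(self):
        self.totals = defaultdict(float)

    def refresh(self, new_rows):
        """Incrementally fold in only the rows added since the last refresh.

        Each row is a (product, region, amount) tuple. The base table is not
        rescanned, which is what keeps the refresh cheap.
        """
        for product, region, amount in new_rows:
            self.totals[(product, region)] += amount

    def total(self, product, region):
        """Answer a query from the stored aggregate instead of the detail rows."""
        return self.totals.get((product, region), 0.0)
```

A nightly job in this picture would call `refresh` with that day's new sales rows, and queries would read `total` without ever touching the detail data.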

But performance and manageability aren't the only benefits to users. The ability to use standard SQL tools and applications to access and slice-and-dice multidimensional OLAP cubes, without users knowing they are using OLAP, is also key. The Oracle 10g database allowed users to access OLAP cubes. But they had to write specific SQL against specific views. 11g lets users do this more transparently using the SQL syntax they know and love. In other words, call it Oracle's attempt to push OLAP from a specialized market to a much broader constituency of SQL-savvy users.

Interestingly, the underlying OLAP engine used to drive these materialized views is neither Essbase, the market-leading OLAP server that Oracle gained from its recent acquisition of Hyperion Solutions, nor Express, its legacy product that it acquired from IRI Software over a decade ago. Rather, it’s a separate OLAP server that was designed by Express engineers to be more embedded into the database.

The feature is deemed important by Oracle, as it claims over 60% of its data warehousing customers use materialized views in their implementations today. Oracle hopes its new embedded support will grow this figure, with the stated aim of pushing OLAP into every one of its data warehouse implementations.

Advanced Partitioning
Oracle first introduced an optional partitioning scheme in its Oracle 8 database to deal with high data volume environments. 11g evolves the partition scheme's scalability and manageability features with more sophisticated automated, rules-based partitioning and storage management, composite partitioning, and extended partitioning methods like interval, reference, and virtual-column partitioning.

Oracle calls 11g "its most significant partitioning update" over the last six major database releases and says that almost all of its data warehousing customers use the partitioning capabilities. It notes that partitioning plays a key role in the base enterprise data warehouse foundation scheme, where data is typically held in a granular 3NF schema and where the largest tables, joins and data loading reside.
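As a rough illustration of why interval-style partitioning helps with large tables, the sketch below routes rows into monthly partitions that are created on demand, then reads only the partitions a date-range query can touch. It mimics the idea only and says nothing about Oracle's internals; the class name and the `sold_on` column are hypothetical.

```python
from collections import defaultdict
from datetime import date


def partition_key(day):
    """Name the monthly partition a date falls into, e.g. '2007-08'."""
    return f"{day.year:04d}-{day.month:02d}"


class IntervalPartitionedTable:
    """Toy table whose monthly partitions appear automatically as rows arrive."""

    def __init__(self):
        self.partitions = defaultdict(list)

    def insert(self, row):
        # No partition has to be declared up front; the key creates it on demand.
        self.partitions[partition_key(row["sold_on"])].append(row)

    def scan(self, start, end):
        """Return rows dated within [start, end], skipping unrelated partitions."""
        low, high = partition_key(start), partition_key(end)
        rows = []
        for key in sorted(self.partitions):
            if low <= key <= high:  # 'YYYY-MM' strings sort chronologically
                rows.extend(
                    r for r in self.partitions[key] if start <= r["sold_on"] <= end
                )
        return rows
```

A query for August sales in this sketch never looks at July's rows, which is the same pruning benefit that makes partitioning attractive for the largest warehouse tables.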

Accelerated Query Performance
Oracle always makes a point of bumping up performance in every major platform release. In 11g this includes the introduction of a new caching engine called Query Results Cache to accelerate query performance. The Cache stores and reuses the results of frequently called database queries and functions in database and application tiers. The cache is subsequently consulted whenever a repeat or similar query is fired off against the database.

Oracle claims that 11g's cache implementation is far more sophisticated than a standard cache that simply stores query results and is consulted every time a query is resubmitted. It goes further in two ways:

• Users can make intelligent decisions on which results to put in the cache based on criteria like query format, how long the query took to run, how big the results set is, etc.
• The query does not have to exactly match the results in the cache -- for example, it could be a piece of a larger query.

Oracle expects its Cache to absorb more sophisticated layers of functionality like: dependency tracking (to make sure the cache is up to date); invalidation logic (to determine if the results are out of synch with underlying data changes); and automatic updates (to keep pace with data changes).
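A small sketch may help show how admission criteria and invalidation fit together in a result cache. It is an assumption-laden toy, not Oracle's Query Results Cache: the `ResultCache` class, its `min_seconds` and `max_rows` thresholds, and the caller-supplied table names are all hypothetical, and it matches queries by exact text only, so it does not attempt the partial-query matching described above.

```python
import time


class ResultCache:
    """Toy query result cache with admission rules and table-based invalidation."""

    def __init__(self, min_seconds=0.0, max_rows=1000):
        self.min_seconds = min_seconds  # only cache queries at least this slow
        self.max_rows = max_rows        # only cache result sets this small
        self.entries = {}               # query text -> (rows, tables it read)
        self.hits = 0

    def get_or_run(self, query, tables, run):
        """Return cached rows for `query`, or call `run()` and maybe cache them."""
        if query in self.entries:
            self.hits += 1
            return self.entries[query][0]
        started = time.perf_counter()
        rows = run()
        elapsed = time.perf_counter() - started
        if elapsed >= self.min_seconds and len(rows) <= self.max_rows:
            self.entries[query] = (rows, frozenset(tables))
        return rows

    def invalidate(self, table):
        """Drop every cached result that depended on a table that just changed."""
        stale = [q for q, (_, deps) in self.entries.items() if table in deps]
        for query in stale:
            del self.entries[query]
```

Recording which tables each result depends on is what makes dependency tracking and invalidation possible here: a change to one table evicts only the results that read from it.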

Wrap-Up
In no way can you call Oracle 11g "The BI Release" that Microsoft SQL Server 2005 was billed to be. The BI and data warehousing improvements in 11g are incremental rather than mind-blowing. Nevertheless it does offer enough to keep a smile on the faces of the tens of thousands of Oracle data warehousing customers.

The most pertinent upgrade for BI is 11g's OLAP-cube based management of materialized data views. But users shouldn't overplay the significance of that technology. A materialized view is useful for what it does. But it is only an enabling technology. By itself it won't deliver a new generation of end-user analytics. By the same token, the performance-enhancing query cache will put a smile on the faces of performance-pressured DBAs. But it won't necessarily transform BI.

The real BI story in 11g is OLAP technology that is accessed transparently by SQL-based applications. In other words, OLAP cubes are used as a query performance accelerator inside the relational database without SQL applications and tools knowing they are accessing those cubes. But wait a sec: isn't that really Oracle's way of saying that traditional OLAP slice-and-dice, drill-up/down and pivot analysis has been too expensive, slow and complex to achieve among its own data warehousing customer base up to now? If so, then the materialized views feature now attempts to fix a sub-optimal workaround that Oracle had previously offered. However, there is another rub. The reliance on materialized views certainly harks back to a need for massive pre-aggregation of data that traditional multidimensional OLAP (MOLAP) engines used to be attacked for. Think data explosion.

Nevertheless, the new BI features in 11g do represent another big step in the commoditization of BI. In the case of materialized views, Oracle is offering OLAP as a core function of the relational database platform. Of course, that's an area where Oracle is playing catch-up to Microsoft, which has spread BI to the masses through SQL Server’s OLAP Services.

What's notable, however, is that Essbase isn't being tapped in 11g. But as a market-leading OLAP server that is arguably more robust than Oracle's own offerings, it is only a matter of time before it is pushed closer to the core relational database kernel.

Posted by madansheina
