Friday, February 27, 2009

Firehose continued …

I’m writing this at 38,000 feet (so they tell us) after four long, intense days at the TDWI conference. I’m still wired (maybe it was the Mocha Frappucino at the airport), and my book is done, and it seems as good a time as any to try and get some impressions down.

Las Vegas: I found it daunting. The trip from the airport to Caesar’s Palace took so long, and the streets were so full of people, that I had no desire at all to revisit those streets once I got into the hotel. The only shows I wanted to see were more money than I could justify on entertainment, so I stayed in. Between my traveling book collection (the Prydain series by Lloyd Alexander, classics I first read in Junior High and not since), long conference days and a broad selection of restaurants, I never needed to leave the hotel.

TDWI: This was my first TDWI (The Data Warehousing Institute) conference. It won’t be my last. I’m very impressed with this organization. The focus is on relevant education, and I found all four of the day-long sessions I took to be fascinating, taught by knowledgeable and highly-experienced practitioners (not just trainers following a curriculum). At work we’re in the process of building our first real data warehouse, and I found both validation that we’ve done a lot of things right just by thinking hard and applying good sense – yes, starting from business questions and mapping those to conceptual entities and then to actual data elements was a good idea – and a number of refinements we can apply to make our ongoing design work better.

Two different classes in dimensional modeling yielded two very different paths to very similar outcomes, each with advantages and disadvantages. I suspect I won’t be able to resist synthesizing them, taking elements from each approach. Mapping all or most of the data elements from the source tables into Fact Groups, without worrying too much about whether they all are “needed” to answer questions the business is ready to ask yet, satisfies the data-packrat in me. After all, sometimes our business customers don’t even know a particular element is available, so how would they think to ask for it? On the other hand, filling out a Fact-Qualifier Matrix is a great way to be sure I’m headed the right direction, and our own similar approach turned up required elements we might not otherwise have looked for, or might not have grouped with the relevant Facts.
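For anyone who hasn’t seen one, a Fact-Qualifier Matrix is simple enough to sketch in a few lines of code. The facts and qualifiers below are invented for illustration (not from our actual warehouse); the idea is just to record which qualifiers (dimensions) each fact must be analyzable by, and then check the matrix for gaps.

```python
# A tiny sketch of a Fact-Qualifier Matrix. Facts and qualifiers here
# are hypothetical, invented purely for illustration.
facts = ["admission_count", "avg_length_of_stay", "total_charges"]
qualifiers = ["date", "facility", "payer", "diagnosis"]

# Each fact maps to the set of qualifiers the business needs to slice it by.
matrix = {
    "admission_count":    {"date", "facility", "payer", "diagnosis"},
    "avg_length_of_stay": {"date", "facility", "diagnosis"},
    "total_charges":      {"date", "facility", "payer"},
}

# A qualifier no fact uses, or a fact with no qualifiers, is a design smell:
# either something is missing from the model or the element isn't needed.
unused = [q for q in qualifiers
          if not any(q in qs for qs in matrix.values())]
orphans = [f for f, qs in matrix.items() if not qs]
print(unused, orphans)  # both empty here
```

Nothing profound, but even this toy version makes the completeness check mechanical instead of a matter of staring at a whiteboard.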

A class in “Predictive Modeling for Non-statisticians” sounded too intriguing to pass up, and it was indeed a fascinating class. The instructor made some fairly bold assertions about the ability to use “brute force” methods with relatively large data sets, as opposed to the much more sophisticated methods that must be applied to the relatively small sets (often sample sizes of a few dozen or a few hundred) around which classical statistical science was developed. I felt he proved those assertions rather well, and they matched my own semi-scientific intuitions about the power of large numbers when trying to understand one’s data. The point was not, ultimately, that statistical methods should not be applied, but that deep knowledge of the meaning of one’s data is even more necessary to useful analysis than academic statistical knowledge, particularly when dealing with data sets of at least tens of thousands of rows, which are much more common today than they were decades ago. Indeed, in my day-to-day work I’m often dealing with millions of rows, and we are not all that large an organization! If I want a sample of people who have had a specific treatment experience, I don’t usually have to think a whole lot about the minimum acceptable sample size – if anything, I use time bounds to keep the sample size down to something small enough that I can process the data in a reasonable time frame. This lets me make assertions like “In 10% of cases where A occurred over the past 12 months, B also occurred.” I don’t state confidence or error, because confidence is 100% and error is 0. Within the time frame stated, I’ve examined all instances, not a sample.
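The census-not-sample point is easy to demonstrate. Here’s a toy version with a synthetic event log (the events and the ~10% relationship are fabricated for the example; in real life this would be a query against the operational snapshot): because we examine every case of A within the time bound, the resulting proportion is a population figure, not an estimate.

```python
from datetime import date, timedelta
import random

random.seed(1)
today = date(2009, 2, 27)

# Synthetic event log: (person_id, event, event_date). Every person has an
# event A in the past year; roughly 10% also have a follow-on event B.
events = []
for pid in range(10_000):
    d = today - timedelta(days=random.randrange(365))
    events.append((pid, "A", d))
    if random.random() < 0.10:
        events.append((pid, "B", d + timedelta(days=1)))

# Time-bound the population of A-cases, then count how many also had B.
cutoff = today - timedelta(days=365)
a_ids = {p for p, e, d in events if e == "A" and d >= cutoff}
b_ids = {p for p, e, d in events if e == "B"}

# Because a_ids is the entire population within the time bound, this ratio
# is a census figure -- no confidence interval or margin of error applies.
rate = len(a_ids & b_ids) / len(a_ids)
print(f"{rate:.1%} of A-cases in the past 12 months also had B")
```

The only “statistics” left is the generalization question: whether last year’s population tells you anything about next year’s, which is exactly where knowing what the data means matters more than knowing formulas.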

hiatus

The loaner laptop ran out of battery on the plane, so I had to shut down. 24 hours later, I can say it was good to be back in the office today. Along with some "real work", I downloaded the InfoBright Open-Source Column-Store DBMS just as a little skunk-works project. I have it in mind to convert my most-used set of tables from the operational snapshot I spend most of my time in to a star schema and compare query performance between different DBMSs. If building out the star gets to be too much fuss, I'll just port the existing tables straight over and compare with them. After all, I already know how to write those queries!

Sunday, February 22, 2009

Drinking From a Firehose

It's hardly a novel metaphor, but it can be an apt one. Last Monday and Tuesday I sat in a darkened room at the Microsoft campus being indoctrinated, along with 50 or 60 others, in the way of WPF and Silverlight (XAMLFest - there's a name). The primary developer/presenter also wanted to show us the value of the M-V-VM pattern, and of course it was all done using VS2008 and Expression Studio. As someone who is just now struggling to learn enough of the .NET way to port a Classic ASP site/web app to ASP.NET and (probably) LINQ, it was quite a heady experience. Add to that the fact that all the coding was done in C#, and I've been (except for a foray into Python a few years back) strictly a VB guy, and I frequently found myself largely lost. Still, I came out with an appreciation for Microsoft's seriousness in making a major chunk of the user experience available through WPF portable to RIAs (Rich Internet Applications, for the TLA-deficient) while asking developers and designers to learn a minimum of new/different techniques. I may even come up with a use for some of that spiffy Silverlight presentation-layer stuff in my day job as a Business Intelligence analyst, in the form of custom visualizations - we'll see.

Now I'm getting ready to fly to Las Vegas for a TDWI (The Data Warehousing Institute) conference. Four days of getting my head filled with Data Warehousing goodness. After years of learning to think relationally to the point that, when someone comes to me with a question or problem I automatically start writing SQL, I'm going to (hopefully) learn to think dimensionally. I've dipped my toes in the water a few times and I'm intrigued by the promise of power and fluidity. But I have a feeling I'm going to find myself once again trying to sip from a two-inch nozzle from which that enticingly-cool water is spraying with enough force to knock me over. I think I'll get a drink, but I'm likely to be drenched and exhausted when it's done.

Salut!