I wrote this article after reading NoSQL Distilled, A Brief Guide to the Emerging World of Polyglot Persistence by Pramod J. Sadalage and Martin Fowler in the summer of 2016. I was prompted to read the book after watching Martin Fowler’s presentation on NoSQL at a 2013 GoTo Conference, which can be viewed at https://www.youtube.com/watch?v=ASiU89Gl0F0.
Up to that point the whole concept of “NoSQL” was very fuzzy to me, but after watching his presentation and reading the book I realized that the “lynchpin” that holds RDBMS and NoSQL together is a concept that many SQL database architects have been working with for years: the common de-normalized data structure. One problem (among several) with de-normalization has always been how to de-normalize data systematically, so that two de-normalized data sets can be compared with each other in a consistent and repeatable manner and support reliable analysis by different parties, over time and in different locations. In other words, how do we maintain the old “apples-to-apples” and “oranges-to-oranges” paradigm?
————————————————————————————————————
Six Basic Interrogatives, or 6BI for short, provides an approach for organizing and understanding the context of data used to support business decision making. I first wrote about 6BI back in the 2004-2005 time frame when I was working full time as a Business Intelligence Data Architect and needed a quick way to get started analyzing existing data in order to organize it to answer business questions. Since that time I’ve had other opportunities to put 6BI to use in situations not normally identified as business intelligence, certainly not the traditional type which focuses on a central RDBMS based data warehouse. I have found it holds up well, as I thought it would, in various scenarios and has versatility and applicability in just about all the computing situations that I have found myself in.
With the increased popularity of distributed computing, fueled mostly by the overwhelming uptake of mobile devices for both business and personal use, new highly scalable technologies for persisting data have been developed. These technologies tend to rely on an expanding network of smaller computing units and grow by scaling out to more units, as opposed to a small number of large computing units that grow capacity by scaling up. Scaling out is most often referred to as “horizontal scaling”, while scaling up is referred to as “vertical scaling”. There is no doubt that horizontal scaling is more popular today than vertical scaling. Going hand-in-hand with the horizontal scaling trend has been the introduction of alternative data stores now collectively known as NOSQL databases.
Sadalage and Fowler (2013)[i] identified three types of NOSQL databases that are designed to store a rich structure of closely related data that is accessed as a unit. They call this type of database an aggregate-oriented database because its basic storage structure is conceptualized differently from that of a traditional relational database management system (RDBMS). An RDBMS stores its data in tables, or more precisely in relations. Relations allow data to be normalized into units (tables) which contain data that is dependent only upon the primary key of the entity. From a logical perspective we can say that each set of attributes (i.e. a row in a table) describes one and only one instance of an entity, which in turn is differentiated from all other instances of all other entities by its unique identifier. Each attribute value of an entity instance is stored where the instance’s row intersects the attribute’s column.
In practice this quite often results in data about customers stored in a Customer table, data about products in a Product table, data about orders in an Order table, and so forth. This is a very powerful design because it does not presuppose any fixed relationship between Customers, Products and Orders, but lets designers and developers link them together in any way that meets business requirements, even when these requirements are not known in advance. This ad hoc linking is made possible by the SQL language[ii].
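To make the idea of ad hoc linking concrete, here is a minimal sketch using Python’s built-in sqlite3 module. The table layout, column names and sample rows are my own illustrative assumptions, not taken from any real system, and the Order table is named orders only because ORDER is a reserved word in SQL. The point is that the same three tables answer two unrelated questions without any change to the stored data.

```python
import sqlite3

# In-memory database with three normalized tables (illustrative schema only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE product  (product_id  INTEGER PRIMARY KEY, name TEXT, price REAL);
    CREATE TABLE orders   (order_id    INTEGER PRIMARY KEY,
                           customer_id INTEGER REFERENCES customer(customer_id),
                           product_id  INTEGER REFERENCES product(product_id),
                           quantity    INTEGER);
    INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO product  VALUES (10, 'Widget', 2.5), (11, 'Gadget', 10.0);
    INSERT INTO orders   VALUES (100, 1, 10, 4), (101, 2, 10, 1), (102, 1, 11, 2);
""")

# Question 1: how much has each customer spent?
spend_by_customer = conn.execute("""
    SELECT c.name, SUM(p.price * o.quantity)
    FROM orders o
    JOIN customer c ON c.customer_id = o.customer_id
    JOIN product  p ON p.product_id  = o.product_id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()

# Question 2, not anticipated by the schema: how many units of each product were sold?
units_by_product = conn.execute("""
    SELECT p.name, SUM(o.quantity)
    FROM orders o
    JOIN product p ON p.product_id = o.product_id
    GROUP BY p.name
    ORDER BY p.name
""").fetchall()

print(spend_by_customer)
print(units_by_product)
```

Neither question had to be known when the tables were designed; each is just a different join over the same relations.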
NOSQL databases, in order to provide more performance across distributed data stores, even when the nodes are distributed globally, were conceptualized and built with a different set of priorities. Non-deterministic query structure was not a priority, but instead just capturing and persisting the data as quickly as possible was the top priority. There was no need to build SQL language interpreters into the architecture of these “new age” data stores. As a result NOSQL databases end up sacrificing some of the flexibility needed to support non-deterministic query construction in favor of higher performance access in a distributed landscape, or cluster.
Inevitably this trade-off has led to the need for designers to know or predict, in advance, how the data will be accessed, which entities will need to be joined, and in what order, to answer their business questions. The loss of the luxury of not needing to know in advance exactly how your data will be accessed is the price that is paid for better performance. Fair enough! Nothing comes without its cost, especially in the world of computing. The important thing to remember is that there will always be a trade-off[iii].
There are three aggregate-oriented NOSQL database categories that have emerged over the last several years: Key-Value, Document, and Column Family. Each is different but they share a conceptually similar data storage unit called an aggregate[iv]. A data modeler (or developer modeling an application’s data) using these new types of models needs to start thinking about how the data will be accessed earlier in the design and development process than ever before. As (Sadalage and Fowler 2013) state, aggregate awareness needs to be made part of the design of the data model[v]. I can see how this could be the case when there is a lack of clarity between the role of database design and that of database development. With the advent of aggregates and aggregate-orientation the job of a business intelligence data modeler has changed from being able to concentrate on business requirements to needing to be more aware of how the data will be accessed and stored. This challenge is stated succinctly by (Jovanovic and Benson 2013) when they say NoSQL databases have opened a new non-trivial problem area for data modelers[vi].
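As a rough illustration of what an aggregate looks like and why access patterns matter, the sketch below uses a plain Python dictionary to stand in for an aggregate-oriented store. The key format, field names and sample orders are hypothetical. Reading one order is a single lookup because everything about that order travels together, while a question that cuts across orders has to visit every aggregate.

```python
from collections import defaultdict

# A dictionary standing in for an aggregate-oriented store (hypothetical data).
# Each value is one aggregate: an order with its customer and line items nested inside.
orders = {
    "order:100": {
        "customer": {"id": 1, "name": "Ann"},
        "lines": [
            {"product": "Widget", "price": 2.5, "quantity": 4},
            {"product": "Gadget", "price": 10.0, "quantity": 2},
        ],
    },
    "order:101": {
        "customer": {"id": 2, "name": "Bob"},
        "lines": [{"product": "Widget", "price": 2.5, "quantity": 1}],
    },
}


def order_total(order_key):
    """The access path the aggregate was designed for: one lookup by key."""
    order = orders[order_key]
    return sum(line["price"] * line["quantity"] for line in order["lines"])


def units_by_product():
    """A question the aggregate was not designed for: scan every order."""
    totals = defaultdict(int)
    for order in orders.values():
        for line in order["lines"]:
            totals[line["product"]] += line["quantity"]
    return dict(totals)


print(order_total("order:100"))
print(units_by_product())
```

The second function works, but only by reading the whole data set, which is the kind of cost a modeler now has to anticipate when choosing the aggregate boundary.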
If, however, we do the aggregate data modeling at a level independent of the style of the data store (i.e. Key-Value, Document or Column Family), we can create implementation-neutral and re-usable data models that reflect the requirements of the business as much as they reflect the requirements of the technology used to implement them. Data organized with an aggregate-orientation needs to be modeled at a logical level just as much as data organized with a relational orientation ever did. In both cases we need to start with the basic entities that describe business value for persisting and manipulating data. The need for an organizing framework for data still exists. We still need to know which real world entities (regardless of how they are instantiated in the data store) the business cares about and how these entities relate to each other.
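One way to picture an implementation-neutral model is a single logical aggregate that is rendered differently for each style of store. The sketch below is only an assumption about how such a rendering might look: the function names, the "order:" key prefix, the "_id" field and the "index:name" column naming are all invented for illustration and do not describe any particular product.

```python
import json

# One logical aggregate, defined independently of any store (hypothetical data).
logical_order = {
    "order_id": 100,
    "customer": {"id": 1, "name": "Ann"},
    "lines": [
        {"product": "Widget", "quantity": 4},
        {"product": "Gadget", "quantity": 2},
    ],
}


def as_key_value(order):
    """Key-Value style: the whole aggregate is an opaque value behind one key."""
    return f"order:{order['order_id']}", json.dumps(order, sort_keys=True)


def as_document(order):
    """Document style: the structure stays visible, with an identifier field."""
    document = {k: v for k, v in order.items() if k != "order_id"}
    document["_id"] = order["order_id"]
    return document


def as_column_family(order):
    """Column Family style: a row key plus named groups of columns."""
    families = {
        "customer": dict(order["customer"]),
        "lines": {
            f"{index}:{name}": value
            for index, line in enumerate(order["lines"])
            for name, value in line.items()
        },
    }
    return str(order["order_id"]), families


print(as_key_value(logical_order))
print(as_document(logical_order))
print(as_column_family(logical_order))
```

The logical model is written once; only the three small rendering functions know anything about the target style.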
Business Object Categories
6BI is built on a framework of six universal and re-usable Business Object Categories (BOCs). Each BOC represents a non-overlapping category of business data. Each category corresponds to one of the six basic interrogatives: Who, What, Where, When, How and Why. Figure 1 shows the six basic interrogatives and the business object category which is derived from each including example sub-categories which will be used in subsequent business data model examples.
Basic Interrogative | Business Object Category |
Who produces the data used to measure performance? | Parties[vii] (e.g. Customer and Provider) |
What is being manipulated or exchanged to produce measurable performance? | Things[viii] (e.g. Product and Payment) |
How are the data values that measure performance produced? | Activities (e.g. Exchange and Process) |
When does the activity take place or for how long is the performance measured? | Events (e.g. Point in Time and Period of Time) |
Where does the activity used to measure performance take place? | Locations (e.g. Address and Placement) |
Why does the data actually measure performance? | Motivators[ix] (e.g. End and Means) |
Figure 1. Six Basic Interrogatives and Business Object Categories.
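The mapping in Figure 1 is small enough to write down directly. The following is a minimal sketch in Python; the class name, the dictionary name and the helper function are my own hypothetical choices, while the categories and example sub-categories come from the figure.

```python
from enum import Enum


class BusinessObjectCategory(Enum):
    """The six BOCs, each valued by the interrogative it answers."""
    PARTIES = "Who"
    THINGS = "What"
    ACTIVITIES = "How"
    EVENTS = "When"
    LOCATIONS = "Where"
    MOTIVATORS = "Why"


# Example sub-categories as listed in Figure 1.
EXAMPLE_SUBCATEGORIES = {
    BusinessObjectCategory.PARTIES: ("Customer", "Provider"),
    BusinessObjectCategory.THINGS: ("Product", "Payment"),
    BusinessObjectCategory.ACTIVITIES: ("Exchange", "Process"),
    BusinessObjectCategory.EVENTS: ("Point in Time", "Period of Time"),
    BusinessObjectCategory.LOCATIONS: ("Address", "Placement"),
    BusinessObjectCategory.MOTIVATORS: ("End", "Means"),
}


def category_for(interrogative):
    """Look up the BOC for an interrogative such as 'who' or 'Why'."""
    return BusinessObjectCategory(interrogative.capitalize())


for category in BusinessObjectCategory:
    print(category.value, "->", category.name.title(), EXAMPLE_SUBCATEGORIES[category])
```

Because each category has exactly one interrogative as its value, the categories cannot overlap by construction.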
In the 6BI Framework, Parties and Things have no direct association with each other. It is only through the other four Business Object Categories (Activities, Events, Locations, and Motivators) that Parties and Things are associated. The left side of Figure 2 shows the relationship of Parties to Activities, Events, Locations and Motivators. The right side shows the relationship of Things to the same four BOCs. This makes sense because only when parties engage in activities, exchanging and manipulating things between them, do events get generated at certain locations and reasons why become relevant. Figure 2 illustrates the relationships between business object categories, both direct and indirect.
Figure 2. The indirect association of Parties to Things.
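The indirect association can be sketched in code as well. In the simplified example below, Party and Thing hold no reference to each other; they meet only inside a record that carries an activity, an event, a location and a motivator. Bundling the four linking categories into one Exchange record is a simplification of my own, and all class names, field names and sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Party:
    name: str
    role: str  # e.g. Customer or Provider


@dataclass(frozen=True)
class Thing:
    name: str
    kind: str  # e.g. Product or Payment


@dataclass(frozen=True)
class Exchange:
    """Parties and Things meet only here, through the other four BOCs."""
    activity: str   # How
    event: str      # When
    location: str   # Where
    motivator: str  # Why
    parties: tuple
    things: tuple


def things_for(party, exchanges):
    """Find the Things associated with a Party by walking the exchanges."""
    return sorted(
        {thing.name for x in exchanges if party in x.parties for thing in x.things}
    )


ann = Party("Ann", "Customer")
acme = Party("Acme", "Provider")
widget = Thing("Widget", "Product")
payment = Thing("Card payment", "Payment")

exchanges = [
    Exchange("Sale", "2016-07-01", "Online store", "Replace worn part",
             parties=(ann, acme), things=(widget, payment)),
]

print(things_for(ann, exchanges))
```

There is no way to ask a Party for its Things directly; the answer always comes with a how, when, where and why attached.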
These business object categories provide the framework and starting point for classifying the data needed to support non-deterministic queries. This design allows enough flexibility to support ad hoc reporting by focusing on the basic questions that all decision support data structures need to answer, while not requiring a strictly relational storage structure underneath.
In the next part of this article we will show a method for transforming a 6BI derived logical data model into an aggregate-oriented data model.
[i] Pramod J. Sadalage and Martin Fowler, NoSQL Distilled, A Brief Guide to the Emerging World of Polyglot Persistence, Addison-Wesley, 2013. Sadalage and Fowler identify four (4) types of NOSQL databases; the fourth is Graph databases, which we will not be discussing in this article.
[ii] SQL is not the first relational database language. In the 1980s each relational database product had its own language, each of which did pretty much the same thing. SQL simply won the “database language war”. It is often criticized for being, among other shortcomings, “less than perfect”. But like the human communication vehicles known as “natural languages” it too has evolved to fit its purpose and become widely accepted as a result.
[iii] This is the reason why the choice of using a NOSQL style database is, or should be, as much a business decision as a technical decision. What always needs to be taken into consideration are the possible hidden costs of implementing a database solution that does not support SQL, especially in light of the fact that SQL has become a virtual “lingua franca” of data analysis and business intelligence. These costs may be hidden from the group making the NOSQL database decision since it is quite likely they are not the same group that uses SQL to answer business questions.
[iv] Sadalage and Fowler, page 13.
[v] Sadalage and Fowler, page 17.
[vi] Vladan Jovanovic and Steven Benson, Aggregate Data Modeling Style, Proceedings of the Southern Association for Information Systems Conference, Savannah, GA, USA March 8th–9th, 2013, page 75.
[vii] Parties are often first specialized into “Persons” and “Organizations” but whether a party is only one or a group is not relevant to our discussion here. Instead I used a sub-categorization that is more reflective of the role that a party plays in an exchange.
[viii] In earlier articles on 6BI I used the term “Product” as the name of the first-level BOC that corresponds with the “what” interrogative. Product, however, is not universal enough. For lack of a better term I use “Thing” to include Money along with Goods and Services.
[ix] In earlier articles on 6BI the terms “Plans” and “Rules” were used as BOCs that correspond with the “why” interrogative. But these are only Means, not Ends, and “Why” needs to include both Ends and Means.