This is the second part of a two-part post. The first part can be found at https://waynekurtz.wordpress.com/2016/09/06/introduction-to-6bi-and-aggregate-orientation/.
To transform a 6BI (“Six Basic Interrogatives”) derived logical data model into an aggregate-oriented data model there are several steps that need to be taken. We will use the logical business data model shown in Figure 3 as our example to show this transformation. It’s very simplified but fairly typical of certain types for commercial enterprises.
Figure 3. A Business Data Model Example.
Semantically we can say the following about our business data model. A Customer can have many Contacts, and a Contact can have many Customers. The Customer_Contact entity[i] resolves this by linking a Customer to its Contacts through the “has contact” relationship and a Contact to its Customers through the “has customer” relationship. A Customer “places” Orders, “receives” Order_Invoices, and “makes” Order_Payments. A Provider “receives” Orders. An Order “contains” Order_Items and a Product “identifies” Order_Items. An Order_Invoice “is paid for by” Order_Payments and a Payment “is applied to” an Order_Payment.
Unlike data modeling where the outcome, as mentioned in Part 1, is typically the creation of tables which separately store instances of entity types linked together by key-based references, the outcome of an aggregate-oriented data model is the creation of aggregates. An aggregate is a complex data structure based on a “root entity”. The root is determined by the business requirements and is identified as an entity type in an LDM. This entity is often derived from a 6BI BOC (“Business Object Category”). Aggregates are stored as a nested collection of entity instances whose relationship to the root is also determined by business requirements. How deeply an entity type is nested within the aggregate structure is determined by where it is placed along a dependency gradient.
Dependency Gradients
A dependency gradient is a continuum of degrees of dependency where the amount of dependency flows from a higher concentration to a lower concentration. It flows from entities that have a greater number of dependencies, both direct and indirect, to entities with a lesser number of dependencies. An entity that consumes no other entity’s identifier (i.e. it has no foreign keys) is referred to as the “apex entity” of the continuum and has the least dependency. An entity whose identifier is not consumed by any other entity along the continuum is referred to as the “base entity”. The relationship between a base entity and an apex entity could be direct, however there are often many potential intermediate entities between the base and the apex entities.
Figure 4 shows two dependency gradients that flow from the same Base Entity Order_Payment (a type of Exchange).
- Dependency Gradient-01 flows through the relationship “is applied to” directly to Apex Entity-01 Payment (a type of Thing).
- Dependency Gradient-02 flows through the relationships “is paid for by” to Order_Invoice (a type of Exchange), “is billed thru” to Order (a type of Exchange), and finally “places” indirectly to Apex Entity-02 Customer (a type of Party).
Figure 4. Dependency gradients flowing from Order_Payment.
Dependency gradients form dependency trees when the flow from base to apex takes more than one path. The term gradient[ii] is used to indicate the relative degree and direction of dependency along a tree. Dependency always flows from a higher concentration (more foreign keys) to a lower concentration. Aggregates in an aggregate-oriented data model are based on dependency trees.
Steps for Transforming the LDM
The following ordered set of steps are required for transforming a LDM into an aggregate-oriented data model.
- Select the root entity for each aggregate.
- Identify relationships to be eliminated by “rolling down” parent entity attributes into the child entity.
- Identify relationships to be collapsed by “rolling up” child entity attributes into the parent entity.
- Eliminate unneeded apex entities by rolling down their attributes to an entity higher on the dependency gradient.
- Roll up attributes from base entities in the direction of the root entity along selected dependency gradients.
- Create the aggregate-oriented data model.
Select the Root Entity for each Aggregate
The root entity of an aggregate, identified by the ROOT operator, can occur anywhere along the dependency gradient and is independent of the relational association (i.e. relationship) between itself and any other entity along the gradient. The other entities in a dependency tree, called aggregate participants, are embedded into the aggregate root entity instance. The embedding of participant entity attributes into the aggregate root occurs regardless of whether the participant is in the direction of the apex or in the direction of the base relative to the root. Identification of the root entity will influence which relationships get eliminated and to a greater extent determine which relationships get collapsed.
Identify Relationships to be Eliminated
Relationships to be eliminated are tagged with the REFI[iii] tag. To identify each individual action triggered by the REFI tag the tags are numbered starting with REFI-01. In eliminated relationships the parent entity is removed and all or some of its attributes are rolled down into the child entity.
Which attributes to roll down is determined by how normalized the entity is. In cases where an entity has attributes that are not part of one of the dependency gradients being aggregated the non-dependent attributes would not be rolled down, however the relationship is still eliminated in transforming the logical data model into aggregate-orientated data model.
Figure 5 shows the relationship between Product and Order Item named “identifies” tagged as REFI-03. In this case it was decided this relationship would be eliminated and the parent (Product) attributes were rolled down into the child (Order Item) entity eliminating the relationship and its parent entity.
Figure 5. A relationship tagged for elimination.
Identify Relationships to be Collapsed
This step requires modelers to take an entity, like Order in our example data model, which has other entities that refer to it and roll up the attributes of its selected child entities into the parent entity. Order and its dependent entities, Order_Item through the “contains” relationship, and Order_Invoice through the “is billed thru” relationship are shown in Figure 6. Only the relationship with Order_Item is tagged as EMBED-02. The relationship with Order_Invoice does not need to be tagged for this transformation because Order_Invoice has a separate relationship with Customer named “receives” and its attributes are rolled up along that dependency gradient (see Figure 7).
Figure 6. A relationship tagged for collapse.
Figure 7 shows the whole example model tagged in preparation for transforming it into an aggregate-oriented data model with Customer as its root entity.
Figure 7. A logical data model tagged for transformation.
Eliminate Unneeded Apex Entities
Figure 8 shows what the physical data model (PDM) looks like after certain relationships, and their parent entities, are eliminated. In this model we have eliminated the relationship “has customer” rolling Contact into Customer_Contact (REFI-01), “receives” rolling Provider into Order (REFI-02), and “identifies” rolling Product into Order_Item (REFI-03)[iv]. Please note that in the roll downs we have included the parent primary keys (now as non-key attributes). Strictly speaking we would not always need to do this. If we were to implement this data model in a relational database management system (RDBMS) the entity Provider would become a table and its attributes columns.
Again this is actually more of a business decision than strictly a technical decision, though I suspect it is rarely considered so. This is another reason why I say selection of a NOSQL database is at least as much of a business decision as a technical decision.
Figure 8. Our example data model after eliminating unneeded apex entities.
Create the Aggregate-Oriented Data Model
Figure 9[v] shows an aggregate-oriented data model based on Customer as its root entity. All the attributes of the entities that survive the roll down are rolled up along dependency gradients and collapsed into the Customer_Aggregate forming the nested collection of entity instances, which is called an aggregate.
We have the same issue here that we had in the previous section. What do we do with the entity identifiers (i.e. primary keys)? One identifier that we have to keep of course is the primary key of the root entity, Customer_Aggregate.customer_id. It is the identifier of the aggregate. The answer for an aggregate is not the same as the answer for a table. In a RDBMS we are optimizing the design of tables, in a NOSQL database we do not have tables. All the attributes of all entity types included in creation of the aggregate are considered attributes of the aggregate and no longer “stand alone” entities or entity instances. The identifiers of all aggregate participants are removed from the model.
Figure 9. Aggregate-oriented data model.
Not having tables as your “modeling target artifacts” may be a difficult and even disturbing concept for data modelers experienced in identifying and normalizing entities. For this reason an aggregate-oriented data model is a new type of data model. It is not a logical model, but it is also not a physical model in the strict sense that it can be used to generate pre-defined database schema structures (e.g. tables). It is used to relate the entity types and relationships designed in a logical data model, and the business requirements they represent, to the chosen type of NOSQL database (e.g. key-value, column family, document, and others) and the unique metadata structures each type of NOSQL database has.
A final transformation is required to create code for the construction of the appropriate aggregate-oriented NOSQL database structures. Because we typically see NOSQL data structures expressed as a mixed meta-data and data structure such as a JSON string it is easy to forget that JSON still has a schema. That schema is often maintained in the application code and not in the database engine. The advantage of this is more flexibility in application development but the cost is often a lack of the uniform constraints needed to maintain the re-useable data structures required for non-deterministic query construction and compliance to data governance standards.
In summary the steps needed to transform a LDM into an aggregate-oriented data model are the following. First, identify the root entity for the aggregate in the LDM, identify non-root parent entities to be eliminated by rolling down their attributes into their child entities, and finally identify child entities to be collapsed by rolling up their attributes into the root entity along dependency gradients that have the aggregate root as apex entity.
In a future paper I will describe how the concepts in this article can be applied to data targeted to be stored in popular NOSQL databases.
[i] It’s conventional for data modeling tools to call the rectangles in LDMs “entities”. They can also be called “types” or “entity types”. The physical form should be called an “entity instance”. In this paper I will use “entity” and “entity type” interchangeably because most of the paper deals with logical data modeling and most of the tools used for logical data modeling call them tables. Where I refer to physical modeling I use either “entity instance” of if entity is understood just “instance”.
[ii] In physics “gradient” is defined as an increase or decrease in the magnitude of a property, the change of which can usually be measured. The word seemed appropriate to name the concept of increasing or decreasing dependency.
[iii] The REFI and EMBED tags were identified by Jovanovic and Benson. Their usage in this article is derived from the paper but may differ slightly from the original meaning, including the numbering of each instance of a tag type for easier identification of each individual roll-up and roll-down action.
[iv] It is important to understand that the three eliminated entities are not removed from the data model, neither logically or physically. They are simply not part of the aggregate-oriented data model.
[v] Any reader familiar with the Erwin Data Modeler tool will recognize that I’ve used the symbol for a View to represent the aggregate. This is not to say that an aggregate is the same as a view. Erwin, nor any other data modeling tool I am familiar with yet supports aggregate oriented modeling. The COMN data modeling technique introduced by Theodore Hills is the closest I’ve seen to actually recognizing and describing a methodology for doing so. Please see http://www.tewdur.com .