Full Stack Data Science. Next wave in incorporating AI into the corporation.

October 22, 2019


I like the concept of “Full Stack Data Science”, especially the way the author depicts it in the included graphic.

One thing I would like to point out is the recognition that the process is really a circle (as depicted) and not a spiral or a line.  What I mean by that is that the path closes between what can be perceived as the beginning, “Business goal”, and the end, “Use, monitor and optimize”.

The results of applying Data Science to business problems not only help solve those problems, but actually change the motivators that drive the search for solutions in the first place.  Business goals are usually held up as the ends: the component of any complex enterprise architecture with the fewest dependencies on the rest.  While this may be true at any point in time, the dependency is not zero.  Business goals themselves change over time, and not just in response to changing economic, societal or environmental factors.  The technology used to meet these goals itself drives changes to the business goals.

A party, whether a person or an organization, tends to do what it is capable of doing.  Technology gives it more activities to undertake and more things to produce and consume, which then feed back into the goals that motivate it.

I think this article is one of the best I’ve seen in making that point.


Machine Learning and Database Reverse Engineering

October 13, 2019

Artificial intelligence (AI) is based on the assumption that programming a computer using a feedback loop can improve the accuracy of its results.  Changing, in the right way, the values of the variables used in the execution of the code, called “parameters”, can influence future executions of the code.  These future executions are then expected to produce results that are closer to a desired result than previous executions did.  If this happens, the AI is said to have “learned”.

Machine learning (ML) is a subset of AI.  An ML execution is called an “activation”.  Activations are what “train” the code to become more accurate.  An ML activation is distinctly a two-step process.  In the first step, input data is conceptualized into what are called “features”.  These features are labeled and assigned weights based on assumptions about their relative influence on the output.  The data is then processed by selected algorithms to produce the output.  The output of this first step is then compared to an expected output and a difference is calculated.  This closes out the first step, which is often called “forward propagation”.

The second step, called “back propagation”, takes the difference between the output of the first step, called “y_hat”, and the expected output, called “y”, and, using a different but related set of algorithms, determines how the weights of the features should be modified to reduce the difference between y and y_hat.  The activations are repeated until either the user is satisfied with the output or changing the weights makes no more difference.  The trained and tested model can then be used to make predictions on similar data sets and, hopefully, create value for the owning party (either a person or an organization).
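The two-step activation can be sketched in a few lines of code.  This is only a minimal illustration of the idea: the single feature, the linear model y_hat = w * x, the learning rate, the step count and the data are all assumptions I have made for the example, not anything taken from a particular ML system.

```python
# Minimal sketch of the two-step activation described above.
# The linear model y_hat = w * x, the learning rate, the step count
# and the data are illustrative assumptions.

def train(xs, ys, w=0.0, lr=0.01, steps=200):
    for _ in range(steps):
        # Step 1: forward propagation -- compute y_hat and its
        # difference from the expected output y
        y_hats = [w * x for x in xs]
        diffs = [y_hat - y for y_hat, y in zip(y_hats, ys)]
        # Step 2: back propagation -- the gradient of the mean squared
        # difference with respect to w tells us how to modify the weight
        grad = sum(2 * d * x for d, x in zip(diffs, xs)) / len(xs)
        w -= lr * grad
    return w

# Data produced by the "unknown" rule y = 3x; repeated activations
# drive the weight w toward 3, at which point further changes to the
# weight make almost no difference -- the stopping condition above.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 6.0, 9.0, 12.0]
w = train(xs, ys)
```

Each pass of the loop is one activation: forward propagation produces y_hat, back propagation nudges the weight, and training stops after a fixed budget of activations.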

In a sense, ML is a bit like database reverse engineering (DRE).  In DRE we have the data, which is the result of some set of processing rules, unknown to us[i], that have been applied to that data.  We also have our assumptions about what a data model would have to look like to produce such data, and what it would need to look like to increase the value of the data.  We iteratively apply various techniques, mostly based on data profiling, to try to decipher the data modeling rules.  With each iteration we try to get closer to what we believe the original data model looked like.  As with ML activations, we eventually stop, either because we are satisfied or because of resource limitations.

At that point we accept that we have produced a “good enough model” of the existing data.  We then move on to what we are going to do with the data, feeling confident that we have an adequate abstraction of the data model as it exists, how it was arrived at, and what we need to do to improve it.  This is true even if there was never any “formal” modeling process originally.

Let’s look at third normal form (3NF) as an example of a possible rule that might have been applied to the data.  3NF is the rule that every non-key column of a table must depend on the key, the whole key, and nothing but the key.  If the data shows patterns of single-key dependencies, we can assume that 3NF was applied in its construction.  The application of the 3NF rule will create certain dependencies between the metadata and the data that represent business rules.
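One such profiling check can be sketched directly: does every value of a candidate determinant map to exactly one value of a given column?  The table, its column names and its rows below are hypothetical examples I made up for illustration, not a real schema.

```python
# Sketch of one DRE profiling check: is a column functionally
# dependent on a candidate determinant? Names and rows are hypothetical.

def depends_on(rows, determinant, column):
    """True if each value of `determinant` maps to one value of `column`."""
    seen = {}
    for row in rows:
        k, v = row[determinant], row[column]
        if k in seen and seen[k] != v:
            return False  # same determinant value, different column values
        seen[k] = v
    return True

# Illustrative rows: customer_city is consistent with the key customer_id,
# but it is also determined by region_id -- a transitive dependency that
# a strict 3NF design would factor out into a separate region table.
rows = [
    {"customer_id": 1, "region_id": 10, "customer_city": "Oslo"},
    {"customer_id": 2, "region_id": 10, "customer_city": "Oslo"},
    {"customer_id": 3, "region_id": 20, "customer_city": "Bergen"},
]
print(depends_on(rows, "customer_id", "customer_city"))  # True
print(depends_on(rows, "region_id", "customer_city"))    # True
```

A check like this only shows that the data is consistent with a dependency; deciding whether the original modelers intended it is still a judgment call, which is why the DRE iterations eventually stop at a “good enough” model.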

These dependencies are critical to what we need to do to change the data model to more closely fit, and thus be more valuable for, changing organizational expectations.  It is also these dependencies that are discovered through both ML and DRE that enable, respectively, both artificial intelligence and business intelligence (BI).

It has been observed that the difference between AI and BI is that in BI we have the data and the rules, and we try to find the answers.  In AI we have the data and the answers, and we try to find the rules.  Whether results derived from either technology are answers to questions, or rules governing patterns, both AI and BI are tools for increasing the value of data.

These are important goals because attaining them, or at least approaching them, will allow a more efficient use of valuable resources, which in turn will allow a system to be more sustainable, support more consumers of those resources, and produce more value for the owners of the resources.

[i] If we knew what the original data model looked like we would have no need for reverse engineering.

Google Sheets or Microsoft Excel? The differences are disappearing — Quartz

September 19, 2019


Google Kills Hyper-Threading On Chrome OS In Wake Of Critical Intel Flaw

May 15, 2019


Android Authority: 8 years on from the first Chromebooks: Google was right about them

May 11, 2019

Android Authority: 8 years on from the first Chromebooks: Google was right about them.

New Feature Coming For Chromebook Extended Displays

April 13, 2019

New Feature Coming For Chromebook Extended Displays

It looks like DisplayPort and USB-C are required for daisy-chaining monitors with Chromebooks.

The Verge: Microsoft’s Chromium Edge browser is now officially available to test

April 8, 2019

The Verge: Microsoft’s Chromium Edge browser is now officially available to test.

SlashGear: Chrome OS is a productivity utopia but it needs one more thing

March 5, 2019

Will Chrome OS replace the Windows, Linux and/or Mac OS platforms for some types of software development?  The author, though stating it awkwardly, seems to think so.

SlashGear: Chrome OS is a productivity utopia but it needs one more thing.

howtogeek.com: What To Do When Your Chromebook Reaches the End of Its Life

February 16, 2019

howtogeek.com: What To Do When Your Chromebook Reaches the End of Its Life.

Information Entropy

February 9, 2019

Is there a maximum amount of data about any given subject above which the incremental value of any additional data begins to approach zero? Even more starkly, is there a point where more data about a given subject may begin to have a negative effect, in that it actually decreases the amount of information about the subject?

I don’t mean that newer data may prove that old data about the subject is no longer accurate and therefore render the old data out of date, in which case the old data would still exist but no longer be relevant. I’m talking about a situation where the informational content, i.e. the payload of the data, would actually decrease. This would be a situation where we know less about a subject at some point in the future than we knew in the past, based on all the data there is on the subject. I think it would have to have something to do with context, with the passage of time, and with the associations between units of data about the subject and data about other subjects. The more connections, the more information a body of data has.

This is a tricky speculation. I mean, is it possible to actually know less at some point than was previously known? Not just for a single person, as sometimes happens when we age and simply forget information we could previously recall about a subject, but for the accumulated knowledge about a subject. It is kind of like a subject becoming simpler over time rather than more complex, which is pretty much the opposite of observed reality.

In fact, this just may be what happens as we approach absolute and universal entropy. I am not a physicist, but why would this not be the case? There would be two aspects to this: not only would universal entropy eliminate any differences between things, but the differences between parties, places and times would cease to exist as well. Not only would there be less information on all subjects, there would be fewer and fewer subjects to have information about. Nor would there be anyone to have and be responsible for information, even if it did still exist.