Comments on Data Oriented Design – Part I

Data Oriented Design is a concept I became increasingly curious about. I read various presentations from CppCon 2014 that discussed Data Oriented Design, but they were not enough for me. I tabled the matter for a few months until I ran across a presentation by Sony. I was tentatively convinced that Data Oriented Design had worth, and I needed to know more. I had an intuitive sense of what it was about: I had studied assembly programming, data structures, and computer architecture, and I wrote my code in a way that respected the machine. However, was there a larger viewpoint? I wanted to confirm my ideas of what Data Oriented Design was in order to move from intuition to practice. What I found was that I had no idea what Data Oriented Design was, but I do now.

Richard Fabian has done an exceptional job detailing the concept. I found his work online and read the entire website he produced, dataorienteddesign.com. It is a compelling read. He fully explains the rationale, the application, and the reality of the Data Oriented Design concept. Applied to software development, Data Oriented Design is far less taxonomic than Object Oriented Design. Instead, it is more mechanistic and active. The goal is to create “inherent” behavior by reducing the amount of explicit control flow in imperative programs. The shape of the data determines how the program runs, what it does, and how well.
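One way to read the idea of “inherent” behavior is that membership in a collection can replace a flag test inside a loop. The sketch below is my own illustration, not code from the website: the `Particle` type, its fields, and both function names are hypothetical, and Python is used only for brevity.

```python
from dataclasses import dataclass


@dataclass
class Particle:
    x: float
    velocity: float
    alive: bool


def step_with_branch(particles, dt):
    """Control-flow version: every element is tested on every pass."""
    for p in particles:
        if p.alive:
            p.x += p.velocity * dt


def step_by_membership(active, dt):
    """Data-shaped version: being in `active` is what 'alive' means here."""
    for p in active:
        p.x += p.velocity * dt


particles = [
    Particle(0.0, 1.0, True),
    Particle(5.0, 2.0, False),
    Particle(1.0, 3.0, True),
]

# Partition once, when state changes, instead of branching in every loop.
active = [p for p in particles if p.alive]
step_by_membership(active, dt=0.5)
```

The second function has no branch at all. Which elements get processed is decided by how the data was arranged beforehand, which is roughly what I take "the shape of the data determines how the program runs" to mean.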

The website gives you many details about the concept. I think Data Oriented Design is probably the right approach to software development. I do not think that programming languages such as C++, C, C#, Java, and similarly structured languages are the most practical languages for Data Oriented Design. Yes, people have found ways to apply Data Oriented Design in these languages, but there is much more you can do with Data Oriented Design than what C++, C, C#, and Java naturally allow.

At this point, you may wonder why I say this. The reason is that I think Data Oriented Design is a way of applying the relational database approach to the design and organization of software. Harnessing relational databases and SQL can be far more powerful than what you can do with conventional computer code. I see the echoes of a relational approach in the details.

Transformation

I had access to multi-million row databases in professional contexts in which I had a choice between writing computer code to transform the data and writing SQL, organizing relational views, tables, and stored procedures to accomplish the same thing. My technical background emphasized software development, but I grew to appreciate other approaches to data. I did not use a step-through debugger to diagnose an SQL select statement at the level of granularity of a for loop iteration. Instead, I had to think deeply about relationships, transforms, and set-theoretic operations on similar and dissimilar groups. The result was that I could move, copy, calculate, and transpose huge swaths of data in a fraction of the time it would take to write code for each minute step. I began to appreciate that more over time and realized that for many things in IT software, I could simply stop writing software if I just organized, queried, and revised the data a certain way.
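To make the contrast concrete, here is a minimal sketch of the same calculation written step by step and as a single set-based statement. The `orders` table, its columns, and the sample rows are invented for illustration, and the in-memory SQLite database from Python's standard `sqlite3` module stands in for the much larger databases described above.

```python
import sqlite3

# Hypothetical sample data: (region, amount in cents).
rows = [("east", 10000), ("west", 20000), ("east", 5000)]


def totals_by_loop(rows):
    """Step-by-step version: visit every row and accumulate by hand."""
    totals = {}
    for region, cents in rows:
        totals[region] = totals.get(region, 0) + cents
    return totals


def totals_by_sql(rows):
    """Set-based version: describe the result and let the engine produce it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (region TEXT, amount_cents INTEGER)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    # One statement copies and calculates into a new table.
    conn.execute(
        "CREATE TABLE regional_totals AS "
        "SELECT region, SUM(amount_cents) AS total_cents "
        "FROM orders GROUP BY region"
    )
    result = dict(conn.execute("SELECT region, total_cents FROM regional_totals"))
    conn.close()
    return result
```

With three rows the loop is just as easy. The difference only starts to matter when the grouping, joining, and filtering rules pile up and the loop version grows with each of them.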

What I just related, though, is not the reality when you actually do write software. Eventually, I settled on a model in IT in which the database would do the majority of the heavy lifting, since it was quicker, easier, more thorough, and faster to iterate on than code. The software was just there to provide feedback to an end user, or to dump file contents into a table and receive output destined for a file. It greatly simplified the software I wrote in IT, but it had the downside of spoiling me: you may reach for a database in cases where you do not really need one, purely for the convenience it offers in data transformation, which is the whole point of software.

Relational Concept as a Model

Some solutions are pure code and do not really call for a relational database. Those who write software sometimes prefer not to use a database as a transformation engine. Sometimes, you just want to write self-contained, pure code (see my technology projects website as of Feb. 2015).

When you do go the route of pure code, Data Oriented Design can become the governing design approach. Done properly, it will speed up programs and make them more agile. Structure of Arrays, in which the structure is the database and the arrays are the tables, works, but only up to a point. Array of Structures, in which the array is a uniform table whose rows are structures and whose schema is defined by the fields of the structure operating as columns, works okay too. The main problem is that the productive mechanisms found in relational technology implementations are missing.
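The two layouts in the paragraph above can be sketched as follows. The names `OrderRow` and `OrderColumns` and the sample values are hypothetical, and the Structure of Arrays version treats each array as one column of a single table, which is a finer-grained mapping than database and tables. Python is used for brevity; its standard `array` module gives contiguous typed storage, but this shows the shape of the data, not the cache behavior you would measure in C or C++.

```python
from array import array
from dataclasses import dataclass


@dataclass
class OrderRow:
    """Array of Structures: each element is one full row."""
    id: int
    price: float
    quantity: int


aos = [OrderRow(1, 2.5, 4), OrderRow(2, 10.0, 1), OrderRow(3, 0.5, 8)]


class OrderColumns:
    """Structure of Arrays: one contiguous typed array per column."""

    def __init__(self, rows):
        self.id = array("q", (r.id for r in rows))
        self.price = array("d", (r.price for r in rows))
        self.quantity = array("q", (r.quantity for r in rows))


soa = OrderColumns(aos)

# The same "query" against each layout: total value of all rows.
total_aos = sum(r.price * r.quantity for r in aos)
total_soa = sum(p * q for p, q in zip(soa.price, soa.quantity))
```

Both totals come out the same. What is missing in either layout is everything a relational engine adds on top, such as joins, grouping, and indexes, and each of those has to be written by hand.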

Microsoft .NET has a technology within ADO.NET called the DataSet that allows Data Oriented Design, but that technology is now discouraged in favor of Entity Framework. LINQ in Microsoft .NET looks as if it would solve the productivity problem, but it really does not, and it is not at the same level as the optimized query technology found in SQL Server. You can solve all of the above in the Microsoft space by using embedded SQL Server, but that is anathema to many who write code, for whom Entity SQL may be a more favorable alternative.

Likewise, C and C++ do not have relational capabilities in a productive form, but they otherwise work exceedingly well without them. The problem arises when you want to do Data Oriented Design and you sense the solution would advance much further if the way you approach data transformation in SQL were naturally available in the language. It is not. You can fix that by embedding a database like SQLite in a C or C++ program, or HSQLDB in the case of Java. Mozilla uses SQLite in Firefox: an end user can open a settings window that leads to a directory containing SQLite databases, which they can delete to fully reset browser preferences. Certainly, as true Data Oriented Design goes, this is the best approach from a productivity and solution design standpoint. The exception is that those who need full deterministic control over the data do not have it.
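As a rough sketch of what embedding buys you, the example below keeps program state in an in-memory SQLite database and asks a relational question of it. It uses Python's standard `sqlite3` module rather than the SQLite C API, purely to stay short. The `entity` and `health` tables and the helper names are invented for illustration.

```python
import sqlite3


def open_store():
    """Create an in-memory database that lives and dies with the program."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE entity (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE health (
            entity_id INTEGER REFERENCES entity(id),
            points INTEGER NOT NULL
        );
        """
    )
    return conn


def load(conn, entities, health):
    """Copy plain program data (lists of tuples) into the tables."""
    conn.executemany("INSERT INTO entity (id, name) VALUES (?, ?)", entities)
    conn.executemany("INSERT INTO health (entity_id, points) VALUES (?, ?)", health)


def wounded(conn, threshold):
    """Join the two tables and keep only rows below the threshold."""
    query = """
        SELECT e.name, h.points
        FROM entity AS e
        JOIN health AS h ON h.entity_id = e.id
        WHERE h.points < ?
        ORDER BY h.points
    """
    return conn.execute(query, (threshold,)).fetchall()


conn = open_store()
load(conn, [(1, "scout"), (2, "medic"), (3, "tank")], [(1, 20), (3, 75)])
result = wounded(conn, 50)
```

The entity with no `health` row simply drops out of the join, with no special case in the code. The trade-off is the one noted above: the engine, not the program, decides how the rows are laid out and visited.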

Practical Transformation

In the end, Data Oriented Design is a name for something people who write software have done for decades. The value of naming it is that you can better decide whether the approach applies to your situation or not. It is a tool that I think is the primary tool for truly good software development, but one that is not conveniently accessible in practical, real-world software development.

Programming languages will have to change, or a shift will slowly guide more people in the direction of functional languages. That has not happened because thinking step by step is easier than trusting a somewhat declarative statement to transform data from shape A to shape B. It is a matter of trusting what you can see in terms of direct side effects as opposed to something very abstract whose heuristics are more hidden. I trust some abstractions in certain implementations of programming languages, but not enough to say that cache misses will decline consistently. Sometimes you have to take a more direct approach. In those multi-million row databases, I did a lot of pre-calculation work to speed things up for when the end user needed to trigger a battery of reports. Time frames were too critical to fully trust real-time query optimization when you have no advance visibility into the shape of mass import data.

Readable Source Code but High Performance Programs

Database engines are not programming tools, but despite this they can go far beyond software in most areas of data transformation. Regardless, one area in which I think Data Oriented Design may have a beneficial impact is the structure of software code. Array of Structures is a core, default mechanism for improving access. Some programming languages cater to this better than others. Java and C# do not, but they work well enough up to a certain scale.

Explicit code defined to produce native code using functions and arrays may be less accessible in form than Object Oriented Design. C++ tries to strike a balance that produces high performance code with a higher measure of readability and obvious intent. It is a good trade-off as code marches off into the future. You can overdo things. Readability can be overdone as a form of premature optimization.

Right Abstraction for the Task

Objects are useful, but many of us learned explicitly or implicitly that object hierarchies were the right approach for taking modular containment and reuse to the next level. I still like Object-Oriented Design and use it in some of my recent code presentations on my technology projects website. It is a great way to have internal modular containment within the code. I do not advocate a rejection of objects, but I do advise understanding them in the broader software technology context.

Adapt the Code to the Machine

How do we expand and balance our understanding of Object Oriented Design alongside other paradigms and approaches to software? It helps to consider other points of view. On Richard Fabian’s website is a section called “What’s Wrong,” in which he explains his view of what Object Oriented Design lacks, or at least I think that is what he is trying to say. A subsection titled “Mapping the problem” talks about the difference between a real-life table and a computer description of a table. The computer does not care or even understand what a table actually is.

Objects are important to people who write code, but the computer will only treat the object as well as the rules of the computer allow. Richard Fabian’s presentation is good food for thought on the general virtues of objects under the umbrella of technical systems design in service of a functional subject domain. Some software is created for a situation that does not seem to benefit as much from Data Oriented Design. When it does, though, it is good to know that there is a road map for its use.
