This article will be somewhat technical in nature and begins with a question. Does a program exist to house algorithms or to convert data? If the previous 5 articles in the series are any indication, I would emphasize the latter. Algorithms are one of the primary means by which data is translated in a program but the overall effect of the program is to provide a useful output. That output can be something you see, hear, or touch or it can simply be an output that is fed into another program, file, database, or network location. The details and clever techniques in a program mean little if the overall output is missing, incorrect, or otherwise unsatisfactory. The root concern in program implementation is how well data moves from its starting positions in the program’s cycle to its resulting outputs in the correct positions on the screen, in a directory, or in a table.
Program Engine Concept
A tool we can use to accomplish this translation is called an engine. An engine is an overall computer program module composed of algorithms and data structures that work in unison to migrate data input into a relevant data output. Other names for this exist. An XML or REST web service can take specific inputs to produce relevant results. Many parts can reside beneath a web service. They are unseen when you view its interface but an implementation nonetheless exists with parts working together to produce outputs consistent in format and kind.
Fully baked services and frameworks are often highly reusable. Here, I place less emphasis on a perfectly reusable engine. Make no mistake, I know a highly universal program core is very useful. However, the excessive pursuit of reusability and pristine interface definition may weigh too heavily on the full program’s implementation. It is best to reduce continuous redevelopment in favor of a finished program. Yet, the program’s core will often evolve into an increasingly reusable form over time. The right data translation process is the goal. The early stages of a program’s implementation benefit more from a correct engine than a perfect engine that is perpetually incomplete.
I chose the term engine instead of framework to set the emphasis on what we are doing. Our goal with an engine is to concentrate functionality and centralize logic that collectively represents a general concept. I will refer to the Gautier RSS program I built. That program has a user interface which you would see as the visual screen part of the program. That exists at one layer of the program. Next is an RSS engine, a fully separate, self-contained part of the program that drives the mechanics of pulling RSS data and organizing it for immediate use in the visual part of the program.
Without the RSS engine to pull data and slice/dice it, the rest of the program would not make sense. In this way, the RSS engine is the critical component. It knows nothing about visual display, screen geometry, mouse clicks, or voice commands. However, the area it deals in requires substantial attention and thus must work well as a whole for “any” UI that uses it to have a chance of being useful. Some versions of the Gautier RSS program used a game engine that handled UI interactivity such as keyboard, mouse input and screen output. Data output from the RSS engine would then be translated into shape, color, and geometry input into the game engine. I now use a UI framework, but the process is the same. These engines are a little more specific than frameworks but still essential.
An engine can be a standalone executable. Often, it takes the form of one or more binaries linked into a program. At a source code level, a good way to implement an engine is as one or more code libraries. Those libraries can then be translated into individual binaries. The binary code can exist as a single file or multiple files. You can have one or more .dll files on Windows or .so files on Linux. The important matter from the standpoint of programs that use the engine is the interface. You want to strike a balance between convenience, correct use, and granularity.
The following is a screenshot of the Gautier RSS engine in source code form. The most generic parts of this engine are represented by the file and http classes in the techconstruct folder. Those two classes represent the bottommost part of the engine. I consider them generic as their functionality can work in most C++ programs. The classes that represent the primary interface for the engine are the rss_cycle_feed_name, rss_cycle_feed_headline, and rss_cycle_feed_article classes. A program would access RSS functionality by working through one or more of those 3 classes.
A diagram of the class hierarchy would place rss_cycle* at the top and file and http at the bottom. The other classes are simply the means of translating data encoded in RSS format and file format into usable C++ program format. The actual C++ data representation exists in the rss_data* classes and individual instances of them are often grouped in rss_set* classes. As a result, rss_file_manager* classes contain the actual RSS logic. They are so named as the file system is where RSS data is placed after it is (or if it is) pulled from the network and the same place where the authoritative version of an RSS feed is pulled before translation into data structures defined in C++. See the following diagram.
The typical way this engine is used is the user interface begins invoking functions in an instance of the rss_cycle_feed_name class. A function in that class then uses an instance of the rss_file_manager_feed_name class to retrieve a list of RSS feed website addresses and common names. This occurs along path n1 and requires no network access, only file access. The user interface uses this list of names to create a visual representation of the names. Those names are shown on the screen and also maintained in the user interface for easy access by the program.
Whenever a name is chosen by the user of the program, that name is used to obtain the RSS document. That RSS document retrieval occurs through an instance of the rss_cycle_feed_headline class. That class accesses an instance of the rss_file_manager_headline class that uses an instance of the http class to retrieve the RSS document at the website address indicated by the RSS feed name mentioned earlier. The document is retrieved along paths a, b, and c. If path c is successful, then the data retrieved along that path is applied to a file at path d in which case path d is reversed to bring the data from the saved file into the program. The result is that the rss_cycle_feed_headline class instance produces a list of headlines for the feed that the user interface converts to an on screen representation.
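The retrieve, save, then read-back sequence described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the engine's actual code: the names fetch_fn, save_document, load_document, and pull_feed are hypothetical, and the network step is passed in as a stand-in function rather than performed by a real http class.

```cpp
#include <cassert>
#include <fstream>
#include <functional>
#include <iterator>
#include <optional>
#include <string>

// Hypothetical stand-in for the network step (paths a, b, and c):
// returns the document text, or nothing when retrieval fails.
using fetch_fn =
    std::function<std::optional<std::string>(const std::string& url)>;

// Path d: apply the retrieved document to a file.
bool save_document(const std::string& path, const std::string& text) {
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    out << text;
    return static_cast<bool>(out);
}

// Path d reversed: bring the saved file back into the program.
// A missing file simply yields an empty string.
std::string load_document(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Try the network first and refresh the file on success. Either way,
// the file is the version that is handed back to the caller.
std::string pull_feed(const std::string& url, const std::string& cache_path,
                      const fetch_fn& fetch) {
    assert(!cache_path.empty());
    if (auto text = fetch(url)) {
        save_document(cache_path, *text);
    }
    return load_document(cache_path);
}
```

Called with a fetch function that fails, pull_feed still returns whatever an earlier successful call saved, which is the practical benefit of treating the file as the authoritative copy.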
The user interface can optionally show the full article for a headline in a web browser. If so, no further action is needed other than to pass the web address for a headline to the web browser. The web browser, whether Mozilla Firefox, Google Chrome, Apple Safari, or Microsoft Edge, will show the contents at that website address. Otherwise, if the full article contents need to be shown in the program itself, that occurs through an instance of the rss_cycle_feed_article class through an instance of the rss_file_manager_article class along path v1.
In this discussion of an engine we noted how a user interface may use the engine. The engine discussed here has nothing to do with showing anything on the screen. This engine also contains no hints as to how information should be displayed. Rather, what this engine does is retrieve data stored on a website and/or a file on the local hard drive and convert it into data structures defined in C++. Those are data structures that can be evaluated by other processes within a C++ program. As an example, a list of RSS names is represented as individual instances of an rss_data_feed_name class. If there were 5 distinct RSS feed sources, for example, there would be 5 distinct instances of the rss_data_feed_name class. Each instance contains the name you would show on screen as well as the website address (which you may or may not show) that can be used to retrieve all the headlines associated with that RSS feed. That list of RSS feed names and website addresses is completely non-visual and resides in the computer’s memory.
Alternatively, we could have designed the RSS engine to have a single RSS data class and an RSS feed class that returned C++ STL vectors on the overall RSS feed. Indeed, we could have sustained the entire RSS document in the computer’s memory by retaining a C++ representation of the actual XML document. That would have been quick to code and very easy to understand in a way. That approach does work in some instances, but it is not favorable when the size of the data can vary widely. Sure, you can use tricks such as pointers to memory mapped file locations but you still end up with more code than is probably necessary.
Instead, the design approach here was to observe the practical divisions of an RSS document related to the most common way RSS data is represented. At the same time, the representation was geared towards reducing the amount of work done to retrieve RSS data. For example, avoid retrieving all RSS feeds at the same time by default. There are pros and cons to that, but in this case, retrieval of an RSS feed is driven by the end-user rather than the program preemptively seeking the latest version of all RSS feeds in one go. The user interface’s design can ultimately determine which strategy is applied in practice. A UI can be set up to cycle through all feeds in a separate thread. The engine’s design is flexible enough to work either way.
Now, although our conversation refers to an RSS engine, the general practice of pulling and posting data in the manner outlined is a general approach. Indeed the engine itself conforms to several generic software design patterns. Familiarity with design patterns is not required to produce a good design. However, do not be surprised when a given design matches a given design pattern since design patterns are generally chosen based on their common occurrence in software.
You will find the following design patterns emphasized in this engine:
- Facade – The rss_cycle_* classes do not necessarily enforce a uniform interface to the rest of the RSS engine. The other classes are accessible. In practice, they are the primary interface. They are preferred as they are the higher-level interfaces denoting nothing about the use of a file system in the fulfillment of RSS data requests and translation to C++ data structures. This is fully intentional as the details involving website access, file system access, feed date/time expiration, and accumulation of headlines into lists are largely addressed in the rss_file_manager_* classes.
- Builder – One of the roles this engine satisfies is it uses the same steps to build different representations. Although an explicit polymorphic interface is not used in this case, there is yet a common interface for constructing the RSS engine data in the form of get_spec, get_set, pull_set, and save_set functions.
- Mediator – A principal role of the rss_file_* classes is to coordinate interactions among rss_data_*, rss_set_*, file, and http classes. The design pattern is more apt in this case as the rss_file_* classes in general approach this coordination task in nearly the same way. Although each rss_file_* class is addressing a specific part of the RSS data, it generally does so in a similar fashion as the others in terms of how instances of rss_set_*, file, and http classes are applied.
- Bridge – Feed names and headlines are represented in a manner separate from their implementation. They are fully decoupled from implementation in this case. Such loose coupling is not required in an engine, but it can facilitate a more flexible implementation. As a result, the representation is more likely to remain stable with little variation in its interface over time while the implementation can be optimized and updated. In this case, while the implementation relies on the file system for the actual data, nothing inhibits the design from using live network streams in the future. That choice would not necessarily affect the representation if there was a preference to leave the representation as is.
- Command – Access to a website is inherent to the rss_data_feed_name class as each instance bears a website address. Each instance then is a unique request that the rss_file_manager_feed_headline class, applying an Adapter pattern, directs either to a website or to the file system. Likewise, rss_data_feed_headline inherently reflects a request to the file system wherein headline data exists. Again, the actual access mechanics involved are not expressed in the rss_data_* instances themselves, which relates to their reuse in multiple areas of the engine and program.
- Memento – All of the rss_data_* classes fully express this design pattern and do so through the file class instance accessed in rss_file_* class instances.
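As an illustration of the first pattern in the list, the following is a minimal sketch of a facade, assuming a much smaller engine than the one described. Both class names are hypothetical, and the lower layer keeps its lines in memory as a stand-in for file contents. The point is only that the caller of the upper class never sees how or where the data is stored.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical lower layer standing in for an rss_file_manager_* class.
// It owns the storage details (here, lines kept in memory).
class feed_store_sketch {
public:
    void save_line(const std::string& line) { lines_.push_back(line); }
    const std::vector<std::string>& lines() const { return lines_; }

private:
    std::vector<std::string> lines_;  // stands in for file contents
};

// Hypothetical upper layer standing in for an rss_cycle_* class.
// Its interface says nothing about files, tabs, or line formats.
class feed_cycle_sketch {
public:
    void add_feed(const std::string& name, const std::string& url) {
        // A tab separates name from url, so a name may not contain one.
        assert(name.find('\t') == std::string::npos);
        store_.save_line(name + '\t' + url);
    }

    std::vector<std::string> get_names() const {
        std::vector<std::string> names;
        for (const auto& line : store_.lines()) {
            names.push_back(line.substr(0, line.find('\t')));
        }
        return names;
    }

private:
    feed_store_sketch store_;
};
```

If the storage format changed, only feed_store_sketch and the two small functions above would need attention; code that calls add_feed and get_names would stay as it is.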
A good place to start when designing the actual engine is to consider the high-level interface. In this case, we want to retrieve a list of feed names; add new RSS feed names and web addresses that the end-user provides; and retrieve a list of headlines for a given RSS feed. Those are the high-level requirements for the engine and what the engine needs to do for a program that uses it. We stated those requirements in one sentence. Not every engine will have simple requirements. Whether the requirements are simple or more elaborately defined, having requirements for the engine benefits you by reminding you about what the engine does and does not do.
Based on those requirements, the high-level interface consists of a function that retrieves a list of names. Each name really is the human rather than technical name of a website or place on the website. It is optional that the actual website address accompanies each name. That is an implementation detail. Those website addresses can be separately listed if needed. A second function retrieves a list of headlines for a given feed name. It is optional that the actual website address for a headline exists directly joined to the headline text. That is an implementation detail. Each headline’s website address can be separately listed if needed. A third function saves a new RSS feed name entered by the end-user and associates a website address with that name. The key term here is save the name and does not include how to save the name. That is up to the internal implementation of the engine.
Once you have designed the engine’s “top-level” interface, the rest of the design can proceed as an exercise to stay within the constraints of that interface. You can approach this design process in a variety of ways. Refer to the data flow and program flow articles earlier in the series on C++ UI. At some point, you will want to transition from a possible design to an actual design reflected in C++.
In my case, I executed the entire program in my head several times including many of the scenarios the program should support. As I did this, I identified the right interface for the individual parts I saw executing and used the insight into how the program executed to draft the overall interface on paper. My experience creating 4 or more previous versions of this program was the primary way in which I could do this, otherwise, I would have followed a more elaborate approach. You could say I took a design shortcut, but did I really when I had the experience from previous versions to refine this one?
I then translated the paper-based representation of all the engine’s interfaces and implementations into C++ source code files that more or less worked well together. I did not write much detailed implementation code but simply conveyed in C++ the overall relationships among classes through what are called “stub functions”. These functions had no implementation in them and merely served as placeholders. While the functions lacked actual functionality, their form and declarations provided me a chance to see if the overall code structure was obvious enough to work with once a detailed implementation was added.
You are now ready to implement the engine, which means making the engine a reality: something you can actually use. Despite the anecdote in the previous section, this is not something you rush into. After all, “stub functions” are easier to write than the implementation that drives them. Stub functions simply eliminate holes in the concrete interface in C++ header (.hxx/.hpp) and implementation (.cxx/.cpp) files so you write your interface in one step instead of writing it at the same time you build up the implementation. Your goal here is to gradually build up the implementation carefully and with certainty. What I mean by that last part is that you can test each piece you put in place as you go along. That way, when errors appear, they are less likely to appear in areas you have already tested, which can speed up your debugging and fix efforts as you focus on the most recently written pieces of the implementation.
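To make the idea of stub functions concrete, here is a minimal sketch based on the three high-level requirements stated earlier. The type and function names are hypothetical and are not taken from the Gautier RSS source. Every function compiles and links but does no real work yet, which is enough to write and test a caller against the interface.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical data types, as they might appear in a header file.
struct feed_name {
    std::string name;
    std::string url;
};

struct headline {
    std::string text;
    std::string url;
};

// Hypothetical interface declarations.
std::vector<feed_name> get_feed_names();
std::vector<headline> get_headlines(const feed_name& feed);
bool save_feed_name(const feed_name& feed);

// Stub definitions: placeholders to be filled in one at a time.
std::vector<feed_name> get_feed_names() { return {}; }

std::vector<headline> get_headlines(const feed_name&) { return {}; }

bool save_feed_name(const feed_name& feed) {
    assert(!feed.name.empty());  // even a stub can state its precondition
    return false;                // nothing is saved yet
}

// A caller written against the interface before any real code exists.
std::size_t count_all_headlines() {
    std::size_t total = 0;
    for (const auto& feed : get_feed_names()) {
        total += get_headlines(feed).size();
    }
    return total;
}
```

As each stub gains a real implementation, callers such as count_all_headlines keep compiling unchanged, so each newly written piece can be tested on its own.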
If your engine is similar to the one I describe here in which you have a “top-level” interface and a “bottom-most” implementation in the form of files, server or database access, then you may want to build just those two parts first. The reason is the “bottom-most” implementation affects nearly everything and must be correct in order for the other processes within the engine to work properly and make sense. However, you do not want to simply test direct, raw access to this foundation. A top-level interface can exist at the opposite end such that as you test the foundation, you are continuously aware of any design/data gaps at the top that must ultimately be translated into the foundation.
Here is an example of the top-level interface for dealing with RSS feed names and website addresses. It was designed to be very simple. Basically, you can look at it without documentation and discern its use. You see an init function that lets you determine which file name will contain feed names. The engine does not care how you came about this file path. You could have stored it in a database; pulled it from a configuration website; run a loop over the program’s own execution directory to derive an automatically determined path; or used any of a hundred other ways to conjure a file path.
The get_feed_names() function brings back all the feed names. The other functions assist in maintaining the list of feed names. Is the following the only way to write C++ code that does this? Many variations on the following exist. For example, a std::vector of strings can be returned from the get_feed_names() function and a string can be returned from the get_single_feed_name() function. Interface definition preferences will vary at this level. The goal is generally the same. Simplified access to data from the viewpoint of the application using the engine.
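One possible shape for such an interface is sketched below. It is an assumption-laden illustration rather than the actual rss_cycle_feed_name class: the class name, the set_feed_name function, the return types, and the tab-separated file layout are all hypothetical. Only the init and get_feed_names function names come from the discussion above.

```cpp
#include <cassert>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical pair of strings describing one feed.
struct feed_name_sketch {
    std::string name;
    std::string url;
};

class feed_name_cycle_sketch {
public:
    // The caller decides which file holds the feed names.
    void init(const std::string& feed_names_file) {
        assert(!feed_names_file.empty());
        path_ = feed_names_file;
    }

    // Reads every "name<TAB>url" line (an assumed file layout).
    std::vector<feed_name_sketch> get_feed_names() const {
        std::vector<feed_name_sketch> feeds;
        std::ifstream in(path_);
        std::string line;
        while (std::getline(in, line)) {
            const auto tab = line.find('\t');
            if (tab == std::string::npos) continue;  // skip malformed lines
            feeds.push_back({line.substr(0, tab), line.substr(tab + 1)});
        }
        return feeds;
    }

    // Appends one feed to the file (hypothetical maintenance function).
    bool set_feed_name(const feed_name_sketch& feed) {
        std::ofstream out(path_, std::ios::app);
        out << feed.name << '\t' << feed.url << '\n';
        return static_cast<bool>(out);
    }

private:
    std::string path_;
};
```

A caller would create an instance, call init with a path of its choosing, and then work only with lists of names and addresses.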
The following representation of a feed name is a pair of strings. I could have used a std::map as in previous versions of the program. In that case, the map would index on name and return a website address (denoted as url here). However, I wanted the ability to expand the definition of an RSS feed name with less effort. Later, I may want to include other details regarding an RSS feed and an explicit class allows me to add attributes without the need to maintain additional types indexed by name or sequential list position.
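The trade-off between a map and an explicit class can be sketched as follows. The class name and its accessors are hypothetical and the real rss_data_feed_name class may differ. The from_map function is included only to show how the earlier map form relates to a list of instances.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical explicit class: a pair of strings today, with room
// to add further attributes later without changing the container.
class rss_feed_name_sketch {
public:
    rss_feed_name_sketch(std::string name, std::string url)
        : name_(std::move(name)), url_(std::move(url)) {
        assert(!name_.empty());
    }

    const std::string& get_name() const { return name_; }
    const std::string& get_url() const { return url_; }

private:
    std::string name_;
    std::string url_;
};

// The map form indexes on name and returns a url. Converting it
// yields one instance per feed, in the map's key order.
std::vector<rss_feed_name_sketch> from_map(
    const std::map<std::string, std::string>& url_by_name) {
    std::vector<rss_feed_name_sketch> feeds;
    for (const auto& entry : url_by_name) {
        feeds.emplace_back(entry.first, entry.second);
    }
    return feeds;
}
```

Adding, say, a last-retrieved time later would mean one new member in the class, whereas the map form would need a second map or a change of value type.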
Next, we go all the way to the other end of the engine’s implementation. The http class would come first since we benefit from having actual data from an RSS feed. You have to be careful here however since RSS feeds generally come with restrictions on how often you can access them. So what you can do is instead of accessing an actual RSS feed, you can access some other website to test out the mechanics of your http access implementation.
The interface for the http class defined in C++ does not hint at the implementation beneath. Yes, this violates the contemporary custom of avoiding interfaces with only 1 function. However, I also have the sense not to force in additional functionality when all I need is just 1 function. I try to avoid gold plating an interface just for appearance’s sake. At the same time, I wanted to make certain that access to data available over http or https was done one way. This is the one way.
The implementation behind the get_stream function is quite involved. The interface to the http class is in C++ but the details behind the interface are in C. More accurately, this is C++ directly interfacing with a module written in C. That module is cURL. The interface method, get_stream, has remained the same, but previously I used the POCO C++ libraries to execute data retrieval over http. When I found that insufficient, I changed the details from POCO C++ to cURL in C without affecting the method signature for get_stream(). That meant the rest of the engine unwittingly used POCO C++ or cURL without issue.
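The value of a stable signature can be shown with a minimal sketch. It makes no calls to cURL or POCO; the two namespaces below are hypothetical stand-ins that only mark which back end produced a result, and the parameter and return types of get_stream are assumptions. What matters is that switching back ends touches a single function body while the declaration seen by the rest of the engine stays fixed.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for two retrieval back ends. Neither calls
// a real library; each only tags the result with its own label.
namespace first_backend {
std::string fetch(const std::string& url) { return "first:" + url; }
}  // namespace first_backend

namespace second_backend {
std::string fetch(const std::string& url) { return "second:" + url; }
}  // namespace second_backend

// The declaration the rest of the engine sees. It stays the same
// no matter which back end is in use.
class http_sketch {
public:
    std::string get_stream(const std::string& url) const;
};

// The only place that changes when the back end is swapped.
std::string http_sketch::get_stream(const std::string& url) const {
    assert(!url.empty());
    return second_backend::fetch(url);  // was first_backend::fetch(url)
}
```

Code that calls get_stream needs no edits and is unaware of the swap, which mirrors the experience described above.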
I follow a similar approach to the file class. With these two classes in place, other substantial portions of the engine’s implementation can proceed. One of the next steps is to refer back to your design document. Recall the first diagram in this document where we show all those arrows pointing from the top-level interface to the file and http classes. The arrows represent paths from one part of the engine to the other. The place to start after you have finished the top-level interface and the bottom-most interface is “one” of the paths in your flow. Again, “one” of the paths. Never overwhelm yourself in this process. The practice is to do one piece to conclusion and test it until you verify it works all the way. However, you will need to be cognizant of the material in the 7th and 17th articles. Both articles show the command-line side of creating C++ applications and are necessary in order to progress in reality. The full source code for the Gautier RSS engine itself is available on GitHub and the instructions for installing it on Fedora Linux are published on this blog.