Many thousands of programs have nothing at all to do with a communications network. A common example: most calculator programs built into desktop operating systems from 1980 to 2018 do not access the network. The further back you go toward 1980, the more programs you will see that lack network capabilities. Another example would be the main character and symbols program built into most desktop operating systems during the same time frame. Eventually, these may be changed to rely on the network, but for now, their functionality is fine without it. Most programs that you may be interested in writing, however, will rely on the network. In many cases, networked programs do not do anything meaningful without network functionality or a good network connection. The way your program accesses the network and exchanges data over it has a huge impact on how the program operates.
Communication over the network occurs in layers. I am not referring to the layers involved at a hardware level but rather the layers involved at a software level. The code you typically use to access data over a network may be only the tip of the iceberg. Several layers of code may underpin the code you use to access the network. Layering is very common in software development, and networking is no stranger to it.
If you are programming in the Java language, for example, you may be accessing code that, unknown to you, accesses a layer of code also written in Java that then accesses code written in C++, bound to more code written in C, connected to parts written in x86_64 assembler. That is not a strange occurrence and is indeed the rule rather than the exception. The reality is that a large amount of the foundation for networked software is defined in the C programming language, which other programming languages can access to move data back and forth. One reason for this layering, though not the sole one, is that you often need the data communicated over the network to already be in a format convenient for the program to process and relay. By the time you get down to C, you are dealing with a number of technical requirements that may exceed your time and effort budget, so you use a more convenient wrapper. Hence the term “wrapper class”. Quite a bit of programming involves using other people’s wrappers as well as creating your own.
We use other terms for wrappers such as abstractions, APIs, components, and so on. When it comes to network communications and data exchange, the choice of abstraction can have a huge impact on the time and difficulty involved in integrating the program with the network. Some abstractions may not give you enough control, since there can be a trade-off between convenience and control. Other abstractions may not transact communications efficiently, resulting in lower performance. That is not always a bad thing, since you can sometimes have a trade-off between speed and secure and/or reliable communications. The main point is that you need to determine which level of abstraction works for you.
Communication Abstraction Priorities
My suggestion is to choose security first. That can slow things down, but these days the stakes are too high to make an error in this area. Second to security is reliability. You have to determine how well the communication abstraction works. Does it open and close connections to the remote data source consistently and properly? Are there too many errors when using it in the prescribed way? How many workarounds are required to get it to work the best way for your application? Is the effort expended in the integration of the communication API worth it? What are the suitable candidates for a good, solid, communication API that meets your requirements?
It is true that some situations do not require a secure, consistently reliable communications exchange between the program and a data source/data destination. An old example of that is instant messaging clients in which case retrying the communication is okay. In other cases, it is perfectly acceptable to start out with a highly convenient abstraction that may be uneven in its quality as a means to move along the software development with the goal of addressing it later. What you definitely want to do is evaluate your choices and hopefully have 2 or 3 choices you can alternate between.
Automatic Network Communication
Some network sources need a very specific abstraction in order to access them. Relational databases hosted on a server tend to be this way. In that case, you may be more limited in your choice of abstraction. Data residing on a web server, however, can be accessed in more ways than a database, depending on how the data is hosted, which ports are open, and which application protocols are in effect for accessing that data. When it comes to RSS feed data, the options for accessing it can be quite numerous.
Just like an HTML web page, an RSS feed can be accessed like a file. Since the RSS feed is generally encoded in an XML format, you can read the RSS data parts through an XML reader API. Many XML reader APIs these days also include network functionality built right inside the API. If you pass the XML reader a network address where the RSS feed data is located, it will execute the relevant network requests, acquire the data and contain it in an XML object you can process in the program. That can be highly convenient.
Depending on the program, this can be the right way to access data. Not necessarily XML, but using a data abstraction that can identify whether the data resides on a local storage device versus a network location and execute the appropriate means to retrieve the data. That can speed up your software development time tremendously and give you the space needed to focus on other parts of the software or the system overall. Indeed, unless you have an alternative abstraction ready to go, or are dealing with a peculiar network data source, it can be best to access networked data in this way in the beginning.
The downside of this approach is not all data APIs provide the means to fully customize the way they interact with a networked data source. Such data APIs may be fully competent in their core operation of processing the data, but may offer network retrieval as a convenience. When you encounter problems unrelated to data representation but related to network interaction itself, you could be left with little to no recourse at all in mitigating problems. It is another area where you have to balance the convenience of the API with the range of capabilities that may be lacking. Again, that does not mean avoid it, but simply know more of what you are getting and use it if appropriate to the present time or task at hand.
cURL and Data Exchange in Gautier RSS
I decided to use cURL. The cURL API is an abstraction over the operating system’s own, more granular, C API. I would say that cURL sits somewhere above the operating system API in terms of convenience and below an even more convenient API. I decided to use this API, which is written in C, because it is among the most widely tested APIs for network communications while also providing a good balance of capabilities. I then wrote a very thin wrapper in C++ atop cURL in a fashion that mirrored the API style I used throughout the Gautier RSS application.
At the application level, the Gautier RSS application merely passes around an RSS feed name and web address. This then gets passed to an API that takes the web address and passes it to cURL. At that one spot, that thin wrapper instructs cURL whether or not to use SSL and which additional values to send to the website besides the web address so that the network request can be fulfilled in a way that works best for the program. In this case, one of those additional values was to indicate to the website a preference for raw data instead of data with additional HTML formatting as well as the preferred RSS encoding. That eliminates additional code needed to disambiguate HTML from XML, speeds up processing, and makes conversion of the retrieved data more straightforward.
It is not a matter of if, but when you will get errors in data communications. Much of that depends on the situation. I have been in professional situations in which I have greatly lengthened the timeout for communicating with a database. That was acceptable in some situations since it was critical that the communication finish without interruption or retry. Stated differently, such implementations had few transaction recovery guarantees in favor of throughput.
In other cases, lengthy timeouts would be unacceptable since timing, distance, and the number of hops involved produced data loss. The wrong approach to communication errors and stalls in those situations resulted in time-consuming clean-up and costly rework. In other cases, timing is non-critical and the process is inherently resilient to communications errors due to the nature of the data involved. In all cases however, overall flow of the program will be impeded if communication error is not handled well.
Whether you use error codes or exception handlers (choose one and be consistent), you need to account for communication errors. One of your tasks is to determine your level of resiliency. A program such as the Gautier RSS reader is dealing largely with read-only data available at websites. This program pulls the data and shows it on the screen. The main communication error it has to deal with is whether or not the data retrieval operation succeeded, at a generic rather than granular level. Any number of issues can be the cause of a communication problem, and this program sees them all as a failure. When a failure occurs in this case, the strategy is simply to revert to the data that was last successfully retrieved. Such recovery from a UI standpoint is smooth and transparent. If data retrieval were more critical, more actions on the part of the program would be required.
You do yourself and others a great service when you determine the error handling process. It is true that security is not something you want to add into a system after the fact. That is great advice from experts. However, and more controversially, I would suggest considering your error handling approach but not implementing it right away. While it is good to have error handling, it is not good to have avoidable errors that are simply swallowed up by error handlers. It could be that the design of your program is not quite right when it comes to the proportion of successful versus unsuccessful behavior. Therefore, you may want to let the program crash while you are developing it so you can better understand the flaws in the program.
It is possible to write programs devoid of error handlers that nonetheless do not generate technical errors. When you can rigorously execute an error-free program (free of technical errors) in severe conditions, then you know the technical basis of the implementation is sound. After several rigorous trials running the program, you could still incorporate structured error handling. That way, if something changes in the conditions under which the program runs, you still have the means to identify and resolve errors that occur by logging the errors for later examination. Data communications is an area in which errors are likely to occur due to unexpected changes in server locations; data formats; access duration; and security interface requirements. A minimal application of error handling recognizes these possibilities (and more), allows you to circumvent the problems you can, and gives you some means of recovery when you cannot.
You will have to make a choice regarding the communication cycle. A chatty cycle is one in which you do some minimal communication and data exchange. Perhaps you want to establish the existence and availability of the server. A few seconds later, you do a little more testing before pulling the data. Or you may pull the data in small pieces for the purposes of recovery. You may decide to keep the line open until you are certain you retrieved everything you needed. Many people do that, but I generally avoid it.
Generally speaking, I try to keep the network communication duration as short as possible. The downside to that is for the next exchange I initiate, communication with the server may be unavailable. That can hurt when attempting to keep up a semblance of a complete transaction between the client and server. The upside to this approach is more clients are able to communicate with the server at the same time. Whichever approach you take, factor in the pros and cons into your decision.
What I try to avoid is an offline data pattern. I primarily avoid it simply due to the amount of effort required. Yet I chose that pattern for the Gautier RSS reader application. The way it works is, I attempt to retrieve some data. If the data is available over the network, I save a copy to the local storage device, either as an explicit file the program knows about or in a local database. On the next attempt to retrieve a newer version of the same data, if the network communication fails, I fall back to the saved database entry or file.
That offline pattern makes the communication process seamless and eliminates data access disruption from the standpoint of the user interface. Overall, it keeps things going from a functionality standpoint. I applied the database approach in the first versions of the program. However, that required a lot of code, and I really did not want to maintain a lot of code in this case. It was unnecessary for this program. Later on, I switched to the file approach. It was simpler, since my needs regarding the acquisition and display of the data were simpler. This pattern works well in an RSS reader program, but it does not work as well in certain kinds of programs, such as the parts of a banking app dealing with real-time money transfers.
Transfer Medium

This part is actually a huge topic that I will not spend a lot of time on. Transfer medium covers a lot of ground, but I will emphasize one part of it. You should think about the typical use case involving communications and data exchange. What I mean here is that you need to consider the primary connection hardware you or the users of the software will use. Each type of hardware will have a dominant impact on all the choices and considerations mentioned so far. I am not saying you should begin here, but as you become aware of the hardware involved—WiFi vs. hard-line Ethernet vs. cellular vs. serial vs. T1 vs. hypervisor LAN—you will want to design the solution accordingly.
In most cases, the communication abstraction will not care about the actual hardware involved because the operating system will generally wrap that through a generic TCP/IP interface of some kind. Different hardware drivers for each kind of hardware will cooperate with the operating system to present the hardware through a common software interface when possible. When not possible, there will still be a software interface at some level that will often be accounted for by many of the abstractions used in software to access a network through a defined piece of hardware. However, the qualities of that hardware will have a huge impact in terms of response time; network level security requirements; reliability; allowed data encoding; and quality of encoded data retrieved or posted.
The preceding illustrates that you have to take much into consideration. Further, do not take for granted that software operating well over a more difficult or tenuous transfer medium automatically confers benefits if the transfer medium changes. The reason is that you may have a rotation of acute issues in terms of emphasis that you will have to account for in your error handling approach. Fortunately, Gautier RSS is a non-critical software program that does not have to take much of this into account, but you should still consider and weigh these issues each time.