Gautier RSS Update – 7/14/2018

I looked into the C++ version of cURL. While it appears productive, I decided the C-based implementation is suitably abstracted from the rest of the program through the http class I have defined. Since C++ cURL relies on C-based cURL, I decided that sticking with the latter would be more efficient from a build/compile/link/deploy standpoint. That also keeps dependencies to a minimum.

I also eliminated all references to the POCO C++ library in the areas of HTTP, networking, and file system interaction. The POCO C++ library now just handles XML parsing (for which I may switch back to Apache Xerces, which I used in a 2015 version of the RSS program) and minor string value tasks. That cuts the dependency on a large library that I found does not handle everything well, in favor of libraries specialized to specific tasks.

The compiled size of the debug version of the program did not change. The size still holds at 1.2MB. However, observed efficiency is better. The latest update for 7/14/2018 on github represents everything stated so far. You can build it on Linux using the build script if you have development versions of libcurl, POCO, and GTKmm 3 installed. Other than tweaking which XML library I use, the plan is to work on a more detailed description of the program at michaelgautiertechnology.wordpress.com.

Gautier RSS 100%

The last update had issues pulling web content delivered over SSL. I solved the problem by switching away from the POCO C++ libraries for the web content download. In their place, I fully embraced cURL. The implementation of cURL I use is the primary one written in the C language instead of C++. The cURL website at https://curl.haxx.se/ has links to fully fledged examples that are vetted and ready to go.

The transition was smooth with 0 debugging or adjustment needed. Now rss feeds are retrieved consistently with a cross platform network communication library that works everywhere. I actually did not use the website itself. Instead, the website had a pdf document titled, Everything cURL. That is a very handy reference that had literally everything I was looking for in terms of API functions I needed to call and ways to do it. In any event, the 7/12/2018 commit to github has the final version of Gautier RSS for the desktop designed in theory to work on Linux, macOS, and Windows.

Later versions will be the usual cosmetic adjustments in terms of how the code is structured and how information is visualized on screen. I will eventually seek to update the use of cURL with the C++ version to keep things consistent. After some more testing, I will look into packaging the rss reader for inclusion in mainstream Linux repositories at a minimum. This was a long running project spanning several years, but it is good to have finally arrived at this milestone.

Gautier RSS Reader 1% Remain

The latest commit on github reflects a solution that is 99.9% done. I have used the reader over the last few days since the previous update. A time boundary now exists. A given feed is never requested more than once in an hour. POCO C++ libraries streamlined that a bit. I changed the rss feed parse to not only focus on the main data values for each headline but do so with a more compact recursive function. The overall solution looks great.

Other than that, I also studied request headers using Mozilla Firefox Developer Tools to tune the request headers emitted through the http interface in the program. I made great progress there for websites accessible on port 80. I am almost at 100%. The POCO C++ library has an issue with the basic http client abstraction when accessing some websites on port 443 (SSL and, increasingly, TLS). I am sure it works from Microsoft Windows, but on a mainstream Linux desktop, the standard abstraction for HTTPSClientSession in the POCO C++ library does not work well. Only www.phoronix.com is inaccessible from the current reader implementation. Content comes back but decodes garbled. I suspect an issue with certificate authority access and client selection. No matter, I prefer an abstraction that handles it well with minimal initialization and API calls. I plan to change the program to use a different network communication library. Present candidates are curl and cpr. Fortunately, most of the feeds I use are accessible on port 80, which means all the feeds I am testing with except one pull into the reader for viewing and access.

Very simple RSS reader by Michael Gautier

Gautier RSS Reader 99% Complete

After 4 years, I finally have a version of the rss reader implementation I find acceptable. The program is now fully functional according to the goals I established a year and a half ago. The full source code is available on github with a commit date of July 4, 2018 at 4:44PM CST. The succession of 4s was not planned. As I started writing this, I noticed a single missed piece of functionality. After I added it, the time was 4:44PM as I posted it to github.

Anyway, I ran the program and it works as I intended. A feed is pulled, shown on the screen, and a button can be clicked to open the feed in the web browser. New feeds can be saved. A few websites do not recognize requests from the program. That is due to the user-agent string I selected. On the advice of the Mozilla website, I chose googlebot for the user-agent, which causes some sites to ignore feed requests from the program. I am contemplating whether a user-agent adjustment is in order, but I am thankful for the websites that do recognize the feed requests. With the functionality proven out, the next step is to put in time limits. That will prevent excessive network requests.

All in all, I finally got the program I was looking for all those years. I can recompile it for Linux, Windows, and Mac and it should still work on those operating systems. The underlying APIs are stable in the form of POCO C++ libraries and Gnome GTK (C++ rendition). The time limits are the final, small piece, so it is 99% rather than 100%. Beyond that, the amount of code written is not excessive and the UI is manageable. The general program definition is a solid expression of the program’s intent. The path has been long, as some things take a while, but it is good to finally have reached this point.

Gautier RSS Reader Updates – 6/29

About a month ago I decided to revamp the RSS reader API. I am satisfied with the user interface, but I wanted to revisit the underlying code for pulling and saving rss feed data. The previous API was okay. The list of feeds was read from a file. The list of headlines from each feed’s website was pulled based on the website address for the feed. It worked. However, I found that not every website that publishes feeds uses the same format. That means that in some cases, you get a description for a headline, but in other cases, you do not.
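One way to model a headline whose description may or may not be present is sketched below. The headline struct, its field names, and the summary_text helper are hypothetical illustrations and not the data structures used in the program.

```cpp
#include <optional>
#include <string>

// Hypothetical headline record: title and url are always expected,
// while description is left empty when the feed does not supply one.
struct headline {
    std::string title;
    std::string url;
    std::optional<std::string> description;
};

// Returns the description when the feed provided one,
// otherwise falls back to the title.
std::string summary_text(const headline& item)
{
    return item.description.value_or(item.title);
}
```

Making the missing case explicit in the type means the user interface has to decide what to show when a feed leaves the description out.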

I decided to implement the ability to pull the actual news article content. While it is true that I could have simply added this to the previous API, I had another goal as well. Most websites have a limit on how often you can connect to them when it comes to things like RSS feeds. As an example, Slashdot has a limit of 1 connection every 30 minutes. If you connect more often than this, you could be blocked. I realized I had an API that pulled all feeds in the feeds lists with associated headlines data every time the RSS program ran. That was a problem.

The goal then is to get to the point where time limits apply to the API. I am not there yet because I realize time-stamp data will not be consistent between websites. I will have to generalize an API to handle different date/time representations in connection with C++ / C time functions. At the same time (the puns are unavoidable) I will need to plan the right approach to introduce time limit control in the program. I will either allow the user interface to drive this, or I will pre-program a limit into the API (say an hour). I wanted to use time values based on when the program runs but that is not a good approach in this case. Rather, the time will have to be based on the time of the file that contains headline data.
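To make the date/time problem concrete, here is a minimal sketch that tries a few common layouts with std::get_time from the C++ standard library. The function name parse_feed_date and the list of formats are assumptions for illustration. The sketch also ignores any time zone suffix such as GMT or Z, which a real implementation would need to handle.

```cpp
#include <array>
#include <ctime>
#include <iomanip>
#include <locale>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical helper: tries each known layout in turn and returns the
// broken-down time for the first one that parses, or nothing at all.
// Assumption: any trailing time zone text is ignored.
std::optional<std::tm> parse_feed_date(const std::string& text)
{
    static const std::array<const char*, 3> formats = {
        "%a, %d %b %Y %H:%M:%S", // RFC 822 style, e.g. Sat, 14 Jul 2018 16:44:00 GMT
        "%d %b %Y %H:%M:%S",     // RFC 822 style without the weekday
        "%Y-%m-%dT%H:%M:%S"      // ISO 8601 style, e.g. 2018-07-14T16:44:00Z
    };

    for (const char* format : formats) {
        std::tm parsed{};
        std::istringstream input(text);

        // Use the classic "C" locale so English month and day names match.
        input.imbue(std::locale::classic());
        input >> std::get_time(&parsed, format);

        if (!input.fail()) {
            return parsed;
        }
    }

    return std::nullopt;
}
```

Returning an empty result for an unrecognized date lets the caller fall back to another reference, such as the time of the file that holds the headline data.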

The past month I was able to commit about 1 – 3 nights a week to the endeavor. I would start around 10 PM or 11 PM and finish around 3 AM. The available time was rare and it has been tough staying motivated enough to make progress. The normal daytime experience afterwards could be quite rough. However, I stuck with it. The biggest breakthrough came when I had a couple of hours on a day off where I could just sit down and not write any code but instead plan everything on notebook paper in a coffee shop. I sat and thought deeply through the API expression, the next level goals, and the foundation for future progress. I needed an approach that would give me the biggest gains in quality in a short amount of time (infrequent late night/early morning sprints).

After I finished writing and planning the new API on notebook paper, I paused. That same hour, I dived right in and wrote the first code for the API but just the function definitions without the implementation. Some people call that “stubbing”. I was never a fan of stub function implementation in the past. No real reason, but in this case, I found that doing the stub functions provided progress and maintained commitment to the end goal. Over several weeks and when fatigue was low and motivation arose, I incrementally worked the new API into existence.

When I ran the command-line version of the program a day or two ago, it was funny how the output was the same but the underlying implementation was different. That is a good result since re-integration with the user interface will be fairly straightforward. The “meat” of the API is done, but it will take several revisions to get the time-based limit control just right. The POCO C++ libraries may prove useful in that regard. POCO has all the functions I need to do the time-based limit control in a cross-platform, universal manner. It will come down to the best abstraction from the perspective of the user interface and command-line programs in terms of how the time values are retrieved, stored, and evaluated.

The most recent updates described here are in the 6/29/2018 commit to github. More updates will follow. As I work through this, I realize that to get the RSS reader to a pristine level of functionality, even for such a simple process, the final fit requires some strong reworking of the article presentation. I have a raw understanding of where I would like to go with that, but it is too early to describe that at length as I am still deciding between an easy and a more extensive effort.

Gautier RSS using gtkmm 3.0 – IV

The rss program now uses WebKit. One of the advantages of using GTK+ (I am using the gtkmm wrapper via C++) is WebKitGTK+ merges in very well. This is the web browser technology foundation originally in Google Chrome and Apple Safari. I didn’t really care about that however. What is useful is the integration with GTK+ and the ability to present web pages accurately. The June 7, 2018 version of Gautier RSS now shows web content within the program. When you click on an article headline, the web page related to that headline is shown at the bottom.

In terms of general capabilities, the program, Gautier RSS, is done. I was able to cover more ground with GTK+ through the gtkmm wrapper in C++ to reach this point. Continued efforts will be to polish and refine this program. The program is stable and is usable as is. Yet, one goal is to reduce network calls to maybe once an hour. Providers of RSS feeds tend to frown on too frequent access to the same feed, and an RSS program is not truly ready until feed accesses are kept to the minimum needed to build headlines.