Recently, someone asked if I could show them something about databases. Nearly all of my professional work has involved Microsoft SQL Server on Windows, and my home computer was not set up for that. Rather than spend the effort of setting up a Windows virtual machine, I thought this would be a good time to see how far I could get with PostgreSQL in my Ubuntu environment. Below are the results of my exploration.
Set Up PostgreSQL and Tools
The first thing you have to do is install PostgreSQL. Next, you will want to change the password for the default postgres user. After that, you can install pgAdmin 3, which plays a role similar to Microsoft SQL Server Management Studio, which is very convenient. The recommended instructions for installing PostgreSQL on Ubuntu can be found on their website at the following link.
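On Ubuntu, the basic setup can be sketched with the commands below. This is a sketch from memory of the 2014-era Ubuntu repositories, not the official procedure, so defer to the instructions on the PostgreSQL website.

```shell
# Install the PostgreSQL server and the pgAdmin 3 GUI from the Ubuntu repositories
sudo apt-get update
sudo apt-get install postgresql pgadmin3

# Open a psql session as the default "postgres" user
sudo -u postgres psql
# Then, at the psql prompt, set a password:
#   \password postgres
```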
pgAdmin: A Gateway to PostgreSQL
Everything you need to do with PostgreSQL can be done on the command line. However, it is often more expedient to see the various tables and other database minutiae organized in a tree hierarchy. Likewise, common operations can be executed straightforwardly when you do not have to look up the invocation syntax for each one. pgAdmin appears to work quite well for this purpose. It is very similar to Microsoft SQL Server Management Studio but, from my short time with it, seems even more versatile.
In this case, we are going to connect to the PostgreSQL service running on my computer and begin working with a database I’ve already created. As you can see from the screenshot below, the pgAdmin tool presents a wealth of information that gives us good detail about the databases. In this case, I’ve drilled down into the tables listing for the selected database.
While I was exploring PostgreSQL, I thought it would be a great time to run some rudimentary analysis on my own work history. In order to do this, I needed to populate a database with cross-referenced data that I could then run queries on to surface relevant connections between where I’ve been and what I’ve done. After visually scanning through my own LinkedIn profile and hand-entering information, I had the basic set of information I needed to run queries. Before doing that, though, let’s look at some metadata about our data.
We look through the information_schema views to see, at a glance, the tables we have set up to this point. It is a useful guide for keeping track of which tables we’ve added to a query versus those yet to be added. The information_schema is quite valuable in reducing the clicking and right-clicking in visual tools, letting us examine the configuration of tables and other database objects on one screen.
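A quick way to get that at-a-glance listing from the query tool is a SELECT against information_schema.tables; this sketch assumes the tables live in the default public schema:

```sql
-- List all user-defined tables in the public schema
SELECT table_name
FROM information_schema.tables
WHERE table_schema = 'public'
ORDER BY table_name;
```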
Build Database Views About Our Work Roles
We begin by building a general query that links together all the information about our work participation. Now, we could simply run a query against the employmentroles table alone and be done. Yet a generalized, consolidated query gives us a more time-efficient way to address multiple points of consideration from a single source of data as our questions change over time. It also automatically places the data into the right context in terms of the relevant data relationships. As you can see in the screenshot below, we have this consolidated database view in place.
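As a sketch of what such a consolidated view might look like: apart from employmentroles, the table, column, and view names below are illustrative assumptions, not the exact ones from my database.

```sql
-- Consolidated view linking companies to the roles held there
-- (all names other than employmentroles are hypothetical)
CREATE VIEW workparticipation AS
SELECT c.companyname,
       r.rolename,
       r.startdate,
       r.enddate
FROM companies AS c
JOIN employmentroles AS r
  ON r.companyid = c.companyid;
```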
Query Work Participation Information
When we run the query, we see details beyond basic work history begin to take shape. However, there is a lot of noise here that we can reshape to see more clearly what we need to know. This begins the journey of filtering out the noise to create clarity.
Where and What
We now have a more summarized view of the information, as shown below. We took the consolidated, cross-linked data and reduced the Cartesian product down to just the unique set of information we need, minus anything we really do not. In the example below, we definitely do not want any entries that are unrelated to job roles, so the WHERE clause omits those entries. We then use a GROUP BY clause to ensure each pair of company name and job role appears no more than once.
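In SQL terms, the filtering and de-duplication step looks roughly like this; the view and column names are hypothetical stand-ins for the ones in my database:

```sql
-- Keep only rows that have a job role, and collapse duplicates
-- so each (company, role) pair appears at most once
-- (workparticipation is a hypothetical consolidated view)
SELECT companyname, rolename
FROM workparticipation
WHERE rolename IS NOT NULL
GROUP BY companyname, rolename;
```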
Aggregate and Order
We are just about ready. We take the information filtered up to this point and run an aggregate SQL function over the result set. Given a role name, how many times have we held or applied the functions of that role across the companies in which we have participated? The COUNT() function will help us quickly determine that as we group the data by role name: for each grouped role name, how many rows involve it? The screenshot below shows the results so far.
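The counting step might be sketched as follows, grouping the de-duplicated pairs by role name; as before, the names here are hypothetical stand-ins:

```sql
-- For each role, count the companies in which it appears
-- (workparticipation is a hypothetical consolidated view)
SELECT rolename,
       COUNT(*) AS companycount
FROM (
    SELECT companyname, rolename
    FROM workparticipation
    WHERE rolename IS NOT NULL
    GROUP BY companyname, rolename
) AS uniquepairs
GROUP BY rolename
ORDER BY companycount DESC;
```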
Keep Up with Queries
As we define each query, we also save it as a database view. Doing this allows us to refer back to those queries in the future rather than having to write them again. It is an excellent way to reuse a query throughout a database: the query is defined once, and any refinements to it apply everywhere it is referenced.
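Wrapping a query in a view is a one-line prefix, and CREATE OR REPLACE VIEW lets you refine the definition later without touching anything that references it. A sketch, with hypothetical names:

```sql
-- Save a role-count query as a reusable view (names are hypothetical)
CREATE OR REPLACE VIEW rolecounts AS
SELECT rolename, COUNT(*) AS companycount
FROM workparticipation
WHERE rolename IS NOT NULL
GROUP BY rolename;

-- Later, anywhere in the database:
SELECT * FROM rolecounts ORDER BY companycount DESC;
```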
Output the Query to HTML
We are going to use string concatenation to pull the information into a format that will work for publication. Oftentimes a query produces so many rows and columns that reformatting all that information by hand would take hours or days. The ability to systematically translate plain rows and columns into formatted rows and columns is a huge time saver. We will use our knowledge of HTML to produce the database results in HTML format, ready to copy and paste into WordPress.
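PostgreSQL's || operator does the concatenation; a sketch of wrapping each result row in HTML table markup, where rolecounts is a hypothetical view holding role names and counts:

```sql
-- Emit one HTML table row per result row
-- (rolecounts and its columns are hypothetical names)
SELECT '<tr><td>' || rolename || '</td><td>'
       || companycount || '</td></tr>' AS htmlrow
FROM rolecounts
ORDER BY companycount DESC;
```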
Fix Up HTML Output
Of course, just because you can easily do something in a tool does not mean the result will come out 100% the way you expect. In this case, we paste the output into a text editor to examine it before copying it from the text editor into WordPress. What we find is that there are double quotes on every line that will keep the HTML from being recognized the way we intend. We fix this as shown below.
Sure, there are probably options in pgAdmin to disable the automatic double quoting. Most of the time, though, you do want the double quotes, as they identify which pieces of data are text versus numeric in some copy-and-paste scenarios (such as pouring the data into a spreadsheet). We could also have written a program to do all of this automatically, but sometimes we do not have, or do not make, time for such things, and this seemed quicker.
The Published Result
Our data has gone from concept to analysis to published reality. The final output, from the May 2014 version of my WordPress profile page, is shown below. The application of PostgreSQL is now complete.