Bill Franks, author of The Analytics Revolution, recently wrote a fascinating exploration of database structure in Forbes magazine. Putting “database structure” and “interesting” in the same sentence might sound like an oxymoron, but bear with me.
When most people think of databases, they think of relational data: information organized into fields. A relational database captures specific details such as name, address and phone number in a fixed way, so that each field conforms to set standards such as type (text or numbers) and length.
If you have never designed a database, think of it like this. You have rows of information, which are the records; let’s say each one describes a customer. Then you have columns of information, each of which is a field of data such as a phone number or email address. So each record contains many fields: each “person” has a name, phone number, address and so on.
For many years now, the standard way of querying a relational database like this has been SQL, or Structured Query Language. SQL is a language for describing the information you want from a database and leaving the database to work out how to retrieve it. In simple terms, if you want to extract all the entries from a list of personal details where the date of birth is before 1980, that is a single short query. It’s just like asking a question and having the database return the answer.
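To make the date-of-birth example concrete, here is a minimal sketch using Python’s built-in sqlite3 module. The table name, column names and sample people are all made up for illustration, and storing dates as ISO-formatted text is an assumption that keeps the comparison simple.

```python
import sqlite3


def people_born_before(conn, cutoff):
    """Return (name, date_of_birth) rows with a birth date earlier than cutoff."""
    cur = conn.execute(
        "SELECT name, date_of_birth FROM people "
        "WHERE date_of_birth < ? ORDER BY date_of_birth",
        (cutoff,),
    )
    return cur.fetchall()


# An in-memory database with one hypothetical table of personal details.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, phone TEXT, date_of_birth TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [
        ("Alice Example", "555-0100", "1975-03-14"),
        ("Bob Example", "555-0101", "1988-11-02"),
        ("Carol Example", "555-0102", "1962-07-30"),
    ],
)

# ISO dates (YYYY-MM-DD) sort the same way as text and as dates,
# so a plain "<" comparison answers the question.
older = people_born_before(conn, "1980-01-01")
print(older)
```

The whole question fits in one SELECT statement because every record is guaranteed to have a date_of_birth column in a known format.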
But in the world of Big Data, many of our assumptions about how a database is structured and how we can query it no longer hold. There may be no fixed structure at all in a universe of data that is constantly expanding and changing. This makes querying a Big Data set a very different process.
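As a rough illustration of what “no fixed structure” means in practice, the sketch below asks the same born-before-1980 question of records that do not share a common layout. The records and field names are invented for this example, and real Big Data tools work at a very different scale; the point is only that the query has to cope with fields that are missing, renamed or nested.

```python
# Hypothetical records: each one stores the birth date differently, or not at all.
records = [
    {"name": "Alice Example", "date_of_birth": "1975-03-14"},
    {"name": "Bob Example", "born": 1988, "email": "bob@example.com"},
    {"name": "Carol Example", "profile": {"birth_year": 1962}},
    {"name": "Dan Example", "phone": "555-0103"},
]


def birth_year(record):
    """Return the birth year if the record holds one in a known place, else None."""
    if "date_of_birth" in record:
        return int(record["date_of_birth"][:4])
    if "born" in record:
        return int(record["born"])
    profile = record.get("profile", {})
    if "birth_year" in profile:
        return int(profile["birth_year"])
    return None


def born_before(records, year):
    """Return the names of people whose birth year is known and earlier than year."""
    matches = []
    for record in records:
        found = birth_year(record)
        if found is not None and found < year:
            matches.append(record["name"])
    return matches


names = born_before(records, 1980)
print(names)
```

Most of the code here deals with finding the data rather than asking the question, which is the work a fixed relational structure would otherwise do for you.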
Of course, this is well known, and many tools already exist to support Big Data analysis. What Franks argues, though, is that the skills and tools we need today are the ones we used before SQL became the accepted standard for interacting with a database. Before we got all organized and relational, people had to query data in a much less structured way.
His book addresses this in more detail, but I find it fascinating that we can sometimes forget what we already knew about accessing data. Perhaps there will be a greater demand for people who can remember how data was queried before SQL became commonly used? It’s an interesting idea and goes to show that, in technology, the new is not always new.