Rubikloud Labs

Big Data – Big Questions

Posted in Engineering, Labs
By Laura Leslie on August 5, 2014


As technology advances, the world’s digital data is now doubling almost every three years. This exponential growth is driving innovation and improved efficiency across multiple industries.

When people talk about large amounts of data, we hear terms like machine learning, deep learning, data mining, data lakes, and batch and real-time processing. But what do these terms actually mean?

Looking to uncover some insight, I asked a few of the Rubikrew for their perspective on the topic. Each of them works closely with big data or builds products in the industry, and each had some interesting things to say about what it means to them:

The first engineer I asked described his first impression of big data: “I thought it was referring to the volume of data that you are processing, meaning it is so big it cannot fit onto a single computer. A single computer can fit a lot of data, hundreds of terabytes even.” He added that people should try harder to do their computation on a single computer. It would save time, as writing a program for multiple computers is often far more difficult.

Another engineer at Rubikloud echoed this point: “People don’t realize that not using big data techniques is almost always preferable to using them. ‘Big Data’ techniques are a concession that sometimes has to be grudgingly made to the realities of your dataset and the questions you have of it – it’s not something to aspire to in and of itself. Big data techniques are almost always less accurate, more expensive, and more complicated than the traditional relational/batch processing that works on normal datasets. However, because it’s a problem that is symptomatic of success, many companies trip over themselves to use ‘Big Data’ on datasets that would be better served by conventional processing.”

Interested to hear a data junkie’s view, I then spoke with one of the Rubikrew’s data scientists and asked why she felt the term was being thrown around so much lately. “People realize that business decisions should not be based on intuition alone,” she explained. “A company needs to keep track of all of its metrics. The fact that there now exists good enough talent that makes good enough software to allow companies to make more money has fuelled the hype.” When asked how she would like to see big data benefit the world further, she told me she hopes to see it used in the health industry: if doctors had more records at their disposal, they would be better equipped.

I asked our creative guy what all this big data talk meant to him. Big data affects everyone differently, and he sees it from a different side than the others. “I think about the challenge we face,” he said. “There is the challenge businesses face with visualizing their data. One terabyte – anyone can do, but to handle mass amounts, constantly growing and streaming daily, that is a whole other challenge.” He went on to explain that businesses will find themselves in one of two buckets: A) they understand the big data need and challenge but may not have the resources to handle it efficiently, or B) they don’t understand the major differences between working with a palatable amount of data and working with big data. The latter group will be left in “a world of hurt,” he explained.

Finally, to round it out, I asked our product guy for his two-sentence version of what big data ultimately means to him: “Big Data simply means processing and storing large quantities in a more efficient manner than traditional database systems.”

When trying to understand big data, it is important to remember that it is only as good as the information derived from it. Aggregating and storing data can be an arduous process, but efficiently analyzing and using the information is the key to understanding your business.