Security in the Era of Big Data

10 10 2012

Big data has become a buzzword lately. Companies see it as a lucrative technology that turns massive amounts of data from both online and offline sources into useful information for predicting behaviors and trends. When companies talk about big data, volume, velocity, variety, and veracity usually come to mind [1]. Security, however, is still an afterthought for many enterprises. In this article, I will discuss the security considerations of big data, especially the concerns related to NoSQL.

Before discussing the security issues, let’s take a quick look at big data and NoSQL. Big data is not just a matter of size. It is about the four Vs mentioned above: volume, velocity, variety, and veracity [1].

  • Volume: companies may deal with terabytes or even petabytes of information [1]
  • Velocity: how long does it take to turn big data into information? Can it be done in real time? [1]
  • Variety: both structured and unstructured data, such as data from sensors, video, audio, social media sites, and cellphone signals [1]
  • Veracity: what if you don’t trust your data source? How do you deal with untrusted data? [1]

Due to the complexity of big data, the traditional database management system, which stores structured data in a relational database, may no longer be suitable for large volumes of unstructured data. For example, companies need to think about how to deal with data such as graphs or audio that do not fit into the rows and columns of a relational database. As a result, instead of SQL databases, many big companies such as Google and Amazon use NoSQL for storing both structured and unstructured data [2].

What is NoSQL?

NoSQL stands for “not only SQL” or “not relational” [3]. NoSQL systems have six key characteristics [3]. For the purpose of our discussion, I will focus on only three of them (items 1, 2, and 6 below). The six characteristics are:

1)    Horizontal scalability [3]

2)    Data replication and distribution [3]

3)    Simple call level interface [3]

4)    A weaker concurrency model than ACID transactions [3]

5)    Efficient use of distributed indexes and RAM [3]

6)    Flexible schema [3]

Horizontal scalability is probably one of the best-known features of NoSQL. It means that NoSQL can distribute data and workload over multiple servers instead of improving a single server’s capability [4]. The economic benefit is that, rather than paying to upgrade one server, a company can use multiple relatively inexpensive pieces of hardware [4]. This also removes the single point of failure (or bottleneck) and increases availability and velocity [4].

Data replication and distribution go along with horizontal scalability. Data can be replicated and distributed over different servers, which allows faster and more efficient operations [3].
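
To make distribution and replication more concrete, here is a minimal sketch in Python of how a store might spread keys over several servers and keep a copy of each key on more than one of them. The server names, the replica count, and the function servers_for are hypothetical and chosen only for illustration; real NoSQL systems typically use more elaborate placement schemes.

```python
import hashlib

# Hypothetical server names, for illustration only.
SERVERS = ["node-a", "node-b", "node-c", "node-d"]
REPLICAS = 2  # assumed number of copies kept for each key


def servers_for(key, servers=SERVERS, replicas=REPLICAS):
    """Return the servers that should hold a copy of `key`."""
    # A stable hash, so every client maps the same key to the same servers.
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    first = int(digest, 16) % len(servers)
    # Place the copies on consecutive servers, wrapping around the list.
    return [servers[(first + i) % len(servers)] for i in range(replicas)]


print(servers_for("customer:42"))
```

Because each key lives on more than one server, losing a single machine does not make the data unavailable. It also means there are more places where the same data has to be protected.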

A flexible schema is the major innovation of NoSQL. A traditional relational database requires us to define the table structure in advance, while NoSQL allows us to add new attributes to data records dynamically without first defining a fixed table schema [3]. It also supports different kinds of data, such as graphs and documents [3].
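
As a rough illustration, the sketch below uses plain Python dictionaries to stand in for records in a document store. The collection name and the attribute names are made up for the example and do not come from any particular product.

```python
# Two records in the same hypothetical "customers" collection.
# Each record is a plain dictionary, so no table schema is declared up front.
customers = [
    {"id": 1, "name": "Alice", "email": "alice@example.com"},
    {"id": 2, "name": "Bob"},
]

# Add a new attribute to one record only; the other record is unaffected.
customers[1]["loyalty_tier"] = "gold"

# Code that reads the data has to cope with attributes that may be missing.
for customer in customers:
    print(customer["name"], customer.get("loyalty_tier", "none"))
```

Nothing in the store itself stops a record from gaining a new attribute, which is convenient for developers and is also the root of the security concern discussed later.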

There is a huge debate about whether SQL or NoSQL is better. Some people argue that SQL works just fine because most companies do not need to handle data on the scale that Google does [2]. For the purpose of this article, I will not focus on the debate. Rather, I would like to address some security concerns related to NoSQL.

Flexible schema

Since NoSQL allows attributes to be added to data records dynamically, a forward-looking security mindset should be established [5]. That means that before we add an attribute, we need to understand what will happen to it, what security concerns it raises, and what privileges should be granted on it [5].
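
One way to put this mindset into practice is to refuse any attribute that has no access policy yet. The following Python sketch is only an illustration of that idea: the policy table, the role names, and the three helper functions are all hypothetical and are not part of any NoSQL product.

```python
# Hypothetical policy table: every attribute must be registered with the
# roles allowed to read it before any record may carry it.
ATTRIBUTE_POLICY = {
    "name": {"support", "marketing"},
    "email": {"support"},
}


def register_attribute(name, allowed_roles):
    """Record who may read a new attribute before it is put to use."""
    if not allowed_roles:
        raise ValueError("decide who may read %r before adding it" % name)
    ATTRIBUTE_POLICY[name] = set(allowed_roles)


def set_attribute(record, name, value):
    """Add an attribute to a record only if a policy exists for it."""
    if name not in ATTRIBUTE_POLICY:
        raise KeyError("no access policy registered for %r" % name)
    record[name] = value


def visible_fields(record, role):
    """Return only the attributes that `role` is allowed to read."""
    return {
        key: value
        for key, value in record.items()
        if role in ATTRIBUTE_POLICY.get(key, set())
    }


record = {"name": "Alice"}
register_attribute("purchase_history", {"marketing"})
set_attribute(record, "purchase_history", ["stroller"])
print(visible_fields(record, "support"))
```

The point of the design is the ordering: the decision about privileges is forced to happen before the new attribute can appear in any record.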

Data distribution/dispersion

Unlike a relational database, which keeps each piece of data in a single location to maintain data normalization, big data systems allow data to be stored across different servers [6]. Therefore, it is harder to locate data and maintain its confidentiality [6].

Still a new model

Since NoSQL is a relatively new technology, security is not yet a built-in feature. Of course, that does not mean that the traditional relational database is free of vulnerabilities. However, since most of those vulnerabilities are publicly known, many countermeasures and policies have been implemented to prevent them from being exploited. For example, SQL databases already have strict access control and privacy tools, whereas NoSQL does not yet put much emphasis on them [6]. Like SQL, NoSQL is subject to input validation vulnerabilities (such as the potential threat of NoSQL injection), weak authentication, and unencrypted data [6].
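
To show what a NoSQL injection can look like, the Python sketch below assumes a document store whose queries are written as nested dictionaries, in the style of MongoDB operators such as $ne ("not equal"). If an application copies a value parsed from a JSON request straight into the query, an attacker can send an object instead of a string and change the meaning of the query. The function name and field names are hypothetical, and no database driver is used here.

```python
def build_login_query(username, password_hash):
    """Build a query document from untrusted input, accepting plain strings only."""
    for field, value in (("username", username), ("password_hash", password_hash)):
        # A dict or list here could smuggle in query operators such as
        # {"$ne": ""}, which would match any stored value.
        if not isinstance(value, str):
            raise TypeError(
                "%s must be a string, got %s" % (field, type(value).__name__)
            )
    return {"username": username, "password_hash": password_hash}


# A well-formed request produces an ordinary equality query.
print(build_login_query("alice", "5f4dcc3b"))

# Input parsed from JSON may be an object instead of a string; reject it.
try:
    build_login_query("alice", {"$ne": ""})
except TypeError as error:
    print("rejected:", error)
```

Type checking of this kind is only one layer of input validation, and it does not replace proper authentication or encryption of stored data.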

The People

Most people who work on NoSQL are new to the technology as well. They need to spend most of their time understanding the technology and making it work [6], so security is rarely among their priorities.

Big data, Big target

Companies that deal with big data by using NoSQL are more likely to be attractive targets for malicious parties, for two reasons: 1) attackers know that those companies hold a large amount of high-quality data, and 2) NoSQL is not yet mature and its security has not been well considered. Attackers are interested in finding new vulnerabilities and catching companies off guard.

Privacy

The earlier news story about how the big-box retailer Target figured out a girl’s pregnancy before her father did shows how a company can intrude on its customers’ privacy through data mining. Many users maintain multiple online identities to keep their activities from being associated with their real identity. However, with the ability to associate data from different activities and different systems, companies can easily consolidate our information [6]. That greatly reduces users’ ability to avoid being tracked by companies [6].

In conclusion, NoSQL is still a new technology and there are many security issues associated with it. Although it has been adopted by technology pioneers like Google, Amazon, and Yahoo!, we are still unsure whether it is a fad or the future of data management. Most companies should be cautious when considering NoSQL for dealing with big data. Moreover, security should not be an afterthought for NoSQL, because its ability to add new attributes dynamically makes it inherently riskier than traditional SQL. Thus, a forward-looking approach to security is necessary if we are moving to the NoSQL stage.

___________

  1. What is big data? IBM. Web. <http://www-01.ibm.com/software/data/bigdata/>.
  2. Cogswell, Jeff. SQL vs. NoSQL: Which Is Better? 2012. Web. <http://slashdot.org/topic/bi/sql-vs-nosql-which-is-better/>.
  3. Cattell, Rick. Scalable SQL and NoSQL Data Stores. 2011. Web. <http://cattell.net/datastores/Datastores.pdf>.
  4. Keller, Eric. Horizontal Scalability. BitLancer, 2012. Web. <http://www.bitlancer.com/blog/2012/08/horizontal-scalability/>.
  5. Chickowski, Ericka. Does NoSQL Mean No Security? Dark Reading, Jan 11, 2012. Web. <http://www.darkreading.com/database-security/167901020/security/news/232400214/does-nosql-mean-no-security.html?itc=edit_stub>.