
International Journal of Computer Engineering and Applications, Volume XII, Special Issue, August 18, www.ijcea.com ISSN 2321-3469

BIG DATA: ERA OF PRIVACY AND SECURITY CONCERNS

Vidisha Kumari1 and Sarabjot Singh2
1,2 Department of Information Technology & JECRC, Jaipur, India
[email protected], [email protected]

ABSTRACT: We create 2.5 million terabytes of data every day, and as of 2012 almost 90% of the world's data had been created in the preceding two years. This enormous growth has created a pressing need to secure data in order to avoid data breaches and to comply with regulation. Data security and data governance can be attained by combining appropriate security tools, customized configuration, clear policy definitions, and consistent application of best practices. This paper highlights the security and privacy problems that arise mainly from the use of multiple infrastructures for processing big data and from the use of newer infrastructure technologies, such as NoSQL databases, that have not been thoroughly vetted for security issues. The security methods discussed for addressing these problems include secure data storage and granular access control.

KEYWORDS: Data, Big Data, Privacy, Security, NoSQL

INTRODUCTION: We are already familiar with traditional data management techniques, where the data is mostly structured. In big data, we deal with unstructured and raw data, a.k.a. “gray data”, which is generated at a much faster rate than ever before. When handling big data and working on its privacy and security, we mainly need to consider the 3 V's: Volume (quantity of data), Variety (array of different data types) and Velocity (swiftness of data processing).[5] Examples of big data sources include multinational organizations, clickstreams, records of users' social media patterns, and search logs. Big data analytics applies smart algorithms, together with artificial intelligence and machine learning, to large data sets to discover hidden patterns relevant in various scenarios.


Figure 1 Increasing volume, variety and velocity of data

Stage/Year: Real-time Analytical Processing, 2010-2015
Characteristics: Decisions are taken automatically for data streams that are generated from live channels or machine-to-machine applications. It is used to implement real-time rules for incoming events and existing events within a domain.
Examples: Apache Spark, Amazon Kinesis, Google Dataflow

Stage/Year: Stream Processing, 2010-2013
Characteristics: Data is pushed continuously as streams before it is stored. The incoming pattern in the streaming data is usually unpredictable, so these data streams are processed using high-availability solutions.
Examples: Hadoop Streaming, Google BigQuery, Google Dremel, Apache Drill, Samza, Apache Flume/HBase, Apache Kafka/Storm

Stage/Year: SQL-like, 2008-2010
Characteristics: It accesses the data stores and provides simple programming interfaces for queries. The functionality offered is similar to that of traditional data warehousing mechanisms.
Examples: Apache Hive/Pig, PrestoDB, HStore, Google Planner

Stage/Year: Ad-hoc (NoSQL), 2005-2010
Characteristics: Overcomes a shortcoming of distributed file systems (DFS), which are better suited to sequential data access, by providing random read/write access. Supports the storage of large unstructured data sets by adding column-based and key-value stores.
Examples: CouchDB, Redis, Amazon DynamoDB, Google Bigtable, HBase, Cassandra, MongoDB

Stage/Year: Batch Processing, 2003-2008
Characteristics: Data of the same type is collected from different sources and processed collectively to produce batch results. Parallel programming models are used for efficient processing.
Examples: GFS, MR, HDFS, Apache Hadoop

Figure 2. Evolution of Big Data[6]


1. USES: Big data is used in healthcare to predict patients' risk for certain rare diseases and to track the spread of influenza viruses.[7] Healthcare is becoming more and more dependent on information technology, and the health sector is collecting and sharing a variety of data related to patients' medical reports, clinical notes, and other medical records. In business,[8] big data is the key to a new era of personalized service delivery: by adding intelligence to the current business model, it improves business efficiency and yields a competitive advantage. With vast amounts of financial data being generated and stored, it becomes easier for analysts to extract meaningful hidden insights and detect fraud.[21] One of the major global challenges, global warming, can be better understood and tackled by taking steps based on the insights produced by big data analytics. With rapid advances in smart sensor technology,[9] sensor data is used to predict peak demand at multiple scales, which helps increase the effectiveness of energy departments. The same data can be used to detect electricity theft and fraud. Big data is also changing the public sector, where it plays an important role in promoting transparency and government accountability. On the other hand, many analysts and scientists view these data sets as gold mines that can be exploited by individuals and groups.

2. PRIVACY: When big data is deployed at larger scales, the ethical and privacy concerns need to be taken into serious consideration. Data subjects (i.e., individuals) should have sustainable control over their data, to prevent abuse and misuse by data controllers, while data utility is preserved, i.e., the importance of big data for innovation, knowledge and pattern discovery, and economic growth.[10] Edward Snowden's NSA leaks in 2013 revealed the software “XKeyscore”, which was used by government agencies such as the NSA. It was an analytical tool that allowed the collection of almost everything done on the internet. This poses a serious threat to privacy in the context of big data. By applying machine learning and AI to databases such as XKeyscore, it would be possible to record individuals' behavioral patterns, lifestyle, worldview, sexual orientation, emotional state of mind, religious and political views, ethnicity and other personality traits. This could be used to discriminate against individuals. Take insurance companies as an example. They could collect and analyze information from public repositories and patients' web search logs to discover insights that can be used to discriminate against specific individuals on the basis of sexual orientation or gender identity. So although such analysis helps them optimize their service quality and efficiency, it also creates bias.[11]

2.1 Aggregation of data: The need to aggregate data from different sources for research is causing a major problem. Big data is collected from different stakeholders, each with its own set of rules for its data sets, so mapping the data together is a big issue.[12,21] Another major issue that industries face nowadays is that the aggregated data may contain sufficient information to identify individuals and their personal information.[13] Industries can adopt a risk-based approach, identifying the different risks in big data and prioritizing them to reduce the damage. The risks of aggregation differ from one data set to another, so there is no single way to secure all data coming from different sources that carry different risks. Hence it is a big problem to do justice to both data collection and privacy at the same time.[14] Some organizations are already combining data collections by aggregating data sets, but privacy remains a problem that requires further thought.
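The re-identification risk described in Section 2.1 can be illustrated with a small check. The sketch below uses a k-anonymity style measure, which is our choice for illustration and is not a method named in this paper: it reports the size of the smallest group of records that share the same combination of attributes. The function name, field names and records are all hypothetical.

```python
from __future__ import annotations

from collections import Counter


def smallest_group_size(records: list[dict], attributes: list[str]) -> int:
    """Return the size of the smallest group of records sharing the same
    values for the given attributes (0 if there are no records).

    A result of 1 means at least one record is unique on those attributes
    and could therefore be singled out.
    """
    if not records:
        return 0
    groups = Counter(tuple(r[a] for a in attributes) for r in records)
    return min(groups.values())


# Hypothetical aggregated records (invented for illustration only).
records = [
    {"zip": "302001", "birth_year": 1990},
    {"zip": "302001", "birth_year": 1990},
    {"zip": "302001", "birth_year": 1985},
    {"zip": "302002", "birth_year": 1990},
    {"zip": "302002", "birth_year": 1985},
]

k_zip_only = smallest_group_size(records, ["zip"])
k_zip_and_year = smallest_group_size(records, ["zip", "birth_year"])
```

With a single attribute every record hides in a group of at least two, but once a second attribute from another source is added, some records become unique. This is one simple way to see why aggregation can raise the privacy risk even when each source looks harmless on its own.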

3. SECURITY: Big data also raises issues of data security. Hadoop, one of the first frameworks built for big data, was designed to address massive data storage and rapid processing, but it was not developed with data security in mind.[15] Storage security in big data can be implemented by dividing the data into subparts and storing them in different cloud storage services. The sharing of data is another issue that deserves attention.[16] To get the maximum benefit from big data, we usually share it across the cluster, so ensuring security during the sharing process is a problem. One solution is for the data owner to apply access controls over the data so that it cannot be accessed by any untrusted party. Many organizations keep their data in a big data system, and whenever a system failure or natural disaster occurs, the organization is in great danger. So, we should specify protocols that ensure the system can recover by itself in such situations.[17] A further issue is threats. There are two major types: the first is injecting false data into the raw data or stealing a large volume of sensitive data; the second is gaining access to data sets that have already been analyzed.
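The idea of dividing data into subparts and storing them in different cloud storage services can be sketched as follows. This is a minimal illustration under stated assumptions: each "cloud store" is simulated by an in-memory dictionary, and the function names and the manifest layout are hypothetical. Plain splitting is not encryption, so each part remains readable on its own; a real deployment would presumably also encrypt the parts.

```python
from __future__ import annotations

import hashlib


def split_into_parts(data: bytes, n_parts: int) -> list[bytes]:
    """Split data into n_parts contiguous parts of near-equal size."""
    if n_parts < 1:
        raise ValueError("n_parts must be at least 1")
    size = -(-len(data) // n_parts)  # ceiling division
    return [data[i * size:(i + 1) * size] for i in range(n_parts)]


def store_across(data: bytes, stores: list[dict]) -> list[tuple[int, str, str]]:
    """Store one part per (simulated) cloud store.

    Returns a manifest of (store index, key, SHA-256 digest) kept by the owner.
    """
    manifest = []
    for index, part in enumerate(split_into_parts(data, len(stores))):
        key = f"part-{index}"
        stores[index][key] = part
        manifest.append((index, key, hashlib.sha256(part).hexdigest()))
    return manifest


def reassemble(manifest: list[tuple[int, str, str]], stores: list[dict]) -> bytes:
    """Fetch every part, verify its digest, and join the parts in order."""
    parts = []
    for index, key, digest in manifest:
        part = stores[index][key]
        if hashlib.sha256(part).hexdigest() != digest:
            raise ValueError(f"{key} in store {index} was modified")
        parts.append(part)
    return b"".join(parts)


# Three dictionaries stand in for three independent cloud storage services.
stores = [{}, {}, {}]
original = b"sensitive patient record"
manifest = store_across(original, stores)
restored = reassemble(manifest, stores)
```

No single store holds the whole record, and the digests in the manifest let the owner detect a part that was altered while in storage.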

3.1 Secure Data Storage and Transaction Logs: Data and transaction logs are managed using multi-tiered storage media. Moving data between tiers manually gives the IT manager direct control over what data is moved and when. However, as the size of data grows exponentially, big data storage management requires auto-tiering. Since auto-tiering does not keep track of where the data is stored, it leads to new challenges for secure data storage.[18]

3.1.1 Secure Data Storage: Storage security refers to solutions that help to: (i) prevent disclosure of data stored across the enterprise; (ii) prevent any kind of unauthorized modification; and (iii) support compliance initiatives and key data security requirements.[19]
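Point (ii) above, protection against unauthorized modification, can be illustrated with a keyed integrity tag. The sketch below uses HMAC-SHA256 from the Python standard library; this is one common technique chosen here for illustration, not a mechanism prescribed by the paper, and the function names and sample record are hypothetical. It detects modification but does not hide the data, so it does not address point (i).

```python
import hashlib
import hmac


def seal(key: bytes, record: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag to be stored alongside the record."""
    return hmac.new(key, record, hashlib.sha256).digest()


def is_unmodified(key: bytes, record: bytes, tag: bytes) -> bool:
    """Return True only if the record still matches its stored tag."""
    return hmac.compare_digest(seal(key, record), tag)


# Hypothetical key and record, for illustration only.
key = b"owner-secret-key"
record = b"account=42;balance=1000"
tag = seal(key, record)

intact = is_unmodified(key, record, tag)
tampered = is_unmodified(key, b"account=42;balance=9000", tag)
```

Anyone without the key cannot produce a valid tag for altered data, so a changed record is detected when it is read back.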

3.1.2 Transaction Logs: A transaction log is a sequential record of all the changes made to a database, whereas the actual data is contained in a separate file. Every database has at least one physical transaction log, and this log contains enough information to undo all the changes made to the data file.
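The undo property of a transaction log can be shown with a toy key-value store. This is a minimal sketch, assuming an in-memory dictionary in place of the data file and a Python list in place of the physical log; the class and method names are hypothetical and do not correspond to any particular database.

```python
from dataclasses import dataclass, field

_MISSING = object()  # marks "key did not exist before this change"


@dataclass
class LoggedStore:
    data: dict = field(default_factory=dict)  # stands in for the data file
    log: list = field(default_factory=list)   # sequential transaction log

    def put(self, key, value):
        """Record the previous value in the log, then apply the change."""
        self.log.append((key, self.data.get(key, _MISSING)))
        self.data[key] = value

    def delete(self, key):
        """Record the deleted value in the log, then remove the key."""
        if key in self.data:
            self.log.append((key, self.data[key]))
            del self.data[key]

    def undo_all(self):
        """Replay the log backwards to restore the original data."""
        while self.log:
            key, old = self.log.pop()
            if old is _MISSING:
                self.data.pop(key, None)
            else:
                self.data[key] = old


store = LoggedStore(data={"a": 1})
store.put("a", 2)
store.put("b", 3)
store.delete("a")
after_changes = dict(store.data)
store.undo_all()
after_undo = dict(store.data)
```

Because each log entry keeps the value that was overwritten, replaying the entries in reverse order restores the state that existed before the first change.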

3.2 Granular Access Control: The idea behind granular access control is secrecy, that is, preventing access to data by those who should not have it. Databases are supposed to make life easy, but at the same time we do not want our sensitive data to be accessible to everyone. For this purpose, databases should include powerful granular access control that puts the data owner in control of data security. It enables user roles and responsibilities to be set so that every individual is given access only to the relevant areas or functions of the system. Granular access control mechanisms can also be used to reduce the restrictions on data without violating policies.[18]
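Granular access control as described above can be sketched at the level of individual fields of a record. In this minimal illustration the roles, field names and record are all hypothetical, and the policy is a plain dictionary rather than the access control mechanism of any real database.

```python
# Hypothetical policy: which fields each role is allowed to read.
ROLE_FIELDS = {
    "billing_clerk": {"patient_id", "invoice_total"},
    "physician": {"patient_id", "diagnosis", "medication"},
}


def visible_view(record: dict, role: str) -> dict:
    """Return only the fields of the record that the role may read.

    Unknown roles get an empty view (deny by default).
    """
    allowed = ROLE_FIELDS.get(role, set())
    return {name: value for name, value in record.items() if name in allowed}


record = {
    "patient_id": "P-17",
    "diagnosis": "influenza",
    "medication": "oseltamivir",
    "invoice_total": 120.0,
}

clerk_view = visible_view(record, "billing_clerk")
physician_view = visible_view(record, "physician")
guest_view = visible_view(record, "guest")
```

The same record can be shared with several roles because each role sees only the fields relevant to it, which is how finer granularity allows more sharing without violating policy.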

3.3 Data Provenance: Data provenance mainly refers to the process of tracing a record that accounts for the origin of a piece of data. Whenever we search for data on the web, we may know the last address where it was found, but the same data might have been copied from another location, and in turn from yet another location.[20] In this way, the complexity of provenance metadata grows quickly and a very large provenance graph is generated, which makes it almost impossible to find the initial location or origin by inspection. The analysis of this provenance graph to determine the “source” is what is known as data provenance.
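Tracing a provenance graph back to its source, as described in Section 3.3, can be sketched as a graph traversal. This is a minimal illustration, assuming the provenance metadata is available as a mapping from each location to the locations it was copied from; the function name and the example locations are hypothetical.

```python
from __future__ import annotations


def find_origins(copied_from: dict[str, list[str]], location: str) -> set[str]:
    """Follow 'copied from' links backwards and return the locations
    that have no recorded source, i.e. the candidate origins."""
    origins: set[str] = set()
    seen: set[str] = set()
    stack = [location]
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # already visited; also guards against cycles
        seen.add(node)
        sources = copied_from.get(node, [])
        if not sources:
            origins.add(node)
        stack.extend(sources)
    return origins


# Hypothetical provenance metadata: location -> locations it was copied from.
copied_from = {
    "mirror-site": ["blog"],
    "blog": ["news-portal", "forum"],
    "news-portal": ["press-agency"],
    "forum": ["press-agency"],
}

origins = find_origins(copied_from, "mirror-site")
```

Each location is visited once, so the cost grows with the size of the graph, which is why very large provenance graphs need automated analysis rather than manual inspection.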

CONCLUSION: One of the major questions addressed is how to make data subjects aware of complex data flows and of the use of their personal data, and how to spread awareness of the risks and harms this data may enable in the future. The impact of big data may extend beyond the economic and cultural divides that already shape part of our society. In this paper we explored measures such as secure data storage, transaction logs and granular access control, which can be taken into account to enhance data security in big data environments.

REFERENCES:
[5] Big data at the speed of business, [online]. http://www-01.ibm.com/soft-ware/data/bigdata/2012
[6] S. Rusitschka and A. Ramirez, “Big Data Technologies and Infrastructures”, http://byte-project.eu/research/, Deliverable D1.4, Version 1.1, Sept. 2014.
[7] Harsh Kupwade Patil; Ravi Seshadri, “Big Data Security and Privacy Issues in Healthcare”, 2014 IEEE International Congress on Big Data, 2014.
[8] Dijcks, J.P., “Oracle: Big data for the enterprise”, Oracle White Paper, Oracle Corporation, Redwood City, CA, USA, 2012.
[9] Charith Perera; Rajiv Ranjan; Lizhe Wang; Samee U. Khan; Albert Y. Zomaya, “Big Data Privacy in the Internet of Things Era”, IT Professional, 2015.
[10] Elisa Bertino, “Big Data – Security and Privacy”, 2015 IEEE International Congress on Big Data, 2015.
[11] Boel Nelson; Tomas Olovsson, “Security and privacy for big data: A systematic literature review”, 2016 IEEE International Conference on Big Data (Big Data), 2016.
[12] Stephen Kaisler; Frank Armour; J. Alberto Espinosa; William Money, “Big Data: Issues and Challenges Moving Forward”, 2013 46th Hawaii International Conference on System Sciences, 2013.
[13] Duygu Sinanc Terzi; Ramazan Terzi; Seref Sagiroglu, “A survey on security and privacy issues in big data”, 2015 10th International Conference for Internet Technology and Secured Transactions (ICITST), 2015.
[14] Bardi Matturdi; Xianwei Zhou; Shuai Li; Fuhong Lin, “Big Data security and privacy: A review”, China Communications, 2014.
[15] Pradeep Adluru; Srikari Sindhoori Datla; Xiaowen Zhang, “Hadoop eco system for big data security and privacy”, 2015 Long Island Systems, Applications and Technology, 2015.
[16] Aditya Dev Mishra; Youddha Beer Singh, “Big data analytics for security and privacy challenges”, 2016 International Conference on Computing, Communication and Automation (ICCCA), 2016.
[17] Lin Liu, “Security and Privacy Requirements Engineering Revisited in the Big Data Era”, 2016 IEEE 24th International Requirements Engineering Conference Workshops (REW), 2016.
[18] Praveen K. Murthy, “Top ten challenges in Big Data security and privacy”, 2014 International Test Conference, 2014.
[19] Elisa Bertino, “Big data security and privacy”, 2016 IEEE International Conference on Big Data (Big Data), 2016.
[20] Seref Sagiroglu; Duygu Sinanc, “Big data: A review”, 2013 International Conference on Collaboration Technologies and Systems (CTS), 2013.
[21] Rafae Bhatti; Ryan laSalle; Rob Bird; Tim Grance; Elisa Bertino, “Emerging trends around big data analytics and security: panel”, SACMAT '12 Proceedings of the 17th ACM Symposium on Access Control Models and Technologies, 2012.
