How to Tackle Big Data from a Security Point of View
Category: Hadoop Training | Posted: Dec 14, 2017 | By: Serena Josh
Big Data is the buzzword across global enterprises of every size and scale today. Given how hot a commodity Big Data has become, businesses investing in it must be crystal clear about what they are trying to accomplish.
Big Data has to be looked at from a security standpoint in two ways: the data that must be protected, and the Big Data approach that can be used to handle security incidents. The data can come from both the organization and its customers, and the Big Data approach can be used to analyze and predict security incidents.
Securing Your Big Data
Most businesses presently use Big Data for marketing and research, often without having the fundamentals down, and this is especially true from a security standpoint. Like most new technologies, Big Data is rarely examined thoroughly while new technological measures are being set up.
Breaches involving Big Data will be proportionally massive, and the accompanying reputational damage and legal hassles will be bigger still.
Blogs, clickstream data and social media generate petabytes of data for enterprises that employ Big Data to gain insights in real time. Such insights operate on several levels, for both customers and the enterprise. This is why classifying information becomes a critical set of tasks, with information ownership addressed so that all reasonable classifications can be made.
Enterprises routinely struggle to put such concepts into practice with new technologies, making implementation a bigger challenge than experts anticipate. Ownership must be identified for Big Data processes, for the data sources, and for the actual raw data. Data ownership is different from information ownership: IT typically assumes ownership of the raw data, while business units assume ownership of the outputs.
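The split between IT owning the raw data and business units owning the outputs can be made concrete by tagging each field with a classification level and an owner. The following is a minimal sketch; the field names, levels, and owning units are illustrative assumptions, not a standard.

```python
# Tag raw data fields with a classification level and an owner.
# The policy table below is an illustrative assumption.

CLASSIFICATION_POLICY = {
    "email":       ("confidential", "marketing"),  # (level, owning business unit)
    "purchase_id": ("internal",     "sales"),
    "clickstream": ("internal",     "analytics"),
    "page_title":  ("public",       "analytics"),
}

def classify(record):
    """Attach classification and ownership metadata to each field.
    Unknown fields default to the most restrictive setting, owned by IT."""
    tagged = {}
    for field, value in record.items():
        level, owner = CLASSIFICATION_POLICY.get(field, ("restricted", "IT"))
        tagged[field] = {"value": value, "level": level, "owner": owner}
    return tagged

record = {"email": "user@example.com", "clickstream": "/home>/cart"}
tagged = classify(record)
print(tagged["email"]["level"])   # confidential
```

Defaulting unknown fields to the most restrictive classification is the safer failure mode: new data sources stay locked down until someone explicitly claims ownership.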
The most convenient way to adopt a Big Data environment is to link it with the Cloud. Some enterprises do grow Big Data in-house, but the percentage of such businesses worldwide is really small.
Attribute-based encryption is the way to go for protecting sensitive data and applying access controls. Here, access decisions hinge on attributes of the data itself, instead of the environment in which it is stored. Most of these concepts are still nascent and not yet fully developed or implemented at larger scales.
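Real attribute-based encryption (schemes such as CP-ABE) enforces this cryptographically; the sketch below only illustrates the policy idea, that access depends on attributes attached to the data rather than on where the data lives. The attribute names and levels are assumptions made up for the example.

```python
# Illustrative attribute-based access check. This is NOT encryption;
# it only shows the decision logic that ABE schemes enforce in the
# cryptography itself.

def can_access(user_attrs, data_attrs):
    """Grant access only if the user's clearance covers the data's
    sensitivity and the user belongs to an allowed department."""
    levels = ["public", "internal", "confidential"]
    clearance_ok = (levels.index(user_attrs["clearance"])
                    >= levels.index(data_attrs["sensitivity"]))
    dept_ok = user_attrs["department"] in data_attrs["allowed_departments"]
    return clearance_ok and dept_ok

data = {"sensitivity": "confidential", "allowed_departments": {"finance"}}
print(can_access({"clearance": "confidential", "department": "finance"}, data))  # True
print(can_access({"clearance": "internal", "department": "finance"}, data))      # False
```

Because the policy travels with the data's attributes, the same check applies whether the data sits in an in-house cluster or in the Cloud.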
Deploying Big Data for Security
Deploying Big Data to enforce security through fraud detection is a growing trend among smarter organizations. It is a strong alternative to Security Incident and Event Management (SIEM) systems, and the overheads to cover are drastically reduced compared to conventional SIEM. The existing log management systems in enterprises, and like-for-like replacements for them, are no solution to this problem.
Big Data style analysis is the hottest trend emerging from the current Big Data scenario. It helps solve the challenge of detecting and preventing advanced security threats against modern enterprises and tech conglomerates, catching threats earlier and with greater precision. What we are looking at right now is complex pattern analysis combined with the analysis of multiple data sources; anomaly identification using feature extraction is also a distinct possibility for increasing security all round.
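Anomaly identification via feature extraction can be sketched very simply: extract one feature per source (here, requests per minute) and flag outliers. Real deployments use far richer features and models; a robust z-score based on the median absolute deviation is the simplest possible stand-in, and the data below is invented for illustration.

```python
# Minimal anomaly identification on a single extracted feature:
# requests per minute per source IP. Uses a median/MAD robust z-score,
# which tolerates the outliers it is trying to find.

from statistics import median

def anomalous_sources(requests_per_minute, threshold=3.5):
    """Flag sources whose rate deviates from the median by more than
    `threshold` robust z-scores."""
    rates = list(requests_per_minute.values())
    med = median(rates)
    mad = median(abs(r - med) for r in rates)
    if mad == 0:
        return []  # no spread at all: nothing stands out
    return [src for src, r in requests_per_minute.items()
            if 0.6745 * abs(r - med) / mad > threshold]

rates = {"10.0.0.1": 12, "10.0.0.2": 15, "10.0.0.3": 11, "10.0.0.4": 400}
print(anomalous_sources(rates))  # ['10.0.0.4']
```

The median/MAD form is chosen over mean/standard-deviation because a single massive outlier inflates the standard deviation enough to hide itself; the median is unaffected.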
As it stands today, logs are ignored unless and until there is a security incident in an enterprise. This is where Big Data comes to the rescue, offering the unique opportunity to integrate and evaluate logs from multiple sources automatically, an improvement over the earlier technique of analyzing a single source. One can gain profound and far-reaching insights that cannot otherwise be obtained from individual logs across the establishment. This significantly enhances the Intrusion Detection System (IDS) and the Intrusion Prevention System (IPS), which can adjust progressively and dynamically by learning what can be termed "good behaviours".
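The first step in evaluating logs from multiple sources together is folding them into one chronologically ordered stream, so that related events from different systems sit next to each other. A minimal sketch follows; the log sources and record shapes are assumptions for illustration.

```python
# Merge per-source logs (each already time-sorted) into one timeline,
# so firewall and authentication events can be evaluated together
# rather than per source.

import heapq
from datetime import datetime

firewall_log = [("2017-12-14T10:00:01", "firewall", "DENY 203.0.113.9"),
                ("2017-12-14T10:00:07", "firewall", "DENY 203.0.113.9")]
auth_log     = [("2017-12-14T10:00:03", "auth", "failed login admin"),
                ("2017-12-14T10:00:05", "auth", "failed login admin")]

def merged_timeline(*logs):
    """Lazily merge already-sorted logs by timestamp (no full sort needed)."""
    return list(heapq.merge(*logs, key=lambda e: datetime.fromisoformat(e[0])))

for ts, source, message in merged_timeline(firewall_log, auth_log):
    print(ts, source, message)
```

With the streams interleaved, the repeated firewall denies bracketing the failed logins become visible as one pattern, something neither log shows on its own.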
Integrating information from physical security systems, such as building access controls and even CCTV, could also enhance IDS and IPS to the point where insider attacks and social engineering are factored into the detection process, and could save users considerable time. This opens the door to significantly more advanced detection of fraud and criminal activity. Big Data technologies can also phase out the organizational silos that so often undermine the effectiveness of security systems.
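One concrete way physical security data feeds insider-attack detection: cross-check interactive on-site logins against building access records. The rule and data shapes below are illustrative assumptions, not a prescribed method.

```python
# Flag on-site workstation logins by users who never badged into the
# building that day - a simple physical/logical cross-check that a
# siloed IDS cannot make.

badge_ins = {"alice", "bob"}                 # users with a badge-in event today
logins    = [("alice", "workstation-12"),
             ("carol", "workstation-07"),    # no badge record: worth a look
             ("bob",   "workstation-03")]

def suspicious_logins(logins, badge_ins):
    """Return on-site logins with no matching badge-in event."""
    return [(user, host) for user, host in logins if user not in badge_ins]

print(suspicious_logins(logins, badge_ins))  # [('carol', 'workstation-07')]
```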
To sum up, even in the worst case, Big Data can lead to more practical and successful IPS, IDS and SIEM implementations.
Big Data Technologies and Risks
The risks associated with Big Data technologies need not be large if the technology is utilized in the right way; introduced improperly, however, it will create new vulnerabilities.
- Big Data implementations usually involve open source code, with the potential for unrecognized back doors and default credentials.
- The attack surface of the nodes in a cluster may not have been reviewed and servers adequately hardened.
- User authentication and access to data from varying locations may not be adequately controlled.
- Regulatory needs may not be fulfilled, and access to logs and audit trails may be problematic.
- There is a considerable probability for malicious data input and insufficient data validation.
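The last risk, malicious input with insufficient validation, is the easiest to sketch: reject records before they enter the pipeline unless they match an expected shape. The schema, field rules, and size limit below are assumptions made up for the example.

```python
# Validate untrusted records before ingestion: enforce a size limit,
# an exact field set, and per-field allow-list patterns.

import re

FIELD_RULES = {
    "user_id": re.compile(r"^[A-Za-z0-9_-]{1,32}$"),
    "event":   re.compile(r"^[a-z_]{1,64}$"),
}
MAX_RECORD_BYTES = 4096

def validate(record):
    """Reject oversized records, unexpected field sets, and malformed values."""
    if len(repr(record).encode()) > MAX_RECORD_BYTES:
        return False
    if set(record) != set(FIELD_RULES):
        return False
    return all(FIELD_RULES[f].match(str(v)) is not None
               for f, v in record.items())

print(validate({"user_id": "u_42", "event": "page_view"}))      # True
print(validate({"user_id": "u_42; DROP TABLE", "event": "x"}))  # False
```

Allow-listing what a field may contain, rather than block-listing known attacks, is the safer default when the downstream consumers of the data are unknown.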
Big Data is a term constantly used alongside Hadoop. Conventional data warehouses and relational databases handle structured data and can store it in massive quantities, but the need for structure restricts the kinds of data that can be processed.
Hadoop, by contrast, is architected to process large amounts of data regardless of its structure. Hadoop's core includes the MapReduce framework, originally designed at Google to solve the problem of building web search indexes. MapReduce spreads a computation over several nodes instead of just one, solving the problem of data being too large to fit on a single machine. Combined with commodity Linux servers, this approach is a cost-effective alternative to massive dedicated computing arrays.
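The MapReduce idea can be sketched in plain Python: a map step emits (key, value) pairs, a shuffle groups them by key, and a reduce step aggregates each group. In real Hadoop the map and reduce steps run on different nodes with the shuffle moving data between them; word count is the classic example.

```python
# A single-process sketch of the MapReduce pattern: map, shuffle, reduce.
# Hadoop distributes these same three phases across a cluster.

from collections import defaultdict

def map_step(line):
    """Map: emit (word, 1) for every word in a line."""
    for word in line.split():
        yield word.lower(), 1

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_step(key, values):
    """Reduce: aggregate each key's values into one result."""
    return key, sum(values)

lines = ["Big Data security", "big data big insights"]
pairs = [p for line in lines for p in map_step(line)]
counts = dict(reduce_step(k, v) for k, v in shuffle(pairs).items())
print(counts["big"])  # 3
```

Because each map call sees only one line and each reduce call only one key, both phases parallelize naturally, which is exactly why the pattern scales across nodes.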
HDFS, the Hadoop Distributed File System, allows individual servers in a cluster to fail without cancelling the computation, by replicating data redundantly across the cluster. HDFS data stores can be without schema or structure. In contrast, relational databases require data to be structured and schemas to be defined before the data is stored. With HDFS, making sense of the data is the responsibility of the developer's code.
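The redundancy HDFS uses to survive server failures is governed by a replication factor, set per cluster in `hdfs-site.xml`. A minimal fragment, using the standard `dfs.replication` property (3 is Hadoop's default):

```xml
<!-- hdfs-site.xml: dfs.replication controls how many copies of each
     block HDFS keeps across the cluster. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
```

With three copies of every block on different nodes, the loss of any single server leaves the data, and any computation reading it, intact.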
Big Data has more to do with processing techniques and outputs than with data set size alone. Handling and using it efficiently therefore calls for specialists, and Big Data analysis, being the growing field that it is, faces an overall shortage of the specialist skills it needs.
The growth of Hadoop and its associated technologies is powering demand for staff with niche skills. Content and text analysis, data mining, predictive modeling and more are all in demand, as are platform management professionals to implement Hadoop clusters and to secure, manage and optimize them.
Enterprises must be crystal clear about what they want from Big Data to fully leverage the knowledge they gain. Big Data continuously pushes at the limits of current technology, and brings considerable challenges in its wake.