No Phishing in the Data Lake
How to Find and Mitigate Security Risks in Large Data Storage
For any business with a data strategy in place, the next step on the roadmap to data transformation is to capture all the structured and unstructured data flowing into the organization. To do so, organizations must create a data lake to store data from IoT devices, social media, mobile apps, and other disparate sources in a usable way.
What is a Data Lake?
A data lake, per AWS, is a centralized repository that allows an organization to store data as is, without having to first structure the data, and run different types of analytics—from dashboards and visualizations to big data processing, real-time analytics, and machine learning to guide better decisions.
Data lakes differ from data warehouses in that data warehouses are like libraries. As data comes into a warehouse, it gets carefully filed according to a structured system that has been defined in advance, making it easy and quick to find exactly what you’re looking for given a specific request. In data lakes, there’s no defined schema, which means data can be stored without needing to know what questions may require answers in the future. A data lake is more like an e-bookstore: you can search generally, call up all relevant results across various media types, and make decisions based on machine learning recommendations and other people’s insights.
Security Risks and Consequences Within Data Lakes
Over the past few decades, improvements in compute power and storage space, coupled with much more affordable storage prices, have made it possible to store massive amounts of data in one place. Not long ago, storing a database of every citizen’s Social Security number would have been impractical. Now it costs pennies on the dollar to store as a table in a data lake.
As much opportunity as large data storage provides organizations, it also creates risk. When vulnerabilities occur in repositories, their infrastructure, or any dependencies, the level of impact depends on the type and scale of the information that was compromised. Because data lakes hold vast amounts of data in a single location, the impact of a breach is often spectacular in scale.
Common tactics hackers use to exploit enterprise data are Initial Access, Defense Evasion, and Credential Access. Kurt Alaybeyoglu, Senior Director of Cybersecurity and Compliance at Strive Consulting, says organizations often make a mistake by focusing too strongly on preventing Initial Access—a cybercriminal getting into the org’s network. Data lakes interact with so many sources that it doesn’t take network access to be able to cause damage.
“The two primary security risks in a data lake,” Kurt says, “are exfiltration of and impact to sensitive data.” As the name suggests, data exfiltration is the unauthorized transfer of data. Attackers can either steal specific pieces of data or, more often, simply take a copy of an entire lake, akin to a burglar carrying away a safe so they can open it and rifle through its contents at their leisure. Data impact ranges from encrypting the data in the lake to wiping it, corrupting it, or destroying the means of access to the platform.
Both tactics can be, and have proven to be, catastrophic for an organization’s survival.
Are Data Lakes Worth the Risk?
Facing such dire consequences in the event of a cyberattack, why do businesses choose to use data lakes? Conventional wisdom says not to keep all your eggs in one basket—compartmentalizing data to avoid total compromise is surely more secure. But for many, according to Kurt, the rewards of data lakes outweigh the risks.
“Being able to access massive data at your fingertips with simple queries is what allows modern apps to exist,” he explains. “Take Uber as an example. Uber, as a technology, completely disrupted the taxi service model. It got rid of the need for dispatchers because at its heart was software that acted as one, pairing users and drivers faster than most humans can. Their software functions because Uber created a data lake that contains information like riders, drivers, maps, payment information, etc. that allow all of these disparate aspects to function seamlessly.”
While separating this data into different repositories may be more secure, the application would take significantly longer to do everything from running queries to processing payments to calculating ride times, which would all but eliminate the app’s usefulness. Not to mention, the added complexity would make securing the data just as difficult, if not more so.
“As security professionals, we have to try to mitigate those risks as best as possible,” he says. “At the end of the day, data security is a business function. Our job is to say ‘yes, we can do that, but here are the risks.’ Leaders must decide what they’re willing to pay to mitigate, what they’ll pay to transfer, and what risks they’re willing to accept.”
3 Ways to Prevent Security Breaches in the Data Lake
What makes data lakes so risky is that the valuable commodity, data, by necessity must be accessible, whether that’s to a platform, an end user, or someplace else. The data must be available in order to be useful. So, an organization’s top three focus points to protect that data are as follows:
- Rigorous access control: More people with unfettered access to the data lake means more potential entry points for a hacker to attempt to exploit. To secure the data lake, be thoughtful about who can access it and when. Validate those users’ identities using strong passwords and multi-factor authentication (MFA). If the data lake contains particularly sensitive information, consider more advanced hardware solutions such as FIDO2 keys.
- Regular vulnerability scanning and testing: Because data lakes and supporting platforms aren’t tied to a single device, hackers no longer need to achieve initial access to get ahold of the data. For most applications that interact with data lakes, a successful breach may only take a SQL or command injection that forces the system to respond with data it’s not supposed to—no device compromise needed. Because of that risk, proactively looking for the holes in a data lake’s security is paramount. Use a combination of application threat modeling, vulnerability scans, and application penetration testing to identify weak points, then remediate them quickly.
- Better detection through better training: “Data lakes are examples of what modern storage/compute allows us to do,” Kurt says. “We haven’t put the same level of effort and value into collecting audit logs to be able to make detection and analytics earlier in the cyberattack chain possible.” The answer? Staffing and training. Proactive threat detection comes from a skillset that knows what to investigate. “How do I collect audit logs from the platform? What logs should I collect? How do I determine when someone has accessed the data versus what’s just noise? That investigative mindset and skillset is in high demand and low supply,” says Kurt.
His suggestion to overcome the talent gap: Companies that rely on data lakes should build detection skillsets from within. It’s easier to pay to train a person who is well-versed in the inner workings of an organization’s platform, and who can then build data security into it, than it is to bring in a security generalist to work within an org’s data lake.
The advantage of training an internal employee is that they have the full view of the data product roadmap, which means they can start developing future updates on the platform that build security in from the ground up. That’s security by design—the brass ring of risk management in a data lake.
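The injection risk raised in the vulnerability-testing point above is one place where building security in from the ground up is easy to illustrate. The following is a minimal sketch, not a description of any particular platform: it uses Python’s built-in sqlite3 module as a stand-in for a data lake query layer, and the table, column names, and sample rows are all hypothetical. It contrasts a query built by string concatenation with a parameterized one.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a small, made-up riders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE riders (id INTEGER PRIMARY KEY, name TEXT, card_last4 TEXT)"
    )
    conn.executemany(
        "INSERT INTO riders (name, card_last4) VALUES (?, ?)",
        [("alice", "1111"), ("bob", "2222"), ("carol", "3333")],
    )
    return conn


def lookup_unsafe(conn, name):
    # Vulnerable: user input is concatenated straight into the SQL text,
    # so a crafted value can change the meaning of the query.
    query = "SELECT name, card_last4 FROM riders WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def lookup_safe(conn, name):
    # Parameterized: the driver passes the value separately from the SQL text,
    # so it is always treated as data, never as part of the statement.
    query = "SELECT name, card_last4 FROM riders WHERE name = ?"
    return conn.execute(query, (name,)).fetchall()


conn = build_demo_db()
payload = "nobody' OR '1'='1"  # classic injection string

leaked = lookup_unsafe(conn, payload)   # every row comes back
blocked = lookup_safe(conn, payload)    # no rider has this literal name

print("unsafe lookup returned", len(leaked), "rows")
print("safe lookup returned", len(blocked), "rows")
```

With the concatenated query, the payload turns the filter into a condition that is always true and the whole table is returned without any device being compromised. Parameterization is only one control; it complements, rather than replaces, the threat modeling, scanning, and penetration testing described above.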
Where exploitable data exists, opportunists will try to access it. Data lakes provide organizations an incomparable ability to un-silo work, answer new questions by drawing information from diverse sources, and innovate technology that creates the next apex experience. For that reason, businesses must step up their investment in data security in concert with their investment in data storage and usability. On the data roadmap, that’s the ultimate step toward data transformation.
Protect your data. Protect your business.
Take Charge of Active Directory Security with BloodHound from SpecterOps
Despite 2020 being dubbed “the year of ransomware,” bad actors have ramped up ransomware attacks even further in 2021. Somewhat more alarming is the speed and effectiveness with which actors can compromise entire networks, enabled by sub-optimal design and maintenance of an organization’s Active Directory. That shouldn’t come as a surprise to anyone in the security world. Hackers and penetration testers alike have targeted Active Directory for years as the most effective means of achieving the attacker’s end goals.
There are many reasons why Active Directory has been, and remains, a prime mark for attackers. As many of you know, Active Directory is Microsoft’s proprietary directory service. IT administrators use it for a variety of tasks, from defining organizational hierarchy, managing permissions, and controlling access to network resources to determining what your profile picture looks like or whether you can install an application on your machine.
Its very nature is why it’s so valuable to attackers. Active Directory serves as the central repository for all non-local account authentications and privileges. As such, Active Directory contains the proverbial keys to the kingdom. Attackers can query it to perform reconnaissance on the network, identify accounts for privilege escalation, lateral movement, or persistence within the environment, and determine the shortest path to their goal (exfiltration of sensitive data, making an impact, or both).
One 2021 study found that 50% of organizations experienced an attack on Active Directory within the past two years and more than 40% reported that the attack was successful. However, Active Directory is inherently difficult to secure – and has been for decades. In fact, many of the features in Active Directory that actually make it work are also what make it so vulnerable.
Consider, for example, the domain controller’s sync function, which transfers and updates AD objects from one domain controller to another. Attackers can take advantage of this process using a DCSync attack, in which the attacker, with the help of some vulnerable accounts, impersonates an Active Directory domain controller and requests authentication credentials from other domain controllers. A process designed to maintain availability and prevent a single point of failure can be abused to compromise every single credential in a domain, without ever actually compromising the Domain Controller itself.
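To make the “vulnerable accounts” part concrete: DCSync is commonly associated with principals that hold both the DS-Replication-Get-Changes and DS-Replication-Get-Changes-All extended rights on the domain. The sketch below assumes you have already exported the domain object’s access control entries into simple (principal, right) pairs. That input format, the account names, and the function name are hypothetical and exist only for illustration.

```python
# Extended rights commonly associated with DCSync-style replication.
DCSYNC_RIGHTS = {
    "DS-Replication-Get-Changes",
    "DS-Replication-Get-Changes-All",
}


def dcsync_capable(acl_entries, expected_principals):
    """Return principals holding both replication rights that are not expected to.

    acl_entries: iterable of (principal, right) pairs (hypothetical export format).
    expected_principals: set of principals that legitimately replicate,
    such as the domain controllers group.
    """
    granted = {}
    for principal, right in acl_entries:
        granted.setdefault(principal, set()).add(right)

    return sorted(
        principal
        for principal, rights in granted.items()
        if DCSYNC_RIGHTS <= rights and principal not in expected_principals
    )


# Made-up sample data for illustration only.
sample_acl = [
    ("Domain Controllers", "DS-Replication-Get-Changes"),
    ("Domain Controllers", "DS-Replication-Get-Changes-All"),
    ("svc_sync", "DS-Replication-Get-Changes"),
    ("svc_sync", "DS-Replication-Get-Changes-All"),
    ("svc_report", "DS-Replication-Get-Changes"),  # only one of the two rights
]

unexpected = dcsync_capable(sample_acl, expected_principals={"Domain Controllers"})
print("Unexpected DCSync-capable principals:", unexpected)
```

Any account this kind of review surfaces deserves the same protection as a domain controller, since compromising it is effectively equivalent.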
Another example: attackers can exploit functionality in Kerberos, the computer network protocol used to authenticate identities, by finding service accounts with weak passwords and using a common attack known as Kerberoasting to grab the hash of the service account, crack it offline, then use that cracked password to progress further into the network. This ability to grab service account hashes is a feature, not a bug. There’s no patch to fix this, only principles of least privilege for the account, good password hygiene, and regular password changes. And that assumes the admins even remember the account exists, let alone what purpose it was originally created to serve.
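Since the only defenses are least privilege, password hygiene, and remembering that the account exists, a periodic review of service accounts is a reasonable starting point. The sketch below assumes a hypothetical inventory export; the ServiceAccount fields, the thresholds (365 days for password age, 90 days for dormancy), and the account names are illustrative assumptions rather than recommendations from any vendor.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ServiceAccount:
    # Hypothetical inventory record; field names are assumptions.
    name: str
    has_spn: bool              # accounts with an SPN are the ones exposed to Kerberoasting
    password_last_set: date
    last_logon: date
    privileged: bool           # member of a privileged group


def review_service_accounts(accounts, today,
                            max_password_age_days=365,
                            dormant_after_days=90):
    """Map account name -> list of reasons it deserves a closer look."""
    findings = {}
    for acct in accounts:
        if not acct.has_spn:
            continue  # no SPN means no service ticket to request and crack
        reasons = []
        if (today - acct.password_last_set).days > max_password_age_days:
            reasons.append("password not rotated")
        if (today - acct.last_logon).days > dormant_after_days:
            reasons.append("possibly forgotten")
        if acct.privileged:
            reasons.append("review privileges")
        if reasons:
            findings[acct.name] = reasons
    return findings


# Made-up sample inventory.
inventory = [
    ServiceAccount("svc_sql", True, date(2015, 3, 1), date(2021, 9, 1), True),
    ServiceAccount("svc_backup", True, date(2021, 6, 1), date(2020, 1, 15), False),
    ServiceAccount("svc_web", True, date(2021, 8, 1), date(2021, 9, 28), False),
    ServiceAccount("jdoe", False, date(2014, 1, 1), date(2021, 9, 30), True),
]

findings = review_service_accounts(inventory, today=date(2021, 10, 1))
for name, reasons in findings.items():
    print(name, "->", ", ".join(reasons))
```

A review like this does not measure password strength, which cannot be read from the directory; it only highlights accounts where rotation, cleanup, or a privilege reduction is overdue.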
At the same time, IT teams can create additional security challenges by allowing group and privilege sprawl to creep into the Active Directory environment. Some common issues include: use of unconstrained delegation, poor change management practices/documentation (temporarily elevating privileges and never revoking them), use of simple passwords, and maintaining inactive accounts. Malicious actors know how to exploit all those scenarios to their advantage.
Yet, despite such security challenges and the significant potential for successful attacks, many organizations don’t devote enough attention and resources to assess the risk associated with their Active Directory environment and implement appropriate mitigation strategies.
That’s true in the utilities industry, too.
Many security and technology leaders in this field rely only on network segmentation to provide a layer of protection for OT networks. There is a misconception that if an attacker gains access to the information technology network, segmentation will prevent the attacker from accessing the operational technology. In reality, that’s not the case – even when an organization has established a proper demilitarized zone between segmented IT and OT environments.
That’s because there’s still trust that remains between the two environments (by necessity), so a hacker who is able to compromise a Domain Controller or launch a successful DCSync attack could use hacked credentials to pick a trusted IT machine to connect with one on the OT side of the house – knowing that the two servers trust each other and the credentials would be accepted.
So what’s the short of it?
It’s this: You can lock down and air gap your OT environment, but that won’t protect you from a threat actor who has compromised IT’s Active Directory services, especially when those services are shared between IT and OT.
The question becomes: how can organizations be more attentive to the risks associated with Active Directory? How do they cut through the massive amount of data and understand what their posture is, and how to improve it? To do that, I recommend organizations start deploying BloodHound, a free, open-source tool provided by SpecterOps. You can be sure attackers already are.
BloodHound is a discovery tool, designed for users to understand an Active Directory environment. It does this using graph theory and visual representation to uncover hidden or unintended relationships, kerberoastable accounts, opportunities for DCSync attacks, and a number of other misconfigurations or flaws within the environment. It then creates a graph of that analysis, thereby giving security and technology leaders a simple and quick way to depict privilege relationships and design remediations.
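The graph-theory idea can be illustrated without BloodHound itself. The sketch below is not BloodHound’s implementation: it is a plain breadth-first search over a tiny, made-up set of relationships, with edge labels loosely modeled on the kinds of relationships such a graph contains (group membership, local admin rights, logged-on sessions). All node names are hypothetical.

```python
from collections import deque

# Made-up relationships: (source, relationship, target).
EDGES = [
    ("jsmith", "MemberOf", "Helpdesk"),
    ("jsmith", "MemberOf", "Domain Users"),
    ("Helpdesk", "AdminTo", "WS01"),
    ("WS01", "HasSession", "svc_sql"),
    ("svc_sql", "MemberOf", "Domain Admins"),
]


def shortest_path(edges, start, goal):
    """Return the shortest chain of (source, relationship, target) hops, or None."""
    graph = {}
    for src, rel, dst in edges:
        graph.setdefault(src, []).append((rel, dst))

    came_from = {start: None}  # node -> (previous node, relationship used)
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            break
        for rel, nxt in graph.get(node, []):
            if nxt not in came_from:
                came_from[nxt] = (node, rel)
                queue.append(nxt)

    if goal not in came_from:
        return None

    # Walk backwards from the goal to rebuild the path.
    path = []
    node = goal
    while came_from[node] is not None:
        prev, rel = came_from[node]
        path.append((prev, rel, node))
        node = prev
    path.reverse()
    return path


attack_path = shortest_path(EDGES, "jsmith", "Domain Admins")
for src, rel, dst in attack_path:
    print(f"{src} -[{rel}]-> {dst}")
```

In this toy graph, an ordinary user reaches Domain Admins in four hops, none of which looks alarming on its own. Surfacing that kind of chained, unintended relationship is what makes a graph view more useful than reviewing group memberships one at a time.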
I worked with one client to deploy BloodHound, allowing us to identify four kerberoastable service accounts that had the appropriate permissions to accomplish a DCSync attack. Imagine: without ever compromising the Domain Controller or a Domain Admin account, attackers could easily replicate the credentials of every single user account.
In another case, I worked with a client’s Chief Information Security Officer using BloodHound to analyze his organization’s Active Directory environment and found kerberoastable accounts that he thought had been remediated months ago but were actually still active.
Learning how to use BloodHound does require an investment of time, but it’s not a steep learning curve to put this tool to use. SpecterOps has free online tutorials and blogs to help security teams get started, and the tool itself has prewritten queries that enable teams to quickly make use of it with a simple point-and-click. You can even find queries that others have written to expand upon your library.
Given all this, I advise organizations to use BloodHound to audit their own Active Directory environments, or work with us on that analysis. Then, use what BloodHound uncovers to advance onto a path of active monitoring and remediation of identified risks.
This work is critical for defenders if they want to keep pace with their adversaries, who are, as mentioned earlier, also using BloodHound to identify the easiest pathways to a successful attack against their targets.
IT and OT teams together should own this work. Yes, the IT department typically maintains Active Directory, but the impact of a successful attack on Active Directory won’t be limited to IT; as previously stated, it could cripple the OT environment, too.
And that fact alone should make it a primary concern for both IT and OT teams, as well as security personnel and, really, the enterprise as a whole.