While technological advancement has been constant in the 21st century, the last decade has witnessed an unprecedented era of innovation in Artificial Intelligence (AI), permeating virtually every industry. Although AI has been a formal field of study since the 1956 Dartmouth Summer Research Project (Dartmouth, n.d.), current-generation AI in the form of generative AI (GenAI), Large Language Models (LLMs), and agentic AI poses significant cybersecurity challenges. Today we will discuss those implications, examining how AI escalates existing threats while also enhancing cybersecurity and software development, offering solutions to complex problems.
Who Am I
My name is Daniella Efrach, and I recently graduated from Virginia Tech with a Computer Science degree, specializing in Secure Computing and minoring in mathematics. Though I gained a strong software development foundation during my studies, my calling was always cybersecurity – the challenge of protection and defense captivated me beyond traditional software engineering. This led me to Dark Wolf, where I started as a Cybersecurity Associate in August. Here, I’m rapidly immersing myself in crucial areas of cybersecurity such as government compliance, secure cloud infrastructure, and optimizing the software development lifecycle by leveraging both generative and agentic AI. As part of the next generation in cyber defense, I am dedicated to evolving my career to address the future’s most pressing cybersecurity challenges.
Driven by my technical background, I have concentrated on the intersection of cybersecurity and agentic AI development, particularly within the Google ecosystem. During my time at Dark Wolf I’ve explored a variety of innovative tools in this domain, and I’d like to take some time to share insights into a few that I’ve been researching and experimenting with.
Real-World AI Applications
GenAI’s impact on cybersecurity is two-sided: a powerful catalyst for the sophistication of cyber threats, yet also an unparalleled enabler of defense. Consider phishing, for example: AI agents can scrape social media profiles and perform other open-source intelligence (OSINT) tasks in minutes (Neotas, 2023). This collected data fuels GenAI to craft highly personalized, relevant spear-phishing communications, complete with deepfake content designed to deceive users. Just last year, British engineering giant Arup fell victim to a deepfake scam to the tune of $25 million. An Arup employee attended a video conference call, believing they were interacting with familiar colleagues whose images and voices had in fact been fabricated. Based on ‘decisions’ made during the meeting, the employee authorized the transfer of millions of dollars across 15 transactions (Magramo, 2024). This attack is just one example of how effective fraudulent communications have become, and how difficult it now is for targets to distinguish legitimate content from deceptive content.
Beyond the realm of social engineering, GenAI in combination with agentic AI furthers the automated exploitation of software. Agentic AI, an evolution of GenAI that adds analysis and context awareness, can be armed with the same tools humans use to scan networks and systems for vulnerabilities. Once a vulnerability is pinpointed, the agent can leverage its integrated GenAI capabilities to develop tailored malicious code that exploits the flaw, from crafting custom shellcode to weaponizing zero-day vulnerabilities. These intelligent agents can then work together to establish persistence and ensure continued access and control.
To demonstrate what GenAI malware is capable of, a team at HYAS leveraged a popular LLM to produce polymorphic keylogging malware. Their research focused on two main objectives: eliminating traditional command and control, and demonstrating GenAI-driven polymorphism. The malware, BlackMamba, was designed to establish a decentralized, peer-to-peer command and control model by leveraging encrypted channels through benign platforms like Microsoft Teams for communication. This allowed it not only to exfiltrate attacker-bound data via Teams webhooks, but also to receive commands without a dedicated server. The team then leveraged code generation through OpenAI’s API to synthesize new variants of the malware code at runtime (HYAS, 2023). This dynamically generated malicious code remained entirely in memory and executed within the context of the benign program using Python’s exec() function, a technique designed to bypass signature-based antivirus software and next-generation firewalls (NGFW). Notably, BlackMamba was tested against an industry-leading endpoint detection and response solution and produced zero alerts or detections (HYAS, 2023).
GenAI’s capabilities have also enabled a new phenomenon of ‘vibe hackers,’ who represent a significant evolution of the traditional ‘script kiddie.’ Script kiddies historically relied on pre-written exploits; vibe hackers leverage GenAI with complex prompting and limited oversight to create novel malicious code based on a ‘vibe’ (Wired, 2025). This fundamentally lowers the barrier to entry for cybercrime, empowering even unsophisticated attackers to move beyond deploying known exploits to generating customized, and even novel, exploits directly – posing a severe new challenge for cyber defenders.
The same capabilities that lower the barrier to entry for cybercrime can also be used to enhance productivity and strengthen defenses. As we address the security risks inherent in these new tools, we must also explore their potential for both secure code development and active cyber defense. This perspective is essential as we transition from examining GenAI’s harmful applications to detailing how organizations such as Google are using AI for good.
How Google is Approaching Secure Code Development with AI Tools
AI tools exist to complement innate human intelligence and intuition, not replace or overshadow it. While it is important to recognize the capabilities and strengths of AI, it is imperative that we classify AI tools as just that – tools. Their purpose is not to operate autonomously without human oversight, nor to diminish human capacity in any way. At the same time, we must deploy AI tooling to match the accelerating advancements of our adversaries. To use AI to its fullest potential, we need technical personnel who understand and can leverage machine learning techniques, including the interpretation, design, optimization, and maintenance of these technologies.
This is why Google has committed not only to countering these threats, but to being a disruptor in the cybersecurity industry, creating tooling capable of providing an insurmountable advantage to cyber defenders. Three tools developed in the last five years – Big Sleep, OSS-Fuzz, and CodeMender – illustrate momentous achievements in automating the security of software development. Google has been a driving force in transforming the secure software development space largely due to the transparency of its efforts, encouraging both the public and private sectors to follow its example.
Big Sleep
Big Sleep is a Google-built AI tool credited as being the first AI to find a zero-day vulnerability, in this case in the open source SQLite database engine. The vulnerability was found prior to an official release, eliminating the possibility of exploitation by malicious actors. “Think of it as a tireless, logic-driven bug bounty hunter with infinite patience and zero tunnel vision” (Ahmed, 2025). Big Sleep is revolutionizing the approach to vulnerability research: it can mimic a researcher’s reasoning when analyzing code, hypothesize about where that code may break, and then develop a proof-of-concept exploit to test the suspected vulnerability. In effect, Big Sleep acts as a hybrid of software engineer and cybersecurity researcher.
Open Source Software Fuzz (OSS-Fuzz)
OSS-Fuzz is an open source project from Google that automates the entire fuzzing workflow: building, running, crash analysis, and bug reporting. By utilizing LLMs, the agent can automatically create and run fuzz targets for thousands of functions. It can also filter out the false positives produced by spurious crashes, making fuzzing more effective. OSS-Fuzz also enables context-aware analysis, moving beyond purely random input testing: by analyzing a function’s expected inputs, the AI can generate complex input variations designed to surface bugs, and it can fuzz open source software continuously. This differs from traditional fuzzing methodology, which relies on processes manually implemented and tailored by software developers to test their own code, whereas OSS-Fuzz is fully automated. While OSS-Fuzz scales across a diverse range of open source projects, its utility is confined exclusively to them, leaving privately developed projects out of reach.
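To make the idea of a fuzz target concrete, here is a minimal sketch in the style OSS-Fuzz supports for Python projects via the Atheris fuzzing engine. The JSON-parsing target is only an illustrative stand-in for whatever library function a project wants to exercise; in OSS-Fuzz, the platform (or its LLM agent) generates, builds, and runs many such targets automatically.

```python
import sys
import atheris

with atheris.instrument_imports():
    import json  # stand-in for the library under test

def TestOneInput(data: bytes) -> None:
    """Entry point the fuzzer calls with each mutated input."""
    fdp = atheris.FuzzedDataProvider(data)
    text = fdp.ConsumeUnicodeNoSurrogates(len(data))
    try:
        json.loads(text)
    except json.JSONDecodeError:
        pass  # malformed input is expected; any other crash is a finding

if __name__ == "__main__":
    atheris.Setup(sys.argv, TestOneInput)
    atheris.Fuzz()
```

Run under a coverage-guided engine, the fuzzer mutates inputs toward new code paths; OSS-Fuzz then handles the building, scheduling, crash deduplication, and bug reporting around targets like this one.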
OSS-Fuzz’s vital role in securing foundational software was vividly demonstrated in 2024, when it reported a critical vulnerability in the OpenSSL library (CVE-2024-9143). The vulnerability involved multiple memory corruption issues capable of leading to denial of service or even remote code execution – a significant threat given that OpenSSL “underpins much of the internet infrastructure” (Google, 2024). Cumulatively, the project has helped maintainers patch over 11,000 vulnerabilities over the last eight years (Google, 2024).
CodeMender
Most recently, on October 6, 2025, Google released the preliminary results of its new AI-driven automated code security agent, CodeMender. CodeMender takes a comprehensive approach to software security, combining proactive and reactive measures: it can patch new vulnerabilities, rewrite and secure existing code, and eliminate entire classes of vulnerabilities. Google DeepMind stated “…over the past 6 months CodeMender has upstreamed 72 security fixes across open source projects, including some as large as 4.5 million lines of code” (Google, 2025). CodeMender leverages Gemini’s Deep Think capability to reason about code before making changes. This tool has the potential to become essential to safeguarding software systems.
Assured Open Source Software (AOSS)
Google is also contributing to the automation of risk management across the software supply chain. The criticality of this issue has been spotlighted by recent high-profile incidents, such as the npm package breaches, which underscore the vulnerabilities inherent in modern software development ecosystems. The npm breach demonstrates how a single compromised open-source component can impact countless applications, making robust Software Supply Chain Risk Management (SSCRM) essential (Henig and Hyde, 2025).
While not an AI tool, Google’s Assured Open Source Software (AOSS) introduced a new way for developers to vet the packages they build with. AOSS assures the safety of libraries using Google’s own security and vulnerability research, scanning, and patching techniques. It gives developers the next evolution beyond a simple Software Bill of Materials (SBOM) by adding context about software provenance – in other words, instead of simply listing the “ingredients,” AOSS ensures a safe supply chain for the software included in a package. This is very similar to what Chainguard offers, just specific to the Google ecosystem.
At the time of release, AOSS was limited to Java and Python, covering about 1,000 assured libraries. These are the same packages Google uses in its own development to prevent software supply chain vulnerabilities.
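AOSS itself is consumed through Google Cloud rather than through code, but the kind of automated vetting it represents can be illustrated programmatically. The sketch below is not AOSS; it queries OSV (osv.dev), Google’s open vulnerability database, for known advisories against a pinned dependency. The package name and version shown are arbitrary placeholders, and this check complements – rather than replaces – an assured registry or an SBOM.

```python
import json
import urllib.request

OSV_QUERY_URL = "https://api.osv.dev/v1/query"

def known_advisories(name: str, version: str, ecosystem: str = "PyPI") -> list[str]:
    """Return OSV advisory IDs recorded against a specific package version."""
    payload = json.dumps({
        "package": {"name": name, "ecosystem": ecosystem},
        "version": version,
    }).encode()
    req = urllib.request.Request(
        OSV_QUERY_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        result = json.load(resp)
    return [vuln["id"] for vuln in result.get("vulns", [])]

if __name__ == "__main__":
    # Placeholder dependency pin; in practice, iterate over a lockfile or SBOM.
    advisories = known_advisories("requests", "2.19.1")
    if advisories:
        print("Known advisories:", ", ".join(advisories))
    else:
        print("No known advisories for this version.")
```

In a CI pipeline, a non-empty result would fail the build or trigger review, mirroring the “vet before you ingest” posture that AOSS formalizes at the registry level.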
How does Dark Wolf harness this advantage?
To ensure US warfighters possess an unparalleled technological advantage against the rise of automated, GenAI-driven threats, Dark Wolf’s mission strategically integrates the best of both public and private sector innovation. We lead the defense space on two interconnected fronts: first, by extensively automating our own threat detection, analysis, and response processes through advanced AI; and second, by keeping pace with private sector AI advancements – for example, by integrating solutions from industry leaders such as Google into our efforts. This strategy allows Dark Wolf not only to understand the evolving adversarial landscape, but also to anticipate and counter GenAI-driven techniques, ultimately pioneering the dominant capabilities vital to our nation’s security.
Dark Wolf also capitalizes on the full promise of GenAI by strategically integrating capabilities such as Google Cloud’s offerings with our own accelerators, including Saving Throw, gigaBruce, and Warp Pipelines. This multi-faceted approach embeds security throughout the entire software development lifecycle, ensuring every end product is secure by design, not just at final delivery. This fusion of GenAI tools with deep cybersecurity expertise enables Dark Wolf not only to streamline the Authorization to Operate (ATO) process, but to deliver compliant solutions at unprecedented speed, with the cybersecurity assurance of enhanced scrutiny for zero-days, vulnerability discovery and remediation, and notably reduced person-effort in producing RMF evidence.
1 Saving Throw is an internally developed accelerator that automates static binary analysis, leveraging Ghidra and the Common Weakness Enumeration (CWE) system to detect zero-days before code is pushed to production.
2 gigaBruce is another internal accelerator: an AI-powered RMF assistant that facilitates information sharing between ISSOs, ISSMs, developers, and the system itself via API calls.
3 Warp Pipelines is an internal accelerator providing GitLab CI/CD pipelines that integrate best practices and help bootstrap projects quickly.
References:
Ahmed, N. (2025, July 22). How a thinking machine changed cybersecurity forever. LinkedIn. https://www.linkedin.com/pulse/big-sleep-googles-ai-agent-hunted-down-zero-day-before-nazeer-ahmed-w3dif/?trackingId=Ej7EohCkQ3eq61NIR7GHvQ%3D%3D
AnuPriya. (2025, July 16). Google’s big sleep AI detects and halts active exploitation of SQLite 0-day vulnerability. Cyber Security News. https://cyberpress.org/googles-big-sleep-ai-detects/
Benjamini, G. (2024, October 15). From naptime to Big Sleep. Google Project Zero Blog. https://googleprojectzero.blogspot.com/2024/10/from-naptime-to-big-sleep.html
Code Intelligence. (2021, April 16). Short intro to OSS-Fuzz. Code Intelligence Blog. https://www.code-intelligence.com/blog/intro-to-oss-fuzz
Cybersecurity and Infrastructure Security Agency (CISA). (2025, September 23). Widespread supply chain compromise impacting npm ecosystem. CISA. https://www.cisa.gov/news-events/alerts/2025/09/23/widespread-supply-chain-compromise-impacting-npm-ecosystem
Dartmouth. (n.d.). Artificial intelligence (AI) coined at Dartmouth. https://home.dartmouth.edu/about/artificial-intelligence-ai-coined-dartmouth
Google. (2025, October 6). How we’re securing the AI frontier. Google Blog. https://blog.google/technology/safety-security/ai-security-frontier-strategy-tools/
Google Cloud. (n.d.). Assured open source software. https://cloud.google.com/security/products/assured-open-source-software?hl=en
Google Cloud. (n.d.). Use AI securely and responsibly | Cloud architecture center. https://cloud.google.com/architecture/framework/security/use-ai-securely-and-responsibly
Google Security Blog. (2024, November 13). Leveling up fuzzing: Finding more vulnerabilities, faster with LLMs. https://security.googleblog.com/2024/11/leveling-up-fuzzing-finding-more.html
Magramo, K. (2024, May 17). British engineering giant Arup revealed as $25 million deepfake scam victim | CNN Business. CNN. https://www.cnn.com/2024/05/16/tech/arup-deepfake-scam-loss-hong-kong-intl-hnk
Neotas – Due Diligence and Employment Screening. (2023, December 4). AI for social media checks & OSINT | Social media screening. Neotas. https://www.neotas.com/ai-for-social-media-checks-and-osint/
OpenSSL Project. (2024, October 16). [OpenSSL security advisory 20241016]. https://openssl-library.org/news/secadv/20241016.txt
OSS-Fuzz. (n.d.). OSS-Fuzz. https://google.github.io/oss-fuzz/
Packetlabs. (n.d.). Google Assured OSS program. Packetlabs Blog. https://www.packetlabs.net/posts/google-assured-oss-program/
Palo Alto Networks. (2025, September 10). Breakdown: Widespread NPM supply chain attack puts billions of weekly downloads at risk. Palo Alto Networks Blog. https://www.paloaltonetworks.com/blog/cloud-security/npm-supply-chain-attack/
Thomas, S. (2023, April 13). Google Cloud’s open-source security bolstered with Assured OSS general roll-out. ERP Today. https://erp.today/google-clouds-open-source-security-bolstered-with-assured-oss-general-roll-out/