Research of AttackDetect

Abstract:

题目叫AttackDetect而不是IntrusionDetect，是想和传统的入侵检测作一个区分，这些工作除了判断出系统内出现攻击之外，往往还能给出攻击已经到达的阶段（或者其他更复杂的信息）。
除此之外，关注一些大模型的方法，无论是否能够直接用于检测，最好是一些比较有想象力的工作。

题目	单位	年份	来源	数据集	备注
HuntGPT: Integrating Machine Learning-Based Anomaly Detection and Explainable AI with Large Language Models (LLMs)	University of Oulu, Faculty of Information Technology and Electrical Engineering Center for Ubiquitous Computing, Oulu, Finland	2023
LActDet: An Automatic Network Attack Activity Detection Framework for Multi-step Attacks	Institute of Information Engineering, Chinese Academy of Sciences	2023	2023 IEEE 22nd International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom)	DARPA2000、ISCXIDS2012、Strato-sphere	攻击活动分类
Anomaly Detection in Continuous-Time Temporal Provenance Graphs	Huawei Munich Research Center	2023	NeurIPS	DARPA Engagement 3 -> THEIA,TRACE	异常检测
LMTracker: Lateral movement path detection based on heterogeneous graph embedding	College of Cyber Science and Engineering, Sichuan University	2022	Neurocomputing	LANL、CERT6.2	横向移动路径检测
Euler: Detecting Network Lateral Movement via Scalable Temporal Link Prediction	The George Washington University, USA	2022	NDSS-2022	LANL	架构优化，链接预测
APT-KGL: An Intelligent APT Detection System Based on Threat Knowledge and Heterogeneous Provenance Graph Learning	College of Computer Science andTechnology, Zhejiang University ofTechnology, Hangzhou, China	2022	IEEE Transactions on Dependable and Secure Computing	DARPA	节点分类
KRYSTAL: Knowledge graph-based framework for tactical attack discovery in audit data	Vienna University of Economics and Business, Welthandelsplatz 1, Vienna, Austria	2022	Computers & Security	DARPA TC	全链条知识推理方法
SLEUTH: Real-time Attack Scenario Reconstruction from COTS Audit Data	Stony Brook University	2017	26th USENIX Security Symposium	DARPA TC	溯源图开山文
MEGR-APT: A Memory-Efficient APT Hunting System Based on Attack Representation Learning	Department of Computer Science and Software Engineering, Concordia University	2024	IEEE Transactions on Information Forensics and Security
[]

Content:

HuntGPT: Integrating Machine Learning-Based Anomaly Detection and Explainable AI with Large Language Models (LLMs)

Year: 2023, Author: University of Oulu, Faculty of Information Technology and Electrical Engineering Center for Ubiquitous Computing, Oulu, Finland

Abstract:

Machine learning (ML) methods for network anomaly detection are emerging as effective proactive strategies in threat hunting, substantially reducing the time required for threat detection and response. However, the challenges in training and maintaining ML models, coupled with frequent false positives, diminish their acceptance and trustworthiness. In response, Explainable AI (XAI) techniques have been introduced to enable cybersecurity operations teams to assess alerts generated by AI systems more confidently. Despite these advancements, XAI tools have encountered limited acceptance from incident responders and have struggled to meet the decision-making needs of both analysts and model maintainers. Large Language Models (LLMs) offer a unique approach to tackling these challenges. Through tuning, LLMs have the ability to discern patterns across vast amounts of information and meet varying functional requirements. In this research, we introduce the development of HuntGPT, a specialized intrusion detection dashboard created to implement a Random Forest classifier trained utilizing the KDD99 dataset. The tool incorporates XAI frameworks like SHAP and Lime, enhancing user-friendliness and intuitiveness of the model. When combined with a GPT-3.5 Turbo conversational agent, HuntGPT aims to deliver detected threats in an easily explainable format, emphasizing user understanding and offering a smooth interactive experience. We investigate the system’s comprehensive architecture and its diverse components, assess the prototype’s technical accuracy using the Certified Information Security Manager (CISM) Practice Exams, and analyze the quality of response readability across six unique metrics. Our results indicate that conversational agents, underpinned by LLM technology and integrated with XAI, can enable a robust mechanism for generating explainable and actionable AI solutions, especially within the realm of intrusion detection systems.

Record:

以易于理解的格式提供威胁检测结果；感觉可能更像从产品角度出发；文章展示了其交互页面如下：

1）OpenAI连接器：OpenAI连接器主要用于使用OpenAI API进行身份验证，并初始化预先录制的消息历史记录。它还可以跟踪用户的对话。
2）异常数据包数据获取：它浏览了弹性搜索索引中的所有文档，并从该文档中提取了我们需要的所有重要信息。（RAG吧）
3）OpenAI API单元：将检测到的数据包流数据与精心策划的微调提示集成，并将它们提供给OpenAI API。（哪来的微调数据集？）
4）人工智能助手分析：人工智能助手接收文档中保存的所有数据，并伴随着来自该单元的细化提示，并为人类代理生成一个全面的分析。这种分析不仅揭示了细节，而且促进了直接与人类代理的交互交流，使信息和见解的无缝交换。（知识库？）

和大模型相关的表述如上，其中有用的部分可能就是3）的微调数据集

LActDet: An Automatic Network Attack Activity Detection Framework for Multi-step Attacks

Year: 2023, Author: Institute of Information Engineering, Chinese Academy of Sciences

Abstract:

With the evolution of attack tactics, cyber-attacks are presenting a sophisticated trend. The multi-step attack has become the mainstream attack form, where adversaries implement multiple attack steps to achieve their goals, which poses server challenges to attack detection. Traditional research mainly concentrates on how a particular attack step is exploited but fails to identify the whole attack activity automatically. Manual analysis is required to correlate multiple steps and determine the fine-grained type of attack activities, which is a heavy workload. In addition, the high error rate of alerts results in a negative impact on attack-activity detection performance.
To address these challenges, we propose a framework, LActDet, to automatically identify attack activities from the raw alerts end-to-end. Firstly, it utilizes a document-embedding method to vectorize attack-event descriptions. Second, a seq2seq model is implemented to embed the attack-event sequence into the attack-phase sequence to represent the framework of attack activity, aiming at improving the fault tolerance for error alerts. In the end, we propose a temporal-sequence-based classifier to identify attack activities. Our experimental results demonstrate that LActDet achieves higher detection accuracy, lower artificial dependence, and less system overhead.

Record:

告警数据->攻击阶段->活动分类，详情见博客内独立文章。

Anomaly Detection in Continuous-Time Temporal Provenance Graphs

Year: 2023, Author: Huawei Munich Research Center

Abstract:

Recent advances in Graph Neural Networks (GNNs) have matured the field of learning on graphs, making GNNs essential for prediction tasks in complex, interconnected, and evolving systems. In this paper, we focus on self-supervised, inductive learning for continuous-time dynamic graphs. Without compromising generality, we propose an approach to learn representations and mine anomalies in provenance graphs, which are a form of large-scale, heterogeneous, attributed, and continuous-time dynamic graphs used in the cybersecurity domain, syntactically resembling complex temporal knowledge graphs. We modify the Temporal Graph Network (TGN) framework to heterogeneous input data and directed edges, refining it specifically for inductive learning on provenance graphs. We present and release two pioneering large-scale, continuous-time temporal, heterogeneous, attributed benchmark graph datasets. The datasets incorporate expert-labeled anomalies, promoting subsequent research on representation learning and anomaly detection on intricate real-world networks. Comprehensive experimental analyses of modules, datasets, and baselines underscore the effectiveness of TGN-based inductive learning, affirming its practical utility in identifying semantically significant anomalies in real-world systems

Record:

现实世界的网络通常是异质（翻译是一个学问，有的时候和异构一个意思-heterogeneity，有的时候有细微的区分）的，并且随着时间的推移不断演变，因此需要关注动态图的连续时间表示（对于一般离散时间图（如每天变化的图），可以通过切片转化为快照，而连续时间表示则需要一些其他的方案）。

注：

静态异构图（HINs）：$G=(V,E), V\rightarrow T(type), E\rightarrow R(type), v_i\in R^{d_v}, e_{ij}\in R^{d_e}$，文中没有明确说明异构的概念。
连续时间动态异构图（CTDHGs）：$G=((i,r,g,t,e): i,j\in V(t), r\in R)$，其中$(i,r,g,t,e)$表示在时间 t ，节点 i 和 j 之间增加了一条类型为 r 的边。每个事件还包括一个特征向量 e ，它定义了与当前事件相关的状态，$\color{FF0000}{/什么叫状态/}$。节点集$V(t)$是时间的函数，检索截至并包括时间 $t$ 观察到的节点集。通过丢弃时间信息并随后移除具有相同类型和方向的重复边，可以将 CTDHG 转换为 HIN。
时间链接预测：预测在特定时间戳是否存在边。形式化为$f(\epsilon):V\times V\times R^+ \rightarrow [0,1]$，即映射节点对、时间、边类别到一个概率。训练基于负采样$(i,j,t)$得到$(i,k,t)$（图里只有正边）形成$\bar{G}={(i,k,t):(i,k,t)\notin G}$。
使用这种自监督损失训练的模型应当将基于异常程度的边出现概率关联得更低；因此，此类模型的一个应用就是异常边的检测。$\color{FF0000}{/如果某个边的预测概率很低，但是出现了，就说明是异常边？/}$
溯源图：日志生成溯源图。在溯源图中，进程是主要的行为者，与其他实体互动。例如，当一个进程 /usr/bin/vim 打开一个文件 /etc/hosts 时，这就会产生一个边 /etc/hosts -> /usr/bin/vim （边是有向的、type: read 、有属性、带时间戳；节点的种类可能是 File, Process, Socket, …）。

数据集：两个由DARPA Engagement 3的审计日志收集构建的数据集，分别是THEIA和TRACE。每个数据集包含一个主机两周期间的审计数据，并包含良性和恶意活动；后者由红队执行。

方法：基于扩展的TGN完成链接预测。

代码：https://github.com/JakubReha/ProvCTDG/

LMTracker: Lateral movement path detection based on heterogeneous graph embedding

Year: 2022, Author: College of Cyber Science and Engineering, Sichuan University

Abstract:

Advanced Persistent Threats(APT) with the purpose of stealing confidential data take place all the time. In the APT life cycle, lateral movement is a critical stage towards high-level authority and confidential data. Existing lateral movement detection mainly concentrates on endpoint protection to distinguish compromised hosts. These approaches not only have unfortunate effect but also can not detect lateral movement behavior comprehensively. We design LMTracker, an attack path detection algorithm based on the heterogeneous graph, in order to make up for above shortcomings. LMTracker consists of three modules: heterogeneous graph construction, path representation generation, and unsupervised anomaly-based attack path detection. The core idea of LMTracker is to use event logs and traffic to establish heterogeneous graphs and generate representation vectors for lateral movement paths, then use unsupervised algorithm to implement anomaly-based path detection. This method can not only detect lateral movement paths effectively but also preserve the path relationships. Security professionals can use these paths to analyze attack activities. In two frequently-used public datasets, the evaluation results demonstrate that LMTracker performs significantly better than other methods and can adapt to attack detection in different scenarios. The area under the ROC curve is as high as 0.95.

Record:

传统方法关注使用系统调用链来阻止在指定的损坏主机上的攻击行为，即取证问题或主机分类问题$\color{FF0000}{/节点评分？/}$
安全审计日志、流量->

异构图构建：节点或边的类型是否相同->是否是异构图。
- 节点：计算机、用户、进程和文件。
- 边：计算机之间的关系包括网络连接和用户的远程访问；计算机与用户之间的关系包括登录和使用。计算机与用户和进程的关系在于用户可以在计算机上创建或终止进程，文件与进程同理。文件可以执行进程，进程可以创建文件。
- 横向移动的活动发生在计算机或用户之间。然而，在内部网络中，用户到用户的横向移动不能脱离计算机独立进行，必须借助计算机作为媒介。
- 日志元组：$<time,C_{src},U_{src},C_{dst},U_{dst},T,A(事件属性)>$ T为事件类型：登录、登出、进程创建、文件创建等，A为属性；登录事件，有登录类型、认证类型、登录结果等。对于网络连接，包括协议、端口等。
- 异构图 $<V,E,T_V,T_E>$
路径表示生成：
- 通过结合元路径引导的Random Walk和Skip-Gram模型来捕捉异质图中的复杂关系，并将其转化为可处理的向量形式。通过metapath2vec++（将随机游走限制在绑定的元路径上）随机游走采样序列后，使用Skip-Gram来学习节点表示，这种方法不仅保留了异质图中的结构信息，还保留了语义信息。
- 专家->元路径 $\color{FF0000}{/需要预先给定特定的路径规则，但这个规则最好是学出来的/}$
- word2vec采用Skip-Gram
- 在获得节点的嵌入表示后，路径由构成路径的所有边的第一个和最后一个节点的均方误差或平均绝对误差来表示
无监督基于异常的攻击路径检测：
- 自编码器能够学习未标记样本的数据特征，并以低损失重构它们。然而，不符合这些特征的数据的重构误差会很大。我们设计了一个包含多层神经网络的自编码器，用以区分良性路径与异常的横向移动路径，而不是使用聚类方法。$\color{FF0000}{/处理未标注数据的另一种方法/}$

数据集：2015年洛斯阿拉莫斯国家实验室（LANL）发布的全面的多源网络安全事件数据集LANL。卡内基梅隆大学软件工程学院发布的第6.2版合成内部威胁测试数据集CERT6.2。

代码：https://github.com/quanghuy-ngo/LMTracker/tree/master/DARPA

Euler: Detecting Network Lateral Movement via Scalable Temporal Link Prediction

Year: 2022, Author: The George Washington University, USA

Abstract:

Lateral movement is a key stage of system compromise used by advanced persistent threats. Detecting it is no simple task. When network host logs are abstracted into discrete temporal graphs, the problem can be reframed as anomalous edge detection in an evolving network. Research in modern deep graph learning techniques has produced many creative and complicated models for this task. However, as is the case in many machine learning fields, the generality of models is of paramount importance for accuracy and scalability during training and inference. In this article, we propose a formalized approach to this problem with a framework we call Euler. It consists of a model-agnostic graph neural network stacked upon a model-agnostic sequence encoding layer such as a recurrent neural network. Models built according to the Euler framework can easily distribute their graph convolutional layers across multiple machines for large performance improvements. Additionally, we demonstrate that Euler-based models are as good, or better, than every state-of-the-art approach to anomalous link detection and prediction that we tested. As anomaly-based intrusion detection systems, our models efficiently identified anomalous connections between entities with high precision and outperformed all other unsupervised techniques for anomalous lateral movement detection. Additionally, we show that as a piece of a larger anomaly detection pipeline, Euler models perform well enough for use in real-world systems. With more advanced, yet still lightweight, alerting mechanisms ingesting the embeddings produced by Euler models, precision is boosted from 0.243, to 0.986 on real-world network traffic.

Record:

将异常横向移动检测构建为一个时间图链接预测问题。
现有方法结合了图神经网络（GNN）和序列编码器（如循环神经网络RNN）来捕捉演变网络的拓扑和时间特征：

先前的方法在嵌入的 GNN 阶段依赖于 RNN 输出，或者仅仅将 GNN 合并到 RNN 架构中，这迫使模型串行工作，一次一个快照。EULER 框架可以利用多个工作机器来保存离散时间图的连续快照。这些工作人员通过每台机器共享的复制 GNN 并行处理快照。$\color{FF0000}{/暂时不关心架构的变化/}$
这篇文章使用了快照（离散时间图）以实现并行。$G={G_1,G_2,…,G_T}, G_t={V,\Epsilon_t,X_t}$，这里，V表示网络中出现的所有节点的集合，Et表示t时刻节点之间的关系，Xt表示t时刻与节点相关的任何特征。在这项工作中，所有的图都是有向的，一些有加权边，$W: E\rightarrow R$表示每个快照所包含的时间段内的边缘频率。
用户与网络中机器在特定时间的交互$I$表示为多组$<src、dst、ts>$。在这里，src是一个在ts时刻与实体dst交互的实体。从这个多集出发，我们可以构建具有时间窗δ的时间图$G={G_0,…,G_T}$，取时间窗为$[t,t+δ)$。
数据集：LANL，代码：https://github.com/iHeartGraph/Euler

APT-KGL: An Intelligent APT Detection System Based on Threat Knowledge and Heterogeneous Provenance Graph Learning

Year: 2022, Author: College of Computer Science andTechnology, Zhejiang University ofTechnology, Hangzhou, China

Abstract:

APTs (Advanced Persistent Threats) have caused serious security threats worldwide. Most existing APT detection systems are implemented based on sophisticated forensic analysis rules. However, the design of these rules requires in-depth domain knowledge and the rules lack generalization ability. On the other hand, deep learning technique could automatically create detection model from training samples with little domain knowledge. However, due to the persistence, stealth, and diversity of APT attacks, deep learning technique suffers from a series of problems including difficulties of capturing contextual information, low scalability, dynamic evolving of training samples, and scarcity of training samples. Aiming at these problems, this paper proposes APT-KGL, an intelligent APT detection system based on provenance data and graph neural networks. First, APT-KGL models the system entities and their contextual information in the provenance data by a HPG (Heterogeneous Provenance Graph), and learns a semantic vector representation for each system entity in the HPG in an offline way. Then, APT-KGL performs online APT detection by sampling a small local graph from the HPG and classifying the key system entities as malicious or benign. In addition, to conquer the difficulty of collecting training samples of APT attacks, APT-KGL creates virtual APT training samples from open threat knowledge in a semi-automatic way. We conducted a series of experiments on two provenance datasets with simulated APT attacks. The experiment results show that APT-KGL outperforms other current deep learning based models, and has competitive performance against state-of-the-art rule-based APT detection systems.

Record:

APT-KGL总体架构如下：

Provenance Graph Construction: 图节点包括进程、文件、套接字、进程间通信（IPC）、内存、网络和属性，关系有8种。
- 溯源图转为异构图：溯源图考虑系统实体的具体实例（例如，每个不同的文件被视为完全不同的节点）。相反，HPG使用泛化概念来定义每个节点。例如，不同文件之间的语义关系可以通过共同属性节点来捕获，这有助于学习泛化模式。
Heterogeneous Graph Learning: $V\rightarrow R^d$
- HPG包含多种类型的节点和边，传统的图嵌入技术（例如，DeepWalk、node2vec、LINE）无法解决这个问题，因此引入了随机游走者来连接各种类型的节点，在元路径上为每个节点生成一组邻居。
- 元路径：元路径是在HPG模式中定义的路径$A_1→R_1→A_2→R_2→…→R_{L−1}→A_L$，$R = R_1◦R_2◦R_{L−1}$。在实践中，一个元路径中的A1和AL通常具有相同的节点类型，因此随机游走在一个元路径上结束，可以立即在另一个元路径上开始。具体长这样：
- 用的模型是HGAT。
APT Detector: 与其他方法不同的是，这里没有使用链接预测，而是直接对“进程”类别节点进行分类。
- 增量方法：对于不断处理新进入的系统实体，频繁重新运行HPG嵌入程序是不可行的。基于局部图采样的APT检测策略。新实体交互的系统实体（记为NVS），将NVS中的每个系统实体链接到现有的HPG（组合图记为CG，属性能保证全都有连到），从CG采样一个子图SG。
- 节点分类模型是R-GCN。
Threat Knowledge Extraction: 从CTI或TTP中提取威胁知识，并以半自动的方式将威胁知识建模为查询图（提取APT HPG -> 加入良性噪声）。

数据集：实验自制和DARPA数据集。源码来自：https://github.com/hwwzrzr/APT-KGL

KRYSTAL: Knowledge graph-based framework for tactical attack discovery in audit data

Year: 2022, Author: Vienna University of Economics and Business, Welthandelsplatz 1, Vienna, Austria

Abstract:

Attack graph-based methods are a promising approach towards discovering attacks and various techniques have been proposed recently. A key limitation, however, is that approaches developed so far are monolithic in their architecture and heterogeneous in their internal models. The inflexible custom data models of existing prototypes and the implementation of rules in code rather than declarative languages on the one hand make it difficult to combine, extend, and reuse techniques, and on the other hand hinder reuse of security knowledge – including detection rules and threat intelligence. KRYSTAL tackles these challenges by providing a knowledge graph-based, modular framework for threat detection, attack graph and scenario reconstruction, and analysis based on RDF as a standard model for knowledge representation. This approach provides query options that facilitate contextualization over internal and external background knowledge, as well as the integration of multiple detection techniques, including tag propagation, attack signatures, and graph queries. We implemented our framework in an openly available prototype and demonstrate its applicability on multiple scenarios of the DARPA Transparent Computing dataset. Our evaluation shows that the combination of different threat detection techniques within our framework improved detection capabilities. Furthermore, we find that RDF provenance graphs are scalable and can efficiently support a variety of threat detection techniques.

Record:

几个之前没见过的亮点：1、同时使用了溯源图和攻击图，2、对外部知识使用查询，3、知识推理，4、攻击链路->TTP的映射；KRYSTAL整体流程图如下所示：

溯源图构建：审计日志->溯源图json->rdf表达 $\color{FF0000}{/没看懂这个地方推理是如何起作用的/}$，图压缩：URI去重、HDT压缩。
威胁检测和告警：
- 衰减和衰变：利用Tag attenuation和Tag decay解决dependence explosion（出处图中的一个节点与大量的系统对象互动时，就会出现这种情况，导致大量的良性事件被标记为攻击的一部分）$\color{FF0000}{/看论文里这一部分也是基于推理和规则进行的/}$
- 基于溯源图的告警：查询是否满足某种条件？（类似攻击图的安全条件，我怎么知道满足哪些条件的时候告警？靠专家编规则吗？）
- 基于签名和规则的威胁检测：
红色部分是所谓映射，KRYSTAL使用了Sigma 作为规则，其中包含了有关TTP的映射。
$\color{FF0000}{/完全没看懂/}$
攻击图构建：一旦构建了起源图并引发了警报，下一个重要步骤是了解警报是如何关联的，并重新构建潜在的攻击步骤。为此，我们通过后向-前向链接和图查询这两个步骤，从起源图构建攻击图和攻击场景。
- 反向-正向链接用于首先识别攻击的潜在根本原因，然后重建整个攻击步骤。由于基于来源的警报可能会产生大量警报，我们需要对它们进行优先级排序，并识别攻击的潜在根本原因警报。这可以通过在向后搜索期间分配警报分数来实现，即递增路径上每个在先警报的警报分数。
- 图查询可以基于攻击模式检测起源图中的攻击行为。图形查询模式可以从观察到的行为或发布的CTI、事故报告、公共恶意软件文档等中的现有信息中构建。在此基础上，可以手动构建模式，或者通过自动提取方法构建模式。

数据集：DARPA TC

SLEUTH: Real-time Attack Scenario Reconstruction from COTS Audit Data

Year: 2017, Author: Stony Brook University

Abstract:

We present an approach and system for real-time reconstruction of attack scenarios on an enterprise host. To meet the scalability and real-time needs of the problem, we develop a platform-neutral, main-memory based, dependency graph abstraction of audit-log data. We then present efficient, tag-based techniques for attack detection and reconstruction, including source identification and impact analysis. We also develop methods to reveal the big picture of attacks by construction of compact, visual graphs of attack steps. Our system participated in a red team evaluation organized by DARPA and was able to successfully detect and reconstruct the details of the red team’s attacks on hosts running Windows, FreeBSD and Linux.

Record:

原来溯源图提出时间也不早，这篇来自石溪大学和伊利诺伊大学的文章，我觉得大概就是这类工作的鼻祖了，值得好好读一下，而且从文章描述来看，现在常用的数据集DARPATC很项目可能就是这两个大学提供TA2部分的支持。
首先，作者就此前的工作对当下实时检测提出了如下的痛点：

事件存储和分析：我们如何有效地存储来自事件流中的数百万条记录，并让算法在几秒钟内筛选这些数据？
对分析实体的优先排序：我们如何帮助被数据量所淹没的分析师，对最有可能的攻击场景进行优先排序并快速“放大”？
场景重构：我们如何简洁地总结攻击场景，从攻击者的入口点开始，识别整个战役对系统的影响？
处理常见的使用场景：如何处理类似于在攻击期间观察到的正常、良性的活动，例如软件下载？
快速、交互式推理：我们如何为分析师提供通过数据进行有效推理的能力，比如说，用另一种假设？

直接展示其整体流程图：

来自这些操作系统的审计数据被处理成一个平台中立（比较高级的叫法是多源数据融合）的图表示，其中顶点表示 subjects (processes)、objects
(files, sockets)，而边表示audit events (e.g., operations such as read, write, execute, and connect)。该图是攻击检测、因果关系分析和场景重建的基础。

好像这篇文章也没有讲构建细节，基于标签的方法，用于识别最可能涉及攻击的主体、对象和事件。

MEGR-APT: A Memory-Efficient APT Hunting System Based on Attack Representation Learning

Year: 2024, Author: Department of Computer Science and Software Engineering, Concordia University

Abstract:

The stealthy and persistent nature of Advanced Persistent Threats (APTs) makes them one of the most challenging cyber threats to uncover. Several systems adopted the development of provenance-graph-based security solutions to capture this persistent nature. Provenance graphs (PGs) represent system audit logs by connecting system entities using causal relations and information flows. Hunting APTs demands the processing of ever-growing large-scale PGs of audit logs for a wide range of activities over months or years, i.e., multiterabyte graphs. Existing APT hunting systems are typically memory-based, which suffers colossal memory consumption, or disk-based, which suffers from performance hits. Therefore, these systems are hard to scale in terms of graph size or time performance. In this paper, we propose MEGR-APT, a scalable APT hunting system to discover suspicious subgraphs matching an attack scenario (query graph) published in Cyber Threat Intelligence (CTI) reports. MEGR-APT hunts APTs in a twofold process: (i) memory-efficient extraction of suspicious subgraphs as search queries over a graph database, and (ii) fast subgraph matching based on graph neural network (GNN) and our effective attack representation learning. We compared MEGR-APT with state-of-the-art (SOTA) APT systems using popular APT benchmarks, such as DARPA TC3 and OpTC. We also tested it using a real enterprise dataset. MEGR-APT achieves an order of magnitude reduction in memory consumption while achieving comparable performance to SOTA in terms of time and accuracy.

Record:

Threat Detection and Investigation with System-level Provenance Graphs: A Survey

Year: 2020, Author: Zhejiang University

Abstract:

With the development of information technology, the border of the cyberspace gets much broader and thus also exposes increasingly more vulnerabilities to attackers. Traditional mitigation-based defence strategies are challenging to cope with the current complicated situation. Security practitioners urgently need better tools to describe and modelling attacks for defense.
The provenance graph seems like an ideal method for threat modelling with powerful semantic expression ability and attacks historic correlation ability. In this paper, we firstly introduce the basic concepts about system-level provenance graph and present a typical system architecture for provenance graph-based threat detection and investigation. A comprehensive provenance graph-based threat detection system can be divided into three modules: data collection module, data management module, and threat detection modules. Each module contains several components and involves different research problems. We systematically taxonomize and compare the existing algorithms and designs involved in them. Based on these comparisons, we identify the strategy of technology selection for real-world deployment. We also provide insights and challenges about the existing work to guide future research in this area.

Record:

知乎的一个博主写的，有关于溯源图的一篇综合性文章。

Reference:

https://blog.csdn.net/Eastmount/article/details/120555733