CodeQL Basics for C/C++

psych

2025-11-07

cpp security, code scanning

Tutorial

Official docs: https://codeql.githubdocs.cn/docs/codeql-overview/

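Getting started with CodeQL

Environment setup

codeql-cli: https://github.com/github/codeql-cli-binaries/releases
codeql-sdk (standard libraries and queries): https://github.com/github/codeql.git

The two can be installed separately under /usr/local; then add the codeql-cli directory to the PATH environment variable.

Proper editor support seems to exist only for VS Code; editors such as Cursor do not appear to work as well:

CodeQL comes from GitHub, GitHub is owned by Microsoft, and Visual Studio Code is also a Microsoft open-source project, so CodeQL only has full, first-class support in Visual Studio Code. To use CodeQL locally, we therefore have to install Visual Studio Code.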

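CodeQL analysis consists of three steps

Preparing the code by creating a CodeQL database
Running CodeQL queries against the database
Interpreting the query results

So the first step is to create a database to query; it turns the source code into structured data that can be queried. Because C++ is a compiled language, CodeQL normally has to build the C++ project while it creates the database, so that the later analysis also covers code generated during the build. The C++ repository I was working with would not build, and I was stuck at this step for a long time, until I found in the documentation that C/C++ can now be scanned without a build: https://github.blog/changelog/2025-10-14-codeql-scanning-rust-and-c-c-without-builds-is-now-generally-available/

Just set --build-mode none and database creation can start right away.

Some CodeQL CLI commands (these examples use Java):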

> codeql database create codeql-dbs --source-root=src \
    --db-cluster --language=java --command=./myBuildScript

> codeql database analyze codeql-dbs/java java-code-scanning.qls \
    --format=sarif-latest --sarif-category=java --output=java-results.sarif

> codeql github upload-results \
    --repository=my-org/example-repo \
    --ref=refs/heads/main --commit=deb275d2d5fe9a522a0b7bd8b6b6a1c939552718 \
    --sarif=java-results.sarif

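That said, for a thorough scan a database created with a real build is still better, so I still need to work out how to get the build going.

QL queries

A typical query looks like this: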

from /* ... variable declarations ... */
where /* ... logical formulas ... */
select /* ... expressions ... */

/* ... e.g. ... */
from date start, date end
where start = "10/06/2017".toDate() and end = "28/09/2017".toDate()
select start.daysTo(end)

Common variable types include:

String - string
Integer - int
Floating-point number - float
Boolean - boolean
Date (day/month/year) - date

Example

import tutorial

predicate isSouthern(Person p) {
  p.getLocation() = "south"
}

class Southern extends Person {
  Southern() {
    isSouthern(this)
  }
}

class Child extends Person {
  Child() {
    this.getAge() < 10
  }

  override predicate isAllowedIn(string region) {
    region = this.getLocation()
  }
}

predicate isBalds(Person p){
	not exists(string c | p.getHairColor() = c)
}

from Southern s
where s.isAllowedIn("north") and isBalds(s)
select s

Learning C / C++

Basic queries

Finding functions

Use Function and FunctionCall to find calls to the function sprintf.

import cpp

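from FunctionCall fc
where fc.getTarget().getQualifiedName() = "sprintf"
  // keep only calls whose format argument is not a string literal, to match the message
  and not fc.getArgument(1) instanceof StringLiteral
select fc, "sprintf called with variable format string"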
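
To see what such a query is looking at, here is a small hypothetical C++ target (the function names are made up for illustration). Both functions call sprintf, but only the second one passes a format string that is not a literal, which is the case the message "variable format string" describes.

```cpp
#include <stdio.h>
#include <string>

// Format string is a string literal: the usual, harmless way to call sprintf.
std::string format_id(int id) {
    char buf[32];
    sprintf(buf, "id=%d", id);
    return std::string(buf);
}

// Format string comes from the caller. If `fmt` can be influenced by an
// attacker, its conversion specifiers decide what sprintf reads and writes.
std::string format_with(const char* fmt, int value) {
    char buf[64];
    sprintf(buf, fmt, value);  // non-literal format argument
    return std::string(buf);
}
```

A call such as format_with("%d items", 3) is fine in itself; the point is that the analysis cannot tell that from the call to sprintf alone, so it is worth a look.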

Finding expressions

Find assignments of 0 to an integer:

import cpp
from AssignExpr e
where e.getRValue().getValue().toInt() = 0
  and e.getLValue().getType().getUnspecifiedType() instanceof IntegralType
select e, "Assigning the value 0 to an integer"
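
As a rough illustration, the hypothetical function below mixes assignments that this query is expected to report with ones it should skip. The annotations reflect my reading of AssignExpr and IntegralType, not verified query output: initializers are not assignment expressions, and floating-point or pointer targets are not integral types.

```cpp
// Each statement is annotated with whether the query is expected to report it.
int reset_counters() {
    int count = 5;            // initializer, not an assignment expression: not reported
    long total = 10;          // initializer: not reported
    double ratio = 1.5;       // initializer: not reported
    const char* name = "x";   // initializer: not reported

    count = 0;                // integral target, constant 0: reported
    total = 0L;               // integral target, constant 0: reported
    ratio = 0;                // floating-point target: not reported
    name = nullptr;           // pointer target: not reported

    return count + static_cast<int>(total) + static_cast<int>(ratio)
         + (name == nullptr ? 0 : 1);
}
```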

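Data flow

Data flow analysis is used to track potentially malicious or unsafe data as it moves through the program, since such flows can lead to vulnerabilities in the codebase.

Local data flow and Node

Local data flow considers only how data moves within a single function (or a single scope). For example, a = b; c = a; in a function body is local data flow. The local data flow library lives in the module DataFlow, which defines the class Node for any element that data can flow through. A Node is an abstraction of one node in the data flow graph; it can stand for the value of an expression or for a parameter slot.

Nodes are divided into expression nodes (ExprNode, IndirectExprNode) and parameter nodes (ParameterNode, IndirectParameterNode). Indirect nodes represent an expression or parameter after a fixed number of pointer dereferences. The Node class provides member predicates that map a node back to the source-level Expr or Parameter.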

class Node {
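  /**
   * If this Node corresponds to an expression (for example a variable read, a literal, or a call expression), gets that Expr. Otherwise there is no result.
   */
  Expr asExpr() { ... }

  /**
   * If this Node is the node obtained after dereferencing an expression `index` times, gets that expression (before dereferencing).
   */
  Expr asIndirectExpr(int index) { ... }

  /**
   * Gets the Parameter corresponding to this Node, if any.
   */
  Parameter asParameter() { ... }

  /**
   * Gets the Parameter corresponding to a Node that is obtained after dereferencing the parameter `index` times.
   */
  Parameter asParameter(int index) { ... }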

  ...
}
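
To make the node kinds a bit more concrete, here is a small C++ sketch with hypothetical functions; the comments say which kind of node I would expect each piece to map to. The last function is there for contrast: following the argument into copy_chain and back out is generally a job for global data flow, not local.

```cpp
// Local data flow stays inside one function body.
int copy_chain(int b) {    // `b` is a parameter node
    int a = b;             // the value of b flows to a
    int c = a;             // the value of a flows to c
    return c;              // this read of c is an expression node reachable from b
}

// Flow through a pointer: the interesting data is what `p` points to,
// which is what the indirect nodes (one dereference here) describe.
int read_through(const int* p) {
    int v = *p;
    return v;
}

// The value of x reaches the result only by passing through copy_chain,
// so local flow within `caller` alone is not expected to connect them.
int caller(int x) {
    return copy_chain(x);
}
```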

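Local taint tracking

Local taint tracking extends local data flow in two ways:

It includes propagation that is not a value copy: taint does not always move as a value copied from one variable to another. Some operations use a value without preserving it (non-value-preserving flow). For example, in malloc(i * sizeof(...)) the value of i is never copied into the result of malloc, but it does determine the size of the allocation, so a tainted i makes the argument passed to malloc tainted. This kind of propagation matters for security.
It adds further kinds of steps: localTaintStep may define additional propagation primitives, for example treating an argument used as a length as propagating into the allocation, or treating APIs that use an argument as a configuration value, length, size, or flags.

Taint propagation from a parameter source to an expression sink in zero or more local steps can be found as follows, where nodeFrom and nodeTo are of type DataFlow::Node: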

nodeFrom.asParameter() = source
and nodeTo.asExpr() = sink
and TaintTracking::localTaint(nodeFrom, nodeTo)
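
The difference between a value copy and a non-value-preserving step is easier to see in code. The two hypothetical functions below are a sketch: in the first, the parameter only influences the size passed to malloc through arithmetic; in the second, it is copied unchanged.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// `count` is never copied into anything the function returns, so plain
// data flow has nothing to follow. It still decides the allocation size:
// the multiplication is a taint step, and the size passed to malloc is
// tainted whenever `count` is.
int* make_buffer(std::size_t count) {
    std::size_t bytes = count * sizeof(int);            // arithmetic, not a copy
    int* buf = static_cast<int*>(std::malloc(bytes));   // tainted size argument
    if (buf != nullptr) {
        std::memset(buf, 0, bytes);
    }
    return buf;                                          // caller must free()
}

// Plain copy: ordinary local data flow already covers this.
std::size_t same_value(std::size_t count) {
    std::size_t n = count;
    return n;
}
```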

e.g. command injection

Take a query for calls to system whose argument is not a constant as an example:

import cpp

from FunctionCall fc
where fc.getTarget().getName() = "system" and not fc.getArgument(0).isConstant()
select fc.getEnclosingFunction(), fc, fc.getArgument(0)
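
For reference, this is the kind of C++ the query separates, using hypothetical helper names. The first call passes a string literal, which I would expect isConstant() to accept, so the call is skipped; the second passes a string assembled at run time, so the call is reported.

```cpp
#include <cstdlib>
#include <string>

// Builds the command text; nothing is executed here.
std::string build_ls_command(const std::string& dir) {
    return "ls " + dir;
}

// Constant argument: the query is expected to skip this call.
int list_tmp() {
    return std::system("ls /tmp");
}

// Argument built at run time: the query reports this call.
// If `dir` is "/tmp; rm -rf x", the shell runs both commands.
int list_dir(const std::string& dir) {
    std::string cmd = build_ls_command(dir);
    return std::system(cmd.c_str());
}
```

The query only says that the argument is not constant; whether an attacker can actually control it is a separate question.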

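Taint tracking

To go further and decide whether the variable is externally controllable, we need CodeQL's taint tracking, provided by the TaintTracking module. CodeQL supports two modes, local and global: local taint tracking only follows code within a single function, while global tracking follows data across the whole codebase. Taking local mode as an example, TaintTracking::localTaint(source, sink) holds when taint can flow from source to sink. For instance, with a user-controllable input userinput as the source and the argument of the system call as the sink, taint tracking can find the call sites where the argument of system (the sink) is controlled by the value that came from userinput (the source).

But if this chain does not happen within one function, for example when the system call (sink) or the get_user_input operation (source) is wrapped in a helper function, local mode misses it: it cannot follow anything outside the function, and it easily misses new wrappers unless they are themselves treated as sources and sinks. So either use global mode, or include the wrapper functions in the query.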
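
A minimal sketch of the wrapper situation, with hypothetical names (run_cmd, make_ping and the TARGET_HOST variable are made up; get_user_input stands in for whatever reads external data). In ping_from_env neither the real source (getenv) nor the real sink (system) appears in the function body, so an analysis confined to that function has nothing to connect.

```cpp
#include <cstdlib>
#include <string>

// Source wrapper: returns data that comes from outside the program.
std::string get_user_input(const char* env_name) {
    const char* v = std::getenv(env_name);
    return v ? std::string(v) : std::string();
}

// Sink wrapper: the call to system() lives here.
int run_cmd(const std::string& cmd) {
    return std::system(cmd.c_str());
}

// Builds the command text from the host name.
std::string make_ping(const std::string& host) {
    return "ping -c 1 " + host;
}

// The tainted value crosses three function boundaries on its way from
// getenv() to system(); following it takes global (interprocedural) tracking.
int ping_from_env() {
    std::string host = get_user_input("TARGET_HOST");
    return run_cmd(make_ping(host));
}
```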

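Taint propagation can be roughly divided into three stages:

Locate the sources. A source is where the data comes from, usually user-controllable input that is treated as tainted.

Locate the sinks. A sink is usually a sensitive function, such as one that runs an SQL query.

Propagate or backtrack the taint. When a source is found to reach a sink, report the vulnerability together with the propagation path. If some security filtering is in place, add sanitizer/clean points that remove the taint.

(by woodpeckerjs)

A simple implementation: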

import cpp
import semmle.code.cpp.dataflow.TaintTracking

class SystemCfg extends TaintTracking::Configuration {
  SystemCfg() { this = "SystemCfg" }

  // Source: the return value of a call to get_user_input
  override predicate isSource(DataFlow::Node source) {
    source.asExpr().(FunctionCall).getTarget().getName() = "get_user_input"
  }

  // Sink: the first argument of a call to system
  override predicate isSink(DataFlow::Node sink) {
    exists(FunctionCall call |
      sink.asExpr() = call.getArgument(0) and
      call.getTarget().getName() = "system"
    )
  }
}

from DataFlow::PathNode sink, DataFlow::PathNode source, SystemCfg cfg
where cfg.hasFlowPath(source, sink)
select source, sink

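Extending TaintTracking::Configuration defines a taint-analysis configuration. The configuration tells the data flow engine which nodes are sources, which are sinks, and other options. Here we override isSource and isSink, and finally use hasFlowPath to query the data flow paths from source to sink directly. (It can follow flow across function calls, through parameters and return values, through fields and members, and through heap allocation, loads and stores; as long as the configuration allows it, the CodeQL C++ data flow library already supports most of these propagation primitives.) Note that this class-based configuration is the older API; recent CodeQL releases use a module-based form instead (a module implementing DataFlow::ConfigSig, passed to TaintTracking::Global).

The query above is still incomplete in several ways: the range of sources and sinks it covers; flows that pass through escaping operations need to be filtered out (sanitizers); and some functions pass taint from an argument to the return value or from one argument to another (propagation).

A more detailed version: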

import cpp
import semmle.code.cpp.dataflow.TaintTracking

class SystemCfg extends TaintTracking::Configuration {
  SystemCfg() { this = "SystemCfg" }
  
  // 1) sources: treat common user-controllable inputs as sources
  override predicate isSource(DataFlow::Node source) {
    ···
  }

  // 2) sinks: system and our own wrappers around it
  override predicate isSink(DataFlow::Node sink) {
    ···
  }

  // 3) sanitizers: explicitly allow-listed or escaping functions; data that passes through them is considered clean
  override predicate isSanitizer(DataFlow::Node node) {
    // e.g. a function that escapes shell arguments (% is the wildcard in matches)
    exists(FunctionCall f |
      node.asExpr() = f and
      f.getTarget().getName().matches("%escape%")
    )
    or
    // e.g. a validation function that checks whether the input is legal (returns bool)
    exists(FunctionCall f |
      node.asExpr() = f and
      f.getTarget().getName().matches("%whitelist%")
    )
  }

  // 4) propagation: tell the engine explicitly that some library functions pass taint from one
  //    argument to another argument or to the return value. String library functions, container
  //    methods, and formatting functions are common propagation points. In this class-based API
  //    the predicate to override is isAdditionalTaintStep.
  override predicate isAdditionalTaintStep(DataFlow::Node src, DataFlow::Node dst) {
    // e.g. 1: plain string copy, strcpy(dst, src)
    exists(FunctionCall call |
      // src is the second argument of the call
      src.asExpr() = call.getArgument(1) and
      // dst is the first argument as it is after the call returns
      dst.asDefiningArgument() = call.getArgument(0) and
      call.getTarget().getName() = "strcpy"
    )
    or
    // e.g. 2: assign/append/replace on C++ std::string (illustrative)
    exists(FunctionCall ma |
      src.asExpr() = ma.getArgument(0) and
      dst.asExpr() = ma.getQualifier() and
      ma.getTarget().getName() = "append"
    )
    ···
  }
}

from DataFlow::PathNode src, DataFlow::PathNode sink, SystemCfg cfg
where cfg.hasFlowPath(src, sink)
select src, sink, sink.getNode(), src.getNode()
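
To tie the sanitizer and propagation predicates back to C++ code, here is a small hypothetical target. The escape function is my own sketch of a POSIX shell quoting helper, not a library function; copy_name and build_command show the strcpy and std::string::append steps that the configuration describes.

```cpp
#include <cstring>
#include <string>

// Hypothetical sanitizer: wraps the value in single quotes for a POSIX shell
// and rewrites each embedded single quote as '\'' so it cannot end the quoting.
std::string escape(const std::string& arg) {
    std::string out = "'";
    for (char ch : arg) {
        if (ch == '\'') {
            out += "'\\''";
        } else {
            out += ch;
        }
    }
    out += "'";
    return out;
}

// Propagation through strcpy: whatever is in `src` ends up in `dst`.
// The caller is responsible for making `dst` large enough.
void copy_name(char* dst, const char* src) {
    std::strcpy(dst, src);
}

// Propagation through append: the argument's content becomes part of `cmd`.
// With sanitize == true the data passes through escape() first.
std::string build_command(const std::string& user_arg, bool sanitize) {
    std::string cmd = "echo ";
    cmd.append(sanitize ? escape(user_arg) : user_arg);
    return cmd;
}
```

With sanitize set to false, build_command("x; id", false) yields "echo x; id", which a shell would run as two commands; with it set to true the result is "echo 'x; id'".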
