buildkitelogs

v0.6.7
Published: Feb 15, 2026 License: MIT Imports: 25 Imported by: 4


Buildkite Logs Search & Query Library

A Go library for searching and querying Buildkite CI/CD logs with intelligent caching and high-performance data analytics. Includes CLI tools for testing and debugging log parsing.

Overview

This library provides a high-level client API for searching and querying Buildkite CI/CD logs with intelligent caching and fast data analytics. Unlike terminal-to-html which focuses on log display and rendering, this library is designed for log data analysis, search, and programmatic access.

The library automatically downloads logs from the Buildkite API, caches them locally as efficient Parquet files, and provides powerful search and query capabilities. It handles Buildkite's special OSC sequence format (\x1b_bk;t=timestamp\x07content) and converts logs into structured, searchable data.
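As a stdlib-only illustration of how that OSC wire format decomposes (this sketches the format itself, not the library's internal parser):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

// parseOSCLine splits one Buildkite log line of the form
// "\x1b_bk;t=<millis>\x07<content>" into its timestamp and content.
// Lines without the OSC prefix are returned as-is with a zero time.
func parseOSCLine(line string) (time.Time, string) {
	const prefix = "\x1b_bk;t="
	rest, ok := strings.CutPrefix(line, prefix)
	if !ok {
		return time.Time{}, line
	}
	ts, content, ok := strings.Cut(rest, "\x07")
	if !ok {
		return time.Time{}, line
	}
	millis, err := strconv.ParseInt(ts, 10, 64)
	if err != nil {
		return time.Time{}, line
	}
	return time.UnixMilli(millis), content
}

func main() {
	when, content := parseOSCLine("\x1b_bk;t=1745358209921\x07~~~ Running global environment hook")
	fmt.Println(when.UTC().Format(time.RFC3339Nano), content)
}
```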

Features

Primary: High-Level Client API
  • Intelligent Caching: Automatic download and caching of Buildkite logs with Time To Live (TTL) support
  • Fast Search & Query: Built-in search capabilities with regex patterns, filtering, and context
  • Buildkite API Integration: Direct fetching from Buildkite jobs via REST API with authentication
  • Parquet Storage: Efficient columnar storage for fast analytics and data processing using Apache Arrow
  • Streaming Processing: Memory-efficient processing of logs of any size using Go iterators
  • Observability Hooks: Optional hooks for tracing and logging without framework coupling
Log Processing Engine
  • OSC Sequence Parsing: Correctly handles Buildkite's \x1b_bk;t=timestamp\x07content format
  • Group Tracking: Automatically associate entries with build sections (~~~, ---, +++)
  • Content Classification: Identifies commands, group headers, and regular output
  • ANSI Code Handling: Optional stripping of ANSI escape sequences for clean text output
  • Multiple Output Formats: Text, JSON, and Parquet export with filtering support
CLI Tools (Development & Debugging)
  • Parse Command: Convert logs to various formats for testing
  • Query Command: Fast querying of cached Parquet files
  • Debug Command: Troubleshoot OSC sequence parsing issues

Quick Start

For common use cases, the library provides a high-level Client API that simplifies downloading, caching, and querying Buildkite logs:

package main

import (
    "context"
    "fmt"
    "time"
    
    "github.com/buildkite/go-buildkite/v4"
    buildkitelogs "github.com/buildkite/buildkite-logs"
)

func main() {
    // Create buildkite client
    client, err := buildkite.NewOpts(buildkite.WithTokenAuth("your-token"))
    if err != nil {
        panic(err)
    }

    ctx := context.Background()

    // Create high-level Client
    buildkiteLogsClient, err := buildkitelogs.NewClient(ctx, client, "file://~/.bklog")
    if err != nil {
        panic(err)
    }
    defer buildkiteLogsClient.Close()

    // Download, cache, and get a reader in one step
    reader, err := buildkiteLogsClient.NewReader(
        ctx, "myorg", "mypipeline", "123", "job-id",
        time.Minute*5, false, // TTL and force refresh
    )
    if err != nil {
        panic(err)
    }
    
    // Query the logs
    for entry, err := range reader.ReadEntriesIter() {
        if err != nil {
            panic(err)
        }
        fmt.Println(entry.Content)
    }
}

The Client provides:

  • Simplified API: Easy-to-use methods for common operations
  • Automatic caching: Intelligent caching with TTL support
  • Multiple backends: Support for both official *buildkite.Client and custom BuildkiteAPI implementations
  • Parameter validation: Built-in validation with descriptive error messages
  • Hooks System: Optional hooks for observability and tracing without coupling to specific frameworks

For detailed documentation, see docs/client-api.md. For a complete working example, see examples/high-level-client/.

CLI Tools (Development & Debugging)

Installation

Using Make (recommended):

# Build with tests and linting
make all

# Quick development build
make dev

# Build with specific version
make build VERSION=v1.2.3

# Other useful targets
make clean test lint help

Manual build:

make build

Build a snapshot with goreleaser:

goreleaser build --snapshot --clean --single-target

Check version:

./build/bklog version
# or
./build/bklog -v
# or  
./build/bklog --version
Examples
Local File Processing

Parse a log file with timestamps:

./build/bklog parse -file buildkite.log

Output only sections:

./build/bklog parse -file buildkite.log -filter section

Output only group headers:

./build/bklog parse -file buildkite.log -filter group

JSON output:

./build/bklog parse -file buildkite.log -json
Buildkite API Integration

Fetch logs directly from Buildkite API:

export BUILDKITE_API_TOKEN="bkua_your_token_here"
./build/bklog parse -org myorg -pipeline mypipeline -build 123 -job abc-def-456

Export API logs to Parquet:

export BUILDKITE_API_TOKEN="bkua_your_token_here"
./build/bklog parse -org myorg -pipeline mypipeline -build 123 -job abc-def-456 -parquet logs.parquet -summary

Filter and export only sections from API:

export BUILDKITE_API_TOKEN="bkua_your_token_here"
./build/bklog parse -org myorg -pipeline mypipeline -build 123 -job abc-def-456 -filter section -json

Show processing statistics:

./build/bklog parse -file buildkite.log -summary

Output:

--- Processing Summary ---
Bytes processed: 24.4 KB
Total entries: 212
Entries with timestamps: 212

Sections: 13
Regular output: 184

Show group/section information:

./build/bklog parse -file buildkite.log -groups | head -5

Output:

[2025-04-22 21:43:29.921] [~~~ Running global environment hook] ~~~ Running global environment hook
[2025-04-22 21:43:29.921] [~~~ Running global environment hook] $ /buildkite/agent/hooks/environment
[2025-04-22 21:43:29.948] [~~~ Running global pre-checkout hook] ~~~ Running global pre-checkout hook
[2025-04-22 21:43:29.949] [~~~ Running global pre-checkout hook] $ /buildkite/agent/hooks/pre-checkout
[2025-04-22 21:43:29.975] [~~~ Preparing working directory] ~~~ Preparing working directory

Export to Parquet format:

./build/bklog parse -file buildkite.log -parquet output.parquet -summary

Output:

--- Processing Summary ---
Bytes processed: 24.4 KB
Total entries: 212
Entries with timestamps: 212

Sections: 13
Regular output: 184
Exported 212 entries to output.parquet

Export filtered data to Parquet:

./build/bklog parse -file buildkite.log -parquet sections.parquet -filter section -summary

This exports only section entries to a smaller Parquet file for analysis.

Querying Parquet Files

The CLI provides fast query operations on previously exported Parquet files:

List all groups with statistics:

./build/bklog query -file output.parquet -op list-groups

Output:

Groups found: 5

GROUP NAME                                ENTRIES  CMDS          FIRST SEEN           LAST SEEN
------------------------------------------------------------------------------------------------------------
~~~ Running global environment hook             2        1 2025-04-22 21:43:29 2025-04-22 21:43:29
~~~ Running global pre-checkout hook            2        1 2025-04-22 21:43:29 2025-04-22 21:43:29
--- :package: Build job checkout dire...        2        1 2025-04-22 21:43:30 2025-04-22 21:43:30

--- Query Statistics ---
Total entries: 10
Matched entries: 10
Total groups: 5
Query time: 2.36 ms

Filter entries by group pattern:

./build/bklog query -file output.parquet -op by-group -group "environment"

Output:

Entries in group matching 'environment': 2

[2025-04-22 21:43:29.921] [GRP] ~~~ Running global environment hook
[2025-04-22 21:43:29.922] [CMD] $ /buildkite/agent/hooks/environment

--- Query Statistics ---
Total entries: 10
Matched entries: 2
Query time: 0.36 ms

Search entries using regex patterns:

./build/bklog query -file output.parquet -op search -pattern "git clone"

Output:

Matches found: 1

[2025-04-22 21:43:29.975] [~~~ Preparing working directory] MATCH: $ git clone -v -- https://github.com/buildkite/bash-example.git .

--- Search Statistics (Streaming) ---
Total entries: 212
Matches found: 1
Query time: 0.65 ms

Search with context lines (ripgrep-style):

./build/bklog query -file output.parquet -op search -pattern "error|failed" -C 3

Output:

Matches found: 2

[2025-04-22 21:43:30.690] [~~~ Running script] Running tests...
[2025-04-22 21:43:30.691] [~~~ Running script] Test suite started
[2025-04-22 21:43:30.692] [~~~ Running script] Running unit tests
[2025-04-22 21:43:30.693] [~~~ Running script] MATCH: Test failed: authentication error
[2025-04-22 21:43:30.694] [~~~ Running script] Cleaning up test files
[2025-04-22 21:43:30.695] [~~~ Running script] Test run completed
[2025-04-22 21:43:30.696] [~~~ Running script] Generating report
--
[2025-04-22 21:43:30.750] [~~~ Post-processing] Validating results
[2025-04-22 21:43:30.751] [~~~ Post-processing] Checking exit codes
[2025-04-22 21:43:30.752] [~~~ Post-processing] Build status: some tests failed
[2025-04-22 21:43:30.753] [~~~ Post-processing] MATCH: Build failed due to test failures
[2025-04-22 21:43:30.754] [~~~ Post-processing] Uploading logs
[2025-04-22 21:43:30.755] [~~~ Post-processing] Notifying team
[2025-04-22 21:43:30.756] [~~~ Post-processing] Cleanup completed

Search with separate before/after context:

./build/bklog query -file output.parquet -op search -pattern "npm install" -B 2 -A 5

Case-sensitive search:

./build/bklog query -file output.parquet -op search -pattern "ERROR" -case-sensitive

Invert match (show non-matching lines):

./build/bklog query -file output.parquet -op search -pattern "buildkite" -invert-match -limit 5

Reverse search (find recent errors first):

./build/bklog query -file output.parquet -op search -pattern "error|failed" -reverse -C 2

Reverse search from specific position:

./build/bklog query -file output.parquet -op search -pattern "test.*failed" -reverse -search-seek 1000

Search with JSON output:

./build/bklog query -file output.parquet -op search -pattern "git clone" -format json -C 1

JSON output for programmatic use:

./build/bklog query -file output.parquet -op list-groups -format json

Query without statistics:

./build/bklog query -file output.parquet -op list-groups -stats=false

Query last 20 entries:

./build/bklog query -file output.parquet -op tail -tail 20

Query specific row position:

./build/bklog query -file output.parquet -op seek -seek 100

Limit query results:

./build/bklog query -file output.parquet -op by-group -group "test" -limit 50

Get file information:

./build/bklog query -file output.parquet -op info

Dump all entries from the file:

./build/bklog query -file output.parquet -op dump

Dump with limited entries:

./build/bklog query -file output.parquet -op dump -limit 100

Dump all entries as JSON:

./build/bklog query -file output.parquet -op dump -format json

Dump entries with raw output (no timestamps/groups):

./build/bklog query -file output.parquet -op dump -raw

Dump entries with ANSI codes stripped:

./build/bklog query -file output.parquet -op dump -strip-ansi
Buildkite API Integration

The query command supports direct API integration, automatically downloading and caching logs from Buildkite:

Query logs directly from Buildkite API:

export BUILDKITE_API_TOKEN="bkua_your_token_here"
./build/bklog query -org myorg -pipeline mypipeline -build 123 -job abc-def-456 -op list-groups

Query specific group from API logs:

export BUILDKITE_API_TOKEN="bkua_your_token_here"
./build/bklog query -org myorg -pipeline mypipeline -build 123 -job abc-def-456 -op by-group -group "tests"

Search API logs with regex patterns:

export BUILDKITE_API_TOKEN="bkua_your_token_here"
./build/bklog query -org myorg -pipeline mypipeline -build 123 -job abc-def-456 -op search -pattern "error|failed" -C 2

Search API logs with case sensitivity:

export BUILDKITE_API_TOKEN="bkua_your_token_here"
./build/bklog query -org myorg -pipeline mypipeline -build 123 -job abc-def-456 -op search -pattern "ERROR" -case-sensitive

Reverse search API logs (find recent failures):

export BUILDKITE_API_TOKEN="bkua_your_token_here"
./build/bklog query -org myorg -pipeline mypipeline -build 123 -job abc-def-456 -op search -pattern "test.*failed" -reverse -C 2

Query last 10 entries from API logs:

export BUILDKITE_API_TOKEN="bkua_your_token_here"
./build/bklog query -org myorg -pipeline mypipeline -build 123 -job abc-def-456 -op tail -tail 10

Get file info for cached API logs:

export BUILDKITE_API_TOKEN="bkua_your_token_here"
./build/bklog query -org myorg -pipeline mypipeline -build 123 -job abc-def-456 -op info

Dump all entries from API logs:

export BUILDKITE_API_TOKEN="bkua_your_token_here"
./build/bklog query -org myorg -pipeline mypipeline -build 123 -job abc-def-456 -op dump

Query with custom cache TTL:

export BUILDKITE_API_TOKEN="bkua_your_token_here"
./build/bklog query -org myorg -pipeline mypipeline -build 123 -job abc-def-456 -op info -cache-ttl=5m

Force refresh cached logs:

export BUILDKITE_API_TOKEN="bkua_your_token_here"
./build/bklog query -org myorg -pipeline mypipeline -build 123 -job abc-def-456 -op list-groups -cache-force-refresh

Use custom cache location:

export BUILDKITE_API_TOKEN="bkua_your_token_here"
./build/bklog query -org myorg -pipeline mypipeline -build 123 -job abc-def-456 -op info -cache-url=file:///tmp/bklogs

Logs are automatically downloaded and cached in ~/.bklog/ as {org}-{pipeline}-{build}-{job}.parquet files. Subsequent queries reuse the cached version until the TTL expires, a force refresh is requested, or the cache is manually cleared.

Debugging Parser Issues

The CLI includes a debug command for troubleshooting parser corruption issues, especially useful when investigating problems with OSC sequence parsing:

Debug parser behavior on specific lines:

./build/bklog debug -file buildkite.log -start 17 -limit 5 -verbose

Output:

=== Debug Mode: parse ===
File: buildkite.log
Lines: 17-21

--- Line 17 ---
Timestamp: 2025-07-01 09:20:41.629 +1000 AEST (Unix: 1751321141)
Content: "remote: Counting objects:   0% (1/287)K_bk;t=1751321141629remote: Counting objects:   1% (3/287)K..."
Group: ""
RawLine length: 6619
IsCommand: false
IsGroup: false

Show hex dump of corrupted lines:

./build/bklog debug -file buildkite.log -mode hex -start 17 -limit 1

Output:

=== Debug Mode: hex ===
File: buildkite.log
Lines: 17-17

--- Line 17 ---
Length: 6619 bytes
00000000  1b 5f 62 6b 3b 74 3d 31  37 35 31 33 32 31 31 34  |._bk;t=175132114|
00000010  31 36 32 39 07 72 65 6d  6f 74 65 3a 20 43 6f 75  |1629.remote: Cou|
00000020  6e 74 69 6e 67 20 6f 62  6a 65 63 74 73 3a 20 20  |nting objects:  |
00000030  20 30 25 20 28 31 2f 32  38 37 29 1b 5b 4b 1b 5f  | 0% (1/287).[K._|
00000040  62 6b 3b 74 3d 31 37 35  31 33 32 31 31 34 31 36  |bk;t=17513211416|

Show raw line content with line numbers:

./build/bklog debug -file buildkite.log -mode lines -start 100 -limit 3

Output:

=== Debug Mode: lines ===
File: buildkite.log
Lines: 100-102

--- Line 100 ---
Raw: "\x1b_bk;t=1751321141985\aremote: Total 2113 (delta 1830), reused 2113 (delta 1830), pack-reused 0\r"
Length: 98

--- Line 101 ---
Raw: "\x1b_bk;t=1751321142039\aReceiving objects: 100% (2113/2113), 630.45 KiB | 630.00 KiB/s, done.\r"
Length: 102

Debug with combined options:

./build/bklog debug -file buildkite.log -start 50 -end 55 -verbose -raw -hex

This will show verbose parse information, raw line content, and hex dump for lines 50-55.

Debug Command Options
./build/bklog debug [options]

Required:

  • -file <path>: Path to log file to debug (required)

Range Options:

  • -start <line>: Start line number (1-based, default: 1)
  • -end <line>: End line number (0 = start+limit or EOF, default: 0)
  • -limit <num>: Number of lines to process (default: 10)

Mode Options:

  • -mode <mode>: Debug mode: parse, hex, lines, extract-timestamps (default: parse)

Display Options:

  • -verbose: Show detailed parsing information (default: false)
  • -raw: Show raw line content (default: false)
  • -hex: Show hex dump of each line (default: false)
  • -parsed: Show parsed log entry (default: true)
  • -csv <path>: Output CSV file for extract-timestamps mode
Use Cases

Investigating Parser Corruption: The debug command is particularly useful for investigating issues where the parser only handles the first OSC sequence per line but ignores subsequent ones, causing content corruption.

Common Issues Debugged:

  • Multiple OSC sequences per line (e.g., progress updates)
  • Malformed OSC sequences missing proper terminators
  • ANSI escape sequences interfering with parsing
  • Timestamp extraction failures
  • Content/group association problems

Example Workflow:

# 1. Identify problematic lines in output
./build/bklog parse -file buildkite.log | grep -n "unexpected content"

# 2. Debug specific lines with verbose output
./build/bklog debug -file buildkite.log -start 142 -limit 1 -verbose

# 3. Examine raw bytes if needed
./build/bklog debug -file buildkite.log -start 142 -limit 1 -mode hex

# 4. Compare multiple lines to understand patterns
./build/bklog debug -file buildkite.log -start 140 -end 145 -raw

# 5. Extract all timestamps to CSV for analysis
./build/bklog debug -file buildkite.log -mode extract-timestamps -csv timestamps.csv

Extract all OSC timestamps to CSV:

./build/bklog debug -file buildkite.log -mode extract-timestamps -csv timestamps.csv

This extracts all OSC sequence timestamps from the log file into a CSV file with columns: line_number, osc_offset, timestamp_ms, timestamp_formatted.

Real Examples Using Test Data

The repository includes test data files that you can use to try out the tail functionality:

View last 5 entries from the test log:

./build/bklog query -file ./testdata/bash-example.parquet -op tail -tail 5

Output:

[2025-04-22 21:43:32.739] [CMD] $ echo 'Tests passed!'
[2025-04-22 21:43:32.740] Tests passed!
[2025-04-22 21:43:32.740] [GRP] +++ End of Example tests
[2025-04-22 21:43:32.740] [CMD] $ buildkite-agent annotate --style success 'Build passed'
[2025-04-22 21:43:32.748] Annotation added

View last 10 entries (default) with JSON output:

./build/bklog query -file ./testdata/bash-example.parquet -op tail -format json

Parse the raw log file and immediately query the last 3 entries:

# First create a fresh parquet file from the raw log
./build/bklog parse -file ./testdata/bash-example.log -parquet temp.parquet

# Then query the last 3 entries
./build/bklog query -file temp.parquet -op tail -tail 3

Combine with other operations - show file info then tail:

# Get file statistics
./build/bklog query -file ./testdata/bash-example.parquet -op info

# Then view the last few entries
./build/bklog query -file ./testdata/bash-example.parquet -op tail -tail 7

Dump all entries from the test file:

./build/bklog query -file ./testdata/bash-example.parquet -op dump

Dump first 10 entries as JSON:

./build/bklog query -file ./testdata/bash-example.parquet -op dump -limit 10 -format json
CLI Options
Parse Command
./build/bklog parse [options]

Local File Options:

  • -file <path>: Path to Buildkite log file (use this OR API parameters below)

Buildkite API Options:

  • -org <slug>: Buildkite organization slug (for API access)
  • -pipeline <slug>: Buildkite pipeline slug (for API access)
  • -build <number>: Buildkite build number or UUID (for API access)
  • -job <id>: Buildkite job ID (for API access)

Output Options:

  • -json: Output as JSON instead of text
  • -filter <type>: Filter entries by type (group, section)
  • -summary: Show processing summary at the end
  • -groups: Show group/section information for each entry
  • -parquet <path>: Export to Parquet file (e.g., output.parquet)
  • -jsonl <path>: Export to JSON Lines file (e.g., output.jsonl)
Query Command
./build/bklog query [options]

Data Source Options (choose one):

  • -file <path>: Path to Parquet log file (use this OR API parameters below)

Buildkite API Options:

  • -org <slug>: Buildkite organization slug (for API access)
  • -pipeline <slug>: Buildkite pipeline slug (for API access)
  • -build <number>: Buildkite build number or UUID (for API access)
  • -job <id>: Buildkite job ID (for API access)

Query Options:

  • -op <operation>: Query operation (list-groups, by-group, search, info, tail, seek, dump) (default: list-groups)
  • -group <pattern>: Group name pattern to filter by (for by-group operation)
  • -format <format>: Output format (text, json) (default: text)
  • -stats: Show query statistics (default: true)
  • -limit <number>: Limit number of entries returned (0 = no limit, enables early termination)
  • -tail <number>: Number of lines to show from end (for tail operation, default: 10)
  • -seek <row>: Row number to seek to (0-based, for seek operation)
  • -raw: Output raw log content without timestamps, groups, or other prefixes
  • -strip-ansi: Strip ANSI escape codes from log content

Search Options:

  • -pattern <regex>: Regex pattern to search for (for search operation)
  • -A <num>: Show NUM lines after each match (ripgrep-style)
  • -B <num>: Show NUM lines before each match (ripgrep-style)
  • -C <num>: Show NUM lines before and after each match (ripgrep-style)
  • -case-sensitive: Enable case-sensitive search (default: case-insensitive)
  • -invert-match: Show non-matching lines instead of matching ones
  • -reverse: Search backwards from end/seek position (useful for finding recent errors first)
  • -search-seek <row>: Start search from this row number (0-based, useful with -reverse)

Cache Options (API mode only):

  • -cache-ttl <duration>: Cache TTL for non-terminal jobs (default: 30s)
  • -cache-force-refresh: Force refresh cached entry (ignores cache)
  • -cache-url <url>: Cache storage URL (file://path, s3://bucket, etc., default: ~/.bklog)
Debug Command
./build/bklog debug [options]

Required:

  • -file <path>: Path to log file to debug (required)

Range Options:

  • -start <line>: Start line number (1-based, default: 1)
  • -end <line>: End line number (0 = start+limit or EOF, default: 0)
  • -limit <num>: Number of lines to process (default: 10)

Mode Options:

  • -mode <mode>: Debug mode: parse, hex, lines, extract-timestamps (default: parse)

Display Options:

  • -verbose: Show detailed parsing information (default: false)
  • -raw: Show raw line content (default: false)
  • -hex: Show hex dump of each line (default: false)
  • -parsed: Show parsed log entry (default: true)
  • -csv <path>: Output CSV file for extract-timestamps mode

Note: For API usage, set BUILDKITE_API_TOKEN environment variable. Logs are automatically downloaded and cached in ~/.bklog/.

Security: Keep your Buildkite API token secure. Never commit tokens to version control or expose them in logs. Use environment variables or secure secret management systems.

Log Entry Types

Commands

Lines that represent shell commands being executed:

[2025-04-22 21:43:29.975] $ git clone -v -- https://github.com/buildkite/bash-example.git .
Groups

Headers that mark different phases of the build (collapsible in Buildkite UI):

[2025-04-22 21:43:29.921] ~~~ Running global environment hook
[2025-04-22 21:43:30.694] --- :package: Build job checkout directory
[2025-04-22 21:43:30.699] +++ :hammer: Example tests
Build Groups and Sections

The parser automatically tracks which section or group each log entry belongs to:

[2025-04-22 21:43:29.921] [~~~ Running global environment hook] ~~~ Running global environment hook
[2025-04-22 21:43:29.921] [~~~ Running global environment hook] $ /buildkite/agent/hooks/environment
[2025-04-22 21:43:29.948] [~~~ Running global pre-checkout hook] ~~~ Running global pre-checkout hook

Each entry is automatically associated with the most recent group header (~~~, ---, or +++). This allows you to:

  • Group related log entries by build phase
  • Filter logs by group for focused analysis
  • Understand build structure and timing relationships
  • Export structured data with group context preserved
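The association logic above can be sketched with the standard library alone (an illustrative sketch of the assumed behavior, not the library's exact implementation): a line beginning with ~~~, ---, or +++ opens a new group, and every subsequent line inherits it.

```go
package main

import (
	"fmt"
	"strings"
)

type taggedLine struct {
	Group   string
	Content string
}

// assignGroups tags each line with the most recent group header seen.
// Headers are lines beginning with "~~~", "---", or "+++".
func assignGroups(lines []string) []taggedLine {
	out := make([]taggedLine, 0, len(lines))
	group := ""
	for _, line := range lines {
		if strings.HasPrefix(line, "~~~") ||
			strings.HasPrefix(line, "---") ||
			strings.HasPrefix(line, "+++") {
			group = line // a header line opens a new group
		}
		out = append(out, taggedLine{Group: group, Content: line})
	}
	return out
}

func main() {
	for _, e := range assignGroups([]string{
		"~~~ Running global environment hook",
		"$ /buildkite/agent/hooks/environment",
		"--- :package: Build job checkout directory",
	}) {
		fmt.Printf("[%s] %s\n", e.Group, e.Content)
	}
}
```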

Parquet Export

The parser can export log entries to Apache Parquet format using the official Apache Arrow Go implementation for efficient storage and analysis. Parquet files can be directly queried by tools like DuckDB, Apache Spark, and Pandas for powerful log analytics:

Intelligent Caching System

The library uses a two-tier intelligent caching strategy that optimizes for both performance and data freshness:

flowchart TD
    A[Start: DownloadAndCache] --> B[Check blob storage cache]
    B --> C{Cache exists?}
    C -->|No| H[Download logs from API]
    C -->|Yes| D{Force refresh?}
    D -->|Yes| H
    D -->|No| E[Get job status]
    E --> F{Job is terminal?}
    F -->|Yes| G[Use cache immediately<br/>Terminal jobs never expire]
    F -->|No| I{Time elapsed < TTL?}
    I -->|Yes| J[Use cache<br/>Within TTL window]
    I -->|No| H
    H --> K[Parse logs to Parquet]
    K --> L[Store in blob storage with metadata]
    L --> M[Create local cache file]
    G --> N[Create local cache file]
    J --> N
    M --> O[Return local file path]
    N --> O

    classDef terminal fill:#1a472a,stroke:#4ade80,color:#ffffff
    classDef cache fill:#1e3a8a,stroke:#60a5fa,color:#ffffff
    classDef download fill:#7c2d12,stroke:#fb923c,color:#ffffff
    classDef decision fill:#374151,stroke:#9ca3af,color:#ffffff

    class G,F terminal
    class B,C,I,J,N cache
    class H,K,L,M download
    class D,E decision

Caching Strategy:

  • Terminal Jobs: Once a job completes, logs never change → cache forever (no TTL check)
  • Running Jobs: Logs may still be updated → respect TTL to ensure fresh data
  • Force Refresh: Override cache entirely for debugging or manual refresh scenarios
Benefits of Parquet Format
  • Columnar storage: Efficient compression and query performance
  • Schema preservation: Maintains data types and structure
  • Analytics ready: Compatible with Pandas, Apache Spark, DuckDB, and other data tools
  • Compact size: Typically 70-90% smaller than JSON for log data
  • Fast queries: Optimized for analytical workloads and filtering
Parquet Schema

The exported Parquet files contain the following columns:

Column     Type    Description
timestamp  int64   Unix timestamp in milliseconds since epoch
content    string  Log content after OSC sequence processing
group      string  Current build group/section name
flags      int32   Bitwise flags field (HasTimestamp=1, IsCommand=2, IsGroup=4)
Flags Field

The flags column uses bitwise operations to efficiently store multiple boolean properties:

Flag          Bit Position  Value  Description
HasTimestamp  0             1      Entry has a valid timestamp
IsCommand     1             2      Entry is a shell command
IsGroup       2             4      Entry is a group header
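
The bit layout can be decoded with plain bitwise operations; a minimal sketch (the constant names mirror the table above, not necessarily the library's exported identifiers):

```go
package main

import "fmt"

// Flag values from the schema table: bits 0, 1, and 2.
const (
	HasTimestamp int32 = 1 << 0 // 1
	IsCommand    int32 = 1 << 1 // 2
	IsGroup      int32 = 1 << 2 // 4
)

func main() {
	flags := int32(3)                    // HasTimestamp | IsCommand
	fmt.Println(flags&HasTimestamp != 0) // true
	fmt.Println(flags&IsCommand != 0)    // true
	fmt.Println(flags&IsGroup != 0)      // false
}
```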
Usage Examples

Basic export:

./build/bklog parse -file buildkite.log -parquet output.parquet

Export with filtering:

./build/bklog parse -file buildkite.log -parquet commands.parquet -filter command

Export with streaming processing:

./build/bklog parse -file buildkite.log -parquet output.parquet -summary

This uses the modern iter.Seq2[*LogEntry, error] iterator pattern for memory-efficient processing.

API Reference

Types
type LogEntry struct {
    Timestamp time.Time  // Parsed timestamp (zero if no timestamp)
    Content   string     // Log content after OSC sequence
    RawLine   []byte     // Original raw log line as bytes
    Group     string     // Current section/group this entry belongs to
}

type Parser struct {
    // Internal regex patterns
}
Methods
Parser Methods
// Create a new parser
func NewParser() *Parser

// Parse a single log line
func (p *Parser) ParseLine(line string) (*LogEntry, error)

// Create iter.Seq2 iterator with proper error handling (streaming approach)
func (p *Parser) All(reader io.Reader) iter.Seq2[*LogEntry, error]

// Strip ANSI escape sequences
func (p *Parser) StripANSI(content string) string
LogEntry Methods
func (entry *LogEntry) HasTimestamp() bool
func (entry *LogEntry) CleanContent() string  // Content with ANSI stripped
func (entry *LogEntry) IsCommand() bool
func (entry *LogEntry) IsGroup() bool         // Check if entry is a group header (~~~, ---, +++)
func (entry *LogEntry) IsSection() bool       // Deprecated: use IsGroup() instead
Parquet Export Functions
// Export using iter.Seq2 streaming iterator
func ExportSeq2ToParquet(seq iter.Seq2[*LogEntry, error], filename string) error

// Export using iter.Seq2 with filtering
func ExportSeq2ToParquetWithFilter(seq iter.Seq2[*LogEntry, error], filename string, filterFunc func(*LogEntry) bool) error

// Create a new Parquet writer for streaming
func NewParquetWriter(file *os.File) *ParquetWriter

// Write a batch of entries to Parquet
func (pw *ParquetWriter) WriteBatch(entries []*LogEntry) error

// Close the Parquet writer
func (pw *ParquetWriter) Close() error
Parquet Query Functions
// Create a new Parquet reader
func NewParquetReader(filename string) *ParquetReader

// Stream entries from a Parquet file
func ReadParquetFileIter(filename string) iter.Seq2[ParquetLogEntry, error]

// Filter streaming entries by group pattern (case-insensitive)
func FilterByGroupIter(entries iter.Seq2[ParquetLogEntry, error], groupPattern string) iter.Seq2[ParquetLogEntry, error]
ParquetReader Methods
// Stream all log entries from the Parquet file
func (pr *ParquetReader) ReadEntriesIter() iter.Seq2[ParquetLogEntry, error]

// Stream entries filtered by group pattern
func (pr *ParquetReader) FilterByGroupIter(groupPattern string) iter.Seq2[ParquetLogEntry, error]
Query Result Types
type ParquetLogEntry struct {
    Timestamp   int64    `json:"timestamp"`    // Unix timestamp in milliseconds
    Content     string   `json:"content"`      // Log content
    Group       string   `json:"group"`        // Associated group/section
    Flags       LogFlags `json:"flags"`        // Bitwise flags (HasTimestamp=1, IsCommand=2, IsGroup=4)
}

// Backward-compatible methods
func (entry *ParquetLogEntry) HasTime() bool      // Returns Flags.HasTimestamp()
func (entry *ParquetLogEntry) IsCommand() bool    // Returns Flags.IsCommand()
func (entry *ParquetLogEntry) IsGroup() bool      // Returns Flags.IsGroup()

type LogFlags int32

// Bitwise flag operations
func (lf LogFlags) Has(flag LogFlag) bool         // Check if flag is set
func (lf *LogFlags) Set(flag LogFlag)             // Set flag
func (lf *LogFlags) Clear(flag LogFlag)           // Clear flag
func (lf *LogFlags) Toggle(flag LogFlag)          // Toggle flag

// Convenience methods
func (lf LogFlags) HasTimestamp() bool            // Check HasTimestamp flag
func (lf LogFlags) IsCommand() bool               // Check IsCommand flag  
func (lf LogFlags) IsGroup() bool                 // Check IsGroup flag

type GroupInfo struct {
    Name       string    `json:"name"`          // Group/section name
    EntryCount int       `json:"entry_count"`   // Number of entries in group
    FirstSeen  time.Time `json:"first_seen"`    // Timestamp of first entry
    LastSeen   time.Time `json:"last_seen"`     // Timestamp of last entry
    Commands   int       `json:"commands"`      // Number of command entries
}

Performance

Benchmarks

The parser includes comprehensive benchmarks to measure performance. Run them with:

go test -bench=. -benchmem
Key Results (Apple M3 Pro)

Single Line Parsing (Byte-based):

  • OSC sequence with timestamp: ~64 ns/op, 192 B/op, 3 allocs/op
  • Regular line (no timestamp): ~29 ns/op, 128 B/op, 2 allocs/op
  • ANSI-heavy line: ~68 ns/op, 224 B/op, 3 allocs/op

Memory Usage (10,000 lines):

  • Seq2 Streaming Iterator: ~3.5 MB allocated, 64,006 allocations
  • Constant memory footprint regardless of file size

Streaming Throughput (one op = processing the entire file):

  • 100 lines: ~51,000 ops/sec
  • 1,000 lines: ~5,200 ops/sec
  • 10,000 lines: ~510 ops/sec
  • 100,000 lines: ~54 ops/sec

ANSI Stripping: ~7.7M ops/sec, 160 B/op, 2 allocs/op

Parquet Export Performance (1,000 lines, Apache Arrow):

  • Seq2 streaming export: ~1,100 ops/sec, 1.2 MB allocated

Content Classification Performance (1,000 entries):

  • IsCommand(): ~15,000 ops/sec, 84 KB allocated
  • IsGroup(): ~14,000 ops/sec, 84 KB allocated
  • CleanContent(): ~15,000 ops/sec, 84 KB allocated

Parquet Streaming Query Performance (Apache Arrow Go v18):

  • ReadEntriesIter: Constant memory usage, ~5,700 entries/sec
  • FilterByGroupIter: Early termination support, ~5,700 entries/sec
  • Memory-efficient: Processes files of any size with constant memory footprint

Streaming Query Scalability:

  • Constant memory usage regardless of file size
  • Early termination support for partial processing
  • Linear processing time scales with data size
  • No memory allocation growth for large files

Performance Improvements

Byte-based Parser vs Regex:

  • 10x faster OSC sequence parsing (~46ns vs ~477ns)
  • 10x faster ANSI stripping (~127ns vs ~1311ns)
  • Fewer allocations (2 vs 5 for ANSI stripping)
  • Better memory efficiency for complex lines

Streaming Memory Efficiency:

  • Constant memory footprint regardless of file size
  • True streaming processing for files of any size
  • Early termination capability with immediate resource cleanup
  • Memory-safe processing of multi-gigabyte files

Testing

Run the test suite:

go test -v

Run benchmarks:

go test -bench=. -benchmem

The tests cover:

  • OSC sequence parsing
  • Timestamp extraction
  • ANSI code stripping
  • Content classification
  • Stream processing
  • Iterator functionality
  • Memory usage patterns

Acknowledgments

This library was developed with assistance from Claude (Anthropic) for parsing, query functionality, and performance optimization.

License

This project is licensed under the MIT License.

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func ExportSeq2ToParquet

func ExportSeq2ToParquet(seq iter.Seq2[*LogEntry, error], filename string) error

ExportSeq2ToParquet exports log entries using Go 1.23+ iter.Seq2 for efficient iteration
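A minimal usage sketch, wiring Parser.All directly into the exporter so entries stream from the log file to Parquet without buffering. The import path and the input filename are assumptions for illustration, not taken from the library's documentation:

```go
package main

import (
	"log"
	"os"

	buildkitelogs "github.com/buildkite/buildkite-logs" // import path assumed
)

func main() {
	f, err := os.Open("job.log") // hypothetical raw Buildkite log
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	parser := buildkitelogs.NewParser()
	// Stream parsed entries straight into a Parquet file; memory use
	// stays constant regardless of the log's size.
	if err := buildkitelogs.ExportSeq2ToParquet(parser.All(f), "job.parquet"); err != nil {
		log.Fatal(err)
	}
}
```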

func ExportSeq2ToParquetWithFilter

func ExportSeq2ToParquetWithFilter(seq iter.Seq2[*LogEntry, error], filename string, filterFunc func(*LogEntry) bool) error

ExportSeq2ToParquetWithFilter exports filtered log entries using iter.Seq2

func FilterByGroupIter

func FilterByGroupIter(entries iter.Seq2[ParquetLogEntry, error], groupPattern string) iter.Seq2[ParquetLogEntry, error]

FilterByGroupIter returns an iterator over entries that belong to groups matching the specified pattern

func GenerateBlobKey

func GenerateBlobKey(org, pipeline, build, job string) string

GenerateBlobKey creates a consistent key for blob storage

func GetDefaultStorageURL

func GetDefaultStorageURL(storageURL string, noTempDir bool) (string, error)

GetDefaultStorageURL returns the default storage URL based on environment

If noTempDir is true, the returned file:// URL will include the no_tmp_dir parameter, which causes gocloud.dev/blob/fileblob to create temporary files in the same directory as the final destination, avoiding cross-filesystem rename errors.

This function applies the noTempDir setting to both user-provided and default URLs.

func GetRuntimeInfo

func GetRuntimeInfo() map[string]string

GetRuntimeInfo returns information about the current runtime environment

func IsContainerizedEnvironment

func IsContainerizedEnvironment() bool

IsContainerizedEnvironment detects if we're running in a container

func IsTerminalState

func IsTerminalState(state JobState) bool

IsTerminalState returns true if the given job state is terminal

func ReadParquetFileIter

func ReadParquetFileIter(filename string) iter.Seq2[ParquetLogEntry, error]

ReadParquetFileIter is a convenience function to get an iterator over entries from a Parquet file

func StripANSI

func StripANSI(s string) string

StripANSI removes ANSI escape sequences using strings.Builder for efficiency

func StripANSIRegex

func StripANSIRegex(s string) string

StripANSIRegex removes ANSI escape sequences from a string using regex

func ValidateAPIParams

func ValidateAPIParams(org, pipeline, build, job string) error

ValidateAPIParams validates that all required API parameters are provided

Types

type AfterBlobStorageFunc

type AfterBlobStorageFunc func(ctx context.Context, result *BlobStorageResult)

type AfterCacheCheckFunc

type AfterCacheCheckFunc func(ctx context.Context, result *CacheCheckResult)

Hook function types for different stages of downloadAndCacheWithBlobStorage

type AfterJobStatusFunc

type AfterJobStatusFunc func(ctx context.Context, result *JobStatusResult)

type AfterLocalCacheFunc

type AfterLocalCacheFunc func(ctx context.Context, result *LocalCacheResult)

type AfterLogDownloadFunc

type AfterLogDownloadFunc func(ctx context.Context, result *LogDownloadResult)

type AfterLogParsingFunc

type AfterLogParsingFunc func(ctx context.Context, result *LogParsingResult)

type BaseResult

type BaseResult struct {
	Org, Pipeline, Build, Job string
	Duration                  time.Duration
}

BaseResult contains common fields for all hook results

type BlobMetadata

type BlobMetadata struct {
	JobID        string    `json:"job_id"`
	JobState     string    `json:"job_state"`
	IsTerminal   bool      `json:"is_terminal"`
	CachedAt     time.Time `json:"cached_at"`
	TTL          string    `json:"ttl"` // duration string like "30s"
	Organization string    `json:"organization"`
	Pipeline     string    `json:"pipeline"`
	Build        string    `json:"build"`
}

BlobMetadata contains metadata for cached blobs

type BlobStorage

type BlobStorage struct {
	// contains filtered or unexported fields
}

BlobStorage provides an abstraction over blob storage backends

func NewBlobStorage

func NewBlobStorage(ctx context.Context, storageURL string, opts *BlobStorageOptions) (*BlobStorage, error)

NewBlobStorage creates a new blob storage instance from a storage URL. Supports file:// URLs for local filesystem storage.

The opts parameter allows configuring blob storage behavior. Pass nil to use default options.

func (*BlobStorage) Close

func (bs *BlobStorage) Close() error

Close closes the blob storage connection

func (*BlobStorage) Delete

func (bs *BlobStorage) Delete(ctx context.Context, key string) error

Delete removes a blob from storage

func (*BlobStorage) Exists

func (bs *BlobStorage) Exists(ctx context.Context, key string) (bool, error)

Exists checks if a blob exists in storage

func (*BlobStorage) GetModTime

func (bs *BlobStorage) GetModTime(ctx context.Context, key string) (time.Time, error)

GetModTime returns the modification time of a blob

func (*BlobStorage) ReadWithMetadata

func (bs *BlobStorage) ReadWithMetadata(ctx context.Context, key string) (*BlobMetadata, error)

ReadWithMetadata reads data from blob storage with metadata

func (*BlobStorage) Reader added in v0.6.1

func (bs *BlobStorage) Reader(ctx context.Context, key string) (io.ReadCloser, error)

Reader returns an io.ReadCloser for streaming blob data from the specified key. The caller is responsible for closing the returned reader when done.

func (*BlobStorage) WriteWithMetadata

func (bs *BlobStorage) WriteWithMetadata(ctx context.Context, key string, data []byte, metadata *BlobMetadata) error

WriteWithMetadata writes data to blob storage with metadata

type BlobStorageOptions added in v0.6.3

type BlobStorageOptions struct {
	// NoTempDir controls whether to use the no_tmp_dir URL parameter for file:// URLs.
	// When true, temporary files are created in the same directory as the final destination,
	// avoiding cross-filesystem rename errors. This may result in stranded .tmp files if
	// the process crashes before cleanup runs.
	//
	// When false (default), temporary files are created in os.TempDir(), which may cause
	// "invalid cross-device link" errors if the temp directory is on a different filesystem
	// than the storage directory.
	NoTempDir bool
}

BlobStorageOptions contains configuration options for blob storage

type BlobStorageResult

type BlobStorageResult struct {
	BaseResult
	BlobKey    string
	DataSize   int64
	IsTerminal bool
	TTL        time.Duration
}

BlobStorageResult contains the result of storing data in blob storage

type BuildkiteAPI

type BuildkiteAPI interface {
	JobStatusProvider
	LogProvider
}

BuildkiteAPI combines both job status and log providers

type BuildkiteAPIClient

type BuildkiteAPIClient struct {
	// contains filtered or unexported fields
}

BuildkiteAPIClient provides methods to interact with the Buildkite API. It wraps the official go-buildkite v4 client.

func NewBuildkiteAPIClient

func NewBuildkiteAPIClient(apiToken, version string) *BuildkiteAPIClient

NewBuildkiteAPIClient creates a new Buildkite API client using go-buildkite v4

func NewBuildkiteAPIExistingClient

func NewBuildkiteAPIExistingClient(client *buildkite.Client) *BuildkiteAPIClient

NewBuildkiteAPIExistingClient creates a new Buildkite API client that wraps the provided go-buildkite client

func (*BuildkiteAPIClient) GetJobLog

func (c *BuildkiteAPIClient) GetJobLog(ctx context.Context, org, pipeline, build, job string) (io.ReadCloser, error)

GetJobLog fetches the log output for a specific job using go-buildkite.

  • org: organization slug
  • pipeline: pipeline slug
  • build: build number or UUID
  • job: job ID

func (*BuildkiteAPIClient) GetJobStatus

func (c *BuildkiteAPIClient) GetJobStatus(ctx context.Context, org, pipeline, build, jobID string) (*JobStatus, error)

GetJobStatus gets the current status of a job

type ByteParser

type ByteParser struct{}

ByteParser handles byte-level parsing of Buildkite log files

func NewByteParser

func NewByteParser() *ByteParser

NewByteParser creates a new byte-based parser

func (*ByteParser) ParseLine

func (p *ByteParser) ParseLine(line string) (*LogEntry, error)

ParseLine parses a single log line using byte scanning

type CacheCheckResult

type CacheCheckResult struct {
	BaseResult
	BlobKey string
	Exists  bool
}

CacheCheckResult contains the result of checking blob storage cache

type Client

type Client struct {
	// contains filtered or unexported fields
}

Client provides a high-level convenience API for common buildkite-logs-parquet operations

func NewClient

func NewClient(ctx context.Context, client *buildkite.Client, storageURL string) (*Client, error)

NewClient creates a new Client using the provided go-buildkite client

func NewClientWithAPI

func NewClientWithAPI(ctx context.Context, api BuildkiteAPI, storageURL string) (*Client, error)

NewClientWithAPI creates a new Client using a custom BuildkiteAPI implementation

func (*Client) Close

func (c *Client) Close() error

Close closes the underlying blob storage connection

func (*Client) DownloadAndCache

func (c *Client) DownloadAndCache(ctx context.Context, org, pipeline, build, job string, ttl time.Duration, forceRefresh bool) (string, error)

DownloadAndCache downloads and caches job logs as Parquet format, returning the local file path

Parameters:

  • org: Buildkite organization slug
  • pipeline: Pipeline slug
  • build: Build number or UUID
  • job: Job ID
  • ttl: Time-to-live for cache (use 0 for default 30s)
  • forceRefresh: If true, forces re-download even if cache exists

Returns the local file path of the cached Parquet file

func (*Client) Hooks

func (c *Client) Hooks() *Hooks

Hooks returns the hooks instance for registering callback functions

func (*Client) NewReader

func (c *Client) NewReader(ctx context.Context, org, pipeline, build, job string, ttl time.Duration, forceRefresh bool) (*ParquetReader, error)

NewReader downloads and caches job logs (if needed) and returns a ParquetReader for querying

Parameters:

  • org: Buildkite organization slug
  • pipeline: Pipeline slug
  • build: Build number or UUID
  • job: Job ID
  • ttl: Time-to-live for cache (use 0 for default 30s)
  • forceRefresh: If true, forces re-download even if cache exists

Returns a ParquetReader for querying the log data
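An end-to-end sketch of the client flow, from authentication through iterating cached entries. This is a hedged illustration: the library's import path, the storage URL, and the org/pipeline/build/job values are assumptions, and the go-buildkite v4 constructor shown here should be checked against that library's documentation:

```go
package main

import (
	"context"
	"fmt"
	"log"
	"os"

	"github.com/buildkite/go-buildkite/v4"
	buildkitelogs "github.com/buildkite/buildkite-logs" // import path assumed
)

func main() {
	ctx := context.Background()

	bk, err := buildkite.NewOpts(buildkite.WithTokenAuth(os.Getenv("BUILDKITE_API_TOKEN")))
	if err != nil {
		log.Fatal(err)
	}

	client, err := buildkitelogs.NewClient(ctx, bk, "file:///tmp/bklog-cache")
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	// Downloads and caches the logs if needed; ttl=0 uses the 30s default.
	reader, err := client.NewReader(ctx, "my-org", "my-pipeline", "123", "job-uuid", 0, false)
	if err != nil {
		log.Fatal(err)
	}

	for entry, err := range reader.ReadEntriesIter() {
		if err != nil {
			log.Fatal(err)
		}
		fmt.Println(entry.CleanContent(true)) // strip ANSI codes for display
	}
}
```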

type GroupInfo

type GroupInfo struct {
	Name       string    `json:"name"`
	EntryCount int       `json:"entry_count"`
	FirstSeen  time.Time `json:"first_seen"`
	LastSeen   time.Time `json:"last_seen"`
}

GroupInfo contains statistical information about a log group

type Hooks

type Hooks struct {
	OnAfterCacheCheck  []AfterCacheCheckFunc
	OnAfterJobStatus   []AfterJobStatusFunc
	OnAfterLogDownload []AfterLogDownloadFunc
	OnAfterLogParsing  []AfterLogParsingFunc
	OnAfterBlobStorage []AfterBlobStorageFunc
	OnAfterLocalCache  []AfterLocalCacheFunc
}

Hooks contains all registered hook functions

func (*Hooks) AddAfterBlobStorage

func (h *Hooks) AddAfterBlobStorage(hook AfterBlobStorageFunc)

func (*Hooks) AddAfterCacheCheck

func (h *Hooks) AddAfterCacheCheck(hook AfterCacheCheckFunc)

Hook registration methods

func (*Hooks) AddAfterJobStatus

func (h *Hooks) AddAfterJobStatus(hook AfterJobStatusFunc)

func (*Hooks) AddAfterLocalCache

func (h *Hooks) AddAfterLocalCache(hook AfterLocalCacheFunc)

func (*Hooks) AddAfterLogDownload

func (h *Hooks) AddAfterLogDownload(hook AfterLogDownloadFunc)

func (*Hooks) AddAfterLogParsing

func (h *Hooks) AddAfterLogParsing(hook AfterLogParsingFunc)

type JobState

type JobState string

JobState represents the possible states of a Buildkite job

const (
	JobStateFinished JobState = "finished"  // Job completed (passed or failed)
	JobStatePassed   JobState = "passed"    // Job completed successfully
	JobStateFailed   JobState = "failed"    // Job completed with failure
	JobStateCanceled JobState = "canceled"  // Job was canceled
	JobStateExpired  JobState = "expired"   // Job expired before being picked up
	JobStateTimedOut JobState = "timed_out" // Job timed out during execution
	JobStateSkipped  JobState = "skipped"   // Job was skipped
	JobStateBroken   JobState = "broken"    // Job configuration is broken
)

Terminal job states - jobs in these states will not change

const (
	JobStatePending         JobState = "pending"          // Job is pending
	JobStateWaiting         JobState = "waiting"          // Job is waiting
	JobStateWaitingFailed   JobState = "waiting_failed"   // Job waiting failed
	JobStateBlocked         JobState = "blocked"          // Job is blocked
	JobStateBlockedFailed   JobState = "blocked_failed"   // Job blocked failed
	JobStateUnblocked       JobState = "unblocked"        // Job is unblocked
	JobStateUnblockedFailed JobState = "unblocked_failed" // Job unblocked failed
	JobStateLimiting        JobState = "limiting"         // Job is limiting
	JobStateLimited         JobState = "limited"          // Job is limited
	JobStateScheduled       JobState = "scheduled"        // Job is scheduled
	JobStateAssigned        JobState = "assigned"         // Job is assigned
	JobStateAccepted        JobState = "accepted"         // Job is accepted
	JobStateRunning         JobState = "running"          // Job is currently running
	JobStateCanceling       JobState = "canceling"        // Job is being canceled
	JobStateTimingOut       JobState = "timing_out"       // Job is timing out
)

Non-terminal job states - jobs in these states may still change

type JobStatus

type JobStatus struct {
	ID         string     `json:"id"`
	State      JobState   `json:"state"`
	IsTerminal bool       `json:"is_terminal"`
	WebURL     string     `json:"web_url,omitempty"`
	ExitStatus *int       `json:"exit_status,omitempty"`
	FinishedAt *time.Time `json:"finished_at,omitempty"`
}

JobStatus contains information about a Buildkite job's current status

func (*JobStatus) ShouldRefreshCache

func (js *JobStatus) ShouldRefreshCache(cacheTime time.Time, ttl time.Duration) bool

ShouldRefreshCache determines if a cached entry should be refreshed based on job status and TTL

type JobStatusProvider

type JobStatusProvider interface {
	GetJobStatus(ctx context.Context, org, pipeline, build, job string) (*JobStatus, error)
}

JobStatusProvider defines the interface for getting job status

type JobStatusResult

type JobStatusResult struct {
	BaseResult
	JobStatus *JobStatus
}

JobStatusResult contains the result of fetching job status

type LocalCacheResult

type LocalCacheResult struct {
	BaseResult
	LocalPath string
	FileSize  int64
}

LocalCacheResult contains the result of creating local cache file

type LogDownloadResult

type LogDownloadResult struct {
	BaseResult
	LogSize int64 // Size of downloaded logs in bytes
}

LogDownloadResult contains the result of downloading logs from API

type LogEntry

type LogEntry struct {
	Timestamp time.Time
	Content   string // Parsed content after OSC processing, may still contain ANSI codes
	RawLine   []byte // Original line bytes including all OSC sequences and formatting
	Group     string // The current section/group this entry belongs to
}

LogEntry represents a parsed Buildkite log entry

func (*LogEntry) ComputeFlags

func (entry *LogEntry) ComputeFlags() LogFlags

ComputeFlags returns the consolidated flags for this log entry

func (*LogEntry) HasTimestamp

func (entry *LogEntry) HasTimestamp() bool

HasTimestamp returns true if the log entry has a valid timestamp

func (*LogEntry) IsGroup

func (entry *LogEntry) IsGroup() bool

IsGroup returns true if the log entry appears to be a group header

func (*LogEntry) IsSection deprecated

func (entry *LogEntry) IsSection() bool

Deprecated: IsSection is an alias for IsGroup. Use IsGroup instead.

type LogFlag

type LogFlag int32

const (
	HasTimestamp LogFlag = iota
	IsGroup
)

type LogFlags

type LogFlags int32

LogFlags represents a bitwise combination of log flags

func (*LogFlags) Clear

func (lf *LogFlags) Clear(flag LogFlag)

Clear clears the specified flag

func (LogFlags) Has

func (lf LogFlags) Has(flag LogFlag) bool

Has returns true if the specified flag is set

func (LogFlags) HasTimestamp

func (lf LogFlags) HasTimestamp() bool

HasTimestamp returns true if HasTimestamp flag is set

func (LogFlags) IsGroup

func (lf LogFlags) IsGroup() bool

IsGroup returns true if IsGroup flag is set

func (*LogFlags) Set

func (lf *LogFlags) Set(flag LogFlag)

Set sets the specified flag

func (*LogFlags) Toggle

func (lf *LogFlags) Toggle(flag LogFlag)

Toggle toggles the specified flag

type LogIterator deprecated

type LogIterator struct {
	// contains filtered or unexported fields
}

LogIterator provides an iterator interface for processing log entries.

Deprecated: Use Parser.All() which returns an iter.Seq2 instead.

func (*LogIterator) Entry

func (li *LogIterator) Entry() *LogEntry

Entry returns the current log entry. Only valid after a successful call to Next().

func (*LogIterator) Err

func (li *LogIterator) Err() error

Err returns any error encountered during iteration

func (*LogIterator) Next

func (li *LogIterator) Next() bool

Next advances the iterator to the next log entry. Returns true if there is a next entry, false on EOF or error.

type LogParsingResult

type LogParsingResult struct {
	BaseResult
	ParquetSize int64 // Size of generated Parquet data in bytes
	LogEntries  int   // Number of log entries processed
}

LogParsingResult contains the result of parsing logs to Parquet

type LogProvider

type LogProvider interface {
	GetJobLog(ctx context.Context, org, pipeline, build, job string) (io.ReadCloser, error)
}

LogProvider defines the interface for getting job logs

type ParquetFileInfo

type ParquetFileInfo struct {
	RowCount     int64 `json:"row_count"`
	ColumnCount  int   `json:"column_count"`
	FileSize     int64 `json:"file_size_bytes"`
	NumRowGroups int   `json:"num_row_groups"`
}

ParquetFileInfo contains metadata about a Parquet file

type ParquetLogEntry

type ParquetLogEntry struct {
	RowNumber int64    `json:"row_number"` // 0-based row position in the Parquet file
	Timestamp int64    `json:"timestamp"`
	Content   string   `json:"content"`
	Group     string   `json:"group"`
	Flags     LogFlags `json:"flags"`
}

ParquetLogEntry represents a log entry read from a Parquet file

func (*ParquetLogEntry) CleanContent

func (entry *ParquetLogEntry) CleanContent(stripANSI bool) string

CleanContent returns the content with optional ANSI stripping and whitespace trimming

func (*ParquetLogEntry) CleanGroup

func (entry *ParquetLogEntry) CleanGroup(stripANSI bool) string

CleanGroup returns the group name with optional ANSI stripping and whitespace trimming

func (*ParquetLogEntry) HasTime

func (entry *ParquetLogEntry) HasTime() bool

HasTime returns true if the entry has a timestamp (backward compatibility)

func (*ParquetLogEntry) IsGroup

func (entry *ParquetLogEntry) IsGroup() bool

IsGroup returns true if the entry is a group header (backward compatibility)

type ParquetReader

type ParquetReader struct {
	// contains filtered or unexported fields
}

ParquetReader provides functionality to read and query Parquet log files

func NewParquetReader

func NewParquetReader(filename string) *ParquetReader

NewParquetReader creates a new ParquetReader for the specified file

func (*ParquetReader) FilterByGroupIter

func (pr *ParquetReader) FilterByGroupIter(groupPattern string) iter.Seq2[ParquetLogEntry, error]

FilterByGroupIter returns an iterator over entries that belong to groups matching the specified name pattern

func (*ParquetReader) GetFileInfo

func (pr *ParquetReader) GetFileInfo() (*ParquetFileInfo, error)

GetFileInfo returns metadata about the Parquet file

func (*ParquetReader) ReadEntriesIter

func (pr *ParquetReader) ReadEntriesIter() iter.Seq2[ParquetLogEntry, error]

ReadEntriesIter returns an iterator over log entries from the Parquet file

func (*ParquetReader) SearchEntriesIter

func (pr *ParquetReader) SearchEntriesIter(options SearchOptions) iter.Seq2[SearchResult, error]

SearchEntriesIter returns an iterator over search results with context
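A sketch of a grep-style search with context lines, assuming a *ParquetReader has already been obtained (for example via Client.NewReader). The pattern, helper name, and the assumption that matching is case-insensitive unless CaseSensitive is set are illustrative:

```go
// findErrors prints each match with two lines of surrounding context.
// Sketch only; assumes reader is a *buildkitelogs.ParquetReader.
func findErrors(reader *buildkitelogs.ParquetReader) error {
	opts := buildkitelogs.SearchOptions{
		Pattern: "error|failed", // regex; case-insensitive unless CaseSensitive is set
		Context: 2,              // lines before and after each match
	}
	for result, err := range reader.SearchEntriesIter(opts) {
		if err != nil {
			return err
		}
		for _, e := range result.BeforeContext {
			fmt.Println("  ", e.CleanContent(true))
		}
		fmt.Println(">>", result.Match.CleanContent(true))
		for _, e := range result.AfterContext {
			fmt.Println("  ", e.CleanContent(true))
		}
	}
	return nil
}
```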

func (*ParquetReader) SeekToRow

func (pr *ParquetReader) SeekToRow(startRow int64) iter.Seq2[ParquetLogEntry, error]

SeekToRow returns an iterator starting from the specified row number (0-based)

type ParquetWriter

type ParquetWriter struct {
	// contains filtered or unexported fields
}

ParquetWriter provides streaming Parquet writing capabilities

func NewParquetWriter

func NewParquetWriter(file *os.File) (*ParquetWriter, error)

NewParquetWriter creates a new Parquet writer for streaming

func (*ParquetWriter) Close

func (pw *ParquetWriter) Close() error

Close closes the Parquet writer

func (*ParquetWriter) WriteBatch

func (pw *ParquetWriter) WriteBatch(entries []*LogEntry) error

WriteBatch writes a batch of log entries to the Parquet file

type Parser

type Parser struct {
	// contains filtered or unexported fields
}

Parser handles parsing of Buildkite log files

func NewParser

func NewParser() *Parser

NewParser creates a new Buildkite log parser

func (*Parser) All

func (p *Parser) All(reader io.Reader) iter.Seq2[*LogEntry, error]

All returns an iterator over all log entries using the Go 1.23+ iter.Seq2 pattern. Each iteration yields a *LogEntry and an error, following Go's idiomatic error handling. This method creates isolated parser state to prevent contamination between iterations.
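A short sketch of consuming Parser.All from an in-memory reader. The import path and the sample OSC-formatted lines are illustrative assumptions:

```go
package main

import (
	"fmt"
	"log"
	"strings"

	buildkitelogs "github.com/buildkite/buildkite-logs" // import path assumed
)

func main() {
	// Two lines in Buildkite's \x1b_bk;t=timestamp\x07content format.
	raw := "\x1b_bk;t=1700000000000\x07~~~ Preparing environment\n" +
		"\x1b_bk;t=1700000000123\x07$ make test\n"

	parser := buildkitelogs.NewParser()
	for entry, err := range parser.All(strings.NewReader(raw)) {
		if err != nil {
			log.Fatal(err)
		}
		fmt.Printf("[%s] %s\n", entry.Group, entry.Content)
	}
}
```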

func (*Parser) NewIterator deprecated

func (p *Parser) NewIterator(reader io.Reader) *LogIterator

NewIterator creates a new LogIterator for memory-efficient processing.

Deprecated: Use Parser.All() which returns an iter.Seq2 instead.

func (*Parser) ParseLine

func (p *Parser) ParseLine(line string) (*LogEntry, error)

ParseLine parses a single log line

func (*Parser) Reset deprecated

func (p *Parser) Reset()

Reset clears the parser's internal state, useful for reusing the parser for multiple independent parsing operations.

Deprecated: State isolation is now handled internally by All() and LogIterator. This method will be removed in a future major version.

type QueryResult

type QueryResult struct {
	Groups  []GroupInfo       `json:"groups,omitempty"`
	Entries []ParquetLogEntry `json:"entries,omitempty"`
	Stats   QueryStats        `json:"stats,omitempty"`
}

QueryResult holds the results of a query operation

type QueryStats

type QueryStats struct {
	TotalEntries   int     `json:"total_entries"`
	MatchedEntries int     `json:"matched_entries"`
	TotalGroups    int     `json:"total_groups"`
	QueryTime      float64 `json:"query_time_ms"`
}

QueryStats contains performance and result statistics for queries

type SearchOptions

type SearchOptions struct {
	Pattern       string // Regex pattern to search for
	CaseSensitive bool   // Enable case-sensitive matching
	InvertMatch   bool   // Show non-matching lines
	BeforeContext int    // Lines to show before match
	AfterContext  int    // Lines to show after match
	Context       int    // Lines to show before and after (overrides BeforeContext/AfterContext)
	Reverse       bool   // Search backwards from end/seek position
	SeekStart     int64  // Start search from this row (useful with Reverse)
}

SearchOptions configures regex search behavior

type SearchResult

type SearchResult struct {
	Match         ParquetLogEntry   `json:"match"`
	BeforeContext []ParquetLogEntry `json:"before_context,omitempty"`
	AfterContext  []ParquetLogEntry `json:"after_context,omitempty"`
}

SearchResult represents a match with context lines

Directories

Path Synopsis
cmd
bklog command
examples
query command
smart-cache command
