buildkitelogs

v0.6.7
Published: Feb 15, 2026 License: MIT Imports: 25 Imported by: 4


Buildkite Logs Search & Query Library

A Go library for searching and querying Buildkite CI/CD logs with intelligent caching and high-performance data analytics. Includes CLI tools for testing and debugging log parsing.

Overview

This library provides a high-level client API for searching and querying Buildkite CI/CD logs with intelligent caching and fast data analytics. Unlike terminal-to-html which focuses on log display and rendering, this library is designed for log data analysis, search, and programmatic access.

The library automatically downloads logs from the Buildkite API, caches them locally as efficient Parquet files, and provides powerful search and query capabilities. It handles Buildkite's special OSC sequence format (\x1b_bk;t=timestamp\x07content) and converts logs into structured, searchable data.
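As a stdlib-only illustration of how that OSC wire format decomposes (this sketches the format itself, not the library's internal parser):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

// parseOSCLine splits one Buildkite log line of the form
// "\x1b_bk;t=<millis>\x07<content>" into its timestamp and content.
// Lines without the OSC prefix are returned as-is with a zero time.
func parseOSCLine(line string) (time.Time, string) {
	const prefix = "\x1b_bk;t="
	rest, ok := strings.CutPrefix(line, prefix)
	if !ok {
		return time.Time{}, line
	}
	ts, content, ok := strings.Cut(rest, "\x07")
	if !ok {
		return time.Time{}, line
	}
	millis, err := strconv.ParseInt(ts, 10, 64)
	if err != nil {
		return time.Time{}, line
	}
	return time.UnixMilli(millis), content
}

func main() {
	when, content := parseOSCLine("\x1b_bk;t=1745358209921\x07~~~ Running global environment hook")
	fmt.Println(when.UTC().Format(time.RFC3339Nano), content)
}
```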

Features

Primary: High-Level Client API
  • Intelligent Caching: Automatic download and caching of Buildkite logs with Time To Live (TTL) support
  • Fast Search & Query: Built-in search capabilities with regex patterns, filtering, and context
  • Buildkite API Integration: Direct fetching from Buildkite jobs via REST API with authentication
  • Parquet Storage: Efficient columnar storage for fast analytics and data processing using Apache Arrow
  • Streaming Processing: Memory-efficient processing of logs of any size using Go iterators
  • Observability Hooks: Optional hooks for tracing and logging without framework coupling
Log Processing Engine
  • OSC Sequence Parsing: Correctly handles Buildkite's \x1b_bk;t=timestamp\x07content format
  • Group Tracking: Automatically associate entries with build sections (~~~, ---, +++)
  • Content Classification: Identifies commands, group headers, and regular output
  • ANSI Code Handling: Optional stripping of ANSI escape sequences for clean text output
  • Multiple Output Formats: Text, JSON, and Parquet export with filtering support
CLI Tools (Development & Debugging)
  • Parse Command: Convert logs to various formats for testing
  • Query Command: Fast querying of cached Parquet files
  • Debug Command: Troubleshoot OSC sequence parsing issues

Quick Start

For common use cases, the library provides a high-level Client API that simplifies downloading, caching, and querying Buildkite logs:

package main

import (
    "context"
    "fmt"
    "time"
    
    "github.com/buildkite/go-buildkite/v4"
    buildkitelogs "github.com/buildkite/buildkite-logs"
)

func main() {
    // Create buildkite client
    client, err := buildkite.NewOpts(buildkite.WithTokenAuth("your-token"))
    if err != nil {
        panic(err)
    }

    ctx := context.Background()

    // Create high-level Client
    buildkiteLogsClient, err := buildkitelogs.NewClient(ctx, client, "file://~/.bklog")
    if err != nil {
        panic(err)
    }
    defer buildkiteLogsClient.Close()

    // Download, cache, and get a reader in one step
    reader, err := buildkiteLogsClient.NewReader(
        ctx, "myorg", "mypipeline", "123", "job-id",
        time.Minute*5, false, // TTL and force refresh
    )
    if err != nil {
        panic(err)
    }
    
    // Query the logs
    for entry, err := range reader.ReadEntriesIter() {
        if err != nil {
            panic(err)
        }
        fmt.Println(entry.Content)
    }
}

The Client provides:

  • Simplified API: Easy-to-use methods for common operations
  • Automatic caching: Intelligent caching with TTL support
  • Multiple backends: Support for both official *buildkite.Client and custom BuildkiteAPI implementations
  • Parameter validation: Built-in validation with descriptive error messages
  • Hooks System: Optional hooks for observability and tracing without coupling to specific frameworks

For detailed documentation, see docs/client-api.md. For a complete working example, see examples/high-level-client/.

CLI Tools (Development & Debugging)

Installation

Using Make (recommended):

# Build with tests and linting
make all

# Quick development build
make dev

# Build with specific version
make build VERSION=v1.2.3

# Other useful targets
make clean test lint help

Manual build:

make build

Build a snapshot with goreleaser:

goreleaser build --snapshot --clean --single-target

Check version:

./build/bklog version
# or
./build/bklog -v
# or  
./build/bklog --version
Examples
Local File Processing

Parse a log file with timestamps:

./build/bklog parse -file buildkite.log

Output only sections:

./build/bklog parse -file buildkite.log -filter section

Output only group headers:

./build/bklog parse -file buildkite.log -filter group

JSON output:

./build/bklog parse -file buildkite.log -json
Buildkite API Integration

Fetch logs directly from Buildkite API:

export BUILDKITE_API_TOKEN="bkua_your_token_here"
./build/bklog parse -org myorg -pipeline mypipeline -build 123 -job abc-def-456

Export API logs to Parquet:

export BUILDKITE_API_TOKEN="bkua_your_token_here"
./build/bklog parse -org myorg -pipeline mypipeline -build 123 -job abc-def-456 -parquet logs.parquet -summary

Filter and export only sections from API:

export BUILDKITE_API_TOKEN="bkua_your_token_here"
./build/bklog parse -org myorg -pipeline mypipeline -build 123 -job abc-def-456 -filter section -json

Show processing statistics:

./build/bklog parse -file buildkite.log -summary

Output:

--- Processing Summary ---
Bytes processed: 24.4 KB
Total entries: 212
Entries with timestamps: 212

Sections: 13
Regular output: 184

Show group/section information:

./build/bklog parse -file buildkite.log -groups | head -5

Output:

[2025-04-22 21:43:29.921] [~~~ Running global environment hook] ~~~ Running global environment hook
[2025-04-22 21:43:29.921] [~~~ Running global environment hook] $ /buildkite/agent/hooks/environment
[2025-04-22 21:43:29.948] [~~~ Running global pre-checkout hook] ~~~ Running global pre-checkout hook
[2025-04-22 21:43:29.949] [~~~ Running global pre-checkout hook] $ /buildkite/agent/hooks/pre-checkout
[2025-04-22 21:43:29.975] [~~~ Preparing working directory] ~~~ Preparing working directory

Export to Parquet format:

./build/bklog parse -file buildkite.log -parquet output.parquet -summary

Output:

--- Processing Summary ---
Bytes processed: 24.4 KB
Total entries: 212
Entries with timestamps: 212

Sections: 13
Regular output: 184
Exported 212 entries to output.parquet

Export filtered data to Parquet:

./build/bklog parse -file buildkite.log -parquet sections.parquet -filter section -summary

This exports only section entries to a smaller Parquet file for analysis.

Querying Parquet Files

The CLI provides fast query operations on previously exported Parquet files:

List all groups with statistics:

./build/bklog query -file output.parquet -op list-groups

Output:

Groups found: 5

GROUP NAME                                ENTRIES  CMDS          FIRST SEEN           LAST SEEN
------------------------------------------------------------------------------------------------------------
~~~ Running global environment hook             2        1 2025-04-22 21:43:29 2025-04-22 21:43:29
~~~ Running global pre-checkout hook            2        1 2025-04-22 21:43:29 2025-04-22 21:43:29
--- :package: Build job checkout dire...        2        1 2025-04-22 21:43:30 2025-04-22 21:43:30

--- Query Statistics ---
Total entries: 10
Matched entries: 10
Total groups: 5
Query time: 2.36 ms

Filter entries by group pattern:

./build/bklog query -file output.parquet -op by-group -group "environment"

Output:

Entries in group matching 'environment': 2

[2025-04-22 21:43:29.921] [GRP] ~~~ Running global environment hook
[2025-04-22 21:43:29.922] [CMD] $ /buildkite/agent/hooks/environment

--- Query Statistics ---
Total entries: 10
Matched entries: 2
Query time: 0.36 ms

Search entries using regex patterns:

./build/bklog query -file output.parquet -op search -pattern "git clone"

Output:

Matches found: 1

[2025-04-22 21:43:29.975] [~~~ Preparing working directory] MATCH: $ git clone -v -- https://github.com/buildkite/bash-example.git .

--- Search Statistics (Streaming) ---
Total entries: 212
Matches found: 1
Query time: 0.65 ms

Search with context lines (ripgrep-style):

./build/bklog query -file output.parquet -op search -pattern "error|failed" -C 3

Output:

Matches found: 2

[2025-04-22 21:43:30.690] [~~~ Running script] Running tests...
[2025-04-22 21:43:30.691] [~~~ Running script] Test suite started
[2025-04-22 21:43:30.692] [~~~ Running script] Running unit tests
[2025-04-22 21:43:30.693] [~~~ Running script] MATCH: Test failed: authentication error
[2025-04-22 21:43:30.694] [~~~ Running script] Cleaning up test files
[2025-04-22 21:43:30.695] [~~~ Running script] Test run completed
[2025-04-22 21:43:30.696] [~~~ Running script] Generating report
--
[2025-04-22 21:43:30.750] [~~~ Post-processing] Validating results
[2025-04-22 21:43:30.751] [~~~ Post-processing] Checking exit codes
[2025-04-22 21:43:30.752] [~~~ Post-processing] Build status: some tests failed
[2025-04-22 21:43:30.753] [~~~ Post-processing] MATCH: Build failed due to test failures
[2025-04-22 21:43:30.754] [~~~ Post-processing] Uploading logs
[2025-04-22 21:43:30.755] [~~~ Post-processing] Notifying team
[2025-04-22 21:43:30.756] [~~~ Post-processing] Cleanup completed

Search with separate before/after context:

./build/bklog query -file output.parquet -op search -pattern "npm install" -B 2 -A 5

Case-sensitive search:

./build/bklog query -file output.parquet -op search -pattern "ERROR" -case-sensitive

Invert match (show non-matching lines):

./build/bklog query -file output.parquet -op search -pattern "buildkite" -invert-match -limit 5

Reverse search (find recent errors first):

./build/bklog query -file output.parquet -op search -pattern "error|failed" -reverse -C 2

Reverse search from specific position:

./build/bklog query -file output.parquet -op search -pattern "test.*failed" -reverse -search-seek 1000

Search with JSON output:

./build/bklog query -file output.parquet -op search -pattern "git clone" -format json -C 1

JSON output for programmatic use:

./build/bklog query -file output.parquet -op list-groups -format json

Query without statistics:

./build/bklog query -file output.parquet -op list-groups -stats=false

Query last 20 entries:

./build/bklog query -file output.parquet -op tail -tail 20

Query specific row position:

./build/bklog query -file output.parquet -op seek -seek 100

Limit query results:

./build/bklog query -file output.parquet -op by-group -group "test" -limit 50

Get file information:

./build/bklog query -file output.parquet -op info

Dump all entries from the file:

./build/bklog query -file output.parquet -op dump

Dump with limited entries:

./build/bklog query -file output.parquet -op dump -limit 100

Dump all entries as JSON:

./build/bklog query -file output.parquet -op dump -format json

Dump entries with raw output (no timestamps/groups):

./build/bklog query -file output.parquet -op dump -raw

Dump entries with ANSI codes stripped:

./build/bklog query -file output.parquet -op dump -strip-ansi
Buildkite API Integration

The query command supports direct API integration, automatically downloading and caching logs from Buildkite:

Query logs directly from Buildkite API:

export BUILDKITE_API_TOKEN="bkua_your_token_here"
./build/bklog query -org myorg -pipeline mypipeline -build 123 -job abc-def-456 -op list-groups

Query specific group from API logs:

export BUILDKITE_API_TOKEN="bkua_your_token_here"
./build/bklog query -org myorg -pipeline mypipeline -build 123 -job abc-def-456 -op by-group -group "tests"

Search API logs with regex patterns:

export BUILDKITE_API_TOKEN="bkua_your_token_here"
./build/bklog query -org myorg -pipeline mypipeline -build 123 -job abc-def-456 -op search -pattern "error|failed" -C 2

Search API logs with case sensitivity:

export BUILDKITE_API_TOKEN="bkua_your_token_here"
./build/bklog query -org myorg -pipeline mypipeline -build 123 -job abc-def-456 -op search -pattern "ERROR" -case-sensitive

Reverse search API logs (find recent failures):

export BUILDKITE_API_TOKEN="bkua_your_token_here"
./build/bklog query -org myorg -pipeline mypipeline -build 123 -job abc-def-456 -op search -pattern "test.*failed" -reverse -C 2

Query last 10 entries from API logs:

export BUILDKITE_API_TOKEN="bkua_your_token_here"
./build/bklog query -org myorg -pipeline mypipeline -build 123 -job abc-def-456 -op tail -tail 10

Get file info for cached API logs:

export BUILDKITE_API_TOKEN="bkua_your_token_here"
./build/bklog query -org myorg -pipeline mypipeline -build 123 -job abc-def-456 -op info

Dump all entries from API logs:

export BUILDKITE_API_TOKEN="bkua_your_token_here"
./build/bklog query -org myorg -pipeline mypipeline -build 123 -job abc-def-456 -op dump

Query with custom cache TTL:

export BUILDKITE_API_TOKEN="bkua_your_token_here"
./build/bklog query -org myorg -pipeline mypipeline -build 123 -job abc-def-456 -op info -cache-ttl=5m

Force refresh cached logs:

export BUILDKITE_API_TOKEN="bkua_your_token_here"
./build/bklog query -org myorg -pipeline mypipeline -build 123 -job abc-def-456 -op list-groups -cache-force-refresh

Use custom cache location:

export BUILDKITE_API_TOKEN="bkua_your_token_here"
./build/bklog query -org myorg -pipeline mypipeline -build 123 -job abc-def-456 -op info -cache-url=file:///tmp/bklogs

Logs are automatically downloaded and cached in ~/.bklog/ as {org}-{pipeline}-{build}-{job}.parquet files. Subsequent queries reuse the cached version until the TTL expires, a force refresh is requested, or the cache is manually cleared.

Debugging Parser Issues

The CLI includes a debug command for troubleshooting parser corruption issues, especially useful when investigating problems with OSC sequence parsing:

Debug parser behavior on specific lines:

./build/bklog debug -file buildkite.log -start 17 -limit 5 -verbose

Output:

=== Debug Mode: parse ===
File: buildkite.log
Lines: 17-21

--- Line 17 ---
Timestamp: 2025-07-01 09:20:41.629 +1000 AEST (Unix: 1751321141)
Content: "remote: Counting objects:   0% (1/287)K_bk;t=1751321141629remote: Counting objects:   1% (3/287)K..."
Group: ""
RawLine length: 6619
IsCommand: false
IsGroup: false

Show hex dump of corrupted lines:

./build/bklog debug -file buildkite.log -mode hex -start 17 -limit 1

Output:

=== Debug Mode: hex ===
File: buildkite.log
Lines: 17-17

--- Line 17 ---
Length: 6619 bytes
00000000  1b 5f 62 6b 3b 74 3d 31  37 35 31 33 32 31 31 34  |._bk;t=175132114|
00000010  31 36 32 39 07 72 65 6d  6f 74 65 3a 20 43 6f 75  |1629.remote: Cou|
00000020  6e 74 69 6e 67 20 6f 62  6a 65 63 74 73 3a 20 20  |nting objects:  |
00000030  20 30 25 20 28 31 2f 32  38 37 29 1b 5b 4b 1b 5f  | 0% (1/287).[K._|
00000040  62 6b 3b 74 3d 31 37 35  31 33 32 31 31 34 31 36  |bk;t=17513211416|

Show raw line content with line numbers:

./build/bklog debug -file buildkite.log -mode lines -start 100 -limit 3

Output:

=== Debug Mode: lines ===
File: buildkite.log
Lines: 100-102

--- Line 100 ---
Raw: "\x1b_bk;t=1751321141985\aremote: Total 2113 (delta 1830), reused 2113 (delta 1830), pack-reused 0\r"
Length: 98

--- Line 101 ---
Raw: "\x1b_bk;t=1751321142039\aReceiving objects: 100% (2113/2113), 630.45 KiB | 630.00 KiB/s, done.\r"
Length: 102

Debug with combined options:

./build/bklog debug -file buildkite.log -start 50 -end 55 -verbose -raw -hex

This will show verbose parse information, raw line content, and hex dump for lines 50-55.

Debug Command Options
./build/bklog debug [options]

Required:

  • -file <path>: Path to log file to debug (required)

Range Options:

  • -start <line>: Start line number (1-based, default: 1)
  • -end <line>: End line number (0 = start+limit or EOF, default: 0)
  • -limit <num>: Number of lines to process (default: 10)

Mode Options:

  • -mode <mode>: Debug mode: parse, hex, lines, extract-timestamps (default: parse)

Display Options:

  • -verbose: Show detailed parsing information (default: false)
  • -raw: Show raw line content (default: false)
  • -hex: Show hex dump of each line (default: false)
  • -parsed: Show parsed log entry (default: true)
  • -csv <path>: Output CSV file for extract-timestamps mode
Use Cases

Investigating Parser Corruption: The debug command is particularly useful for investigating issues where the parser only handles the first OSC sequence per line but ignores subsequent ones, causing content corruption.

Common Issues Debugged:

  • Multiple OSC sequences per line (e.g., progress updates)
  • Malformed OSC sequences missing proper terminators
  • ANSI escape sequences interfering with parsing
  • Timestamp extraction failures
  • Content/group association problems

Example Workflow:

# 1. Identify problematic lines in output
./build/bklog parse -file buildkite.log | grep -n "unexpected content"

# 2. Debug specific lines with verbose output
./build/bklog debug -file buildkite.log -start 142 -limit 1 -verbose

# 3. Examine raw bytes if needed
./build/bklog debug -file buildkite.log -start 142 -limit 1 -mode hex

# 4. Compare multiple lines to understand patterns
./build/bklog debug -file buildkite.log -start 140 -end 145 -raw

# 5. Extract all timestamps to CSV for analysis
./build/bklog debug -file buildkite.log -mode extract-timestamps -csv timestamps.csv

Extract all OSC timestamps to CSV:

./build/bklog debug -file buildkite.log -mode extract-timestamps -csv timestamps.csv

This extracts all OSC sequence timestamps from the log file into a CSV file with columns: line_number, osc_offset, timestamp_ms, timestamp_formatted.

Real Examples Using Test Data

The repository includes test data files that you can use to try out the tail functionality:

View last 5 entries from the test log:

./build/bklog query -file ./testdata/bash-example.parquet -op tail -tail 5

Output:

[2025-04-22 21:43:32.739] [CMD] $ echo 'Tests passed!'
[2025-04-22 21:43:32.740] Tests passed!
[2025-04-22 21:43:32.740] [GRP] +++ End of Example tests
[2025-04-22 21:43:32.740] [CMD] $ buildkite-agent annotate --style success 'Build passed'
[2025-04-22 21:43:32.748] Annotation added

View last 10 entries (default) with JSON output:

./build/bklog query -file ./testdata/bash-example.parquet -op tail -format json

Parse the raw log file and immediately query the last 3 entries:

# First create a fresh parquet file from the raw log
./build/bklog parse -file ./testdata/bash-example.log -parquet temp.parquet

# Then query the last 3 entries
./build/bklog query -file temp.parquet -op tail -tail 3

Combine with other operations - show file info then tail:

# Get file statistics
./build/bklog query -file ./testdata/bash-example.parquet -op info

# Then view the last few entries
./build/bklog query -file ./testdata/bash-example.parquet -op tail -tail 7

Dump all entries from the test file:

./build/bklog query -file ./testdata/bash-example.parquet -op dump

Dump first 10 entries as JSON:

./build/bklog query -file ./testdata/bash-example.parquet -op dump -limit 10 -format json
CLI Options
Parse Command
./build/bklog parse [options]

Local File Options:

  • -file <path>: Path to Buildkite log file (use this OR API parameters below)

Buildkite API Options:

  • -org <slug>: Buildkite organization slug (for API access)
  • -pipeline <slug>: Buildkite pipeline slug (for API access)
  • -build <number>: Buildkite build number or UUID (for API access)
  • -job <id>: Buildkite job ID (for API access)

Output Options:

  • -json: Output as JSON instead of text
  • -filter <type>: Filter entries by type (group, section)
  • -summary: Show processing summary at the end
  • -groups: Show group/section information for each entry
  • -parquet <path>: Export to Parquet file (e.g., output.parquet)
  • -jsonl <path>: Export to JSON Lines file (e.g., output.jsonl)
Query Command
./build/bklog query [options]

Data Source Options (choose one):

  • -file <path>: Path to Parquet log file (use this OR API parameters below)

Buildkite API Options:

  • -org <slug>: Buildkite organization slug (for API access)
  • -pipeline <slug>: Buildkite pipeline slug (for API access)
  • -build <number>: Buildkite build number or UUID (for API access)
  • -job <id>: Buildkite job ID (for API access)

Query Options:

  • -op <operation>: Query operation (list-groups, by-group, search, info, tail, seek, dump) (default: list-groups)
  • -group <pattern>: Group name pattern to filter by (for by-group operation)
  • -format <format>: Output format (text, json) (default: text)
  • -stats: Show query statistics (default: true)
  • -limit <number>: Limit number of entries returned (0 = no limit, enables early termination)
  • -tail <number>: Number of lines to show from end (for tail operation, default: 10)
  • -seek <row>: Row number to seek to (0-based, for seek operation)
  • -raw: Output raw log content without timestamps, groups, or other prefixes
  • -strip-ansi: Strip ANSI escape codes from log content

Search Options:

  • -pattern <regex>: Regex pattern to search for (for search operation)
  • -A <num>: Show NUM lines after each match (ripgrep-style)
  • -B <num>: Show NUM lines before each match (ripgrep-style)
  • -C <num>: Show NUM lines before and after each match (ripgrep-style)
  • -case-sensitive: Enable case-sensitive search (default: case-insensitive)
  • -invert-match: Show non-matching lines instead of matching ones
  • -reverse: Search backwards from end/seek position (useful for finding recent errors first)
  • -search-seek <row>: Start search from this row number (0-based, useful with -reverse)

Cache Options (API mode only):

  • -cache-ttl <duration>: Cache TTL for non-terminal jobs (default: 30s)
  • -cache-force-refresh: Force refresh cached entry (ignores cache)
  • -cache-url <url>: Cache storage URL (file://path, s3://bucket, etc., default: ~/.bklog)
Debug Command
./build/bklog debug [options]

Required:

  • -file <path>: Path to log file to debug (required)

Range Options:

  • -start <line>: Start line number (1-based, default: 1)
  • -end <line>: End line number (0 = start+limit or EOF, default: 0)
  • -limit <num>: Number of lines to process (default: 10)

Mode Options:

  • -mode <mode>: Debug mode: parse, hex, lines, extract-timestamps (default: parse)

Display Options:

  • -verbose: Show detailed parsing information (default: false)
  • -raw: Show raw line content (default: false)
  • -hex: Show hex dump of each line (default: false)
  • -parsed: Show parsed log entry (default: true)
  • -csv <path>: Output CSV file for extract-timestamps mode

Note: For API usage, set BUILDKITE_API_TOKEN environment variable. Logs are automatically downloaded and cached in ~/.bklog/.

Security: Keep your Buildkite API token secure. Never commit tokens to version control or expose them in logs. Use environment variables or secure secret management systems.

Log Entry Types

Commands

Lines that represent shell commands being executed:

[2025-04-22 21:43:29.975] $ git clone -v -- https://github.com/buildkite/bash-example.git .
Groups

Headers that mark different phases of the build (collapsible in Buildkite UI):

[2025-04-22 21:43:29.921] ~~~ Running global environment hook
[2025-04-22 21:43:30.694] --- :package: Build job checkout directory
[2025-04-22 21:43:30.699] +++ :hammer: Example tests
Build Groups and Sections

The parser automatically tracks which section or group each log entry belongs to:

[2025-04-22 21:43:29.921] [~~~ Running global environment hook] ~~~ Running global environment hook
[2025-04-22 21:43:29.921] [~~~ Running global environment hook] $ /buildkite/agent/hooks/environment
[2025-04-22 21:43:29.948] [~~~ Running global pre-checkout hook] ~~~ Running global pre-checkout hook

Each entry is automatically associated with the most recent group header (~~~, ---, or +++). This allows you to:

  • Group related log entries by build phase
  • Filter logs by group for focused analysis
  • Understand build structure and timing relationships
  • Export structured data with group context preserved
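The association logic above can be sketched with the standard library alone (an illustrative sketch of the assumed behavior, not the library's exact implementation): a line beginning with ~~~, ---, or +++ opens a new group, and every subsequent line inherits it.

```go
package main

import (
	"fmt"
	"strings"
)

type taggedLine struct {
	Group   string
	Content string
}

// assignGroups tags each line with the most recent group header seen.
// Headers are lines beginning with "~~~", "---", or "+++".
func assignGroups(lines []string) []taggedLine {
	out := make([]taggedLine, 0, len(lines))
	group := ""
	for _, line := range lines {
		if strings.HasPrefix(line, "~~~") ||
			strings.HasPrefix(line, "---") ||
			strings.HasPrefix(line, "+++") {
			group = line // a header line opens a new group
		}
		out = append(out, taggedLine{Group: group, Content: line})
	}
	return out
}

func main() {
	for _, e := range assignGroups([]string{
		"~~~ Running global environment hook",
		"$ /buildkite/agent/hooks/environment",
		"--- :package: Build job checkout directory",
	}) {
		fmt.Printf("[%s] %s\n", e.Group, e.Content)
	}
}
```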

Parquet Export

The parser can export log entries to Apache Parquet format using the official Apache Arrow Go implementation for efficient storage and analysis. Parquet files can be directly queried by tools like DuckDB, Apache Spark, and Pandas for powerful log analytics:

Intelligent Caching System

The library uses a two-tier intelligent caching strategy that optimizes for both performance and data freshness:

flowchart TD
    A[Start: DownloadAndCache] --> B[Check blob storage cache]
    B --> C{Cache exists?}
    C -->|No| H[Download logs from API]
    C -->|Yes| D{Force refresh?}
    D -->|Yes| H
    D -->|No| E[Get job status]
    E --> F{Job is terminal?}
    F -->|Yes| G[Use cache immediately<br/>Terminal jobs never expire]
    F -->|No| I{Time elapsed < TTL?}
    I -->|Yes| J[Use cache<br/>Within TTL window]
    I -->|No| H
    H --> K[Parse logs to Parquet]
    K --> L[Store in blob storage with metadata]
    L --> M[Create local cache file]
    G --> N[Create local cache file]
    J --> N
    M --> O[Return local file path]
    N --> O

    classDef terminal fill:#1a472a,stroke:#4ade80,color:#ffffff
    classDef cache fill:#1e3a8a,stroke:#60a5fa,color:#ffffff
    classDef download fill:#7c2d12,stroke:#fb923c,color:#ffffff
    classDef decision fill:#374151,stroke:#9ca3af,color:#ffffff

    class G,F terminal
    class B,C,I,J,N cache
    class H,K,L,M download
    class D,E decision

Caching Strategy:

  • Terminal Jobs: Once a job completes, logs never change → cache forever (no TTL check)
  • Running Jobs: Logs may still be updated → respect TTL to ensure fresh data
  • Force Refresh: Override cache entirely for debugging or manual refresh scenarios
Benefits of Parquet Format
  • Columnar storage: Efficient compression and query performance
  • Schema preservation: Maintains data types and structure
  • Analytics ready: Compatible with Pandas, Apache Spark, DuckDB, and other data tools
  • Compact size: Typically 70-90% smaller than JSON for log data
  • Fast queries: Optimized for analytical workloads and filtering
Parquet Schema

The exported Parquet files contain the following columns:

Column     Type    Description
timestamp  int64   Unix timestamp in milliseconds since epoch
content    string  Log content after OSC sequence processing
group      string  Current build group/section name
flags      int32   Bitwise flags field (HasTimestamp=1, IsCommand=2, IsGroup=4)
Flags Field

The flags column uses bitwise operations to efficiently store multiple boolean properties:

Flag          Bit Position  Value  Description
HasTimestamp  0             1      Entry has a valid timestamp
IsCommand     1             2      Entry is a shell command
IsGroup       2             4      Entry is a group header
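
The bit layout can be decoded with plain bitwise operations; a minimal sketch (the constant names mirror the table above, not necessarily the library's exported identifiers):

```go
package main

import "fmt"

// Flag values from the schema table: bits 0, 1, and 2.
const (
	HasTimestamp int32 = 1 << 0 // 1
	IsCommand    int32 = 1 << 1 // 2
	IsGroup      int32 = 1 << 2 // 4
)

func main() {
	flags := int32(3)                    // HasTimestamp | IsCommand
	fmt.Println(flags&HasTimestamp != 0) // true
	fmt.Println(flags&IsCommand != 0)    // true
	fmt.Println(flags&IsGroup != 0)      // false
}
```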
Usage Examples

Basic export:

./build/bklog parse -file buildkite.log -parquet output.parquet

Export with filtering:

./build/bklog parse -file buildkite.log -parquet commands.parquet -filter command

Export with streaming processing:

./build/bklog parse -file buildkite.log -parquet output.parquet -summary

This uses the modern iter.Seq2[*LogEntry, error] iterator pattern for memory-efficient processing.

API Reference

Types
type LogEntry struct {
    Timestamp time.Time  // Parsed timestamp (zero if no timestamp)
    Content   string     // Log content after OSC sequence
    RawLine   []byte     // Original raw log line as bytes
    Group     string     // Current section/group this entry belongs to
}

type Parser struct {
    // Internal regex patterns
}
Methods
Parser Methods
// Create a new parser
func NewParser() *Parser

// Parse a single log line
func (p *Parser) ParseLine(line string) (*LogEntry, error)

// Create iter.Seq2 iterator with proper error handling (streaming approach)
func (p *Parser) All(reader io.Reader) iter.Seq2[*LogEntry, error]

// Strip ANSI escape sequences
func (p *Parser) StripANSI(content string) string
LogEntry Methods
func (entry *LogEntry) HasTimestamp() bool
func (entry *LogEntry) CleanContent() string  // Content with ANSI stripped
func (entry *LogEntry) IsCommand() bool
func (entry *LogEntry) IsGroup() bool         // Check if entry is a group header (~~~, ---, +++)
func (entry *LogEntry) IsSection() bool       // Deprecated: use IsGroup() instead
Parquet Export Functions
// Export using iter.Seq2 streaming iterator
func ExportSeq2ToParquet(seq iter.Seq2[*LogEntry, error], filename string) error

// Export using iter.Seq2 with filtering
func ExportSeq2ToParquetWithFilter(seq iter.Seq2[*LogEntry, error], filename string, filterFunc func(*LogEntry) bool) error

// Create a new Parquet writer for streaming
func NewParquetWriter(file *os.File) *ParquetWriter

// Write a batch of entries to Parquet
func (pw *ParquetWriter) WriteBatch(entries []*LogEntry) error

// Close the Parquet writer
func (pw *ParquetWriter) Close() error
Parquet Query Functions
// Create a new Parquet reader
func NewParquetReader(filename string) *ParquetReader

// Stream entries from a Parquet file
func ReadParquetFileIter(filename string) iter.Seq2[ParquetLogEntry, error]

// Filter streaming entries by group pattern (case-insensitive)
func FilterByGroupIter(entries iter.Seq2[ParquetLogEntry, error], groupPattern string) iter.Seq2[ParquetLogEntry, error]
ParquetReader Methods
// Stream all log entries from the Parquet file
func (pr *ParquetReader) ReadEntriesIter() iter.Seq2[ParquetLogEntry, error]

// Stream entries filtered by group pattern
func (pr *ParquetReader) FilterByGroupIter(groupPattern string) iter.Seq2[ParquetLogEntry, error]
Query Result Types
type ParquetLogEntry struct {
    Timestamp   int64    `json:"timestamp"`    // Unix timestamp in milliseconds
    Content     string   `json:"content"`      // Log content
    Group       string   `json:"group"`        // Associated group/section
    Flags       LogFlags `json:"flags"`        // Bitwise flags (HasTimestamp=1, IsCommand=2, IsGroup=4)
}

// Backward-compatible methods
func (entry *ParquetLogEntry) HasTime() bool      // Returns Flags.HasTimestamp()
func (entry *ParquetLogEntry) IsCommand() bool    // Returns Flags.IsCommand()
func (entry *ParquetLogEntry) IsGroup() bool      // Returns Flags.IsGroup()

type LogFlags int32

// Bitwise flag operations
func (lf LogFlags) Has(flag LogFlag) bool         // Check if flag is set
func (lf *LogFlags) Set(flag LogFlag)             // Set flag
func (lf *LogFlags) Clear(flag LogFlag)           // Clear flag
func (lf *LogFlags) Toggle(flag LogFlag)          // Toggle flag

// Convenience methods
func (lf LogFlags) HasTimestamp() bool            // Check HasTimestamp flag
func (lf LogFlags) IsCommand() bool               // Check IsCommand flag  
func (lf LogFlags) IsGroup() bool                 // Check IsGroup flag

type GroupInfo struct {
    Name       string    `json:"name"`          // Group/section name
    EntryCount int       `json:"entry_count"`   // Number of entries in group
    FirstSeen  time.Time `json:"first_seen"`    // Timestamp of first entry
    LastSeen   time.Time `json:"last_seen"`     // Timestamp of last entry
    Commands   int       `json:"commands"`      // Number of command entries
}

Performance

Benchmarks

The parser includes comprehensive benchmarks to measure performance. Run them with:

go test -bench=. -benchmem
Key Results (Apple M3 Pro)

Single Line Parsing (Byte-based):

  • OSC sequence with timestamp: ~64 ns/op, 192 B/op, 3 allocs/op
  • Regular line (no timestamp): ~29 ns/op, 128 B/op, 2 allocs/op
  • ANSI-heavy line: ~68 ns/op, 224 B/op, 3 allocs/op

Memory Usage (10,000 lines):

  • Seq2 Streaming Iterator: ~3.5 MB allocated, 64,006 allocations
  • Constant memory footprint regardless of file size

Streaming Throughput (one op = processing the entire file):

  • 100 lines: ~51,000 ops/sec
  • 1,000 lines: ~5,200 ops/sec
  • 10,000 lines: ~510 ops/sec
  • 100,000 lines: ~54 ops/sec

ANSI Stripping: ~7.7M ops/sec, 160 B/op, 2 allocs/op

Parquet Export Performance (1,000 lines, Apache Arrow):

  • Seq2 streaming export: ~1,100 ops/sec, 1.2 MB allocated

Content Classification Performance (1,000 entries):

  • IsCommand(): ~15,000 ops/sec, 84 KB allocated
  • IsGroup(): ~14,000 ops/sec, 84 KB allocated
  • CleanContent(): ~15,000 ops/sec, 84 KB allocated

Parquet Streaming Query Performance (Apache Arrow Go v18):

  • ReadEntriesIter: Constant memory usage, ~5,700 entries/sec
  • FilterByGroupIter: Early termination support, ~5,700 entries/sec
  • Memory-efficient: Processes files of any size with constant memory footprint

Streaming Query Scalability:

  • Constant memory usage regardless of file size
  • Early termination support for partial processing
  • Linear processing time scales with data size
  • No memory allocation growth for large files

Performance Improvements

Byte-based Parser vs Regex:

  • 10x faster OSC sequence parsing (~46ns vs ~477ns)
  • 10x faster ANSI stripping (~127ns vs ~1311ns)
  • Fewer allocations (2 vs 5 for ANSI stripping)
  • Better memory efficiency for complex lines

Streaming Memory Efficiency:

  • Constant memory footprint regardless of file size
  • True streaming processing for files of any size
  • Early termination capability with immediate resource cleanup
  • Memory-safe processing of multi-gigabyte files

Testing

Run the test suite:

go test -v

Run benchmarks:

go test -bench=. -benchmem

The tests cover:

  • OSC sequence parsing
  • Timestamp extraction
  • ANSI code stripping
  • Content classification
  • Stream processing
  • Iterator functionality
  • Memory usage patterns

Acknowledgments

This library was developed with assistance from Claude (Anthropic) for parsing, query functionality, and performance optimization.

License

This project is licensed under the MIT License.

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func ExportSeq2ToParquet

func ExportSeq2ToParquet(seq iter.Seq2[*LogEntry, error], filename string) error

ExportSeq2ToParquet exports log entries using Go 1.23+ iter.Seq2 for efficient iteration
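A minimal usage sketch, wiring Parser.All directly into the exporter so entries stream from the log file to Parquet without buffering. The import path and the input filename are assumptions for illustration, not taken from the library's documentation:

```go
package main

import (
	"log"
	"os"

	buildkitelogs "github.com/buildkite/buildkite-logs" // import path assumed
)

func main() {
	f, err := os.Open("job.log") // hypothetical raw Buildkite log
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	parser := buildkitelogs.NewParser()
	// Stream parsed entries straight into a Parquet file; memory use
	// stays constant regardless of the log's size.
	if err := buildkitelogs.ExportSeq2ToParquet(parser.All(f), "job.parquet"); err != nil {
		log.Fatal(err)
	}
}
```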

func ExportSeq2ToParquetWithFilter

func ExportSeq2ToParquetWithFilter(seq iter.Seq2[*LogEntry, error], filename string, filterFunc func(*LogEntry) bool) error

ExportSeq2ToParquetWithFilter exports filtered log entries using iter.Seq2

func FilterByGroupIter

func FilterByGroupIter(entries iter.Seq2[ParquetLogEntry, error], groupPattern string) iter.Seq2[ParquetLogEntry, error]

FilterByGroupIter returns an iterator over entries that belong to groups matching the specified pattern

func GenerateBlobKey

func GenerateBlobKey(org, pipeline, build, job string) string

GenerateBlobKey creates a consistent key for blob storage

func GetDefaultStorageURL

func GetDefaultStorageURL(storageURL string, noTempDir bool) (string, error)

GetDefaultStorageURL returns the default storage URL based on environment

If noTempDir is true, the returned file:// URL will include the no_tmp_dir parameter, which causes gocloud.dev/blob/fileblob to create temporary files in the same directory as the final destination, avoiding cross-filesystem rename errors.

This function applies the noTempDir setting to both user-provided and default URLs.

func GetRuntimeInfo

func GetRuntimeInfo() map[string]string

GetRuntimeInfo returns information about the current runtime environment

func IsContainerizedEnvironment

func IsContainerizedEnvironment() bool

IsContainerizedEnvironment detects if we're running in a container

func IsTerminalState

func IsTerminalState(state JobState) bool

IsTerminalState returns true if the given job state is terminal

func ReadParquetFileIter

func ReadParquetFileIter(filename string) iter.Seq2[ParquetLogEntry, error]

ReadParquetFileIter is a convenience function to get an iterator over entries from a Parquet file

func StripANSI

func StripANSI(s string) string

StripANSI removes ANSI escape sequences using strings.Builder for efficiency

func StripANSIRegex

func StripANSIRegex(s string) string

StripANSIRegex removes ANSI escape sequences from a string using regex

func ValidateAPIParams

func ValidateAPIParams(org, pipeline, build, job string) error

ValidateAPIParams validates that all required API parameters are provided

Types

type AfterBlobStorageFunc

type AfterBlobStorageFunc func(ctx context.Context, result *BlobStorageResult)

type AfterCacheCheckFunc

type AfterCacheCheckFunc func(ctx context.Context, result *CacheCheckResult)

Hook function types for different stages of downloadAndCacheWithBlobStorage

type AfterJobStatusFunc

type AfterJobStatusFunc func(ctx context.Context, result *JobStatusResult)

type AfterLocalCacheFunc

type AfterLocalCacheFunc func(ctx context.Context, result *LocalCacheResult)

type AfterLogDownloadFunc

type AfterLogDownloadFunc func(ctx context.Context, result *LogDownloadResult)

type AfterLogParsingFunc

type AfterLogParsingFunc func(ctx context.Context, result *LogParsingResult)

type BaseResult

type BaseResult struct {
	Org, Pipeline, Build, Job string
	Duration                  time.Duration
}

BaseResult contains common fields for all hook results

type BlobMetadata

type BlobMetadata struct {
	JobID        string    `json:"job_id"`
	JobState     string    `json:"job_state"`
	IsTerminal   bool      `json:"is_terminal"`
	CachedAt     time.Time `json:"cached_at"`
	TTL          string    `json:"ttl"` // duration string like "30s"
	Organization string    `json:"organization"`
	Pipeline     string    `json:"pipeline"`
	Build        string    `json:"build"`
}

BlobMetadata contains metadata for cached blobs

type BlobStorage

type BlobStorage struct {
	// contains filtered or unexported fields
}

BlobStorage provides an abstraction over blob storage backends

func NewBlobStorage

func NewBlobStorage(ctx context.Context, storageURL string, opts *BlobStorageOptions) (*BlobStorage, error)

NewBlobStorage creates a new blob storage instance from a storage URL. Supports file:// URLs for local filesystem storage.

The opts parameter allows configuring blob storage behavior. Pass nil to use default options.

func (*BlobStorage) Close

func (bs *BlobStorage) Close() error

Close closes the blob storage connection

func (*BlobStorage) Delete

func (bs *BlobStorage) Delete(ctx context.Context, key string) error

Delete removes a blob from storage

func (*BlobStorage) Exists

func (bs *BlobStorage) Exists(ctx context.Context, key string) (bool, error)

Exists checks if a blob exists in storage

func (*BlobStorage) GetModTime

func (bs *BlobStorage) GetModTime(ctx context.Context, key string) (time.Time, error)

GetModTime returns the modification time of a blob

func (*BlobStorage) ReadWithMetadata

func (bs *BlobStorage) ReadWithMetadata(ctx context.Context, key string) (*BlobMetadata, error)

ReadWithMetadata reads data from blob storage with metadata

func (*BlobStorage) Reader added in v0.6.1

func (bs *BlobStorage) Reader(ctx context.Context, key string) (io.ReadCloser, error)

Reader returns an io.ReadCloser for streaming blob data from the specified key. The caller is responsible for closing the returned reader when done.

func (*BlobStorage) WriteWithMetadata

func (bs *BlobStorage) WriteWithMetadata(ctx context.Context, key string, data []byte, metadata *BlobMetadata) error

WriteWithMetadata writes data to blob storage with metadata

type BlobStorageOptions added in v0.6.3

type BlobStorageOptions struct {
	// NoTempDir controls whether to use the no_tmp_dir URL parameter for file:// URLs.
	// When true, temporary files are created in the same directory as the final destination,
	// avoiding cross-filesystem rename errors. This may result in stranded .tmp files if
	// the process crashes before cleanup runs.
	//
	// When false (default), temporary files are created in os.TempDir(), which may cause
	// "invalid cross-device link" errors if the temp directory is on a different filesystem
	// than the storage directory.
	NoTempDir bool
}

BlobStorageOptions contains configuration options for blob storage

type BlobStorageResult

type BlobStorageResult struct {
	BaseResult
	BlobKey    string
	DataSize   int64
	IsTerminal bool
	TTL        time.Duration
}

BlobStorageResult contains the result of storing data in blob storage

type BuildkiteAPI

type BuildkiteAPI interface {
	JobStatusProvider
	LogProvider
}

BuildkiteAPI combines both job status and log providers

type BuildkiteAPIClient

type BuildkiteAPIClient struct {
	// contains filtered or unexported fields
}

BuildkiteAPIClient provides methods to interact with the Buildkite API. It wraps the official go-buildkite v4 client.

func NewBuildkiteAPIClient

func NewBuildkiteAPIClient(apiToken, version string) *BuildkiteAPIClient

NewBuildkiteAPIClient creates a new Buildkite API client using go-buildkite v4

func NewBuildkiteAPIExistingClient

func NewBuildkiteAPIExistingClient(client *buildkite.Client) *BuildkiteAPIClient

NewBuildkiteAPIExistingClient creates a new Buildkite API client that wraps the provided go-buildkite client

func (*BuildkiteAPIClient) GetJobLog

func (c *BuildkiteAPIClient) GetJobLog(ctx context.Context, org, pipeline, build, job string) (io.ReadCloser, error)

GetJobLog fetches the log output for a specific job using go-buildkite.

  • org: organization slug
  • pipeline: pipeline slug
  • build: build number or UUID
  • job: job ID

func (*BuildkiteAPIClient) GetJobStatus

func (c *BuildkiteAPIClient) GetJobStatus(ctx context.Context, org, pipeline, build, jobID string) (*JobStatus, error)

GetJobStatus gets the current status of a job

type ByteParser

type ByteParser struct{}

ByteParser handles byte-level parsing of Buildkite log files

func NewByteParser

func NewByteParser() *ByteParser

NewByteParser creates a new byte-based parser

func (*ByteParser) ParseLine

func (p *ByteParser) ParseLine(line string) (*LogEntry, error)

ParseLine parses a single log line using byte scanning

type CacheCheckResult

type CacheCheckResult struct {
	BaseResult
	BlobKey string
	Exists  bool
}

CacheCheckResult contains the result of checking blob storage cache

type Client

type Client struct {
	// contains filtered or unexported fields
}

Client provides a high-level convenience API for common buildkite-logs-parquet operations

func NewClient

func NewClient(ctx context.Context, client *buildkite.Client, storageURL string) (*Client, error)

NewClient creates a new Client using the provided go-buildkite client

func NewClientWithAPI

func NewClientWithAPI(ctx context.Context, api BuildkiteAPI, storageURL string) (*Client, error)

NewClientWithAPI creates a new Client using a custom BuildkiteAPI implementation

func (*Client) Close

func (c *Client) Close() error

Close closes the underlying blob storage connection

func (*Client) DownloadAndCache

func (c *Client) DownloadAndCache(ctx context.Context, org, pipeline, build, job string, ttl time.Duration, forceRefresh bool) (string, error)

DownloadAndCache downloads and caches job logs as Parquet format, returning the local file path

Parameters:

  • org: Buildkite organization slug
  • pipeline: Pipeline slug
  • build: Build number or UUID
  • job: Job ID
  • ttl: Time-to-live for cache (use 0 for default 30s)
  • forceRefresh: If true, forces re-download even if cache exists

Returns the local file path of the cached Parquet file

func (*Client) Hooks

func (c *Client) Hooks() *Hooks

Hooks returns the hooks instance for registering callback functions

func (*Client) NewReader

func (c *Client) NewReader(ctx context.Context, org, pipeline, build, job string, ttl time.Duration, forceRefresh bool) (*ParquetReader, error)

NewReader downloads and caches job logs (if needed) and returns a ParquetReader for querying

Parameters:

  • org: Buildkite organization slug
  • pipeline: Pipeline slug
  • build: Build number or UUID
  • job: Job ID
  • ttl: Time-to-live for cache (use 0 for default 30s)
  • forceRefresh: If true, forces re-download even if cache exists

Returns a ParquetReader for querying the log data
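An end-to-end sketch of the client flow, from authentication through iterating cached entries. This is a hedged illustration: the library's import path, the storage URL, and the org/pipeline/build/job values are assumptions, and the go-buildkite v4 constructor shown here should be checked against that library's documentation:

```go
package main

import (
	"context"
	"fmt"
	"log"
	"os"

	"github.com/buildkite/go-buildkite/v4"
	buildkitelogs "github.com/buildkite/buildkite-logs" // import path assumed
)

func main() {
	ctx := context.Background()

	bk, err := buildkite.NewOpts(buildkite.WithTokenAuth(os.Getenv("BUILDKITE_API_TOKEN")))
	if err != nil {
		log.Fatal(err)
	}

	client, err := buildkitelogs.NewClient(ctx, bk, "file:///tmp/bklog-cache")
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	// Downloads and caches the logs if needed; ttl=0 uses the 30s default.
	reader, err := client.NewReader(ctx, "my-org", "my-pipeline", "123", "job-uuid", 0, false)
	if err != nil {
		log.Fatal(err)
	}

	for entry, err := range reader.ReadEntriesIter() {
		if err != nil {
			log.Fatal(err)
		}
		fmt.Println(entry.CleanContent(true)) // strip ANSI codes for display
	}
}
```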

type GroupInfo

type GroupInfo struct {
	Name       string    `json:"name"`
	EntryCount int       `json:"entry_count"`
	FirstSeen  time.Time `json:"first_seen"`
	LastSeen   time.Time `json:"last_seen"`
}

GroupInfo contains statistical information about a log group

type Hooks

type Hooks struct {
	OnAfterCacheCheck  []AfterCacheCheckFunc
	OnAfterJobStatus   []AfterJobStatusFunc
	OnAfterLogDownload []AfterLogDownloadFunc
	OnAfterLogParsing  []AfterLogParsingFunc
	OnAfterBlobStorage []AfterBlobStorageFunc
	OnAfterLocalCache  []AfterLocalCacheFunc
}

Hooks contains all registered hook functions

func (*Hooks) AddAfterBlobStorage

func (h *Hooks) AddAfterBlobStorage(hook AfterBlobStorageFunc)

func (*Hooks) AddAfterCacheCheck

func (h *Hooks) AddAfterCacheCheck(hook AfterCacheCheckFunc)

Hook registration methods

func (*Hooks) AddAfterJobStatus

func (h *Hooks) AddAfterJobStatus(hook AfterJobStatusFunc)

func (*Hooks) AddAfterLocalCache

func (h *Hooks) AddAfterLocalCache(hook AfterLocalCacheFunc)

func (*Hooks) AddAfterLogDownload

func (h *Hooks) AddAfterLogDownload(hook AfterLogDownloadFunc)

func (*Hooks) AddAfterLogParsing

func (h *Hooks) AddAfterLogParsing(hook AfterLogParsingFunc)

type JobState

type JobState string

JobState represents the possible states of a Buildkite job

const (
	JobStateFinished JobState = "finished"  // Job completed (passed or failed)
	JobStatePassed   JobState = "passed"    // Job completed successfully
	JobStateFailed   JobState = "failed"    // Job completed with failure
	JobStateCanceled JobState = "canceled"  // Job was canceled
	JobStateExpired  JobState = "expired"   // Job expired before being picked up
	JobStateTimedOut JobState = "timed_out" // Job timed out during execution
	JobStateSkipped  JobState = "skipped"   // Job was skipped
	JobStateBroken   JobState = "broken"    // Job configuration is broken
)

Terminal job states - jobs in these states will not change

const (
	JobStatePending         JobState = "pending"          // Job is pending
	JobStateWaiting         JobState = "waiting"          // Job is waiting
	JobStateWaitingFailed   JobState = "waiting_failed"   // Job waiting failed
	JobStateBlocked         JobState = "blocked"          // Job is blocked
	JobStateBlockedFailed   JobState = "blocked_failed"   // Job blocked failed
	JobStateUnblocked       JobState = "unblocked"        // Job is unblocked
	JobStateUnblockedFailed JobState = "unblocked_failed" // Job unblocked failed
	JobStateLimiting        JobState = "limiting"         // Job is limiting
	JobStateLimited         JobState = "limited"          // Job is limited
	JobStateScheduled       JobState = "scheduled"        // Job is scheduled
	JobStateAssigned        JobState = "assigned"         // Job is assigned
	JobStateAccepted        JobState = "accepted"         // Job is accepted
	JobStateRunning         JobState = "running"          // Job is currently running
	JobStateCanceling       JobState = "canceling"        // Job is being canceled
	JobStateTimingOut       JobState = "timing_out"       // Job is timing out
)

Non-terminal job states - jobs in these states may still change

type JobStatus

type JobStatus struct {
	ID         string     `json:"id"`
	State      JobState   `json:"state"`
	IsTerminal bool       `json:"is_terminal"`
	WebURL     string     `json:"web_url,omitempty"`
	ExitStatus *int       `json:"exit_status,omitempty"`
	FinishedAt *time.Time `json:"finished_at,omitempty"`
}

JobStatus contains information about a Buildkite job's current status

func (*JobStatus) ShouldRefreshCache

func (js *JobStatus) ShouldRefreshCache(cacheTime time.Time, ttl time.Duration) bool

ShouldRefreshCache determines if a cached entry should be refreshed based on job status and TTL

type JobStatusProvider

type JobStatusProvider interface {
	GetJobStatus(ctx context.Context, org, pipeline, build, job string) (*JobStatus, error)
}

JobStatusProvider defines the interface for getting job status

type JobStatusResult

type JobStatusResult struct {
	BaseResult
	JobStatus *JobStatus
}

JobStatusResult contains the result of fetching job status

type LocalCacheResult

type LocalCacheResult struct {
	BaseResult
	LocalPath string
	FileSize  int64
}

LocalCacheResult contains the result of creating local cache file

type LogDownloadResult

type LogDownloadResult struct {
	BaseResult
	LogSize int64 // Size of downloaded logs in bytes
}

LogDownloadResult contains the result of downloading logs from API

type LogEntry

type LogEntry struct {
	Timestamp time.Time
	Content   string // Parsed content after OSC processing, may still contain ANSI codes
	RawLine   []byte // Original line bytes including all OSC sequences and formatting
	Group     string // The current section/group this entry belongs to
}

LogEntry represents a parsed Buildkite log entry

func (*LogEntry) ComputeFlags

func (entry *LogEntry) ComputeFlags() LogFlags

ComputeFlags returns the consolidated flags for this log entry

func (*LogEntry) HasTimestamp

func (entry *LogEntry) HasTimestamp() bool

HasTimestamp returns true if the log entry has a valid timestamp

func (*LogEntry) IsGroup

func (entry *LogEntry) IsGroup() bool

IsGroup returns true if the log entry appears to be a group header

func (*LogEntry) IsSection deprecated

func (entry *LogEntry) IsSection() bool

Deprecated: IsSection is an alias for IsGroup. Use IsGroup instead.

type LogFlag

type LogFlag int32

const (
	HasTimestamp LogFlag = iota
	IsGroup
)

type LogFlags

type LogFlags int32

LogFlags represents a bitwise combination of log flags

func (*LogFlags) Clear

func (lf *LogFlags) Clear(flag LogFlag)

Clear clears the specified flag

func (LogFlags) Has

func (lf LogFlags) Has(flag LogFlag) bool

Has returns true if the specified flag is set

func (LogFlags) HasTimestamp

func (lf LogFlags) HasTimestamp() bool

HasTimestamp returns true if HasTimestamp flag is set

func (LogFlags) IsGroup

func (lf LogFlags) IsGroup() bool

IsGroup returns true if IsGroup flag is set

func (*LogFlags) Set

func (lf *LogFlags) Set(flag LogFlag)

Set sets the specified flag

func (*LogFlags) Toggle

func (lf *LogFlags) Toggle(flag LogFlag)

Toggle toggles the specified flag

type LogIterator deprecated

type LogIterator struct {
	// contains filtered or unexported fields
}

LogIterator provides an iterator interface for processing log entries.

Deprecated: Use Parser.All() which returns an iter.Seq2 instead.

func (*LogIterator) Entry

func (li *LogIterator) Entry() *LogEntry

Entry returns the current log entry. Only valid after a successful call to Next().

func (*LogIterator) Err

func (li *LogIterator) Err() error

Err returns any error encountered during iteration

func (*LogIterator) Next

func (li *LogIterator) Next() bool

Next advances the iterator to the next log entry. Returns true if there is a next entry, false on EOF or error.

type LogParsingResult

type LogParsingResult struct {
	BaseResult
	ParquetSize int64 // Size of generated Parquet data in bytes
	LogEntries  int   // Number of log entries processed
}

LogParsingResult contains the result of parsing logs to Parquet

type LogProvider

type LogProvider interface {
	GetJobLog(ctx context.Context, org, pipeline, build, job string) (io.ReadCloser, error)
}

LogProvider defines the interface for getting job logs

type ParquetFileInfo

type ParquetFileInfo struct {
	RowCount     int64 `json:"row_count"`
	ColumnCount  int   `json:"column_count"`
	FileSize     int64 `json:"file_size_bytes"`
	NumRowGroups int   `json:"num_row_groups"`
}

ParquetFileInfo contains metadata about a Parquet file

type ParquetLogEntry

type ParquetLogEntry struct {
	RowNumber int64    `json:"row_number"` // 0-based row position in the Parquet file
	Timestamp int64    `json:"timestamp"`
	Content   string   `json:"content"`
	Group     string   `json:"group"`
	Flags     LogFlags `json:"flags"`
}

ParquetLogEntry represents a log entry read from a Parquet file

func (*ParquetLogEntry) CleanContent

func (entry *ParquetLogEntry) CleanContent(stripANSI bool) string

CleanContent returns the content with optional ANSI stripping and whitespace trimming

func (*ParquetLogEntry) CleanGroup

func (entry *ParquetLogEntry) CleanGroup(stripANSI bool) string

CleanGroup returns the group name with optional ANSI stripping and whitespace trimming

func (*ParquetLogEntry) HasTime

func (entry *ParquetLogEntry) HasTime() bool

HasTime returns true if the entry has a timestamp (backward compatibility)

func (*ParquetLogEntry) IsGroup

func (entry *ParquetLogEntry) IsGroup() bool

IsGroup returns true if the entry is a group header (backward compatibility)

type ParquetReader

type ParquetReader struct {
	// contains filtered or unexported fields
}

ParquetReader provides functionality to read and query Parquet log files

func NewParquetReader

func NewParquetReader(filename string) *ParquetReader

NewParquetReader creates a new ParquetReader for the specified file

func (*ParquetReader) FilterByGroupIter

func (pr *ParquetReader) FilterByGroupIter(groupPattern string) iter.Seq2[ParquetLogEntry, error]

FilterByGroupIter returns an iterator over entries that belong to groups matching the specified name pattern

func (*ParquetReader) GetFileInfo

func (pr *ParquetReader) GetFileInfo() (*ParquetFileInfo, error)

GetFileInfo returns metadata about the Parquet file

func (*ParquetReader) ReadEntriesIter

func (pr *ParquetReader) ReadEntriesIter() iter.Seq2[ParquetLogEntry, error]

ReadEntriesIter returns an iterator over log entries from the Parquet file

func (*ParquetReader) SearchEntriesIter

func (pr *ParquetReader) SearchEntriesIter(options SearchOptions) iter.Seq2[SearchResult, error]

SearchEntriesIter returns an iterator over search results with context
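A sketch of a grep-style search with context lines, assuming a *ParquetReader has already been obtained (for example via Client.NewReader). The pattern, helper name, and the assumption that matching is case-insensitive unless CaseSensitive is set are illustrative:

```go
// findErrors prints each match with two lines of surrounding context.
// Sketch only; assumes reader is a *buildkitelogs.ParquetReader.
func findErrors(reader *buildkitelogs.ParquetReader) error {
	opts := buildkitelogs.SearchOptions{
		Pattern: "error|failed", // regex; case-insensitive unless CaseSensitive is set
		Context: 2,              // lines before and after each match
	}
	for result, err := range reader.SearchEntriesIter(opts) {
		if err != nil {
			return err
		}
		for _, e := range result.BeforeContext {
			fmt.Println("  ", e.CleanContent(true))
		}
		fmt.Println(">>", result.Match.CleanContent(true))
		for _, e := range result.AfterContext {
			fmt.Println("  ", e.CleanContent(true))
		}
	}
	return nil
}
```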

func (*ParquetReader) SeekToRow

func (pr *ParquetReader) SeekToRow(startRow int64) iter.Seq2[ParquetLogEntry, error]

SeekToRow returns an iterator starting from the specified row number (0-based)

type ParquetWriter

type ParquetWriter struct {
	// contains filtered or unexported fields
}

ParquetWriter provides streaming Parquet writing capabilities

func NewParquetWriter

func NewParquetWriter(file *os.File) (*ParquetWriter, error)

NewParquetWriter creates a new Parquet writer for streaming

func (*ParquetWriter) Close

func (pw *ParquetWriter) Close() error

Close closes the Parquet writer

func (*ParquetWriter) WriteBatch

func (pw *ParquetWriter) WriteBatch(entries []*LogEntry) error

WriteBatch writes a batch of log entries to the Parquet file

type Parser

type Parser struct {
	// contains filtered or unexported fields
}

Parser handles parsing of Buildkite log files

func NewParser

func NewParser() *Parser

NewParser creates a new Buildkite log parser

func (*Parser) All

func (p *Parser) All(reader io.Reader) iter.Seq2[*LogEntry, error]

All returns an iterator over all log entries using the Go 1.23+ iter.Seq2 pattern. Each iteration yields a *LogEntry and an error, following Go's idiomatic error handling. This method creates isolated parser state to prevent contamination between iterations.
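A short sketch of consuming Parser.All from an in-memory reader. The import path and the sample OSC-formatted lines are illustrative assumptions:

```go
package main

import (
	"fmt"
	"log"
	"strings"

	buildkitelogs "github.com/buildkite/buildkite-logs" // import path assumed
)

func main() {
	// Two lines in Buildkite's \x1b_bk;t=timestamp\x07content format.
	raw := "\x1b_bk;t=1700000000000\x07~~~ Preparing environment\n" +
		"\x1b_bk;t=1700000000123\x07$ make test\n"

	parser := buildkitelogs.NewParser()
	for entry, err := range parser.All(strings.NewReader(raw)) {
		if err != nil {
			log.Fatal(err)
		}
		fmt.Printf("[%s] %s\n", entry.Group, entry.Content)
	}
}
```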

func (*Parser) NewIterator deprecated

func (p *Parser) NewIterator(reader io.Reader) *LogIterator

NewIterator creates a new LogIterator for memory-efficient processing.

Deprecated: Use Parser.All() which returns an iter.Seq2 instead.

func (*Parser) ParseLine

func (p *Parser) ParseLine(line string) (*LogEntry, error)

ParseLine parses a single log line

func (*Parser) Reset deprecated

func (p *Parser) Reset()

Reset clears the parser's internal state, useful for reusing the parser for multiple independent parsing operations.

Deprecated: State isolation is now handled internally by All() and LogIterator. This method will be removed in a future major version.

type QueryResult

type QueryResult struct {
	Groups  []GroupInfo       `json:"groups,omitempty"`
	Entries []ParquetLogEntry `json:"entries,omitempty"`
	Stats   QueryStats        `json:"stats,omitempty"`
}

QueryResult holds the results of a query operation

type QueryStats

type QueryStats struct {
	TotalEntries   int     `json:"total_entries"`
	MatchedEntries int     `json:"matched_entries"`
	TotalGroups    int     `json:"total_groups"`
	QueryTime      float64 `json:"query_time_ms"`
}

QueryStats contains performance and result statistics for queries

type SearchOptions

type SearchOptions struct {
	Pattern       string // Regex pattern to search for
	CaseSensitive bool   // Enable case-sensitive matching
	InvertMatch   bool   // Show non-matching lines
	BeforeContext int    // Lines to show before match
	AfterContext  int    // Lines to show after match
	Context       int    // Lines to show before and after (overrides BeforeContext/AfterContext)
	Reverse       bool   // Search backwards from end/seek position
	SeekStart     int64  // Start search from this row (useful with Reverse)
}

SearchOptions configures regex search behavior

type SearchResult

type SearchResult struct {
	Match         ParquetLogEntry   `json:"match"`
	BeforeContext []ParquetLogEntry `json:"before_context,omitempty"`
	AfterContext  []ParquetLogEntry `json:"after_context,omitempty"`
}

SearchResult represents a match with context lines

Directories

Path Synopsis
cmd
bklog command
examples
query command
smart-cache command
