stringy

v1.0.0 Latest
Published: Aug 17, 2025 License: MIT Imports: 3 Imported by: 0

README

stringy

Unicode string encoding handling for Go. Provides helpers for working with UTF-8, UTF-16, and UTF-32 string representations, including code unit and code point counting.

Why does this library exist?

Prior to Go 1.23, the standard library offered no direct way to count the UTF-16 code units in a string without first encoding it. That count matters for interoperability with systems and protocols (such as JavaScript, Windows APIs, and some network protocols) that use UTF-16 as their string encoding. Go's built-in string type is UTF-8, and while working with runes (Unicode code points) is easy, handling UTF-16 code units correctly is non-trivial, especially for characters outside the Basic Multilingual Plane (BMP), which require surrogate pairs.
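
To make the difference between the three measures concrete, here is a small sketch that uses only the standard library, not this package. The helper name lengths is made up for illustration. It encodes the whole string to UTF-16 just to count the units, which is the allocating workaround that motivates a dedicated helper.

```go
package main

import (
	"fmt"
	"unicode/utf16"
	"unicode/utf8"
)

// lengths is a hypothetical helper (not part of stringy). It reports the
// size of s as UTF-8 bytes, Unicode code points, and UTF-16 code units.
func lengths(s string) (nBytes, codePoints, utf16Units int) {
	nBytes = len(s)
	codePoints = utf8.RuneCountInString(s)
	// Encoding allocates a []uint16 only so that we can take its length.
	utf16Units = len(utf16.Encode([]rune(s)))
	return
}

func main() {
	// U+1D11E (musical symbol G clef) lies outside the BMP.
	b, cp, u := lengths("𝄞")
	fmt.Println(b, cp, u) // prints: 4 1 2
}
```

A single character outside the BMP is four bytes in UTF-8, one code point, and two UTF-16 code units, so no two of these counts can be assumed equal.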

This library provides types and helpers that "just work" for handling UTF-16, UTF-32, and UTF-8 string representations, including correct code unit and code point counting, and conversion between representations.

How it works

  • Utf16 is a slice of uint16 representing a UTF-16 encoded string. It implements methods for appending, writing, and converting to/from Go strings, as well as counting code units and code points.
  • Utf32 is a slice of rune (Unicode code points), with similar methods for manipulation and conversion.
  • Utf8 is a wrapper around Go's built-in string type, with helpers for code unit and code point counting.

All types implement the String interface, which provides methods for getting the string representation, code unit count, and code point count.

Example usage

package main

import (
	"fmt"
	"github.com/R167/stringy"
)

func main() {
	var s stringy.Utf16
	s.WriteString("hello 𝄞 world")
	fmt.Println(s.String())     // prints: hello 𝄞 world
	fmt.Println(s.CodeUnits())  // UTF-16 code units; 14 here, since 𝄞 needs a surrogate pair
	fmt.Println(s.CodePoints()) // Unicode code points (runes); 13 here
}

Documentation

Overview

stringy provides utilities for working with UTF-8, UTF-16, and UTF-32 encoded strings.

Functions

func Utf16CodeUnits

func Utf16CodeUnits(str string) int

Utf16CodeUnits returns the number of code units in the UTF-16 encoding of the string.
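
One way such a count can be computed without allocating is sketched below. This is an illustration under stated assumptions, not the package's actual source: the name countUTF16Units is hypothetical, and the treatment of invalid UTF-8 (counted as U+FFFD, one unit per bad byte) is simply what Go's range loop does.

```go
package main

import "fmt"

// countUTF16Units is a hypothetical stand-in, not stringy's implementation.
// It returns how many 16-bit code units the UTF-16 encoding of s would need.
func countUTF16Units(s string) int {
	n := 0
	for _, r := range s {
		if r >= 0x10000 {
			n += 2 // supplementary plane: encoded as a surrogate pair
		} else {
			n++ // BMP code point: one code unit
		}
	}
	return n
}

func main() {
	fmt.Println(countUTF16Units("hello 𝄞 world")) // prints: 14
}
```

The string has 13 code points, and only 𝄞 falls outside the BMP, so the total is 12 + 2 = 14 code units.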

Types

type String

type String interface {
	// String returns the string representation of the string.
	String() string
	// CodeUnits returns the number of code units in the string.
	CodeUnits() int
	// CodePoints returns the number of code points in the string.
	// This is the number of runes in the string and what you would use for character-based operations.
	CodePoints() int
}
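
The value of a shared interface is that calling code can stay encoding-agnostic. The sketch below shows the idea with local stand-ins so that it compiles on its own. Measured, utf8Text, utf32Text, and describe are all hypothetical names, and the method bodies are assumptions about how such types could behave, not the package's code.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// Measured mirrors the method set of the String interface shown above.
type Measured interface {
	String() string
	CodeUnits() int
	CodePoints() int
}

// utf8Text is a hypothetical stand-in for a string-backed UTF-8 type.
type utf8Text string

func (s utf8Text) String() string  { return string(s) }
func (s utf8Text) CodeUnits() int  { return len(s) } // bytes
func (s utf8Text) CodePoints() int { return utf8.RuneCountInString(string(s)) }

// utf32Text is a hypothetical stand-in for a []rune-backed UTF-32 type.
type utf32Text []rune

func (s utf32Text) String() string  { return string(s) }
func (s utf32Text) CodeUnits() int  { return len(s) } // one unit per code point
func (s utf32Text) CodePoints() int { return len(s) }

// describe accepts any encoding that satisfies Measured.
func describe(m Measured) string {
	return fmt.Sprintf("%s: %d code units, %d code points",
		m.String(), m.CodeUnits(), m.CodePoints())
}

func main() {
	fmt.Println(describe(utf8Text("h\u00e9llo")))  // 6 code units, 5 code points
	fmt.Println(describe(utf32Text("h\u00e9llo"))) // 5 code units, 5 code points
}
```

The same text reports different code unit counts depending on its encoding, while the code point count stays the same.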

type Utf16

type Utf16 []uint16

Utf16 is a slice of uint16 values that represents a UTF-16 encoded string.

*Utf16 implements the io.StringWriter and io.Writer interfaces.

func NewUtf16

func NewUtf16(s string) Utf16

NewUtf16 creates a new Utf16 instance from a string.

func (*Utf16) AppendRunes

func (s *Utf16) AppendRunes(r []rune)

AppendRunes appends a rune slice to the Utf16 slice.

func (Utf16) CodePoints

func (s Utf16) CodePoints() int

CodePoints returns the number of code points in the Utf16 slice.

func (Utf16) CodeUnits

func (s Utf16) CodeUnits() int

CodeUnits returns the number of code units in the Utf16 slice.

func (Utf16) String

func (s Utf16) String() string

String returns the string representation of the Utf16 slice.

func (*Utf16) Write

func (s *Utf16) Write(p []byte) (n int, err error)

Write writes a byte slice to the Utf16 slice. The byte slice is always assumed to be a valid UTF-8 encoded string. It always returns len(p) and a nil error.

func (*Utf16) WriteString

func (s *Utf16) WriteString(str string) (n int, err error)

WriteString writes a string to the Utf16 slice. It always returns len(str) and a nil error.
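
The writer contract described above (consume UTF-8 input, report the number of input bytes, never fail) can be sketched on a local []uint16 type. The type utf16Buf is hypothetical and uses utf16.Encode from the standard library. The real Utf16 methods may be implemented differently.

```go
package main

import (
	"fmt"
	"io"
	"unicode/utf16"
)

// utf16Buf is a hypothetical stand-in for a []uint16-backed string type.
type utf16Buf []uint16

// WriteString appends the UTF-16 encoding of str. It reports len(str),
// the number of input bytes consumed, as io.StringWriter expects.
func (b *utf16Buf) WriteString(str string) (int, error) {
	*b = append(*b, utf16.Encode([]rune(str))...)
	return len(str), nil
}

// Write treats p as UTF-8 text and delegates to WriteString.
func (b *utf16Buf) Write(p []byte) (int, error) {
	return b.WriteString(string(p))
}

// Compile-time checks that the pointer type satisfies both interfaces.
var (
	_ io.Writer       = (*utf16Buf)(nil)
	_ io.StringWriter = (*utf16Buf)(nil)
)

func main() {
	var b utf16Buf
	n, _ := fmt.Fprintf(&b, "%s %d", "𝄞", 42)
	fmt.Println(n, len(b)) // prints: 7 5
}
```

Because the methods have pointer receivers, the value must be passed as &b to anything expecting an io.Writer. Note that a multi-byte UTF-8 sequence split across two Write calls would not be reassembled by this sketch, which is why the input is assumed to be complete, valid UTF-8.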

type Utf32

type Utf32 []rune

Utf32 is a slice of runes that represents a UTF-32 encoded string.

*Utf32 implements the io.StringWriter and io.Writer interfaces.

func NewUtf32

func NewUtf32(s string) Utf32

NewUtf32 creates a new Utf32 instance from a string.

func (*Utf32) AppendRunes

func (s *Utf32) AppendRunes(r []rune)

AppendRunes appends a rune slice to the Utf32 slice.

func (Utf32) CodePoints

func (s Utf32) CodePoints() int

CodePoints returns the number of code points in the Utf32 slice. It implements String.

func (Utf32) CodeUnits

func (s Utf32) CodeUnits() int

CodeUnits returns the number of code units in the Utf32 slice. For UTF-32, each code point occupies exactly one code unit.

func (Utf32) String

func (s Utf32) String() string

String returns the string representation of the Utf32 slice.

func (*Utf32) Write

func (s *Utf32) Write(p []byte) (n int, err error)

Write writes a byte slice to the Utf32 slice. The byte slice is always assumed to be a valid UTF-8 encoded string. It always returns len(p) and a nil error.

func (*Utf32) WriteString

func (s *Utf32) WriteString(str string) (n int, err error)

WriteString writes a string to the Utf32 slice. It always returns len(str) and a nil error.

type Utf8

type Utf8 string

Utf8 is a string type that supports UTF-8 encoding.

It is implemented as a string since Go's built-in string type is already UTF-8 encoded. Because Go strings are immutable, Utf8 does not implement the io.Writer and io.StringWriter interfaces that the other types provide for updating.

func NewUtf8

func NewUtf8(s string) Utf8

NewUtf8 creates a new Utf8 wrapper from a string.

func (Utf8) CodePoints

func (s Utf8) CodePoints() int

CodePoints returns the number of Unicode code points (runes) in the Utf8 string.

func (Utf8) CodeUnits

func (s Utf8) CodeUnits() int

CodeUnits returns the number of code units in the Utf8 string. For UTF-8, this is the number of bytes.

func (Utf8) String

func (s Utf8) String() string

String returns the underlying string from the Utf8 wrapper.
