Reference page
ALTREP in miniextendr
ALTREP (Alternative Representations) is R's system for creating custom vector implementations. miniextendr provides a powerful, safe abstraction for creating ALTREP vectors from Rust.
Additional Resources:
- Quick Reference - One-page cheat sheet
- Receiving ALTREP from R - How SEXP and AltrepSexp parameters handle ALTREP input
- Practical Examples - Real-world use cases
- Test Suite - Working examples
What is ALTREP?
ALTREP allows you to create R vectors with custom internal representations. Instead of storing data in R's native format, you can:
- Compute elements on demand (lazy sequences)
- Reference external data without copying (zero-copy views)
- Use compact representations (constant vectors, arithmetic sequences)
- Provide optimized operations (O(1) sum for arithmetic sequences)
Quick Start
Here's a minimal ALTREP example - a constant integer vector using the field-based derive (simplest approach):
use miniextendr_api::{miniextendr, ffi::SEXP, IntoR};
// 1. Define your data type with derive -- generates everything
#[derive(miniextendr_api::AltrepInteger)]
#[altrep(len = "len", elt = "value", class = "ConstantInt")]
pub struct ConstantIntData {
value: i32,
len: usize,
}
// 2. Export a constructor
#[miniextendr]
pub fn constant_int(value: i32, n: i32) -> SEXP {
let data = ConstantIntData { value, len: n as usize };
data.into_sexp()
}
Usage in R:
x <- constant_int(42L, 1000000L) # Creates 1M-element vector using O(1) memory
x[1] # 42
x[500] # 42
sum(x) # 42000000 (uses default R sum)
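The O(1) memory claim follows from what the struct stores: only value and len, never the elements. A plain-Rust sketch of the same idea (independent of the miniextendr API; an O(1) sum hook is shown too, though the example above relies on R's default sum):

```rust
// Plain-Rust sketch (not the miniextendr API) of why constant_int
// needs O(1) memory: only `value` and `len` are stored.
struct ConstantInt {
    value: i32,
    len: usize,
}

impl ConstantInt {
    // Every element is computed on demand, never stored.
    fn elt(&self, _i: usize) -> i32 {
        self.value
    }
    // An O(1) sum override would look like this.
    fn sum(&self) -> i64 {
        self.value as i64 * self.len as i64
    }
}

fn main() {
    let x = ConstantInt { value: 42, len: 1_000_000 };
    assert_eq!(x.elt(0), 42);
    assert_eq!(x.elt(499), 42);
    assert_eq!(x.sum(), 42_000_000);
    println!("len = {}, sum = {}", x.len, x.sum());
}
```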
Choosing ALTREP vs Regular Conversion
miniextendr offers two conversion paths for Rust data:
Regular Conversion (IntoR) - Copy to R
#[miniextendr]
fn get_data() -> Vec<i32> {
vec![1, 2, 3, 4, 5]
}
// Or explicitly: vec.into_sexp()
Behavior:
- Data is copied to R's heap
- Original Vec is dropped
- R owns a regular integer vector (INTSXP)
- O(n) memory copy, O(n) memory allocation
Best for:
- Small data (<1000 elements)
- Data R will modify
- Temporary results
- When simplicity matters
ALTREP Conversion (IntoRAltrep) - Zero-Copy
use miniextendr_api::IntoRAltrep;
#[miniextendr]
fn get_data() -> SEXP {
let vec = vec![1, 2, 3, 4, 5];
vec.into_sexp_altrep()
}
// Or: Altrep(vec).into_sexp()
Behavior:
- Data stays in Rust (ExternalPtr wrapper)
- No copying, no duplication
- R accesses via ALTREP callbacks
- O(1) creation, ~10ns per element overhead
Best for:
- Large vectors (>1000 elements)
- Lazy evaluation (compute on access)
- External data (files, APIs, databases)
- Zero-copy requirements
Performance Comparison (Measured)
Pure Creation (No Access):
| Size | Copy | ALTREP | Speedup |
|---|---|---|---|
| 100 | 0.33 ms | 0.42 ms | 0.8x (copy faster) |
| 1,000 | 0.43 ms | 0.50 ms | 0.9x (similar) |
| 100,000 | 0.44 ms | 0.42 ms | 1.0x (similar) |
| 1,000,000 | 0.44 ms | 0.20 ms | 2.2x faster |
| 10,000,000 | 4.16 ms | 1.90 ms | 2.2x faster |
Partial Access (Create 1M, Access First 10):
| Size | Copy | ALTREP | Speedup |
|---|---|---|---|
| 10,000 | 0.02 ms | 0.02 ms | 1.0x |
| 100,000 | 0.06 ms | 0.02 ms | 3.0x faster |
| 1,000,000 | 0.42 ms | 0.20 ms | 2.1x faster |
| 10,000,000 | 4.28 ms | 0.08 ms | 53.5x faster |
Memory:
- Copy (1M elements): R heap +3.8 MB
- ALTREP (1M elements): R heap +0.0 MB (data in Rust heap)
Benchmarks run on Apple M-series, R 4.5. Your results may vary.
Decision Guide
Is your data > 1000 elements?
├─ Yes → Use .into_sexp_altrep()
└─ No
   └─ Will R modify it?
      ├─ Yes → Use .into_sexp() (copy)
      └─ No → Either works, .into_sexp() is simpler
Examples
use miniextendr_api::{miniextendr, IntoRAltrep, ffi::SEXP};
// Small data - copy is fine
#[miniextendr]
fn get_config() -> Vec<i32> {
vec![1, 2, 3] // Automatically copies via IntoR
}
// Large data - use ALTREP
#[miniextendr]
fn get_large_data() -> SEXP {
let data = vec![0; 1_000_000];
data.into_sexp_altrep() // Zero-copy!
}
// Lazy computation - definitely ALTREP
#[miniextendr]
fn fibonacci_seq(n: i32) -> SEXP {
(0..n as usize)
.map(|i| fibonacci(i))
.collect::<Vec<i32>>()
.into_sexp_altrep()
}
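The fibonacci_seq example calls a fibonacci helper that is not defined in the snippet. One possible definition, purely illustrative (iterative, O(i); note that i32 overflows past fibonacci(46), so real code should validate n or use a wider type):

```rust
// A possible `fibonacci` helper assumed by the fibonacci_seq example.
// fibonacci(0) = 0, fibonacci(1) = 1, fibonacci(2) = 1, ...
fn fibonacci(i: usize) -> i32 {
    let (mut a, mut b) = (0i32, 1i32);
    for _ in 0..i {
        let next = a + b;
        a = b;
        b = next;
    }
    a
}

fn main() {
    let seq: Vec<i32> = (0..8).map(fibonacci).collect();
    assert_eq!(seq, vec![0, 1, 1, 2, 3, 5, 8, 13]);
}
```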
// Range - already lazy, use ALTREP
#[miniextendr]
fn int_range(from: i32, to: i32) -> SEXP {
(from..to)
.collect::<Vec<_>>()
.into_sexp_altrep()
}
Migration from Altrep(...) to .into_sexp_altrep()
Both forms are equivalent and compile to identical code:
// Old style (still works!)
return Altrep(vec).into_sexp();
// New style (more explicit)
return vec.into_sexp_altrep();
// Both are valid - use whichever is clearer
Architecture Overview
miniextendr's ALTREP system uses a single-struct pattern with two derive paths:
┌──────────────────────────────────────────────────────────────────┐
│ Path A: Field-based derive (simplest)                            │
│ #[derive(AltrepInteger)] + #[altrep(len, elt, class)]            │
│ Generates EVERYTHING: traits, registration, IntoR, Ref/Mut       │
└──────────────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────────────┐
│ Path B: Manual traits + registration derive                      │
│ #[derive(Altrep)] + #[altrep(class = "...")]                     │
│ Generates registration (TypedExternal, RegisterAltrep, IntoR)    │
│ YOU implement AltrepLen, Alt*Data, and call impl_alt*_from_data! │
└──────────────────────────────────────────────────────────────────┘
              ↓
┌──────────────────────────────────────────────────────────────────┐
│ High-Level Data Traits (you implement, or derive generates)      │
│ AltrepLen, AltIntegerData, AltRealData, etc.                     │
│ Safe, idiomatic Rust - no raw SEXP handling                      │
└──────────────────────────────────────────────────────────────────┘
              ↓
┌──────────────────────────────────────────────────────────────────┐
│ Low-Level Traits (auto-generated by impl_alt*_from_data!)        │
│ Implements Altrep, AltVec, AltInteger traits                     │
└──────────────────────────────────────────────────────────────────┘
No wrapper struct is needed. The data struct registers directly with R.
High-Level Data Traits
Core Trait: AltrepLen
Every ALTREP type must implement AltrepLen:
impl AltrepLen for MyData {
fn len(&self) -> usize {
// Return the vector length
}
}
Type-Specific Traits
| R Vector Type | Rust Trait | Required Method |
|---|---|---|
| integer | AltIntegerData | fn elt(&self, i: usize) -> i32 |
| numeric | AltRealData | fn elt(&self, i: usize) -> f64 |
| logical | AltLogicalData | fn elt(&self, i: usize) -> Logical |
| raw | AltRawData | fn elt(&self, i: usize) -> u8 |
| character | AltStringData | fn elt(&self, i: usize) -> Option<&str> |
| complex | AltComplexData | fn elt(&self, i: usize) -> Rcomplex |
| list | AltListData | fn elt(&self, i: usize) -> SEXP |
Optional Methods
Each trait provides optional methods you can override:
impl AltIntegerData for MyData {
fn elt(&self, i: usize) -> i32 {
// Required: element access
}
fn no_na(&self) -> Option<bool> {
// Optional: NA hint (enables optimizations)
Some(true) // No NAs in this vector
}
fn is_sorted(&self) -> Option<Sortedness> {
// Optional: sortedness hint
Some(Sortedness::Increasing)
}
fn sum(&self, na_rm: bool) -> Option<i64> {
// Optional: O(1) sum (instead of element-by-element)
Some(self.formula_based_sum())
}
fn min(&self, na_rm: bool) -> Option<i32> {
// Optional: O(1) min
Some(self.known_minimum())
}
fn max(&self, na_rm: bool) -> Option<i32> {
// Optional: O(1) max
Some(self.known_maximum())
}
fn get_region(&self, start: usize, len: usize, buf: &mut [i32]) -> usize {
// Optional: bulk element access (can be more efficient)
// Default uses elt() in a loop
}
}
Example: Arithmetic Sequence
A lazy arithmetic sequence that computes elements on demand. This uses the manual traits pattern (#[derive(Altrep)]) since we need custom trait implementations with optimization hints:
#[derive(miniextendr_api::Altrep)]
#[altrep(class = "ArithSeq")]
pub struct ArithSeqData {
start: f64,
step: f64,
len: usize,
}
impl AltrepLen for ArithSeqData {
fn len(&self) -> usize {
self.len
}
}
impl AltRealData for ArithSeqData {
fn elt(&self, i: usize) -> f64 {
self.start + (i as f64) * self.step
}
fn no_na(&self) -> Option<bool> {
Some(true) // Arithmetic sequences never produce NA
}
fn is_sorted(&self) -> Option<Sortedness> {
if self.step < 0.0 {
Some(Sortedness::Decreasing)
} else {
Some(Sortedness::Increasing)
}
}
fn sum(&self, _na_rm: bool) -> Option<f64> {
// O(1) sum using arithmetic series formula: n*(first+last)/2
let last = self.start + (self.len - 1) as f64 * self.step;
Some(self.len as f64 * (self.start + last) / 2.0)
}
}
miniextendr_api::impl_altreal_from_data!(ArithSeqData);
#[miniextendr]
pub fn arith_seq(from: f64, to: f64, length_out: i32) -> SEXP {
let len = length_out as usize;
let step = if len > 1 { (to - from) / (len - 1) as f64 } else { 0.0 };
ArithSeqData { start: from, step, len }.into_sexp()
}
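The O(1) sum in ArithSeqData relies on the arithmetic-series identity n*(first+last)/2. A standalone Rust check (no R involved; function names here are local to this snippet) that the formula agrees with element-by-element summation:

```rust
// O(1) arithmetic-series sum, as used in ArithSeqData::sum above.
fn formula_sum(start: f64, step: f64, len: usize) -> f64 {
    let last = start + (len - 1) as f64 * step;
    len as f64 * (start + last) / 2.0
}

// O(n) reference: sum each element the way R's default path would.
fn loop_sum(start: f64, step: f64, len: usize) -> f64 {
    (0..len).map(|i| start + i as f64 * step).sum()
}

fn main() {
    let (start, step, len) = (1.0, 0.5, 1000);
    // 1, 1.5, 2, ..., 500.5 -> sum is 250750
    assert_eq!(formula_sum(start, step, len), 250_750.0);
    let diff = (formula_sum(start, step, len) - loop_sum(start, step, len)).abs();
    assert!(diff < 1e-6);
}
```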
Lazy Materialization
For cases where you want lazy computation but also need to support DATAPTR:
use miniextendr_api::altrep_data::AltrepDataptr;
#[derive(miniextendr_api::Altrep)]
#[altrep(class = "LazyIntSeq")]
pub struct LazyIntSeqData {
start: i32,
step: i32,
len: usize,
materialized: Option<Vec<i32>>, // Lazily allocated
}
impl AltrepLen for LazyIntSeqData {
fn len(&self) -> usize { self.len }
}
impl AltIntegerData for LazyIntSeqData {
fn elt(&self, i: usize) -> i32 {
// Compute on-the-fly (no materialization needed)
self.start.saturating_add((i as i32).saturating_mul(self.step))
}
}
impl AltrepDataptr<i32> for LazyIntSeqData {
fn dataptr(&mut self, _writable: bool) -> Option<*mut i32> {
// Materialize on first DATAPTR access
if self.materialized.is_none() {
let data: Vec<i32> = (0..self.len)
.map(|i| self.start.saturating_add((i as i32).saturating_mul(self.step)))
.collect();
self.materialized = Some(data);
}
self.materialized.as_mut().map(|v| v.as_mut_ptr())
}
fn dataptr_or_null(&self) -> Option<*const i32> {
// Return pointer only if already materialized
// Returning None tells R to use elt() instead
self.materialized.as_ref().map(|v| v.as_ptr())
}
}
// Enable dataptr support in macro
miniextendr_api::impl_altinteger_from_data!(LazyIntSeqData, dataptr);
Key behaviors:
- elt() always works, no allocation needed
- dataptr_or_null() returns None until materialized
- dataptr() allocates on first call, caches result
- Operations like x + y trigger dataptr(), causing materialization
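The materialize-once flow can be sketched in plain Rust, independent of the ALTREP machinery: elt never allocates, while the dataptr path fills a cache on first use and reuses it afterwards. Names here are illustrative, not miniextendr API:

```rust
// Standalone sketch of the materialize-on-first-DATAPTR pattern.
struct LazySeq {
    start: i32,
    step: i32,
    len: usize,
    materialized: Option<Vec<i32>>, // lazily allocated cache
}

impl LazySeq {
    // Element access never allocates.
    fn elt(&self, i: usize) -> i32 {
        self.start + i as i32 * self.step
    }
    // First call allocates and fills the buffer; later calls reuse it.
    fn dataptr(&mut self) -> &mut [i32] {
        if self.materialized.is_none() {
            let data = (0..self.len).map(|i| self.elt(i)).collect();
            self.materialized = Some(data);
        }
        self.materialized.as_mut().unwrap()
    }
}

fn main() {
    let mut s = LazySeq { start: 10, step: 2, len: 5, materialized: None };
    assert_eq!(s.elt(3), 16);          // no allocation
    assert!(s.materialized.is_none()); // nothing cached yet
    assert_eq!(s.dataptr().to_vec(), vec![10, 12, 14, 16, 18]);
    assert!(s.materialized.is_some()); // cached after first dataptr
}
```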
Serialization Support
To make ALTREP objects serializable (for saveRDS/readRDS):
use miniextendr_api::altrep_data::AltrepSerialize;
impl AltrepSerialize for LazyIntSeqData {
fn serialized_state(&self) -> SEXP {
// Return a serializable representation (typically a simple vector)
unsafe {
use miniextendr_api::ffi::{Rf_allocVector, SET_INTEGER_ELT, SEXPTYPE};
let state = Rf_allocVector(SEXPTYPE::INTSXP, 3);
SET_INTEGER_ELT(state, 0, self.start);
SET_INTEGER_ELT(state, 1, self.step);
SET_INTEGER_ELT(state, 2, self.len as i32);
state
}
}
fn unserialize(state: SEXP) -> Option<Self> {
unsafe {
use miniextendr_api::ffi::INTEGER_ELT;
Some(LazyIntSeqData {
start: INTEGER_ELT(state, 0),
step: INTEGER_ELT(state, 1),
len: INTEGER_ELT(state, 2) as usize,
materialized: None, // Fresh - not materialized
})
}
}
}
// Enable serialization in macro
miniextendr_api::impl_altinteger_from_data!(LazyIntSeqData, dataptr, serialize);
Class Registration and Cross-Session readRDS
For saveRDS/readRDS to work across R sessions, R must be able to find the
ALTREP class by name when unserializing. This requires two things:
1. DllInfo: associating the class with a package
R_make_alt*_class(class_name, pkg_name, dll_info) takes a DllInfo* that
tells R which package owns the class. The serialized stream stores the class
name and package name. On readRDS, R looks up the class by
(class_name, pkg_name); this lookup requires the DllInfo to have been
provided at registration time.
miniextendr stores the DllInfo from package_init in a global and passes it
to all R_make_alt*_class calls:
// In init.rs -- during R_init_<pkg>:
crate::set_altrep_dll_info(dll);
// In registration code -- when creating the class:
let dll = $crate::altrep_dll_info();
let cls = R_make_altreal_class(class_name, pkg_name, dll);
Without DllInfo (NULL), R can't find the class during deserialization, even
if it's registered. This was a bug: all classes were registered with NULL.
2. Eager registration: classes must exist before readRDS runs
ALTREP classes are registered in two ways:
Derive-generated classes (user #[derive(Altrep)] / #[derive(AltrepInteger)] structs)
register via linkme's #[distributed_slice]. Each ALTREP struct emits an entry
that's called during R_init:
// Generated by proc-macro:
#[distributed_slice(MX_ALTREP_REGISTRATIONS)]
fn register_my_class() {
MyType::get_or_init_class();
}
// Called during R_init:
for reg_fn in MX_ALTREP_REGISTRATIONS.iter() {
reg_fn();
}
Built-in classes (Vec<f64>, Box<[i32]>, Arrow arrays, etc.) use
OnceLock inside RegisterAltrep::get_or_init_class(). These are lazy:
the class is created on first use (e.g., when .into_sexp_altrep() is
called). This is a problem for readRDS: R tries to find the class during
deserialization, before any miniextendr code has called into_sexp_altrep.
The fix: register_builtin_altrep_classes() is called during R_init and
eagerly calls get_or_init_class() for every built-in type:
// In registry.rs -- during R_init:
register_builtin_altrep_classes(); // Vec, Box, Range
#[cfg(feature = "arrow")]
register_arrow_altrep_classes();   // Float64Array, Int32Array, etc.
pub(crate) fn register_builtin_altrep_classes() {
use crate::altrep::RegisterAltrep;
Vec::<i32>::get_or_init_class();
Vec::<f64>::get_or_init_class();
Vec::<bool>::get_or_init_class();
Vec::<u8>::get_or_init_class();
Vec::<String>::get_or_init_class();
Vec::<Option<String>>::get_or_init_class();
// ... all built-in types
}
What happens during readRDS
Session A: saveRDS(altrep_vec, "data.rds")
→ ALTREP serialize hook fires
→ serialized_state() materializes data to plain R vector
→ Stream contains: class_name="miniextendr_Vec_f64", pkg_name="miniextendr", state=<REALSXP>
Session B: library(miniextendr); readRDS("data.rds")
→ R_init_miniextendr runs → registers all ALTREP classes (with DllInfo)
→ readRDS parses stream → finds class "miniextendr_Vec_f64" in package "miniextendr"
→ R calls unserialize(class, state) → reconstructs Vec<f64> from the REALSXP
→ Returns a live ALTREP vector backed by Rust data
Session C: readRDS("data.rds") # WITHOUT library(miniextendr)
→ ALTREP class not registered → R falls back to the serialized state
→ Returns a plain R numeric vector (the materialized data)
→ Works correctly - just not an ALTREP anymore
Adding serialization to new types
When you add a new impl_alt*_from_data! with serialize:
- Implement AltrepSerialize for the type
- Add the serialize option: impl_altreal_from_data!(MyType, serialize);
- If it's a built-in type (in miniextendr-api, not user code), add it to register_builtin_altrep_classes() so it's eagerly registered at init
User types don't need step 3; the proc-macro generates #[distributed_slice]
entries automatically.
Mutable Vectors (Set_elt)
String and List vectors can be made mutable by implementing the set_elt() method. This allows R code to modify elements in-place.
Important: Only String and List types support set_elt. Numeric vectors (Integer, Real, Logical, Raw, Complex) cannot be mutated through ALTREP.
Mutable String Vectors
use miniextendr_api::altrep_data::{AltrepLen, AltStringData};
use miniextendr_api::ffi::SEXP;
use std::cell::RefCell;
#[derive(miniextendr_api::Altrep)]
#[altrep(class = "MutableString")]
pub struct MutableStringData {
strings: RefCell<Vec<Option<String>>>,
}
impl AltrepLen for MutableStringData {
fn len(&self) -> usize {
self.strings.borrow().len()
}
}
impl AltStringData for MutableStringData {
fn elt(&self, i: usize) -> Option<&str> {
// SAFETY: This is unsafe - we're returning a reference into RefCell
// In practice, you'd need to use a different strategy (e.g., cache in thread-local)
// or return owned String and convert to SEXP
unsafe {
let ptr = self.strings.as_ptr();
(*ptr).get(i).and_then(|s| s.as_deref())
}
}
// Enable mutation
fn set_elt(&mut self, i: usize, value: Option<&str>) {
if let Some(s) = self.strings.get_mut().get_mut(i) {
*s = value.map(|v| v.to_string());
}
}
}
miniextendr_api::impl_altstring_from_data!(MutableStringData, set_elt);
Note: The above example shows the concept but has lifetime issues. For production use, consider:
- Storing SEXPs directly instead of Rust strings
- Using thread-local storage for temporary string references
- Materializing to a regular R vector when mutations occur
Mutable List Vectors
Lists are easier to make mutable since they already store SEXPs:
use miniextendr_api::altrep_data::{AltrepLen, AltListData};
use miniextendr_api::ffi::SEXP;
use std::cell::RefCell;
#[derive(miniextendr_api::Altrep)]
#[altrep(class = "MutableList")]
pub struct MutableListData {
// SEXPs need to be protected from GC
elements: RefCell<Vec<SEXP>>,
}
impl AltrepLen for MutableListData {
fn len(&self) -> usize {
self.elements.borrow().len()
}
}
impl AltListData for MutableListData {
fn elt(&self, i: usize) -> SEXP {
self.elements.borrow()[i]
}
fn set_elt(&mut self, i: usize, value: SEXP) {
self.elements.borrow_mut()[i] = value;
}
}
miniextendr_api::impl_altlist_from_data!(MutableListData, set_elt);
Safety Considerations
1. R's Copy-on-Write: R may copy your vector before calling set_elt, so mutations may not affect the original vector reference.
2. GC Protection: When storing SEXPs in mutable lists:
- SEXPs in the ALTREP data slot are automatically protected
- If you create new SEXPs, ensure they're returned to R immediately
- Don't store raw SEXP pointers that outlive their protection
3. Thread Safety:
- ALTREP callbacks run on R's main thread
- Use RefCell (not Mutex) for interior mutability
- No async/threading allowed inside ALTREP methods
4. Materialization:
- R may materialize (copy to regular vector) when it needs a dataptr
- After materialization, mutations go to the copy, not your ALTREP
When to Use Mutable ALTREP
Good use cases:
- Lazy evaluation with caching
- Proxying to external mutable data sources
- Implementing special data structures (e.g., sparse vectors)
Avoid for:
- Regular data storage (use Vec<T> instead)
- Situations where you need dataptr (forces materialization)
- Performance-critical code (mutations have overhead)
Standard Type Support
miniextendr provides built-in ALTREP support for common Rust types via .into_sexp_altrep():
Vec<T> (Owned Data)
#[miniextendr]
pub fn simple_vec_int(values: Vec<i32>) -> SEXP {
values.into_sexp_altrep()
}
Box<[T]> (Immutable Owned Slice)
#[miniextendr]
pub fn boxed_ints(n: i32) -> SEXP {
let data: Box<[i32]> = (1..=n).collect();
data.into_sexp_altrep()
}
Static Slices (&'static [T])
static MY_DATA: &[i32] = &[10, 20, 30, 40, 50];
#[miniextendr]
pub fn static_ints() -> SEXP {
MY_DATA.into_sexp_altrep()
}
Note: Static ALTREPs are read-only and cannot support writable DATAPTR.
Complex Numbers
use miniextendr_api::ffi::Rcomplex;
use miniextendr_api::altrep_data::AltComplexData;
#[derive(miniextendr_api::Altrep)]
#[altrep(class = "UnitCircle")]
pub struct UnitCircleData {
n: usize, // Number of points on unit circle
}
impl AltrepLen for UnitCircleData {
fn len(&self) -> usize { self.n }
}
impl AltComplexData for UnitCircleData {
fn elt(&self, i: usize) -> Rcomplex {
let theta = 2.0 * std::f64::consts::PI * (i as f64) / (self.n as f64);
Rcomplex { r: theta.cos(), i: theta.sin() }
}
}
miniextendr_api::impl_altcomplex_from_data!(UnitCircleData);
#[miniextendr]
pub fn unit_circle(n: i32) -> SEXP {
UnitCircleData { n: n as usize }.into_sexp()
}
Logical Vectors
Use the Logical enum for proper NA handling:
use miniextendr_api::altrep_data::{AltLogicalData, Logical};
#[derive(miniextendr_api::Altrep)]
#[altrep(class = "LogicalVec")]
pub struct LogicalVecData {
data: Vec<Logical>,
}
impl AltrepLen for LogicalVecData {
fn len(&self) -> usize { self.data.len() }
}
impl AltLogicalData for LogicalVecData {
fn elt(&self, i: usize) -> Logical {
self.data[i]
}
fn no_na(&self) -> Option<bool> {
Some(!self.data.iter().any(|v| matches!(v, Logical::Na)))
}
fn sum(&self, na_rm: bool) -> Option<i64> {
// Sum = count of TRUE values
let mut total = 0i64;
for v in &self.data {
match v {
Logical::True => total += 1,
Logical::False => {}
Logical::Na => if !na_rm { return None; }
}
}
Some(total)
}
}
miniextendr_api::impl_altlogical_from_data!(LogicalVecData);
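The sum logic above can be exercised outside R. This standalone sketch mirrors the three-valued semantics: TRUE counts as 1, FALSE as 0, and an NA makes the sum unavailable unless na_rm removes it (the Logical enum here is a local stand-in for miniextendr's type):

```rust
// Local stand-in for miniextendr's three-valued Logical.
#[derive(Clone, Copy)]
enum Logical {
    True,
    False,
    Na,
}

// Mirrors LogicalVecData::sum: None plays the role of R's NA result.
fn logical_sum(data: &[Logical], na_rm: bool) -> Option<i64> {
    let mut total = 0i64;
    for v in data {
        match v {
            Logical::True => total += 1,
            Logical::False => {}
            Logical::Na => {
                if !na_rm {
                    return None; // sum(x) is NA when NAs are kept
                }
            }
        }
    }
    Some(total)
}

fn main() {
    let data = [Logical::True, Logical::Na, Logical::True, Logical::False];
    assert_eq!(logical_sum(&data, false), None);   // NA propagates
    assert_eq!(logical_sum(&data, true), Some(2)); // NA removed
}
```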
String Vectors
String ALTREPs return Option<&str> where None represents NA:
use miniextendr_api::altrep_data::AltStringData;
#[derive(miniextendr_api::Altrep)]
#[altrep(class = "StringVec")]
pub struct StringVecData {
data: Vec<Option<String>>,
}
impl AltrepLen for StringVecData {
fn len(&self) -> usize { self.data.len() }
}
impl AltStringData for StringVecData {
fn elt(&self, i: usize) -> Option<&str> {
self.data[i].as_deref() // None = NA
}
fn no_na(&self) -> Option<bool> {
Some(!self.data.iter().any(|v| v.is_none()))
}
}
miniextendr_api::impl_altstring_from_data!(StringVecData);
Raw Vectors
use miniextendr_api::altrep_data::AltRawData;
#[derive(miniextendr_api::Altrep)]
#[altrep(class = "RepeatingRaw")]
pub struct RepeatingRawData {
pattern: Vec<u8>,
total_len: usize,
}
impl AltrepLen for RepeatingRawData {
fn len(&self) -> usize { self.total_len }
}
impl AltRawData for RepeatingRawData {
fn elt(&self, i: usize) -> u8 {
self.pattern[i % self.pattern.len()]
}
}
miniextendr_api::impl_altraw_from_data!(RepeatingRawData);
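The access rule behind RepeatingRawData is just modular indexing: a short pattern serves an arbitrarily long vector. A standalone demonstration:

```rust
// The RepeatingRawData element rule: the index wraps around the pattern.
fn repeating_elt(pattern: &[u8], i: usize) -> u8 {
    pattern[i % pattern.len()]
}

fn main() {
    let pattern = [0xDE, 0xAD, 0xBE, 0xEF];
    assert_eq!(repeating_elt(&pattern, 0), 0xDE);
    assert_eq!(repeating_elt(&pattern, 4), 0xDE); // wraps after 4 bytes
    assert_eq!(repeating_elt(&pattern, 6), 0xBE);
}
```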
List Vectors
List vectors (R's list type / VECSXP) can contain any R objects. The AltListData trait allows you to create lists that compute or fetch elements on demand.
use miniextendr_api::altrep_data::{AltrepLen, AltListData};
use miniextendr_api::ffi::SEXP;
use miniextendr_api::{IntoR, Rf_ScalarInteger};
#[derive(miniextendr_api::Altrep)]
#[altrep(class = "IntegerSequenceList")]
pub struct IntegerSequenceListData {
n: usize, // Number of elements in the list
}
impl AltrepLen for IntegerSequenceListData {
fn len(&self) -> usize {
self.n
}
}
impl AltListData for IntegerSequenceListData {
fn elt(&self, i: usize) -> SEXP {
// Each element is a scalar integer equal to its index
unsafe { Rf_ScalarInteger((i + 1) as i32) }
}
}
miniextendr_api::impl_altlist_from_data!(IntegerSequenceListData);
#[miniextendr]
pub fn int_seq_list(n: i32) -> SEXP {
let data = IntegerSequenceListData { n: n as usize };
data.into_sexp()
}
Usage in R:
lst <- int_seq_list(5L)
length(lst) # 5
lst[[1]] # 1L
lst[[3]] # 3L
lst[[5]] # 5L
List Safety Considerations
Important: List elements are SEXPs that must be properly protected from garbage collection. When implementing AltListData::elt():
- Return existing SEXPs: If you store SEXPs in your data structure, they're already protected by being in the ALTREP object's data slot
- Create new SEXPs: If you create SEXPs on-the-fly (like Rf_ScalarInteger), R will protect them when they're added to the list
- Avoid raw pointers: Don't store raw SEXP pointers that might become invalid
Practical List Examples
Example 1: Repeating Element
#[derive(miniextendr_api::Altrep)]
#[altrep(class = "RepeatedList")]
pub struct RepeatedListData {
element: SEXP, // Stored in data1 slot (protected)
n: usize,
}
impl AltListData for RepeatedListData {
fn elt(&self, _i: usize) -> SEXP {
self.element // Same element for all indices
}
}
Example 2: List of Named Lists
impl AltListData for NamedListGenerator {
fn elt(&self, i: usize) -> SEXP {
// Create a named list for each element
let names = vec!["x", "y"];
let values = vec![
unsafe { Rf_ScalarInteger(i as i32) },
unsafe { Rf_ScalarReal(i as f64) },
];
// Use miniextendr's list builder
miniextendr_api::list::named_list(&names, &values).into_sexp()
}
}
Reference Types
When you need to pass an ALTREP back to Rust functions:
// The ALTREP derives generate these automatically:
// - ConstantIntDataRef: immutable reference to ALTREP data
// - ConstantIntDataMut: mutable reference to ALTREP data
#[miniextendr]
pub fn inspect_constant_int(x: ConstantIntDataRef) -> String {
format!("value={}, len={}", x.value, x.len)
}
#[miniextendr]
pub fn double_constant_int(mut x: ConstantIntDataMut) {
x.value *= 2;
}
Low-Level Trait Macros
The impl_alt*_from_data! macros accept options:
// Basic (element access only)
miniextendr_api::impl_altinteger_from_data!(MyType);
// With dataptr support (enables DATAPTR method)
miniextendr_api::impl_altinteger_from_data!(MyType, dataptr);
// With serialization (enables saveRDS/readRDS)
miniextendr_api::impl_altinteger_from_data!(MyType, serialize);
// With subset optimization (enables optimized x[i] for index vectors)
miniextendr_api::impl_altinteger_from_data!(MyType, subset);
// Multiple options
miniextendr_api::impl_altinteger_from_data!(MyType, dataptr, serialize, subset);
| Option | What it does | Requires |
|---|---|---|
| dataptr | Enables DATAPTR method | impl AltrepDataptr<T> |
| serialize | Enables serialization | impl AltrepSerialize |
| subset | Enables optimized subsetting | impl AltrepSubset |
Sortedness and NA Hints
Providing hints enables R to optimize operations:
use miniextendr_api::altrep_data::Sortedness;
impl AltIntegerData for MyData {
fn is_sorted(&self) -> Option<Sortedness> {
match self.ordering {
Ordering::Ascending => Some(Sortedness::Increasing),
Ordering::Descending => Some(Sortedness::Decreasing),
Ordering::Unknown => None, // Don't know
}
}
fn no_na(&self) -> Option<bool> {
Some(true) // Enables R to skip NA checks
}
}
Subsetting Optimization (Extract_subset)
The extract_subset() method allows you to optimize R's subsetting operations (x[indices]). Instead of R extracting elements one-by-one, you can return a new ALTREP object or optimized representation.
When R Calls Extract_subset
R calls extract_subset(x, indices, call) when:
- User writes x[c(1, 3, 5)] - integer vector indices
- User writes x[condition] - logical vector indices
- Subsetting with names: x[c("a", "b")]
Note: Single element access x[i] or x[[i]] uses elt(), not extract_subset().
Basic Example: Range Subsetting
use miniextendr_api::altrep_traits::AltVec;
use miniextendr_api::ffi::{SEXP, R_xlen_t};
impl AltVec for RangeData {
const HAS_EXTRACT_SUBSET: bool = true;
fn extract_subset(x: SEXP, indices: SEXP, _call: SEXP) -> SEXP {
// Extract the RangeData from x
let data = unsafe { altrep_data1_as::<RangeData>(x) }.unwrap();
// For simple cases, return a new optimized Range
// Example: Range(1..100)[1..10] = Range(1..10)
// In practice, you'd:
// 1. Parse indices SEXP
// 2. Compute the subset
// 3. Return new ALTREP or regular vector
// Fallback to default R behavior for complex cases
std::ptr::null_mut() // R will use default elt-based extraction
}
}
Practical Example: Constant Vector Subset
For a constant vector, any subset is also constant:
impl AltVec for ConstantIntData {
const HAS_EXTRACT_SUBSET: bool = true;
fn extract_subset(x: SEXP, indices: SEXP, _call: SEXP) -> SEXP {
        use miniextendr_api::ffi::Rf_xlength;
        // `?` can't be used here (the function returns SEXP, not Option);
        // fall back to R's default extraction if the data is missing.
        let data = match unsafe { altrep_data1_as::<ConstantIntData>(x) } {
            Some(d) => d,
            None => return std::ptr::null_mut(),
        };
// Get length of indices
let n = unsafe { Rf_xlength(indices) };
// Return new constant vector with same value, different length
let subset = ConstantIntData {
value: data.value,
len: n as usize,
};
subset.into_sexp()
}
}
Performance Benefits
O(1) Subset Creation:
x <- range_int_altrep(1L, 1000000L) # O(1) - no allocation
y <- x[1:100000] # O(1) - returns new Range(1, 100001)
Without extract_subset, R would:
- Allocate a 100,000-element vector
- Call elt() 100,000 times
- Fill the new vector
With extract_subset:
- Return a new Range object (few bytes)
- No element extraction
- Lazy evaluation continues
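The reason the subset can be O(1): a contiguous window into an integer range is itself a range, fully described by a start and a length. A plain-Rust illustration (IntRange is a local type for this sketch, not part of miniextendr):

```rust
// A compact integer range: two numbers describe millions of elements.
#[derive(Debug, PartialEq)]
struct IntRange {
    start: i32, // first value
    len: usize, // number of elements
}

impl IntRange {
    fn elt(&self, i: usize) -> i32 {
        self.start + i as i32
    }
    // Subset by a contiguous 0-based index window [from, from + n):
    // the result is just another IntRange, so no elements are copied.
    fn subset(&self, from: usize, n: usize) -> IntRange {
        assert!(from + n <= self.len);
        IntRange { start: self.start + from as i32, len: n }
    }
}

fn main() {
    let x = IntRange { start: 1, len: 1_000_000 };     // like 1:1000000
    let y = x.subset(0, 100_000);                       // like x[1:100000]
    assert_eq!(y, IntRange { start: 1, len: 100_000 }); // O(1), no copy
    assert_eq!(y.elt(99_999), 100_000);
}
```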
When to Implement Extract_subset
Good candidates:
- ✅ Mathematical sequences: Range, arithmetic sequences (subset is another sequence)
- ✅ Constant vectors: Subset is constant with different length
- ✅ Views/windows: Subset adjusts the window bounds
- ✅ External data: Subset delegates to underlying data source
- ✅ Sparse vectors: Subset maintains sparsity
Not worth it for:
- ❌ Materialized data (Vec, Box): R's default is already efficient
- ❌ Complex computations: Unless subset is much simpler than original
- ❌ Small vectors: Overhead not worth the optimization
Handling Different Index Types
fn extract_subset(x: SEXP, indices: SEXP, _call: SEXP) -> SEXP {
use miniextendr_api::ffi::{TYPEOF, SEXPTYPE};
unsafe {
match TYPEOF(indices) {
SEXPTYPE::INTSXP => {
// Integer indices: x[c(1L, 3L, 5L)]
// Extract and process integer vector
}
SEXPTYPE::REALSXP => {
// Numeric indices: x[c(1, 3, 5)]
// Convert to integers and process
}
SEXPTYPE::LGLSXP => {
// Logical indices: x[c(TRUE, FALSE, TRUE)]
// Find TRUE positions
}
SEXPTYPE::STRSXP => {
// Named indices: x[c("a", "b")]
// Match names (if your vector has names)
}
_ => {
// Unknown type - let R handle it
std::ptr::null_mut()
}
}
}
}
Fallback Strategy
Always provide a fallback: Return NULL (null_mut()) to let R use default element-by-element extraction:
fn extract_subset(x: SEXP, indices: SEXP, _call: SEXP) -> SEXP {
// Try optimized path
if let Some(result) = try_optimized_subset(x, indices) {
return result;
}
// Fallback: R will call elt() for each index
std::ptr::null_mut()
}
This ensures correctness even when optimization isn't possible.
Performance Tips
- Implement sum/min/max when you can compute them in O(1)
- Use no_na() hint when you know there are no NAs
- Use is_sorted() hint for sorted data
- Implement get_region() for efficient bulk access
- Delay materialization - prefer elt() over dataptr()
- Return None from dataptr_or_null() until actually materialized
Common Patterns
Pattern 1: Constant Vector
struct Constant<T> { value: T, len: usize }
// All elements return the same value
fn elt(&self, _i: usize) -> T { self.value }
Pattern 2: Computed Sequence
struct Sequence { start: T, step: T, len: usize }
// Elements computed from formula
fn elt(&self, i: usize) -> T { self.start + i * self.step }
Pattern 3: External Data View
struct ExternalView<'a> { data: &'a [T] }
// Zero-copy view into external data
fn elt(&self, i: usize) -> T { self.data[i] }
Pattern 4: Lazy Computation with Cache
struct Lazy { params: Params, cache: Option<Vec<T>> }
// Compute and cache on first access
fn dataptr(&mut self) -> *mut T {
if self.cache.is_none() { self.cache = Some(self.compute()); }
self.cache.as_mut().unwrap().as_mut_ptr()
}
Advanced Methods
These methods are rarely needed but available for special use cases.
Inspect - Custom Debug Output
The inspect() method customizes the output of .Internal(inspect(x)), R's internal debugging tool.
impl Altrep for MyData {
const HAS_INSPECT: bool = true;
fn inspect(
x: SEXP,
pre: i32,
deep: i32,
pvec: i32,
inspect_subtree: Option<unsafe extern "C-unwind" fn(SEXP, i32, i32, i32)>,
) -> bool {
// Print custom information
eprintln!(" MyData ALTREP");
eprintln!(" - custom_field: {}", /* access your data */);
// Optionally inspect child objects
if let Some(inspect) = inspect_subtree {
unsafe { inspect(/* child SEXP */, pre, deep, pvec); }
}
true // Return true if inspection succeeded
}
}
When to use:
- Debugging complex ALTREP structures
- Showing internal state in
.Internal(inspect()) - Documenting ALTREP design for users
When to skip:
- Most use cases (R's default inspection is fine)
- Production code (debugging feature)
Duplicate - Custom Object Duplication
The duplicate() and duplicate_ex() methods customize how R duplicates your ALTREP object when copy-on-write semantics require it.
impl Altrep for LazyWithCache {
const HAS_DUPLICATE: bool = true;
fn duplicate(x: SEXP, deep: bool) -> SEXP {
        // `?` can't be used here (the function returns SEXP, not Option).
        let data = unsafe { altrep_data1_as::<LazyWithCache>(x) }
            .expect("missing ALTREP data");
if deep {
// Deep copy: clone cached data too
let new_data = LazyWithCache {
params: data.params.clone(),
cache: RefCell::new(data.cache.borrow().clone()),
};
new_data.into_sexp()
} else {
// Shallow copy: share cache (default R behavior)
x // Return self
}
}
}
When to use:
- Controlling what gets copied (cache vs params)
- Optimizing duplication for large cached data
- Implementing copy-on-write semantics
- Sharing immutable state across copies
When to skip:
- Default R duplication is correct
- No shared mutable state
- No expensive cached data
Note: `duplicate_ex()` is the newer, extended version; prefer it over `duplicate()` if you only implement one.
### Coerce - Custom Type Conversion

The `coerce()` method customizes how R converts your ALTREP to other types (e.g., integer → real, real → integer).
```rust
impl Altrep for ArithSeq {
    const HAS_COERCE: bool = true;
    fn coerce(x: SEXP, to_type: SEXPTYPE) -> SEXP {
        use SEXPTYPE::*;
        let data = unsafe { altrep_data1_as::<ArithSeq>(x) }
            .expect("ArithSeq data1");
        match to_type {
            REALSXP => {
                // Convert the integer sequence to a real sequence.
                // Instead of materializing, return a new Real ALTREP.
                let real_seq = RealArithSeq {
                    start: data.start as f64,
                    step: data.step as f64,
                    len: data.len,
                };
                real_seq.into_sexp()
            }
            _ => {
                // Let R handle other conversions
                std::ptr::null_mut()
            }
        }
    }
}
```
When to use:
- Converting between related ALTREP types (IntSeq → RealSeq)
- Avoiding materialization during coercion
- Preserving ALTREP properties after conversion
- Optimizing common conversion paths
When to skip:
- Default R coercion is acceptable
- Conversion requires materialization anyway
- Rare conversion path
Return values:
- Return a new SEXP: your custom coercion
- Return `NULL` (`null_mut()`): let R use its default coercion
## Materialization and DATAPTR

### Understanding Materialization

Materialization is the process of converting your lazy/compact ALTREP representation into a standard R vector with contiguous memory. This happens when R needs direct memory access to your data.
### When R Requests DATAPTR

R calls the `dataptr()` or `dataptr_or_null()` methods when:
- Operations require contiguous memory:
  - `sort()`, `order()`, `unique()`
  - `.C()` or `.Fortran()` calls that pass the vector
  - `as.vector()` with specific types
  - Some vectorized operations (`x + y`, `x * 2`)
- Serialization (unless you provide `serialize()`)
- Interop with other packages expecting raw pointers
### The Three Dataptr Strategies

#### Strategy 1: No DATAPTR (Lazy Forever)

When to use: Pure lazy evaluation, external data sources, mathematical sequences
```rust
// Don't implement AltrepDataptr - only provide elt()
#[derive(miniextendr_api::Altrep)]
#[altrep(class = "LazySequence")]
pub struct LazySequence {
    start: i32,
    step: i32,
    len: usize,
}

impl AltIntegerData for LazySequence {
    fn elt(&self, i: usize) -> i32 {
        self.start + (i as i32) * self.step
    }
}

// No dataptr option
miniextendr_api::impl_altinteger_from_data!(LazySequence);
```
Behavior:
- ✅ O(1) creation
- ✅ O(1) element access
- ❌ Operations needing DATAPTR will materialize to a regular R vector
- ❌ R owns the materialized copy (you lose control)
#### Strategy 2: Materialization on Demand

When to use: Lazy until needed, then cache the materialized form
```rust
use miniextendr_api::altrep_data::AltrepDataptr;
use std::cell::RefCell;

#[derive(miniextendr_api::Altrep)]
#[altrep(class = "LazyWithCache")]
pub struct LazyWithCache {
    // Computation parameters
    start: i32,
    step: i32,
    len: usize,
    // Materialized cache (initially None)
    materialized: RefCell<Option<Vec<i32>>>,
}

impl AltrepDataptr<i32> for LazyWithCache {
    fn dataptr(&mut self, _writable: bool) -> Option<*mut i32> {
        // Materialize on first call
        let mut mat = self.materialized.borrow_mut();
        if mat.is_none() {
            let vec: Vec<i32> = (0..self.len)
                .map(|i| self.start + (i as i32) * self.step)
                .collect();
            *mat = Some(vec);
        }
        // Return pointer to cached data
        mat.as_mut().map(|v| v.as_mut_ptr())
    }

    fn dataptr_or_null(&self) -> Option<*const i32> {
        // Return None if not yet materialized (saves memory)
        self.materialized
            .borrow()
            .as_ref()
            .map(|v| v.as_ptr())
    }
}

// Enable dataptr
miniextendr_api::impl_altinteger_from_data!(LazyWithCache, dataptr);
```
Behavior:
- ✅ Lazy until DATAPTR is requested
- ✅ Subsequent DATAPTR calls are O(1)
- ✅ You control the materialized form
- ⚠️ Uses memory after materialization
#### Strategy 3: Pre-Materialized (Vec/Box)

When to use: Data already in memory; you are just wrapping an existing vector
```rust
#[derive(miniextendr_api::Altrep)]
#[altrep(class = "VecWrapper")]
pub struct VecWrapper {
    data: Vec<i32>,
}

impl AltrepDataptr<i32> for VecWrapper {
    fn dataptr(&mut self, _writable: bool) -> Option<*mut i32> {
        Some(self.data.as_mut_ptr())
    }

    fn dataptr_or_null(&self) -> Option<*const i32> {
        Some(self.data.as_ptr())
    }
}

miniextendr_api::impl_altinteger_from_data!(VecWrapper, dataptr);
```
Behavior:
- ✅ DATAPTR always available (O(1))
- ✅ No lazy evaluation overhead
- ❌ Memory is used immediately
- ❌ No computation savings
### Materialization Trade-offs
| Aspect | No DATAPTR | On-Demand | Pre-Materialized |
|---|---|---|---|
| Memory | Minimal | Grows on use | Full upfront |
| Speed | Fast elt() | Fast after first | Fastest DATAPTR |
| Use case | Math sequences | Caching | Existing data |
| Lazy eval | ✅ Always | ✅ Until DATAPTR | ❌ Never |
### When to Provide DATAPTR

Provide DATAPTR if:
- ✅ Your data is already in memory (Vec, Box, slice)
- ✅ Users will frequently perform operations requiring contiguous memory
- ✅ You can efficiently materialize when needed
- ✅ You want to control the materialization process

Skip DATAPTR if:
- ❌ Data is external (database, file, network)
- ❌ It is a pure mathematical sequence (no need to materialize)
- ❌ Memory is at a premium
- ❌ R's default materialization is acceptable
### Safety Requirements

When implementing `dataptr()`:
- Pointer validity: the returned pointer must remain valid until the next GC, or until the ALTREP object is collected
- Lifetime: store materialized data in the ALTREP object itself (in the data1 ExternalPtr)
- Mutability: if `writable = true`, the pointer must be mutable; R may modify the data
```rust
// ❌ WRONG - pointer becomes invalid
fn dataptr(&mut self, _writable: bool) -> Option<*mut i32> {
    let mut vec = vec![1, 2, 3];
    Some(vec.as_mut_ptr()) // vec is dropped here! The pointer dangles!
}

// ✅ CORRECT - pointer remains valid
fn dataptr(&mut self, _writable: bool) -> Option<*mut i32> {
    self.cached_data.as_mut().map(|v| v.as_mut_ptr())
}
```

### Example: Controlling Materialization
```rust
#[derive(miniextendr_api::Altrep)]
#[altrep(class = "OptionallyMaterialized")]
pub struct OptionallyMaterialized {
    generator: Box<dyn Fn(usize) -> i32>,
    len: usize,
    cache: RefCell<Option<Vec<i32>>>,
}

impl OptionallyMaterialized {
    pub fn is_materialized(&self) -> bool {
        self.cache.borrow().is_some()
    }

    pub fn force_materialize(&mut self) {
        if self.cache.borrow().is_none() {
            let vec: Vec<i32> = (0..self.len).map(|i| (self.generator)(i)).collect();
            *self.cache.borrow_mut() = Some(vec);
        }
    }
}
```
Key Insight: Materialization is a one-way door. Once materialized, you typically stay materialized. Plan your memory strategy accordingly.
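The one-way nature of materialization can be modeled in plain Rust with `std::cell::OnceCell`: once the value is set, it stays resident for the cell's lifetime. This sketch is illustrative only; `OneWay` and its methods are not miniextendr API:

```rust
use std::cell::OnceCell;

// "One-way door" sketch: after the first get_or_init, the buffer stays
// allocated and the struct can never return to its lazy state.
struct OneWay {
    len: usize,
    cache: OnceCell<Vec<i32>>,
}

impl OneWay {
    fn materialized(&self) -> bool {
        self.cache.get().is_some()
    }

    fn dataptr(&self) -> *const i32 {
        // First call computes; every later call returns the same buffer.
        self.cache.get_or_init(|| (0..self.len as i32).collect()).as_ptr()
    }
}

fn main() {
    let v = OneWay { len: 4, cache: OnceCell::new() };
    assert!(!v.materialized());    // still lazy
    let p1 = v.dataptr();
    let p2 = v.dataptr();
    assert!(v.materialized());     // no way back to the lazy state
    assert_eq!(p1, p2);            // stable pointer while `v` is alive
    println!("materialized = {}", v.materialized());
}
```

The stable-pointer property is exactly what the safety requirements above demand of `dataptr()`.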
## Troubleshooting

### "Error: could not find function"
- Ensure the constructor function has `#[miniextendr]` and is `pub`
- Run `just devtools-document` after adding new functions

### Elements return wrong values
- Check your `elt()` implementation
- Verify index bounds handling

### R crashes on access
- Ensure an ALTREP derive (`#[derive(Altrep)]`, `#[derive(AltrepInteger)]`, etc.) is on your data type
- Check that `into_sexp()` is called to create the ALTREP object

### Serialization fails
- Implement the `AltrepSerialize` trait
- Add the `serialize` option to `impl_alt*_from_data!`

### DATAPTR operations crash
- Implement the `AltrepDataptr` trait
- Add the `dataptr` option to `impl_alt*_from_data!`
- Ensure the returned pointer is valid for the vector's lifetime
## Iterator-Backed ALTREP

miniextendr provides two iterator-backed ALTREP variants:

### IterState (Prefix Caching)

The default iterator state caches elements as a contiguous prefix. When you access element i, all elements 0..=i are generated and cached.
```rust
use miniextendr_api::altrep_data::IterIntData;

// Create from an iterator
let data = IterIntData::from_iter((0..1000).map(|x| x * 2), 1000);

// Access element 100 - generates and caches elements 0-100
let elem = data.elt(100);

// Access element 50 - returns from cache (no computation)
let elem = data.elt(50);
```
Characteristics:
- Cache is a contiguous `Vec<T>`
- All elements up to the max accessed index are cached
- `as_slice()` is available after full materialization
- Memory usage: O(max_accessed_index)
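The prefix-caching behavior can be sketched in a few lines of plain Rust. `PrefixCache` below is an illustrative stand-in for `IterState`, not the actual miniextendr type:

```rust
// Minimal sketch of prefix caching: accessing index i pulls the iterator
// forward until elements 0..=i are all stored in a contiguous Vec.
struct PrefixCache<I: Iterator<Item = i32>> {
    iter: I,
    cache: Vec<i32>,
}

impl<I: Iterator<Item = i32>> PrefixCache<I> {
    fn elt(&mut self, i: usize) -> i32 {
        // Generate and cache every element up to and including i.
        while self.cache.len() <= i {
            let next = self.iter.next().expect("index out of bounds");
            self.cache.push(next);
        }
        self.cache[i]
    }
}

fn main() {
    let mut data = PrefixCache { iter: (0..1000).map(|x| x * 2), cache: Vec::new() };
    assert_eq!(data.elt(100), 200);    // generates elements 0..=100
    assert_eq!(data.cache.len(), 101); // whole prefix is now cached
    assert_eq!(data.elt(50), 100);     // served from cache
    assert_eq!(data.cache.len(), 101); // no new generation
    println!("cached = {}", data.cache.len());
}
```

Because the cache is a plain `Vec`, a slice view of the data becomes possible once the full prefix has been generated, which is why `as_slice()` works after full materialization.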
### SparseIterState (Skipping)

For sparse access patterns, use the sparse variants, which skip intermediate elements using `Iterator::nth()`:
```rust
use miniextendr_api::altrep_data::SparseIterIntData;

// Create from an iterator
let data = SparseIterIntData::from_iter((0..1_000_000).map(|x| x * 2), 1_000_000);

// Access element 999_999 - skips directly there
let elem = data.elt(999_999); // Only this element is generated

// Element 0 was skipped and is now inaccessible
let first = data.elt(0); // Returns NA_INTEGER
```
Characteristics:
- Cache is a sparse `BTreeMap<usize, T>`
- Only accessed elements are cached
- Skipped elements return NA/default forever
- `as_slice()` always returns `None`
- Memory usage: O(num_accessed)
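The skipping behavior can also be sketched in plain Rust. `Iterator::nth()` consumes and discards the intermediate elements, so they can never be produced again; `SparseCache` below is illustrative, not the actual miniextendr type, and `i32::MIN` stands in for `NA_INTEGER`:

```rust
use std::collections::BTreeMap;

// Minimal sketch of sparse iteration: nth() jumps forward, a BTreeMap
// keeps only the indices actually visited, and anything behind the
// iterator's position is gone for good.
struct SparseCache<I: Iterator<Item = i32>> {
    iter: I,
    consumed: usize,             // how far the iterator has advanced
    cache: BTreeMap<usize, i32>, // only the accessed elements
}

impl<I: Iterator<Item = i32>> SparseCache<I> {
    fn elt(&mut self, i: usize) -> i32 {
        if let Some(&v) = self.cache.get(&i) {
            return v;
        }
        if i < self.consumed {
            return i32::MIN; // stand-in for NA_INTEGER: element was skipped
        }
        // nth(k) discards k elements, then yields the next one.
        let v = self.iter.nth(i - self.consumed).expect("index out of bounds");
        self.consumed = i + 1;
        self.cache.insert(i, v);
        v
    }
}

fn main() {
    let mut data = SparseCache {
        iter: (0..1_000_000).map(|x| x * 2),
        consumed: 0,
        cache: BTreeMap::new(),
    };
    assert_eq!(data.elt(999_999), 1_999_998); // skips directly to the end
    assert_eq!(data.cache.len(), 1);          // only one element stored
    assert_eq!(data.elt(0), i32::MIN);        // skipped: NA-like sentinel
    println!("cached = {}", data.cache.len());
}
```

Because skipped elements were never stored anywhere, there is no contiguous buffer to expose, which is why the sparse variants cannot offer `as_slice()`.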
### Comparison

| Feature | IterState | SparseIterState |
|---|---|---|
| Cache storage | Contiguous `Vec<T>` | Sparse `BTreeMap<usize, T>` |
| Access pattern | Prefix (0..=i) cached | Only accessed indices cached |
| Skipped elements | All cached | Gone forever (return NA) |
| Memory for sparse access | O(max_index) | O(num_accessed) |
| `as_slice()` support | Yes (after full materialization) | No |
### Available Types

Prefix caching (IterState):
- `IterIntData<I>` - Integer vectors
- `IterRealData<I>` - Real (f64) vectors
- `IterLogicalData<I>` - Logical (bool) vectors
- `IterRawData<I>` - Raw (u8) vectors
- `IterStringData<I>` - Character vectors (forces full materialization)
- `IterComplexData<I>` - Complex number vectors
- `IterListData<I>` - List vectors (SEXP elements)
- `IterIntCoerceData<I, T>` - Integer with coercion from other types
- `IterRealCoerceData<I, T>` - Real with coercion from other types

Sparse/skipping (SparseIterState):
- `SparseIterIntData<I>` - Integer vectors
- `SparseIterRealData<I>` - Real (f64) vectors
- `SparseIterLogicalData<I>` - Logical (bool) vectors
- `SparseIterRawData<I>` - Raw (u8) vectors
- `SparseIterComplexData<I>` - Complex number vectors
### When to Use Which

Use IterState (prefix caching) when:
- Access is mostly sequential (0, 1, 2, ...)
- You'll eventually access most or all elements
- You need `as_slice()` or full materialization later

Use SparseIterState (skipping) when:
- Access is truly sparse (e.g., sampling)
- The vector is very large but you only need a few elements
- You don't need skipped elements ever again
- Memory is constrained