Reference page
Arrow Integration
Zero-copy conversions between R vectors and Apache Arrow arrays.
Quick Reference
use miniextendr_api::{miniextendr, ffi::SEXP};
use miniextendr_api::optionals::arrow_impl::*;
// R numeric → Arrow Float64Array → back to R: zero-copy both directions
#[miniextendr]
pub fn passthrough_numeric(x: Float64Array) -> Float64Array {
x
}
// R integer → Arrow Int32Array → back to R: zero-copy both directions
#[miniextendr]
pub fn passthrough_integer(x: Int32Array) -> Int32Array {
x
}
// Compute on Arrow, return to R (copies on return: the data is new)
#[miniextendr]
pub fn doubled(x: Float64Array) -> Float64Array {
x.iter().map(|v| v.map(|f| f * 2.0)).collect()
}
// RecordBatch round-trip: primitive columns zero-copy per-column
#[miniextendr]
pub fn passthrough_df(df: RecordBatch) -> RecordBatch {
df
}
Zero-Copy String Vectors
R stores strings as STRSXP (array of CHARSXP pointers). Each CHARSXP is interned,
GC-managed, and has a known LENGTH. Instead of copying into String, borrow directly.
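The borrow-instead-of-copy idea can be sketched with the standard library alone. This is a minimal illustration, not miniextendr's implementation: `Pool` is a hypothetical stand-in for R's CHARSXP storage, and `borrow_all` and `is_borrowed` are names invented for this example.

```rust
use std::borrow::Cow;

/// Hypothetical stand-in for R's interned CHARSXP storage.
struct Pool {
    strings: Vec<String>,
}

impl Pool {
    /// Borrow every element without copying its bytes.
    fn borrow_all(&self) -> Vec<Cow<'_, str>> {
        self.strings.iter().map(|s| Cow::Borrowed(s.as_str())).collect()
    }
}

/// True while the Cow still points at someone else's buffer.
fn is_borrowed(c: &Cow<'_, str>) -> bool {
    matches!(c, Cow::Borrowed(_))
}

fn main() {
    let pool = Pool {
        strings: vec!["alpha".to_string(), "beta".to_string()],
    };
    let mut views = pool.borrow_all();

    // Zero-copy: the view's bytes live at the same address as the pool's.
    assert_eq!(views[0].as_ptr(), pool.strings[0].as_ptr());
    assert!(is_borrowed(&views[0]));

    // Mutation forces a private copy; the pool itself is untouched.
    views[1].to_mut().push('!');
    assert!(!is_borrowed(&views[1]));
    assert_eq!(views[1], "beta!");
    assert_eq!(pool.strings[1], "beta");
}
```

The sketch ties each borrow to `&self`; the `Cow<'static, str>` parameters on this page instead rely on R keeping the argument alive for the duration of the call.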
Cow<'static, str>: scalar
use std::borrow::Cow;
#[miniextendr]
pub fn greet(name: Cow<'static, str>) -> String {
// name is Cow::Borrowed: it points directly into R's CHARSXP data
// No allocation unless you call .to_mut()
format!("Hello, {}!", name)
}
Vec<Cow<'static, str>>: vector, zero-copy per element
#[miniextendr]
pub fn upper_first(words: Vec<Cow<'static, str>>) -> Vec<String> {
// Each element is Cow::Borrowed (zero-copy from R's CHARSXP pool)
words.iter().map(|w| {
let mut s = w.to_string();
if let Some(c) = s.get_mut(0..1) {
c.make_ascii_uppercase();
}
s
}).collect()
}
// NA-aware variant: None for NA_character_
#[miniextendr]
pub fn count_non_na(words: Vec<Option<Cow<'static, str>>>) -> i32 {
words.iter().filter(|w| w.is_some()).count() as i32
}
Cow<'static, [T]>: numeric slices
#[miniextendr]
pub fn sum_cow(x: Cow<'static, [f64]>) -> f64 {
// Cow::Borrowed: x points directly into R's REALSXP data
x.iter().sum()
}
// Round-trip: if x was borrowed from R, IntoR returns the original SEXP
#[miniextendr]
pub fn passthrough_cow(x: Cow<'static, [i32]>) -> Cow<'static, [i32]> {
x // zero-copy: SEXP pointer recovery finds the original R vector
}
ProtectedStrVec vs StrVec: safety vs speed
ProtectedStrVec and StrVec both wrap an R STRSXP and provide zero-copy
&str access to its elements. They differ in GC safety:
| | StrVec | ProtectedStrVec |
|---|---|---|
| Size | 1 word (just the SEXP) | 3 words (SEXP + len + OwnedProtect) |
| Copy | Copy | !Copy (owns protection guard) |
| GC protection | None; caller's responsibility | OwnedProtect keeps STRSXP alive |
| Borrow lifetime | &'static str (a lie) | &'a str tied to &'a self |
| Iterator | StrVecIter (Option<&'static str>) | ProtectedStrVecIter<'a> (Option<&'a str>) |
The key difference is lifetime safety. ProtectedStrVec ties all borrows
to the struct's lifetime. The compiler catches use-after-free:
let dangling: &str;
{
let sv = unsafe { ProtectedStrVec::new(sexp) };
dangling = sv.get_str(0).unwrap(); // borrows &sv
} // sv dropped: SEXP unprotected
// Any later use of dangling is a COMPILER ERROR: `sv` does not live long enough
println!("{dangling}");
With StrVec or Vec<&'static str>, the same code compiles silently and
produces a dangling pointer: the 'static lifetime is a lie (the data is only
valid while R protects the SEXP).
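The lifetime-tying pattern itself needs nothing R-specific, so it can be sketched in plain Rust. `GuardedStrs` below is a hypothetical stand-in that owns its storage, which makes dropping it the analogue of unprotecting the SEXP. Only the `get_str` and `iter` method names echo the API shown on this page; everything else is an assumption made for illustration.

```rust
/// Hypothetical stand-in for a protected string vector. It owns the storage,
/// so dropping it plays the role of unprotecting the SEXP.
struct GuardedStrs {
    data: Vec<Option<String>>, // None models NA_character_
}

impl GuardedStrs {
    fn new(items: &[Option<&str>]) -> Self {
        Self {
            data: items.iter().map(|o| o.map(|s| s.to_owned())).collect(),
        }
    }

    /// The returned &str cannot outlive `&self`.
    fn get_str(&self, i: usize) -> Option<&str> {
        self.data.get(i)?.as_deref()
    }

    /// Iterate over elements; each borrow is tied to `&self`.
    fn iter(&self) -> impl Iterator<Item = Option<&str>> + '_ {
        self.data.iter().map(|o| o.as_deref())
    }
}

/// Copy out whatever must outlive the guard.
fn first_non_na(v: &GuardedStrs) -> String {
    v.iter().flatten().next().unwrap_or("").to_owned()
}

fn main() {
    let kept: String;
    {
        let sv = GuardedStrs::new(&[None, Some("b"), Some("c")]);
        assert_eq!(sv.get_str(0), None);
        assert_eq!(sv.get_str(1), Some("b"));
        // Assigning sv.get_str(1) to an outer `&str` variable and using it
        // after this block would be rejected by the borrow checker.
        kept = first_non_na(&sv);
    } // sv dropped here
    assert_eq!(kept, "b");
}
```

Returning an owned `String` is the escape hatch: anything that must survive the guard has to be copied out before the guard is dropped.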
When to use which:
- StrVec / Vec<&'static str>: inside a #[miniextendr] function where R protects the .Call argument. Lightweight, and fine there: the SEXP won't be GC'd during the call.
- ProtectedStrVec: when you store the string vector beyond the immediate scope, pass it to a closure, or want the compiler to catch lifetime bugs. The OwnedProtect guard keeps the STRSXP alive until the struct is dropped.
Usage examples:
use miniextendr_api::ProtectedStrVec;
use std::collections::HashSet;
#[miniextendr]
pub fn count_unique(strings: ProtectedStrVec) -> i32 {
// Lifetimes are tied to &self, so the compiler enforces GC safety
let unique: HashSet<&str> = strings.iter()
.filter_map(|s| s) // skip NA
.collect();
unique.len() as i32
}
// Can't return &str: ProtectedStrVec is consumed by IntoR, so there's
// nothing to borrow from. Return String or the whole ProtectedStrVec.
#[miniextendr]
pub fn first_non_na(strings: ProtectedStrVec) -> String {
strings.iter()
.find_map(|s| s)
.unwrap_or("")
.to_owned()
}
use miniextendr_api::StrVec;
#[miniextendr]
pub fn has_empty(strings: StrVec) -> bool {
// StrVec is Copy: just a SEXP wrapper. R protects .Call arguments,
// so this is safe within the function body.
strings.iter().any(|opt| opt == Some(""))
}
Arrow Arrays
R → Arrow (already zero-copy for primitives)
use miniextendr_api::optionals::arrow_impl::*;
#[miniextendr]
pub fn arrow_mean(x: Float64Array) -> f64 {
// x.values() points directly into R's REALSXP data (zero-copy)
// NA values are tracked in Arrow's null bitmap, not in the data
let sum: f64 = x.iter().flatten().sum();
let count = x.len() - x.null_count();
sum / count as f64
}
#[miniextendr]
pub fn arrow_filter_positive(x: Int32Array) -> Int32Array {
// Arrow compute: the result is a new array (Rust-allocated)
x.iter()
.map(|v| v.filter(|&i| i > 0))
.collect()
}
Arrow → R (automatic SEXP recovery)
When an Arrow array's data buffer came from R (via sexp_to_arrow_buffer),
IntoR automatically recovers the original SEXP using pointer arithmetic.
No wrapper types needed.
// This is zero-copy BOTH directions:
#[miniextendr]
pub fn identity(x: Float64Array) -> Float64Array {
x // R→Arrow (zero-copy), then Arrow→R (pointer recovery, zero-copy)
}
// This copies on return (new data, not from R):
#[miniextendr]
pub fn squares(x: Float64Array) -> Float64Array {
x.iter().map(|v| v.map(|f| f * f)).collect()
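}
RecordBatch (data.frame)
use std::sync::Arc;
use arrow_array::cast::AsArray;
// Field, DataType, Schema, RecordBatch and Float64Array are assumed to be in scope
// via the arrow_impl::* import shown earlier.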
#[miniextendr]
pub fn df_add_column(df: RecordBatch) -> RecordBatch {
let col0: &Float64Array = df.column(0).as_primitive();
// Compute new column
let new_col: Float64Array = col0.iter()
.map(|v| v.map(|f| f * 2.0))
.collect();
// Build new batch: original columns return to R zero-copy,
// new column copies (it's Rust-allocated)
let mut fields = df.schema().fields().to_vec();
fields.push(Arc::new(Field::new("doubled", DataType::Float64, true)));
let schema = Arc::new(Schema::new(fields));
let mut columns = df.columns().to_vec();
columns.push(Arc::new(new_col));
RecordBatch::try_new(schema, columns).unwrap()
}
alloc_r_backed_buffer: Rust→Arrow→R zero-copy
Allocate an Arrow buffer backed by R memory from the start. Write through the raw SEXP pointer, then wrap in Arrow types. When the array is later converted to R, pointer recovery finds the original SEXP.
use miniextendr_api::optionals::arrow_impl::alloc_r_backed_buffer;
#[miniextendr]
pub fn generate_sequence(n: i32) -> SEXP {
use miniextendr_api::IntoR;
// Clamp negative input: a negative i32 cast to usize would request a huge buffer
let n = n.max(0) as usize;
// Allocate buffer as R REALSXP: data lives in R's heap
let (buffer, sexp) = unsafe { alloc_r_backed_buffer::<f64>(n) };
// Fill through the SEXP's raw pointer (before wrapping in Arrow)
unsafe {
let ptr = miniextendr_api::ffi::REAL(sexp);
for i in 0..n {
*ptr.add(i) = i as f64;
}
}
// Wrap as Arrow array
let values = arrow_buffer::ScalarBuffer::<f64>::from(buffer);
let array = Float64Array::new(values, None);
// IntoR → pointer recovery → returns the same REALSXP (zero-copy)
array.into_sexp()
}
RStringArray: string round-trip tracking
Arrow's StringArray and R's STRSXP have incompatible layouts (contiguous data+offsets
vs per-element CHARSXPs). Automatic pointer recovery can't work for strings.
RStringArray explicitly tracks the source STRSXP.
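To make the layout mismatch concrete, here is a small std-only sketch of the contiguous data+offsets layout. `PackedStrings` is a hypothetical, simplified type (no validity bitmap) rather than Arrow's actual `StringArray`. It shows that packing per-element strings into one buffer has to copy every byte, which is why a data pointer inside the packed buffer cannot lead back to the original elements.

```rust
/// Contiguous layout in the style of Arrow's StringArray: one byte buffer
/// plus n + 1 offsets. Hypothetical simplified type; no validity bitmap.
struct PackedStrings {
    data: Vec<u8>,
    offsets: Vec<i32>,
}

impl PackedStrings {
    /// Packing copies every byte, because the source elements are scattered.
    fn from_elements(elements: &[&str]) -> Self {
        let mut data = Vec::new();
        let mut offsets = vec![0i32];
        for s in elements {
            data.extend_from_slice(s.as_bytes());
            offsets.push(data.len() as i32);
        }
        Self { data, offsets }
    }

    fn len(&self) -> usize {
        self.offsets.len() - 1
    }

    /// Element i is the byte range offsets[i]..offsets[i + 1].
    fn get(&self, i: usize) -> Option<&str> {
        if i >= self.len() {
            return None;
        }
        let start = self.offsets[i] as usize;
        let end = self.offsets[i + 1] as usize;
        std::str::from_utf8(&self.data[start..end]).ok()
    }
}

fn main() {
    // Per-element layout: each &str points at its own storage.
    let elements = ["ab", "", "cde"];
    let packed = PackedStrings::from_elements(&elements);

    assert_eq!(packed.offsets, vec![0, 2, 2, 5]);
    assert_eq!(packed.data, b"abcde".to_vec());
    assert_eq!(packed.get(2), Some("cde"));
    assert_eq!(packed.get(3), None);

    // The packed bytes live in a new allocation, not at the source address.
    assert_ne!(packed.get(0).unwrap().as_ptr(), elements[0].as_ptr());
}
```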
use miniextendr_api::optionals::arrow_impl::RStringArray;
#[miniextendr]
pub fn string_passthrough(x: RStringArray) -> RStringArray {
// x.source is Some(strsxp), so IntoR returns the original STRSXP
x
}
#[miniextendr]
pub fn string_lengths(x: RStringArray) -> Vec<i32> {
// Deref to StringArray: all Arrow APIs work. NA elements map to -1 here.
x.iter().map(|opt| opt.map(|s| s.len() as i32).unwrap_or(-1)).collect()
}
ALTREP for Cow string vectors
Vec<Cow<'static, str>> supports ALTREP with seamless serialization:
use miniextendr_api::IntoRAltrep;
use std::borrow::Cow;
#[miniextendr]
pub fn lazy_strings(prefix: &str, n: i32) -> SEXP {
let strings: Vec<Cow<'static, str>> = (0..n)
.map(|i| Cow::Owned(format!("{}_{}", prefix, i)))
.collect();
strings.into_sexp_altrep()
// R sees a character vector; elements computed on demand via ALTREP Elt
// saveRDS/readRDS works: serializes to STRSXP, deserializes back to Vec<Cow>
}
How It Works
SEXP Pointer Recovery (r_memory module)
R stores vector data at a fixed offset from the SEXP header:
[VECTOR_SEXPREC header (48 bytes on 64-bit)] [data...]
^                                            ^
SEXP                                         DATAPTR_RO(sexp)
All R vector types (REALSXP, INTSXP, RAWSXP, STRSXP, VECSXP) use the same
VECTOR_SEXPREC header. Non-vector types use the larger SEXPREC but don't have
data pointers.
At package init, we measure the offset on a real R vector. Then in IntoR:
candidate_sexp = data_ptr - offset
verify: TYPEOF(candidate) == expected AND LENGTH(candidate) == expected AND DATAPTR_RO(candidate) == data_ptr
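The subtract-then-verify step can be sketched without R. `FakeVector` is a hypothetical header-plus-data block standing in for a VECTOR_SEXPREC, and `data_ptr`, `measure_offset`, `recover` and `round_trip` are names invented for this example. Unlike the real technique, the sketch only ever calls `recover` on pointers that do come from a `FakeVector`, so it stays within defined behavior.

```rust
/// Hypothetical stand-in for an R vector: a header immediately followed by data.
#[repr(C)]
struct FakeVector {
    type_tag: u32,
    length: usize,
    data: [f64; 4],
}

const REAL_TAG: u32 = 14; // REALSXP's type code in R

/// Data pointer derived from the pointer to the whole block.
/// Safety: `v` must point to a live FakeVector.
unsafe fn data_ptr(v: *const FakeVector) -> *const f64 {
    unsafe { std::ptr::addr_of!((*v).data) as *const f64 }
}

/// "Package init": measure the header-to-data distance on a real vector.
/// Safety: `sample` must point to a live FakeVector.
unsafe fn measure_offset(sample: *const FakeVector) -> usize {
    unsafe { data_ptr(sample) as usize - sample as usize }
}

/// Step back by the measured offset, then apply the triple check.
/// Safety: `data` must have been obtained from `data_ptr` on a live FakeVector.
unsafe fn recover(
    data: *const f64,
    offset: usize,
    expected_len: usize,
) -> Option<*const FakeVector> {
    unsafe {
        let candidate = (data as *const u8).sub(offset) as *const FakeVector;
        let ok = (*candidate).type_tag == REAL_TAG
            && (*candidate).length == expected_len
            && data_ptr(candidate) == data;
        ok.then_some(candidate)
    }
}

/// Returns true when recovery finds the original block.
fn round_trip(expected_len: usize) -> bool {
    let v = Box::new(FakeVector {
        type_tag: REAL_TAG,
        length: 4,
        data: [0.0, 1.0, 2.0, 3.0],
    });
    let base: *const FakeVector = &*v;
    unsafe {
        let offset = measure_offset(base);
        let data = data_ptr(base);
        recover(data, offset, expected_len) == Some(base)
    }
}

fn main() {
    assert!(round_trip(4)); // type, length and data pointer all match
    assert!(!round_trip(5)); // length mismatch: candidate rejected
}
```

Measuring the offset at runtime, rather than hard-coding it, mirrors the package-init step described above.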
Safety consideration: For Rust-allocated buffers, data_ptr - offset points to
arbitrary heap memory. The 4-byte type-tag read at that address is technically undefined
behavior in Rust's abstract model (the pointer wasn't derived from an R allocation).
In practice, this is safe: the address is in mapped heap memory and the read is
immediately validated by the triple check (type + length + DATAPTR_RO round-trip),
which makes false positives impossible. ALTREP vectors also fail safely (the
DATAPTR_RO round-trip check catches them, since ALTREP data isn't at a fixed offset).
String conversion (charsxp_to_str)
charsxp_to_str() uses R_CHAR + LENGTH (O(1)) with from_utf8_unchecked.
No per-string UTF-8 validation: miniextendr_assert_utf8_locale() at package init
guarantees all CHARSXPs in the session are valid UTF-8. charsxp_to_cow() wraps
the result in Cow::Borrowed (always borrowed, never owned).
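The "check once, then skip validation" pattern can be illustrated in plain Rust. This is an analogy under stated assumptions, not the library's code: `ValidatedBytes` is a hypothetical type that validates its bytes at construction, whereas miniextendr relies on a session-wide locale check at package init.

```rust
use std::borrow::Cow;
use std::str::Utf8Error;

/// Hypothetical wrapper: bytes validated as UTF-8 exactly once, at construction.
struct ValidatedBytes {
    bytes: Vec<u8>,
}

impl ValidatedBytes {
    /// The single up-front check (analogous to the check at package init).
    fn new(bytes: Vec<u8>) -> Result<Self, Utf8Error> {
        std::str::from_utf8(&bytes)?;
        Ok(Self { bytes })
    }

    /// Later accesses skip validation entirely.
    fn as_str(&self) -> &str {
        // SAFETY: `new` validated the bytes and they are never mutated afterwards.
        unsafe { std::str::from_utf8_unchecked(&self.bytes) }
    }

    /// Always borrowed, never owned.
    fn as_cow(&self) -> Cow<'_, str> {
        Cow::Borrowed(self.as_str())
    }
}

fn main() {
    let ok = ValidatedBytes::new("héllo".as_bytes().to_vec()).unwrap();
    assert_eq!(ok.as_str(), "héllo");
    assert!(matches!(ok.as_cow(), Cow::Borrowed(_)));

    // Invalid UTF-8 is rejected at the single validation point.
    assert!(ValidatedBytes::new(vec![0xff, 0xfe]).is_err());
}
```

The unchecked call is only sound because the invariant is established before any `&str` is handed out, which is the role the init-time check plays in the text above.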
Type Decision Tree
Need strings from R?
├── Scalar → Cow<'static, str> (zero-copy)
├── Vector, need ownership → Vec<String> (copies, lossy NA→"")
├── Vector, read-only → Vec<Cow<'static, str>> (zero-copy per element)
├── Vector, NA-aware → Vec<Option<Cow<'static, str>>>
├── View with GC safety → ProtectedStrVec
└── Lightweight view → StrVec (Copy, caller manages GC)
Need numerics from R?
├── As Rust slice → &[f64] / &[i32] (zero-copy, 'static lifetime)
├── Copy-on-write → Cow<'static, [f64]> (zero-copy, copies on .to_mut())
├── As Arrow array → Float64Array (zero-copy both directions)
└── Owned copy → Vec<f64> (copies)
Need data frames?
├── As Arrow → RecordBatch (primitive cols zero-copy both ways)
└── As Arrow (string cols too) → use RStringArray per column