Utilities for recovering R SEXPs from raw data pointers.
R stores vector data at a fixed offset after the SEXPREC header. Given a pointer into that data region, we can subtract the header size to recover the SEXP — then verify it by reading raw memory fields (type tag, ALTREP bit, and vecsxp.length) without calling any R functions.
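The subtraction step can be sketched on plain addresses, with no R involved. This is only an illustration: `candidate_sexp_addr` is a hypothetical helper, not this module's API, and the alignment pre-filter is an assumption (it relies on `SEXPREC_ALIGN` containing a `double`). The result is merely a candidate and still has to pass the verification described under "Safety of speculative reads".

```rust
/// Hypothetical helper (not part of this module): compute the address where
/// the SEXP header would start if `data_addr` were the first data byte of a
/// standard R vector. `offset` is the measured header size.
fn candidate_sexp_addr(data_addr: usize, offset: usize) -> Option<usize> {
    // checked_sub rejects addresses smaller than the header instead of wrapping.
    let candidate = data_addr.checked_sub(offset)?;
    // Assumption: a real SEXP is non-null and aligned at least like a double.
    if candidate == 0 || candidate % std::mem::align_of::<f64>() != 0 {
        return None;
    }
    Some(candidate)
}
```

Working on `usize` addresses keeps the arithmetic free of dereferences; nothing is read from memory until verification starts.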
This is used by:
- Arrow integration: zero-copy IntoR when the buffer is R-backed
- Cow<[T]> IntoR round-trip
§Initialization
init_sexprec_data_offset must be called during package init (before any
recovery attempts). It measures the offset on a real R vector, so it works
across R versions and platforms.
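A minimal sketch of the measure-once pattern, assuming the offset is kept in an atomic and that 0 means "not initialized". `DATA_OFFSET`, `store_measured_offset`, and `measured_offset` are hypothetical names for illustration; the real init function would obtain the two addresses from a freshly allocated R vector, which is omitted here so the sketch runs without R.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical stand-in for the module's private offset static.
/// Assumption: 0 means "not yet initialized".
static DATA_OFFSET: AtomicUsize = AtomicUsize::new(0);

/// Record the offset from the address of a vector's header and the address
/// of its first data byte, and return it.
fn store_measured_offset(sexp_addr: usize, data_addr: usize) -> usize {
    let offset = data_addr
        .checked_sub(sexp_addr)
        .expect("data pointer must lie after the header");
    DATA_OFFSET.store(offset, Ordering::Release);
    offset
}

/// Read the stored offset, or None if initialization has not happened yet.
fn measured_offset() -> Option<usize> {
    match DATA_OFFSET.load(Ordering::Acquire) {
        0 => None,
        n => Some(n),
    }
}
```

Returning `None` before initialization lets a recovery attempt decline cleanly instead of subtracting a bogus offset.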
§R’s VECTOR_SEXPREC layout
// From R's Defn.h:
typedef struct VECTOR_SEXPREC {
    SEXPREC_HEADER;           // sxpinfo(8) + attrib(8) + gengc_next(8) + gengc_prev(8)
    struct vecsxp_struct {    // length(8) + truelength(8)
        R_xlen_t length;
        R_xlen_t truelength;
    } vecsxp;
} VECTOR_SEXPREC;

typedef union { VECTOR_SEXPREC s; double align; } SEXPREC_ALIGN;
#define STDVEC_DATAPTR(x) ((void *)(((SEXPREC_ALIGN *)(x)) + 1))

On 64-bit: sizeof(VECTOR_SEXPREC) = 48 bytes, sizeof(SEXPREC_ALIGN) = 48.
Data starts at sexp + 48. All vector types (REALSXP, INTSXP, RAWSXP,
STRSXP, VECSXP) use the same VECTOR_SEXPREC header.
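The 48-byte figure can be checked with a throwaway Rust mirror of the C layout. This is only an arithmetic check under stated assumptions: the `Mirror*` types are hypothetical, `sxpinfo` is modeled as a single 64-bit word, and `R_xlen_t` is modeled as `isize`. The module itself measures the offset at runtime instead.

```rust
use std::mem::{align_of, size_of};

// Hypothetical mirror types, used only to check the arithmetic above.
#[allow(dead_code)]
#[repr(C)]
#[derive(Clone, Copy)]
struct MirrorVectorSexprec {
    sxpinfo: u64,          // 8 bytes (assumed: one 64-bit word)
    attrib: *const u8,     // pointer
    gengc_next: *const u8, // pointer
    gengc_prev: *const u8, // pointer
    length: isize,         // stands in for R_xlen_t
    truelength: isize,     // stands in for R_xlen_t
}

// Mirrors SEXPREC_ALIGN: the double forces at least double alignment.
#[allow(dead_code)]
#[repr(C)]
union MirrorAlign {
    s: MirrorVectorSexprec,
    align: f64,
}

/// Returns (size, alignment) of the mirrored SEXPREC_ALIGN.
/// The size is where STDVEC_DATAPTR would place the data.
fn mirror_layout() -> (usize, usize) {
    (size_of::<MirrorAlign>(), align_of::<MirrorAlign>())
}
```

On a 64-bit target the size comes out as 8 + 3 * 8 + 2 * 8 = 48, which matches the comment in the C definition.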
§Why not #[repr(C)] mirror struct?
A Rust #[repr(C)] struct mirroring VECTOR_SEXPREC would give a
compile-time size_of instead of runtime measurement. However:
- R’s layout can vary by version and compile options (32-bit, padding)
- The runtime measurement is one allocation at init — negligible
- A repr(C) mirror struct doesn’t help with the real safety issue: reading from a speculative pointer. addr_of! computes field addresses without dereferencing, but we still need to read() the type tag, and that read is from potentially invalid memory for non-R pointers.
The verification (type tag + ALTREP check + XLENGTH) prevents false positives. Only the type tag requires a raw sxpinfo read; ALTREP and XLENGTH use R’s public C API.
§Safety of speculative reads
The candidate pointer is computed from pointer arithmetic on the input data_ptr. For Rust-owned buffers (not R-backed), this points into arbitrary heap memory. We must be careful about which R functions we call on it:
- ALTREP(x): safe. Just reads x->sxpinfo.alt (a single bit).
- XLENGTH(x) on non-ALTREP: safe. Reads STDVEC_LENGTH (struct field, no dispatch, no error).
- LENGTH(x): UNSAFE. Wraps XLENGTH with a > INT_MAX check that calls R_BadLongVector() (throws an R error on garbage with a large length).
- DATAPTR_RO(x): UNSAFE on ALTREP. Dispatches through the class vtable (bogus function pointers on garbage). On non-ALTREP it goes through STDVEC_DATAPTR, which also checks for long vectors.
The verification sequence is:
1. Raw sxpinfo type tag (bits 0-4): there is no public TYPEOF that’s safe on garbage
2. ALTREP(candidate): gates step 3 (rejects ALTREP before XLENGTH dispatch)
3. XLENGTH(candidate): safe for non-ALTREP (STDVEC_LENGTH, no errors)
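The ordering of the three checks can be sketched without R by passing the ALTREP and XLENGTH lookups in as closures. Everything here is illustrative: `type_tag` and `verify_candidate` are hypothetical names, the `sxpinfo` word is assumed to carry the type in its low five bits as stated above, and comparing the length against the caller's expected length is an assumption about what the final check does.

```rust
// SEXPTYPE codes from R's headers for the vector types listed earlier.
#[allow(dead_code)]
const INTSXP: u8 = 13;
#[allow(dead_code)]
const REALSXP: u8 = 14;
#[allow(dead_code)]
const STRSXP: u8 = 16;
#[allow(dead_code)]
const VECSXP: u8 = 19;
#[allow(dead_code)]
const RAWSXP: u8 = 24;

/// Step 1: extract the type tag from a raw sxpinfo word (bits 0-4).
fn type_tag(sxpinfo: u64) -> u8 {
    (sxpinfo & 0x1F) as u8
}

/// Hypothetical verifier. `altrep` and `xlength` stand in for the ALTREP and
/// XLENGTH calls; each is invoked only if every earlier step has passed.
fn verify_candidate(
    sxpinfo: u64,
    expected_tag: u8,
    altrep: impl FnOnce() -> bool,
    xlength: impl FnOnce() -> isize,
    expected_len: usize,
) -> bool {
    // Step 1: the type tag must match the element type of the buffer.
    if type_tag(sxpinfo) != expected_tag {
        return false;
    }
    // Step 2: reject ALTREP so step 3 never goes through ALTREP dispatch.
    if altrep() {
        return false;
    }
    // Step 3 (assumed): the stored length must equal the buffer's length.
    let len = xlength();
    len >= 0 && len as usize == expected_len
}
```

Taking the later checks as closures makes the gating visible: a mismatch in step 1 or a set ALTREP bit returns before the length lookup ever runs.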
Statics§
- SEXPREC_DATA_OFFSET 🔒 - Offset in bytes from SEXP address to data pointer for standard (non-ALTREP) vectors.
Functions§
- init_sexprec_data_offset ⚠ - Compute and store the SEXPREC data offset by measuring a real R vector.
- sexprec_data_offset - Get the computed SEXPREC data offset.
- try_recover_r_sexp ⚠ - Try to recover the source R SEXP from a data pointer.