Module r_memory

Utilities for recovering R SEXPs from raw data pointers.

R stores vector data at a fixed offset after the SEXPREC header. Given a pointer to the start of that data region, we can subtract the header size to recover the SEXP, then verify it by checking the type tag, the ALTREP bit, and vecsxp.length without calling any R function that can dispatch or raise an error.

This is used by:

  • Arrow integration: zero-copy IntoR when the buffer is R-backed
  • Cow<[T]> IntoR round-trip
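The pointer arithmetic described above can be sketched without R at all. The following is a minimal illustration, not the module's implementation: `candidate_header` and `fake_vector` are hypothetical names, and the fake buffer only stands in for an R allocation. It uses `wrapping_sub` because, for a buffer that is not R-backed, the result may land outside the original allocation, which plain `sub` does not permit.

```rust
/// Step back from a data pointer to the address where an R header would
/// start if the buffer were R-backed. `offset` is the measured header size.
/// Hypothetical helper: the result is only a candidate and must be verified
/// before it is treated as a SEXP.
fn candidate_header(data_ptr: *const u8, offset: usize) -> *const u8 {
    // wrapping_sub never causes UB by itself, even if the result leaves
    // the allocation that data_ptr came from.
    data_ptr.wrapping_sub(offset)
}

/// Build a fake "header + payload" buffer so the arithmetic can be checked
/// without R. Returns the owning Vec plus the header and data pointers.
fn fake_vector(offset: usize, payload_len: usize) -> (Vec<u8>, *const u8, *const u8) {
    let buf = vec![0u8; offset + payload_len];
    let header = buf.as_ptr();
    let data = header.wrapping_add(offset);
    (buf, header, data)
}
```

With a 48-byte offset, `candidate_header(data, 48)` on the fake buffer returns the same address as `header`.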

§Initialization

init_sexprec_data_offset must be called during package init (before any recovery attempts). It measures the offset on a real R vector, so it works across R versions and platforms.
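One way to picture the measure-once pattern is a `OnceLock` that stores the difference between a vector's SEXP address and its data address. This is a sketch under assumptions: `DATA_OFFSET`, `init_offset_from`, and `data_offset` are hypothetical names, and the two addresses are passed in as plain integers, whereas the real initializer would obtain them from a freshly allocated R vector.

```rust
use std::sync::OnceLock;

// Hypothetical stand-in for the module's offset static.
static DATA_OFFSET: OnceLock<usize> = OnceLock::new();

/// Record the offset between a vector's header address and its data address.
/// Only the first call stores a value; later calls return the stored one.
fn init_offset_from(sexp_addr: usize, data_addr: usize) -> usize {
    *DATA_OFFSET.get_or_init(|| data_addr - sexp_addr)
}

/// Read the measured offset, or None if initialization has not run yet.
fn data_offset() -> Option<usize> {
    DATA_OFFSET.get().copied()
}
```

Returning `Option` from the getter makes a missing initialization visible to the caller, so a recovery attempt can decline instead of guessing an offset.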

§R’s VECTOR_SEXPREC layout

// From R's Defn.h:
typedef struct VECTOR_SEXPREC {
    SEXPREC_HEADER;           // sxpinfo(8) + attrib(8) + gengc_next(8) + gengc_prev(8)
    struct vecsxp_struct {    // length(8) + truelength(8)
        R_xlen_t length;
        R_xlen_t truelength;
    } vecsxp;
} VECTOR_SEXPREC;

typedef union { VECTOR_SEXPREC s; double align; } SEXPREC_ALIGN;
#define STDVEC_DATAPTR(x) ((void *)(((SEXPREC_ALIGN *)(x)) + 1))

On 64-bit: sizeof(VECTOR_SEXPREC) = 48 bytes, sizeof(SEXPREC_ALIGN) = 48. Data starts at sexp + 48. All vector types (REALSXP, INTSXP, RAWSXP, STRSXP, VECSXP) use the same VECTOR_SEXPREC header.
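The 48-byte figure can be checked with a Rust mirror of the C layout. This is only a sanity check of the arithmetic on a 64-bit target, not how the module obtains the offset. The type names are hypothetical, and the field types (a 64-bit `sxpinfo`, pointer-sized links, `isize` for `R_xlen_t`) are assumptions matching the sizes noted in the comments above.

```rust
use std::ffi::c_void;
use std::mem::size_of;

/// Hypothetical mirror of VECTOR_SEXPREC for a 64-bit build.
#[repr(C)]
#[derive(Clone, Copy)]
struct VectorSexprec {
    sxpinfo: u64,                 // packed bitfields, 8 bytes
    attrib: *mut c_void,          // 8 bytes
    gengc_next_node: *mut c_void, // 8 bytes
    gengc_prev_node: *mut c_void, // 8 bytes
    length: isize,                // R_xlen_t, 8 bytes
    truelength: isize,            // R_xlen_t, 8 bytes
}

/// Mirror of SEXPREC_ALIGN: the union forces double alignment.
#[repr(C)]
union SexprecAlign {
    s: VectorSexprec,
    align: f64,
}

/// STDVEC_DATAPTR(x) is ((SEXPREC_ALIGN *)x) + 1, i.e. one union past x.
fn mirrored_data_offset() -> usize {
    size_of::<SexprecAlign>()
}
```

On a 64-bit target `mirrored_data_offset()` evaluates to 48, matching the layout above.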

§Why not #[repr(C)] mirror struct?

A Rust #[repr(C)] struct mirroring VECTOR_SEXPREC would give a compile-time size_of instead of runtime measurement. However:

  • R’s layout can vary by version and compile options (32-bit, padding)
  • The runtime measurement is one allocation at init — negligible
  • A repr(C) mirror struct doesn’t help with the real safety issue: reading from a speculative pointer. addr_of! computes field addresses without dereferencing, but we still need to read() the type tag — and that read is from potentially invalid memory for non-R pointers.

The verification (type tag + ALTREP check + XLENGTH) prevents false positives. Only the type tag requires a raw sxpinfo read; ALTREP and XLENGTH use R’s public C API.

§Safety of speculative reads

The candidate pointer is computed from pointer arithmetic on the input data_ptr. For Rust-owned buffers (not R-backed), this points into arbitrary heap memory. We must be careful about which R functions we call on it:

  • ALTREP(x) — safe: just reads x->sxpinfo.alt (a single bit).
  • XLENGTH(x) on non-ALTREP — safe: reads STDVEC_LENGTH (struct field, no dispatch, no error).
  • LENGTH(x) — UNSAFE: wraps XLENGTH with > INT_MAX check that calls R_BadLongVector() (throws R error on garbage with large length).
  • DATAPTR_RO(x) — UNSAFE on ALTREP: dispatches through class vtable (bogus function pointers on garbage). On non-ALTREP: STDVEC_DATAPTR which also checks for long vectors.
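The difference between XLENGTH and LENGTH on garbage memory can be modelled in plain Rust. The functions below are illustrative models with hypothetical names, not bindings to R: the `Err` branch stands in for the R error that `R_BadLongVector()` would raise.

```rust
// Largest length representable as a C int, as used by R's short-vector check.
const R_SHORT_LEN_MAX: i64 = i32::MAX as i64;

/// Models XLENGTH on a non-ALTREP vector: a plain field read that cannot fail,
/// whatever bytes happen to be stored there.
fn xlength_model(stored_length: i64) -> i64 {
    stored_length
}

/// Models LENGTH: the same read plus a range check. In R the failing branch
/// raises an R error; here it is represented as Err.
fn length_model(stored_length: i64) -> Result<i32, String> {
    if stored_length > R_SHORT_LEN_MAX {
        Err(format!("long vector: length {stored_length}"))
    } else {
        Ok(stored_length as i32)
    }
}
```

A speculative header read from arbitrary heap memory can easily contain a huge value in the length slot. The XLENGTH model just returns that value for the caller to reject, while the LENGTH model takes the error path.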

The verification sequence is:

  1. Raw sxpinfo type tag (bits 0-4) — no public TYPEOF that’s safe on garbage
  2. ALTREP(candidate) — gates step 3 (rejects ALTREP before XLENGTH dispatch)
  3. XLENGTH(candidate) — safe for non-ALTREP (STDVEC_LENGTH, no errors)
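The ordering of the three checks can be sketched on a mock header. This is a simplified model, not the module's code: `MockHeader` and `verify` are hypothetical, the ALTREP flag is assumed to sit at bit 7 of `sxpinfo`, and comparing the stored length against an expected length is an assumption about how the length is used. The real sequence reads the type tag through a raw pointer and calls `ALTREP` and `XLENGTH` from R's C API.

```rust
const TYPE_MASK: u64 = 0x1F; // bits 0-4 hold the SEXPTYPE tag
const ALT_BIT: u64 = 1 << 7; // assumed position of the ALTREP flag
const REALSXP: u64 = 14;     // R's type code for double vectors

/// Hypothetical stand-in for the header fields the checks look at.
#[derive(Clone, Copy)]
struct MockHeader {
    sxpinfo: u64,
    length: isize,
}

/// Apply the checks in order, stopping at the first failure.
fn verify(h: MockHeader, want_type: u64, want_len: isize) -> bool {
    // 1. Type tag from the raw sxpinfo bits.
    if (h.sxpinfo & TYPE_MASK) != want_type {
        return false;
    }
    // 2. Reject ALTREP before looking at the length at all.
    if (h.sxpinfo & ALT_BIT) != 0 {
        return false;
    }
    // 3. Length field must match the buffer we started from.
    h.length == want_len
}
```

Putting the ALTREP test second mirrors the gating described above: the length is only consulted once the candidate is known not to claim ALTREP dispatch.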

Statics§

SEXPREC_DATA_OFFSET 🔒
Offset in bytes from SEXP address to data pointer for standard (non-ALTREP) vectors.

Functions§

init_sexprec_data_offset
Compute and store the SEXPREC data offset by measuring a real R vector.
sexprec_data_offset
Get the computed SEXPREC data offset.
try_recover_r_sexp
Try to recover the source R SEXP from a data pointer.