miniextendr provides comprehensive support for converting between Rust types and R data frames, with three complementary approaches offering different trade-offs between ergonomics and flexibility.

πŸ”—Overview

ApproachBest ForCode GenerationFlexibility
#[derive(DataFrameRow)]Type-safe, ergonomic APIsβœ… Generates DataFrame type⭐⭐⭐ Easy
DataFrame<T>Generic, reusable code❌ No codegen⭐⭐ Moderate
impl IntoDataFrameFull control, complex cases❌ Manual impl⭐ Advanced

πŸ”—Core Traits

πŸ”—IntoDataFrame

The foundational trait for converting Rust types to R data frames.

pub trait IntoDataFrame {
    fn into_data_frame(self) -> List;
}

Key Points:

  • Consumes self (owning conversion)
  • Returns a List with data.frame attributes
  • Used by all other approaches under the hood

Related:

  • AsDataFrame (in as_coerce module) - S3 coercion methods for as.data.frame() on ExternalPtr types
  • IntoDataFrame (this trait) - Direct conversion for return values

Use #[derive(DataFrameRow)] for the most ergonomic experience. The macro generates a companion DataFrame type and all necessary conversions.

πŸ”—Basic Usage

use miniextendr_api::{miniextendr, DataFrameRow, IntoList};

#[derive(Clone, IntoList, DataFrameRow)]
struct Measurement {
    time: f64,
    value: f64,
    sensor: String,
}

// Auto-generates:
// - struct MeasurementDataFrame { time: Vec<f64>, value: Vec<f64>, sensor: Vec<String> }
// - impl IntoDataFrame for MeasurementDataFrame
// - impl From<Vec<Measurement>> for MeasurementDataFrame
// - impl IntoIterator for MeasurementDataFrame -> Measurement
// - Measurement::to_dataframe() and from_dataframe() methods

#[miniextendr]
fn get_measurements() -> MeasurementDataFrame {
    let rows = vec![
        Measurement { time: 1.0, value: 10.0, sensor: "A".into() },
        Measurement { time: 2.0, value: 20.0, sensor: "B".into() },
        Measurement { time: 3.0, value: 30.0, sensor: "C".into() },
    ];

    Measurement::to_dataframe(rows)  // or: rows.into()
}

πŸ”—Heterogeneous Types

The derive macro fully supports different types in different fields:

#[derive(Clone, IntoList, DataFrameRow)]
struct Person {
    name: String,      // character in R
    age: i32,          // integer in R
    height: f64,       // numeric in R
    is_student: bool,  // logical in R
}

// Each field maintains its distinct type throughout conversion

πŸ”—Collection Expansion

Fixed-size arrays [T; N] are automatically expanded into N suffixed columns. Use #[dataframe(expand)] or #[dataframe(unnest)] explicitly if desired, though arrays expand by default.

#[derive(Clone, DataFrameRow)]
struct Point3D {
    label: String,
    coords: [f64; 3],  // β†’ coords_1, coords_2, coords_3
}

// Generates:
// struct Point3DDataFrame {
//     label: Vec<String>,
//     coords_1: Vec<f64>,
//     coords_2: Vec<f64>,
//     coords_3: Vec<f64>,
// }

For Vec<T>, Box<[T]>, and &[T], two expansion modes are available:

Fixed width (width = N): Expands into exactly N columns at compile time.

#[derive(Clone, DataFrameRow)]
struct Scored {
    name: String,
    #[dataframe(width = 3)]
    scores: Vec<f64>,  // β†’ scores_1, scores_2, scores_3 as Option<f64>
}
  • Shorter vecs: padded with NA
  • Longer vecs: truncated to N (extra elements silently dropped)

Auto-expand (expand or unnest): Column count determined at runtime from the maximum length across all rows.

#[derive(Clone, DataFrameRow)]
struct Measured {
    name: String,
    #[dataframe(expand)]       // or: #[dataframe(unnest)]
    readings: Vec<f64>,        // β†’ readings_1, readings_2, ... (as many as needed)
}
  • Shorter vecs: padded with NA
  • All elements preserved (no truncation)
  • If all vecs are empty: no expansion columns produced

Box<[T]> and &[T] work identically to Vec<T> for all expansion modes β€” they share the same .get(), .len(), and indexing behavior.

Note: Using &[T] introduces a lifetime parameter on both the row struct and the generated companion struct (e.g., FooDataFrame<'a>). This is zero-cost: &[T] is Copy (just a fat pointer), so pushing into the companion struct copies only the pointer, not the data.

Without width or expand/unnest, Vec<T>, Box<[T]>, and &[T] stay as opaque single columns (list columns in R).

πŸ”—Field-Level Attributes

#[derive(Clone, DataFrameRow)]
struct Row {
    #[dataframe(skip)]           // Omit from DataFrame
    internal_id: u64,

    #[dataframe(rename = "lbl")] // Custom column name
    label: String,

    #[dataframe(as_list)]        // Suppress expansion (keep as single column)
    coords: [f64; 3],

    #[dataframe(width = 5)]      // Expand Vec to 5 columns
    scores: Vec<f64>,
}
AttributeEffectValid On
skipOmit field from DataFrameAny field
rename = "name"Custom column nameAny field
as_listSuppress expansion[T; N], Vec<T>, Box<[T]>, &[T]
expandExplicit expansion (default for [T; N]; auto-expand for Vec<T>/Box<[T]>/&[T])[T; N], Vec<T>, Box<[T]>, &[T]
unnestAlias for expand[T; N], Vec<T>, Box<[T]>, &[T]
width = NPin expansion width (truncates longer vecs/slices)Vec<T>, Box<[T]>, &[T]

Conflicts: as_list + expand/unnest, as_list + width are compile errors.

Note on round-tripping: Structs with expanded fields don’t generate IntoIterator or from_dataframe(), since the companion struct shape differs from the original. Use to_dataframe() only.

πŸ”—Other Collection Types

Non-expanded collection types (opaque columns) work with manual IntoList:

use std::collections::{HashSet, BTreeSet};

#[derive(Clone, DataFrameRow)]
struct ComplexRow {
    measurements: Vec<f64>,      // opaque list column
    data: Box<[i32]>,            // opaque list column
    tags: HashSet<String>,       // opaque list column
    categories: BTreeSet<i32>,   // opaque list column
}

Note: These need manual IntoList implementations (see rpkg/src/rust/dataframe_collections_test.rs).

πŸ”—Enum Align Mode

Enums derive a companion DataFrame where each variant’s fields contribute to a unified schema. Fields absent in a variant are filled with None (β†’ NA in R):

#[derive(Clone, DataFrameRow)]
#[dataframe(tag = "_type")]
enum Event {
    Click { id: i64, x: f64, y: f64 },
    Impression { id: i64, slot: String },
    Error { id: i64, code: i32, message: String },
}

// In R:
//   _type       id    x     y   slot        code  message
//   Click       1     1.5   2.5 NA          NA    NA
//   Impression  2     NA    NA  top_banner  NA    NA
//   Error       3     NA    NA  NA          404   not found

Key points:

  • All enum columns are Vec<Option<T>> (absent fields get None)
  • tag = "col" adds a variant discriminator column
  • align is implicit for enums (accepted but not required)

πŸ”—Type Conflicts Across Variants

If two variants use the same field name with different types, the derive fails by default. Use conflicts = "string" to coerce all conflicting columns to String:

#[derive(Clone, DataFrameRow)]
#[dataframe(conflicts = "string")]
enum Mixed {
    A { value: f64 },
    B { value: String },  // value column becomes String for all variants
}

πŸ”—Enum Field Attributes

All field-level attributes (skip, rename, as_list, width) work in enum variants too:

#[derive(Clone, DataFrameRow)]
#[dataframe(tag = "_type")]
enum Observation {
    Point { id: i32, coords: [f64; 2] },          // coords β†’ coords_1, coords_2
    Measurement { id: i32, #[dataframe(width = 3)] readings: Vec<f64> },
}

πŸ”—With Serde (when serde feature enabled)

use serde::Serialize;

#[derive(Serialize, DataFrameRow)]  // Serialize implies IntoList!
struct Reading {
    timestamp: f64,
    temperature: f64,
    humidity: f64,
}

#[miniextendr]
fn get_readings() -> ReadingDataFrame {
    Reading::to_dataframe(vec![
        Reading { timestamp: 1.0, temperature: 20.5, humidity: 65.0 },
        Reading { timestamp: 2.0, temperature: 21.0, humidity: 63.0 },
    ])
}

πŸ”—Generated Methods

The derive macro adds these methods to your row type:

impl Measurement {
    /// Name of the generated companion DataFrame type
    pub const DATAFRAME_TYPE_NAME: &'static str = "MeasurementDataFrame";

    /// Transpose rows to columns
    pub fn to_dataframe(rows: Vec<Self>) -> MeasurementDataFrame;

    /// Transpose columns back to rows
    pub fn from_dataframe(df: MeasurementDataFrame) -> Vec<Self>;
}

πŸ”—Iterating Over Rows

The generated DataFrame type implements IntoIterator:

let df = get_measurements();

// Iterate over rows
for measurement in df {
    println!("Time: {}, Value: {}", measurement.time, measurement.value);
}

// Or collect back to Vec
let rows: Vec<Measurement> = df.into_iter().collect();

πŸ”—Requirements

The row type must implement IntoList:

  • Automatically via #[derive(IntoList)]
  • Via #[derive(Serialize)] when serde feature is enabled
  • Via manual implementation using List::from_raw_pairs() (for heterogeneous fields)

πŸ”—Container Attributes

#[derive(DataFrameRow)]
#[dataframe(
    name = "Measurements",     // Custom DataFrame name (default: {StructName}DataFrame)
    tag = "_type",             // Add variant discriminator column (enums)
    parallel,                  // Enable rayon parallel fill (requires `rayon` feature)
    conflicts = "string",      // Coerce type conflicts to String (enums)
)]
struct Measurement { /* ... */ }

πŸ”—Parallel Fill with Rayon

Every DataFrameRow companion type gets explicit sequential and parallel constructors. The parallel path requires the rayon feature.

# Cargo.toml
[dependencies]
miniextendr-api = { version = "0.1", features = ["rayon"] }
#[derive(Clone, IntoList, DataFrameRow)]
pub struct Point {
    pub x: f64,
    pub y: f64,
    pub label: String,
}

#[miniextendr]
pub fn big_points() -> PointDataFrame {
    let points: Vec<Point> = (0..100_000)
        .map(|i| Point { x: i as f64, y: (i * 2) as f64, label: format!("p{}", i) })
        .collect();
    // Explicit parallel β€” always uses rayon, no threshold check
    PointDataFrame::from_rows_par(points)
}

Generated methods on every companion type:

  • DfType::from_rows(rows) β€” sequential push-based fill (always available)
  • DfType::from_rows_par(rows) β€” parallel scatter-write via ColumnWriter (#[cfg(feature = "rayon")])
  • From<Vec<Row>> / RowType::to_dataframe(rows) β€” sequential (unchanged)

How from_rows_par works:

  • Pre-allocates column vectors to exact size, then fills indices in parallel
  • Uses rayon::par_iter() with ColumnWriter<T> for safe concurrent writes to disjoint indices
  • No threshold β€” the caller explicitly opts in to parallelism

Enum support: Parallel fill also works with enum DataFrameRow types:

#[derive(Clone, DataFrameRow)]
#[dataframe(tag = "_kind")]
pub enum Event {
    Click { id: i32, x: f64, y: f64 },
    Impression { id: i32, slot: String },
}

// Use the parallel path:
let df = EventDataFrame::from_rows_par(events);

Performance: Parallel fill is most beneficial for:

  • Large row counts (10k+)
  • Structs with many fields (wide data frames)
  • Expensive Clone/conversion per field

For small data frames, use from_rows to avoid rayon overhead.

πŸ”—Columnar Serialization via Serde

When you have types that already implement serde::Serialize, you can convert them directly to R data frames without deriving DataFrameRow:

use serde::Serialize;
use miniextendr_api::serde::ColumnarDataFrame;

#[derive(Serialize)]
struct LogEntry {
    timestamp: f64,
    level: String,
    message: String,
}

#[miniextendr]
fn get_logs() -> miniextendr_api::ffi::SEXP {
    let logs = vec![
        LogEntry { timestamp: 1.0, level: "INFO".into(), message: "started".into() },
        LogEntry { timestamp: 2.0, level: "ERROR".into(), message: "failed".into() },
    ];
    ColumnarDataFrame::from_rows(&logs).expect("serialization failed")
}

Requires the serde feature. Column types are inferred from serde field types:

Rust TypeR Column
boollogical
i8/i16/i32integer
i64/u64/f32/f64numeric
String/&strcharacter
Option<T>Same type with NA for None

This is useful when you already have serde-serializable types and don’t want to add IntoList + DataFrameRow derives. For new types, prefer #[derive(DataFrameRow)] which gives you a typed companion type and better ergonomics.


πŸ”—Approach 2: DataFrame<T>

Generic type for transposing row-oriented data. Works with any T: IntoList.

πŸ”—With IntoList Types

#[derive(IntoList)]
struct Point {
    x: f64,
    y: f64,
}

#[miniextendr]
fn points() -> DataFrame<Point> {
    DataFrame::from_rows(vec![
        Point { x: 1.0, y: 2.0 },
        Point { x: 3.0, y: 4.0 },
    ])
}

πŸ”—With Serialize Types

When the serde feature is enabled, use from_serialize() for the simplest experience:

use serde::Serialize;
use miniextendr_api::SerializeDataFrame;

#[derive(Serialize)]
struct Event {
    timestamp: f64,
    message: String,
}

#[miniextendr]
fn events() -> SerializeDataFrame<Event> {
    let events = vec![
        Event { timestamp: 1.0, message: "start".into() },
        Event { timestamp: 2.0, message: "end".into() },
    ];
    SerializeDataFrame::from_serialize(events)
}

SerializeDataFrame<T> is a type alias for DataFrame<AsSerializeRow<T>>, and from_serialize() handles wrapping each row automatically.

Alternative (explicit wrapping):

If you prefer the explicit form or need more control:

#[miniextendr]
fn events() -> DataFrame<AsSerializeRow<Event>> {
    DataFrame::from_rows(vec![
        AsSerializeRow(Event { timestamp: 1.0, message: "start".into() }),
        AsSerializeRow(Event { timestamp: 2.0, message: "end".into() }),
    ])
}

πŸ”—Methods

impl<T: IntoList> DataFrame<T> {
    pub fn new() -> Self;
    pub fn from_rows(rows: Vec<T>) -> Self;
    pub fn push(&mut self, row: T);
    pub fn len(&self) -> usize;
    pub fn is_empty(&self) -> bool;
}

// Also implements FromIterator
let df: DataFrame<Point> = points.into_iter().collect();

πŸ”—Approach 3: Manual Implementation

For full control or complex scenarios, implement IntoDataFrame manually.

πŸ”—Column-Oriented Data (Homogeneous Types)

For data frames where all columns have the same element type, use List::from_pairs():

struct TimeSeries {
    timestamps: Vec<f64>,
    values: Vec<f64>,
}

impl IntoDataFrame for TimeSeries {
    fn into_data_frame(self) -> List {
        List::from_pairs(vec![
            ("timestamp", self.timestamps),
            ("value", self.values),
        ])
        .set_class_str(&["data.frame"])
        .set_row_names_int(self.timestamps.len())
    }
}

#[miniextendr]
fn time_series() -> TimeSeries {
    TimeSeries {
        timestamps: vec![1.0, 2.0, 3.0],
        values: vec![10.0, 20.0, 30.0],
    }
}
// Automatically converts to data.frame via IntoR

πŸ”—Column-Oriented Data (Heterogeneous Types)

Important: For data frames with different column types, use List::from_raw_pairs() instead of from_pairs():

use miniextendr_api::IntoR;

struct MixedData {
    names: Vec<String>,
    ages: Vec<i32>,
    heights: Vec<f64>,
}

impl IntoDataFrame for MixedData {
    fn into_data_frame(self) -> List {
        List::from_raw_pairs(vec![
            ("name", self.names.into_sexp()),
            ("age", self.ages.into_sexp()),
            ("height", self.heights.into_sexp()),
        ])
        .set_class_str(&["data.frame"])
        .set_row_names_int(self.names.len())
    }
}

Why? from_pairs() is generic over a single type T: IntoR, so all columns must have the same type. from_raw_pairs() accepts pre-converted SEXP values, allowing heterogeneous columns.

πŸ”—Call-Site Control with Wrappers

Force conversion for a specific return without changing the type’s default:

#[miniextendr]
fn as_dataframe() -> ToDataFrame<TimeSeries> {
    ToDataFrame(TimeSeries { /* ... */ })
}

// Or use the extension trait
#[miniextendr]
fn with_extension() -> ToDataFrame<TimeSeries> {
    TimeSeries { /* ... */ }.to_data_frame()
}

πŸ”—Type-Level Default with PreferDataFrame

Make a type always convert to data.frame when returned:

#[derive(PreferDataFrame)]
struct MyData {
    // ... fields ...
}

impl IntoDataFrame for MyData {
    fn into_data_frame(self) -> List {
        // ... implementation ...
    }
}

#[miniextendr]
fn get_data() -> MyData {  // Automatically becomes data.frame in R
    MyData { /* ... */ }
}

πŸ”—Comparison: Row vs Column Oriented

πŸ”—Row-Oriented (Vec of structs)

vec![
    Measurement { time: 1.0, value: 10.0 },
    Measurement { time: 2.0, value: 20.0 },
]

Pros:

  • Natural Rust data structure
  • Easy to work with in Rust code
  • Type-safe field access

Cons:

  • Needs transposition for R
  • Memory layout not optimal for R

πŸ”—Column-Oriented (Struct of Vecs)

MeasurementDataFrame {
    time: vec![1.0, 2.0],
    value: vec![10.0, 20.0],
}

Pros:

  • Direct R data.frame representation
  • No transposition needed
  • Memory efficient for R

Cons:

  • Less ergonomic in Rust
  • Easy to create invalid data (mismatched lengths)

πŸ”—Best Practices

πŸ”—Choosing an Approach

  1. Use #[derive(DataFrameRow)] when:

    • You have row-oriented data in Rust
    • You want type-safe field access
    • You want automatic conversions
  2. Use DataFrame<T> when:

    • You need generic code over many row types
    • You’re working with existing IntoList types
    • You want runtime flexibility
  3. Use manual impl IntoDataFrame when:

    • You already have column-oriented data
    • You need custom data.frame attributes
    • You’re handling complex validation

πŸ”—Handling Missing Data

Use Option<T> for nullable fields:

#[derive(IntoList, DataFrameRow)]
struct Record {
    id: i32,
    value: Option<f64>,  // Becomes NA in R when None
}

πŸ”—Validation

Always validate column lengths when manually constructing data frames:

impl IntoDataFrame for MyData {
    fn into_data_frame(self) -> List {
        assert_eq!(self.col1.len(), self.col2.len(), "Column length mismatch");

        List::from_pairs(vec![
            ("col1", self.col1),
            ("col2", self.col2),
        ])
        .set_class_str(&["data.frame"])
        .set_row_names_int(self.col1.len())
    }
}

πŸ”—Implementation Notes

πŸ”—Row Names

R data frames require row names. miniextendr provides two helpers:

list.set_row_names_int(n)     // Compact: c(NA, -n) form
list.set_row_names(names_vec)  // Explicit: character vector

πŸ”—Class Attribute

Data frames need the "data.frame" class:

list.set_class_str(&["data.frame"])

For subclasses (e.g., tibbles):

list.set_class_str(&["tbl_df", "tbl", "data.frame"])

πŸ”—Empty Data Frames

List::from_raw_pairs(Vec::<(&str, SEXP)>::new())
    .set_class_str(&["data.frame"])
    .set_row_names_int(0)

πŸ”—Feature Flags

  • Base functionality: No features required
  • Serde integration: Requires serde feature
    • Enables impl IntoList for T: Serialize
    • Enables AsSerializeRow<T> wrapper
    • Allows #[derive(Serialize, DataFrameRow)]

πŸ”—Examples

See rpkg/src/rust/dataframe_examples.rs for complete working examples.