Package 'fastreg'

Title: Fast Conversion and Querying of Danish Registers with 'Parquet'
Description: Converts large Danish register files ('sas7bdat') into 'Parquet' format with year-based 'Hive' partitioning and chunked reading for larger-than-memory files. Supports parallel conversion with a 'targets' pipeline and reading those registers into 'DuckDB' tables for faster querying and analyses.
Authors: Signe Kirk Brødbæk [aut, cre] (ORCID: <https://orcid.org/0009-0000-2208-7088>), Luke Johnston [aut] (ORCID: <https://orcid.org/0000-0003-4169-2616>), Steno Diabetes Center Aarhus [cph], Aarhus University [cph]
Maintainer: Signe Kirk Brødbæk <[email protected]>
License: MIT + file LICENSE
Version: 0.12.5
Built: 2026-06-02 14:55:33 UTC
Source: https://github.com/dp-next/fastreg

Help Index


Convert a single register SAS file to Parquet

Description

To handle larger-than-memory files, the SAS file is converted in chunks. The function does not check for existing files in the output directory. Existing data is never overwritten, because files are saved with UUIDs in their names, but converting the same file again will duplicate its data in the directory.

Usage

convert(path, output_dir, chunk_size = 10000000L)

Arguments

path

Path to a single SAS file.

output_dir

Directory to save the Parquet output to. Must not include the register name, which is extracted from path to create the register folder.

chunk_size

Number of rows to read and convert at a time.

Value

A tibble with a conversion log about each written chunk.

Examples

sas_file <- fs::path_package("fastreg", "extdata", "test.sas7bdat")
convert(
  path = sas_file,
  output_dir = fs::path_temp("path/to/output/dir")
)
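
The chunked conversion described above can also be run with a smaller chunk_size and the returned log inspected. This is a minimal sketch using the bundled test file; the chunk size of one million rows and the "convert-example" folder name are arbitrary choices for illustration, not recommendations from the package.

```r
library(fastreg)

sas_file <- fs::path_package("fastreg", "extdata", "test.sas7bdat")
# Hypothetical output folder inside the session's temporary directory.
output_dir <- fs::path_temp("convert-example")

# A smaller chunk size lowers peak memory use at the cost of more chunks.
conversion_log <- convert(
  path = sas_file,
  output_dir = output_dir,
  chunk_size = 1000000L
)

# The log describes each written chunk.
conversion_log

# List the Parquet files that were written.
fs::dir_ls(output_dir, recurse = TRUE, glob = "*.parquet")
```

Because convert() does not check for existing files, running this block twice against the same output_dir would write the data twice.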

List Parquet datasets or files in a project

Description

Only lists Parquet files whose names match part-*.parquet. For datasets, it only looks for Parquet files with year=YYYY in their path. These functions search the whole system for the project ID, so they can be slow.

Usage

list_parquet_datasets()

list_parquet_files()

Value

The path(s) to the Parquet datasets (as directories) or files.

Functions

  • list_parquet_datasets(): List all Parquet (Hive partitioned by year) datasets.

  • list_parquet_files(): List all Parquet files within a project.
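
To make the two naming rules concrete, here is a small base R sketch that applies them to a handful of made-up paths. The paths and the regular expressions are assumptions for illustration only; they are not the package's implementation.

```r
# Hypothetical example paths; not taken from a real project.
paths <- c(
  "E:/701010/workdata/bef/year=1999/part-0a1b.parquet",
  "E:/701010/workdata/bef/year=2000/part-2c3d.parquet",
  "E:/701010/workdata/lookup/part-4e5f.parquet",
  "E:/701010/workdata/notes/summary.parquet"
)

# Files: the file name starts with "part-" and ends with ".parquet".
is_part_file <- grepl("^part-.*\\.parquet$", basename(paths))

# Datasets: the path also contains a Hive-style year=YYYY folder.
has_year_partition <- grepl("(^|/)year=[0-9]{4}/", paths)

parquet_files <- paths[is_part_file]

# A dataset is the directory that holds the year=YYYY folders.
dataset_dirs <- unique(
  dirname(dirname(paths[is_part_file & has_year_partition]))
)

parquet_files
dataset_dirs
```

In this sketch, summary.parquet is skipped by both rules, and the lookup file counts as a Parquet file but not as part of a dataset.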


List SAS files in a directory

Description

Lists all SAS register files (with the extension .sas7bdat case-insensitively) in the specified directory and its subdirectories.

Usage

list_sas_files(path)

Arguments

path

Directory to search.

Value

The path(s) to the found SAS file(s).

Examples

list_sas_files(fs::path_package("fastreg", "extdata"))
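
The listed paths can be passed on to convert() one at a time. The following is a sequential sketch only, assuming purrr (version 1.0.0 or later) is installed; the "list-example" folder name is hypothetical, and for many or large registers the targets pipeline from use_template() is the parallel route the package describes.

```r
library(fastreg)

sas_files <- list_sas_files(fs::path_package("fastreg", "extdata"))
# Hypothetical output folder inside the session's temporary directory.
output_dir <- fs::path_temp("list-example")

# Convert each SAS file in turn and keep the per-file logs.
logs <- purrr::map(
  sas_files,
  \(path) convert(path = path, output_dir = output_dir)
)

# Combine the per-file logs into a single tibble.
conversion_log <- purrr::list_rbind(logs)
conversion_log
```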

Read a single Parquet file or a partitioned dataset as DuckDB table

Description

These functions are useful when read_register() guesses the register path incorrectly or can't find the register.

Usage

read_parquet_dataset(path)

read_parquet_file(path)

Arguments

path

Path to a directory with the Parquet files within or a path to a Parquet file.

Value

A DuckDB table.

Functions

  • read_parquet_dataset(): Reads a Parquet partitioned directory.

  • read_parquet_file(): Reads a single Parquet file.
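
As a rough illustration, the sketch below converts the bundled test file and then reads one of the resulting Parquet files back by its path. The "read-example" folder name is hypothetical, and the files are located with fs::dir_ls() because the exact folder layout written by convert() is not assumed here.

```r
library(fastreg)

sas_file <- fs::path_package("fastreg", "extdata", "test.sas7bdat")
# Hypothetical output folder inside the session's temporary directory.
output_dir <- fs::path_temp("read-example")
convert(path = sas_file, output_dir = output_dir)

# Find the written Parquet files without assuming the folder layout.
parquet_files <- fs::dir_ls(output_dir, recurse = TRUE, glob = "*.parquet")

# Read a single file as a DuckDB table.
register_tbl <- read_parquet_file(parquet_files[[1]])
register_tbl
```

For a directory partitioned by year, read_parquet_dataset() takes the dataset directory in the same way.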


Read a Parquet register

Description

This function uses the options fastreg.project_rawdata_dir and fastreg.project_workdata_dir when they are set in options(). Otherwise, it tries to guess the path from the project ID and the base directories E:/<project-id>/rawdata/ and E:/<project-id>/workdata/. It only reads Parquet datasets (those that are partitioned with the pattern year=). If this function doesn't work, use read_parquet_dataset() or read_parquet_file() instead.

Usage

read_register(name)

Arguments

name

Name of the Parquet dataset (i.e., the register name). See a list of available datasets with list_parquet_datasets().

Value

A DuckDB table.
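
The two options can be set once per session, for example at the top of a script. This is a minimal sketch that assumes a hypothetical project ID of 701010 and the E:/<project-id>/ layout mentioned in the description; the paths on your system may differ.

```r
# Hypothetical project paths; replace 701010 with your own project ID.
options(
  fastreg.project_rawdata_dir = "E:/701010/rawdata",
  fastreg.project_workdata_dir = "E:/701010/workdata"
)

# Confirm what is set for the current session.
getOption("fastreg.project_rawdata_dir")
getOption("fastreg.project_workdata_dir")
```

With the options set, a call such as read_register("bef") is expected to use these directories instead of guessing them from the project ID.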


Simulate example registers along with output paths for SAS files

Description

A helper function that simulates data using osdc::simulate_registers(). It's used in vignettes and tests. It simulates data for one or more registers and years.

Usage

simulate_registers_with_paths(
  registers,
  years = "",
  n = 1000,
  output_dir = fs::path_temp("E/rawdata/701010/")
)

Arguments

registers

Name of one or more registers. Must be a register that osdc::simulate_registers() can simulate. See osdc::registers() for a list of available registers.

years

One or more years to save the simulated data under. The year is used as a suffix in the file name. For example, for register "bef" and year "1999", the file will be named bef1999.sas7bdat. Can also be left as "" (the default) to use no year.

n

Number of rows of data to simulate per year.

output_dir

The root directory prepended to the created SAS paths. By default, output_dir is a temp path that mimics the paths on DST (Statistics Denmark), E/rawdata/701010. The default should technically start with E: on Windows, but R's default temporary directory on Windows doesn't allow :, so E is used instead.

Value

A nested tibble with a column data containing the simulated data and a column output_path containing the path where the SAS file should be saved to. Pipe to purrr::pwalk(write_to_sas) or purrr::pmap(write_to_sas) to write each simulated dataset to a SAS file.

Examples

sim_regs <- simulate_registers_with_paths(
  registers = c("bef", "lmdb"),
  years = c("1999", "2000"),
  n = 10
)
sim_regs

sim_regs |>
  purrr::pwalk(write_to_sas)

Use a targets pipeline for converting SAS registers to Parquet

Description

Copies a _targets.R template and a conversion log Quarto Markdown file to the given directory.

Usage

use_template(path = ".", open = rlang::is_interactive())

Arguments

path

Path to the directory where the targets pipeline and conversion log will be created. Defaults to the current directory.

open

Whether to open the file for editing.

Value

The path to the created _targets.R file, invisibly.

Examples

use_template(path = fs::path_temp(""))

Write simulated data to a SAS file

Description

A helper function that writes a data frame to a SAS file. It's used mainly in fastreg's vignettes and tests. Pipe the output of simulate_registers_with_paths() into purrr::pwalk() with this function to write each simulated dataset to a SAS file.

Usage

write_to_sas(data, output_path)

Arguments

data

A tibble containing the simulated data.

output_path

A string of the path to where the SAS file should be saved.

Value

Invisibly gives the path to the saved SAS file.
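
The function can also be called directly on a single data frame. A minimal sketch, assuming a small hand-made tibble stands in for simulated register data; the column names and the file name example.sas7bdat are made up for illustration.

```r
library(fastreg)

# A tiny hand-made tibble standing in for simulated register data.
example_data <- tibble::tibble(
  id = 1:3,
  value = c(1.5, 2.5, 3.5)
)

# Hypothetical file name inside the session's temporary directory.
output_path <- fs::path_temp("example.sas7bdat")

# The path is returned invisibly, so assign it to keep it.
saved_path <- write_to_sas(data = example_data, output_path = output_path)
saved_path

# The new file should now be found when listing SAS files.
list_sas_files(fs::path_temp())
```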