Run an R script as a background job

When working with R you may want to run a script that requires quite a long time to complete. As modern computers generally have more cores and RAM than such a script needs, computer resources are rarely the problem. Nevertheless, the script will keep your R prompt busy, preventing you from doing anything else. Function rstudioapi::jobRunScript() allows you to avoid this problem.

Andrea Spano, andreaspano.github.io (Quantide, www.quantide.com)
2021-03-27

Introduction

When working with R you may want to run a script, say infile.R, that requires quite a long time to complete.

If you execute this script directly from the R console, it will keep your R session busy for a long time, preventing you from doing any other work in it.

Note that R is essentially single-threaded, meaning that it uses a single core, while modern computers have plenty of resources available, often eight or more cores. Assuming that your script is not too demanding in terms of RAM, you can easily run it in a separate R process on another core and keep working in your current R session.
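To see how many cores your own machine reports, a quick check can be run with detectCores() from the parallel package, which ships with base R. Note that by default it counts logical CPUs (hardware threads), which can be more than the number of physical cores, and that it may return NA on platforms where the count cannot be determined.

```r
# parallel is part of base R, no installation needed
library(parallel)

# Number of logical CPUs the operating system reports (may be NA)
n_cores <- detectCores()
cat("Cores reported by detectCores():", n_cores, "\n")
```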

To achieve this goal you can use three different approaches:

  1. R CMD BATCH
  2. Rscript
  3. jobRunScript

R CMD BATCH

R CMD BATCH is a command-line utility, run from the Unix terminal (for example, from bash), that executes R in batch mode.

The usual command is:

R CMD BATCH infile.R outfile.log


R executes the instructions in infile.R and writes the output, both stdout and stderr, to outfile.log.

The above command implicitly uses two default options:

R CMD BATCH --restore --save infile.R outfile.log


I believe batch R scripts should rather be run with:

R CMD BATCH --no-restore --no-save infile.R outfile.log


With the above command, R does not restore a previously saved workspace (.RData) at startup and, when it finishes executing infile.R, it does not save the working environment in which the computation occurred.

Therefore, if you want to save any object, this must be done explicitly with functions like save(), saveRDS() or readr::write_rds().

You can always import the files generated by these functions into your working session with functions like load(), readRDS() or readr::read_rds().
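A minimal sketch of this save-then-load pattern, assuming a hypothetical output file result.rds in the working directory: the first part stands for infile.R, the second for your interactive session. A relative path in the shared working directory is used because each R process gets its own tempdir(), so a file written there by the batch job would not be found by the interactive session.

```r
## ---- infile.R (run with R CMD BATCH --no-restore --no-save) ----
## The workspace is not saved, so the result is written to disk explicitly.
set.seed(1)
x <- rnorm(1e6)                                # stand-in for a long computation
result <- list(n = length(x), mean = mean(x))
saveRDS(result, "result.rds")                  # hypothetical output file name

## ---- later, in your interactive session ----
result <- readRDS("result.rds")
str(result)
```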

In the above case, the R script runs in almost complete isolation. To achieve complete isolation you should write:

R CMD BATCH --no-restore --no-save --no-environ --no-site-file --no-init-file infile.R outfile.log


Note that the long command above can be shortened to:

R CMD BATCH --vanilla infile.R outfile.log
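The same command can also be started from inside an R session, so that the session stays free while the job runs. A minimal sketch using base R's system2() with wait = FALSE, assuming a Unix-alike system; demo_infile.R and demo_outfile.log are hypothetical file names, and the demo script is written on the fly only so the example is self-contained.

```r
# Demo script standing in for a long-running infile.R (hypothetical file name)
writeLines(c("Sys.sleep(1)", "x <- sum(1:10)", "print(x)"), "demo_infile.R")

# The R front end of the R installation currently running
r_bin <- file.path(R.home("bin"), "R")

# Same as running  R CMD BATCH --vanilla demo_infile.R demo_outfile.log  in a
# terminal; wait = FALSE makes system2() return immediately, so the batch job
# runs in a separate R process while this session stays free.
system2(r_bin,
        c("CMD", "BATCH", "--vanilla", "demo_infile.R", "demo_outfile.log"),
        wait = FALSE)

message("Batch job started; check demo_outfile.log when it finishes.")
```

As with the terminal version, nothing is returned to the session: results reach you only through the log file or through files the script saves explicitly.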


In case you want to run R CMD BATCH and see R output at the Unix terminal rather than in a file, the following will do the trick:

mkfifo Rfifo
cat Rfifo &
R CMD BATCH --vanilla infile.R Rfifo


In some situations you may want to pass arguments to the batch script. Quite often this happens when you want to run the same script against different data sources. In this case, you do not want to write a version of your script for each data source, but a single script that takes the data source as a parameter.

Suppose a piece of code, saved in a file count.R, simply counts the number of lines in a file:

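# Arguments given after --args on the command line
args <- commandArgs(trailingOnly = TRUE)
# The first argument is the path of the file to count
file <- args[[1]]
# Number of lines; in batch mode the result is printed to the log file
length(readLines(file))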

You can invoke script count.R with:

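R CMD BATCH --vanilla '--args i.txt' count.R count.log

Here i.txt is the file whose lines are counted: the words inside the quoted '--args ...' string are what commandArgs(trailingOnly = TRUE) returns, so args[[1]] is "i.txt".

You can read the result by opening count.log.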
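Passing arguments pays off when the same script must be run against several data sources. A sketch, assuming two hypothetical files a.txt and b.txt in the working directory (created here so the example runs on its own), that launches one background R CMD BATCH job per file from an R session and writes one log per file. shQuote() keeps "--args <file>" together as a single argument, as the quotes do in the shell command above.

```r
# Hypothetical data sources, created here only so the sketch runs on its own
files <- c("a.txt", "b.txt")
writeLines(c("one", "two", "three"), files[1])
writeLines(c("alpha", "beta"), files[2])

# count.R as shown above (written only if it does not exist yet)
if (!file.exists("count.R")) {
  writeLines(c(
    "args <- commandArgs(trailingOnly = TRUE)",
    "file <- args[[1]]",
    "length(readLines(file))"
  ), "count.R")
}

r_bin <- file.path(R.home("bin"), "R")

# One background job per data source, each with its own log file (a.log, b.log).
# shQuote() keeps "--args <file>" as a single argument for R CMD BATCH.
for (f in files) {
  log_file <- paste0(tools::file_path_sans_ext(f), ".log")
  system2(r_bin,
          c("CMD", "BATCH", "--vanilla", shQuote(paste("--args", f)),
            "count.R", log_file),
          wait = FALSE)
}
```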

Rscript

Rscript is an executable that comes with R.

It takes as input any properly quoted R expression or a script file. Output is written to stdout by default.

As a basic example, with a single expression:

Rscript -e '1+1'

Or with more than one expression, separated by semicolons:

Rscript -e '1+1; 2+2'
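Like R CMD BATCH, Rscript can be launched from within R. A minimal sketch with system2(), assuming a Unix-alike where Rscript lives in R.home("bin"): the first call captures the printed output of an expression (stdout = TRUE), the second starts a hypothetical script, demo_rscript.R, in the background with its output sent to a log file.

```r
rscript <- file.path(R.home("bin"), "Rscript")

# Same as  Rscript -e '1+1; 2+2'  at the terminal, with the printed lines
# captured into a character vector instead of shown on the console
out <- system2(rscript, c("-e", shQuote("1+1; 2+2")), stdout = TRUE)
print(out)

# Run a (hypothetical) script file in the background, sending stdout to a log
writeLines("cat('done\\n')", "demo_rscript.R")
system2(rscript, "demo_rscript.R", stdout = "demo_rscript.log", wait = FALSE)
```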

TO BE CONTINUED

jobRunScript (Kevin 2020)

Function jobRunScript() from package rstudioapi is a newer and interesting alternative.

jobRunScript() is an R function, not a shell utility. As a result, you can run it at your R prompt within RStudio: jobRunScript() executes your script in the background and immediately gives your R session back to you. Clearly, as jobRunScript() relies on the RStudio API, it will not work outside RStudio.

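library(rstudioapi)
jobRunScript(path = 'infile.R',           # script to run as a background job
             name = 'my long script',     # label shown in the Jobs pane
             encoding = "unknown",
             workingDir = NULL,           # NULL: use the script's own directory
             importEnv = FALSE,           # do not copy the global environment into the job
             exportEnv = "R_GlobalEnv")   # copy the job's objects into the global environment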


This command tells R to run your script and to export the objects it creates into your global environment.

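As stated in the function help, exportEnv is the name of the environment into which the R objects created by the job are exported; its default, "", skips the export.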

Therefore, if you want to run a job and collect the results back in your working environment, you need to set exportEnv = "R_GlobalEnv".

In case you need to pass any object from your global environment to the script, you may want to use importEnv = TRUE. Personally, I would not recommend this choice: a script is better self-contained, not depending on objects defined elsewhere.
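Since jobRunScript() only works inside RStudio, code that may also run elsewhere can check first. A sketch, assuming rstudioapi is installed when running in RStudio; launch_job() is a hypothetical helper, not part of rstudioapi, that calls jobRunScript() only when rstudioapi::isAvailable() reports a running RStudio session and otherwise suggests the command-line alternatives.

```r
# Hypothetical helper: start `path` as an RStudio background job when possible.
# Returns TRUE if a job was launched, FALSE otherwise (invisibly).
launch_job <- function(path) {
  # isAvailable() is FALSE outside RStudio (e.g. under Rscript or R CMD BATCH)
  if (!requireNamespace("rstudioapi", quietly = TRUE) ||
      !rstudioapi::isAvailable()) {
    message("RStudio is not available: run '", path,
            "' with R CMD BATCH or Rscript instead.")
    return(invisible(FALSE))
  }
  rstudioapi::jobRunScript(
    path       = path,
    name       = basename(path),
    workingDir = getwd(),        # assumption: run from the current directory
    importEnv  = FALSE,          # keep the script self-contained
    exportEnv  = "R_GlobalEnv"   # copy the job's objects back when it finishes
  )
  invisible(TRUE)
}

launched <- launch_job("infile.R")
```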

References

Kevin, Ushey. 2020. “Rstudio Api V0.9.0.” https://www.rdocumentation.org/packages/rstudioapi/versions/0.9.0.


Citation

For attribution, please cite this work as

Spano (2021, March 27). andreaspano blog: Run an R script as a background job. Retrieved from https://andreaspano.github.io/posts/2020-09-29-run-an-r-script-as-a-background-job/

BibTeX citation

@misc{spano2021run,
  author = {Spano, Andrea},
  title = {andreaspano blog: Run an R script as a background job},
  url = {https://andreaspano.github.io/posts/2020-09-29-run-an-r-script-as-a-background-job/},
  year = {2021}
}