When working with R you may want to run a script that require quite a long time to complete. As modern computers generally have more cores and ram than required this is not going to be a problem in terms of computer resources. Nevertheless, the script will take your R prompt busy preventing you from any other activity. Function rstudioapi::jobRunScript()
allows you avoid this problem.
When working with R you may want to run a script, say infile.R
, that requires quite a long time to complete.
If you execute this script directly from the R console, it will keep your R session busy for long time preventing you from working on you current R session.
Note that R is single thread, meaning that it takes resources from a single core, while modern computer have plenty of resources available, usually eight or more cores. Assuming that your script is not too eager in terms of RAM, you can easily send your script on anther core and keep working on your current R session.
In order to achive this goal you can use three different approaches:
R CMD BATCH is a bash utility that allows you to run R in batch mode at the Unix terminal
The usual command is:
R CMD BATCH infile.R outfile.log
R
executes the instructions from infile.R
and writes the output, both stdout and stderr, to outfile.log
The above instruction silently implies two extra default parameters:
R CMD BATCH --restore --save infile.R outfile.log
--save
: save the environment at the end of the computation into a file named .RData
--restore
: restore any .RData
file into the R environment prior starting the computation
I believe batch R scripts should be run by
R CMD BATCH --no-restore --no-save infile.R outfile.log
With the above command, when R terminates the execution of infile.R
the working environment where the computation occurred is not saved.
Therefore, if you want save any object, this should be done explicitly with instructions like save()
,saveRDS()
or readr::write_rds()
.
You can always import the files generated from these functions into your working session by using instructions like load()
, readRDS()
. readr::read_rds()
.
In the above case, the R script runs in almost complete self isolation. In order to achieve complete isolation you should write:
R CMD BATCH --no-restore --no-save --no-environ --no-site-file --no-init-file infile.R outfile.log
--no-environ
: do not use any environmental variables from the user profile--no-site-file
: do to execute .Rprofile.site prior running the computation--no-init-file
: do to execute local .Rprofile prior running the computationNote that the long above command can be shortened into:
R CMD BATCH --vanilla infile.R outfile.log
In case you want to run R CMD BATCH
and see R
output at the Unix terminal rather than in a file, the following will do the trick:
mkfifo Rfifo
cat Rfifo &
R CMD BATCH --vanilla infile.R Rfifo
In some situations you may want to pass arguments to the batch script. Quite often this happens when you want to run the same script against different data sources In this case, you do not want to write a version of your script for each data source but a single script that takes your data source as a parameter.
Suppose a piece of code, saved in a file count.R, that simply count the number of rows in a file:
args <- commandArgs(trailingOnly = TRUE)
file <- args[[1]]
length(readLines(file))
You can invoke script count.R with:
R CMD BATCH --vanilla '--args ~/tmp/count.R i.txt' count.R count.log
You can read the output by editing count.log
RScript
is an executable command that comes with R.
It takes as input any properly quoted R expression or script file. Output is usually redirected to stdout.
As a basic example with a single expression
Rscript -e '1+1'
Or with more than one expression separated bu semicolon
Rscript -e '1+1; 2+2'
TO BE CONTINUED
Function jobRunScript()
from package rstudioapi
is a newer and interesting alternative.
jobRunScript()
is an R function from package rstudioapi
and not a bash utility. As a result, you can run it at you R prompt within RStudio. jobRunScript()
will execute your script in background and immediately return your R session available to you. Clearly, as jobRunScript()
is part of or package rstudioapi
it will not work outside RStudio
require(rstudioapi)
jobRunScript(path = 'infile.R',
name = 'my long script',
encoding = "unknown",
workingDir = NULL,
importEnv = FALSE,
exportEnv = "R_GlobalEnv")
This command tells R to run your script and export the objects created within this environment
As stated by the function help:
path: The path to the R script to be run.
name: A name for the background job. When NULL (the default), the filename of the script is used as the job name.
encoding: The text encoding of the script, if known.
workingDir: The working directory in which to run the job. When NULL (the default), the parent directory of the R script is used.
importEnv: Whether to import the global environment into the job.
exportEnv: The name of the environment in which to export the R objects created by the job. Use "" (the default) to skip export, “R_GlobalEnv”’ to export to the global environment, or the name of an environment object to create an object with that name.
Therefore, if you want to run a job a collect the results back in your working environmentyou need to set: exportEnv = "R_GlobalEnv"
In case you need to pass any object from your global environmemnt to teh script you may want to use importEnv = TRUE. Personally, I woulkd not reccomend this choice. A script would be better to be seft contained and not to depend from objects defined elsewhere.
Kevin, Ushey. 2020. “Rstudio Api V0.9.0.” https://www.rdocumentation.org/packages/rstudioapi/versions/0.9.0.
If you see mistakes or want to suggest changes, please create an issue on the source repository.
For attribution, please cite this work as
Spano (2021, March 27). andreaspano blog: Run an R script as a background job. Retrieved from https://andreaspano.github.io/posts/2020-09-29-run-an-r-script-as-a-background-job/
BibTeX citation
@misc{spano2021run, author = {Spano, Andrea}, title = {andreaspano blog: Run an R script as a background job}, url = {https://andreaspano.github.io/posts/2020-09-29-run-an-r-script-as-a-background-job/}, year = {2021} }