Local randomness in R
2019-08-13
One approach to using random number generation inside a function without affecting the outer state of the random number generator.
Prologue
Let’s say we have a deterministic (non-random) problem for which one of the solutions involves randomness. A very common example of such a problem is minimizing a function on a certain interval: it can be solved non-randomly (as in most methods of optim()), or randomly (the simplest approach being to generate a random set of points on the interval and choose the one with the lowest function value).
What is a “clean” way of writing a function to solve the problem? The issue with direct usage of randomness inside a function is that it affects the state of outer random number generation:
# Demo problem solving function
minimize_f <- function(f, from = 0, to = 1, n = 1e3) {
  points <- runif(n, min = from, max = to)
  invisible(min(f(points)))
}
# Reference random number output
set.seed(101)
runif(1)
## [1] 0.3721984
# Test random number output is different from reference one. But we want it to
# be the same.
set.seed(101)
minimize_f(sin)
runif(1)
## [1] 0.1016229
So how can we get a cleaner implementation which does not affect the outer state? This short post is inspired by the following sources: this StackOverflow question by Yihui Xie and this cookbook advice.
Local randomness
The state of random number generation is stored in the .Random.seed variable, which is “an integer vector” and “can be saved and restored, but should not be altered by the user”. This gives us a very big hint about how to implement “local randomness”: capture the state at the start of the function, make the necessary computations, and restore the state at the end. The bad news is that this also means entering the dark realm of variables and their environments.
How to “save state”? In the help page there is a note: “The object .Random.seed is only looked for in the user’s workspace”. Here “user’s workspace” seems to mean the global environment, which can be accessed through the variable .GlobalEnv. So, to “save state” we need to get the value of the .Random.seed variable inside the global environment. This is a job for get0():
get_rand_state <- function() {
  # Using `get0()` here to have `NULL` output in case object doesn't exist.
  # Also using `inherits = FALSE` to get value exactly from global environment
  # and not from one of its parents.
  get0(".Random.seed", envir = .GlobalEnv, inherits = FALSE)
}
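To see what “saving state” captures, here is a small sketch. It repeats get_rand_state() so that it runs on its own; the names state_before and state_after are made up for this illustration. It only checks that the saved object is an integer vector and that drawing a random number changes it.

```r
# Helper repeated from above so the snippet is self-contained
get_rand_state <- function() {
  get0(".Random.seed", envir = .GlobalEnv, inherits = FALSE)
}

set.seed(101)
state_before <- get_rand_state()
is.integer(state_before)
## [1] TRUE

# Drawing a random number advances the generator, so the stored state changes
invisible(runif(1))
state_after <- get_rand_state()
identical(state_before, state_after)
## [1] FALSE
```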
How to “restore state”? We need to assign a certain value (the previously saved state) to the .Random.seed variable in the global environment. This is a job for assign():
set_rand_state <- function(state) {
  # Assigning `NULL` state might lead to unwanted consequences
  if (!is.null(state)) {
    assign(".Random.seed", state, envir = .GlobalEnv, inherits = FALSE)
  }
}
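Note that with this guard nothing is restored when the saved state is NULL, that is, when no random numbers had been generated before the state was captured. One possible extension (a sketch, not part of the approach described in this post) is to remove .Random.seed in that case, so that R initializes a fresh one the next time it is needed. The name set_rand_state_strict() is hypothetical.

```r
# Hypothetical variant of `set_rand_state()`; the name is made up for this sketch
set_rand_state_strict <- function(state) {
  seed_exists <- exists(".Random.seed", envir = .GlobalEnv, inherits = FALSE)
  if (is.null(state)) {
    # There was no state before, so drop the one created in the meantime
    if (seed_exists) rm(".Random.seed", envir = .GlobalEnv)
  } else {
    assign(".Random.seed", state, envir = .GlobalEnv, inherits = FALSE)
  }
  invisible(NULL)
}

# Simulate a session in which no random numbers were generated yet
if (exists(".Random.seed", envir = .GlobalEnv, inherits = FALSE)) {
  rm(".Random.seed", envir = .GlobalEnv)
}
old_state <- get0(".Random.seed", envir = .GlobalEnv, inherits = FALSE)
is.null(old_state)
## [1] TRUE

invisible(runif(1)) # this creates `.Random.seed`
set_rand_state_strict(old_state)
seed_exists_after <- exists(".Random.seed", envir = .GlobalEnv, inherits = FALSE)
seed_exists_after
## [1] FALSE
```

Whether this behavior is desirable depends on the use case; the simpler guard above is enough when a seed is always set before calling the function.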
How to make “local randomness”? We can now save and restore the random state. The final piece of the puzzle is to perform the restoration at the end of computations inside a function. This is a job for on.exit(): the call to set_rand_state() should be wrapped in on.exit() so that restoration happens exactly at the moment the function finishes, whether normally or not.
Notes about positioning of calls inside a function:
- The call to get_rand_state() should be made right at the beginning of the function body to capture the state just before the function was called.
- Simply placing the call to set_rand_state() inside the function body right before returning the result might not be enough, because previous lines of code can terminate early (for example, with an error). The function on.exit() guarantees execution of its expression.
Given all that, the “clean” way of implementing “local randomness” is the following:
my_f <- function() {
  old_state <- get_rand_state()
  on.exit(set_rand_state(old_state))
  # The rest of the code
}
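To illustrate why on.exit() is preferable to restoring the state on the last line, here is a minimal sketch in which the function fails after drawing random numbers. The helpers are repeated so that the snippet runs on its own; failing_f() is a hypothetical function made up for this demonstration.

```r
# Helpers repeated from above so the snippet is self-contained
get_rand_state <- function() {
  get0(".Random.seed", envir = .GlobalEnv, inherits = FALSE)
}
set_rand_state <- function(state) {
  if (!is.null(state)) {
    assign(".Random.seed", state, envir = .GlobalEnv, inherits = FALSE)
  }
}

# Hypothetical function which uses randomness and then throws an error
failing_f <- function() {
  old_state <- get_rand_state()
  on.exit(set_rand_state(old_state))
  runif(10)
  stop("Something went wrong")
}

set.seed(101)
reference_state <- get_rand_state()
res <- try(failing_f(), silent = TRUE)
inherits(res, "try-error")
## [1] TRUE

# State is restored even though the function terminated with an error
identical(get_rand_state(), reference_state)
## [1] TRUE
```

If a function registers several exit handlers, passing add = TRUE to on.exit() keeps later calls from overwriting earlier ones.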
Let’s check this solution in practice:
minimize_f_clean <- function(f, from = 0, to = 1, n = 1e3) {
  old_state <- get_rand_state()
  on.exit(set_rand_state(old_state))
  points <- runif(n, min = from, max = to)
  invisible(min(f(points)))
}
# Reference random number output (repeated for reading convenience)
set.seed(101)
runif(1)
## [1] 0.3721984
# Output of `runif(1)` is the same as reference one, which was the goal
set.seed(101)
minimize_f_clean(sin)
runif(1)
## [1] 0.3721984
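The same pattern can probably be packed into a reusable wrapper. The following is only a sketch under the assumption that relying on lazy evaluation of the argument is acceptable; with_local_randomness() is a hypothetical name, and the helpers are repeated so that the snippet runs on its own.

```r
# Helpers repeated from above so the snippet is self-contained
get_rand_state <- function() {
  get0(".Random.seed", envir = .GlobalEnv, inherits = FALSE)
}
set_rand_state <- function(state) {
  if (!is.null(state)) {
    assign(".Random.seed", state, envir = .GlobalEnv, inherits = FALSE)
  }
}

# Hypothetical wrapper: evaluates `expr` and restores outer random state after
with_local_randomness <- function(expr) {
  old_state <- get_rand_state()
  on.exit(set_rand_state(old_state))
  # Argument is evaluated lazily, i.e. here, after the state was captured
  expr
}

set.seed(101)
reference <- runif(1)

set.seed(101)
x <- with_local_randomness(runif(5))
length(x)
## [1] 5

# Outer random number generation is not affected by the wrapped call
identical(runif(1), reference)
## [1] TRUE
```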
Epilogue
- Creating a function with “local randomness”, although it requires some dark R magic (with get0(), assign(), and on.exit() usage), is pretty straightforward.
- If you have some non-trivial R problem, there is a good chance that Yihui Xie has already posted a question on StackOverflow about it.
sessionInfo()
sessionInfo()
## R version 3.6.1 (2019-07-05)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.3 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so
##
## locale:
## [1] LC_CTYPE=ru_UA.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=ru_UA.UTF-8 LC_COLLATE=ru_UA.UTF-8
## [5] LC_MONETARY=ru_UA.UTF-8 LC_MESSAGES=ru_UA.UTF-8
## [7] LC_PAPER=ru_UA.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=ru_UA.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## loaded via a namespace (and not attached):
## [1] compiler_3.6.1 magrittr_1.5 bookdown_0.11 tools_3.6.1
## [5] htmltools_0.3.6 yaml_2.2.0 Rcpp_1.0.1 stringi_1.4.3
## [9] rmarkdown_1.13 blogdown_0.12 knitr_1.23 stringr_1.4.0
## [13] digest_0.6.19 xfun_0.7 evaluate_0.14