Setting up SparkR in RStudio

To use SparkR in RStudio, we have to load the SparkR package that ships with the Spark installation, under the R/lib directory of SPARK_HOME.

Run the following script at the beginning of the program:

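# launch SparkR in R
# point SPARK_HOME at the Spark installation directory
Sys.setenv(SPARK_HOME='<spark_dir>')
# put the SparkR library bundled with Spark at the front of the library search path
.libPaths(c(file.path(Sys.getenv('SPARK_HOME'), 'R', 'lib'), .libPaths()))
# print the search path to confirm the Spark directory is listed first
.libPaths()
library(SparkR)
# create the Spark context, running on YARN in client mode
sc <- sparkR.init(master='yarn-client')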
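
If library(SparkR) fails, the usual cause is a SPARK_HOME that does not point at a Spark installation. The sketch below is one way to check this before loading the package. The helper name has_sparkr is hypothetical (it is not part of SparkR), and the check assumes the package sits in R/lib/SparkR under SPARK_HOME, as the script above implies.

```r
# Hypothetical helper (not part of SparkR): checks whether a Spark install
# at `spark_home` contains the bundled SparkR package under R/lib.
has_sparkr <- function(spark_home) {
  dir.exists(file.path(spark_home, "R", "lib", "SparkR"))
}

# Warn early instead of failing later at library(SparkR)
spark_home <- Sys.getenv("SPARK_HOME")
if (!has_sparkr(spark_home)) {
  message("SparkR not found under '", spark_home, "'; check SPARK_HOME")
}
```

The check only looks for the directory, so it says nothing about whether the Spark version matches the cluster.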

Here is the same script with my configuration on Hortonworks HDP:

# launch sparkR in R
Sys.setenv(SPARK_HOME='/usr/hdp/current/spark-client')
.libPaths(c(file.path(Sys.getenv('SPARK_HOME'), 'R', 'lib'), .libPaths()))
.libPaths()
library(SparkR)
sc <- sparkR.init(master='yarn-client')
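
Once the context is created, a quick smoke test can confirm that the session actually works. The following is a minimal sketch, assuming the Spark 1.x SparkR API that goes with sparkR.init (sparkRSQL.init and createDataFrame taking a SQL context as first argument) and assuming library(SparkR) has already been called as in the script above. The function name spark_smoke_test is hypothetical, and faithful is the sample dataset shipped with R.

```r
# Hypothetical smoke test: `sc` is assumed to be the context returned by
# sparkR.init(), and SparkR is assumed to be attached already.
spark_smoke_test <- function(sc) {
  # create a SQL context on top of the Spark context (Spark 1.x API)
  sqlContext <- sparkRSQL.init(sc)
  # copy R's built-in `faithful` data.frame into a Spark DataFrame
  df <- createDataFrame(sqlContext, faithful)
  # bring the first rows back to R as a local data.frame
  head(df)
}
```

Calling spark_smoke_test(sc) after the script should return the first rows of the dataset; call sparkR.stop() at the end of the program to release the YARN resources.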