Apache Spark is a distributed in-memory cluster computing system. Many people, including me, like to use Spark from Python with IPython for data analysis.
Unfortunately, the configuration is still a little tricky at the moment.
Steps to follow:
- Download Spark and unpack the archive:
wget http://d3kbcqa49mib13.cloudfront.net/spark-1.5.1-bin-hadoop2.6.tgz && tar -zxvf spark-1.5.1-bin-hadoop2.6.tgz
- Set the SPARK_HOME environment variable to the unpacked folder, and don’t forget to source your .bashrc or .zshrc afterwards.
- Install findspark, which is simple with pip:
pip install findspark
- Get into IPython and play
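The steps above can be sketched in Python roughly as follows. This is a minimal sketch, not necessarily the exact snippet you would end up with: the helper names check_spark_home and start_local_spark, the local[*] master and the app name are assumptions made for illustration. It also assumes SPARK_HOME is already exported and findspark is installed.

```python
import os


def check_spark_home(env=None):
    """Return SPARK_HOME if it points at an existing directory, else raise."""
    env = os.environ if env is None else env
    spark_home = env.get("SPARK_HOME")
    if not spark_home:
        raise RuntimeError(
            "SPARK_HOME is not set; export it in .bashrc or .zshrc and source the file"
        )
    if not os.path.isdir(spark_home):
        raise RuntimeError("SPARK_HOME is not a directory: %s" % spark_home)
    return spark_home


def start_local_spark(app_name="ipython-demo"):
    """Make pyspark importable via findspark, then create a local SparkContext.

    The function name, app name and local[*] master are illustrative choices.
    """
    check_spark_home()
    import findspark  # installed with `pip install findspark`

    findspark.init()  # uses SPARK_HOME to add Spark's Python libraries to sys.path
    import pyspark  # import only after findspark.init() has run

    return pyspark.SparkContext(master="local[*]", appName=app_name)


# In an IPython session (assumes Spark is unpacked and SPARK_HOME is set):
#   sc = start_local_spark()
#   sc.parallelize(range(100)).sum()   # sums the integers 0..99 on the local cluster
#   sc.stop()
```

Checking SPARK_HOME first is a deliberate choice: a forgotten `source ~/.bashrc` is the most likely thing to go wrong, and failing early gives a clearer message than an import error from pyspark.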
That’s it, go play with the