
Hive Installation using Spark Engine


Make sure the environment variables below exist in your `~/.bashrc` file. `JAVA_HOME` should point to your Java installation directory.

```bash
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export JRE_HOME=$JAVA_HOME/jre
export PATH=$PATH:$JAVA_HOME/bin

export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
export YARN_CONF_DIR=/usr/local/hadoop/etc/hadoop
export HADOOP_CLASSPATH=/usr/lib/jvm/java-8-openjdk-amd64/lib/tools.jar
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
```
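Before going further, it is worth confirming that the directories referenced above actually exist on your machine. A quick sanity check (not part of the original steps, assuming the paths above):

```bash
# Both commands should succeed if the paths in ~/.bashrc are correct
ls /usr/lib/jvm/java-8-openjdk-amd64/bin/java
ls /usr/local/hadoop/bin/hadoop
```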

```bash
# HIVE
export HIVE_HOME=/usr/lib/hive/apache-hive-2.3.0-bin
PATH=$PATH:$HIVE_HOME/bin
export HIVE_CONF_DIR=$HIVE_HOME/conf
export PATH
```

```bash
# SPARK
export SPARK_HOME=/usr/lib/spark/spark-2.2.0-bin-hadoop2.7
PATH=$PATH:$SPARK_HOME/bin
export PATH
export HADOOP_YARN_HOME=$HADOOP_HOME
export YARN_CONF_DIR=$HADOOP_CONF_DIR
```

Reload the environment variables:

```bash
source ~/.bashrc
```
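Once reloaded, the variables should resolve in the current shell. A quick way to confirm (assuming Hadoop and Hive are installed at the paths above):

```bash
# All four variables should print non-empty paths
echo $JAVA_HOME $HADOOP_HOME $HIVE_HOME $SPARK_HOME
# Both commands should print version information without errors
hadoop version
hive --version
```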

Link the Scala and Spark jars into Hive's `lib` folder:

```bash
cd $HIVE_HOME/lib
ln -s $SPARK_HOME/jars/scala-library*.jar
ln -s $SPARK_HOME/jars/spark-core*.jar
ln -s $SPARK_HOME/jars/spark-network-common*.jar
```
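To confirm the symlinks were created correctly (a small extra check, not part of the original steps):

```bash
# Each link should point into $SPARK_HOME/jars
ls -l $HIVE_HOME/lib | grep -E 'scala-library|spark-core|spark-network-common'
```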

Add the configuration below to `hive-site.xml` to use Spark as the execution engine instead of the default MapReduce:

```bash
vi $HIVE_HOME/conf/hive-site.xml
```

```xml
<property>
  <name>hive.execution.engine</name>
  <value>spark</value>
  <description>Use Spark instead of the default Map Reduce execution engine</description>
</property>
<property>
  <name>spark.master</name>
  <value>spark://localhost:7077</value>
</property>
<property>
  <name>spark.eventLog.enabled</name>
  <value>true</value>
</property>
<property>
  <name>spark.eventLog.dir</name>
  <value>/tmp</value>
</property>
<property>
  <name>spark.serializer</name>
  <value>org.apache.spark.serializer.KryoSerializer</value>
</property>
<property>
  <name>spark.yarn.jars</name>
  <value>hdfs://localhost:54310/spark-jars/*</value>
</property>
```
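After saving the file you can confirm the engine switch from the Hive CLI (assuming Hive is otherwise configured and on the `PATH`):

```bash
# Should print: hive.execution.engine=spark
hive -e "set hive.execution.engine;"
```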

Make sure the properties below exist in `yarn-site.xml`; if not, add them. These jar paths are needed when using Spark as the execution engine for Hive. I had to use absolute paths instead of environment variables in this configuration; for some reason environment variables did not work. Make sure these paths refer to your Hadoop installation directories.

```bash
vi $HADOOP_CONF_DIR/yarn-site.xml
```

```xml
<property>
  <name>yarn.application.classpath</name>
  <value>/usr/local/hadoop/share/hadoop/mapreduce/*,/usr/local/hadoop/share/hadoop/mapreduce/lib/*,/usr/local/hadoop/share/hadoop/hdfs/*,/usr/local/hadoop/share/hadoop/hdfs/lib/*,/usr/local/hadoop/share/hadoop/common/lib/*,/usr/local/hadoop/share/hadoop/common/*,/usr/local/hadoop/share/hadoop/yarn/lib/*,/usr/local/hadoop/share/hadoop/yarn/*</value>
</property>
<property>
  <name>mapreduce.application.classpath</name>
  <value>/usr/local/hadoop/share/hadoop/mapreduce/*,/usr/local/hadoop/share/hadoop/mapreduce/lib/*,/usr/local/hadoop/share/hadoop/hdfs/*,/usr/local/hadoop/share/hadoop/hdfs/lib/*,/usr/local/hadoop/share/hadoop/common/lib/*,/usr/local/hadoop/share/hadoop/common/*,/usr/local/hadoop/share/hadoop/yarn/lib/*,/usr/local/hadoop/share/hadoop/yarn/*</value>
</property>
```
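Since absolute paths are used here, typos are easy to miss. A small loop to flag any classpath directory that does not exist on your machine (a hypothetical helper, not part of the original page):

```bash
# Prints a warning for each missing Hadoop classpath directory
for d in mapreduce mapreduce/lib hdfs hdfs/lib common/lib common yarn/lib yarn; do
  [ -d "/usr/local/hadoop/share/hadoop/$d" ] || echo "missing: /usr/local/hadoop/share/hadoop/$d"
done
```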

Remove the old versions of the Hive jars from the Spark jars folder. Adjust this step according to the version of the Hive jars in your Spark folder. You can determine the version by listing the contents of `$SPARK_HOME/jars`:

```bash
ls $SPARK_HOME/jars/hive*.jar
```

In my case those jars had version 1.2.1, so remove them with the command below:

```bash
rm $SPARK_HOME/jars/hive*1.2.1*
```
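To confirm no old 1.2.1 jars remain (a small extra check, assuming the 1.2.1 naming above):

```bash
# Should print nothing if all old Hive jars were removed
ls $SPARK_HOME/jars | grep 'hive.*1\.2\.1'
```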

Run the command below to copy the new versions of the Hive jars into the Spark jars folder. These jars are necessary in order to run Hive with the new Spark engine:

```bash
cp $HIVE_HOME/lib/hive*.jar $SPARK_HOME/jars/
```
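You can verify the copy by listing the Hive jars now present in the Spark folder; the versions should match your Hive installation (2.3.0 here):

```bash
# The hive-* jars should now show the 2.3.0 version
ls $SPARK_HOME/jars/hive*.jar
```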

Run the commands below to copy the Spark jars to the `/spark-jars` folder on HDFS:

```bash
hadoop fs -mkdir /spark-jars
hadoop fs -put $SPARK_HOME/jars/*.jar /spark-jars/
```
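Finally, as an end-to-end smoke test (assuming HDFS and the Spark master are running; `test_table` is just a placeholder for any existing Hive table), run a query that forces a distributed job:

```bash
# Confirm the jars landed on HDFS
hadoop fs -ls /spark-jars | head

# A count(*) forces a distributed job; the console should show Spark job progress
hive -e "SELECT count(*) FROM test_table;"
```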
