How to call shell script into perl?

Can I run Hive queries from Shell Script?

  • I have a shell script I have written to pump all my logs into HDFS. After that I want to move my HDFS data into Hive tables. Can I add my Hive queries in the shell script itself? If so, please suggest me how to do it. I have read somewhere to write all queries in a file and call it. Would it make sense? Looking for a clear picture. Response is appreciated.

  • Answer:

    hive -e "query goes here"   hive -f  "path to file"   Both of above options can be executed from shellscript

Harsha Hulageri at Quora Visit the source

Was this solution helpful to you?

Other answers

U can easily go with hive -e "query;"this should surely satisfy ur need.

Yaswanth Sree Harsha

Excerpts from : MapR article. Feel free to post it. I am posting just from saved page.There are many great examples out there for using the Hive shell, as well as examples of ways to automate many of the http://doc.mapr.com/display/MapR/Ecosystem+Guide. However, if you’re http://doc.mapr.com/display/MapR/Getting+Started+with+Hive, or need something fast that won’t stay around long, then all you need to do is throw a few lines of code together with some existing programs in order to avoid re-inventing the workflow. In this blog post, I’ll share a few quick tips on using the Hive shell inside scripts. We’ll take a look at a simple script that needs to pull an item or count, and then look at two ways to use the http://doc.mapr.com/display/MapR/Hive shell to get an answer.All of the steps in this example were executed on Hive 0.13 on a http://doc.mapr.com/display/MapR/Planning+the+Cluster using CentOS.A Look at the Data SetupIn this setup, we have a data set that consists of travel times for specific traffic routes. We’ve pulled together travel times for days of the year, so when someone wants to know what to expect, we can quickly pull the info. In the past this was done on a traditional RDBMS, but we moved it off that in order to save on processing time and money, and are now using the https://www.mapr.com/. The table is an external table and is tab-delimited. There are more efficient ways to store the data, but we’ll cover those comparisons in a later post. The table is also partitioned by collection day. If you’re not familiar with a partition, it just means that when we enter data into our table, it is grouped into folders of records on the file system. The data in each folder in this case represents a day. CREATE EXTERNAL TABLE traffic_speeds ( `id` STRING, `collectiontime` string, `speed` STRING, `traveltime` STRING, `linkpoints` STRING) PARTITIONED BY ( `collectionday` string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE LOCATION 'maprfs:/mapr/my_cluster/data/traffic/trafficspeeds'; Here is a sample of some typical data: Ways to Use HiveIn this example, we’re going to use a simple Bash script to extract some values out of a Hive table using the Hive shell. The same methods can be used for just about any scripting language. In this case, we are going to take a few parameters as arguments and then execute them on Hive using the Hive shell. The results are captured as a variable and then echoed to standard out. There are two basic ways to pass things off to the Hive shell to execute. This script will walk through both of those options. The first option will consist of the Hive command running an external HQL script and returning a value. The second option will consist of passing all the needed commands to Hive at one time and returning a value.Option 1: Using Hive in Conjunction with an HQL ScriptOne way to use Hive inside a script is to have the Hive shell execute an HQL file. This is accomplished with the file option. hive -f my_script.hqlAn HQL script is just a series of Hive query language commands. They are the same ones you would use in the Hive shell.example.hql: show databases; show tables; set dfs.block.size=1073741824; select * from my_database.my_table limit 10; In order to make this a process that can be used more generically, Hive allows you to pass in variables using the -hiveconf option hive -hiveconf MY_VAR1=value1 -hiveconf MY_VAR2=value2 -f my_script.hqlInside the HQL file, it is used like this: show databases; show tables; set dfs.block.size=${hiveconf:MY_ VAR1}; select * from ${hiveconf:MY_ VAR2} limit 10; Operational messages are not typically passed back to the parent script and can be removed with the silent options -S. This will make things far less chatty. hive -S -hiveconf MY_VAR1=value1 -hiveconf MY_VAR2=value2 -f my_script.hqlPulling a value from the command will vary depending on your script language of choice, but one thing that will hold true for all of them is ensuring that only one Hive command that returns rows or values is returned. In the example above, we have three. If the data from the select statement is what I want to get cleanly, then I would remove the show commands from the HQL file. set dfs.block.size=${hiveconf:MY_ VAR1}; select * from ${hiveconf:MY_ VAR2} limit 10;An example of pulling a value using Bash looks something like this: my_value=`hive -S -hiveconf MY_VAR1=value1 -hiveconf MY_VAR2=value2 -f my_script.hql`Option 2: Passing Commands Directly to HiveAnother way to use Hive inside a script is to pass the HQL commands as a query string for the Hive shell to execute. This is accomplished with the -e option. hive -e "select * from my_database.my_table limit 10;"You can actually add several HQL commands in that string, which comes in handy when you have to specify a database because the next command only has a table option. An example is when you load or add a partition with the ALTER command. hive -e "USE my_database; alter table my_table add if not exists partition(my_partition=my_value);"As with the previous example, the only values returned are values pulled from the dataset. Hive operational messages may show up on the screen, but are not included in the response. You can remove that chatter by specifying silent mode again with the -S option. hive -S -e "USE my_database; alter table my_table add if not exists partition(my_partition=my_value); select * from my_table limit 10;" As I mentioned earlier, pulling a value from the command will vary depending on your script language of choice. The one thing that will hold true for all of them is ensuring that only one Hive command has rows or values that are returned. Another example of pulling a value using Bash looks something like this: my_value=$( hive -S -e "USE my_database; alter table my_table add if not exists partition(my_partition=my_value); select * from my_table limit 10;") An Example Script That Uses HiveLet’s take a look at a Bash script that runs either of the options mentioned above in a more generic fashion in order to pull average traffic transit times for a given station.Script OverviewThe script for the rest of the doc is called “hive_scripts.sh”. It allows for two operations: “pull_with_hql_script” or “pull_with_one_liner”. The basic flow is: ./hive_scripts.sh command [options]Example Using External Hive Files with an HQL ScriptLet’s take a look at using Hive HQL files in conjunction with a Bash script in order to ascertain a given station’s average traffic speed.Command argumentsThe arguments for “./http://hive_script.sh pull_with_hql_script” Example executionIn this example, the command line arguments are parsed, and passed to Hive to execute in our HQL script.Command: ./hive_scripts.sh pull_with_hql_script -c id=A2F36 -d default -t traffic_speed -r speedThe select statement is built to execute an average of the traffic times throughout the range specified. In this case, a single station id was selected. select="AVG($requested_column)"The Hive command is executed and the value is echoed to standard out. The variable Hive_script was hard coded, but could be changed to a passed object like all the other variables. hive_script="hive_script.hql" my_value=`hive -S -hiveconf MY_SELECT=$select -hiveconf MY_COMPARISON=$comparison -hiveconf MY_DATABASE=$database -hiveconf MY_TABLE=$table -f $hive_script` echo "returned value = $my_value" Example passing all commands to Hive directlyLet’s take a look at passing Hive a set of HQL commands as a string with a Bash script to find a given station’s average traffic speed:Command argumentsThe arguments for “./http://hive_script.sh pull_with_one_liner” Example executionIn this example, the command line arguments are parsed, and passed to Hive to execute as a string. Additionally, there is an example of checking to see if partitions exist and are loaded.Command: ./hive_scripts.sh pull_with_one_liner -c id=A2F36 -d default -t traffic_speed -r speed -b 20130621 -e 20130629The select statement is built to execute an average of the traffic times throughout the range specified. In this case, a single station id was selected. A date range was also passed and could be used to enhance the data selection and filtering process, but in this case it was used to load partitions for the given range if they were not found.As in the example above, the select statement is built for use in the command. select="AVG($requested_column)"The values are passed and used to build the query. The query is passed to Hive and executed. The value is stored as my_value and is echoed to standard out. # lets build the query we will execute in the hive shell my_query="set mapred.reduce.tasks=30;" my_query="$my_query SELECT $select" my_query="$my_query FROM (" my_query="$my_query SELECT DISTINCT collectiontime, id, collectionday, speed" my_query="$my_query FROM $database.$table" my_query="$my_query WHERE $comparison" my_query="$my_query ) t" my_query="$my_query GROUP by id, collectionday;" # echo the query passed to the hive shell just because echo "hive -S -e \"$my_query\"" my_value=$(hive -e "$my_query") echo "returned value = $my_value" http://hive_scripts.sh/ #!/bin/bash # hive_scripts.sh # this will print the usage statements and exit usage() { case $1 in "") echo "" echo "Usage: hive_scripts.sh command [options]" echo " for info on each command: --> hive_scripts.sh command -h|--help" echo "Commands:" echo " hive_scripts.sh pull_with_hql_script [options]" echo " hive_scripts.sh pull_with_one_liner [options]" echo "" echo "" ;; pull_with_hql_script) echo "" echo "Usage: hive_scripts.sh pull_with_hql_script [-h|--help]" echo "" echo " This is a quick example of using a hql script with a bash script to pull a value:" echo "" echo "Params:" echo " -c|--comparison comparison_statement: column=value" echo " -d|--database database_name" echo " -h|--help: print this help info and exit" echo " -r|--requested_column: column to get averaged value from" echo " -t|--table table_name" echo "Examples:" echo "" echo " ./hive_scripts.sh pull_with_hql_script -c a_column_name=a_value -d a_database_name -t a_table_name -r a_column_name" echo "" ;; pull_with_one_liner) echo "" echo "Usage: hive_scripts.sh pull_with_one_liner [-h|--help]" echo "" echo " This is a quick example of passing query strings directly to a hive shell with a bash script to pull a value:" echo "" echo "Params:" echo " -b|--begin_date yyyymmdd" echo " -c|--comparison comparison_statement: column=value" echo " -d|--database database_name" echo " -e|--end_date yyyymmdd" echo " -h|--help: print this help info and exit" echo " -r|--requested_column: column to get averaged value from" echo " -t|--table table_name" echo "Examples:" echo "" echo " ./hive_scripts.sh pull_with_one_liner -c a_column_name=a_value -d a_database_name -t a_table_name -r a_column_name -b 20120122 -e 20130122" echo "" ;; esac exit } # this will process command line arguments enough to get you to a specific function args() { echo "processing command request" case $1 in pull_with_hql_script) shift pull_with_hql_script $@ ;; pull_with_one_liner) shift pull_with_one_liner $@ ;; *) echo >&2 "Invalid comand: $1" usage ;; esac } pull_with_hql_script() { # init params requested_column="" database="" table="" comparison="" hive_script="hive_script.hql" select="" begin_date="" end_date="" # process args for this block while test $# -gt 0 do case $1 in -c|--comparison) shift comparison=$1 ;; -d|--database) shift database=$1 ;; -h|--help) usage pull_with_hql_script ;; -r|--requested_column) shift requested_column=$1 ;; -t|--table) shift table=$1 ;; *) echo >&2 "Invalid argument: $1" usage "pull_with_hql_script" ;; esac shift done # determine if any option is missing if [ x"$requested_column" == "x" ]; then echo "missing requested column: -r|--requested_column column_name_to_count" usage "pull_with_hql_script" fi if [ x"$database" == "x" ]; then echo "missing database name: -d|--database database_name" usage "pull_with_hql_script" fi if [ x"$table" == "x" ]; then echo "missing table name: -t|--table table_name" usage "pull_with_hql_script" fi if [ x"$comparison" == "x" ]; then echo "missing comparison clause WHERE : -c|--comparison comparison" usage "pull_with_hql_script" fi # set select statement select="AVG($requested_column)" # echo the command used to run the hive hql script just because echo "hive -S -hiveconf MY_SELECT=$select -hiveconf MY_COMPARISON=$comparison -hiveconf MY_DATABASE=$database -hiveconf MY_TABLE=$table -f $hive_script" my_value=`hive -S -hiveconf MY_SELECT=$select -hiveconf MY_COMPARISON=$comparison -hiveconf MY_DATABASE=$database -hiveconf MY_TABLE=$table -f $hive_script` echo "returned value = $my_value" exit } pull_with_one_liner() { # init params requested_column="" distinct="" database="" table="" comparison="" hive_script="hive_script.hql" select="" begin_date="" end_date="" load=true # process args for this block while test $# -gt 0 do case $1 in -b|--begin_date) shift begin_date=$1 ;; -c|--comparison) shift comparison=$1 ;; -d|--database) shift database=$1 ;; -e|--end_date) shift end_date=$1 ;; -h|--help) usage pull_with_one_liner ;; -r|--requested_column) shift requested_column=$1 ;; -t|--table) shift table=$1 ;; *) echo >&2 "Invalid argument: $1" usage "pull_with_one_liner" ;; esac shift done # determine if any option is missing if [ x"$requested_column" == "x" ]; then echo "missing requested column: -r|--requested_column column_name_to_count" usage "pull_with_one_liner" fi if [ x"$database" == "x" ]; then echo "missing database name: -d|--database database_name" usage "pull_with_one_liner" fi if [ x"$table" == "x" ]; then echo "missing table name: -t|--table table_name" usage "pull_with_one_liner" fi if [ x"$comparison" == "x" ]; then echo "missing comparison clause WHERE <comparison>: -c|--comparison comparison" usage "pull_with_one_liner" fi if [ x"$begin_date" == "x" ]; then echo "missing start date <2012-01-28>: -b|--begin_date yyyy-mm-dd" usage "pull_with_one_liner" fi if [ x"$end_date" == "x" ]; then echo "missing ending date <2014-02-17>: -b|--begin_date yyyy-mm-dd" usage "pull_with_one_liner" fi # get start date and end date in correct format begin_date=`date --date="$begin_date" '+%Y%m%d'` end_date=`date --date="$end_date" '+%Y%m%d'` # set select statement in affect overriding any set by argument select="avg($requested_column)" # before we query for information we may want to make sure we have all the partitions loaded? # in this case the table was external so there may be partitions that were not loaded # when the table was created # Here is an example of executing two commands with hive in a bash shell partitions=$(hive -e "USE $database; SHOW PARTITIONS $table") # loop through all the partitions and load if not present # in most case this is not needed. my_date=$begin_date while test $my_date -le $end_date do # Load partitions if [ "$load" = true ]; then # check if partition is present add_partition=true for partition in $partitions do my_timestamp=`date -d "$my_date" "+%s"` if [ "$partition" == "collectionday=$my_timestamp" ]; then add_partition=false; fi done if [ "$add_partition" = true ]; then # Here is an example of executing two statements in the hive shell echo "hive -e \"USE $database; alter table $table add if not exists partition(collectionday=$my_timestamp);\"" hive -e "USE $database; alter table $table add if not exists partition(collectionday=$my_timestamp);" fi fi my_date=`date --date="$my_date + 1 day" '+%Y%m%d'` # end of while loop done # lets build the query we will execute in the hive shell my_query="set mapred.reduce.tasks=30;" my_query="$my_query SELECT $select" my_query="$my_query FROM (" my_query="$my_query SELECT DISTINCT collectiontime, id, collectionday, speed" my_query="$my_query FROM $database.$table" my_query="$my_query WHERE $comparison" my_query="$my_query ) t" my_query="$my_query GROUP by id, collectionday;" # echo the query passed to the hive shell just because echo "hive -S -e \"$my_query\"" my_value=$(hive -e "$my_query") echo "returned value = $my_value" exit } # ------------------------------------------------------------------------------------- # Beginning of script execution # args $@ hive_script.hql set mapred.reduce.tasks=30; use ${hiveconf:MY_DATABASE}; SELECT ${hiveconf:MY_SELECT} FROM ( SELECT DISTINCT collectiontime, id, collectionday, speed FROM ${hiveconf:MY_TABLE} WHERE ${hiveconf:MY_COMPARISON}) t GROUP by id, collectionday; Using the Hive shell inside scripts is a great way to avoid re-inventing the workflow. In this blog post, I’ve showed you how to use Hive in conjunction with an HQL script, as well as how to pass commands directly to Hive.

Anand Mahajan

Aye, you can execute Hive queries from Shell Scripts, by using hive -e you can achieve this on shell prompt. to embed your Hive queries in Shell Scripts you need to write like this:**************************************************************#!/bin/bashhive -e "select * from bucketing_test" >select.txtresult=echo$?echo$?echo $?if [ $result -ne 0 ]; thenecho "Aye here we have a hive Error!!!!"echo "Hive error number is: $result"exit 1elseecho "no error, do your stuffs"fi******************************************************************* Hope this helps!!!!!!!!!!!!!

Shiv Kanaojiya

Related Q & A:

Just Added Q & A:

Find solution

For every problem there is a solution! Proved by Solucija.

  • Got an issue and looking for advice?

  • Ask Solucija to search every corner of the Web for help.

  • Get workable solutions and helpful tips in a moment.

Just ask Solucija about an issue you face and immediately get a list of ready solutions, answers and tips from other Internet users. We always provide the most suitable and complete answer to your question at the top, along with a few good alternatives below.