kmrrun

NAME

kmrrun - MapReduce command-line tool

SYNOPSIS

kmrrun -n m_num[:r_num] -m mapper [-k kvgenerator] [-r reducer]
[--ckpt] input

DESCRIPTION

kmrrun starts a MapReduce process whose mapper and reducer are user programs. The mapper and the reducer can each be a serial program or an MPI program. With kmrrun, users can easily execute MapReduce tasks as well as embarrassingly parallel tasks.

Users have to prepare three programs to run MapReduce using kmrrun: a mapper, a reducer, and a key-value pair generator. The key-value pair generator parses the output files of the mapper and generates the key-value pairs that are passed to the reducer.

OPTIONS

The following options are supported:

-n m_num[:r_num]

Specifies the number of processes used to run the mapper and the reducer. If r_num is omitted, the reducer runs with the same number of processes as the mapper. When the number is 1, the mapper/reducer program is assumed to be a serial program; when the number is greater than 1, the program is assumed to be an MPI program.

The default is 1 for both m_num and r_num.
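
The rules above can be illustrated with a short Python sketch. It is not kmrrun's own option parser; the function names parse_proc_spec and program_kind are hypothetical and exist only to restate the documented behaviour of the m_num[:r_num] value.

```python
def parse_proc_spec(spec="1"):
    """Split an 'm_num[:r_num]' string into (m_num, r_num).

    Hypothetical helper that mirrors the documented rules only:
    r_num falls back to m_num when it is omitted.
    """
    m_text, sep, r_text = spec.partition(":")
    m_num = int(m_text)
    r_num = int(r_text) if sep else m_num
    if m_num < 1 or r_num < 1:
        raise ValueError("process counts must be at least 1")
    return m_num, r_num


def program_kind(num_procs):
    """Return how a program run with num_procs processes is treated."""
    return "serial" if num_procs == 1 else "MPI"


if __name__ == "__main__":
    for spec in ("1", "4", "4:1"):
        m_num, r_num = parse_proc_spec(spec)
        print(spec, "-> mapper:", program_kind(m_num),
              "reducer:", program_kind(r_num))
```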

-m mapper

Specifies a mapper program. The program can take arguments, which are separated by whitespace.

Mapper specification: A mapper can be a serial program or an MPI program. It receives the name of the input file as its last parameter. If the output of the mapper needs to be passed to the reducer, the output should be written to a file whose name can be derived from the name of the mapper's input file. For example, if the input file to the mapper is "000", the output file can be named "000.dat".
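
A minimal sketch of a serial mapper that follows this specification, written in Python. The task (writing one lowercase word per line) and the function name run_mapper are assumptions chosen for illustration; only the handling of the last parameter and the ".dat" naming come from the description above.

```python
import sys


def run_mapper(input_path):
    """Write each word of input_path, lowercased, one per line.

    The output goes to '<input_path>.dat', following the "000" ->
    "000.dat" example. The word-splitting task is illustrative only.
    """
    output_path = input_path + ".dat"
    with open(input_path) as src, open(output_path, "w") as dst:
        for line in src:
            for word in line.split():
                dst.write(word.lower() + "\n")
    return output_path


if __name__ == "__main__" and len(sys.argv) > 1:
    # The input file name arrives as the last parameter,
    # after any user-supplied arguments.
    run_mapper(sys.argv[-1])
```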

-k kvgenerator

Specifies a key-value pair generator program. The program can take arguments, which are separated by whitespace.

KV generator specification: A key-value pair generator should be a serial program that reads the output files of a mapper. It receives the name of the mapper's input file as its last parameter. So if the mapper receives "000" as input and generates "000.dat" as output, the KV generator should construct the name of the mapper's output file ("000.dat") from the name of the mapper's input file ("000"). It should write key-value pairs to STDOUT, one per line in the form "key value\n", where the fields are separated by whitespace.
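
A minimal sketch of a KV generator in Python, assuming the mapper wrote one word per line to "<input>.dat". The function name generate_pairs and the choice of emitting a count of 1 for every word are illustrative assumptions; the file-name reconstruction and the "key value" output format follow the specification above.

```python
import sys


def generate_pairs(mapper_input_path):
    """Return 'key value' lines built from the mapper's output file.

    The mapper's output name is reconstructed from the mapper's input
    name ("000" -> "000.dat"). Each non-empty line is treated as a key
    and paired with the value 1 (an illustrative choice).
    """
    mapper_output_path = mapper_input_path + ".dat"
    pairs = []
    with open(mapper_output_path) as src:
        for line in src:
            word = line.strip()
            if word:
                pairs.append(f"{word} 1")
    return pairs


if __name__ == "__main__" and len(sys.argv) > 1:
    # The mapper's input file name arrives as the last parameter.
    for pair in generate_pairs(sys.argv[-1]):
        print(pair)  # one "key value" pair per line on STDOUT
```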

-r reducer

Specifies a reducer program. The program can take arguments, which are separated by whitespace.

Reducer specification: A reducer can be a serial program or an MPI program. It receives the name of its input file as its last parameter. The name of the input file is "key" and its content is a sequence of lines of the form "key value\n", where the fields are separated by whitespace.
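
A minimal sketch of a serial reducer in Python that sums integer values from its input file. The function name reduce_file, the assumption that values are integers, and the decision to print the result are all illustrative; this page does not specify what a reducer should do with its output. The sketch reads the key from the file content rather than from the file name.

```python
import sys


def reduce_file(path):
    """Sum the values of all 'key value' lines in path.

    Returns (key, total). Values are assumed to be integers, which is
    an illustrative choice; lines without two fields are skipped.
    """
    key = None
    total = 0
    with open(path) as src:
        for line in src:
            fields = line.split(None, 1)
            if len(fields) != 2:
                continue
            key = fields[0]
            total += int(fields[1])
    return key, total


if __name__ == "__main__" and len(sys.argv) > 1:
    # The input file name arrives as the last parameter.
    key, total = reduce_file(sys.argv[-1])
    print(key, total)
```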

--ckpt

Enables checkpoint/restart.