KMR
kmrrun.c File Reference

kmrrun is the command-line version of KMR. It runs a MapReduce program whose mapper and reducer are user-specified programs.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdarg.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/param.h>
#include <dirent.h>
#include <unistd.h>
#include <getopt.h>
#include <time.h>
#include <errno.h>
#include <mpi.h>
#include "kmr.h"


Classes

struct  cmdinfo
 

Macros

#define ARGSIZ   8
 
#define ARGSTRLEN   (8 * 1024)
 
#define DEFAULT_PROCS   1
 
#define LINELEN   32767
 
#define PATHLEN   1024
 
#define TMPDIR_PREFIX   "./KMRRUN_TMP"
 

Functions

static int add_command_kv (KMR_KVS *, int, char **, char *, int)
 
static void create_tmpdir (KMR *, char *, size_t)
 
static int delete_file (const struct kmr_kv_box, const KMR_KVS *, KMR_KVS *, void *, long)
 
static void delete_tmpdir (KMR *, char *)
 
static int generate_mapcmd_kvs (const struct kmr_kv_box, const KMR_KVS *, KMR_KVS *, void *, long)
 
static int generate_redcmd_kvs (const struct kmr_kv_box, const KMR_KVS *, KMR_KVS *, void *, long)
 
static void kmrrun_abort (int, const char *,...)
 
int main (int argc, char *argv[])
 
static void parse_args (char *, char *[])
 
static int run_kv_generator (const struct kmr_kv_box, const KMR_KVS *, KMR_KVS *, void *, long)
 
static int write_kvs (const struct kmr_kv_box[], const long, const KMR_KVS *, KMR_KVS *, void *)
 

Detailed Description

kmrrun is the command-line version of KMR. It runs a MapReduce program whose mapper and reducer are user-specified programs.

The mapper and the reducer can each be either a serial program or an MPI program.

When kmrrun runs a MapReduce program, the user should also specify a simple program that generates key-value pairs from the output of the mapper. This key-value generator is specified with the '-k' option; it reads the output of the mapper and writes key-value pairs to the standard output. After the key-value pairs are shuffled, they are written on each rank to text files named after the key, in which each line holds one key-value pair separated by a space. The file is passed to the reducer as its last parameter.
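The line layout described above (a key and a value separated by a space) can be sketched with two small C helpers: one that a key-value generator might use to write a pair, and one that a reducer might use to read a line back from the file it receives as its last parameter. This is only an illustrative sketch; the names emit_kv and parse_kv_line are hypothetical and not part of kmrrun or KMR, and it assumes that keys contain no spaces, so each line is split at its first space.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write one "key value" line to the given stream.
   Returns 0 on success, -1 on a write error. */
int emit_kv(FILE *out, const char *key, const char *value)
{
    assert(out && key && value);
    return (fprintf(out, "%s %s\n", key, value) < 0) ? -1 : 0;
}

/* Hypothetical helper: split one line at its first space into key and value.
   Assumption: keys contain no spaces. A trailing newline is dropped.
   Returns 0 on success, -1 if the line has no space, has an empty key,
   or does not fit in the supplied buffers. */
int parse_kv_line(const char *line, char *key, size_t keysz,
                  char *value, size_t valsz)
{
    assert(line && key && value);
    const char *sp = strchr(line, ' ');
    if (sp == NULL)
        return -1;
    size_t klen = (size_t)(sp - line);
    size_t vlen = strcspn(sp + 1, "\r\n");
    if (klen == 0 || klen >= keysz || vlen >= valsz)
        return -1;
    memcpy(key, line, klen);
    key[klen] = '\0';
    memcpy(value, sp + 1, vlen);
    value[vlen] = '\0';
    return 0;
}
```

A reducer written along these lines would open the file named by its last argument, read it line by line with fgets, and call parse_kv_line on each line.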

kmrrun can also run a map-only MapReduce, in which no reducer is run. This is useful for running multiple independent tasks as a single job.

Options

Usage

$ mpiexec -machinefile machines -n 4 \
./kmrrun -n m_num[:r_num] -m mapper [-k kvgenerator] [-r reducer] [--ckpt]\
inputfile

Examples

e.g.1) Run serial mapper and reducer
$ mpirun -np 2 ./kmrrun -m "./pi.mapper" -k "./pi.kvgen.sh" -r "./pi.reducer" ./work
e.g.2) Run MPI mapper and MPI reducer with 2 MPI processes each.
$ mpirun -np 2 ./kmrrun -n 2 -m "./mpi_pi.mapper" -k "./mpi_pi.kvgen.sh" -r "./mpi_pi.reducer" ./work
e.g.3) Run MPI mapper with 2 MPI processes and serial reducer
$ mpirun -np 2 ./kmrrun -n 2:1 -m "./mpi_pi.mapper" -k "./mpi_pi.kvgen.sh" -r "./pi.reducer" ./work
e.g.4) Only run MPI mapper with 2 MPI processes
$ mpirun -np 2 ./kmrrun -n 2 -m "./mpi_pi.mapper" ./work
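The '-n m_num[:r_num]' values used in these examples can be read as follows: a single number applies to both the mapper and the reducer (e.g.2), while 'm:r' sets them separately (e.g.3). The C sketch below shows one way such an argument could be parsed. It is an assumption-laden illustration, not the code in kmrrun.c; the function name parse_procs and its error convention are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: parse "m_num" or "m_num:r_num".
   Assumption (taken from the examples): when r_num is omitted,
   the reducer uses the same process count as the mapper.
   Returns 0 on success, -1 on malformed or non-positive input. */
int parse_procs(const char *arg, int *m_num, int *r_num)
{
    assert(arg && m_num && r_num);
    char *end;

    errno = 0;
    long m = strtol(arg, &end, 10);
    if (end == arg || errno != 0 || m < 1 || m > INT_MAX)
        return -1;

    long r = m;                     /* default: same count as the mapper */
    if (*end == ':') {
        const char *rs = end + 1;
        errno = 0;
        r = strtol(rs, &end, 10);
        if (end == rs || errno != 0 || r < 1 || r > INT_MAX)
            return -1;
    }
    if (*end != '\0')               /* reject trailing characters */
        return -1;

    *m_num = (int)m;
    *r_num = (int)r;
    return 0;
}
```

With this sketch, "2" yields two processes for both mapper and reducer, and "2:1" yields two for the mapper and one for the reducer.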

Definition in file kmrrun.c.