KMR
kmr_ctx Struct Reference

KMR Context.

#include <kmr.h>

Public Attributes

int atoa_requests_limit
 
long atoa_size_limit
 
long atoa_threshold
 
struct kmr_code_line * atwork
 
struct kmr_ckpt_ctx * ckpt_ctx
 
_Bool ckpt_enable: 1
 
long ckpt_kvs_id_counter
 
_Bool ckpt_no_fsync: 1
 
_Bool ckpt_selective: 1
 
MPI_Comm comm
 
MPI_Info conf
 
_Bool file_io_always_alltoallv: 1
 
long file_io_block_size
 
_Bool file_io_dummy_striping: 1
 
char identifying_name [KMR_JOB_NAME_LEN]
 
_Bool keep_fds_at_fork: 1
 
char * kmr_installation_path
 
_Bool kmrviz_trace: 1
 
struct kmr_kvs_list_head kvses
 
struct kmr_trace * kvt_ctx
 
FILE * log_traces
 
size_t malloc_overhead
 
_Bool map_ms_abort_on_signal: 1
 
_Bool map_ms_use_exec: 1
 
long mapper_park_size
 
_Bool mpi_thread_support: 1
 
int nprocs
 
_Bool one_step_sort: 1
 
_Bool onk: 1
 
size_t preset_block_size
 
size_t pushoff_block_size
 
_Bool pushoff_fast_notice: 1
 
_Bool pushoff_hang_out: 1
 
int pushoff_poll_rate
 
_Bool pushoff_stat: 1
 
struct {
   long   counts [10]
   double   times [4]
} pushoff_statistics
 
int rank
 
int rlimit_nofile
 
void * simple_workflow
 
_Bool single_thread: 1
 
long sort_sample_factor
 
int sort_threads_depth
 
long sort_threshold
 
long sort_trivial
 
MPI_Comm ** spawn_comms
 
_Bool spawn_disconnect_but_free: 1
 
_Bool spawn_disconnect_early: 1
 
int spawn_gap_msec [2]
 
int spawn_max_processes
 
_Bool spawn_pass_intercomm_in_argument: 1
 
int spawn_retry_gap_msec
 
int spawn_retry_limit
 
MPI_Comm spawn_self
 
int spawn_size
 
_Bool spawn_sync_at_startup: 1
 
int spawn_watch_accept_onhold_msec
 
int spawn_watch_af
 
_Bool spawn_watch_all: 1
 
char * spawn_watch_host_name
 
int spawn_watch_port_range [2]
 
char * spawn_watch_prefix
 
char * spawn_watch_program
 
_Bool std_abort: 1
 
_Bool step_sync: 1
 
_Bool stop_at_some_check_globally: 1
 
size_t swf_args_size
 
_Bool swf_debug_master
 
_Bool swf_exec_so
 
_Bool swf_record_history
 
char * swf_spawner_library
 
void * swf_spawner_so
 
_Bool trace_alltoall: 1
 
_Bool trace_file_io: 1
 
_Bool trace_iolb: 1
 
_Bool trace_kmrdp: 1
 
_Bool trace_map_ms: 1
 
_Bool trace_map_spawn: 1
 
_Bool trace_sorting: 1
 
uint8_t verbosity
 

Detailed Description

KMR Context.

Structure KMR is a common record of key-value streams. It records a few internal states and many options.

KVSES is a linked-list recording all active key-value streams. It is used to warn about unfreed key-value streams.
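
The bookkeeping described above can be pictured with a minimal sketch. It is not KMR's code: the types `stream` and `stream_list` and the three functions are hypothetical stand-ins, chosen only to show how a doubly linked list of live streams lets a finalizer find and warn about the ones that were never freed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for illustration; these are not the KMR types. */
struct stream {
    struct stream *prev, *next;
    int id;
};

struct stream_list {
    struct stream *head;
};

/* Links a newly created stream at the front of the active list. */
static void list_register(struct stream_list *l, struct stream *s)
{
    s->prev = NULL;
    s->next = l->head;
    if (l->head != NULL) {
        l->head->prev = s;
    }
    l->head = s;
}

/* Unlinks a stream from the active list when it is freed. */
static void list_unregister(struct stream_list *l, struct stream *s)
{
    if (s->prev != NULL) {
        s->prev->next = s->next;
    } else {
        assert(l->head == s);
        l->head = s->next;
    }
    if (s->next != NULL) {
        s->next->prev = s->prev;
    }
    s->prev = s->next = NULL;
}

/* Warns about every stream still linked; returns the number of leaks. */
static int list_warn_unfreed(const struct stream_list *l, FILE *out)
{
    int n = 0;
    for (const struct stream *s = l->head; s != NULL; s = s->next) {
        fprintf(out, "warning: stream %d was not freed\n", s->id);
        n++;
    }
    return n;
}
```

Calling `list_warn_unfreed()` at finalization reports whatever is still on the list. A doubly linked list is used in the sketch so that unregistering a stream takes constant time.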

CKPT_KVS_ID_COUNTER and CKPT_CTX record checkpointing states.

LOG_TRACES is a file stream that, when non-null, records the time taken by each call to map/reduce-functions. Note that trace routines call MPI_Wtime() in OMP parallel regions (although it may be non-threaded). ATWORK indicates the caller of the current mapping or reducing work (or null if it is not associated), and is used in logging traces.
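
As an illustration of tracing gated by a nullable stream, the sketch below times one call and writes a line only when a log stream is supplied. It is an assumption-laden stand-in: `traced_call`, `work_fn`, and `count_call` are hypothetical names, and C11 `timespec_get()` replaces MPI_Wtime() so that the example needs no MPI.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical callback type standing in for a map/reduce-function call. */
typedef void (*work_fn)(void *arg);

/* Wall-clock time in seconds (C11 timespec_get stands in for MPI_Wtime). */
static double wall_time(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Runs fn(arg).  When log_traces is non-null, the elapsed time is written
   to it.  Returns the number of characters logged (0 when tracing is off). */
static int traced_call(FILE *log_traces, const char *label,
                       work_fn fn, void *arg)
{
    assert(fn != NULL && label != NULL);
    if (log_traces == NULL) {
        fn(arg);
        return 0;
    }
    double t0 = wall_time();
    fn(arg);
    double t1 = wall_time();
    return fprintf(log_traces, "%s: %.6f sec\n", label, t1 - t0);
}

/* Example work for demonstration: counts how often it was called. */
static void count_call(void *arg)
{
    (*(int *)arg)++;
}
```

The work is performed either way; only the timing and the output depend on the stream being set.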

SPAWN_SIZE and SPAWN_COMMS temporarily hold an array of inter-communicators for kmr_map_via_spawn(), so that a communicator can be obtained by kmr_get_spawner_communicator() in a map-function.

MAPPER_PARK_SIZE is the number of entries pooled before calling a map-function. Entries are aggregated so that a map-function can be called with threads. PRESET_BLOCK_SIZE is the default allocation size of a buffer of key-values. It is used as the block size of key-value streams after being trimmed by the amount of the malloc overhead. MALLOC_OVERHEAD (usually the size of one pointer) is subtracted from the allocation size to keep a good alignment boundary.
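
The relation between the preset size and the block size actually used can be written out as a small sketch. The helper names `trimmed_block_size` and `alloc_block` are hypothetical, and the sketch assumes the trimming is a plain subtraction, as the description suggests.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: block size left once the allocator's bookkeeping
   overhead is subtracted from the preset allocation size. */
static size_t trimmed_block_size(size_t preset_block_size,
                                 size_t malloc_overhead)
{
    assert(malloc_overhead < preset_block_size);
    return preset_block_size - malloc_overhead;
}

/* Hypothetical helper: allocates one block of the trimmed size and stores
   that size in *usable.  The request plus the allocator's own overhead
   then adds up to about the preset size. */
static void *alloc_block(size_t preset_block_size, size_t malloc_overhead,
                         size_t *usable)
{
    size_t size = trimmed_block_size(preset_block_size, malloc_overhead);
    void *p = malloc(size);
    if (p != NULL && usable != NULL) {
        *usable = size;
    }
    return p;
}
```

For example, with an illustrative preset of 4096 bytes and an overhead of `sizeof(void *)`, the buffer requested from malloc() is one pointer smaller than 4096 bytes.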

ATOA_THRESHOLD selects the algorithm of all-to-all-v communication according to the sizes of messages (set to zero to use all-to-all-v of MPI).

ATOA_SIZE_LIMIT is normally 0. It is mainly for tests. It lowers the data-size limit for using MPI all-to-all-v from 16GB to the specified value. When the data size exceeds the value, a naive method using isend/irecv is used instead of MPI all-to-all-v.

ATOA_REQUESTS_LIMIT is the limit on the number of pending isend/irecv requests in the naive all-to-all-v algorithm, which is used when the data size exceeds 16GB. It is normally 0, which sets it to 4096.
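
The defaulting rules for ATOA_SIZE_LIMIT and ATOA_REQUESTS_LIMIT can be sketched as follows. All names here are hypothetical, the sketch takes 16GB to mean 2^34 bytes (an assumption), and it does not model ATOA_THRESHOLD.

```c
#include <assert.h>

/* Assumed value of the built-in 16GB limit (taken here as 2^34 bytes). */
#define DEFAULT_ATOA_SIZE_LIMIT (16LL * 1024 * 1024 * 1024)
/* Value that a zero ATOA_REQUESTS_LIMIT stands for, per the text above. */
#define DEFAULT_ATOA_REQUESTS_LIMIT 4096

enum atoa_method { ATOA_MPI_ALLTOALLV, ATOA_NAIVE_ISEND_IRECV };

/* Zero means "use the built-in limit". */
static long long effective_size_limit(long long atoa_size_limit)
{
    return (atoa_size_limit == 0) ? DEFAULT_ATOA_SIZE_LIMIT : atoa_size_limit;
}

/* Zero means "use 4096 pending requests". */
static int effective_requests_limit(int atoa_requests_limit)
{
    return (atoa_requests_limit == 0)
        ? DEFAULT_ATOA_REQUESTS_LIMIT : atoa_requests_limit;
}

/* Data larger than the limit goes through the naive isend/irecv method. */
static enum atoa_method choose_atoa_method(long long data_size,
                                           long long atoa_size_limit)
{
    assert(data_size >= 0 && atoa_size_limit >= 0);
    return (data_size > effective_size_limit(atoa_size_limit))
        ? ATOA_NAIVE_ISEND_IRECV : ATOA_MPI_ALLTOALLV;
}
```

Setting a small size limit, as a test might, forces the naive path for modest data sizes without having to move 16GB.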

SORT_TRIVIAL determines the sorter to run on a single node when the data size is this value or smaller. SORT_THRESHOLD determines whether the sorter uses full sampling of a sampling-sort, or pseudo sampling when the data size is small. SORT_SAMPLE_FACTOR determines the number of samples of a sampling-sort. SORT_THREADS_DEPTH controls the local sorter. The quick-sort uses OpenMP threads until the recursion depth reaches this value (set to zero for a sequential run).

FILE_IO_BLOCK_SIZE is a block size of file reading, used when the striping information is not available.

PUSHOFF_BLOCK_SIZE is the block size of a push-off key-value stream. It is a communication block size and should be equal on all ranks. PUSHOFF_POLL_RATE gives a hint for the polling interval of a push-off key-value stream.

KMR_INSTALLATION_PATH records the installation path, which is taken from the configuration.

SPAWN_WATCH_PROGRAM is the watch-program name, which is used when spawning processes that do not communicate with the parent. The variable is a file-path, which may be set in advance or may be set to the location where the watch-program is copied (usually in the user's home directory). SPAWN_WATCH_PREFIX is a location where the watch-program is to be installed (instead of the home directory). SPAWN_WATCH_HOST_NAME is the host name of a spawner. It may be set when there is difficulty in connecting a socket. SPAWN_WATCH_AF is 0, 4, or 6, as the preferred IP address format used by the watch-program. SPAWN_WATCH_PORT_RANGE[2] is the range of IP port numbers used by the watch-program (values are inclusive).

SPAWN_MAX_PROCESSES limits the number of processes spawned simultaneously, without regard to the universe size. SPAWN_GAP_MSEC[2] is the time given between spawning calls, which the MPI runtime needs to clean up its resource management. The value is scaled to the log of the universe size, with the 1st value corresponding to 0 processes and the 2nd value to 1,000 processes (the default is 1 second for one process and 10 seconds for 1,000 processes).
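
One way to read the log scaling of SPAWN_GAP_MSEC is as a linear interpolation in the logarithm of the universe size. The formula below is a sketch under that assumption, not a statement of KMR's exact computation: g_1 and g_2 denote the two array values, n is the universe size, and the first value is taken to apply at n = 1 (where the logarithm is zero), which matches the stated default.

```latex
\mathrm{gap}(n) = g_1 + (g_2 - g_1)\,\frac{\log n}{\log 1000},
\qquad 1 \le n \le 1000 .
```

With the defaults g_1 = 1000 msec and g_2 = 10000 msec, this gives 1000 msec at n = 1, 10000 msec at n = 1000, and 1000 + 9000 * (2/3) = 7000 msec at n = 100.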

SPAWN_SELF holds the communicator used in spawning. KMR retries MPI_Comm_spawn() because it can fail due to a race between issuing the call and a delay in job scheduling. SPAWN_RETRY_LIMIT and SPAWN_RETRY_GAP_MSEC control retries of MPI_Comm_spawn(). It retries MPI_Comm_spawn() up to SPAWN_RETRY_LIMIT times, sleeping for SPAWN_RETRY_GAP_MSEC in between (300 seconds in total by default).
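
The retry policy can be sketched as a loop around one spawn attempt. This is an illustration only: `spawn_attempt_fn`, `spawn_with_retries`, and `fail_n_times` are hypothetical names, a callback stands in for MPI_Comm_spawn(), and the sketch assumes one initial attempt followed by at most `retry_limit` retries.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical callback standing in for one MPI_Comm_spawn() attempt;
   returns 0 on success and nonzero on failure. */
typedef int (*spawn_attempt_fn)(void *arg);

/* Sleeps for the given number of milliseconds (POSIX nanosleep). */
static void sleep_msec(int msec)
{
    struct timespec ts;
    ts.tv_sec = msec / 1000;
    ts.tv_nsec = (long)(msec % 1000) * 1000000L;
    nanosleep(&ts, NULL);
}

/* Tries once, then retries up to retry_limit times with a sleep of
   retry_gap_msec before each retry.  Returns the status of the last
   attempt and stores the number of attempts in *attempts_made. */
static int spawn_with_retries(spawn_attempt_fn attempt, void *arg,
                              int retry_limit, int retry_gap_msec,
                              int *attempts_made)
{
    assert(attempt != NULL && retry_limit >= 0 && retry_gap_msec >= 0);
    int status = attempt(arg);
    int attempts = 1;
    for (int retry = 0; status != 0 && retry < retry_limit; retry++) {
        sleep_msec(retry_gap_msec);
        status = attempt(arg);
        attempts++;
    }
    if (attempts_made != NULL) {
        *attempts_made = attempts;
    }
    return status;
}

/* Example attempt for demonstration: fails while the counter is positive. */
static int fail_n_times(void *arg)
{
    int *remaining = arg;
    if (*remaining > 0) {
        (*remaining)--;
        return 1;
    }
    return 0;
}
```

Under this reading, the total time spent sleeping is at most the retry limit multiplied by the gap, which is how a single total such as the 300 seconds quoted above would arise.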

SPAWN_WATCH_ACCEPT_ONHOLD_MSEC is the time given to wait for the watch-program to connect back by a socket.

VERBOSITY is the verbosity of warning messages; default 5 is good for typical use.

ONK enables the features on K or FX10. SINGLE_THREAD implies the nothreading option for mapper/shuffler/reducer. ONE_STEP_SORT disables a prior sorting step, which sorts on (packed/hashed) integer keys in local sorting. STEP_SYNC calls a barrier at each operation step for debugging. MPI_THREAD_SUPPORT records the thread support level.

TRACE_FILE_IO, TRACE_MAP_MS, and TRACE_MAP_SPAWN enable dumping trace output for debugging. (TRACE_ALLTOALL enables dumping trace output on communication for debugging internals). TRACE_KMRDP enables dumping timing information of runs of KMR-DP. KMRVIZ_TRACE enables tracing kmr function calls for KMRViz. STD_ABORT makes KMR use abort() instead of MPI_Abort() on errors, so that cores are dumped on some MPI implementations. (FILE_IO_DUMMY_STRIPING is for debugging internals, and assigns dummy striping information on non-Lustre file-systems). (FILE_IO_ALWAYS_ALLTOALLV is for debugging internals).

MAP_MS_USE_EXEC forces KMR to use fork-execing instead of system(3C) to start a subprocess in kmr_map_ms_commands(). KMR also uses fork-execing when command strings include null characters (not at the end). MAP_MS_ABORT_ON_SIGNAL makes KMR abort when a subprocess is killed in kmr_map_ms_commands(). SPAWN_DISCONNECT_EARLY (useless) lets the spawner free the inter-communicator immediately after spawning. SPAWN_DISCONNECT_BUT_FREE lets the spawner use MPI_Comm_disconnect() instead of MPI_Comm_free() (it is only used with buggy Intel MPI (4.x)). (SPAWN_PASS_INTERCOMM_IN_ARGUMENT changes the behavior to the old API).

CKPT_ENABLE enables checkpointing. CKPT_SELECTIVE lets users specify which kmr functions take ckpt files of the output key-value stream. To take ckpt files with this option enabled, users should enable the TAKE_CKPT option when calling a kmr function. CKPT_NO_FSYNC suppresses the fsync syscall when writing ckpt files. Both CKPT_SELECTIVE and CKPT_NO_FSYNC should be specified together with CKPT_ENABLE.

STOP_AT_SOME_CHECK_GLOBALLY forces global checking of the stop-at-some state in mapping (not implemented). Mapping with stop-at-some should be stopped when some key-value is added on any rank, but the check is performed only locally by default.

PUSHOFF_HANG_OUT makes communication of push-off continue after mapping/reducing finishes. PUSHOFF_FAST_NOTICE enables use of RDMA put for event notification in push-off key-value streams. PUSHOFF_STAT enables collecting statistics of communication in push-off key-value streams. IDENTIFYING_NAME is just a note.
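
To make the MAP_MS_USE_EXEC distinction concrete, the sketch below contrasts the two ways of starting a subprocess on a POSIX system. It is not taken from kmr_map_ms_commands(): `run_by_fork_exec` and `run_by_system` are hypothetical helpers, and returning -1 for a child killed by a signal is a choice made for this example (the case MAP_MS_ABORT_ON_SIGNAL is concerned with).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Starts argv[0] with its arguments by fork and exec, without a shell
   parsing the command.  Returns the exit status, or -1 on failure or
   when the child was killed by a signal. */
static int run_by_fork_exec(char *const argv[])
{
    assert(argv != NULL && argv[0] != NULL);
    pid_t pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);  /* Reached only when exec itself failed. */
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) {
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Starts a command line through the shell with system(). */
static int run_by_system(const char *command)
{
    assert(command != NULL);
    int status = system(command);
    if (status == -1) {
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With fork and exec the arguments are passed as separate strings, whereas system() hands one command line to the shell for parsing.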

Definition at line 247 of file kmr.h.


The documentation for this struct was generated from the following file: