KMR
KMR Context.
#include <kmr.h>
Public Attributes
int | atoa_requests_limit |
long | atoa_size_limit |
long | atoa_threshold |
struct kmr_code_line * | atwork |
struct kmr_ckpt_ctx * | ckpt_ctx |
_Bool | ckpt_enable: 1 |
long | ckpt_kvs_id_counter |
_Bool | ckpt_no_fsync: 1 |
_Bool | ckpt_selective: 1 |
MPI_Comm | comm |
MPI_Info | conf |
_Bool | file_io_always_alltoallv: 1 |
long | file_io_block_size |
_Bool | file_io_dummy_striping: 1 |
char | identifying_name [KMR_JOB_NAME_LEN] |
_Bool | keep_fds_at_fork: 1 |
char * | kmr_installation_path |
_Bool | kmrviz_trace: 1 |
struct kmr_kvs_list_head | kvses |
struct kmr_trace * | kvt_ctx |
FILE * | log_traces |
size_t | malloc_overhead |
_Bool | map_ms_abort_on_signal: 1 |
_Bool | map_ms_use_exec: 1 |
long | mapper_park_size |
_Bool | mpi_thread_support: 1 |
int | nprocs |
_Bool | one_step_sort: 1 |
_Bool | onk: 1 |
size_t | preset_block_size |
size_t | pushoff_block_size |
_Bool | pushoff_fast_notice: 1 |
_Bool | pushoff_hang_out: 1 |
int | pushoff_poll_rate |
_Bool | pushoff_stat: 1 |
struct { long counts [10]; double times [4]; } | pushoff_statistics |
int | rank |
int | rlimit_nofile |
void * | simple_workflow |
_Bool | single_thread: 1 |
long | sort_sample_factor |
int | sort_threads_depth |
long | sort_threshold |
long | sort_trivial |
MPI_Comm ** | spawn_comms |
_Bool | spawn_disconnect_but_free: 1 |
_Bool | spawn_disconnect_early: 1 |
int | spawn_gap_msec [2] |
int | spawn_max_processes |
_Bool | spawn_pass_intercomm_in_argument: 1 |
int | spawn_retry_gap_msec |
int | spawn_retry_limit |
MPI_Comm | spawn_self |
int | spawn_size |
_Bool | spawn_sync_at_startup: 1 |
int | spawn_watch_accept_onhold_msec |
int | spawn_watch_af |
_Bool | spawn_watch_all: 1 |
char * | spawn_watch_host_name |
int | spawn_watch_port_range [2] |
char * | spawn_watch_prefix |
char * | spawn_watch_program |
_Bool | std_abort: 1 |
_Bool | step_sync: 1 |
_Bool | stop_at_some_check_globally: 1 |
size_t | swf_args_size |
_Bool | swf_debug_master |
_Bool | swf_exec_so |
_Bool | swf_record_history |
char * | swf_spawner_library |
void * | swf_spawner_so |
_Bool | trace_alltoall: 1 |
_Bool | trace_file_io: 1 |
_Bool | trace_iolb: 1 |
_Bool | trace_kmrdp: 1 |
_Bool | trace_map_ms: 1 |
_Bool | trace_map_spawn: 1 |
_Bool | trace_sorting: 1 |
uint8_t | verbosity |
KMR Context.
Structure KMR is a common record of key-value streams. It records a few internal states and many options.
KVSES is a linked-list recording all active key-value streams. It is used to warn about unfreed key-value streams.
CKPT_KVS_ID_COUNTER and CKPT_CTX record checkpointing states.
LOG_TRACES is a file stream; when it is non-null, the time taken by each call to a map-function or reduce-function is recorded to it. Note that the trace routines call MPI_Wtime() in OMP parallel regions (although MPI may be running non-threaded). ATWORK indicates the caller of the current mapping or reducing work (or is null if the work is not associated with a caller); it is used in logging traces.
SPAWN_SIZE and SPAWN_COMMS temporarily hold an array of inter-communicators for kmr_map_via_spawn(), so that a communicator can be obtained by kmr_get_spawner_communicator() in a map-function.
MAPPER_PARK_SIZE is the number of entries pooled before calling a map-function. Entries are aggregated so that the map-function can be called with threads. PRESET_BLOCK_SIZE is the default allocation size of a buffer of key-values. It is used as the block size of key-value streams after being trimmed by the amount of the malloc overhead. MALLOC_OVERHEAD (usually the size of one pointer) is subtracted from the allocation size to keep a good alignment boundary.
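The trimming of PRESET_BLOCK_SIZE by MALLOC_OVERHEAD can be pictured with a small sketch. The function name usable_block_size is hypothetical and is not part of KMR; it only illustrates the arithmetic described above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not a KMR function: the block size left for a
   key-value stream after the malloc overhead is taken off the preset
   allocation size. */
size_t usable_block_size(size_t preset_block_size, size_t malloc_overhead)
{
    /* The overhead must be smaller than the allocation itself. */
    assert(malloc_overhead < preset_block_size);
    return preset_block_size - malloc_overhead;
}
```

With an overhead of one pointer, a call would look like usable_block_size(preset, sizeof(void *)), so that the block plus the allocator's bookkeeping is intended to fit in the preset size.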
ATOA_THRESHOLD selects the algorithm of all-to-all-v communication by the sizes of messages (set it to zero to use the all-to-all-v of MPI).
ATOA_SIZE_LIMIT is normally 0. It is mainly for tests. It lowers the data-size limit for using MPI all-to-all-v from 16GB to the specified value. When the data size exceeds the value, a naive method using isend/irecv is used instead of MPI all-to-all-v.
ATOA_REQUESTS_LIMIT is the limit on the number of pending isend/irecv requests in the naive all-to-all-v algorithm, which is used when the data size exceeds 16GB. It is normally 0, which sets the limit to 4096.
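The interplay of ATOA_SIZE_LIMIT and ATOA_REQUESTS_LIMIT described above can be sketched as follows. This is an illustration only, not KMR's code: the enum, the macros, and the function names are hypothetical, "16GB" is assumed to mean 16 GiB, and a specified limit above the built-in one is assumed to have no effect because the option is described as lowering the limit.

```c
#include <assert.h>

/* Assumption: "16GB" is read as 16 GiB here. */
#define ATOA_BUILTIN_SIZE_LIMIT (16LL * 1024 * 1024 * 1024)
#define ATOA_BUILTIN_REQUESTS_LIMIT 4096

/* Hypothetical names for the two methods mentioned in the text. */
enum atoa_method { ATOA_USE_MPI_ALLTOALLV, ATOA_USE_ISEND_IRECV };

/* Zero keeps the built-in limit; a positive value can only lower it. */
long long effective_size_limit(long long atoa_size_limit)
{
    assert(atoa_size_limit >= 0);
    if (atoa_size_limit == 0 || atoa_size_limit > ATOA_BUILTIN_SIZE_LIMIT) {
        return ATOA_BUILTIN_SIZE_LIMIT;
    }
    return atoa_size_limit;
}

/* Zero selects the default of 4096 pending requests. */
int effective_requests_limit(int atoa_requests_limit)
{
    assert(atoa_requests_limit >= 0);
    return (atoa_requests_limit == 0)
        ? ATOA_BUILTIN_REQUESTS_LIMIT : atoa_requests_limit;
}

/* The naive isend/irecv method is chosen only above the limit. */
enum atoa_method choose_atoa_method(long long data_size,
                                    long long atoa_size_limit)
{
    return (data_size > effective_size_limit(atoa_size_limit))
        ? ATOA_USE_ISEND_IRECV : ATOA_USE_MPI_ALLTOALLV;
}
```

Setting a small ATOA_SIZE_LIMIT in a test is thus a way to exercise the naive path without preparing more than 16GB of data.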
SORT_TRIVIAL determines the sorter: sorting runs on a single node when the data size is this value or smaller. SORT_THRESHOLD determines whether the sorter uses full sampling of a sampling-sort, or pseudo sampling when the data size is small. SORT_SAMPLE_FACTOR determines the number of samples of a sampling-sort. SORT_THREADS_DEPTH controls the local sorter: the quick-sort uses OpenMP threads until the recursion depth reaches this value (set it to zero for a sequential run).
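The effect of SORT_THREADS_DEPTH can be illustrated with a depth-limited quick-sort. The sketch below is not KMR's local sorter. It assumes OpenMP tasks as the threading mechanism and sorts plain long values, and all function names are hypothetical. Compiled without OpenMP support, the pragmas are ignored and the sort runs sequentially.

```c
#include <assert.h>
#include <stddef.h>

static void swap_long(long *a, long *b)
{
    long t = *a; *a = *b; *b = t;
}

/* Lomuto partition around the last element (requires n >= 2);
   returns the final index of the pivot. */
static size_t partition(long *v, size_t n)
{
    long pivot = v[n - 1];
    size_t i = 0;
    for (size_t j = 0; j + 1 < n; j++) {
        if (v[j] < pivot) {
            swap_long(&v[i], &v[j]);
            i++;
        }
    }
    swap_long(&v[i], &v[n - 1]);
    return i;
}

/* Recurses with OpenMP tasks while depth < threads_depth, and
   sequentially below that depth. */
static void quick_sort_at(long *v, size_t n, int depth, int threads_depth)
{
    if (n < 2) {
        return;
    }
    size_t p = partition(v, n);
    if (depth < threads_depth) {
        #pragma omp task
        quick_sort_at(v, p, depth + 1, threads_depth);
        #pragma omp task
        quick_sort_at(v + p + 1, n - p - 1, depth + 1, threads_depth);
        #pragma omp taskwait
    } else {
        quick_sort_at(v, p, depth + 1, threads_depth);
        quick_sort_at(v + p + 1, n - p - 1, depth + 1, threads_depth);
    }
}

/* Entry point; threads_depth == 0 gives a sequential run. */
void sort_longs(long *v, size_t n, int threads_depth)
{
    assert(v != NULL || n == 0);
    assert(threads_depth >= 0);
    #pragma omp parallel if(threads_depth > 0)
    #pragma omp single
    quick_sort_at(v, n, 0, threads_depth);
}

/* Returns 1 when v is in non-decreasing order. */
int is_sorted(const long *v, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        if (v[i - 1] > v[i]) {
            return 0;
        }
    }
    return 1;
}
```

Limiting tasks to the top levels of the recursion keeps the number of tasks at roughly two to the power of the depth, which avoids creating tasks for tiny sub-arrays.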
FILE_IO_BLOCK_SIZE is a block size of file reading, used when the striping information is not available.
PUSHOFF_BLOCK_SIZE is the block size of a push-off key-value stream. It is a communication block size and should be equal on all ranks. PUSHOFF_POLL_RATE gives a hint for the polling interval of a push-off key-value stream.
KMR_INSTALLATION_PATH records the installation path, which is taken from the configuration. SPAWN_WATCH_PROGRAM is the name of the watch-program, which is used when spawning processes that do not communicate with the parent. The variable is a file-path, which may be set in advance or may be set to the location to which the watch-program is copied (usually in the user's home directory). SPAWN_WATCH_PREFIX is a location where the watch-program is to be installed (instead of the home directory). SPAWN_WATCH_HOST_NAME is the host-name of a spawner. It may be set when there is difficulty in connecting a socket. SPAWN_MAX_PROCESSES limits the number of processes spawned simultaneously, without regard to the universe size. SPAWN_WATCH_AF is 0, 4, or 6, giving the preferred IP address format used by the watch-program. SPAWN_WATCH_PORT_RANGE[2] is the range of IP port numbers used by the watch-program (the values are inclusive). SPAWN_GAP_MSEC[2] is the time given between spawning calls, which the MPI runtime needs to clean up its resource management. The value is scaled to the log of the universe size, with the 1st value corresponding to 0 processes and the 2nd value to 1,000 processes (the default is 1 second for one process and 10 seconds for 1,000 processes).
SPAWN_SELF holds the communicator used in spawning. KMR retries MPI_Comm_spawn() because it can fail due to a race between issuing the call and a delay in job scheduling. SPAWN_RETRY_LIMIT and SPAWN_RETRY_GAP_MSEC control the retries of MPI_Comm_spawn(). KMR retries MPI_Comm_spawn() up to SPAWN_RETRY_LIMIT times, sleeping for SPAWN_RETRY_GAP_MSEC in between (300 seconds in total by default).
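The retry policy can be sketched as a loop around a spawn attempt. The code below is a stand-alone illustration and not KMR's implementation: the spawn call and the sleep are passed in as function pointers so that it runs without MPI, all names are hypothetical, and it assumes that the limit counts retries after the first attempt. The values 20 and 15000 in the usage note are example numbers chosen to total 300 seconds. They are not confirmed defaults.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback types: try_spawn returns 0 on success. */
typedef int (*spawn_fn)(void *arg);
typedef void (*sleep_fn)(int msec);

/* Tries once, then retries up to retry_limit times, sleeping
   retry_gap_msec before each retry.  Returns 0 on success and -1
   when all attempts failed. */
int spawn_with_retries(spawn_fn try_spawn, void *arg, sleep_fn sleep_msec,
                       int retry_limit, int retry_gap_msec)
{
    assert(try_spawn != NULL && sleep_msec != NULL);
    assert(retry_limit >= 0 && retry_gap_msec >= 0);
    for (int retries = 0; ; retries++) {
        if (try_spawn(arg) == 0) {
            return 0;
        }
        if (retries == retry_limit) {
            return -1;
        }
        sleep_msec(retry_gap_msec);
    }
}

/* Stand-ins for demonstration: a spawn that fails a fixed number of
   times, and a sleep that only records the requested time. */
struct flaky_spawn { int failures_left; int calls; };

static int flaky_try(void *arg)
{
    struct flaky_spawn *f = arg;
    f->calls++;
    if (f->failures_left > 0) {
        f->failures_left--;
        return -1;
    }
    return 0;
}

static long slept_msec;

static void record_sleep(int msec)
{
    slept_msec += msec;
}

/* Returns the total sleep requested for a spawn that fails
   `failures` times before succeeding. */
long demo_total_sleep_msec(int failures, int retry_limit, int retry_gap_msec)
{
    struct flaky_spawn f = { failures, 0 };
    slept_msec = 0;
    (void)spawn_with_retries(flaky_try, &f, record_sleep,
                             retry_limit, retry_gap_msec);
    return slept_msec;
}
```

Under these assumptions, a spawn that never succeeds with a limit of 20 and a gap of 15000 msec gives up after sleeping 300000 msec in total, which matches the 300 seconds mentioned above.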
SPAWN_WATCH_ACCEPT_ONHOLD_MSEC is the time given to wait for the watch-program to connect back by a socket.
VERBOSITY is the verbosity of warning messages; default 5 is good for typical use.
ONK enables the features specific to K or FX10. SINGLE_THREAD implies the nothreading option for the mapper, shuffler, and reducer. ONE_STEP_SORT disables the prior sorting step, which sorts on (packed/hashed) integer keys in local sorting. STEP_SYNC calls a barrier at each operation step, for debugging. TRACE_FILE_IO, TRACE_MAP_MS, and TRACE_MAP_SPAWN dump trace output for debugging. (TRACE_ALLTOALL dumps trace output on communication, for debugging internals.) TRACE_KMRDP dumps timing information of a run of KMR-DP. STD_ABORT uses abort() instead of MPI_Abort() on errors, so that cores are dumped on some MPI implementations. (FILE_IO_DUMMY_STRIPING is for debugging internals; it assigns dummy striping information on non-Lustre file-systems.) (FILE_IO_ALWAYS_ALLTOALLV is for debugging internals.)
MAP_MS_USE_EXEC forces KMR to use fork-exec instead of system(3C) to start a subprocess in kmr_map_ms_commands(). KMR also uses fork-exec when command strings include null characters (other than at the end). MAP_MS_ABORT_ON_SIGNAL makes KMR abort when a subprocess is killed in kmr_map_ms_commands(). SPAWN_DISCONNECT_EARLY (useless) lets the spawner free the inter-communicator immediately after spawning. SPAWN_DISCONNECT_BUT_FREE lets the spawner use MPI_Comm_disconnect() instead of MPI_Comm_free() (it is only used with buggy Intel MPI (4.x)). (SPAWN_PASS_INTERCOMM_IN_ARGUMENT changes the behavior to that of the old API.) MPI_THREAD_SUPPORT records the thread support level.
CKPT_ENABLE enables checkpointing. CKPT_SELECTIVE lets users specify which KMR functions take checkpoint (ckpt) files of their output key-value stream. To take ckpt files with this option enabled, users should enable the TAKE_CKPT option when calling a KMR function. CKPT_NO_FSYNC skips the fsync system call when writing ckpt files. Both CKPT_SELECTIVE and CKPT_NO_FSYNC should be specified together with CKPT_ENABLE.
STOP_AT_SOME_CHECK_GLOBALLY forces global checking of the stop-at-some state in mapping (not implemented). Mapping with stop-at-some should stop when some key-value is added on any rank, but by default the check is performed only locally. PUSHOFF_HANG_OUT lets push-off communication continue after mapping/reducing finishes. PUSHOFF_FAST_NOTICE enables the use of RDMA put for event notification in push-off key-value streams. PUSHOFF_STAT enables collecting communication statistics in push-off key-value streams. KMRVIZ_TRACE enables tracing of KMR function calls for KMRViz. IDENTIFYING_NAME is just a note.
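The rule that CKPT_SELECTIVE and CKPT_NO_FSYNC should be specified with CKPT_ENABLE can be expressed as a small consistency check. The struct below is a hypothetical stand-in that copies only the three one-bit flags from the member list, and the function name is likewise an assumption. KMR itself is not known to provide such a check.

```c
#include <assert.h>

/* Hypothetical stand-in for the three checkpointing bit-fields. */
struct ckpt_options {
    _Bool ckpt_enable: 1;
    _Bool ckpt_selective: 1;
    _Bool ckpt_no_fsync: 1;
};

/* Returns 1 when the flags are consistent: the two sub-options are
   allowed only together with ckpt_enable. */
int ckpt_options_consistent(struct ckpt_options o)
{
    return o.ckpt_enable || !(o.ckpt_selective || o.ckpt_no_fsync);
}
```

A caller could run such a check once after reading its configuration, before any checkpoint file is written.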