tod_calibrate
Empirical batch-size and process-count calibration.
The calibration runs once before the day loop, measures sustained throughput
over a range of candidate configurations using interleaved timing repeats (to
avoid L3 cache warm-up bias), and writes the optimal values to the config file
for subsequent runs. All calibration functions are internal helpers (prefixed
with _) and are not part of the public API.
Runtime calibration for tod_exact_gen_batched.
- Calibrates three knobs jointly:
n_processes (P) — worker processes (parallel over days)
numba_threads (T) — threads per worker (parallel over batch via prange)
batch_size (B) — samples per fused-kernel invocation
The fused kernel (a1d5d36) parallelises the entire Rodrigues+gather over prange(B). Spawning P workers each using T = NUMBA_NUM_THREADS_DEFAULT (=all cores) oversubscribes by P×; the new search enforces P*T ≤ N_cores.
- Strategy (~30s wall time):
Phase A — single-process throughput vs T at a fixed B. Phase B — for best T, sweep B around 16×T to land on the throughput plateau. Phase C — enumerate (P, T) with P*T ≤ N_cores; pick max P × tp(T).
Memory budget per process must accommodate mp_stacked (already in shared memory, but counted defensively) plus transient per-batch buffers.
Beam clustering calibration is unchanged (driven by science accuracy, not speed).
- tod_calibrate.calibrate_runtime(beam_data, folder_scan, probe_day, mp, n_cpu_ceiling, max_processes_user, interp_mode='bilinear', prefix='', center_idx=None, z_skip_threshold=-1.0)[source]
Joint (n_processes, numba_threads, batch_size) calibration.
- Parameters:
beam_data – from prepare_beam_data (after clustering, with mp_stacked).
folder_scan – scan directory.
probe_day – any valid day index for probe data.
mp – list of sky-map components.
n_cpu_ceiling – hard ceiling from scheduler/affinity (_get_ncpus()).
max_processes_user – user-configured n_processes (laptop cap, etc). Acts as an upper bound on P.
interp_mode – ‘nearest’ or ‘bilinear’.
prefix – log prefix.
- Returns:
(n_processes, n_threads, batch_size)
- tod_calibrate.calibrate_beam_clustering(beam_data, folder_scan=None, probe_day=None, mp=None, error_threshold=0.001, bell_lmax=None, interp_mode='bilinear')[source]
Find (tail_fraction, n_clusters) maximising speedup s.t. B_ell error ≤ error_threshold.
Computes reference B_ell (power_cut=1.0) from unclustered beam, then sweeps a fixed (tail_fraction × n_clusters) grid. The pair maximising speedup with relative-RMS B_ell divergence ≤ error_threshold wins; if no pair qualifies, the minimum-divergence pair is returned with a warning.