nemo_rl.data.processors#

Contains data processors for evaluation.

Module Contents#

Functions#

helpsteer3_data_processor

Process a HelpSteer3 preference datum into a DatumSpec for GRPO training.

math_data_processor

Process a datum dictionary (directly loaded from dataset) into a DatumSpec for the Math Environment.

math_hf_data_processor

Process a datum dictionary (directly loaded from data/hf_datasets/openmathinstruct2.py) into a DatumSpec for the Reward Model Environment.

_construct_multichoice_prompt

Construct prompt from question and options.

multichoice_qa_processor

Process a datum dictionary (directly loaded from dataset) into a DatumSpec for multiple-choice problems.

register_processor

Data#

API#

nemo_rl.data.processors.TokenizerType#

None

nemo_rl.data.processors.helpsteer3_data_processor(
datum_dict: dict[str, Any],
task_data_spec: nemo_rl.data.interfaces.TaskDataSpec,
tokenizer: nemo_rl.data.processors.TokenizerType,
max_seq_length: int,
idx: int,
) nemo_rl.data.interfaces.DatumSpec#

Process a HelpSteer3 preference datum into a DatumSpec for GRPO training.

This function converts HelpSteer3 preference data to work with GRPO by:

  1. Using the context as the prompt

  2. Using the preferred completion as the target response

  3. Creating a reward signal based on preference scores

nemo_rl.data.processors.math_data_processor(
datum_dict: dict[str, Any],
task_data_spec: nemo_rl.data.interfaces.TaskDataSpec,
tokenizer: nemo_rl.data.processors.TokenizerType,
max_seq_length: int,
idx: int,
) nemo_rl.data.interfaces.DatumSpec#

Process a datum dictionary (directly loaded from dataset) into a DatumSpec for the Math Environment.

nemo_rl.data.processors.math_hf_data_processor(
datum_dict: dict[str, Any],
task_data_spec: nemo_rl.data.interfaces.TaskDataSpec,
tokenizer: nemo_rl.data.processors.TokenizerType,
max_seq_length: int,
idx: int,
) nemo_rl.data.interfaces.DatumSpec#

Process a datum dictionary (directly loaded from data/hf_datasets/openmathinstruct2.py) into a DatumSpec for the Reward Model Environment.

nemo_rl.data.processors._construct_multichoice_prompt(
prompt: str,
question: str,
options: dict[str, str],
) str#

Construct prompt from question and options.

nemo_rl.data.processors.multichoice_qa_processor(
datum_dict: dict[str, Any],
task_data_spec: nemo_rl.data.interfaces.TaskDataSpec,
tokenizer: nemo_rl.data.processors.TokenizerType,
max_seq_length: int,
idx: int,
) nemo_rl.data.interfaces.DatumSpec#

Process a datum dictionary (directly loaded from dataset) into a DatumSpec for multiple-choice problems.

nemo_rl.data.processors.PROCESSOR_REGISTRY: Dict[str, nemo_rl.data.interfaces.TaskDataProcessFnCallable]#

‘cast(…)’

nemo_rl.data.processors.register_processor(
processor_name: str,
processor_function: nemo_rl.data.interfaces.TaskDataProcessFnCallable,
) None#