PICARD: Parsing Incrementally for Constrained Auto-Regressive Decoding
from Language Models is a simple yet effective constrained decoding method for large pre-trained auto-regressive language models.
It works by warping the model's prediction scores and integrates trivially with the existing greedy and beam search algorithms used in auto-regressive decoding from language models.
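Concretely, "warping" here means overwriting the score of any token that fails PICARD's check with $-\infty$, so that the search can never select it. Writing $s_{t,v}$ for the score the model assigns to vocabulary token $v$ at step $t$ (notation introduced here for illustration), the warped score is:

$$
\tilde{s}_{t,v} =
\begin{cases}
s_{t,v} & \text{if appending } v \text{ keeps the hypothesis valid,}\\
-\infty & \text{otherwise.}
\end{cases}
$$

Because greedy and beam search both rank tokens purely by score, a token with score $-\infty$ is simply never chosen, which is why the method composes with either search strategy without modification.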
Autoregressive models are a class of machine learning (ML) models that predict the next element in a sequence from the elements that precede it.
Autoregression is a statistical technique used in time-series analysis that assumes the current value of a time series is a function of its past values; autoregressive models use similar mathematical techniques to determine the probabilistic correlation between elements in a sequence. Formally, let $x_t$ denote the value of the series at time $t$. An autoregressive model of order $p$, written AR($p$), predicts $x_t$ as a linear combination of the $p$ preceding values:

$$
x_t = c + \sum_{i=1}^{p} \varphi_i \, x_{t-i} + \varepsilon_t,
$$

where $c$ is a constant, $\varphi_1, \dots, \varphi_p$ are the model coefficients, and $\varepsilon_t$ is a noise term.
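To make the definition concrete, here is a minimal sketch of a single AR($p$) prediction step in Python; the coefficients and series values are made up for illustration:

```python
def ar_predict(history, coeffs, c=0.0):
    """Predict the next value of an AR(p) series from its last p observations.

    `coeffs` holds (phi_1, ..., phi_p); phi_1 multiplies the most recent value.
    """
    p = len(coeffs)
    return c + sum(phi * x for phi, x in zip(coeffs, reversed(history[-p:])))

# Example: an AR(2) model with illustrative coefficients 0.6 and 0.3.
series = [1.0, 1.2, 1.5]
print(ar_predict(series, [0.6, 0.3]))  # 0.6*1.5 + 0.3*1.2 = 1.26
```

An auto-regressive language model applies the same idea to tokens: each new token is predicted from the tokens generated so far, rather than from real-valued measurements.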
By design, PICARD requires access to the prediction scores of a probabilistic auto-regressive model. While most LLMs are of that nature, they are usually exposed as a black-box API that returns only generated text, which effectively rules out using this technique.
While PICARD is typically paired with beam search, it is not the only search strategy; in fact there are two:
In essence, greedy search prioritizes immediate gains, while beam search explores a wider range of options, potentially leading to better solutions at the cost of increased computational complexity. This is demonstrated in the following figure:
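To make the same trade-off concrete in code, here is a toy, self-contained comparison of the two strategies; the probability table is invented for illustration and is unrelated to any particular model:

```python
import math

def greedy_search(step_logprobs, steps=3):
    """Greedy decoding: commit to the single best token at each step."""
    seq = []
    for _ in range(steps):
        dist = step_logprobs(seq)
        seq.append(max(dist, key=dist.get))
    return seq

def beam_search(step_logprobs, beam_size=2, steps=3):
    """Beam search: keep the beam_size best partial hypotheses alive."""
    beams = [([], 0.0)]  # (token sequence, cumulative log-probability)
    for _ in range(steps):
        candidates = [
            (seq + [tok], score + lp)
            for seq, score in beams
            for tok, lp in step_logprobs(seq).items()
        ]
        beams = sorted(candidates, key=lambda b: b[1], reverse=True)[:beam_size]
    return beams

# Toy distribution: "a" looks slightly better locally, but choosing "b"
# first unlocks a much better continuation.
def toy(seq):
    if seq and seq[-1] == "b":
        return {"a": math.log(0.9), "b": math.log(0.1)}
    return {"a": math.log(0.55), "b": math.log(0.45)}

print(greedy_search(toy))      # ['a', 'a', 'a'] -- locally optimal at every step
print(beam_search(toy)[0][0])  # ['b', 'a', 'a'] -- higher total probability
```

Greedy search never recovers from its first locally optimal choice, whereas the beam keeps the initially weaker "b" hypothesis alive long enough for its better continuation to pay off.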
PICARD hooks into this decoding loop as a function called at each generation step. Its arguments are the token ids of the current hypothesis and, for each vocabulary token, the log-softmax scores predicted by the model's language modeling head.
PICARD's strategy is composed of the following steps, applied at every generation step (a code sketch follows the list):

1. Take the top-k highest-scoring candidate tokens for the current hypothesis.
2. Tentatively append each candidate to the hypothesis and check, by incremental lexing and parsing of the detokenized text, whether the result can still be completed to valid output.
3. Set the scores of candidates that fail the check to −∞, leaving the scores of valid candidates untouched.
4. Hand the warped scores back to the surrounding greedy or beam search, which continues as usual.
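Below is a minimal sketch of such a hook written against Hugging Face's `LogitsProcessor` interface. The class name, the `is_valid_prefix` predicate, and the default `top_k` are assumptions made for illustration; this is not the authors' implementation.

```python
import torch
from transformers import LogitsProcessor

class PicardStyleProcessor(LogitsProcessor):
    """Sketch of a PICARD-style score warper.

    `is_valid_prefix` is a hypothetical user-supplied predicate deciding
    whether a decoded string can still be extended to valid output.
    """

    def __init__(self, tokenizer, is_valid_prefix, top_k=2):
        self.tokenizer = tokenizer
        self.is_valid_prefix = is_valid_prefix
        self.top_k = top_k

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor) -> torch.FloatTensor:
        # One row per hypothesis: the token ids generated so far, plus the
        # scores the language modeling head assigns to every vocabulary token.
        for row, hypothesis in enumerate(input_ids):
            top = torch.topk(scores[row], self.top_k).indices.tolist()
            for token_id in top:
                candidate = torch.cat([hypothesis, hypothesis.new_tensor([token_id])])
                text = self.tokenizer.decode(candidate, skip_special_tokens=True)
                if not self.is_valid_prefix(text):
                    # Warp the score so neither greedy nor beam search
                    # can ever select this continuation.
                    scores[row, token_id] = float("-inf")
        return scores
```

A processor like this is passed to `model.generate(..., logits_processor=LogitsProcessorList([processor]))`, which is exactly why the approach composes with both greedy and beam search. A fuller version would also mask every token outside the checked top-k.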
PICARD supports the following modes, in increasing order of strictness:

- off: no checking; decoding proceeds as if PICARD were disabled.
- lexing: candidates are checked only at the lexical level, rejecting tokens that cannot begin or continue a valid token of the output language.
- parsing without guards: candidates are additionally checked for grammatical validity by the incremental parser.
- parsing with guards: the parser also enforces contextual constraints (in PICARD's original text-to-SQL setting, for example, that referenced columns belong to a table that is in scope).
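If one were to expose these modes in code, a simple enum is a natural shape; the member names below merely mirror the list above and are otherwise hypothetical:

```python
from enum import Enum

class PicardMode(Enum):
    """Hypothetical names for the four checking modes described above."""
    OFF = "off"                                         # no checking at all
    LEXING = "lexing"                                   # lexical checks only
    PARSING_WITHOUT_GUARDS = "parsing_without_guards"   # adds grammatical checks
    PARSING_WITH_GUARDS = "parsing_with_guards"         # adds contextual checks
```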