Chain-of-Thought Reasoning Without Prompting // LLMs have very bad reasoning capabilities

Reasoning is based on templates/examples, not logic or rules

sbagency
2 min read · Feb 18, 2024
https://arxiv.org/pdf/2402.10200.pdf

In enhancing the reasoning capabilities of large language models (LLMs), prior research primarily focuses on specific prompting techniques such as few-shot or zero-shot chain-of-thought (CoT) prompting. These methods, while effective, often involve manually intensive prompt engineering. Our study takes a novel approach by asking: Can LLMs reason effectively without prompting? Our findings reveal that, intriguingly, CoT reasoning paths can be elicited from pre-trained LLMs by simply altering the decoding process. Rather than conventional greedy decoding, we investigate the top-𝑘 alternative tokens, uncovering that CoT paths are frequently inherent in these sequences. This approach not only bypasses the confounders of prompting but also allows us to assess the LLMs’ intrinsic reasoning abilities. Moreover, we observe that the presence of a CoT in the decoding path correlates with a higher confidence in the model’s decoded answer. This confidence metric effectively differentiates between CoT and non-CoT paths. Extensive empirical studies on various reasoning benchmarks show that the proposed CoT-decoding substantially outperforms the standard greedy decoding.
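
The confidence metric mentioned in the abstract is simple to state: a decoding path is scored by the average gap between the top-1 and top-2 token probabilities over the tokens of the final answer. A minimal sketch of that score, assuming the per-step top-2 probabilities were already collected during decoding:

```python
def answer_confidence(top2_probs):
    """The paper's Δ score: the margin between the most likely and second
    most likely token, averaged over the answer's token positions.
    `top2_probs` is a list of (p_top1, p_top2) pairs, one per answer token."""
    return sum(p1 - p2 for p1, p2 in top2_probs) / len(top2_probs)

# Answers decoded with wide margins score higher:
answer_confidence([(0.9, 0.05), (0.8, 0.10)])   # 0.775 -> likely a CoT path
answer_confidence([(0.4, 0.35), (0.3, 0.25)])   # 0.050 -> likely a non-CoT path
```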

This paper introduces a new approach called “Chain-of-Thought (CoT) Decoding” that elicits reasoning capabilities from large language models without explicit prompting. The key findings are:

1. By branching on the top-k alternative tokens at the very first decoding step (instead of committing to the single greedy token), the model naturally generates CoT reasoning paths across tasks such as math, commonsense, and symbolic reasoning.

2. The presence of a CoT path correlates with higher model confidence in the final decoded answer.

3. Leveraging this observation, CoT-decoding selects the decoding path with the highest answer confidence, substantially improving reasoning performance over greedy decoding and, in some cases, rivaling few-shot CoT prompting (a minimal sketch of the procedure follows this summary).

4. CoT paths emerge more readily for tasks that are well represented in the pre-training data; highly synthetic or underrepresented tasks may still need advanced prompting to guide the reasoning.

5. CoT-decoding offers a way to assess language models’ intrinsic reasoning abilities without the confounding factors introduced by prompting.

In summary, a simple modification to the decoding process can unveil the latent CoT reasoning capabilities present in large language models in an unsupervised manner, outperforming standard greedy decoding on reasoning tasks.
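
To make findings 1 and 3 concrete, here is a minimal sketch of CoT-decoding using Hugging Face transformers, with GPT-2 purely as a stand-in model (any causal LM loads the same way; the paper evaluates much larger models). One simplification is noted in the docstring: the confidence margin is averaged over the whole continuation rather than over the answer tokens only, since locating the answer span is task-specific.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")   # stand-in; any causal LM works
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

@torch.no_grad()
def cot_decode(prompt: str, k: int = 10, max_new_tokens: int = 100):
    """Branch on the top-k first tokens, continue each branch greedily, and
    score every branch by its mean top-1 vs. top-2 probability margin
    (a simplification of the paper's Δ, which averages over answer tokens only)."""
    input_ids = tok(prompt, return_tensors="pt").input_ids
    first_logits = model(input_ids).logits[0, -1]          # next-token logits
    branches = []
    for first in torch.topk(first_logits, k).indices:      # k alternative first tokens
        ids = torch.cat([input_ids, first.view(1, 1)], dim=1)
        margins = []
        for _ in range(max_new_tokens):
            probs = model(ids).logits[0, -1].softmax(-1)   # full re-encode each step, for brevity
            values, indices = probs.topk(2)
            margins.append((values[0] - values[1]).item())
            if indices[0].item() == tok.eos_token_id:
                break
            ids = torch.cat([ids, indices[0].view(1, 1)], dim=1)
        text = tok.decode(ids[0, input_ids.shape[1]:])
        branches.append((sum(margins) / len(margins), text))
    return sorted(branches, reverse=True)                  # most confident path first

# The top-scoring branch is taken as the CoT-decoding answer:
score, path = cot_decode("Q: I have 3 apples and eat one. How many are left?\nA:")[0]
print(f"{score:.3f}  {path}")
```

Because only the first step branches, the k continuations are independent and could be decoded in parallel; a production version would also use a KV cache rather than re-encoding the prefix at every step as this sketch does.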
