# MetaMath: A New Mathematical Reasoning Language Model That Trains Large Models to Reason Backward

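1. Classic problem - chickens and rabbits in the same cage

• Forward reasoning: A cage holds 23 chickens and 12 rabbits. How many heads and how many legs are in the cage?

• Backward reasoning: Some chickens and rabbits share a cage. Counting from above, there are 35 heads; counting from below, there are 94 legs. How many chickens and how many rabbits are in the cage?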

2. GSM8K problem

• Forward reasoning: James buys 5 packs of beef that are 4 pounds each. The price of beef is \$5.50 per pound. How much did he pay?

• Backward reasoning: James buys x packs of beef that are 4 pounds each. The price of beef is \$5.50 per pound. How much did he pay? If we know the answer to the above question is 110, what is the value of unknown variable x?
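The two examples above can be checked numerically. The sketch below is only an illustration, not part of MetaMath: all function names are hypothetical, and it assumes each chicken has 2 legs and each rabbit has 4. Forward reasoning evaluates the given quantities directly; backward reasoning inverts the same relations to recover the unknown.

```python
def heads_and_legs(chickens, rabbits):
    """Forward: count heads and legs from known animal counts."""
    return chickens + rabbits, 2 * chickens + 4 * rabbits


def chickens_and_rabbits(heads, legs):
    """Backward: recover the animal counts from heads and legs."""
    # If every animal were a chicken there would be 2 * heads legs;
    # each rabbit contributes 2 legs beyond that.
    extra = legs - 2 * heads
    if extra < 0 or extra % 2 != 0 or extra // 2 > heads:
        raise ValueError("no valid whole-number solution")
    rabbits = extra // 2
    return heads - rabbits, rabbits


def beef_cost(packs, pounds_per_pack=4, price_per_pound=5.50):
    """Forward: total price from the number of packs."""
    return packs * pounds_per_pack * price_per_pound


def packs_from_cost(total, pounds_per_pack=4, price_per_pound=5.50):
    """Backward: number of packs x from the known total price."""
    return total / (pounds_per_pack * price_per_pound)


print(heads_and_legs(23, 12))        # (35, 94)
print(chickens_and_rabbits(35, 94))  # (23, 12)
print(beef_cost(5))                  # 110.0
print(packs_from_cost(110))          # 5.0
```

In both problems the backward question uses the forward answer as a given, so solving it and substituting back reproduces the original conditions.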

• Project page: https://meta-math.github.io/

• Paper: https://arxiv.org/abs/2309.12284

• Dataset: https://huggingface.co/datasets/meta-math/MetaMathQA

• Models: https://huggingface.co/meta-math

• Code: https://github.com/meta-math/MetaMath

MetaMathQA includes four data augmentation methods:

1. Answer Augmentation: Given a meta-question, a large language model generates a chain of thought that reaches the correct result, and the pair is added as augmented data.

• Question: James buys 5 packs of beef that are 4 pounds each. The price of beef is \$5.50 per pound. How much did he pay?

• Answer: James buys 5 packs of beef that are 4 pounds each, so he buys a total of 5 * 4 = 20 pounds of beef. The price of beef is \$5.50 per pound, so he pays 20 * \$5.50 = \$110. The answer is: 110.

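2. Rephrasing Question (question-rephrasing augmentation): Given a meta-question, a large language model rewrites the question and generates a chain of thought that reaches the correct result; the pair is added as augmented data.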

• Question: What is the total amount that James paid when he purchased 5 packs of beef, each weighing 4 pounds, at a price of \$5.50 per pound?

• Answer: Each pack of beef weighs 4 pounds, so 5 packs weigh 4 * 5 = 20 pounds in total. The price per pound of beef is \$5.50, so the total cost for 20 pounds is 20 * \$5.50 = \$110. Therefore, James paid a total of \$110. The answer is: 110.

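3. FOBAR Question (FOBAR backward-question augmentation): Given a meta-question, a number in the conditions is masked as x and the original answer is supplied, so that x must be inferred; this yields a backward question. A correct chain of thought generated for that backward question is then added as augmented data (example of the appended backward prompt: "If we know the answer to the above question is 110, what is the value of unknown variable x?").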

• Question: James buys x packs of beef that are 4 pounds each. The price of beef is \$5.50 per pound. How much did he pay? If we know the answer to the above question is 110, what is the value of unknown variable x?

• Answer: James buys x packs of beef that are 4 pounds each, so he buys a total of 4x pounds of beef. The price of beef is \$5.50 per pound, so the total cost of the beef is 5.50 * 4x = 22x. We are given that the total cost is \$110, so we can write: 22x = 110. Dividing both sides by 22, we get: x = 5. The value of x is 5.

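4. Self-Verification Question (Self-Verification backward-question augmentation): Building on FOBAR, a large language model rewrites the question part of the backward question as a declarative statement, and the result is added as augmented data (rewriting example: "How much did he pay?" (with the answer 110) is rewritten as "He paid 110").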

• Question: James buys x packs of beef that are 4 pounds each. The price of beef is \$5.50 per pound. He paid 110. What is the value of unknown variable x?

• Answer: To solve this problem, we need to determine the value of x, which represents the number of packs of beef that James bought. Each pack of beef weighs 4 pounds and costs \$5.50 per pound. The total amount James paid is \$110. We can set up the equation as follows: Number of packs of beef * Weight per pack * Price per pound = Total amount paid; x * 4 * \$5.50 = \$110; Let's simplify and solve for x: 4x * \$5.50 = \$110; 22x = \$110; To isolate x, we divide both sides of the equation by 22: 22x / 22 = \$110 / 22; x = \$5; The value of x is 5.
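The construction of the two backward question types can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the MetaMath implementation: the helper names (`mask_number`, `fobar_question`, `self_verification_question`) are hypothetical, the number to mask is chosen by the caller, and the declarative statement that MetaMath obtains from a large language model is passed in by hand here.

```python
import re

# Suffix quoted in the FOBAR description above.
FOBAR_SUFFIX = ("If we know the answer to the above question is {answer}, "
                "what is the value of unknown variable x?")
SV_SUFFIX = "What is the value of unknown variable x?"


def mask_number(question, number):
    """Replace the first standalone occurrence of `number` with x."""
    # The lookarounds keep "5" from matching inside "5.50" or "15".
    pattern = r"(?<![\d.])" + re.escape(str(number)) + r"(?!\.?\d)"
    masked, count = re.subn(pattern, "x", question, count=1)
    if count == 0:
        raise ValueError(f"{number!r} not found in question")
    return masked


def fobar_question(question, number, answer):
    """Mask one condition and append the known answer (FOBAR style)."""
    return mask_number(question, number) + " " + FOBAR_SUFFIX.format(answer=answer)


def self_verification_question(question, number, statement):
    """Swap the final interrogative sentence for a declarative statement."""
    sentences = re.split(r"(?<=[.?!])\s+", mask_number(question, number).strip())
    if not sentences[-1].endswith("?"):
        raise ValueError("expected the question to end with an interrogative sentence")
    # `statement` stands in for the LLM rewrite, e.g. "He paid 110."
    return " ".join(sentences[:-1] + [statement, SV_SUFFIX])


original = ("James buys 5 packs of beef that are 4 pounds each. "
            "The price of beef is $5.50 per pound. How much did he pay?")

print(fobar_question(original, 5, 110))
print(self_verification_question(original, 5, "He paid 110."))
```

Run on the beef example, the two calls reproduce the FOBAR and Self-Verification questions shown above. Generating the matching chain-of-thought answers still requires a language model and is outside this sketch.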

