Machine learning leads mathematicians to unsolvable problem
A team of researchers has stumbled on a question that is mathematically unanswerable because it is linked to logical paradoxes discovered by Austrian mathematician Kurt Gödel in the 1930s that can’t be solved using standard mathematics.
The mathematicians, who were working on a machine-learning problem, show that the question of ‘learnability’ — whether an algorithm can extract a pattern from limited data — is linked to a paradox known as the continuum hypothesis. Gödel showed that the statement cannot be proved either true or false using standard mathematical language. The latest result appeared on 7 January in Nature Machine Intelligence [1].
“For us, it was a surprise,” says Amir Yehudayoff at the Technion–Israel Institute of Technology in Haifa, who is a co-author on the paper. He says that although there are a number of technical maths questions that are known to be similarly ‘undecidable’, he did not expect this phenomenon to show up in a relatively simple problem in machine learning.
John Tucker, a computer scientist at Swansea University, UK, says that the paper is “a heavyweight result on the limits of our knowledge”, with foundational implications for both mathematics and machine learning.

Not all sets are equal
Researchers often define learnability in terms of whether an algorithm can generalize its knowledge. The algorithm is given the answer to a ‘yes or no’ question — such as “Does this image show a cat?” — for a limited number of objects, and then has to guess the answer for new objects.
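The generalization task described above can be sketched with a toy example. The feature vectors, labels and the nearest-neighbour rule below are invented for illustration — they are not from the paper — but they show the shape of the problem: a classifier sees a few labelled examples and must answer the same yes/no question for points it has never seen.

```python
# Toy sketch of learnability as generalization: a 1-nearest-neighbour rule
# answers a yes/no question ("does this show a cat?") for new points after
# seeing labels for only a handful of training points.
# All data here is made up for the example.

def nearest_neighbour_predict(train, query):
    """Return the label of the training point closest to `query`."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    _, label = min(train, key=lambda pair: dist(pair[0], query))
    return label

# A few labelled examples: (feature vector, is_cat)
train = [((0.9, 0.8), True), ((0.8, 0.9), True),
         ((0.1, 0.2), False), ((0.2, 0.1), False)]

# The rule now generalizes to points it was never shown.
print(nearest_neighbour_predict(train, (0.85, 0.8)))  # → True
print(nearest_neighbour_predict(train, (0.15, 0.2)))  # → False
```

Whether such a rule *provably* generalizes from limited data is exactly the kind of question the paper shows can become undecidable.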
Yehudayoff and his collaborators arrived at their result while investigating the connection between learnability and ‘compression’, which involves finding a way to summarize the salient features of a large set of data in a smaller set of data. The authors discovered that the information’s ability to be compressed efficiently boils down to a question in the theory of sets — mathematical collections of objects such as the sets in Venn diagrams. In particular, it relates to the different sizes of sets containing infinitely many objects.
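To give a concrete (and deliberately simple) illustration of what compression means here — a special case chosen for clarity, not the paper's general setting — a one-dimensional threshold classifier can be reconstructed from a single retained sample point:

```python
# Sketch of 'sample compression' for 1-D threshold classifiers:
# an entire labelled sample can be compressed to one point (the smallest
# positively labelled one), and a classifier consistent with every original
# label can be rebuilt from that point alone.

def compress(sample):
    """Keep only the smallest point labelled True."""
    return min(x for x, label in sample if label)

def reconstruct(kept_point):
    """Rebuild the threshold classifier from the kept point."""
    return lambda x: x >= kept_point

sample = [(1.0, False), (2.5, False), (3.0, True), (4.2, True)]
classifier = reconstruct(compress(sample))

# The reconstructed classifier agrees with every label in the original sample.
print(all(classifier(x) == label for x, label in sample))  # → True
```

How far this idea extends — how large a set can be summarized by how small a one — is where the connection to infinite set sizes enters.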
Georg Cantor, the founder of set theory, demonstrated in the 1870s that not all infinite sets are created equal: in particular, the set of integer numbers is ‘smaller’ than the set of all real numbers, also known as the continuum. (The real numbers include the irrational numbers, as well as rationals and integers.) Cantor also suggested that there cannot be sets of intermediate size — that is, larger than the integers but smaller than the continuum. But he was not able to prove this continuum hypothesis, and nor were many mathematicians and logicians who followed him.
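Cantor's comparison of infinities rests on his diagonal argument, which can be outlined as follows (a standard textbook sketch, not drawn from the article):

```latex
% Suppose the reals in (0,1) could be listed r_1, r_2, r_3, ...,
% with decimal expansions r_i = 0.d_{i1} d_{i2} d_{i3} ...
% Build a new real s = 0.s_1 s_2 s_3 ... that differs from every r_i
% in its i-th digit:
\[
  s_i =
  \begin{cases}
    5 & \text{if } d_{ii} \neq 5,\\
    6 & \text{if } d_{ii} = 5,
  \end{cases}
\]
% Then s is not on the list, contradicting the assumption that the list
% was complete. No enumeration of the reals exists, so
\[
  |\mathbb{N}| < |\mathbb{R}|.
\]
```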
Their efforts were in vain. A 1940 result by Gödel (which was completed in the 1960s by US mathematician Paul Cohen) showed that the continuum hypothesis cannot be proved either true or false starting from the standard axioms — the statements taken to be true — of the theory of sets, which are commonly taken as the foundation for all of mathematics.
Gödel and Cohen’s work on the continuum hypothesis implies that there can exist parallel mathematical universes that are both compatible with standard mathematics — one in which the continuum hypothesis is added to the standard axioms and therefore declared to be true, and another in which it is declared false.

Learnability limbo
In the latest paper, Yehudayoff and his collaborators define learnability as the ability to make predictions about a large data set by sampling a small number of data points. The link with Cantor’s problem is that there are infinitely many ways of choosing the smaller set, but the size of that infinity is unknown.
The authors go on to show that if the continuum hypothesis is true, a small sample is sufficient to make the extrapolation. But if it is false, no finite sample can ever be enough. In this way, they show that the problem of learnability is equivalent to the continuum hypothesis. The learnability problem, too, is therefore in a state of limbo that can be resolved only by choosing the axiomatic universe.
The result also helps to give a broader understanding of learnability, Yehudayoff says. “This connection between compression and generalization is really fundamental if you want to understand learning.”
Researchers have discovered a number of similarly ‘undecidable’ problems, says Peter O’Hearn, a computer scientist at University College London. In particular, following work by Gödel, Alan Turing — who co-founded the theory of algorithms — found a class of questions that no computer program can be guaranteed to answer in any finite number of steps.
But the undecidability in the latest results is “of a rare kind”, and much more surprising, O’Hearn adds: it points to what Gödel found to be an intrinsic incompleteness in any mathematical language. The findings will probably be important for the theory of machine learning, he adds, although he is “not sure it will have much impact on the practice”.
[1] Ben-David, S., Hrubeš, P., Moran, S., Shpilka, A. & Yehudayoff, A. Nature Mach. Intell. 1, 44–48 (2019).