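Introduction: Eye on A.I. is a biweekly podcast hosted by Craig S. Smith, a veteran New York Times correspondent. In each episode, Craig talks with influential people in the field about new developments in machine intelligence in a broader context and reflects on what those developments mean.
机器之心 (Synced) is the Chinese-language partner for this series of conversations. What follows is the third installment in the series, a conversation between Craig Smith and Misha Bilenko, head of AI at Yandex.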
Hi, this is Craig Smith with a new podcast about artificial intelligence. I’m a former New York Times correspondent now focused on AI and I’ll be talking to people who are making a difference in the space. I was recently in Stockholm at this year’s ICML, the International Conference on Machine Learning, and had a few conversations with Misha Bilenko, the head of AI at Yandex, which is often described as the Google of Russia. I found much of what Misha had to say enlightening and hope you do, too.
We started talking about the role of state actors in artificial intelligence research and the many national AI strategies that have been announced. Vladimir Putin famously said last year that “whoever becomes the leader in AI will rule the world.” I asked Misha whether he saw AI research in national terms and whether there is a risk of an AI “arms race.”
MISHA: I think in the research field, people don't think in terms of national strategies, given how communal and international the field is. If you look at the make-up of researchers in terms of current affiliations, and then the affiliations of their educational paths and their origins, everyone has moved around so much, and the things they care about are typically in the scientific field rather than in some sort of policy field. I think it's easy to think of anecdotes or, basically, come up with stories about national strategies, but that exists in a separate realm from the technical field, where what people care about is algorithms. People generally are very eager to collaborate, and they have friends and colleagues from all over the world. They've worked with people from all over the world in the past, they're working with people from all over the world in the present, and in the process they're moving all around the world.
So, it is a very cosmopolitan community, and that's where any sort of rhetoric that exists in the political sphere about national strategies really just does not jibe and does not belong on the ground here, because it's just not the way people think and that's not the way they operate. Everybody thinks of it in terms of, you know, the technical problems we have and the challenges and the algorithms and the progress going forward, and then the general kind of assumption is that everybody is here to push science and technology further.
CRAIG: And there is no Russian, quote unquote, AI, or American, quote unquote, AI. I mean everything is being published in open forums - everything significant - other than perhaps military applications, but very little of that is basic research.
MISHA: I know nothing about military applications - I never worked on military applications - but the venues themselves are international, so ICML moves from continent to continent every year by design. Everything that's interesting gets published in the core conferences – ICML, NIPS, ICLR - and it goes into arXiv, which is just an Internet archive, literally, of scientific work. And because of that, like I said, people don't even think of national security and policy in the same realm as the technical work. It just exists in a whole separate sphere of technical achievement and algorithmic excellence in science.
CRAIG: The significance then of these national strategies that are being published is simply to build up a national cohort of competent researchers and engineers so that each country that has a national strategy feels that it's got a stake in the game.
MISHA: The way I think of it is, if you look at the pipeline of science, it starts with education, and basically you need to fund certain professors and labs and directions to make sure that you have, you know, experts being educated in those fields, and then those experts can go on and become academics or found companies and then create businesses and create technologies.
And so, in that sense I view it as an economic question: if so much of the economy is being transformed by AI, the implication is that for every economy to progress it is essential to create experts in the fields that are the fields of the future. And so that's where it's a vital economic imperative for every, you know, technically advanced country to have academic pipelines that prepare people. As a result, since academic pipelines are typically funded by governments, governments must adjust their spending priorities to fund those fields. And that's where I don't view it through the lens of security at all; I view it through the lens of technology and economics first and foremost.
CRAIG: Yeah. And we were talking before about how many competent scientists there are in the field globally right now and it's anywhere from 20 or 30 thousand to 200,000 but it still remains a relatively small number.
MISHA: It really depends on where you draw the line between scientists and engineers and data scientists - because they straddle both worlds, they are as much analyst and scientist as they are engineers. Hundreds of thousands sounds right. But again, it very much depends on where you draw the bar in terms of what makes one.
CRAIG: Do you have any, any sense of where Russia stands in that vis a vis China and the United States, in terms of numbers of…
MISHA: Oh, it's definitely high. I mean, if you look, Russia is just like China. And this is where the distinction by states is not as relevant anymore, because if you look at names, from which you could infer ethnic origin, you'll see of course lots of Russian names and lots of Chinese names. But, at the same time, if you look at their affiliations, people are all over the place, and likewise you will see some non-Russian or non-Chinese names affiliated with Russian or Chinese institutions. So there's definitely a high presence of Chinese scientists or Russian scientists. But there is definitely a global ecosystem of everybody moving around and really collaborating, so that's where it's not as much of a national issue.
CRAIG: So, the public perception of there being a sort of arms race in AI is mistaken or is a misnomer.
MISHA: I think it's a misnomer. I don't think it's the right... I mean it's a very kind of competitive and very antagonistic view of the world. I think now - if one sort of tends to view the world this way they certainly can, but, but it's - I think it's not the way anybody in this field views it or most people in the field wouldn't. Most people in the field view it much more in terms of areas you work on and then in terms of collaboration - everybody collaborates with everybody so you know there's not – ‘arms race’ implies, you know, hard competition and, you know, hard distinctions and that's not the case where it's a, it's a very much a collaborative enterprise for everybody.
CRAIG: There's a lot of talk about AI safety as the systems become more generalized and also as the militaries start looking at military applications. Are you optimistic about there being international conventions that different countries will adhere to?
MISHA: I'm not an expert on this but I am generally optimistic - certainly as a person - so I view it as with any other new technology. It will have applications in the military as in other areas. And then it's up to basically both the community and the ruling bodies of all flavors, state and non-state, to come up with the regulations and policies that are required to prevent gross misuses of the technology. So, I think it will certainly emerge, and there's certainly awareness of the potential harms of technology. But just like with everything else, you know, from lasers to semiconductors, it will basically develop as we get more cases that really set the boundaries of what is okay and what's not okay.
CRAIG: Are there areas that you feel either Yandex or Russia generally are ahead on, in either research or application. We talked before about some lack of the kinds of constraints that exist on the chatbots in the U.S.
MISHA: There’s not really, they're not hard constraints, right. I think the key question is that, whether we've been able to go forward at a much more rapid pace and that's one area where we have been moving very fast and at this point in, in terms of AI being in products, we are ahead of the other companies.
Our personal assistant includes both the pre-programmed intents - you know, things like asking about the weather, for example, or asking about facts, or asking to play music - as well as the general, what's known as chitchat, where it can converse on any topic whenever the user wants to just talk to a bot. And that's something that no other large-scale personal assistant has deployed.
If you look at Siri, if you look at Google Assistant, if you look at Alexa or Cortana – let’s say the big four of the personal assistants - they all have those hardwired intents, and then they have some pre-edited phrases for certain common requests, such as greetings, for example, or some Easter eggs. But none of them has a true AI sequence-to-sequence engine that, whenever the user just wants to chat, will always chat back and try to be relevant. And we definitely have gotten quite a bit ahead of everybody, first of all because we have it out and we now have millions of users using it. We have it being used across many services: on phones and desktops, in cars and now in a smart speaker.
Also, if you look at the core metrics of quality such as relevance, that's where we have made a lot of progress on relevance which is why people actually do it, is because they're able to get pretty, you know, what, what they view as snappy interesting answers that actually are not silly.
CRAIG: And that's a learning system, so is it continually improving?
MISHA: We definitely are learning from user feedback and as people talk to it, there's plenty of cues, such as, well, if somebody was talking for longer, if they were engaged, that's a good sign. And so, we definitely are continually improving the system.
In terms of the system changing itself with every - there's multiple ways in which it can learn, what's known as online, and that's where it gets very tricky because we've seen in the past where some of the experiments with true online learning can lead to the system basically being corrupted or a system can be exploited or trolled. And so that's where we're very careful to make sure that this evolution of the system is not basically making it worse or it's not, it cannot be exploited to start producing, you know, saying terrible things.
CRAIG: And you have this function, uh, ‘Ask Pushkin?’ Can you talk about that? Is that a hard-wired intent or is that, uh ...
MISHA: It's a third-party intent. Actually, one of the things we have is that, like other major platforms, we have a third-party skills platform where basically anybody, say a pizza delivery service, can come in and say ‘hey, you can now ask Alice - Alice is the name of our assistant - to order pizza’. Or, for example, Reebok made a personal training dialogue system that they're shipping through Alice.
So, talking to Pushkin, who is the most famous Russian poet of all time, is a third-party skill. So, we actually don't know the specific details. It is definitely amusing. It's hard to say whether, you know, Pushkin has produced such an amazing body of work that it's very easy to always find a relevant phrase, or whether the folks at Arzamas, who produced the skill, have done such a good job on matching that it actually does give you fairly good poetic advice most of the time. But yeah, we're very happy that Alice can also be a pathway to talk to a famous dead poet.
CRAIG: What is the company that produced it?
MISHA: So Arzamas has produced some very interesting content in Russian in the past. They produce podcasts as well that are about science, about technology. They're a great production shop so we’re very glad to host them on the platform as well.
CRAIG: What are some of the other areas of research that you're focused on?
MISHA: So, if you look at the core applications of AI, the ones that everybody hears about are vision, speech, both synthesis and speech understanding, and dialogue. There is also machine translation.
Machine translation is especially essential to us because obviously there's a lot of content out there in English and then because we are based in Russia and most of our users are Russian speaking, for them, we view it as a really core mission for us to basically make all the information out there available to them in the language that they understand best.
Translation is where we have made a lot of progress in the last couple of years, and that's where the quality is actually really high. Users appreciate that, we hear from users quite a bit, and we're actively integrating it into multiple products, like our browser, for example.
In all of these applications, though - if you look at vision, there are really cool new applications like super resolution, where we're able, using neural networks, to make the image basically higher resolution and much prettier, and you can do this with everything from TV channels that you're streaming to old movies that you can now show in much better quality.
And then just core image search, which is improving really fast. And now besides image search, you can do things like detection of certain, certain objects. So, there's lots of applications there.
And then in speech, of course, and dialogue, there is a ton of exciting stuff happening, from the core quality going up and the error going down, where people are now much more likely to make queries and to use voice because it is being recognized correctly, to the production of speech, with text-to-speech becoming much more natural sounding. We've been working on that very heavily. There's been a lot of public recognition of the progress in English that Google and DeepMind have made in recent years.
But besides that, I think one thing that is also changing very rapidly is that a lot of the time when we're dictating, we use not just common dictionary words but proper names, or even names that are personal to us, like, say, names from the address book. And those systems are becoming personalized, with the speech recognition recognizing not just, you know, literary text but all sorts of strange names you may have in your address book or, you know, difficult proper names of, say, restaurants. That’s definitely a core area where improvement, in English, has been very strong. And that's what has been driving the systems toward becoming more widely used and very helpful.
So, in machine translation, just to show an example of how collaboration in the scientific sense carries over into the technology sense: there is a topology of neural networks known as the Transformer that was invented by Google scientists. We've basically taken it and built on top of it, improving both the topology and the larger system around it, to really dramatically improve the quality of translation. And so, we've gotten the quality of English-to-Russian and Russian-to-English translation up dramatically in the past year, and now the same quality increases are propagating to other translation directions.
CRAIG: And on the computer vision, this improving video quality, is that using NVIDIA's extrapolation?
MISHA: So, we use NVIDIA for the cards, but the networks there are entirely our own. So that's something that our teams have, basically, you know, they read all the literature, but the core nets that are being used are something that we're actually very proud of our team coming up with through lots of experimentation.
CRAIG: And we also talked last time a little bit on your view of artificial general intelligence. I mean it's the topic everybody likes to talk about.
MISHA: Well, I think what general intelligence is keeps shifting in the public view, because, as you know, once things become commonplace they are no longer as dramatically exciting as they used to be. I think in that sense, if we look at assistants and what they do, they're able to help with a variety of tasks, they're able to provide information, and now they're getting to a point where they're able to also, you know, have small talk with us routinely. The most stringent definitions of general intelligence will also go beyond what the system can do towards what is inside it. And there, you know, you would define it as having much higher-capacity reasoning capabilities, for example.
But the alternative view is saying like, well, no matter what happens inside as long as it gives, as long as it can give you relevant and good answers - whether it can be trivialized as just, you know, search and pattern matching, even though done by very powerful algorithms and networks, but still is that intelligent or not?
I mean there is a famous Chinese Room argument which crystallizes this paradox of like, well, do you care about what comes out and is it intelligent or is it really what goes on inside that defines what intelligence implies. And so, I think there is a continual improvement in terms of the quality of what comes out.
But when people discuss general intelligence, a lot of the time they also focus on the fact that there need to be much higher-order processes inside. But that's as much an engineering task as it is a scientific challenge. So that's where, you know, there are lots of remaining challenges and continual progress. At the end of the day, it just comes down to better answers and better services.
CRAIG: Thanks, Misha, for your time. That’s all for this episode. Those of you who want to go into greater depth about the things we talked about today can find a transcript of this show in the program notes. Let us know whether you find the podcast interesting or useful and whether you have any suggestions about how we can improve.