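Question (Baidu Zhidao): A Wubi input icon and a Sogou input icon are always on my desktop, and the mouse pointer turns into the busy cursor whenever I hover over them. What is going on?
When I use Sogou input, both icons show up at the same time and cannot be moved.

Answer: The simplest fix is to press Ctrl+Space.

Answer: Restarting the machine fixes it.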

[Image: https://pic2.zhimg.com/v2-7ff4bf111cd617_b.jpg]

This is a hands-on guide to machine learning for programmers with no background in artificial intelligence. Using a neural network does not require a PhD, and you do not need to be the person who makes the next AI breakthrough. You only need to use existing techniques; what has already been achieved is a breakthrough in itself, and it is very useful. I believe more and more of us will work with machine learning the same way we came to use open source technology, rather than treating it only as a research topic. In this guide, our goal is to write a program that makes highly accurate predictions: using only the image itself, it should tell whether the sample images in data/untrained-samples, which the program has never seen, show a dolphin or a seahorse. Here are two sample images:

[Image: https://pic3.zhimg.com/v2-b6dcd98cea7ba96c3fccabde_b.jpg]
[Image: https://pic4.zhimg.com/v2-1e4ed24cb3cf36c4b49113_b.jpg]

To reach that goal, we will train and apply a convolutional neural network (CNN). We will approach it from a practical angle rather than explaining first principles. There is a lot of enthusiasm about AI right now, but much of what is written reads like a physics professor teaching you bicycle tricks rather than a friend in the park showing you how.

For that reason, I (GitHub user humphd, David Humphrey) decided to write this guide on GitHub rather than as a blog post, because I know that some of what I write below may be misleading, naive, or simply wrong. I am still teaching myself, and I have found reliable beginner documentation to be scarce. If you find an error or a missing detail, please send a pull request. Now let me show you some "bicycle tricks"!

Guide: humphd (David Humphrey), https://github.com/humphd

Overview

We will explore the following:

- Setting up and using existing open source machine learning technologies, specifically Caffe and DIGITS
- Creating an image dataset
- Training a neural network from scratch
- Testing the network on images it has never seen
- Improving the network's accuracy by fine-tuning existing networks (AlexNet and GoogLeNet)
- Deploying and using the network

Q: I know you said we won't talk about neural network theory, but I feel I should at least have a general overview before we start. Where should I begin?

You can find a vast number of introductions to neural network theory online, from short posts to long essays to online courses. Depending on how you like to learn, here are three good starting points:

- J
Alammar's blog post "A Visual and Interactive Guide to the Basics of Neural Networks" is excellent and introduces neural network concepts through intuitive examples: https://jalammar.github.io/visual-interactive-guide-basics-neural-networks/
- Brandon Rohrer's video is a very good introduction to convolutional neural networks: https://www.youtube.com/watch?v=FmpDIaiMIeA
- If you want more of the theory, I recommend Michael Nielsen's online book "Neural Networks and Deep Learning": http://neuralnetworksanddeeplearning.com/index.html

Setup

Installing Caffe

Caffe: Caffe | Deep Learning Framework, http://caffe.berkeleyvision.org/

First, we will use the Caffe deep learning framework (BSD licensed) from the Berkeley Vision and Learning Center.

Q: Wait a minute, why Caffe? Why not TensorFlow, which everyone is talking about these days?

There are many options, and you should look at all of them. TensorFlow is great and you should try it too. I chose Caffe here for these reasons:

- It is tailored to computer vision problems
- It supports C++ and Python (with node.js support on the way: https://github.com/silklabs/node-caffe)
- It is fast and stable

But the number one reason I chose Caffe is that you can use it without writing any code. You can do everything declaratively (Caffe uses structured text files to define network architectures) and with command-line tools. You can also put a nice front end on top of Caffe, which makes training and validation much simpler. We will use NVIDIA's DIGITS below for the same reason.

Installing Caffe is somewhat painful. Installation instructions for different platforms, including some prebuilt Docker and AWS configurations, are here: http://caffe.berkeleyvision.org/installation.html

Note: when I did this walkthrough, I used an unreleased version of Caffe from GitHub: "Merge pull request #5057 from Queuecumber/cuda-pascal-cmake · BVLC/caffe@5a201dd" (https://github.com/BVLC/caffe/commit/5a201ddcefd9fa9e2a40d2c76ddd73)

Getting it working on a Mac is much harder; version problems will halt your progress at different steps of the build. It took me several days of trial and error. I read a dozen guides, each with slightly different problems. In the end this one came closest (https://gist.github.com/doctorpangloss/f8463bddce2a91bea1dcbe4). I also recommend "How To: Set up Caffe Environment and pycaffe on OS X 10.12 Sierra" (https://eddiesmo.wordpress.com//how-to-set-up-caffe-environment-and-pycaffe-on-os-x-10-12-sierra/), which is more recent and links to many similar discussions.

Installing Caffe is by far the hardest thing we will do, which is rather nice, because you might have assumed the AI parts would be harder!

Don't give up if the installation gives you trouble; the pain is worth it. If I were doing it again, I would probably use an Ubuntu VM rather than installing directly on the Mac. If you have questions, there is a Caffe users group: https://groups.google.com/forum/#!forum/caffe-users

Q: Do I need powerful hardware to train a neural network? What if I don't have access to a powerful GPU?

It is true that deep neural networks need a lot of computing power and energy... but that applies when training from scratch on huge datasets. We don't need to do that. We can take a pretrained network (one that someone else has already invested hundreds of hours of compute in training) and fine-tune it on our particular data. We will see how later, but first let me point out that everything below was done on a one-year-old MacBook without a powerful GPU.

As an aside, because I have an integrated Intel graphics card rather than an NVIDIA GPU, I decided to use the OpenCL branch of Caffe (https://github.com/BVLC/caffe/tree/opencl), and it has worked well on my laptop.

Once Caffe is installed, you should have, or be able to do, the following:

- A directory containing your built Caffe. If you did it the standard way, there will be a build/ directory holding everything needed to run Caffe, the bundled Python bindings, and so on. The parent directory of build/ is your CAFFE_ROOT (we will need it later).
- Running make test && make runtest should pass
- After installing all the Python dependencies (from python/, run: for req in $(cat requirements.txt); do pip install $req; done), running make pycaffe && make pytest should pass
- You should also run make distribute to create a distributable version of Caffe in distribute/, with all the necessary headers, binaries, and so on

On my machine, with Caffe fully built, my CAFFE_ROOT directory has this basic layout:

    caffe/
        build/
            python/
            lib/
            tools/
                caffe ← this is our main binary
        distribute/
            python/
            lib/
            include/
            bin/
            proto/

At this point we have everything we need to train, test, and program with neural networks. In the next section we add DIGITS, a user-friendly web front end for Caffe, which makes training and testing our networks much simpler.

Installing DIGITS

DIGITS: NVIDIA/DIGITS, https://github.com/NVIDIA/DIGITS

NVIDIA's Deep Learning GPU Training System (DIGITS) is a BSD-licensed Python web app for training neural networks. Although everything DIGITS does can be done in Caffe from the command line or in code, using DIGITS
makes our work much simpler. And because DIGITS has good visualizations, real-time charts, and other graphical features, I find it more fun too. Since you are experimenting and trying to learn, I strongly recommend starting with DIGITS.

There is very good documentation at https://github.com/NVIDIA/DIGITS/tree/master/docs, including pages on installation, configuration, and getting started. I strongly suggest reading through it before continuing. I am not an expert on using DIGITS; if you have questions, you can search or ask in the public DIGITS users group: https://groups.google.com/forum/#!forum/digits-users

There are many ways to install DIGITS, from Docker to prebuilt packages on Linux, or you can build it from source. I am on a Mac, so I built it from source.

Note: in my walkthrough I used an unreleased version of DIGITS from GitHub: "Add steps to specify the Python layer file (#1347) · NVIDIA/DIGITS@81be513" (https://github.com/NVIDIA/DIGITS/commit/81be5131821ade454ebd7c09753d9)

Because DIGITS is just a set of Python scripts, getting it running is simple. The one thing you need to do before starting the server is set an environment variable telling DIGITS where your CAFFE_ROOT is:

    export CAFFE_ROOT=/path/to/caffe
    ./digits-devserver

Note: on a Mac I had problems with the server scripts, which assume the Python binary is called python2, whereas I only have python2.7. You can symlink it in /usr/bin or modify the DIGITS startup script to use the right binary on your system.

Once the server is running, you can do everything else through your web browser at http://localhost:5000

Training a Neural Network

Training a neural network involves a few steps:

1. Prepare a dataset of categorized images
2. Define the network architecture
3.
Train and validate the network using the prepared dataset

We will go through these three steps below, to show the difference between starting from scratch and using a pretrained network, and also to show how to work with two of the most commonly used pretrained networks available for Caffe and DIGITS, AlexNet and GoogLeNet.

For training, we will use a small dataset of dolphin and seahorse images, located in data/dolphins-and-seahorses. You need at least two categories, and you can have many more (some of the networks we will use were trained on 1000+ categories). Our goal is to show the network an image and have it tell us whether it is a dolphin or a seahorse.

Preparing the Dataset

The easiest way to begin is to divide your images into one directory per category:

    dolphins-and-seahorses/
        dolphin/
            image_0001.jpg
            image_0002.jpg
            image_0003.jpg
            ...
        seahorse/
            image_0001.jpg
            image_0002.jpg
            image_0003.jpg
            ...

Each directory in the layout above is a category we want to classify, and the files inside it are the images we will use for training and validation.

Q: Do all the images used for classification and validation have to be the same size? Do the file names matter?

No to both. Image sizes are normalized before the images are fed into the network. We ultimately want 256×256 pixel color images, and DIGITS can crop or squash (we will squash) our images automatically and quickly. The image file names have no effect; what matters is which category directory each image sits in.

Q: Can I split my categories more finely?

Yes. See https://github.com/NVIDIA/DIGITS/blob/digits-4.0/docs/ImageFolderFormat.md

We will use these images to create a new dataset, specifically a Classification Dataset.

[Image: https://pic1.zhimg.com/v2-d89b34022c_b.png]

We will use the DIGITS defaults and point the training images path at our data/dolphins-and-seahorses folder. DIGITS will use the category names (dolphin and seahorse) as labels to create a dataset of squashed 256×256 images, with 75% used for training and 25% for validation.

Give your dataset a name, such as dolphins-and-seahorses, and click Create.

[Image: https://pic1.zhimg.com/v2-14e3f916c6c461c04ad64dc_b.png]

This creates our dataset, which took only 4 seconds on my laptop. The resulting dataset has 92 training images in 2 categories (49 dolphin, 43 seahorse) and 30 validation images (16 dolphin, 14 seahorse). It is a very small dataset, but that suits our experiments and learning DIGITS, because training and validating a network on it will not take long.

You can browse the database to look at the images after they have been squashed.

[Image: https://pic2.zhimg.com/v2-8bbf36febddc8b284a71_b.png]

Training: Attempt 1, from Scratch

Back on the DIGITS home page, we need to create a new Classification Model:

[Image: https://pic4.zhimg.com/v2-f94e502b7d86b512f8935bfae38d171b_b.png]

We will start by training a model on the dolphins-and-seahorses dataset we just created, again with the DIGITS defaults. For our first network we can pick one of the standard architectures provided, AlexNet. The AlexNet design won ImageNet, a major computer vision competition, in 2012. The competition required classifying 1.2 million images across 1000+ categories.

[Image: https://pic1.zhimg.com/v2-08589dbebaadc3a833b00_b.png]

Caffe uses structured text files to define network architectures. These text files are based on Google's Protocol Buffers. You can read the full schema Caffe uses here (https://github.com/BVLC/caffe/blob/master/src/caffe/proto/caffe.proto). Most of it will not be needed for this part of the training, but it is useful to be aware of it, since we will have to modify these definitions in later steps. As an example, the AlexNet prototxt file looks like this: https://github.com/BVLC/caffe/blob/master/models/bvlc_alexnet/train_val.prototxt

We will train the network for 30 epochs, which means it will learn (from our training images), test itself (on our validation images), and adjust its weights according to the results, repeating this 30 times. Each epoch outputs an accuracy (from 0% to 100%, higher is better) and a loss (a measure of the total error made, lower is better). Ideally we want a network with high accuracy and low loss.

At the start of training, the accuracy is below 50%. That makes sense, because in the first epoch the network is just guessing at the categories with arbitrarily assigned weights. After many epochs it reaches 87.5% accuracy with a loss of 0.37. The whole 30-epoch run took less than 6 minutes.

[Image: https://pic4.zhimg.com/v2-fdc2aaa024d35f56b4e4d174b7aa54f3_b.png]

We can test the trained model by uploading an image or giving the URL of an image. Let's test it on a few images from our training and validation datasets:

[Image: https://pic2.zhimg.com/v2-42ecfc8f82cb5b35d036101_b.png]
[Image: https://pic1.zhimg.com/v2-ab3bbc26eeacc_b.png]

The classification looks perfect, until we test some images that are not part of our training and validation datasets:

[Image: https://pic2.zhimg.com/v2-bddb8c89a32b8c15483c49_b.png]

Here accuracy falls apart: the network mistakes a seahorse for a dolphin and, worse, does so with high confidence.

The reality is that our dataset is too small to train a really good neural network. We would need tens of thousands to millions of images, and with them a lot of computing power to process everything.

Training: Attempt 2, Fine-Tuning AlexNet

How Fine-Tuning Works

Designing a neural network from scratch, collecting enough data to train it (for example, huge numbers of images), and running it on GPUs for weeks to complete the training is beyond most of us. To make it practical with smaller amounts of data, we use a technique called transfer learning, or fine-tuning. Fine-tuning
takes advantage of the layered structure of deep neural networks and uses a pretrained network to do the initial work of object detection.

Imagine using a neural network as being like looking at something far away through a pair of binoculars. When you first put the binoculars to your eyes, everything is blurry. As you adjust the focus, you start to make out colors, lines, and shapes, and eventually you can pick out the shape of a bird. With more adjustment you can identify the species of bird.

In a multi-layered network, the first layers extract features (for example, edges), later layers use those features to recognize shapes (for example, a wheel or an eye), and these feed into a final classification layer, which determines the kind of object from the features accumulated in all the previous layers (for example, a cat versus a dog). A network goes from pixels, to lines, to an eye, to two eyes in a particular position, and step by step arrives at the category of the object (here, a cat).

What we do here is use a pretrained network to initialize the weights for our new set of images, instead of starting from the new network's own initial weights. Because the pretrained network already knows how to "see" features in images, all we need is for it to "see" the particular kinds of images in our dataset for our specific task. We do not need to train most of the layers from scratch; we transfer the layers that have already been learned to our new classification task. In our previous attempt the initial weights were random, whereas this time we use the final weights of the trained network as the starting weights of ours. However, we must throw away the final classification layer of the pretrained network and retrain the network with our own image dataset, that is, fine-tune it on our own image classes.

For this to work, we need a pretrained network that was trained on data similar enough to ours that its learned weights are useful. Fortunately, the networks we use below were trained on a huge dataset of natural images (ImageNet), so they suit most classification tasks.

This technique has been used for interesting tasks such as screening for eye diseases in medical images, identifying plankton species in microscope images collected at sea, and classifying the artistic style of images on Flickr.

Doing this perfectly, as with all machine learning, requires a good understanding of both the data and the network architecture: you have to be careful about overfitting, you may need to adjust some of the layers, you may well need to insert new layers, and so on. However, my experience is that it "just works" much of the time, and it is worth simply trying it to see what our naive approach can achieve.

Uploading Pretrained Networks

In our first attempt, we used the AlexNet architecture but started with random weights in the network's layers. What we want now is to download and use a version of AlexNet that has already been trained on a massive dataset.

Snapshots of AlexNet are available for download here (https://github.com/BVLC/caffe/tree/master/models/bvlc_alexnet). We need the binary .caffemodel file, which contains the trained weights and can be downloaded from http://dl.caffe.berkeleyvision.org/bvlc_alexnet.caffemodel

While these pretrained models download, let's learn a little more. In the 2014 ImageNet competition, Google won with its open source GoogLeNet (https://research.google.com/pubs/pub43022.html), a 22-layer neural network. Snapshots of GoogLeNet are also available for download (https://github.com/BVLC/caffe/tree/master/models/bvlc_googlenet). Again we need the .caffemodel file with all the pretrained weights, which can be downloaded from http://dl.caffe.berkeleyvision.org/bvlc_googlenet.caffemodel

With the .caffemodel files in hand, we can upload them into DIGITS. On the DIGITS home page, go to the Pretrained Models tab and choose Upload Pretrained Model:

[Image: https://pic4.zhimg.com/v2-d6eeddce57b472c22b57477_b.png]

For both of these pretrained models we can use the DIGITS defaults (that is, 256×256 pixel color images). We only need to provide the Weights (.caffemodel) and the Model Definition (original.prototxt). Click each of those buttons to select a file.

For the model definitions, we can use https://github.com/BVLC/caffe/blob/master/models/bvlc_googlenet/train_val.prototxt for GoogLeNet and https://github.com/BVLC/caffe/blob/master/models/bvlc_alexnet/train_val.prototxt for AlexNet. We are not going to use the classification labels of these networks, so we can skip adding a labels.txt file:

[Image: https://pic3.zhimg.com/v2-54c04cba_b.png]

Repeat this process for both AlexNet and GoogLeNet, since we will use both in the coming steps.

Q: Are there other networks that would make a good basis for fine-tuning?

The Caffe Model Zoo has many other pretrained networks that could be used; see https://github.com/BVLC/caffe/wiki/Model-Zoo

Training a network with a pretrained Caffe model is similar to starting from scratch, though we have to make a few adjustments. First, we lower the base learning rate from 0.01 to 0.001, because we do not need such large steps (we are fine-tuning). We also select the Pretrained Network and customize it.

[Image: https://pic2.zhimg.com/v2-1c5d5dd0da2a819b418ad3f6921da20d_b.png]

In the pretrained model's definition (the prototxt), we need to rename all references to the final fully connected layer (where the final classification happens). We do this because we want the model to learn new categories from our dataset rather than reuse those from its original training data (that is, we want to throw away the current final layer). We rename the final fully connected layer from "fc8" to something else, "fc9" for example. Finally, we need to change the number of categories from 1000 to 2 by changing num_output to 2.

Here are the changes we need to make:

    @@ -332,8 +332,8 @@
     }
     layer {
    -  name: "fc8"
    +  name: "fc9"
       type: "InnerProduct"
       bottom: "fc7"
    -  top: "fc8"
    +  top: "fc9"
       param {
         lr_mult: 1
    @@ -345,5 +345,5 @@
       }
       inner_product_param {
    -    num_output: 1000
    +    num_output: 2
         weight_filler {
           type: "gaussian"
    @@ -359,5 +359,5 @@
       name: "accuracy"
       type: "Accuracy"
    -  bottom: "fc8"
    +  bottom: "fc9"
       bottom: "label"
       top: "accuracy"
    @@ -367,5 +367,5 @@
       name: "loss"
       type: "SoftmaxWithLoss"
    -  bottom: "fc8"
    +  bottom: "fc9"
       bottom: "label"
       top: "loss"
    @@ -375,5 +375,5 @@
       name: "softmax"
       type: "Softmax"
    -  bottom: "fc8"
    +  bottom: "fc9"
       top: "softmax"
       include { stage: "deploy" }
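
    # --- Illustrative sketch, not part of the diff above or of the original guide ---
    # The lower learning rate mentioned earlier is a solver setting rather than a
    # change to the network definition. DIGITS fills it in from its learning-rate
    # form field; in a hand-written Caffe solver file the same choice would look
    # roughly like this. Only base_lr: 0.001 comes from the text. The net path,
    # lr_policy, gamma, stepsize and max_iter values are placeholder assumptions.
    net: "train_val.prototxt"   # assumed path to the network definition
    base_lr: 0.001              # lowered from 0.01 for fine-tuning
    lr_policy: "step"           # assumed policy
    gamma: 0.1                  # assumed decay factor applied at each step
    stepsize: 30                # assumed number of iterations between decays
    max_iter: 90                # assumed total number of training iterations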

I have put the fully modified file in src/alexnet-customized.prototxt.

This time our accuracy starts at a little over 60%, climbs to 87.5%, then to 96%, and all the way up to 100%, while the loss steadily decreases. After five minutes we reach 100% accuracy with a loss of only 0.0009.

[Image: https://pic2.zhimg.com/v2-2ba9ebaefeea1e_b.png]

The seahorse image that the previous network got wrong now gives the opposite result: even for a child's drawing of a seahorse, the system is 100% sure it is a seahorse, and the same goes for dolphins.

[Image: https://pic4.zhimg.com/v2-8ebc054bf04acb1c8793d73_b.png]
[Image: https://pic4.zhimg.com/v2-eb3cabcc629c2e7cf85ad3edeed51597_b.png]
[Image: https://pic4.zhimg.com/v2-5f006c9aab8c2cce01e5b3_b.png]

Even images you might expect to be hard, such as several dolphins crowded together with most of their bodies under water, are recognized correctly.

[Image: https://pic3.zhimg.com/v2-5bb10cca15ffee94f2e59e_b.png]

Training: Attempt 3, Fine-Tuning GoogLeNet

Just as we fine-tuned AlexNet, we can use GoogLeNet too. Modifying this network is a bit trickier, because it defines three fully connected layers instead of just one.

[Image: https://pic1.zhimg.com/v2-be2a1a1e8802dcbb7305d4_b.png]

To fine-tune GoogLeNet in this case, we again create a new Classification Model. We need to rename all references to the three fully connected classification layers, namely
loss1/classifier, loss2/classifier, and loss3/classifier, and redefine the number of output categories (num_output: 2). Here are the changes needed to rename the three classification layers and change the number of output categories from 1000 to 2:

    @@ -917,10 +917,10 @@
       exclude { stage: "deploy" }
     }
     layer {
    -  name: "loss1/classifier"
    +  name: "loss1a/classifier"
       type: "InnerProduct"
       bottom: "loss1/fc"
    -  top: "loss1/classifier"
    +  top: "loss1a/classifier"
       param {
         lr_mult: 1
         decay_mult: 1
    @@ -930,7 +930,7 @@
         decay_mult: 0
       }
       inner_product_param {
    -    num_output: 1000
    +    num_output: 2
         weight_filler {
           type: "xavier"
           std: 0.
    @@ -945,7 +945,7 @@
     layer {
       name: "loss1/loss"
       type: "SoftmaxWithLoss"
    -  bottom: "loss1/classifier"
    +  bottom: "loss1a/classifier"
       bottom: "label"
       top: "loss1/loss"
       loss_weight: 0.3
    @@ -954,7 +954,7 @@
     layer {
       name: "loss1/top-1"
       type: "Accuracy"
    -  bottom: "loss1/classifier"
    +  bottom: "loss1a/classifier"
       bottom: "label"
       top: "loss1/accuracy"
       include { stage: "val" }
    @@ -962,7 +962,7 @@
     layer {
       name: "loss1/top-5"
       type: "Accuracy"
    -  bottom: "loss1/classifier"
    +  bottom: "loss1a/classifier"
       bottom: "label"
       top: "loss1/accuracy-top5"
       include { stage: "val" }
    @@ -05,10 @@
       exclude { stage: "deploy" }
     }
     layer {
    -  name: "loss2/classifier"
    +  name: "loss2a/classifier"
       type: "InnerProduct"
       bottom: "loss2/fc"
    -  top: "loss2/classifier"
    +  top: "loss2a/classifier"
       param {
         lr_mult: 1
         decay_mult: 1
    @@ -18,7 @@
         decay_mult: 0
       }
       inner_product_param {
    -    num_output: 1000
    +    num_output: 2
         weight_filler {
           type: "xavier"
           std: 0.
    @@ -33,7 @@
     layer {
       name: "loss2/loss"
       type: "SoftmaxWithLoss"
    -  bottom: "loss2/classifier"
    +  bottom: "loss2a/classifier"
       bottom: "label"
       top: "loss2/loss"
       loss_weight: 0.3
    @@ -42,7 @@
     layer {
       name: "loss2/top-1"
       type: "Accuracy"
    -  bottom: "loss2/classifier"
    +  bottom: "loss2a/classifier"
       bottom: "label"
       top: "loss2/accuracy"
       include { stage: "val" }
    @@ -50,7 @@
     layer {
       name: "loss2/top-5"
       type: "Accuracy"
    -  bottom: "loss2/classifier"
    +  bottom: "loss2a/classifier"
       bottom: "label"
       top: "loss2/accuracy-top5"
       include { stage: "val" }
    @@ -35,10 @@
       }
     }
     layer {
    -  name: "loss3/classifier"
    +  name: "loss3a/classifier"
       type: "InnerProduct"
       bottom: "pool5/7x7_s1"
    -  top: "loss3/classifier"
    +  top: "loss3a/classifier"
       param {
         lr_mult: 1
         decay_mult: 1
    @@ -48,7 @@
         decay_mult: 0
       }
       inner_product_param {
    -    num_output: 1000
    +    num_output: 2
         weight_filler {
           type: "xavier"
         }
    @@ -61,7 @@
     layer {
       name: "loss3/loss"
       type: "SoftmaxWithLoss"
    -  bottom: "loss3/classifier"
    +  bottom: "loss3a/classifier"
       bottom: "label"
       top: "loss"
       loss_weight: 1
    @@ -70,7 @@
     layer {
       name: "loss3/top-1"
       type: "Accuracy"
    -  bottom: "loss3/classifier"
    +  bottom: "loss3a/classifier"
       bottom: "label"
       top: "accuracy"
       include { stage: "val" }
    @@ -78,7 @@
     layer {
       name: "loss3/top-5"
       type: "Accuracy"
    -  bottom: "loss3/classifier"
    +  bottom: "loss3a/classifier"
       bottom: "label"
       top: "accuracy-top5"
       include { stage: "val" }
    @@ -89,7 @@
     layer {
       name: "softmax"
       type: "Softmax"
    -  bottom: "loss3/classifier"
    +  bottom: "loss3a/classifier"
       top: "softmax"
       include { stage: "deploy" }
     }
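
    # --- Illustrative sketch, not part of the diff above or of the original guide ---
    # The Q&A that follows mentions that layers can be "fixed" so that their weights
    # do not change during fine-tuning. In Caffe this is commonly done by setting
    # lr_mult to 0 in a layer's param blocks. The layer chosen here (GoogLeNet's
    # first convolution) is only an example; which layers to freeze is a judgment
    # call that the guide does not prescribe.
    layer {
      name: "conv1/7x7_s2"
      type: "Convolution"
      bottom: "data"
      top: "conv1/7x7_s2"
      param { lr_mult: 0 decay_mult: 0 }   # weights: no updates
      param { lr_mult: 0 decay_mult: 0 }   # biases: no updates
      convolution_param {
        num_output: 64
        pad: 3
        kernel_size: 7
        stride: 2
      }
    }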

I have put the complete file in src/googlenet-customized.prototxt.

Q: What about changes to the prototxt definitions of these networks? We changed the fully connected layer names and the number of output categories. When could, or should, other parameters be changed as well?

Good question, and one I wonder about too. For example, I know that we can "fix" certain layers so that their weights do not change. Making other changes requires understanding how the layers work, which is beyond the scope of this guide and also beyond its author at present.

As with fine-tuning AlexNet, we again lower the learning rate by a factor of ten, from 0.01 to 0.001.

Q: What other changes would make sense when fine-tuning these networks? What about a different number of epochs, different batch sizes, the solver type (Adam, AdaDelta, AdaGrad, and so on)? What about the learning rate, the policy (Exponential Decay, Inverse Decay, Sigmoid Decay, and so on), the step size, and gamma values?

Good question, and one I wonder about as well. I have only a vague understanding of these. If you know how to adjust these values during training, improvements are likely possible, and this is something that needs better documentation.

Because GoogLeNet has a more complicated architecture than AlexNet, fine-tuning it takes more time. On my laptop, retraining GoogLeNet on our dataset takes 10 minutes to reach 100% accuracy with a loss of 0.0070.

[Image: https://pic4.zhimg.com/v2-f43d5f4867ec8ebfc0a56cc7c88a8967_b.png]

As with the fine-tuned AlexNet, our modified GoogLeNet performs amazingly well; it is the best so far.

[Image: https://pic3.zhimg.com/v2-6feefda28edcb_b.png]
[Image: https://pic1.zhimg.com/v2-5e68d183ef41c6d5dab23dfcd51b4a4c_b.png]
[Image: https://pic4.zhimg.com/v2-2291effce9a050a93b6ef_b.png]

Using Our Model

With our network trained and tested, it is time to download and use it. Each model we trained in DIGITS has a Download Model button, as well as a way to select different snapshots from the training run (for example, Epoch #30):

[Image: https://pic3.zhimg.com/v2-52e63fe20b118c103f32_b.png]

Clicking Download Model downloads a tar.gz archive containing the following files:

deploy.prototxt
mean.binaryproto
solver.prototxt
original.prototxt
labels.txt
snapshot_iter_90.caffemodel
train_val.prototxt
&/code&&/pre&&/div&&p&在 Caffe 文档中对我们所建立的模型使用有一段非常好的描述。如下：&/p&&br&&blockquote&&p&&em&一个网络是由其设计，也就是设计（prototxt）和权重（.caffemodel）决定。在网络被训练的过程中，网络权重的当前状态被存储在一个.caffemodel 中。这些东西我们可以从训练/检测阶段移到生产阶段。在它的当前状态中，网络的设计并不是为了部署的目的。在我们可以将我们的网络作为产品发布之前，我们通常需要通过几种方法对它进行修改：&/em&&/p&&br&&p&&em&1. 移除用来训练的数据层，因为在分类时，我们已经不再为数据提供标签了。&/em&&/p&&p&&em&2. 移除所有依赖于数据标签的层。&/em&&/p&&p&&em&3. 设置接收数据的网络。&/em&&/p&&p&&em&4. 让网络输出结果。&/em&&br&&/p&&/blockquote&&p&DIGITS 已经为我们做了这些工作，它已经将我们 prototxt 文件中所有不同的版本都分离了出来。这些文档我们在使用网络时会用到：&/p&&br&&ul&&li&&p&deploy.prototxt -是关于网络的定义，准备接收图像输入数据&/p&&/li&&li&&p&mean.binaryproto - 我们的模型需要我们减去它处理的每张图像的图像均值，所产生的就是平均图像（mean image）。&/p&&/li&&li&&p&labels.txt - 标签列表 (dolphin, seahorse)，以防我们想要把它们打印出来，否则只有类别编号。&/p&&/li&&li&&p&snapshot_iter_90.caffemodel -这些是我们网络的训练权重。&/p&&/li&&/ul&&br&&p&利用这些文件，我们可以通过多种方式对新的图像进行分类。例如，在 CAFFE_ROOT 中，我们可以使用 build/examples/cpp_classification/classification.bin 来对一个图像进行分类：&/p&&br&&blockquote&&p&&em&&strong&$ cd $CAFFE_ROOT/build/examples/cpp_classification&/strong&&/em&&/p&&p&&em&&strong&$ ./classification.bin deploy.prototxt snapshot_iter_90.caffemodel mean.binaryproto labels.txt dolphin1.jpg&/strong&&/em&&br&&/p&&/blockquote&&p&这会产生很多的调试文本，后面会跟着对这两种分类的预测结果：&/p&&br&&blockquote&&p&&strong&&em&0.9997 -「dolphin」&/em&&/strong&&/p&&p&&strong&&em&0.0003 -「seahorse」&/em&&/strong&&br&&/p&&/blockquote&&p&你可以在这个 Caffe 案例中查看完整的 C++ 源码：&a href=&http://link.zhihu.com/?target=https%3A//github.com/BVLC/caffe/tree/master/examples& class=& wrap external& target=&_blank& rel=&nofollow noreferrer&&BVLC/caffe&/a&&/p&&br&&p&使用 Python 界面和 DIGITS 进行分类的案例：&a href=&http://link.zhihu.com/?target=https%3A//github.com/NVIDIA/DIGITS/tree/master/examples/classification& class=& wrap external& target=&_blank& rel=&nofollow noreferrer&&NVIDIA/DIGITS&/a&&/p&&br&&p&最后，Caffe 的案例中还有一个非常好的 Python 演示：&a href=&http://link.zhihu.com/?target=https%3A//github.com/BVLC/caffe/blob/master/examples/00-classification.ipynb& class=& wrap external& target=&_blank& rel=&nofollow 
noreferrer&&BVLC/caffe&/a&&/p&&br&&p&我希望可以有更多更好的代码案例、API 和预先建立的模型等呈现给大家。老实说，我找到的大多数代码案例都非常的简短，并且文档介绍很少——Caffe 的文档虽然有很多，但也有好有坏。对我来说，这似乎意味着会有人为初学者建立比 Caffe 更高级的工具。如果说在高级语言中出现了更加简单的模型，我可以用我们的模型「做正确的事情」；应该有人将这样的设想付诸行动，让使用 Caffe 模型变得像使用 DIGITS 训练它们一样简单。当然我们不需要对这个模型或是 Caffe 的内部了解那么多。虽然目前我还没有使用过 DeepDetect，但是它看起来非常的有趣，另外仍然还有其他我不知道的工具。&/p&&br&&p&&strong&结果&/strong&&/p&&br&&p&文章开头提到，我们的目标是编写一个使用神经网络对 data/untrained-samples 中所有的图像进行高准确度预测的程序。这些海豚和海马的图像是在训练数据或是验证数据时候从未使用过的。&/p&&br&&p&&strong&未被训练过的海豚图像&/strong&&/p&&p&&figure&&img src=&https://pic3.zhimg.com/v2-b6dcd98cea7ba96c3fccabde_b.jpg& data-rawwidth=&600& data-rawheight=&402& class=&origin_image zh-lightbox-thumb& width=&600& data-original=&https://pic3.zhimg.com/v2-b6dcd98cea7ba96c3fccabde_r.jpg&&&/figure&&figure&&img src=&https://pic1.zhimg.com/v2-cbbde4b854c8c48e2fe0_b.jpg& data-rawwidth=&600& data-rawheight=&441& class=&origin_image zh-lightbox-thumb& width=&600& data-original=&https://pic1.zhimg.com/v2-cbbde4b854c8c48e2fe0_r.jpg&&&/figure&&figure&&img src=&https://pic1.zhimg.com/v2-9eaf05cc51fcb_b.jpg& data-rawwidth=&600& data-rawheight=&400& class=&origin_image zh-lightbox-thumb& width=&600& data-original=&https://pic1.zhimg.com/v2-9eaf05cc51fcb_r.jpg&&&/figure&&b&未被训练过的海马图像&/b&&/p&&figure&&img src=&https://pic4.zhimg.com/v2-1e4ed24cb3cf36c4b49113_b.jpg& data-rawwidth=&600& data-rawheight=&375& class=&origin_image zh-lightbox-thumb& width=&600& data-original=&https://pic4.zhimg.com/v2-1e4ed24cb3cf36c4b49113_r.jpg&&&/figure&&figure&&img src=&https://pic3.zhimg.com/v2-b429dd15f7d8f30a4c8c826_b.jpg& data-rawwidth=&667& data-rawheight=&442& class=&origin_image zh-lightbox-thumb& width=&667& data-original=&https://pic3.zhimg.com/v2-b429dd15f7d8f30a4c8c826_r.jpg&&&/figure&&br&&figure&&img src=&https://pic3.zhimg.com/v2-d1ad863bb4cab0a_b.jpg& data-rawwidth=&480& data-rawheight=&640& class=&origin_image zh-lightbox-thumb& width=&480& 
data-original=&https://pic3.zhimg.com/v2-d1ad863bb4cab0a_r.jpg&&&/figure&&p&接下来，让我们一起来看看在这一挑战当中存在的三次尝试的结果：&/p&&br&&p&模型尝试 1：从零开始构建 AlexNet（第 3 位）&/p&&p&&figure&&img src=&https://pic3.zhimg.com/v2-75ab03c6b9bde22a7ad2aa_b.png& data-rawwidth=&571& data-rawheight=&394& class=&origin_image zh-lightbox-thumb& width=&571& data-original=&https://pic3.zhimg.com/v2-75ab03c6b9bde22a7ad2aa_r.jpg&&&/figure&模型尝试 2：微调 AlexNet（第 2 位）&/p&&figure&&img src=&https://pic2.zhimg.com/v2-db76fb92ad362b92f7412c01_b.png& data-rawwidth=&572& data-rawheight=&396& class=&origin_image zh-lightbox-thumb& width=&572& data-original=&https://pic2.zhimg.com/v2-db76fb92ad362b92f7412c01_r.jpg&&&/figure&&p&模型尝试 3：微调 GoogLeNet（第 1 位）&/p&&figure&&img src=&https://pic4.zhimg.com/v2-1c248957_b.png& data-rawwidth=&571& data-rawheight=&396& class=&origin_image zh-lightbox-thumb& width=&571& data-original=&https://pic4.zhimg.com/v2-1c248957_r.jpg&&&/figure&&p&&strong&结论&/strong&&br&&/p&&p&我们的模型运作非常好，这可能是通过调整一个预训练的网络完成的。很显然，海豚 vs. 海马的例子有一些牵强，数据集也非常的有限——如果我们想拥有一个强大的网络，那我们确实需要更多、更好的数据。但因为我们的目标是去检测神经网络的工具和工作流程，所以这其实是一种很理想的情况，尤其是它不需要昂贵的设备或是花费大量的时间。&br&&/p&&p&综上所述，我希望这些经验能够让那些一直对机器学习望而却步的人摆脱对开始学习的恐惧。在你看到它的作用之后，再决定是否要在学习积极学习和神经网络理论中投入时间要简单很多。现在你已经对它的设置和工作方法都已经有所了解，之后你便可以尝试去做一些分类。你也可以利用 Caffe 和 DIGITS 去做一些其他的事情，例如，在图像中寻找物体，或是进行图像分割。&/p&&p&选自&a href=&http://link.zhihu.com/?target=https%3A//github.com/humphd/have-fun-with-machine-learning/blob/master/README.md& class=& wrap external& target=&_blank& rel=&nofollow noreferrer&&GitHub&/a&&strong&机器之心编译&/strong&&/p&
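<p>As a companion to the "Using our model" section, here is a minimal Python sketch of the same classification step using Caffe's Python interface. It is an illustration under stated assumptions, not the guide author's code: it assumes pycaffe is installed, that the network's input blob is named "data", that the images are colour, and that the extracted archive sits in a hypothetical <code>model_dir</code>. The helper names <code>load_labels</code>, <code>rank_predictions</code>, and <code>classify</code> are made up for this example.</p>

```python
def load_labels(path):
    """Read one class label per line, as in the labels.txt that DIGITS exports."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


def rank_predictions(probabilities, labels):
    """Pair each probability with its label, most likely class first."""
    pairs = zip((float(p) for p in probabilities), labels)
    return sorted(pairs, key=lambda pair: pair[0], reverse=True)


def classify(image_path, model_dir="."):
    """Classify one image with the files from the DIGITS download.

    Assumptions: pycaffe is installed, the input blob is called "data",
    and model_dir (a hypothetical location) holds the extracted archive.
    """
    import os
    import caffe  # imported here so the helpers above work without Caffe

    def in_model_dir(name):
        return os.path.join(model_dir, name)

    net = caffe.Net(in_model_dir("deploy.prototxt"),
                    in_model_dir("snapshot_iter_90.caffemodel"),
                    caffe.TEST)

    # Reduce the stored mean image to one mean value per colour channel.
    blob = caffe.proto.caffe_pb2.BlobProto()
    with open(in_model_dir("mean.binaryproto"), "rb") as handle:
        blob.ParseFromString(handle.read())
    channel_mean = caffe.io.blobproto_to_array(blob)[0].mean(1).mean(1)

    transformer = caffe.io.Transformer({"data": net.blobs["data"].data.shape})
    transformer.set_transpose("data", (2, 0, 1))     # HxWxC -> CxHxW
    transformer.set_channel_swap("data", (2, 1, 0))  # RGB -> BGR
    transformer.set_raw_scale("data", 255)           # [0, 1] -> [0, 255]
    transformer.set_mean("data", channel_mean)       # subtract the image mean

    image = caffe.io.load_image(image_path)
    net.blobs["data"].data[0, ...] = transformer.preprocess("data", image)
    output = net.forward()
    probabilities = output[net.outputs[0]][0]
    return rank_predictions(probabilities,
                            load_labels(in_model_dir("labels.txt")))
```

<p>Calling <code>classify("dolphin1.jpg")</code> would return (probability, label) pairs comparable to the two lines printed by classification.bin. The Caffe import is kept inside <code>classify</code> so that the two small helpers can be tried without a Caffe installation.</p>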
<figure><img src="https://pic1.zhimg.com/v2-3cfe84e6b6b58c00e9398e_b.jpg" data-rawwidth="1172" data-rawheight="757" class="origin_image zh-lightbox-thumb" width="1172" data-original="https://pic1.zhimg.com/v2-3cfe84e6b6b58c00e9398e_r.jpg"></figure>
<blockquote><p>Follow-up to this article: <a href="https://www.zhihu.com/question//answer/" class="internal">Latest progress on Wasserstein GAN: from weight clipping to gradient penalty, a more advanced way to enforce the Lipschitz constraint</a></p></blockquote>
<p>At a time when GAN research is booming, arguably to the point of glut, a freshly released arXiv paper, <a href="http://link.zhihu.com/?target=https%3A//arxiv.org/abs/" class=" wrap external" target="_blank" rel="nofollow noreferrer">Wasserstein GAN</a>, has taken off on Reddit's Machine Learning board, and even Goodfellow is <a href="http://link.zhihu.com/?target=https%3A//www.reddit.com/r/MachineLearning/comments/5qxoaz/r__wasserstein_gan/" class=" wrap external" target="_blank" rel="nofollow noreferrer">taking part in the lively discussion in the thread</a>. What is so remarkable about this paper?<br></p>
<p>Ever since <a href="http://link.zhihu.com/?target=https%3A//arxiv.org/abs/" class=" wrap external" target="_blank" rel="nofollow noreferrer">Ian Goodfellow proposed it in 2014</a>, GAN has suffered from difficult training, generator and discriminator losses that do not indicate training progress, and a lack of diversity in generated samples. Many papers have tried to fix these problems, with unsatisfying results. The best-known improvement, <a href="http://link.zhihu.com/?target=https%3A//arxiv.org/abs/" class=" wrap external" target="_blank" rel="nofollow noreferrer">DCGAN</a>, relies on experimentally enumerating discriminator and generator architectures until a reasonably good configuration is found, which treats the symptoms rather than the cause. Today's protagonist, Wasserstein GAN (WGAN below), achieves the following striking results:</p>
<ul><li>It thoroughly solves the instability of GAN training; the training levels of the generator and discriminator no longer need careful balancing</li>
<li>It largely solves mode collapse and ensures the diversity of generated samples<br></li>
<li>Training finally has a value, like cross-entropy or accuracy, that indicates progress: the smaller the value, the better the GAN is trained and the higher the quality of the generated images (as in the header figure)</li>
<li>None of these benefits requires a carefully designed architecture; the simplest multi-layer fully connected network is enough</li></ul>
<p>Where do these benefits come from? This is the impressive part. The authors spent two whole papers on it. In the first, <a href="http://link.zhihu.com/?target=https%3A//arxiv.org/abs/" class=" wrap external" target="_blank" rel="nofollow noreferrer">Towards Principled Methods for Training Generative Adversarial Networks</a>, they derive a pile of formulas and theorems, analyze theoretically what is wrong with the original GAN, and give targeted points for improvement. In the second, <a href="http://link.zhihu.com/?target=https%3A//arxiv.org/abs/" class=" wrap external" target="_blank" rel="nofollow noreferrer">Wasserstein GAN</a>, they derive another pile of formulas and theorems from those points and finally give the improved algorithm. <b>Yet compared with the original GAN, the improved algorithm changes only four things</b>:</p>
<ul><li>Remove the sigmoid from the last layer of the discriminator</li>
<li>Do not take the log in the generator and discriminator losses</li>
<li>After each discriminator update, clip the absolute values of its parameters to at most a fixed constant c</li>
<li>Do not use momentum-based optimizers (including momentum and Adam); RMSProp is recommended, and SGD also works</li></ul>
<p>A screenshot of the algorithm:</p>
<figure><img src="https://pic1.zhimg.com/v2-6be6e2ef3d15c4b10c2a943e9bf4db70_b.jpg" data-rawwidth="1169" data-rawheight="681" class="origin_image zh-lightbox-thumb" width="1169" data-original="https://pic1.zhimg.com/v2-6be6e2ef3d15c4b10c2a943e9bf4db70_r.jpg"></figure>
<p>The change is so simple and the results so good that many people on Reddit marveled: That's it? Nothing else? It's too simple! These reactions remind me of an old inspirational anecdote about an engineer who fixed a fault by drawing a chalk line on a motor housing and charged ten thousand dollars: drawing the line, 1 dollar; knowing where to draw it, 9,999 dollars. The four changes above are four simple lines drawn by the author, Martin Arjovsky. They are enough for an engineering implementation, but knowing where to draw them rests on an elegant mathematical analysis, which is what this article sets out to organize.</p>
<p>This article has five parts:</p>
<ul><li>What exactly is wrong with the original GAN? (This part is long.)</li>
<li>A transitional solution before WGAN<br></li>
<li>The superior properties of the Wasserstein distance</li>
<li>From the Wasserstein distance to WGAN</li>
<li>Summary<br></li></ul><br>
<p><i>Understanding many of the formulas and theorems in the papers requires some measure theory and topology. This article interprets each important formula intuitively, sometimes with low-dimensional examples that convey the idea behind the mathematics, so some rigor is inevitably lost. If any analogy is inappropriate, please point it out in the comments.</i></p>
<p><i>Below, <a href="http://link.zhihu.com/?target=https%3A//arxiv.org/abs/" class=" wrap external" target="_blank" rel="nofollow noreferrer">Wasserstein GAN</a> is called "the WGAN paper", and <a href="http://link.zhihu.com/?target=https%3A//arxiv.org/abs/" class=" wrap external" target="_blank" rel="nofollow noreferrer">Towards Principled Methods for Training Generative Adversarial Networks</a> is called "the precursor paper".</i></p>
<p><i>WGAN source code: <a href="http://link.zhihu.com/?target=https%3A//github.com/martinarjovsky/WassersteinGAN" class=" wrap external" target="_blank" rel="nofollow noreferrer">martinarjovsky/WassersteinGAN</a></i><br></p>
<h2><u>Part 1: What exactly is wrong with the original GAN?</u></h2>
<p>Recall that in the original GAN the discriminator minimizes the following loss, classifying real samples as positive and generated samples as negative as far as it can:</p>
<p>\[ -\mathbb{E}_{x\sim P_r}[\log D(x)] - \mathbb{E}_{x\sim P_g}[\log(1-D(x))] \tag{1} \]</p>
<p>Here \(P_r\) is the distribution of real samples and \(P_g\) is the distribution of samples produced by the generator. For the generator, Goodfellow first proposed one loss function and later an improved one. They are, respectively,</p>
<p>\[ \mathbb{E}_{x\sim P_g}[\log(1-D(x))] \tag{2} \]</p>
<p>\[ \mathbb{E}_{x\sim P_g}[- \log D(x)] \tag{3} \]</p>
<p>The two WGAN papers call the latter "the - log D alternative" or "the - log D trick". The precursor paper analyzes the problems of each of these two forms of the original GAN, and they are explained in turn below.</p>
<h2><u>Problems with the first form of the original GAN</u></h2>
<p><b>In one sentence: the better the discriminator, the more severely the generator's gradient vanishes.</b> The precursor paper argues this from two angles. The first starts from an equivalent form of the generator loss.<br></p>
<p>First, Formula 1 tells us what the optimal discriminator D is when the parameters of the generator G are fixed. A specific sample \(x\) may come from the real distribution or from the generated distribution, and its contribution to the loss in Formula 1 is</p>
<p>\[ - P_r(x) \log D(x) - P_g(x) \log [1 - D(x)] \]</p>
<p>Setting the derivative with respect to \(D(x)\) to 0 gives</p>
<p>\[ - \frac{P_r(x)}{D(x)} + \frac{P_g(x)}{1 - D(x)} = 0 \]</p>
<p>Simplifying, the optimal discriminator is:<br></p>
<p>\[ D^*(x) = \frac{P_r(x)}{P_r(x) + P_g(x)} \tag{4} \]</p>
<p>This result is easy to understand intuitively: it is the relative proportion of the likelihoods that a sample \(x\) comes from the real distribution versus the generated distribution. If \(P_r(x) = 0\) and \(P_g(x) \neq 0\), the optimal discriminator should confidently output probability 0. If \(P_r(x) = P_g(x)\), the sample is equally likely to be real or fake, and the optimal discriminator should output probability 0.5.</p>
<p>However, GAN training has a trick: do not train the discriminator too well, or in experiments the generator will not learn at all (its loss will not go down). To find out why, we can look at what the generator loss becomes in the extreme case where the discriminator is optimal. Add to Formula 2 a term that does not depend on the generator, so that it becomes</p>
<p>\[ \mathbb{E}_{x\sim P_r}[\log D(x)] + \mathbb{E}_{x\sim P_g}[\log(1-D(x))] \]</p>
<p>Note that minimizing this loss is equivalent to minimizing Formula 2, and that it is exactly the negative of the discriminator loss. Substituting the optimal discriminator from Formula 4, and writing each denominator \(P_r(x) + P_g(x)\) as \(2 \cdot \frac{1}{2}[P_r(x) + P_g(x)]\) so that each expectation yields a \(-\log 2\), we get</p>
<p>\[ \mathbb{E}_{x \sim P_r} \log \frac{P_r(x)}{\frac{1}{2}[P_r(x) + P_g(x)]} + \mathbb{E}_{x \sim P_g} \log \frac{P_g(x)}{\frac{1}{2}[P_r(x) + P_g(x)]} - 2\log 2 \tag{5} \]</p>
<p>The expression is rewritten this way to introduce two important similarity measures, the Kullback-Leibler divergence (KL divergence) and the Jensen-Shannon divergence (JS divergence). One of the later protagonists, the Wasserstein distance, is going to trounce both of them. So here are these two important supporting characters:</p>
<p>\[ KL(P_1||P_2) = \mathbb{E}_{x \sim P_1} \log \frac{P_1}{P_2} \tag{6} \]</p>
<p>\[ JS(P_1 || P_2) = \frac{1}{2}KL(P_1||\frac{P_1 + P_2}{2}) + \frac{1}{2}KL(P_2||\frac{P_1 + P_2}{2}) \tag{7} \]</p>
<p>Formula 5 can then be written as</p>
<p>\[ 2JS(P_r || P_g) - 2\log 2 \tag{8} \]</p><br>
<p>Here the reader can catch a breath and look at what we have so far: <b>from the discriminator loss defined by the original GAN we obtain the form of the optimal discriminator; under the optimal discriminator, the generator loss defined by the original GAN is equivalent to minimizing the JS divergence between the real distribution \(P_r\) and the generated distribution \(P_g\). The more we train the discriminator, the closer it gets to optimal, and the more closely minimizing the generator loss approximates minimizing the JS divergence between \(P_r\) and \(P_g\).</b></p>
<p>The problem lies in this JS divergence. We would hope that the closer two distributions are, the smaller their JS divergence, so that optimizing the JS divergence "pulls" \(P_g\) toward \(P_r\) until the fakes pass for real. This hope holds when the two distributions overlap. But if they do not overlap at all, or their overlap is negligible (what negligible means is explained below), what is their JS divergence?</p>
<p>The answer is \(\log 2\), because for any x there are only four possibilities:</p>
<p>\(P_1(x) = 0\) and \(P_2(x) = 0\)</p>
<p>\(P_1(x) \neq 0\) and \(P_2(x) \neq 0\)<br></p>
<p>\(P_1(x) = 0\) and \(P_2(x) \neq 0\)</p>
<p>\(P_1(x) \neq 0\) and \(P_2(x) = 0\)</p>
<p>The first case contributes nothing to the JS divergence. The second contributes 0 as well, because the overlap is negligible. In the third case \(P_1(x) = 0\), so only the second term on the right-hand side of Formula 7 is affected, and the logarithm inside it equals \(\log \frac{P_2}{\frac{1}{2}(P_2 + 0)} = \log 2\). The fourth case is the same with the roles of the two terms swapped. Each KL term therefore equals \(\log 2\), and with the two factors of \(\frac{1}{2}\) we get \(JS(P_1||P_2) = \log 2\).</p>
<p>In other words, whether \(P_r\) and \(P_g\) are worlds apart or right next to each other, as long as they do not overlap at all or their overlap is negligible, the JS divergence is fixed at the constant \(\log 2\), <b>which for gradient descent means the gradient is 0</b>! In this case the generator gets no gradient information whatsoever from an optimal discriminator, and even with a nearly optimal discriminator it is very likely to face vanishing gradients.</p>
<p>But how likely is it that \(P_r\) and \(P_g\) do not overlap, or overlap only negligibly? The loose answer: very likely. The more rigorous answer: <b>when the supports of \(P_r\) and \(P_g\) are low-dimensional manifolds in a high-dimensional space, the probability that the overlap of \(P_r\) and \(P_g\) has measure 0 is 1.</b></p>
<p>Don't let the odd terminology scare you into closing the page. The paper gives a strict mathematical statement, but it is easy to understand intuitively. First, a brief introduction to these concepts:</p>
<ul><li>The support is the subset on which a function is nonzero. For example, the support of the ReLU function is \((0, +\infty)\), and the support of a probability distribution is the set of all points where the probability density is nonzero.</li>
<li>A manifold generalizes the notions of curve and surface to high-dimensional spaces. We can understand it intuitively in low dimensions: a surface in three-dimensional space is a two-dimensional manifold, because its intrinsic dimension is only 2, and a point moving on it has only two degrees of freedom. Likewise, a curve in three-dimensional or two-dimensional space is a one-dimensional manifold.</li>
<li>Measure generalizes the notions of length, area, and volume to high-dimensional spaces and can be thought of as "hypervolume".</li></ul>
<p>Returning to the first clause, "when the supports of \(P_r\) and \(P_g\) are low-dimensional manifolds in a high-dimensional space" basically holds. The reason is that a GAN generator typically samples a code vector from a low-dimensional (say 100-dimensional) random distribution and passes it through a neural network to produce a high-dimensional sample (a 64x64 image, for example, has 4096 dimensions). With the generator's parameters fixed, the distribution of generated samples is defined on the 4096-dimensional space, but all of its possible variation is already limited by the 100-dimensional random distribution. Its intrinsic dimension is 100, and allowing for dimension reduction by the network's mapping it may be even less than 100. So the support of the generated distribution forms a manifold of at most 100 dimensions in the 4096-dimensional space and "cannot fill" the whole high-dimensional space.</p>
<p>Not filling the space makes it hard for the real and generated distributions to "meet". This is easy to see in two dimensions. On the one hand, if we pick two random curves in the plane, the probability that they share an overlapping segment is 0. On the other hand, although they may well cross, the crossing points are one dimension lower than the curves, have length (measure) 0, and are negligible. Three dimensions is similar: two random surfaces are at most likely to intersect along a curve, but that curve is one dimension lower than the surfaces, has area (measure) 0, and is negligible. Extending from low to high dimensions gives the following logic: because the generator is randomly initialized at the start, \(P_g\) is almost certainly unrelated to \(P_r\), so the overlap of their supports either does not exist or is at least one dimension lower than the smaller of the dimensions of \(P_r\) and \(P_g\), and therefore has measure 0. "The overlap has measure 0" is what "no overlap or negligible overlap" meant above.</p>
<p>This gives the first argument in the precursor paper for the generator's vanishing gradient: <b>under an (approximately) optimal discriminator, minimizing the generator loss is equivalent to minimizing the JS divergence between \(P_r\) and \(P_g\); because \(P_r\) and \(P_g\) almost never have a non-negligible overlap, the JS divergence is the constant \(\log 2\) no matter how far apart they are, so the generator's gradient is (approximately) 0 and vanishes.</b></p>
<p>The authors then argue from a second angle with many formulas and theorems, but the idea behind it can also be explained intuitively:</p>
<ul><li>First, \(P_r\) and \(P_g\) almost never have a non-negligible overlap, so however narrow the "gap" between them, there is certainly an optimal separating surface that divides them, failing at most on the negligible overlap.</li>
<li>Because the discriminator, as a neural network, can fit this separating surface arbitrarily closely, there exists an optimal discriminator that gives probability 1 to almost all real samples and probability 0 to almost all generated samples. The part that cannot be separated consists of samples the optimal discriminator has trouble classifying, but their measure is 0 and they can be ignored.</li>
<li>The optimal discriminator outputs constants (1 and 0) on the supports of the real and generated distributions, so the gradient of the generator loss is 0 and the gradient vanishes.</li></ul>
<p>With this analysis, the cause of the original GAN's instability is clear. If the discriminator is trained too well, the generator's gradient vanishes and its loss stops falling. If the discriminator is trained poorly, the generator's gradient is inaccurate and it wanders all over. Only a discriminator trained neither too well nor too poorly works, but this balance is hard to strike and may even differ between earlier and later stages of the same training run, which is why GANs are so hard to train.</p>
<p>Experimental support:</p>
<figure><img src="https://pic4.zhimg.com/v2-ae_b.jpg" data-rawwidth="1008" data-rawheight="786" class="origin_image zh-lightbox-thumb" width="1008" data-original="https://pic4.zhimg.com/v2-ae_r.jpg"></figure>
<blockquote><p>Figure 2 of the precursor paper. DCGAN is first trained for 1, 20, and 25 epochs. The generator is then fixed, and the discriminator is randomly re-initialized and trained from scratch. Plotting the norm of the gradient produced by the first form of the generator loss shows that the generator's gradient decays rapidly in every case as the discriminator trains. Note that the y-axis is logarithmic.<br></p></blockquote>
<h2><u>Problems with the second form of the original GAN</u></h2>
<p><b>In one sentence: minimizing the second generator loss is equivalent to minimizing an unreasonable distance measure, which causes two problems: unstable gradients, and mode collapse, that is, insufficient diversity.</b> The precursor paper again argues from two angles. Only the first is covered here, because I could not find an intuitive way to explain the second; interested readers should go and read the paper (runs away).</p>
<p>As mentioned above, the "- log D trick" proposed by Ian Goodfellow changes the generator loss to</p>
<p>\[ \mathbb{E}_{x\sim P_g}[- \log D(x)] \tag{3} \]<br></p>
<p>The derivation above already shows that under the optimal discriminator \(D^*\)</p>
<p>\[ \mathbb{E}_{x\sim P_r}[\log D^*(x)] + \mathbb{E}_{x\sim P_g}[\log(1-D^*(x))] = 2JS(P_r || P_g) - 2\log 2 \tag{9} \]</p>
<p>We can rewrite the KL divergence (note the order below: g first, then r) in a form that contains \(D^*\). By Formula 4, \(1 - D^*(x) = \frac{P_g(x)}{P_r(x) + P_g(x)}\), so:</p>
<p>\begin{align}
KL(P_g || P_r) &= \mathbb{E}_{x \sim P_g} [\log \frac{P_g(x)}{P_r(x)}] \\
&= \mathbb{E}_{x \sim P_g} [\log \frac{P_g(x) / (P_r(x) + P_g(x))}{P_r(x) / (P_r(x) + P_g(x))}] \\
&= \mathbb{E}_{x \sim P_g} [\log \frac{1 - D^*(x)}{D^*(x)}] \\
&= \mathbb{E}_{x \sim P_g} \log [1 - D^*(x)] - \mathbb{E}_{x \sim P_g} \log D^*(x)
\end{align}</p>
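<p>Two of the claims above are easy to check numerically: that the JS divergence of two distributions with disjoint supports is exactly log 2, and that KL(P_g || P_r) can be written in terms of the optimal discriminator D* as E_{x~P_g} log[1 - D*(x)] - E_{x~P_g} log D*(x), which follows from Formula 4. The sketch below is only an illustration: the four-outcome discrete distributions are hypothetical toy values that do not come from either paper, and the function names are made up for this example.</p>

```python
import math


def kl(p, q):
    """KL(p || q) for discrete distributions given as probability lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js(p, q):
    """JS divergence (Formula 7): two KL terms against the mixture (p + q) / 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def optimal_discriminator(p_r, p_g):
    """D*(x) = P_r(x) / (P_r(x) + P_g(x)) (Formula 4), one value per outcome."""
    return [r / (r + g) for r, g in zip(p_r, p_g)]


def expectation(p, values):
    """E_{x ~ p}[values(x)] for a discrete distribution p."""
    return sum(pi * v for pi, v in zip(p, values))


def kl_via_discriminator(p_r, p_g):
    """KL(P_g || P_r) as E_g[log(1 - D*)] - E_g[log D*].

    Requires both distributions to be strictly positive everywhere.
    """
    d_star = optimal_discriminator(p_r, p_g)
    return (expectation(p_g, [math.log(1 - d) for d in d_star])
            - expectation(p_g, [math.log(d) for d in d_star]))


if __name__ == "__main__":
    # Hypothetical toy distributions over four outcomes (not from the papers).
    disjoint_r, disjoint_g = [0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.3, 0.7]
    print(js(disjoint_r, disjoint_g), math.log(2))  # the two values agree

    p_r, p_g = [0.1, 0.2, 0.3, 0.4], [0.4, 0.3, 0.2, 0.1]
    print(kl(p_g, p_r), kl_via_discriminator(p_r, p_g))  # the two values agree
```

<p>Moving the disjoint supports further apart or closer together changes nothing in the first check, which is the discrete analogue of the constant log 2 and the zero gradient discussed in Part 1.</p>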
