Paper Notes - Noisy Channel Language Model Prompting for Few-Shot Text Classification

Direct && Noisy Channel

The paper further divides language-model inference into two modes: direct and channel.

Intuitively:

Direct mode

Noisy Channel mode

In other words, the input text and the label swap positions.
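The swap can be sketched as two prompt builders. This is a minimal illustration rather than the paper's exact template: the function names, the newline separator, and the plain `positive` / `negative` label words are all assumptions made for the example.

```python
from typing import List, Tuple

# One demonstration: (input text x, label text y). Hypothetical format.
Demo = Tuple[str, str]


def build_direct_prompt(demos: List[Demo], x_test: str) -> str:
    """Direct mode context: x1 y1, x2 y2, ..., x_test.

    The model would then be asked to score each candidate label y
    as the continuation.
    """
    parts = [f"{x} {y}" for x, y in demos]
    parts.append(x_test)
    return "\n".join(parts)


def build_channel_prompt(demos: List[Demo], y_candidate: str) -> str:
    """Channel mode context: y1 x1, y2 x2, ..., y_candidate.

    The model would then be asked to score x_test as the continuation,
    once per candidate label.
    """
    parts = [f"{y} {x}" for x, y in demos]
    parts.append(y_candidate)
    return "\n".join(parts)


demos = [("A wonderful film.", "positive"), ("Dull and slow.", "negative")]
direct_prompt = build_direct_prompt(demos, "I loved it.")
channel_prompt = build_channel_prompt(demos, "positive")
```

Note that direct mode needs a single prompt per test input, while channel mode needs one prompt per candidate label, because the label is part of the conditioning text.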

Formula derivation

Direct:

$$y_{test}=\operatorname*{argmax}_{y}\;P(y\mid\theta,c,x_{test})\;\;\;c=\mathrm{context}$$

Noisy Channel:

$$y_{test}=\operatorname*{argmax}_{y}\;P(y)\,P(x_{test}\mid\theta,c',y)\;\;\;c'=\mathrm{context}_{reversed}$$
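The channel objective can be read as Bayes' rule applied to the label posterior. The sketch below is only the motivating identity: the two conditionals produced by a language model need not satisfy it exactly, and the uniform prior over a label set $\mathcal{C}$ is an assumption introduced here for illustration.

$$P(y\mid x_{test})=\frac{P(x_{test}\mid y)\,P(y)}{P(x_{test})}\;\propto\;P(x_{test}\mid y)\,P(y)$$

$$P(y)=\frac{1}{|\mathcal{C}|}\;\Longrightarrow\;\operatorname*{argmax}_{y}\;P(y)\,P(x_{test}\mid\theta,c',y)=\operatorname*{argmax}_{y}\;P(x_{test}\mid\theta,c',y)$$

The denominator $P(x_{test})$ does not depend on $y$, so it drops out of the argmax.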

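Note that in the Noisy Channel expression it is $y$, the conditioning variable, that varies across candidates, while $x_{test}$ stays fixed.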
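The two decision rules can be sketched in a few lines. Everything here is an assumption for illustration: `score(prefix, continuation)` is a hypothetical callable that returns $\log P(\text{continuation}\mid\text{prefix})$ under the language model, not a real library API, and the caller is responsible for any separators between the context and the appended text.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical interface: score(prefix, continuation) -> log P(continuation | prefix).
Scorer = Callable[[str, str], float]


def classify_direct(
    score: Scorer, context: str, x_test: str, labels: List[str]
) -> str:
    """argmax_y log P(y | c, x_test): the prefix is fixed, the label varies."""
    prefix = f"{context}{x_test}"
    return max(labels, key=lambda y: score(prefix, y))


def classify_channel(
    score: Scorer,
    context_reversed: str,
    x_test: str,
    labels: List[str],
    log_prior: Optional[Dict[str, float]] = None,
) -> str:
    """argmax_y [log P(y) + log P(x_test | c', y)].

    The continuation x_test is fixed; the label in the prefix varies.
    With log_prior=None the prior is treated as uniform, which adds the
    same constant to every candidate and so does not change the argmax.
    """

    def total(y: str) -> float:
        prior = 0.0 if log_prior is None else log_prior[y]
        return prior + score(f"{context_reversed}{y}", x_test)

    return max(labels, key=total)
```

In channel mode every candidate is judged by how well it explains the same full input text, so the scorer has to account for every token of $x_{test}$ rather than only the short label.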

Why does Noisy Channel perform better?

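A loose, non-rigorous intuition: $y$ is low-dimensional (usually only a handful of classes), so the predicted label distribution is easily skewed by distribution shift. In direct mode, a small change in the high-dimensional $x$ or in the prompt can shift the output distribution. For example, if every demonstration in the prompt is positive, the prediction for $x_{test}$ will also be biased toward positive. When $x$ is the output instead, a shift in the distribution of $x$ has little effect on the probability assigned to $x$, so the method becomes more robust.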
