Identity encoder: uses a pretrained face recognition model (ArcFace). It is reasonable to expect that a recognition model trained on large-scale 2D face data provides representative identity embeddings.
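Multi-level Attributes Encoder: Face attributes, such as pose, expression, lighting and background, require more spatial information than identity. In order to preserve such details, we propose to represent the attributes embedding as multi-level feature maps, instead of compressing it into a single vector as previous methods [5, 29]. In other words, face attributes are complex, so they are represented by feature maps at several levels rather than by one final feature vector. The attribute encoder needs no attribute annotations; it is trained in a self-supervised way (because the generated Yst must have the same attributes as Xt).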
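Color mismatch: training-time data augmentation and post-processing significantly reduce the color mismatch in Celeb-DF between the synthesized donor face and the original target face. During training, random color perturbations are applied to the input faces, which forces the network to produce faces in the same color space as its input. Post-processing applies a color transfer algorithm (from "Color transfer between images").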
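The color transfer post-processing step can be illustrated by per-channel statistics matching, which is the core idea of "Color transfer between images". The following is a minimal sketch, not the Celeb-DF implementation: it assumes pixels are already given as three-channel tuples in a suitable color space (the original method works in a decorrelated lαβ space, and that conversion is omitted here), and the names `match_channel_stats` and `transfer_color` are hypothetical.

```python
from statistics import mean, pstdev


def match_channel_stats(source, target):
    """Shift and scale `source` values so their mean and std match `target`."""
    s_mean, s_std = mean(source), pstdev(source)
    t_mean, t_std = mean(target), pstdev(target)
    # Guard against a flat channel (zero std) to avoid division by zero.
    scale = t_std / s_std if s_std > 0 else 1.0
    return [(v - s_mean) * scale + t_mean for v in source]


def transfer_color(synth_pixels, target_pixels):
    """Match each channel of the synthesized face to the target face.

    Both arguments are lists of (c1, c2, c3) tuples. Assumption: the
    channels are already in the color space where matching is done.
    """
    synth_channels = list(zip(*synth_pixels))
    target_channels = list(zip(*target_pixels))
    matched = [
        match_channel_stats(list(s), list(t))
        for s, t in zip(synth_channels, target_channels)
    ]
    return [tuple(p) for p in zip(*matched)]
```

After the transfer, each channel of the synthesized face has the same mean and standard deviation as the corresponding channel of the target face, which removes a global color offset but not local lighting differences.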
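Inaccurate face masks: in previous datasets the mask is either rectangular or the convex hull of the landmarks on the eyebrows and lower lip, which sometimes leaves the boundary clearly visible. The mask generation process is improved: essentially, the landmarks are extended to cover a wider area, and interpolation is then used to obtain a larger mask, as shown in the figure below.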
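Temporal flickering in synthesized videos: a Kalman filter smooths the facial landmarks, reducing the error between landmarks detected in different frames.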
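To make the smoothing step concrete, here is a minimal sketch of a scalar Kalman filter applied independently to the x and y coordinates of one landmark over time. It assumes a random-walk (constant position) motion model; the noise variances and the function names `kalman_smooth` and `smooth_landmark_track` are illustrative assumptions, not values or names from the paper.

```python
def kalman_smooth(measurements, process_var=1e-2, measurement_var=1.0):
    """Filter a 1D sequence with a scalar Kalman filter (random-walk model)."""
    estimate = float(measurements[0])
    error_var = 1.0  # initial uncertainty of the estimate (assumed)
    smoothed = [estimate]
    for z in measurements[1:]:
        # Predict: the position is assumed unchanged, uncertainty grows.
        error_var += process_var
        # Update: blend prediction and measurement using the Kalman gain.
        gain = error_var / (error_var + measurement_var)
        estimate = estimate + gain * (z - estimate)
        error_var = (1.0 - gain) * error_var
        smoothed.append(estimate)
    return smoothed


def smooth_landmark_track(track, **kwargs):
    """Smooth one landmark's per-frame (x, y) positions."""
    xs = kalman_smooth([p[0] for p in track], **kwargs)
    ys = kalman_smooth([p[1] for p in track], **kwargs)
    return list(zip(xs, ys))
```

A larger `measurement_var` relative to `process_var` trusts the detector less and yields a smoother but more lagging track; a real pipeline would likely tune both and might use a constant-velocity model instead.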
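In addition, Section 3.3 of the paper evaluates video quality, for which there was previously no widely used standard. The paper adopts the Mask-SSIM score, an evaluation metric from the face in-painting task, as a reference quantitative metric. Mask-SSIM is the SSIM score computed over the head region (including face and hair). SSIM combines three comparisons (luminance, contrast, and structure) and measures the similarity of two images.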
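The idea behind Mask-SSIM can be sketched as SSIM restricted to the pixels selected by a head-region mask. The version below is a deliberate simplification: it treats the whole masked region as a single window, whereas standard SSIM averages scores over local sliding windows, and the exact Mask-SSIM procedure used in the paper is not specified in these notes. Images are assumed to be flat lists of grayscale values, and `masked_ssim` is a hypothetical name.

```python
from statistics import mean


def masked_ssim(img_a, img_b, mask, data_range=255.0):
    """Single-window SSIM over the pixels where `mask` is truthy.

    img_a, img_b and mask are flat sequences of equal length.
    """
    a = [float(p) for p, m in zip(img_a, mask) if m]
    b = [float(p) for p, m in zip(img_b, mask) if m]
    if not a:
        raise ValueError("mask selects no pixels")
    # Stabilizing constants with the commonly used K1 = 0.01, K2 = 0.03.
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_a, mu_b = mean(a), mean(b)
    var_a = mean([(x - mu_a) ** 2 for x in a])
    var_b = mean([(y - mu_b) ** 2 for y in b])
    cov = mean([(x - mu_a) * (y - mu_b) for x, y in zip(a, b)])
    # The means carry the luminance term; variances and covariance
    # carry the contrast and structure terms.
    numerator = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    denominator = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return numerator / denominator
```

Identical masked regions score 1, and differences outside the mask do not affect the result, which is the point of restricting the metric to the head region.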
DeeperForensics-1.0
|--lists
   |--manipulated_videos_distortions_meta
      <several meta files of distortion information in the manipulated videos>
   |--manipulated_videos_lists
      <several lists of the manipulated videos>
   |--source_videos_lists
      <several lists of the source videos>
   |--splits
      <train, val, test data splits for your reference>
|--manipulated_videos
   <11 folders named by the type of variants for the manipulated videos>
|--source_videos
   <100 folders named by the ID of the actors for the source videos, and their subfolders named by the source video information>
|--terms_of_use.pdf
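After downloading, a quick check that a local copy matches the layout above might look like the following sketch. It relies only on the top-level folder names shown in the tree; the function name `list_top_level_folders` and the idea of passing a local root path are assumptions for illustration.

```python
from pathlib import Path


def list_top_level_folders(root):
    """Return the subfolder names found under each top-level dataset folder.

    A folder that is missing from `root` maps to an empty list.
    """
    root = Path(root)
    summary = {}
    for name in ("lists", "manipulated_videos", "source_videos"):
        folder = root / name
        if folder.is_dir():
            summary[name] = sorted(p.name for p in folder.iterdir() if p.is_dir())
        else:
            summary[name] = []
    return summary
```

According to the tree, a complete copy should then report 11 entries under manipulated_videos and 100 under source_videos.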