These general semantic segmentation methods mainly focus on multi-scale context modeling, ignoring the special issues in the HSR remote sensing imagery, such as false alarms and foreground-background imbalance. This causes that these methods are lack of explicit modeling for the foreground.
文章提到现有的语义分割方法主要关注多尺度语义建模,忽视了高分辨率遥感图像上前景和背景不平衡导致的误检,所以这篇文章提出了上图的方法
Apart from the pyramidal feature maps vi, a extrabranch is attached on C5 to generate a geospatial scene feature C6 via global context aggregation. For simplicity, we use global average pooling as the aggregation function. C6 is used to model the relation between geospatial scene and foreground
主要思想就是学了一个场景的embedding:u,这个u是通过resnet作为backbone提到的特征 C5(维度H×W×du)全局池化得到的一个du维向量,对于每个位置i(row,col)的前景概率计算:ri=φ(u,~vi)=u和~vi的内积
那么相当于学了一个du维空间里区分前景和背景的基u,如果是前景,则ri在空间中越靠近u这个基所在的方向上,内积响应更大,也就是r(H×W)中更红的位置
下面那一路kwi(vi)只是一个1×1卷积+Relu 重新对vi特征做了一次非线性变换 维度H×W×du到H×W×du。
zi=11+exp(−ri)×kwi(vi) 那么刚刚算的ri∈[0,1]越大(越接近于1)保留的特征越多,ri越小(越接近于0)保留的特征越少,保留的特征范围在[0.5×kwi(vi),ee+1×kwi(vi)]