Semantic segmentation

Deep learning for interferogram segmentation using FCN and ICnet

FCN

Implementation

Model : VGG16/resnet18 + fcn

VGG16 FCN layer :

  class fcn32(nn.Module):
      def __init__(self):
          super().__init__()
          self.features_map=VGG16(num_classes=2)
          self.conv=nn.Sequential(nn.Conv2d(512,4096,7),
                          nn.ReLU(inplace=True),
                          nn.Dropout(),
                          nn.Conv2d(4096,4096,1),
                          nn.ReLU(inplace=True),
                          nn.Dropout()
                          )
          self.score_fr=nn.Conv2d(4096,21,1) 
          self.upscore=nn.ConvTranspose2d(21,2,64,32)
      def forward(self,x):
          x_size=x.size()
          pool=self.conv(self.features_map(x))
          score_fr=self.score_fr(pool)
          upscore=self.upscore(score_fr)
          return upscore[:,:,16:(16+x_size[2]),16:(16+x_size[3])]

Resnet FCN layer :

  class fcn(nn.Module):
      def __init__(self):
          super().__init__()
          self.backnone = resnet18()
          self.conv1 = nn.Conv2d(512,64,3,1,1)
          self.conv2 = nn.Conv2d(64,2,1,1)
          self.convtrans1 = nn.ConvTranspose2d(2,2, 16, 8, 4)
          self.convtrans1.weight.data = bilinear_kernel(2,2, 16)
    
  def forward(self, x):
      x = self.backbone(x)
      x = self.conv1(x)
      x = self.conv2(x)
      x = self.convtrans1(x)
      return x

loss : Cross entropy Loss and asymmertric loss
optimizer : Adam (lr = 0.001)
environment : pytorch 1.6

Input data

3072 x 512 x 1 interferogram crop fom 3072 x 3072 image randomly
Data labeled using opencv selectROI
- to make it simple, use bbox to label data
Then turn bbox data into binary mask (sample and background)

Result

In this project I assume asymmertric loss would perfomed 
better since sp and bg pixel considerable disparity 

Use IoU and pixel accuracy to scored the performence
After training, score for testing image can be up to :
- ASL Loss :
  - Mean IoU : 0.764
- Cross entropy loss :
  - Mean IoU : 0.801
  - Pixel accuracy : 0.995
Change backbone to resnet18
- Cross entropy loss :
  - Mean IoU : 0.832

Above image shows the process of pridiction using trained 
model. the whole precell cost around 0.158 sec

Reference

vgg : https://github.com/chongwar/vgg16-pytorch
ASL : https://github.com/Alibaba-MIIL/ASL
fcn32 : https://github.com/sairin1202/fcn32-pytorch

ICnet

inorder to build a faster sementic segmentation program, I choose ICnet(image cascade network) to implement.

Model

Basic ICnet model and loss function from https://github.com/liminn/ICNet-pytorch
Slightly modified the input size, and parameters of different resolution for loss calculation

Data augumentation

To enhance the target region, multiple image had been average then subtract by every image
Use opencv clahe do the histogram equalization
flip image to enlarge data size

TODO

Change pyrimid pooling structure to RNN for better sequence image segmentation
- in this project, the removal of pyrimid pooling layer only influnce the time consume to converge

Result

input data : 3072x3072x1 (around 800 image), then downsize to 1024x1024 for training and testing.
Data augumentation : flipud, fliplr
MeanIOU : ~0.82
Frame rate:
- In HDD : 11 fps
- In SSD : 23 fps

Reference

ICnet : https://arxiv.org/pdf/1704.08545.pdf

Few tips to improve frame rate

Since most time cost by loading data, following are 
few ways to reduce time consume of data loading

Data should be storage in SSD or the hard drive that runs pythn. Data in ssd can lead to 2x faster then in HDD
Use OPENCV2 instead of PIL ~~(pretty slow)~~ for image reading and simple transformation (easy to implement while loading test data which has no need to do complex transform)
set num_workers > 0, since data will preloaded to RAM while GPU training/testing. However if image data were storage in HDD, larger num_worker might take even slower than num_workers=0
Use for data , target in data loader : , instead of using “iter” function