Semantic segmentation
Deep learning for interferogram segmentation using FCN and ICnet
FCN
Implementation
- Model : VGG16/resnet18 + fcn
- VGG16 FCN layer :
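    import torch.nn as nn

    # VGG16 is the backbone class from the vgg reference listed below.
    class fcn32(nn.Module):
        def __init__(self):
            super().__init__()
            self.features_map = VGG16(num_classes=2)
            # Fully convolutional replacement for the VGG classifier head.
            self.conv = nn.Sequential(
                nn.Conv2d(512, 4096, 7),
                nn.ReLU(inplace=True),
                nn.Dropout(),
                nn.Conv2d(4096, 4096, 1),
                nn.ReLU(inplace=True),
                nn.Dropout(),
            )
            self.score_fr = nn.Conv2d(4096, 21, 1)
            # Kernel size 64, stride 32: upsample the score map 32x into 2 classes.
            self.upscore = nn.ConvTranspose2d(21, 2, 64, 32)

        def forward(self, x):
            x_size = x.size()
            pool = self.conv(self.features_map(x))
            score_fr = self.score_fr(pool)
            upscore = self.upscore(score_fr)
            # Crop the upsampled map back to the input height and width.
            return upscore[:, :, 16:(16 + x_size[2]), 16:(16 + x_size[3])]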
- Resnet FCN layer :
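    # resnet18 and bilinear_kernel are not shown in this README.
    class fcn(nn.Module):
        def __init__(self):
            super().__init__()
            self.backbone = resnet18()
            self.conv1 = nn.Conv2d(512, 64, 3, 1, 1)
            self.conv2 = nn.Conv2d(64, 2, 1, 1)
            # Kernel size 16, stride 8, padding 4: upsample the score map 8x.
            self.convtrans1 = nn.ConvTranspose2d(2, 2, 16, 8, 4)
            # Initialise the upsampling weights with a bilinear kernel.
            self.convtrans1.weight.data = bilinear_kernel(2, 2, 16)

        def forward(self, x):
            x = self.backbone(x)
            x = self.conv1(x)
            x = self.conv2(x)
            x = self.convtrans1(x)
            return x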
- loss : Cross entropy loss and asymmetric loss (ASL)
- optimizer : Adam (lr = 0.001)
- environment : PyTorch 1.6
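The Resnet FCN layer above calls `bilinear_kernel`, which is not defined in this README. The sketch below is a commonly used definition of such a helper, written with NumPy so it has no other dependencies. It is an assumption about what the project's helper does, not the author's actual code. To assign it to a `ConvTranspose2d` weight, the array would still need converting, for example with `torch.from_numpy`.

```python
import numpy as np


def bilinear_kernel(in_channels, out_channels, kernel_size):
    """Return a (in_channels, out_channels, k, k) bilinear upsampling kernel.

    Assumed behaviour: each channel is upsampled independently, so only the
    matching (c, c) channel pairs receive the bilinear filter.
    """
    factor = (kernel_size + 1) // 2
    # The filter centre sits on a pixel for odd sizes, between pixels for even.
    center = factor - 1 if kernel_size % 2 == 1 else factor - 0.5
    rows = np.arange(kernel_size).reshape(-1, 1)
    cols = np.arange(kernel_size).reshape(1, -1)
    # Outer product of two triangular (linear interpolation) profiles.
    filt = (1 - np.abs(rows - center) / factor) * (1 - np.abs(cols - center) / factor)
    weight = np.zeros(
        (in_channels, out_channels, kernel_size, kernel_size), dtype=np.float32
    )
    for c in range(min(in_channels, out_channels)):
        weight[c, c] = filt
    return weight
```

With `kernel_size=16` the 1-D profile sums to 8, which matches the stride of 8 used by `convtrans1`.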
Input data
- 3072 x 512 x 1 interferogram crops, taken at random from 3072 x 3072 images
- Data labeled using OpenCV selectROI
- To keep it simple, bounding boxes (bbox) are used to label the data
- The bbox data are then turned into binary masks (sample and background)
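The last step above, turning a bounding box into a binary mask, can be sketched as follows. This is a minimal illustration rather than the project's actual code. It assumes the box is in the `(x, y, w, h)` form that `cv2.selectROI` returns. The function name `bbox_to_mask` is hypothetical.

```python
import numpy as np


def bbox_to_mask(bbox, image_shape):
    """Build a binary mask: 1 inside the box (sample), 0 elsewhere (background).

    bbox        -- (x, y, w, h), the format returned by cv2.selectROI
    image_shape -- shape of the labeled image; only height and width are used
    """
    x, y, w, h = (int(v) for v in bbox)
    mask = np.zeros(image_shape[:2], dtype=np.uint8)
    # Rows are indexed by y, columns by x.
    mask[y:y + h, x:x + w] = 1
    return mask


# Example: a 3-wide, 2-high box whose top-left corner is at x=2, y=1.
example_mask = bbox_to_mask((2, 1, 3, 2), (5, 8))
```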
Result
In this project I assumed asymmetric loss would perform
better, since the numbers of sample (sp) and background (bg) pixels differ considerably
- IoU and pixel accuracy are used to score the performance
- After training, the score on test images can reach :
  - ASL loss :
    - Mean IoU : 0.764
  - Cross entropy loss :
    - Mean IoU : 0.801
    - Pixel accuracy : 0.995
  - After changing the backbone to resnet18 :
    - Cross entropy loss :
      - Mean IoU : 0.832
The image above shows the prediction process using the trained
model. The whole process takes around 0.158 sec
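The two scores used in this section, mean IoU and pixel accuracy, can be computed along the following lines. This is a sketch under assumptions: the README does not say how the mean is taken, so here IoU is averaged over the two classes (background = 0, sample = 1) of a single prediction. The function names are illustrative.

```python
import numpy as np


def pixel_accuracy(pred, target):
    """Fraction of pixels whose predicted class equals the label."""
    pred = np.asarray(pred)
    target = np.asarray(target)
    return float((pred == target).mean())


def mean_iou(pred, target, num_classes=2):
    """Average the intersection-over-union across classes.

    Assumption: classes absent from both prediction and label are skipped,
    so they do not distort the mean.
    """
    pred = np.asarray(pred)
    target = np.asarray(target)
    ious = []
    for cls in range(num_classes):
        pred_c = pred == cls
        target_c = target == cls
        union = np.logical_or(pred_c, target_c).sum()
        if union == 0:
            continue
        intersection = np.logical_and(pred_c, target_c).sum()
        ious.append(intersection / union)
    return float(np.mean(ious))
```

Pixel accuracy can stay high even when the sample region is segmented poorly, because background pixels dominate. That is why it is reported alongside mean IoU.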
Reference
- vgg : https://github.com/chongwar/vgg16-pytorch
- ASL : https://github.com/Alibaba-MIIL/ASL
- fcn32 : https://github.com/sairin1202/fcn32-pytorch
ICnet
In order to build a faster semantic segmentation program, I chose to implement ICnet (image cascade network).
Model
- Basic ICnet model and loss function from https://github.com/liminn/ICNet-pytorch
- Slightly modified the input size and the per-resolution parameters used in the loss calculation
Data augmentation
- To enhance the target region, multiple images were averaged and the average was then subtracted from every image
- OpenCV CLAHE is used for histogram equalization
- Images are flipped to enlarge the dataset
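The averaging and flipping steps above can be sketched with NumPy alone. This is an illustration under assumptions, not the project's code. It assumes the average is taken over the whole stack of images and subtracted from each one, and it applies the same flip to the image and its mask. The CLAHE step is left out here to keep the sketch free of an OpenCV dependency. The function names are hypothetical.

```python
import numpy as np


def subtract_mean_image(images):
    """Subtract the per-pixel average of a stack of images from every image.

    images -- array-like of shape (n_images, height, width)
    """
    stack = np.asarray(images, dtype=np.float32)
    mean_image = stack.mean(axis=0)
    return stack - mean_image


def flip_augment(image, mask):
    """Return the original pair plus its up-down and left-right flips."""
    return [
        (image, mask),
        (np.flipud(image), np.flipud(mask)),
        (np.fliplr(image), np.fliplr(mask)),
    ]
```

Flipping the mask together with the image keeps the labels aligned with the pixels they describe.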
TODO
- Change the pyramid pooling structure to an RNN for better segmentation of image sequences
- In this project, removing the pyramid pooling layer only influenced the time needed to converge
Result
- Input data : 3072x3072x1 (around 800 images), downsized to 1024x1024 for training and testing.
- Data augmentation : flipud, fliplr
- Mean IoU : ~0.82
- Frame rate :
  - On HDD : 11 fps
  - On SSD : 23 fps
Reference
- ICnet : https://arxiv.org/pdf/1704.08545.pdf
A few tips to improve frame rate
Since most of the time is spent loading data, the following are
a few ways to reduce data loading time
- Store the data on an SSD, or on the drive that runs Python. Reading from an SSD can be about 2x faster than from an HDD
- Use OpenCV (cv2) instead of PIL (pretty slow) for image reading and simple transformations. This is easy to implement when loading test data, which needs no complex transforms
- Set num_workers > 0, since data will then be preloaded into RAM while the GPU is training/testing. However, if the image data are stored on an HDD, a larger num_workers might be even slower than num_workers=0
- Use "for data, target in data_loader:" instead of the "iter" function
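The last two tips can be illustrated with a small PyTorch sketch. It is not taken from the project: `RandomMaskDataset` and `collect_batch_shapes` are hypothetical names, and the dataset holds random tensors in memory instead of reading interferograms from disk. The value of `num_workers` is left as a parameter because the best setting depends on the storage device, as noted above.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class RandomMaskDataset(Dataset):
    """Hypothetical stand-in: random 1-channel images with binary masks."""

    def __init__(self, n_items=8, size=32):
        generator = torch.Generator().manual_seed(0)
        self.images = torch.rand(n_items, 1, size, size, generator=generator)
        self.masks = (self.images[:, 0] > 0.5).long()

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        return self.images[idx], self.masks[idx]


def collect_batch_shapes(num_workers=0, batch_size=4):
    """Iterate the loader directly and record the shape of every batch."""
    loader = DataLoader(
        RandomMaskDataset(), batch_size=batch_size, num_workers=num_workers
    )
    shapes = []
    # Looping over the loader itself, rather than calling iter() by hand.
    for data, target in loader:
        shapes.append((tuple(data.shape), tuple(target.shape)))
    return shapes
```

When trying `num_workers > 0` in a script, the call is best placed under an `if __name__ == "__main__":` guard, since worker processes may re-import the module on some platforms.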