Description
We tend to skip the step of exploring and analysing the data when working with images. In this talk, we will look at exploratory data analysis (EDA) and visualisation techniques for image segmentation tasks by reviewing three different Kaggle competitions I have recently participated in. I will then summarise the approach as a guideline for EDA and visualisation in image segmentation challenges.
In computer vision, image segmentation is the process of assigning a label to every pixel of an image. One example use case is detecting defects in an image, where every pixel is labelled as either ok or not ok. Data for image segmentation typically comes in two parts: the images themselves and the masks that mark the class and area of interest. In practice, we build a deep learning model that learns to produce the mask automatically, labelling each pixel of a new image.
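Kaggle segmentation competitions often ship masks not as image files but as run-length encoded (RLE) strings in a CSV file. The sketch below decodes such a string into a per-pixel 0/1 mask with the same height and width as the image. It assumes one common convention (space-separated (start, length) pairs, 1-indexed starts, pixels numbered top to bottom and then left to right); conventions differ between competitions, so treat this as an illustration rather than a drop-in decoder.

```python
import numpy as np


def rle_decode(rle, height: int, width: int) -> np.ndarray:
    """Decode a run-length encoded string into an H x W mask of 0/1 labels.

    Assumed convention (check each competition's data page): space-separated
    (start, length) pairs, starts are 1-indexed, and pixels are numbered
    top to bottom, then left to right.
    """
    flat = np.zeros(height * width, dtype=np.uint8)
    # An empty or missing encoding (e.g. NaN read by pandas) means no labelled pixels
    if isinstance(rle, str) and rle.strip():
        values = [int(v) for v in rle.split()]
        starts = [s - 1 for s in values[0::2]]  # convert 1-indexed starts to 0-indexed
        lengths = values[1::2]
        for start, length in zip(starts, lengths):
            flat[start:start + length] = 1
    # order="F" fills the array column by column, matching top-to-bottom numbering
    return flat.reshape((height, width), order="F")


if __name__ == "__main__":
    mask = rle_decode("1 3 10 2", height=4, width=3)
    print(mask)
    print("fraction of labelled pixels:", mask.mean())
```

Decoding masks into arrays like this is what makes it possible to compute statistics over them or draw them on top of the images.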
For more traditional datasets, such as text data, understanding the data through exploratory data analysis (EDA) is almost a mandatory step before any modelling. For image segmentation tasks, however, this step often goes missing. This is partly because, when the input data is literally a set of images, there seems to be less opportunity to be creative with the analysis. Of course, we can plot a few images and their masks, but this does not tell the whole story. Because we often skip EDA and jump straight to modelling without understanding the content of the data, we are less aware of the different training strategies available for building our model.
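Plotting a few images and masks only shows individual examples, whereas a handful of numbers computed across the whole dataset says more. The sketch below is a minimal, hypothetical example (not code from the kernels) that summarises masks per class and measures how many images have no mask at all. It assumes one row per (image, class) pair with hypothetical columns `image_id`, `class_id` and `encoded_pixels`, and that masks are run-length encoded as (start, length) pairs, so a mask's area is simply the sum of its run lengths and nothing needs to be decoded.

```python
import pandas as pd


def mask_area(rle) -> int:
    """Number of labelled pixels in an RLE string: the sum of its run lengths."""
    if not isinstance(rle, str) or not rle.strip():
        return 0  # empty or missing encoding: class absent from this image
    return sum(int(n) for n in rle.split()[1::2])


def summarise_masks(df: pd.DataFrame, height: int, width: int) -> pd.DataFrame:
    """Per-class summary: how many images contain the class and how large its masks are.

    Column names 'image_id', 'class_id' and 'encoded_pixels' are hypothetical.
    """
    stats = df.assign(area=df["encoded_pixels"].map(mask_area))
    stats["coverage"] = stats["area"] / (height * width)  # fraction of the image covered
    present = stats[stats["area"] > 0]
    return present.groupby("class_id").agg(
        n_images=("image_id", "nunique"),
        mean_coverage=("coverage", "mean"),
        max_coverage=("coverage", "max"),
    )


def empty_image_fraction(df: pd.DataFrame) -> float:
    """Share of images that have no labelled pixel in any class."""
    areas = df.assign(area=df["encoded_pixels"].map(mask_area))
    per_image = areas.groupby("image_id")["area"].sum()
    return float((per_image == 0).mean())


if __name__ == "__main__":
    # Toy table with three 4 x 3 images and two classes (made-up data)
    toy = pd.DataFrame({
        "image_id": ["a", "a", "b", "b", "c", "c"],
        "class_id": [1, 2, 1, 2, 1, 2],
        "encoded_pixels": ["1 3 10 2", None, "5 4", "1 6", None, None],
    })
    print(summarise_masks(toy, height=4, width=3))
    print("images without any mask:", empty_image_fraction(toy))
```

Figures such as the share of empty images, or how small a class's masks tend to be, are the kind of signal that can feed into training decisions, for example how to sample images that contain no mask.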
Recently, I have participated in three image segmentation competitions on Kaggle, each of which was very interesting in its own right. For example, one competition challenged participants to identify and label pneumothorax, a condition in which air leaks into the space around the lung and can cause it to collapse, in patients' chest X-rays. For all three competitions, I created notebooks to perform EDA before building a model. The analysis in these notebooks brought useful insights to the Kaggle community and helped participants come up with better strategies for training their models. All three notebooks went on to receive gold medals on Kaggle, and links to them are listed below.
In this talk, I want to walk the audience through the process of doing exploratory data analysis and visualisation for image segmentation tasks by reviewing the three kernels linked below. These kernels use everyday Python libraries such as Pandas, OpenCV, Matplotlib, Plotly, and MLxtend. After walking through the examples from these kernels, I will summarise the approach as a guideline that others working on image segmentation tasks can use as a starting point.
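A common first visualisation for segmentation data is an image with its mask drawn on top. As a hedged sketch, not taken from the kernels, the function below alpha-blends a colour into the labelled pixels using plain NumPy; OpenCV's `cv2.addWeighted` computes a similar weighted sum, but over whole images. The default colour and `alpha` are arbitrary choices.

```python
import numpy as np


def overlay_mask(image: np.ndarray, mask: np.ndarray,
                 color=(255, 0, 0), alpha: float = 0.4) -> np.ndarray:
    """Blend `color` into the pixels where mask == 1 (illustrative helper, not from the kernels).

    image: H x W (grayscale) or H x W x 3 (RGB) uint8 array.
    mask:  H x W array of 0/1 labels.
    """
    if image.ndim == 2:
        # Replicate a grayscale image into three channels so it can carry colour
        image = np.stack([image] * 3, axis=-1)
    out = image.astype(np.float32)
    labelled = mask.astype(bool)
    # Alpha-blend only the labelled pixels; unlabelled pixels keep their values
    out[labelled] = (1 - alpha) * out[labelled] + alpha * np.asarray(color, dtype=np.float32)
    return np.clip(out, 0, 255).astype(np.uint8)


if __name__ == "__main__":
    gray = np.full((2, 2), 100, dtype=np.uint8)
    mask = np.array([[1, 0], [0, 0]])
    print(overlay_mask(gray, mask, alpha=0.5))
```

The returned array is an ordinary RGB image, so it can be displayed with Matplotlib's `plt.imshow`; placing several overlays side by side makes it easier to compare masks across images or classes.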
https://www.kaggle.com/ekhtiar/finding-pneumo-part-1-eda-and-unet
https://www.kaggle.com/ekhtiar/defect-area-segments-eda-with-plotly-fp-mining
https://www.kaggle.com/ekhtiar/eda-find-me-in-the-clouds