Semantic segmentation of landcover for cropland mapping and area estimation using Machine Learning techniques

ABSTRACT This paper focuses on global landcover for the identification of cropland areas. Population growth and rapid industrialization are encroaching on agricultural land and, eventually, on the food production needed for human survival. Appropriate agricultural land monitoring requires proper management of land resources. The paper proposes a method for cropland mapping by semantic segmentation of landcover, which identifies cropland boundaries and estimates cropland areas using machine learning techniques. The process first applies various filters to identify the features responsible for detecting land boundaries through edge detection. The images are masked, or annotated, to produce the ground truth for the labels croplands, rivers, buildings, and background. The selected features are passed to a machine learning model for semantic segmentation. The methodology applies Random Forest, which is compared with two other techniques, Support Vector Machine and Multilayer Perceptron. The dataset is composed of satellite images collected from the QGIS application. The paper concludes that Random Forest gives the best result for segmenting the images into different regions, with 99% training accuracy and 90% test accuracy. The results are cross-validated by computing the Mean IoU and the kappa coefficient, which reach 93% and 69% respectively for Random Forest, the highest among all models. The paper also calculates the area covered by the different segmented regions. Overall, Random Forest produces promising results for semantic segmentation of landcover for cropland mapping.


INTRODUCTION
Accurate cropland mapping is needed for better food systems and yield prediction. The growth in data and in high-resolution imagery such as Google satellite images has advanced data-driven agriculture, enabling better cropland mapping with improved algorithms from areas like remote sensing, machine learning, and computer vision. Emphasis should be given to food production and cultivation to satisfy the ever-increasing demand of the population. This process is preceded by the identification of cultivated and uncultivated land that could be utilized for further production [1,2]. This can be accomplished with fast and reliable cropland mapping [3,4,5] to identify the yield production [6,7] required for sustaining global food security and living standards [8]. Cropland mapping can further be used for crop type mapping [9,10,11], which is in turn useful for yield prediction of crops, agricultural area estimation, monitoring of agricultural practices [12], soil quality monitoring, identification and estimation of crop damage, etc. [8,13,14]. Image segmentation plays an important role in mapping croplands from the land cover. It is a computer vision problem in which the croplands must be mapped appropriately to estimate the regions under agricultural farming for proper monitoring and management. Cropland expansion [15] can be performed on barren and uncultivated areas to meet the needs of the rising population. Semantic segmentation is needed for cropland boundary identification [16]. Semantic segmentation differs from classification: classification assigns an image as a whole to one of the classes, while semantic segmentation assigns every pixel or element to one of the labels. This labeling functionality of semantic segmentation is employed for different AI problems.
Semantic segmentation has been applied in areas such as agriculture [17,18,19,20], object detection [21], disaster management [22], semantic text segmentation [23], etc. Satellite images give a concise view of different regions of landcover and generate a large amount of data, which can be employed for several other purposes like cropland mapping [24], area estimation of cultivated lands, agriculture monitoring [9], early identification and detection of damage in the various landcover [3,25], etc.
The paper aims to improve landcover mapping through semantic segmentation by appropriately mapping croplands and estimating the area covered by the cultivated regions. The mapping is carried out with a feature extraction process that uses different kernels or filters, followed by semantic segmentation with three approaches: Random Forest, SVM, and ANN. The paper applies Machine Learning algorithms rather than Deep Learning because of the small data size and the lower computational cost and memory usage of Machine Learning algorithms. The area of the croplands thus mapped from the landcover is estimated. For comparison, the three Machine Learning algorithms are tested on the accuracy, Mean IoU, and kappa score obtained during the semantic segmentation of landcover for cropland mapping. The method successfully segments the regions with promising results irrespective of the texture complexity and pixel similarity of different regions. The paper is divided into sections: the Introduction is given in Section 1, while Section 2 details the literature and motivation.

LITERATURE REVIEW
Several classification techniques and algorithms have been employed in the literature for image segmentation and classification, such as Support Vector Machine, Random Forest, deep learning, and many more. A Random Forest framework has been applied through the learned representation of filters, using a particle swarm optimization algorithm [26], to perform the semantic segmentation of road scene and indoor scene images. Random Forest has been used for landcover classification as an object-based image classification technique [27]. An image segmentation model based on pixel values was developed with deep learning for mudrock and compared with Random Forest for accuracy [28]. A CNN model was used for semantic segmentation of the ground cover in Worldview-2 satellite images, and the results were compared with the Random Forest and SVM algorithms [29]. Class-specific semantic image segmentation has been performed considering textural and color features on CamVid and MSRC-v2 with the Semantic Texton Forest framework [30]. Support Vector Machine (SVM) [31] has been applied [21] to develop a homogeneous strategy that captures distinct spectral and spatial features using the landcover information. Four machine learning algorithms, RF, SVM, Naive Bayes, and k-NN, were compared for object-based analysis and semantic segmentation of land covers on satellite data [32]. The prediction of landslide-vulnerable areas was performed through semantic segmentation and a Deep Convolutional Network using optical remote sensing images [33]. Land cover mapping has been performed on CORONA archive images through the SVM algorithm [34]. An object-based image analysis classification method was proposed using DCNN and FCN, which were later compared with Random Forest and SVM for wetland area classification. The difference lies in the number of training samples and the computational time [35].
A Multi-level Feature Aggregation Network was proposed that fuses feature extraction and up-sampling to classify a land cover dataset [36]. A method was proposed to partition remote sensing images from the Kalideos database into different regions of interest by performing object and pixel processing. A multilayer feed-forward neural network (MLFFNN) was employed for the semantic segmentation of images and compared with SVM and Maximum Likelihood Classification (MLC) [37]. A model was proposed for the segmentation of wetlands through four classifiers, including kNN, random forest, and decision tree; these models were later compared with a hybrid model with ANN to find the classification accuracy [38]. Semantic segmentation of road surfaces, embankments, guardrails, ditches, fences, and borders was performed through PointNet and ANN [39]. A semantic segmentation method was proposed that combines a CNN with multiple filters and multi-resolution segmentation, applied to LiDAR data and high-resolution optical images [40].
These algorithms perform pixel-wise semantic segmentation to classify the region of interest. Satellite images are complex and difficult to segment because of the similarity in texture between various areas. Thus, these algorithms are employed to extract the texture, patterns, and orientation from the image to identify the different regions. From the literature, no study was identified that specifically emphasizes agricultural cropland mapping with the aim of raising the level of agricultural production. Our work performs this agricultural cropland mapping and uses the semantic segmentation results to identify the area under crop cultivation and the barren lands. The areas identified as barren could be utilized for further production and cultivation to balance the food supply. Moreover, it also helps to identify the reduction in cultivated land every year, which ultimately helps in environmental sustainability. Machine learning algorithms such as SVM, Random Forest, and ANN are better suited than deep learning algorithms to small datasets and have lower computational cost and memory usage.

DATA COLLECTION AND PREPARATION
The research is conducted in Madhya Pradesh, one of the states of India, as shown in Fig. 1. The dataset consists of Google satellite images collected from the QGIS 3.18 Zurich application. Overall, 20 Google satellite images were collected from the application, and every picture was augmented to create four input images, giving a dataset of 80 images in total. These images were annotated using http://www.apeer.com to produce the masks, or ground truth, with four different labels: croplands, rivers, buildings, and uncultivated lands as background. Thus, the final dataset includes 80 input images and their respective 80 annotated images. The dataset is restricted to 80 pictures because the ".tiff" images require considerable computation time. Fig. 2 shows some of the landcover images considered for the work. The dataset is divided in an 8:2 ratio, where 80% of the data is reserved for training and 20% is used for testing. Python 3.7 and an Nvidia GPU were used for the work, along with TensorFlow 2.0 and Keras. The other libraries employed for the experiment are scikit-learn and NumPy for computation and OpenCV for handling images.
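As an illustration only, the 8:2 division of the 80 images described above could be sketched as follows. The function name, the fixed seed, and the use of a simple shuffled index split are assumptions made for this sketch; the paper does not state how the split was drawn.

```python
import random


def split_dataset(n_images=80, train_fraction=0.8, seed=0):
    """Shuffle image indices and split them into train and test lists.

    The seed and the plain random shuffle are assumptions of this sketch.
    """
    indices = list(range(n_images))
    random.Random(seed).shuffle(indices)      # reproducible shuffle
    n_train = round(n_images * train_fraction)
    return indices[:n_train], indices[n_train:]


train_ids, test_ids = split_dataset()
print(len(train_ids), len(test_ids))
```

With the default arguments this yields 64 training indices and 16 test indices, matching the 80%/20% proportions stated in the text.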

MATERIALS AND METHODS
A method has been developed to perform the semantic segmentation of images, followed by the area estimation of different regions for cropland mapping. The process starts with the collection of images from the QGIS application covering all four labels, as shown in the annotated masks. These images are then passed to the feature extraction process to identify the features needed for segmentation. The most important attribute is the pixel value itself, which divides the pixels into bright and dark ones: anything above a particular threshold is considered an intense, principal pixel, while low pixel values from 0 to 40 are considered background pixels. The method applies seven kinds of filters or kernels for feature extraction: Gabor filter, Scharr filter, Roberts edge filter, Sobel filter, Gaussian filter, Prewitt filter, and Median filter. These generate different feature values for the image data. All the feature vectors derived from the feature extraction processes are collected in a data frame to form the complete set of feature values of an image. A list of the features that contribute significantly to the feature space is set up, and the rest are removed from the data frame. The selected attributes are passed to the machine learning models for semantic segmentation, along with the annotated masks. The paper applies three learning models, ANN, SVM, and Random Forest, as segmentation techniques. The Artificial Neural Network employed in this work has one input layer, three hidden layers, and one output layer, and is trained for 50 epochs with a batch size of 16 and a learning rate of 0.001. All three models are compared on accuracy and validated using the MeanIoU and Kappa score, which measure the correctness of the segmentation. Finally, the area of the segmented regions is evaluated for cropland mapping. The entire methodology of the research work is shown in Fig. 3.
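The step in which per-pixel feature vectors are collected into one table can be sketched as below. This is a minimal illustration under stated assumptions: the function name `build_feature_table`, the toy 4×4 image, and the single stand-in filter response (`horizontal_diff`) are hypothetical, and a plain NumPy array is used in place of the data frame mentioned in the text.

```python
import numpy as np


def build_feature_table(image, responses, mask):
    """Stack per-pixel features column-wise: one row per pixel.

    image     : 2D greyscale array
    responses : dict mapping a feature name to a 2D filter response
                with the same shape as `image`
    mask      : 2D integer array of ground-truth labels
    """
    names = ["pixel_value"] + list(responses)
    columns = [image.reshape(-1)] + [responses[n].reshape(-1) for n in responses]
    X = np.column_stack(columns).astype(np.float64)   # (n_pixels, n_features)
    y = mask.reshape(-1)                              # (n_pixels,)
    return names, X, y


# Toy data (hypothetical): a 4x4 ramp image and one stand-in filter response.
image = np.arange(16, dtype=np.float64).reshape(4, 4)
responses = {"horizontal_diff": np.diff(image, axis=1, prepend=image[:, :1])}
mask = (image > 7).astype(np.int64)

names, X, y = build_feature_table(image, responses, mask)
print(names, X.shape, y.shape)
```

Each row of `X` then describes one pixel, so any pixel-wise classifier can be trained on `X` with the flattened mask `y` as the target.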

Image Feature Extraction as a preprocessing step
The paper applies seven different types of edge detection and feature extraction filters or kernels to the image set. The filters are compared to find the best feature values, which eventually affect the segmentation of the image data. These are:

Gabor filter
Gabor filters are used for the identification of texture and edge features in image preprocessing, in a manner similar to the visual cortex [41,42]. They are suitable for texture-based segmentation of images because of their localization properties in both the spatial and the frequency domain. They act as a bandpass filter that passes a certain band of frequencies while stopping the others. These filters form an impulse response by combining a Gaussian envelope function with a complex oscillation, which minimizes the joint uncertainty in space and frequency [43]. The paper applies a bank of 32 localized Gabor filters. A Gabor filter is a blend of Gaussian and sinusoidal filters. In 2D, the Gabor filter is represented as (1):

$$g(a, b; \lambda, \theta, \psi, \sigma, \gamma) = \exp\left(-\frac{a'^2 + \gamma^2 b'^2}{2\sigma^2}\right) \exp\left(i\left(2\pi\frac{a'}{\lambda} + \psi\right)\right) \tag{1}$$

where $a' = a\cos(\theta) + b\sin(\theta)$ and $b' = -a\sin(\theta) + b\cos(\theta)$.
In the above equation, $\lambda$ is the wavelength of the sinusoidal factor, $\theta$ defines the orientation of the Gabor function, $\psi$ is the phase offset, $\sigma$ is the standard deviation of the Gaussian envelope, and $\gamma$ denotes the spatial aspect ratio, which defines the ellipticity of the Gabor function. In this context, the Gaussian term gives the weights while the sinusoidal term gives the directionality.
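A small sketch of how one Gabor kernel and a bank of oriented kernels might be built from the definition above is given here. It computes only the real (cosine) part of the filter. The kernel size, wavelength, sigma, aspect ratio, and the choice of four orientations are illustrative assumptions; the paper states only that a bank of 32 filters was used, not their parameters.

```python
import numpy as np


def gabor_kernel(size, wavelength, theta, psi=0.0, sigma=2.0, gamma=0.5):
    """Real part of a 2D Gabor kernel on a size x size grid (size odd).

    All default parameter values are assumptions made for illustration.
    """
    half = size // 2
    # b runs along rows (vertical), a along columns (horizontal)
    b, a = np.mgrid[-half:half + 1, -half:half + 1].astype(np.float64)
    a_rot = a * np.cos(theta) + b * np.sin(theta)
    b_rot = -a * np.sin(theta) + b * np.cos(theta)
    envelope = np.exp(-(a_rot**2 + gamma**2 * b_rot**2) / (2.0 * sigma**2))
    carrier = np.cos(2.0 * np.pi * a_rot / wavelength + psi)
    return envelope * carrier


# A small bank with four orientations: 0, 45, 90 and 135 degrees.
bank = [gabor_kernel(9, wavelength=4.0, theta=k * np.pi / 4) for k in range(4)]
print(len(bank), bank[0].shape)
```

Varying `theta`, `wavelength`, `sigma`, and `gamma` over a grid of values is one common way to obtain a larger bank; each kernel's response to the image then becomes one feature column.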

Roberts Edge filter
The Roberts edge filter is a non-linear filter used for edge detection that applies a pair of 2×2 kernels in sequence, whose responses are combined to give the final result. It is a fast technique because of the small number of calculations, and it highlights transitions from light to dark pixels [44,45]. Roberts proposed two equations that focus on the variation in intensity in the diagonal directions, given as (2) and (3):

$$b_{k,l} = \sqrt{a_{k,l}} \tag{2}$$

$$c_{k,l} = \sqrt{\left(b_{k,l} - b_{k+1,l+1}\right)^2 + \left(b_{k+1,l} - b_{k,l+1}\right)^2} \tag{3}$$

where $a$ is the initial intensity value of the image, $b$ is the intermediate value obtained from it, $c$ is the computed derivative, and $k$, $l$ define the location in the image.
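The diagonal-difference idea behind this filter can be illustrated with a few lines of NumPy. This is a sketch, not the authors' code: it applies the two diagonal differences directly to the intensity values (any pre-transform of the intensities is left to the caller), and the function name and test image are hypothetical.

```python
import numpy as np


def roberts_cross(image):
    """Gradient magnitude from the two diagonal differences of each 2x2 block.

    The output is one row and one column smaller than the input.
    """
    img = np.asarray(image, dtype=np.float64)
    d1 = img[:-1, :-1] - img[1:, 1:]    # top-left minus bottom-right
    d2 = img[1:, :-1] - img[:-1, 1:]    # bottom-left minus top-right
    return np.sqrt(d1**2 + d2**2)


# Vertical step edge: dark columns on the left, bright columns on the right.
step = np.zeros((4, 4))
step[:, 2:] = 10.0
edges = roberts_cross(step)
print(edges)
```

Only the 2×2 blocks that straddle the step give a non-zero response, which is the behaviour expected of an edge detector.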

Sobel filter
The Sobel edge detector emphasizes regions of high spatial frequency by approximating the absolute gradient magnitude at every pixel of an image [46,47]. The filter convolves the original image with two 3×3 kernels to find the horizontal and vertical changes.

Scharr filter
Scharr filter uses the first derivative to detect the gradient edges through the pixels of an image and defines the gradients independently along the x-axis and y-axis [48,49].

Prewitt filter
Prewitt filters detect horizontal and vertical edges using two different 3×3 kernels. Edges are computed from the differences in the pixel intensities of an image. These kernels are convolved with the image to calculate the derivatives [50,51].
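The Sobel, Scharr, and Prewitt filters described above differ only in the weights of their 3×3 kernels, which the following sketch makes explicit. The kernel values are the standard ones for these operators; the helper names and the test image are assumptions. The helper computes a cross-correlation rather than a flipped-kernel convolution, which changes only the sign of the response and not the gradient magnitude.

```python
import numpy as np

# Horizontal-derivative kernels; the vertical ones are their transposes.
KERNELS_X = {
    "sobel": np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64),
    "scharr": np.array([[-3, 0, 3], [-10, 0, 10], [-3, 0, 3]], dtype=np.float64),
    "prewitt": np.array([[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]], dtype=np.float64),
}


def correlate3x3(image, kernel):
    """Valid-mode 3x3 cross-correlation (no padding)."""
    img = np.asarray(image, dtype=np.float64)
    rows, cols = img.shape[0] - 2, img.shape[1] - 2
    out = np.zeros((rows, cols))
    for i in range(3):
        for j in range(3):
            out += kernel[i, j] * img[i:i + rows, j:j + cols]
    return out


def gradient_magnitude(image, name):
    """Combine the x and y responses of the named operator."""
    kx = KERNELS_X[name]
    gx = correlate3x3(image, kx)
    gy = correlate3x3(image, kx.T)
    return np.hypot(gx, gy)


# Vertical step edge between columns 2 and 3.
step = np.zeros((5, 5))
step[:, 3:] = 1.0
for name in KERNELS_X:
    print(name, gradient_magnitude(step, name)[1])
```

On the same unit step the three operators respond at the same locations but with different strengths (4 for Sobel, 16 for Scharr, 3 for Prewitt), so their outputs are usually treated as separate features or rescaled before comparison.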

Gaussian filter
Gaussian filtering is used for blurring images and for removing noise and other fine details. Blurring the image and reducing the contrast in this way helps the subsequent edge detection. Gaussian filters take sigma, the standard deviation, as a parameter that controls the spread of the Gaussian distribution. The paper applies Gaussian filters with two sigma values, 3 and 7. A larger value of sigma produces greater blurring. The Gaussian nature of the filter is maintained by increasing the kernel size as the sigma value increases. The value of sigma determines the Gaussian kernel coefficients [52,53]. In 2D, the Gaussian filter is represented as (4):

$$G(x, y) = \frac{1}{2\pi\sigma^2} \exp\left(-\frac{x^2 + y^2}{2\sigma^2}\right) \tag{4}$$

where $\sigma$ represents the standard deviation of the Gaussian distribution, $x$ is the horizontal distance from the origin, and $y$ is the vertical distance from the origin.
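The relation between sigma and kernel size can be illustrated with a short sketch that samples the 2D Gaussian on a grid. The truncation radius of three standard deviations and the final renormalization are assumptions of this sketch, not details given in the paper.

```python
import numpy as np


def gaussian_kernel(sigma, truncate=3.0):
    """Sampled 2D Gaussian whose half-width grows with sigma.

    `truncate` (radius in units of sigma) is an assumed convention.
    """
    radius = int(np.ceil(truncate * sigma))
    y, x = np.mgrid[-radius:radius + 1, -radius:radius + 1].astype(np.float64)
    kernel = np.exp(-(x**2 + y**2) / (2.0 * sigma**2)) / (2.0 * np.pi * sigma**2)
    return kernel / kernel.sum()   # renormalize so the weights sum to 1


k3 = gaussian_kernel(3.0)
k7 = gaussian_kernel(7.0)
print(k3.shape, k7.shape)
```

Under this convention sigma = 3 gives a 19×19 kernel and sigma = 7 a 43×43 kernel; the larger kernel has a lower central weight, which corresponds to the stronger blurring described in the text.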

Median filter
Median filters reduce noise in images. The filter takes the neighbors of a pixel, calculates their median, and replaces the pixel value with that median. In other words, every pixel is examined together with its surrounding pixels, the values are sorted in numerical order, and the pixel is replaced with the middle value [54,55].
The feature importances obtained from the different filters are shown in the graph in Fig. 4. The x-axis gives the feature importance score achieved by each filter, while the y-axis lists the kernels or filters applied. The graph shows that some filters contribute strongly, some contribute a little, and others contribute nothing to the identification of features from the image data. This helps to reduce the dimensionality by removing the filter values with zero impact. The filters that contribute the most to image processing are listed in Table 1 in descending order; that is, the filter with the maximum contribution is at number 1 and the one with the lowest contribution is at number 23.
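The replace-by-neighbourhood-median rule described above can be sketched as follows. The 3×3 window and the edge-replicating padding are assumptions made for this illustration; the paper does not state the window size it used.

```python
import numpy as np


def median_filter3x3(image):
    """Replace every pixel by the median of its 3x3 neighbourhood.

    Borders are handled by replicating the edge pixels (an assumption).
    """
    img = np.asarray(image, dtype=np.float64)
    padded = np.pad(img, 1, mode="edge")
    rows, cols = img.shape
    # Nine shifted views of the image, one per position in the window.
    windows = [padded[i:i + rows, j:j + cols] for i in range(3) for j in range(3)]
    return np.median(np.stack(windows, axis=0), axis=0)


# Flat image with a single bright outlier ("salt" noise).
noisy = np.full((5, 5), 10.0)
noisy[2, 2] = 255.0
clean = median_filter3x3(noisy)
print(clean)
```

Because the outlier is never the middle value of any window, it disappears entirely, whereas an averaging filter would smear it over its neighbours.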

The feature importance identifies the features that contribute the most to the detection and segmentation of an image. The contributing features are identified by applying different filters vertically and horizontally, and with different angles and wavelengths, in various convolutions. These filters are kernels of size 3×3 or 5×5, where every kernel value is multiplied by the corresponding pixel value of the image and the products are summed to give the response detected by the filter.
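The removal of features with zero importance can be sketched as a simple column selection. Everything concrete in the example is hypothetical: the feature names, the importance scores, and the threshold are placeholders, and in practice the scores would come from the trained model rather than being typed in.

```python
import numpy as np


def drop_unimportant(X, names, importances, threshold=0.0):
    """Keep only columns whose importance exceeds `threshold`,
    ordered from most to least important."""
    importances = np.asarray(importances, dtype=np.float64)
    keep = importances > threshold
    order = np.argsort(-importances[keep], kind="stable")
    kept_idx = np.flatnonzero(keep)[order]
    return X[:, kept_idx], [names[i] for i in kept_idx]


# Hypothetical feature names and importance scores, for illustration only.
names = ["pixel_value", "gabor_1", "gabor_2", "gaussian_s3"]
importances = [0.30, 0.00, 0.15, 0.55]
X = np.arange(12, dtype=np.float64).reshape(3, 4)

X_sel, kept = drop_unimportant(X, names, importances)
print(kept, X_sel.shape)
```

Here the zero-importance column `gabor_1` is dropped and the remaining three are returned in descending order of importance, mirroring the ranking idea of Table 1.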

Machine Learning Models
The semantic segmentation of land covers is performed in this paper with three machine learning techniques: Artificial Neural Network, Random Forest, and Support Vector Machine. Random Forest is a technique often applied to classification and regression problems; it handles complex problems by combining an ensemble of different classifiers. It is considered a better approach than a single decision tree with respect to overfitting and precision. Random Forest has been applied to semantic segmentation of images in various fields such as medical imaging, landcover, plant diseases, geographical factors, etc. Support Vector Machine is a supervised learning algorithm that constructs a hyperplane in a multi-dimensional feature space for classification and regression problems. It is effective in high-dimensional spaces and useful for outlier detection. It has been applied to semantic segmentation of images for disease identification, cropland mapping, landcover identification, satellite images, etc. Artificial neural networks are biologically inspired networks useful for different problems related to classification, data preprocessing, and others. The multilayer perceptron with backpropagation is one of the most commonly used ANN algorithms. An ANN model is divided into three sections: input, hidden, and output layers. ANNs have been employed several times for semantic segmentation.

Metrics for evaluating the models
The metrics used in the paper to evaluate the models are the accuracy on the training and test data, the MeanIoU, and the Kappa coefficient.

Accuracy
Accuracy is the metric that identifies the proportion of correct predictions produced by a model. The accuracy value for multiclass classification ranges from 0 to 1, where 1 stands for maximum accuracy. This value is based on the match percentage between the actual class and the predicted class. The accuracy is given by formula (5):

$$\text{Accuracy} = \frac{\text{Number of correct predictions}}{\text{Total number of predictions}} \tag{5}$$
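For pixel-wise segmentation, the accuracy defined above is the fraction of pixels whose predicted label equals the ground-truth label. A minimal sketch, using a hypothetical flattened mask of eight pixels and four labels:

```python
def pixel_accuracy(y_true, y_pred):
    """Fraction of positions where the predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical flattened ground truth and prediction.
y_true = [0, 0, 1, 1, 2, 2, 3, 3]
y_pred = [0, 0, 1, 2, 2, 2, 3, 0]
acc = pixel_accuracy(y_true, y_pred)
print(acc)  # 6 of 8 pixels match -> 0.75
```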

Mean Intersection-over-Union
Mean Intersection-over-Union is a metric used for quantifying the difference between the true and the computed semantic segmentation of images. It first calculates the IoU value for each semantic class and then computes the average over the classes. The predictions are accumulated in a confusion matrix, weighted by a sample weight, from which the MeanIoU value is calculated. For each class, the IoU is given by formula (6):

$$\text{IoU} = \frac{\text{true\_positive}}{\text{false\_positive} + \text{true\_positive} + \text{false\_negative}} \tag{6}$$

where true_positive are the pixels that belong to the class and are predicted as that class, false_positive are the pixels that do not belong to the class yet are predicted as that class, and false_negative are the pixels that belong to the class but are predicted as another class.
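The per-class IoU and its mean can be worked through on a small example. This sketch counts true positives, false positives, and false negatives directly from two flattened label lists; the data are hypothetical, sample weights are not used, and classes absent from both lists are skipped, which is an assumption about how empty classes should be treated.

```python
def mean_iou(y_true, y_pred, n_classes):
    """Average of per-class IoU = TP / (TP + FP + FN)."""
    ious = []
    for c in range(n_classes):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        denom = tp + fp + fn
        if denom:                      # skip classes absent from both lists
            ious.append(tp / denom)
    if not ious:
        raise ValueError("no class is present in y_true or y_pred")
    return sum(ious) / len(ious)


# Hypothetical flattened ground truth and prediction.
y_true = [0, 0, 1, 1, 2, 2, 3, 3]
y_pred = [0, 0, 1, 2, 2, 2, 3, 0]
miou = mean_iou(y_true, y_pred, n_classes=4)
print(round(miou, 4))  # per-class IoU: 2/3, 1/2, 2/3, 1/2 -> mean 7/12
```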

Cohen's kappa Coefficient
Cohen's kappa coefficient ($\kappa$) is a quantitative measure used to estimate the agreement between two evaluators who each classify I items into D different categories. It measures the reliability of two raters rating the same items while accounting for the agreement that occurs by chance. It is given as (7):

$$\kappa = \frac{P_0 - P_e}{1 - P_e} \tag{7}$$

where $P_0$ is the observed relative agreement between the evaluators and $P_e$ is the hypothetical probability of agreement by chance, estimated from the observed data. The value of $\kappa$ is 1 if there is total agreement between the raters, and $\kappa = 0$ if there is no agreement beyond what is expected by chance.
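The kappa computation can be traced on a small example in which the two raters are taken to be a ground-truth mask and a predicted mask, both flattened. That pairing and the data themselves are assumptions made for illustration.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: (P_0 - P_e) / (1 - P_e)."""
    n = len(y_true)
    # Observed agreement: fraction of items given the same label by both raters.
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Chance agreement: product of the two raters' label frequencies, summed.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical flattened ground truth and prediction.
y_true = [0, 0, 1, 1, 2, 2, 3, 3]
y_pred = [0, 0, 1, 2, 2, 2, 3, 0]
kappa = cohen_kappa(y_true, y_pred)
print(round(kappa, 4))  # P_0 = 0.75, P_e = 0.25 -> kappa = 2/3
```

The example shows why kappa is lower than accuracy on the same data: 75% of the labels agree, but a quarter of that agreement would be expected by chance alone.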

EXPERIMENTATION AND RESULTS
Initially, the image data shown in Fig. 2 is converted to greyscale and passed to the different filters or kernels for feature extraction. The resulting data frame of reduced feature vectors, illustrated in Fig. 4, and the annotated masks are provided to the machine learning models: Random Forest, SVM, and ANN. The segmentation results with the reduced feature set for the three models are given in Table 2. These models are evaluated on training accuracy, test accuracy, MeanIoU value, and Kappa coefficient score. Table 2 also compares the use of Gabor filters alone with the use of all the filters. Gabor filters were chosen for this comparison, despite the high importance of the Gaussian filter in feature detection shown in Fig. 4, because they provide both weights and directionality: a Gabor filter combines a Gaussian term, which provides only the weights, with a sinusoidal term, which provides the directionality. Therefore, the Gabor filter is considered the most suitable single filter for image processing. Gabor filters detect texture and edges, whereas Gaussian filters blur the images and reduce the noise, which supports edge detection only.

The table is divided into two columns: one column gives the metrics obtained using only Gabor filters for the three models, and the other gives the metrics obtained using all the filters, including Gabor, with reduced dimensions. The table shows that the evaluation metrics of the different models with all the filters and reduced dimensionality are much better than with Gabor filters alone. The test accuracy, Mean IoU value, and kappa score of Random Forest are superior to those of the other two models on the dataset after the feature extraction process. Hence Random Forest can be considered the best for semantic segmentation, with nearly 90% test accuracy and a maximum MeanIoU value of about 93%.
The results of the model for semantic segmentation on some of the images are illustrated in Fig. 5. Fig. 5 shows the original image, the annotated mask as ground truth, the images segmented with the Gabor filter, and the images segmented with all the filters and reduced dimensionality. The figure shows that the images segmented with all the filters and reduced dimensionality give better and clearer segmentation results. These results are nearly equivalent to the ground truth, where blue regions represent the buildings, the yellow region represents the cropland, the red region shows the water, and the blue region represents the background or uncultivated areas. The area covered by the three labeled regions, buildings, cropland, and water, is estimated in terms of the number of pixels in each region, as given in Table 3. The table lists the labels, the corresponding region names, and the area values for the four pictures shown in Fig. 5. The segmentation results of the three models for some of the sample images are compared in Fig. 6. The results show that Random Forest gives much better semantic segmentation results than the other two models. The result is also compared with other methods from the literature for semantic segmentation of land covers or similar datasets, as summarized in Table 4. Table 4 indicates that our method also performs well for the semantic segmentation of land covers, despite the limited size of our self-constructed dataset.
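Estimating area as a pixel count per label, as described above, can be sketched in a few lines. The numeric label ids assigned to each region and the toy mask are hypothetical; the paper reports areas in pixels, so no conversion to ground units is attempted here.

```python
import numpy as np

# Hypothetical mapping from label id to region name.
LABELS = {0: "background", 1: "cropland", 2: "water", 3: "building"}


def region_areas(mask, labels=LABELS):
    """Count the pixels assigned to each label in a segmentation mask."""
    counts = np.bincount(np.asarray(mask).ravel(), minlength=len(labels))
    return {name: int(counts[idx]) for idx, name in labels.items()}


# Toy 3x4 segmentation mask.
mask = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 2],
    [1, 3, 3, 2],
])
areas = region_areas(mask)
print(areas)
```

If the ground resolution of the imagery were known, each count could be multiplied by the area of one pixel to obtain a physical area; that resolution is not given in the text.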

CONCLUSION
The increasing demand for food production and supply requires the expansion of cultivated land. This requires the identification of barren and uncultivated lands that can be used for further cultivation and production. The paper therefore focuses on a methodology that segments landcover areas collected from Google satellite images into different regions for cropland mapping. It presents a method for semantic segmentation of landcover for cropland mapping, where the landcover encompasses croplands, rivers, buildings, and uncultivated lands as background. The paper applies seven filters or kernels for feature extraction. The machine learning models ANN, Random Forest, and SVM, with a reduced set of features, are used for semantic segmentation. Random Forest was identified as the better semantic segmentation technique, with 99% training and 90% test accuracy. The accuracy is further validated with the MeanIoU value and the Kappa score, which are well-suited metrics for evaluating semantic segmentation and are computed from the confusion matrix; Random Forest reaches a MeanIoU of 93% and a kappa score of nearly 69%, better than the other two techniques. Random Forest performs substantially better than SVM and ANN. The method has shown promising results in semantic segmentation of landcover for cropland mapping. It was difficult to find an open-source database of landcover areas with annotated masks, and working with satellite images is time- and space-consuming. Thus, the paper considers a limited amount of data, and the work can be expanded and applied to a larger dataset. Future work will try to generate more satellite image data with corresponding annotated masks that could be used with more machine learning and deep learning algorithms.

COMPLIANCE WITH ETHICAL STANDARDS
All authors of the paper declare that they have no conflict of interest.

DATA AVAILABILITY
The self-constructed data that support the findings of this study are openly available in the journal's data repository.

AUTHOR CONTRIBUTION
Lingwal S.L. (surabhi.lingwal@gmail.com) is the leading contributor of the paper, including paper writing, model definition, and training and tuning of the experiment. Bhatia K.K. (komal_bhatia1@rediffmail.com) designed the framework of the model and contributed to data collection and data processing. Singh M. (mstomer2000@yahoo.com) contributed to model definition and framework design and performed the result analysis.