ECATI 2020: International Conference on Emerging Applications and Technologies for Industry 4.0 Virtual, Nigeria, August 11-13, 2020 |
Conference website | http://www.eati2020.com |
Abstract registration deadline | June 23, 2020 |
Submission deadline | June 23, 2020 |
Neuro-Fuzzy Based Diagnosis Model For Soybean Diseases.
Ibrahim Rahmon
Computer Science Department, Babcock University, Ilishan-Remo, Nigeria.
Rahmon.ibrahim@yahoo.com
Olutayo Ajayi
Federal University of Agriculture, Abeokuta, Nigeria.
olutayoajayi@gmail.com
Monday Eze
Computer Science Department, Babcock University, Ilishan-Remo, Nigeria.
ezem@babcock.edu.ng
Jonah Joshua
Computer Science Department, Babcock University, Ilishan-Remo, Nigeria.
joshuaj@babcock.edu.ng
Abstract: Soybean is a leguminous crop that is rich in protein and has high commercial value. However, it is susceptible to attack by fungi, bacteria and viruses because of its leafy and bushy nature. Research has shown a decline in soybean production as a result of pest attacks and disease infections such as 2-4-d-injury and herbicide injury. Previous studies adopted a single Artificial Intelligence (AI) technique, such as fuzzy logic or a neural network, to classify soybean diseases, but without the capacity to determine the severity of the disease. This study developed a diagnosis system for determining the severity of soybean disease infections using a neuro-fuzzy system. A dataset of 1,000 soybean records was obtained for simulation. Five hundred (500) of these were used for training with the back-propagation algorithm in MATLAB. Of the remaining 500, two hundred and fifty (250) each were assigned for testing and checking. The results showed three-dimensional surfaces, each mapping two selected input variables to the output, which were generated from the simulation of the proposed model and helped to determine the severity of a particular disease. The confusion matrix gave a performance accuracy of 91.75% for herbicide injury disease and 90.32% for 2-4-d injury disease. The study concluded that AI should be used in identifying the severity of soybean disease. It is recommended that this study be adopted by agronomists and agricultural institutes in Nigeria to identify the severity of soybean disease infections using a neuro-fuzzy system.
Keywords: Neural Network, Fuzzy system, Artificial Intelligence.
1.1 Introduction
Most existing fuzzy-based expert systems for the diagnosis of crop diseases focus only on disease classification, that is, assigning a particular crop to one of a small set of disease classes. The intensity level of those diseases is largely ignored.
To tackle this problem and develop a more robust model, the learning algorithm of the neural network (back-propagation) is adopted to compute the parameters needed to tune the membership functions of the fuzzy system, so that it can determine the intensity level of disease in infected soybean plants.
1.2 Aim and Objectives of the Study
The aim of this study is to develop an interactive neuro-fuzzy based diagnosis system for identifying the specific disease and determining the degree of damage (intensity level) inflicted on a soybean plant. The specific objectives are to:
Design a neuro-fuzzy based diagnosis system for soybean-related diseases.
Implement and evaluate the proposed system with performance evaluation metrics.
Scope of the Study
This study is restricted to using ANFIS, one type of neuro-fuzzy system, to develop a system that captures specific input parameters as symptoms (leaves, leaf halo, leaf spot size, root, area-damaged pod and stem of the soybean plant) and identifies the disease type with which the plant is infected, along with the degree of the infection, as output.
1.3 Significance of the Study
The indigenous methods used by farmers to diagnose crop diseases are expressed in linguistic values and are inherently vague. This work provides the following benefits:
An efficient framework for software developers and agricultural domain experts to develop robust expert systems that combine the strengths of neural networks and fuzzy logic to diagnose soybean diseases.
A promising decision support system for soybean farmers for the diagnosis of soybean bacterial diseases and the computation of their intensity level.
2.1 Literature Review: The work of [4] contributed to the diagnosis of fish-related diseases through an intelligent system with an embedded base of 400 rules and a graphical user interface. It was a flexible web-based application used to diagnose freshwater fish diseases. The system could only identify the disease type and had no capacity to determine the intensity of the disease.
The work of [6] likewise used a rule-based knowledge base as the engine of an expert system developed to diagnose attacks by pests of honeybees on crops and to provide suitable treatments. It used a Boolean logic approach and lacked the capacity to provide comprehensive and specific treatment for various degrees of attack. Pests and diseases that reduce tomato yields were addressed by the integral intelligent system developed by [5]. The rule-based system was implemented to prevent, diagnose and control attacks on tomato crops by pests and diseases. It was most useful to farmers who could read and had internet access; the system was too wordy and difficult for farmers who could not read.
[2] focused their work on the diagnosis of cassava plant diseases. They proposed a fuzzy expert system for predicting cassava plant diseases, developed with MATLAB version 9 as the fuzzy tool. They employed 18 rules for cassava mosaic, 27 rules for cassava brown streak and 27 rules for cassava bacterial blight. These rules were used for the classification and prediction of cassava plant diseases.
[1] developed a fuzzy logic system for diagnosing various types of chilli plant disease and illustrated the architecture of the system. The knowledge base holds the symptoms of chilli diseases. MATLAB was used for the rule viewer, rule editor, membership function editor, fuzzy inference system editor and fuzzy modeling.
Research Methodology:
3.1 Dataset Description
In this study, more than one thousand (1,000) records of diseased soybean plants were collected, and agricultural experts in the domain of crop planting assisted with the proper interpretation of the dataset.
The dataset consists of six ordered nominal (categorical) attributes and two classes of disease. There are 1,300 records, of which 270 are incomplete. The six attributes (area-damaged, leaves, leaf-halo, leaf-spot-size, root and stem) are arranged in the first six columns, while each row represents one soybean plant. The last column holds the corresponding disease class, one per row. The two classes of disease are 2-4-d-injury and herbicide injury.
3.2 Input Variables
The input variables are the most important parameters in the dataset: they are the symptoms that farmers supply to the system to form the basis for disease diagnosis. The description of each attribute (input variable) is presented in Table 3.1.
Table 3.1: Description of Attributes of Dataset for soya bean
Number | Attributes | Description
1 | The root | A part that collects nutrients from the soil to the other parts
2 | Leaf-halo | Light colour on a leaf
3 | The leaves | The greenish parts of the plant where photosynthesis takes place
4 | Stem | Above-ground stalk that serves as a channel through which nutrients pass to leaves and other organs
5 | Leaf-Spot-Size | Indicated spot on the leaf
6 | Area-damaged pod | The area being infected on a seed case of soybean plant
Six (6) attributes as stated in Table 3.1 are used as input parameters for the symptoms. In the context of this work, the input parameters are referred to as linguistic variables. The linguistic values (fuzzy set) adopted for the design are stated in Table 3.2.
Table 3.2: Description of Fuzzy-set adopted for the attributes in the original dataset.
Attributes | Linguistic Values
Area-Damaged-pod | Scattered, low area, upper area, whole field
Leaves | Normal, abnormal
Leaf-halo | Absent, yellow, No yellow
Leaf-spot-size | , ,
Root | Normal, Rotten, Gall-Cysts
Stem | Normal, Abnormal
For example, damage on the soybean pod could be scattered all over the pod, confined to the low area or the upper area, or cover the whole field. The leaves of a soybean plant could be either normal or abnormal. The dataset in Appendix A is an ordered nominal dataset, coded as shown in Table 3.3.
Table 3.3: Interpretation of Linguistic values in form of digits.
Ordered nominal values | 0 | 1 | 2 | 3
Area-damaged-pod | Scattered (0) | Low area (1) | Upper area (2) | Whole field (3)
Leaves | Normal (0) | Abnormal (1) | |
Leaf-Halo | Absent (0) | Yellow (1) | No Yellow (2) |
Leafspot-size | (0) | (1) | (2) |
Root | Normal (0) | Rotten (1) | Gall-Cyst (2) |
Stem | Normal (0) | Abnormal (1) | |
For example, the first record in the dataset in Appendix A as presented below is interpreted as follows:
area.dam | Leaves | leaf.halo | leafspot.size | Roots | Stem | Class
0 | 0 | 0 | 2 | 0 | 1 | 1
The area_damaged_pod is scattered, indicated by “0”; the leaves are normal (“0”); the leaf_halo is absent (“0”); the leafspot size has the value “2”; the roots are normal (“0”); the stem is abnormal (“1”); and the class of disease is 2-4-d-injury, indicated by “1”. A class value of “2” would indicate herbicide injury.
IF area_damaged_pod = 0 AND leaves = 0 AND leaf_halo = 0 AND leafspot_size = 2 AND roots = 0 AND stem = 1 THEN class of disease = 1
This implies that the soybean plant has 2-4-d-injury disease.
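The interpretation of a coded record can be sketched in a few lines of Python. This is only an illustration of the coding scheme in Table 3.3, not part of the MATLAB implementation; the names `CODES`, `CLASSES` and `decode_record` are hypothetical, and the leaf-spot-size labels are left as bare codes because the table does not give them.

```python
# Ordered nominal codes taken from Table 3.3. The leaf-spot-size labels are
# not given in the table, so its codes are shown without a linguistic label.
CODES = {
    "area_damaged_pod": ["scattered", "low area", "upper area", "whole field"],
    "leaves": ["normal", "abnormal"],
    "leaf_halo": ["absent", "yellow", "no yellow"],
    "leafspot_size": ["(0)", "(1)", "(2)"],
    "roots": ["normal", "rotten", "gall-cyst"],
    "stem": ["normal", "abnormal"],
}
CLASSES = {1: "2-4-d-injury", 2: "herbicide injury"}


def decode_record(record):
    """Translate one coded row (six symptom codes plus a class code) into words."""
    *symptoms, disease = record
    if len(symptoms) != len(CODES):
        raise ValueError("expected six symptom codes followed by a class code")
    decoded = {
        name: labels[code]
        for (name, labels), code in zip(CODES.items(), symptoms)
    }
    decoded["class"] = CLASSES[disease]
    return decoded


# First record of the dataset, as interpreted in the text.
first_record = [0, 0, 0, 2, 0, 1, 1]
print(decode_record(first_record))
```

Running it on the first record reproduces the reading given above: scattered damage, normal leaves, no halo, normal roots, abnormal stem, class 2-4-d-injury.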
3.3 Description of the Modeling Tool
The proposed neuro-fuzzy system was developed using the Adaptive Neuro-Fuzzy Inference System (ANFIS) toolbox of the technical programming language MATLAB, as presented in Fig. 3.1. MATLAB is reported to give accurate results and to support system training and testing within a short time (Maryam & Laya, 2016). In this work, the following steps are involved in modeling soybean disease classification with the ANFIS editor in MATLAB:
Step 1: The symptoms collected from various soybean plants (inputs) and the target outputs, paired as indicated in Appendix A, are allocated for training and testing.
Step 2: The dataset is saved in MS-Excel file format and imported into the MATLAB workspace using the uiimport command.
Step 3: To display the ANFIS editor dialogue box, the anfisedit command is typed in the MATLAB command area.
Step 4: In the ANFIS editor, clicking the “Load Data” button loads the data from the specified dataset for training and testing and plots it in the plot region.
Step 5: To view the structure and model of the proposed system based on the inputs and output, the “Generate FIS” and “Structure” buttons are clicked in turn.
Step 6: Under the “Train FIS” group, the built-in hybrid method, which integrates back-propagation and the least-squares method, is selected. The number of training epochs and the error tolerance are also chosen.
Step 7: Clicking the “Train now” button trains the FIS model, adjusts the membership function parameters, and plots the training error in the plot region.
Step 8: Under the “Test FIS” group, the Test button is clicked to validate the trained FIS.
Fig. 3.1: ANFIS Editor Environment. Source: MATLAB (2008).
3.4. ANFIS Parameter-Settings for the Proposed System
ANFIS hybridizes the learning capacity of a neural network with the if-then rules of fuzzy logic to learn the best-fitting membership functions for a given set of data and thereby map the inputs to the appropriate output. The if-then rules are based on the Takagi-Sugeno fuzzy inference system, and the network normally consists of five layers. The equations below are written for two inputs c and d and two rules; $Z_j^k$ denotes the output of node j in layer k.
Layer one is the fuzzification layer. In this work it consists of seventeen (17) nodes and accepts the linguistic variables (area-dam, leaves, leaf-halo, leafspotsize, root and stem) as input parameters. All the nodes in this layer are adaptive nodes that pass the membership grades of the inputs to layer two, expressed as follows:
$$Z_j^1 = \mu_{K_j}(c), \quad j = 1, 2 \tag{1}$$
$$Z_j^1 = \mu_{L_{j-2}}(d), \quad j = 3, 4 \tag{2}$$
where c and d are the linguistic variables input to node j, K and L represent the linguistic labels or values presented in Table 3.3, and $\mu_{K_j}$ and $\mu_{L_{j-2}}$ are the membership functions that measure the degree to which c and d belong to those labels.
Layer two has fixed nodes in which the incoming signals from layer one are multiplied. The output of every node is the firing strength of a rule:
$$Z_j^2 = w_j = \mu_{K_j}(c)\,\mu_{L_j}(d), \quad j = 1, 2 \tag{3}$$
Layer three normalizes the firing strengths: its output is the ratio of the jth rule's firing strength to the sum of all rules' firing strengths.
$$Z_j^3 = \bar{w}_j = \frac{w_j}{w_1 + w_2}, \quad j = 1, 2 \tag{4}$$
Layer four multiplies the output of layer three (the normalized firing strength) by a first-order polynomial with consequent parameters $p_j$, $q_j$ and $r_j$:
$$Z_j^4 = \bar{w}_j f_j = \bar{w}_j (p_j c + q_j d + r_j), \quad j = 1, 2 \tag{5}$$
Layer five is a single fixed node, which produces the overall output by summing all incoming signals:
$$Z^5 = \sum_j \bar{w}_j f_j = \frac{\sum_j w_j f_j}{\sum_j w_j}, \quad j = 1, 2 \tag{6}$$
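The five layers can be traced in a minimal Python sketch of the forward pass for the two-input, two-rule case used in the layer equations. It assumes Gaussian membership functions (the type named for the proposed model); the function names and all parameter values are hypothetical and chosen only for illustration, not taken from the trained system.

```python
import math


def gaussian(x, centre, sigma):
    """Gaussian membership grade of x; centre and sigma are premise parameters."""
    return math.exp(-0.5 * ((x - centre) / sigma) ** 2)


def anfis_forward(c, d, premise_c, premise_d, consequents):
    """Forward pass through the five layers for two inputs and two rules.

    premise_c, premise_d: one (centre, sigma) pair per rule for inputs c and d.
    consequents: one (p, q, r) triple per rule.
    """
    # Layer 1: membership grades of each input.
    mu_c = [gaussian(c, m, s) for m, s in premise_c]
    mu_d = [gaussian(d, m, s) for m, s in premise_d]
    # Layer 2: firing strength of each rule (product of grades).
    w = [a * b for a, b in zip(mu_c, mu_d)]
    # Layer 3: normalized firing strengths.
    total = sum(w)
    w_bar = [wj / total for wj in w]
    # Layer 4: normalized strength times the first-order polynomial.
    f = [p * c + q * d + r for p, q, r in consequents]
    weighted = [wb * fj for wb, fj in zip(w_bar, f)]
    # Layer 5: sum of all incoming signals.
    return sum(weighted)


# Hypothetical parameters, for illustration only.
output = anfis_forward(
    c=2.0, d=1.0,
    premise_c=[(0.0, 1.0), (3.0, 1.0)],
    premise_d=[(0.0, 1.0), (2.0, 1.0)],
    consequents=[(0.1, 0.2, 1.0), (0.3, 0.1, 2.0)],
)
print(round(output, 4))
```

Because layer three normalizes the firing strengths, the output is always a weighted average of the rule outputs, so it lies between the smallest and largest $f_j$.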
[Figure: ANFIS architecture of the proposed system. Six inputs (Area damaged spot, Leaves, Leaf halo, Leaf spot size, Root, Stem) enter Layer 1; Layers 2 to 4 carry the rule weights and weighted rule outputs; Layer 5 is the summation node (Σ).]
In this work, the following parameters are carefully selected with assigned values.
Proportions of dataset used for the simulation
A total of 1,000 soybean records was obtained for simulation in this study. The dataset contains six (6) columns for the input variables followed by a last column for the target output. The dataset was divided into three sets: training, testing and checking.
Five hundred (500) records were assigned for training, two hundred and fifty (250) records were selected for testing, and two hundred and fifty (250) records were allocated for checking.
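The 500/250/250 partition can be sketched as follows. The text does not say how records were assigned to each set, so the random shuffle with a fixed seed is an assumption made for this example, and `split_dataset` is a hypothetical helper name.

```python
import random


def split_dataset(records, n_train=500, n_test=250, n_check=250, seed=0):
    """Shuffle the records and cut them into training, testing and checking sets.

    Random shuffling is an assumption; the source only gives the set sizes.
    """
    if len(records) < n_train + n_test + n_check:
        raise ValueError("not enough records for the requested split")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    check = shuffled[n_train + n_test:n_train + n_test + n_check]
    return train, test, check


# Placeholder record ids standing in for the 1,000 soybean records.
records = list(range(1000))
train, test, check = split_dataset(records)
print(len(train), len(test), len(check))
```

Keeping the three sets disjoint matters here: the checking set is only useful for validating the model if none of its records were seen during training.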
Partitioning of Data Space
Grid partitioning was used to divide the data space into regular subspaces. This option was chosen because each input variable has only a few membership functions, and in order to reduce simulation time during training and testing of the dataset.
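Under grid partitioning, one rule is generated for every combination of linguistic values across the inputs. The sketch below enumerates those combinations using the number of linguistic values per attribute listed in Table 3.2; the dictionary and function names are hypothetical.

```python
from itertools import product

# Number of linguistic values per input variable, as listed in Table 3.2.
LINGUISTIC_VALUES = {
    "area_damaged": 4,
    "leaves": 2,
    "leaf_halo": 3,
    "leafspot_size": 3,
    "root": 3,
    "stem": 2,
}


def grid_partition_rules(values_per_input):
    """Return one tuple of membership-function indices per grid cell (rule)."""
    ranges = [range(n) for n in values_per_input.values()]
    return list(product(*ranges))


rules = grid_partition_rules(LINGUISTIC_VALUES)
membership_functions = sum(LINGUISTIC_VALUES.values())
print(membership_functions, len(rules))  # 17 432
```

The 17 membership functions agree with the seventeen nodes of layer one, and the 432 combinations agree with the number of fuzzy rules reported for the trained model.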
Optimization Method
The hybrid optimization method was adopted. It combines back-propagation and the least-squares algorithm to estimate the premise parameters formed in layer 1 and the consequent parameters formed in layer 4, respectively.
In the forward pass, the consequent parameters P in equation (6) can be expressed as
A = XP, where
A = a column vector of numerical output values, as presented in Appendix C
X = the matrix of training vectors, one per row, as indicated in Appendix C
P = the consequent parameters, to be computed using equation (7)
Where
the row vector of
= the element of
P* = least square estimate.
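For reference, the ordinary least-squares solution of A = XP, which is the usual form of the estimate P* in hybrid ANFIS training, can be written as follows. This is the textbook expression and is assumed, not confirmed, to correspond to equation (7); it also assumes that $X^{T}X$ is invertible.

```latex
P^{*} = \left( X^{T} X \right)^{-1} X^{T} A
```

With the premise parameters held fixed, the output in equation (6) is linear in the consequent parameters, which is why a single least-squares solve in the forward pass is sufficient before back-propagation updates the premise parameters in the backward pass.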
3.5. Proposed Neuro-Fuzzy Model
The diagram in Figure 3.3 consists of five stages: input, fuzzification, rule base, inference engine, and defuzzification. The first stage passes the crisp inputs, namely the symptoms manifested on soybean plants (area damaged spot, leaves, leaf halo, leaf spot size, root, and stem), to the fuzzification stage, where the Gaussian membership function computes their degree of membership.
The back-propagation algorithm is used to train the inference engine and tune it to select the most appropriate rule from the rule base. The defuzzifier then converts the linguistic output generated by the neural network into a crisp output for classification.
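The fuzzification stage can be illustrated for a single input. The sketch below computes Gaussian membership grades of a crisp Area-Damaged value in each of its four fuzzy sets. Placing the set centres at the codes 0 to 3 and using a width of 0.5 are assumptions for illustration; in the proposed model these parameters are tuned during training.

```python
import math

AREA_DAMAGED_LABELS = ["scattered", "low area", "upper area", "whole field"]


def fuzzify(value, labels, sigma=0.5):
    """Return the Gaussian membership grade of a crisp value in each fuzzy set.

    Each set is centred on its ordinal code (0, 1, 2, ...); both the centres
    and sigma are illustrative assumptions, not trained parameters.
    """
    return {
        label: math.exp(-0.5 * ((value - centre) / sigma) ** 2)
        for centre, label in enumerate(labels)
    }


grades = fuzzify(2, AREA_DAMAGED_LABELS)
strongest = max(grades, key=grades.get)
print(strongest, grades)
```

A crisp value of 2 belongs fully to "upper area" and only partially to its neighbours, which is the graded membership that lets the system express severity instead of a hard class.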
Figure 3.3: Conceptual Diagram of Neuro-Fuzzy Model for Soyabean Disease Diagnosis (Proposed Model).
DATA ANALYSIS, RESULTS AND DISCUSSION OF FINDINGS
The portion of data extracted from the Soybean Large Dataset (SLD), as described in Section 3, was simulated to obtain results. The interpretation of the results and the performance evaluation of the model using a confusion matrix are presented in this section.
4.1 Assumptions
According to the agricultural experts, the data stored in the Soybean Large Dataset (SLD) were obtained on the assumption of stable climatic conditions. The Institute of Agricultural Research and Training, Obafemi Awolowo University, Moore Plantation, Ibadan, recorded visible symptoms of 2-4-d-injury and herbicide-injury diseases on soybean plants.
4.2 Output of the Membership Function Plots of the Input Variable
The neural network of the adaptive neuro-fuzzy inference system (ANFIS) applies the hybrid algorithm (back-propagation and least-squares methods), as described in equation (7) of Section 3, to tune the membership functions at the fuzzification stage. The membership function plots of the fuzzy sets for the six input variables (Area damaged, Leaves, Leaf_halo, Leaf Spotsize, Root and Stem) are presented in Figures 4.1, 4.2, 4.3, 4.4, 4.5 and 4.6 respectively. The shape of the fuzzy mapping depends on the linguistic values of each input variable in the dataset.
The four fuzzy sets displayed on the membership function graph of the input variable Area-Damaged are scattered, low-area, upper-area and whole field, as shown in Figure 4.1.
Figure 4.1: Membership function plots for Area-Damaged
In Figure 4.2, two fuzzy sets, normal and abnormal, are displayed on the plots as the linguistic values of the input variable Leaves.
Figure 4.2: Membership function plots for Leaves
The linguistic values or fuzzy sets adopted for the Leaf_halo variable are absent, yellow and no-yellow, as presented in Figure 4.3.
Figure 4.3: Membership function plots for Leaf-halo
The linguistic values or fuzzy sets adopted for the Leaf-spot-size variable are high, higher and highest, as presented in Figure 4.4.
Figure 4.4: Membership function plots for Leaf-Spotsize
The shape of the fuzzy mapping for Root and its fuzzy sets (normal, rotten and gall-cyst) is displayed on the membership function plots shown in Figure 4.5.
Figure 4.5: Membership function plots for Root
In Figure 4.6, normal and abnormal are displayed as the linguistic values on the membership function plots of the input variable Stem.
Figure 4.6: Membership function plots for Stem
4.3 Results of the Trained and Tested Dataset
Figure 4.7 shows the five hundred (500) data pairs selected and loaded for training so that the model can learn the shape of the dataset.
Figure 4.7: Training data loaded with six input variables
After the loaded training-data was simulated as shown in Figure 4.8, the following results are generated:
Number of nodes: 907
Number of linear parameters: 432
Number of nonlinear parameters: 34
Total number of parameters: 466
Number of training data pairs: 500
Number of fuzzy rules: 432
A training error of 5.5258e-07 (0.00000055258) was obtained after 3 epochs, as shown in Figure 4.8.
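The reported parameter counts can be cross-checked with simple arithmetic. The sketch below assumes two parameters (centre and width) per Gaussian membership function and one constant consequent parameter per rule; the second assumption is inferred from the counts and is not stated in the text. The constant names are hypothetical.

```python
from math import prod

# Membership functions per input variable, from Table 3.2.
MF_COUNTS = {
    "area_damaged": 4,
    "leaves": 2,
    "leaf_halo": 3,
    "leafspot_size": 3,
    "root": 3,
    "stem": 2,
}
PARAMS_PER_GAUSSIAN_MF = 2  # centre and width
PARAMS_PER_RULE = 1         # assumption: one constant consequent per rule

n_rules = prod(MF_COUNTS.values())
nonlinear = sum(MF_COUNTS.values()) * PARAMS_PER_GAUSSIAN_MF
linear = n_rules * PARAMS_PER_RULE
print(n_rules, linear, nonlinear, linear + nonlinear)  # 432 432 34 466
```

Under these assumptions the arithmetic reproduces the 432 fuzzy rules, 432 linear parameters, 34 nonlinear parameters and 466 total parameters listed above.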
Figure 4.8: Trained-Data and Computation of Training Error
Two hundred and fifty (250) records were loaded into the workspace as the testing dataset, as shown in Figure 4.9. The testing dataset was used to verify the accuracy of the trained model; as presented in Figure 4.10, the model produced an average testing error of 0.12884.
Figure 4.9: Testing-Dataset loaded with six input variables
Figure 4.10: Computation of Average Testing Error
4.4 Interpretation of the fuzzy Inference System (FIS) Results
After simulation, the rule viewer presented in Figure 4.11 shows the interaction between the first six columns, which hold the six input variables (one variable per column) and form the IF part of each rule, and the seventh column, which corresponds to the THEN part of each rule in the system. The aggregated weighted value of the seventh column, which is the output, depends on the input values in the six columns. The input variables (Area damaged, Leaves, Leaf_halo, Leaf Spotsize, Root and Stem) and their corresponding values are displayed at the top of the columns. The aggregate of the defuzzified linguistic values for each record in the dataset is presented as a single output at the top of the last column. The rule editor, showing the IF-THEN expressions of the simulated model, is depicted in Figure 4.12 and contains four hundred and thirty-two (432) fuzzy rules.
Figure 4.11: Rule Viewer of the Proposed Model
Figure 4.12: Rule Editor of the Proposed Model
Out of the one thousand (1,000) soybean records obtained, two hundred and fifty (250) were used as checking data, against another two hundred and fifty (250) records from the original dataset, in order to validate the accuracy of the model.
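Validation of this kind is typically summarized with a confusion matrix. The sketch below builds a two-class confusion matrix and computes overall accuracy from it; the function names are hypothetical, and the label lists are made-up toy values for illustration only, not results from this study.

```python
from collections import Counter


def confusion_matrix(actual, predicted, classes):
    """Rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in classes] for a in classes]


def accuracy(matrix):
    """Fraction of records on the diagonal (correctly classified)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Toy labels for illustration only: 1 = 2-4-d-injury, 2 = herbicide injury.
actual = [1, 1, 1, 2, 2, 2, 2, 1]
predicted = [1, 1, 2, 2, 2, 2, 1, 1]
matrix = confusion_matrix(actual, predicted, classes=[1, 2])
print(matrix, accuracy(matrix))
```

For these toy labels the matrix is [[3, 1], [1, 3]] and the accuracy is 0.75; the same calculation applied to the checking data would yield the accuracy of the model.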
4.5 The Results of the Surface Viewer Plot
The three-dimensional surface that maps two selected input variables to one output is presented for each two-input, one-output case from Figure 4.13 through to Figure 4.24. The purpose of the mapping is to determine the severity level, or output, of a particular disease when the values of two different input variables change.
The surface view of each mapping shows that the severity of either herbicide-injury or 2-4-d injury disease increases when the values of the two input variables increase. Both exponential and logistic growth appear in the mapping. In Figure 4.13, exponential growth occurs only when the value of Area-Damaged is two (2), which represents the upper area of the crop, or three (3), which represents the whole field; in those cases the severity of the disease is high. Among the six input variables that determine the severity level of the disease, Area-Damaged and Root are more significant than the other variables.
Figure 4.13: Surface viewer plot of severity of Leaves and Area-Damaged
Figure 4.14: Surface viewer plot of severity of Leaf-Halo and Area-Damaged
Figure 4.15: Surface viewer plot of severity of Leaf-Spotsize and Area-Damaged
Figure 4.16: Surface viewer plot of severity of Root and Area-Damaged
Figure 4.17: Surface viewer plot of severity of Stem and Area-Damaged
Summary and Conclusion
In the diagnosis of soybean diseases, an expert system that does not combine artificial intelligence techniques such as fuzzy logic and neural networks would be inefficient and would lack the capacity to handle the ambiguity and vagueness inherent in soybean disease symptoms.
This work developed a model that combines the knowledge base and reasoning features of fuzzy logic (FL) with the self-learning capacity of an artificial neural network (ANN) to classify soybean diseases into two classes, herbicide injury and 2-4-d injury, on the basis of their symptoms. The simulation of the proposed model also revealed the exponential growth that exists between combinations of two different symptoms and their corresponding output. The improved accuracy of the proposed model is attributed to the integrity of the dataset obtained and to the choice of artificial intelligence technique (Adaptive Neuro-Fuzzy Inference System) adopted for the work.
References
[1] Amosa, B., Ateko, B., Ugwu, J., & Adegoke, M. (2018). Fuzzy logic Expert System for the diagnosis of Chilli Diseases. IOSR Journal of Computer Engineering, 20(6), 52-64.
[2] Awoyelu, I.O. & Adebisi, R.O. (2015). A Predictive Fuzzy Expert System for Diagnosis of Cassava Plant Diseases. Global Journal of Science Frontier Research: C Biological Science, 15(6), 20-25.
[3] Dugje, I.Y., Omoigui, L.O., Ekeleme, F., Bandyopadhyay, R., Lava-Kuma, P. & Kamara, A.Y. (2009). Farmers’ guide to soybean production in northern Nigeria. IITA, Ibadan, Nigeria.
[4] Li, D., Fu, Z. & Duan, Y. (2002). Fish-Expert: A web-based expert system for fish disease diagnosis. Expert systems with applications, 23(3), 311-320.
[5] López-Morales, V., López-Ortega, O., Ramos-Fernández, J. & Muñoz, M. (2008). JAPIEST: An integral intelligent system for the diagnosis and control of tomatoes diseases and pests in hydroponic greenhouses. Expert systems with applications, 35(4), 1506-1512.
[6] Mahaman, B.D., Passam, H.C., Sideridis, A.B., & Yialouris, C.P. (2003). DIARES-IPM: A diagnostic advisory rule-based expert system for integrated pest management in solanaceous crop systems. Agricultural Systems, 76(3), 1119-1135.
[7] Swanson, A. & Rajalahti, A. (2010). An overview of agricultural mechanization and its environmental management in Nigeria. Agricultural engineering International, the CIGRE journal, 9(6), 6-18.
[8] Yialouris, C.P. & Sideridis, A.B. (2010). An expert system for tomato diseases. Computers and electronics in agriculture, 14(1), 61-76.