Application of Machine Learning in the Validation of the RIDS Software—A Tool for Assessing the Maturity Levels of Smallholder Farmer Organizations

Hanningtone Simiyu; Joseph Tanui

doi:10.24018/ejmath.2025.6.3.390

Research Article

Hanningtone Simiyu

Jomo Kenyatta University of Agriculture and Technology (JKUAT), Kenya

* Corresponding author

Joseph Tanui

World Agroforestry Centre (ICRAF) UN Avenue, Kenya

Submitted 2025-01-20
Published 2025-06-22

Abstract

Smallholder farmers in sub-Saharan Africa face a range of challenges, including limited access to global markets, difficulty in coping with rapidly changing value chains and distribution channels, and varied consumer preferences. Farmers have therefore increasingly opted to participate in collective action through Smallholder Farmer Organizations (SFOs). However, evaluating the effectiveness of these SFOs remains methodologically challenging. This paper presents and validates the Rural Institutions Diagnostics Software (RIDS) - an automation of a participatory methodology for evaluating SFOs. Further, it develops an Artificial Neural Network (ANN) model and employs it not only to predict the maturity levels of SFOs based on their internal governance, management, capacity, resilience and leadership structures but also to determine whether the methodology in RIDS is reproducible by the model. Data collected through stratified random sampling from Kenya, Uganda and Tanzania, comprising 268 patterns of input-output vectors, were used. The best performance based on the cross-entropy error was achieved with 94% of patterns classified correctly in testing and 97% in validation. The findings validate the potential of the RIDS tool for evaluating the maturity of SFOs, and the paper therefore recommends it for application by governments, researchers, development partners and other relevant practitioners in the agricultural sector.

Keywords: Artificial neural networks, classification model, capacity needs assessment, grassroots rural institutions, maturity levels

Introduction

Smallholder farmers in sub-Saharan Africa pervasively suffer from poverty and food insecurity due to a range of factors, including diminishing crop yields, land degradation, inadequate infrastructure, and a lack of institutional and governmental support [1]. Farmers are further weakened by a lack of capacity, bargaining power and inadequate solutions to their challenges [2]. Despite these challenges, agricultural production, predominantly managed by smallholder farmers, remains the economic backbone of most countries in sub-Saharan Africa. Substantial investments have been undertaken by governments in the region in the agricultural sector, particularly in crop production, yet this has done little to address these challenges [3]. With forecasts predicting a continuation of these trends, against a background of an ever-increasing population density, sub-Saharan countries will continue to be at risk of severe food shortages.

In order to address these challenges, farmers are in need of capacity development interventions that are well-targeted so as to derive value out of existing investments made, to strengthen their institutional culture and ensure their viability [4]. Conventional capacity development activities that engage smallholder farmers in technical capacity and skill development, however, often overlook the knowledge, skills and social infrastructure that are necessary for them to integrate agro-ecological techniques into their farming systems. Smallholder farmers also tend to lack adequate market awareness regarding their produce, thereby subjecting them to market exploitation by way of poor market prices for their products. It is for this reason that many farmers elect to participate in groups through collective action as a means of improving their livelihoods, confidence and bargaining power.

Collective action is fostered through the strengthening of Smallholder Farmer Organizations (SFOs), which through voluntary membership offer farmers services that help them with their farming operations - including negotiating product prices, gathering market data, acquiring access to farm inputs, services, and loans, offering technical support, and processing and selling agricultural goods. In spite of the existing evidence on the effectiveness and the role of collective action in fostering rural development among smallholder farmers, empirical evidence on the efficiency of the approach remains limited [5]. Furthermore, the majority of the literature on collective action analyzes the performance of SFOs qualitatively [6], [7]. Assessing the performance of SFOs internally using various indicators and determinants of their performance and delivery, not just output resources and activities, has also proved challenging. This is due to the complex nature of the arrangements and the diversity both within SFOs and in the environment in which they operate. Limited knowledge also exists on how collective action can be fostered, supported and sustained. This poses a challenge in capacity needs assessment and building, and subsequently in the strengthening of SFOs.

Against this background, the World Agroforestry Centre (ICRAF) initiated a project aimed at designing and testing a framework for Strengthening Rural Institutions (SRI). The project identified the role for robustly performing smallholder farmer systems to address the key development challenges of increasing food productivity, sustainable production systems, poverty alleviation and capacity enhancement in sub-Saharan Africa. The project thus set out with a view to strengthen SFOs. The strengthening of the SFOs was endeavored through a model centered on rural institutional capacity building, enterprise and platform development. The Capacity Needs Assessment component of the model involved evaluating existing gaps within the SFOs in terms of social infrastructure, knowledge, skills, strengths, weaknesses, opportunities, threats, assets and other necessary elements required for the organizations to achieve their pre-specified objectives. The governance, leadership and management structures of different SFOs as well as their resilience and capacity development processes were evaluated and classified into three levels of maturity. In view of the challenges that exist in the evaluation of such institutions’ performance as presented in this study, a participatory methodology was developed and administered to participating SFOs. The methodology was automated into software (The Rural Institutions Diagnostics Software- RIDS) and is suitable for application in: Identification of the maturity levels of SFOs, Identification of existing gaps within SFOs and for Baseline surveys of SFOs. One of the fundamental requirements of such a new tool developed is validation, which requires that studies be conducted to demonstrate that the process consistently conforms to the requirements and a comparison be made between its outputs and the results from the experiments. Hence, the objective of this paper is twofold. 
First, we provide empirical evidence on the accuracy of the software in line with the methodology. Secondly, we develop and apply ANN classification models to understand the relationship between the factors identified to affect SFOs and their effectiveness in service delivery. This is based on the three classes of maturity developed. We further determine whether the methodology in the automated tool is mathematically reproducible by the model. The study was thus guided by the following research questions:

1. What is the degree of accuracy of the RIDS software in relation to the methodology used in the survey?

2. How can the maturity levels of SFOs be modelled using Artificial Neural Networks (ANN)?

3. Is the RIDS software reproducible by the ANN model?

Background

SFO’s Maturity

Several definitions of SFO maturity have been put forward. According to Barham and Chitemi [8], the maturity of an SFO is assessed based on the age of the institution. It is notable, however, that the age of an SFO is not necessarily a determinant of performance. While old SFOs may perform well, more recently formed institutions may perform equally well. Likewise, age neither represents the capacity of individual members of the SFO nor addresses measures of fatigue or enthusiasm within an institution to achieve its objectives. Therefore, it is difficult to establish a relationship between SFO maturity and its age [9]. Farmer institutions’ maturity has also been defined as the effectiveness of the organization in sustaining collective action, a measure not influenced by age [10]. In this study, the latter definition is adopted and SFO maturity is further defined in terms of its internal structural functionality. To establish a scale for determining the level of maturity, the study adopted three levels of maturity, namely the Novice, Intermediate and Mature categories, as defined in the methodology.

Assessing SFO’s Maturity

Several variables have been used in assessing the performance of SFOs. The most widely used variable is the organization’s governance structure. Formal rules, legal personality, participatory decision-making, democratic control of the organization by members, and clear and consistent rules to establish norms of behavior by officials and members, have all been used as indicators of governance [5], [11], [12]. Group composition in terms of size of farmer group membership, gender parity, proportion of women and youth in membership, is also considered to have a significant influence on the performance of SFOs [5], [6], [13], [14]. The existence of boundary partners with external linkage and support to the SFO, including government ministries and extension staff, other farmer organizations, local political authority, non-governmental organizations, religious organizations, agricultural research institutes, private sector, agribusinesses and other donor agencies also affect the performance of SFOs [5], [7], [15]. In addition to these indicators, the internal management and structure of a SFO is also important in determining its performance. Ragasa [5] identified soft skill attributes including management training, internal interactions among members and family influence in decision-making as key management elements that would affect farmer group performance. Other studies, however, have found no distinction between an institution’s governance structure and its management system, and subsequently tend to combine the two variables. Table I provides a summary of factors that have been used in assessing the performance of SFOs across a range of studies.

Table I. Variables Used in Assessing SFOs and Models Applied
Variable	Indicators	Model/Empirical analysis	References
Governance	• Formal rules, • legal personality, • participatory decision-making, • democratic control of the organization by members, • clear and consistent rules to establish norms of behavior by officials and members	• Ordinary least squares, • Regression analysis, • Qualitative analysis, • Structural Equation Models (SEM)	[5], [12], [16]
Group composition	• Size of farmer group membership, • gender parity, • proportion of women and youth in membership	• Linear regression, • Generalized Linear and Latent Models	[5], [6], [13], [17]
External linkage and support	Links with staff or agricultural monitors from the Ministry of Agriculture; other farmer organizations; local political authority; non-governmental organizations; religious organizations; agricultural research institute; private companies and agribusinesses and other donor agencies	Qualitative analysis	[5], [7], [15]
Leadership	• Gender proportion in leadership, • education, • age, • trust	• Descriptive statistics, • Likert scale	[9], [18]
Participation	• Participatory leadership, • member involvement in group activities	• Multiple linear regression analysis, • descriptive statistics	[19]–[21]
Management	• Management training, • internal interactions among members and management, • family influence in decision-making	Principal component analysis (PCA)	[5]

It is worth noting that not all indicators are relevant to all research sites in which the SFOs are based. For instance, the effect of gender on a gender-specific SFO may be significantly different from that on a mixed-gender SFO. The effect of age on leadership will differ between a youth SFO and a mixed-age SFO. The implication of such complexities is twofold. First, SFO members and their respective stakeholders need to be involved in identifying the indicators to be used in evaluation, taking into account the uniqueness of the environment within which the organizations operate and the nature of the organizations. Secondly, scores need to be generated for the chosen indicators according to their degree of relevance and importance to the organization and to the research site. It is for this reason that the methodology adopted in this tool was participatory.

To guide and accommodate the diverse indicators of performance chosen by SFOs, broad categories of indicators namely: governance, management, leadership, capacity development and resilience were developed under which the indicators of performance chosen by farmer groups would fall.

Empirically, one of the most widely used techniques in the evaluation of SFO performance is the linear regression model [22]. The Structural Equation Model (SEM) has also been applied extensively in evaluating SFOs due to the correlated nature of the variables affecting such organizations [5]. It was noted, however, that one of the key assumptions on which linear regression models are founded is that the error terms are independently and identically normally distributed, which is rarely the case with the variables affecting SFOs. The model also requires the existence of a dependent variable measured independently from the independent variables and that the dependent variable be continuous or, if not, at least close to continuous. In our case, however, the methodology involves assessing SFOs and classifying them into three categories. Thus, the outcome is strictly a categorical variable, and a linear regression model would be less effective. Although there are other regression models that could be used in classification problems, such as the logistic model, where the different classes are treated as the dependent variable, these are regression models which, like the SEM, still require the existence of an independent set of classes to be treated as the dependent variable.

To the contrary:

• It is the presumed “independent variables” that are directly used in determining the SFOs categories in the proposed approach.

• The methodology purely aimed at measuring and classifying as opposed to regressing.

Other techniques also exist that are normally used in grouping observations into various categories. These include cluster analysis, discriminant analysis and the use of the normal/density curve. Cluster analysis is suitable for grouping individuals/observations according to similarities and dissimilarities in variables that are multivariate in nature. In the assessment of SFO maturity, the intent is to classify farmer groups on the basis of their total scores from the variables used. This implies that the classification is based not just on similarity or dissimilarity in characteristics, but on the values of the scores computed. A normal curve was subsequently adopted and cut-offs taken on both sides of the mean. One of the main advantages of using the normal curve for classification is its ability to shift position according to the prevailing scores at the time.

The Rural Institutions Diagnostics Software

Through the RIDS software, SFO members and the relevant stakeholders jointly choose and rank indicators of performance that are unique to the research site, according to their degree of relevance and significance. This automatically generates an electronic questionnaire that is based on the chosen indicators to be employed specifically on SFOs in that site to ensure local relevance. The ranking also automatically generates scores for the indicators based on their magnitude and frequencies. This happens by the researcher first validating the data to ensure that each indicator is properly ranked, and for each of the ranks, frequencies are calculated. A product of the frequencies with the equivalent rank is then computed and the scores from the product are aggregated to give a total score for each criterion.
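The rank-and-frequency aggregation described above can be sketched as follows. This is a minimal illustration and not the RIDS implementation: the function name `criterion_score` and the example rankings are hypothetical, and the sketch assumes the total score of a criterion is the sum of each rank multiplied by the number of times that rank was given.

```python
from collections import Counter

def criterion_score(ranks):
    """Total score for one criterion: sum of rank * frequency (assumed rule)."""
    freq = Counter(ranks)  # how many times each rank was assigned
    return sum(rank * count for rank, count in freq.items())

# Hypothetical ranks assigned to one criterion by six participants.
example_ranks = [3, 3, 2, 1, 3, 2]
total = criterion_score(example_ranks)  # 3*3 + 2*2 + 1*1
```
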

The standard Multi-Criteria Analysis (MCA) index for each criterion is then determined by standardizing the total scores obtained. The MCA indices for individual groups are then determined for each criterion, and the total MCA index per principle is obtained by aggregating the MCA indices of all the criteria. Finally, the three maturity levels, namely Novice, Intermediate and Mature, are defined using a normal curve, with the cut-off points that define the range for each category calculated as illustrated in Table II.

Table II. SFO Maturity Levels
Maturity level	Range
Novice	MCA index ≤ A
Intermediate	A < MCA index ≤ C
Mature	C < MCA index

Where A and C are cut-off points given by

(1)

\begin{array}{rcl} A = μ - δ z \end{array}

(2)

\begin{array}{rcl} C = μ + δ z \end{array}

where

$z$ is the standard score at the $θ\%$ confidence level, indicating how many standard deviations an MCA index lies above or below the mean, for $90 \leq θ \leq 100$.

$A$ is the lower cut-off point at the $θ\%$ confidence level.

$C$ is the upper cut-off point at the $θ\%$ confidence level.

$μ$ is the mean while $δ$ is the standard deviation. Other than classifying the SFOs into the three maturity levels, the tool provides descriptive statistics and also performs analysis of variance based on the scores generated, with a view to showing whether significant differences exist between different categories/groups of SFOs.
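The cut-off rule in (1) and (2) together with Table II can be sketched as below. This is an illustrative sketch under stated assumptions, not the RIDS code: the function names, the example MCA indices, the use of the population standard deviation, and the choice of a two-sided 90% standard score are all assumptions made for the example.

```python
from statistics import NormalDist, mean, pstdev

def maturity_cutoffs(indices, z):
    """Return (A, C) = (mu - delta*z, mu + delta*z) as in (1) and (2)."""
    mu = mean(indices)
    delta = pstdev(indices)  # assumption: population standard deviation
    return mu - delta * z, mu + delta * z

def classify(index, a, c):
    """Apply the ranges of Table II to a single MCA index."""
    if index <= a:
        return "Novice"
    if index <= c:
        return "Intermediate"
    return "Mature"

# Hypothetical MCA indices for eight SFOs.
mca = [5.0, 28.0, 30.0, 31.0, 29.0, 30.0, 32.0, 55.0]
z = NormalDist().inv_cdf(0.95)  # assumed two-sided 90% standard score
a, c = maturity_cutoffs(mca, z)
levels = [classify(v, a, c) for v in mca]
```

Because the cut-offs are recomputed from the mean and spread of whatever scores are supplied, the class boundaries move with the data, which is the property of the normal-curve approach noted earlier.
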

Materials and Methods

Study Site and Survey

This study used data from 268 SFOs selected through stratified random sampling from the study sites of Embu and Bungoma in Kenya, Kapchorwa and Masindi in Uganda, and Pemba and Lushoto in Tanzania as illustrated in Fig. 1. Summarized details of the SFOs sampled in each site are presented in Table III.

Table III. Distribution of SFOs in the Survey
Country	Kenya		Uganda		Tanzania
Site	Bungoma	Embu	Kapchorwa	Masindi	Lushoto	Pemba
No. of SFOs	60	42	79	19	27	41
Total	102		98		68

The Artificial Neural Networks Model

Artificial Neural Networks (ANNs) have been applied in solving a wide array of problems in the agricultural sector. In [23], an ANN was applied in analyzing the agro-economic factors determining the adoption of rice-fish farming in the north of Iran. ANNs have also been widely applied in the classification of agricultural companies [24]. An ANN model differs from conventional regression models in that it is predicated on emulating the way a biological brain interprets data. Its applications fall into three categories: predicting, categorizing, and identifying statistical patterns.

For data sets having non-linear relationships, an ANN model represents a promising modeling technique [25]. The number of hidden nodes is a crucial factor in developing an ANN model since it determines the model’s complexity [26]. The following procedures were used in this study’s ANN modeling process:

1. As indicated in Table IV, a database comprising 268 SFOs and their ranks was initially constructed and arbitrarily split into three groups, with 50% going toward training, 25% toward testing, and the remaining 25% toward validation.

2. The fundamental structure and architecture of the ANN model was then created, specifying an input layer, an output layer, and the number of hidden layers.

3. The ANN model was then trained and tested to provide the optimal structure, including determining the optimal number of hidden nodes and iterations.

4. A comparison between the statistical accuracy of the computations made during the training, testing, and validation phases followed.

5. The last step was to determine whether the statistical accuracy obtained from the training, testing, and validation sets was comparable. If not, the procedure was repeated, beginning with the third step. Otherwise, an ANN model with a suitable structure for the intended purpose was considered to have been produced.

Table IV. Distribution of Data in Modeling
Task	Training	Testing	Validation
Proportion of SFOs	50%	25%	25%
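The 50/25/25 partition in Table IV can be sketched as below. This is only an illustration: the function name `split_indices`, the fixed seed, and the use of integer division to set the segment sizes are assumptions, and the study does not state how its random split was generated.

```python
import random

def split_indices(n, seed=0):
    """Shuffle 0..n-1 and split into training, testing and validation parts."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # assumed: a seeded shuffle
    n_train = n // 2                  # 50% for training
    n_test = n // 4                   # 25% for testing
    train = idx[:n_train]
    test = idx[n_train:n_train + n_test]
    valid = idx[n_train + n_test:]    # remainder for validation
    return train, test, valid

train, test, valid = split_indices(268)
```
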

ANN Structure Set-Up

In ANN modelling, the most important step is establishing the network architecture [27]. An ideal ANN design is often arrived at by determining the number of layers and selecting the number of nodes in each layer, although there is no unified theory for this process [25]. For this study, there were five numeric inputs and one categorical value (Novice, Intermediate, Mature) to classify. A 1-of-3 encoding of the categorical data was thus done as illustrated in Table V.

Table V. Encoding of the Categorical Data
Governance	Management	Leadership	Capacity	Resilience	Novice	Intermediate	Mature
15.93353	22.01312	25.08677	28.59652	10.19393	0	1	0
10.89659	17.54728	14.00623	14.8031	5.140315	0	1	0
48.60435	46.35176	46.47462	46.17649	23.13422	0	0	1
22.95129	19.76522	11.204	14.55767	1.875407	1	0	0
19.24072	20.64598	10.90635	16.24883	5.09569	0	1	0

With this output encoding, the neural network’s output layer would have three neurons. The cross-entropy error function was used as the measure of error rather than the usual sum of squared errors (SSE), because with the binary coding adopted the SSE would give misleading figures. The cross-entropy error is given by (3):

(3)

\begin{array}{rcl} E^{c} = - \sum_{α = 1}^{134} \sum_{k = 1}^{3} t_{α k} \ln ({\tilde{t}}_{α k}) \end{array}

where 134 is the number of observations in the training data, $t_{α k}$ is the target value for the $k^{th}$ class of the $α^{th}$ observation (either 1 or 0), and ${\tilde{t}}_{α k}$ is the network’s $k^{th}$ output for the $α^{th}$ pattern. ${\tilde{t}}_{α k}$ is equal to the neural network’s estimate of the probability that the $α^{th}$ pattern is in the $k^{th}$ class.
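The 1-of-3 encoding of Table V and the cross-entropy error in (3) can be illustrated with a short sketch. The function names, the two example patterns, and the small constant `eps` used to avoid taking the logarithm of zero are assumptions introduced here for illustration.

```python
import math

CLASSES = ("Novice", "Intermediate", "Mature")

def one_hot(label):
    """1-of-3 encoding in the column order used in Table V."""
    return [1 if label == c else 0 for c in CLASSES]

def cross_entropy(targets, outputs, eps=1e-12):
    """E = -sum over patterns and classes of t * ln(t_tilde), as in (3)."""
    total = 0.0
    for t_row, p_row in zip(targets, outputs):
        for t, p in zip(t_row, p_row):
            if t:  # terms with t = 0 contribute nothing
                total -= t * math.log(max(p, eps))
    return total

# Two hypothetical patterns with hypothetical network outputs.
targets = [one_hot("Intermediate"), one_hot("Novice")]
outputs = [[0.1, 0.8, 0.1], [0.7, 0.2, 0.1]]
error = cross_entropy(targets, outputs)  # -(ln 0.8 + ln 0.7)
```

Only the probability assigned to the true class of each pattern enters the sum, so the error falls towards zero as those probabilities approach one.
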

Another conundrum in the ANN structure is determining how many hidden layers are necessary for neural networks to identify complicated relationships and capture nonlinear patterns in the data. Fortunately, previous studies have demonstrated that, with adequate connection weights, one hidden layer can approximate any continuous function [28].

As identified in the background of this study, numerous studies have used many variables in evaluating SFOs. In the current study, the variables have been categorized into five broad classes while introducing the SFO’s resilience as well. Five input variables and three categories/classes of farmer groups were thus listed as shown in (4):

(4)

\begin{array}{rcl} Rank = {ANN}_{5 - NH - 3} {\begin{cases} X_{1} = Governance \\ X_{2} = Management \\ X_{3} = Leadership \\ X_{4} = Capacity building \\ X_{5} = Resilience \end{cases}} \end{array}

The 5-NH-3 label thus denotes that there are 5 inputs and 3 output nodes, where NH indicates the number of hidden nodes which must be determined.

The relation in (5) was used to transform all the input variables to values between 0 and 1:

(5)

\begin{array}{rcl} x_{i α}^{*} = \frac{x_{i α} - min (x_{i})}{max (x_{i}) - min (x_{i})} \end{array}

where $i = 1, 2, \dots, 5$ and $α = 1, 2, \dots, 134$ .

This speeds up training by preventing larger values from dominating smaller ones [27].
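The transformation in (5) can be sketched as below, using the five governance scores shown in Table V as sample input. The function name and the handling of a constant column are assumptions, and in the study the minimum and maximum would be taken over the training data rather than over these five rows.

```python
def min_max_scale(values):
    """Rescale values to [0, 1] using (x - min) / (max - min), as in (5)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant column is mapped to zeros to avoid 0/0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Governance scores of the five example SFOs in Table V.
governance = [15.93353, 10.89659, 48.60435, 22.95129, 19.24072]
scaled = min_max_scale(governance)
```
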

The input to the $j^{th}$ hidden node is then obtained using (6):

(6)

\begin{array}{rcl} x_{j} = w_{j 0} + \sum_{i = 1}^{5} w_{j i} x_{i α}^{*} \end{array}

where:

$j = 1, 2, \dots, NH$ indexes the hidden nodes, $w_{j i}$ denotes the weight connecting the $i^{th}$ input to the $j^{th}$ hidden node, and $w_{j 0}$ is the bias of the $j^{th}$ hidden node.

The bipolar activation function also known as the logistic activation function is applied to adjust the data within the network to between 0 and 1, so that:

(7)

\begin{array}{rcl} y_{j} = Φ_{h} (x_{j}) = \frac{1}{1 + e^{(- β x_{j})}} \end{array}

where $y_{j}$ is the output of the $j^{th}$ hidden node for $j = 1, 2, \dots, NH$; $β$ is the learning rate; $x_{j} = w_{j 0} + \sum_{i = 1}^{5} w_{j i} x_{i α}^{*}$ is the input to the $j^{th}$ hidden node; and $w_{j i}$ are the weights connecting the $i^{th}$ input node and the $j^{th}$ hidden node.

This is because the unipolar activation function is best suited to Bernoulli or binomial data, a category to which our data do not belong.

The output $y_{j}$ from the hidden node then passes a signal to the output node $(k)$ . The net input to the $k^{t h}$ output node would thus be:

(8)

\begin{array}{rcl} z_{k} = θ_{k 0} + \sum_{j = 1}^{NH} θ_{k j} y_{j} \end{array}

where $θ_{k j}$, $j = 1, 2, \dots, NH$, are the weights connecting the $j^{th}$ hidden node to the $k^{th}$ output node, $k = 1, 2, 3$, and $θ_{k 0}$ is the bias to the $k^{th}$ output node.

Depending on the number of classes in the classification problem, there are two primary approaches to solving classification problems with multilayer feed-forward neural networks. If the problem is binary, a neural network with a single logistic output can be used; this output estimates the likelihood that the input data falls into one of the two classes. A different strategy is used when more than two classes are involved. One of the most widely used strategies is to assign a logistic output to every class in the classification problem. The network then assigns each input pattern to the class linked to the output perceptron with the highest probability. However, because the outputs of independent logistic nodes are not constrained to sum to one for each input, as is required of any valid multivariate probability distribution, this method generates erroneous probabilities. The network’s outputs were therefore passed through the Softmax activation function to circumvent this issue and guarantee that they met the mathematical requirements of multivariate classification probabilities. The incoming signal to the output node $(z_{k})$ is thus transformed using the Softmax activation function to scale the output $(y_{k})$:

(9)

\begin{array}{rcl} y_{k} = γ (z_{k}) = \frac{e^{z_{k}}}{\sum_{l = 1}^{3} e^{z_{l}}} \end{array}

The Softmax activation function ensures that all outputs conform to the following requirements for multivariate probabilities:

(10)

\begin{array}{l} (i) \; 0 \leq y_{k} \leq 1, \text{ for all } k = 1, 2, 3 \\ \text{and} \\ (ii) \; \sum_{k = 1}^{3} y_{k} = 1 \end{array}

A pattern is thus assigned to the $k^{th}$ class when $y_{k}$ is the largest among all the 3 classes.
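The forward pass described by (6) to (10) can be sketched as follows. This is a minimal illustration rather than the fitted model: the function names and the weight layout (bias first in each row) are assumptions, and any weights passed in are hypothetical.

```python
import math

def logistic(x, beta=1.0):
    """Logistic activation of (7)."""
    return 1.0 / (1.0 + math.exp(-beta * x))

def softmax(z):
    """Softmax of (9); subtracting max(z) is a standard numerical safeguard."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]

def forward(x, w, theta, beta=1.0):
    """x: scaled inputs; w[j] = [w_j0, w_j1, ...]; theta[k] = [theta_k0, theta_k1, ...]."""
    # Hidden layer: net input (6) followed by the logistic activation (7).
    y = [logistic(wj[0] + sum(wji * xi for wji, xi in zip(wj[1:], x)), beta)
         for wj in w]
    # Output layer: net input (8) followed by Softmax (9).
    z = [tk[0] + sum(tkj * yj for tkj, yj in zip(tk[1:], y)) for tk in theta]
    return softmax(z)

def predict(probs):
    """Index of the class with the largest output probability."""
    return max(range(len(probs)), key=probs.__getitem__)
```

For a 5-NH-3 network, `w` would hold NH rows of six values and `theta` three rows of NH + 1 values; the returned list satisfies both conditions in (10) by construction.
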

Training of the Network

When training feed-forward networks, the SSE is typically used. With this approach, the weights are adjusted to minimize the SSE between the network outputs and the targets $y = (y_{1}, \dots, y_{134})$.

The SSE is defined as in (11):

(11)

\begin{aligned} s^{2} (x_{i α}; β) & = \sum {(y_{i} - γ (x_{i α}; β))}^{2} \\ & = \sum {(y_{i} - γ (x_{i α}; w, θ))}^{2} \\ & = \sum {(y_{i} - γ (θ_{0} + \sum_{j = 1}^{H} θ_{j} Φ_{h} (w_{j 0} + \sum_{i = 1}^{5} w_{j i} x_{i α}^{*})))}^{2} \end{aligned}

There are various methods of minimizing this function, namely:

1. Back Propagation

2. Quasi-Newton Method

3. Simulated Annealing Method

The back propagation method is suitable for the unipolar activation function. Since the bipolar activation function was used, the Quasi Newton method was adopted in minimizing the SSE.

Determining the Optimal Number of Hidden Nodes

Several guidelines have been developed by researchers for approximately determining the required numbers of hidden nodes. These include:

1. Taking the number of hidden nodes to be 0.75 of the number of input units [29]. This is the methodology used in the research. Starting with the first guess, the iterative technique was used to determine the number of hidden nodes in the hidden layer. When the model’s efficiency did not significantly increase, the training process was terminated, and the model’s generalization capabilities were examined.

2. Setting the number of hidden nodes between the average and the sum of the input and output nodes [30].

3. Fixing an upper limit and working backwards. The upper limit of the number of hidden nodes in a single layer network may be taken as $2 Z + 1$, where $Z$ is the number of variables in the input layer [31], [32].
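The three guidelines above can be evaluated for the 5-input, 3-output network of this study with a short sketch. The function name and dictionary keys are hypothetical, and rounding the 0.75 rule up to a whole node is an assumption, since the guideline itself does not say how to round.

```python
import math

def hidden_node_guides(n_in, n_out):
    """Evaluate the three rules of thumb for the number of hidden nodes."""
    return {
        # Rule 1: 0.75 of the number of input units (rounded up, assumed).
        "starting_guess": math.ceil(0.75 * n_in),
        # Rule 2: between the average and the sum of input and output nodes.
        "range": ((n_in + n_out) / 2, n_in + n_out),
        # Rule 3: upper limit 2Z + 1, with Z the number of input variables.
        "upper_limit": 2 * n_in + 1,
    }

guides = hidden_node_guides(5, 3)
```

For five inputs and three outputs this gives a starting guess of 4, a range of 4 to 8, and an upper limit of 11 hidden nodes.
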

Results and Discussion

Once a process that consistently conforms to the requirements has been developed, its validation requires that a comparison be made between its outputs and the results from the experiments. Validation in the context of scientific research design and investigation refers to determining whether or not a study can scientifically address the questions it is meant to address. According to Taylor [33], validation is the documentation of evidence that a process consistently conforms to requirements, and this requires that once such a process is obtained, studies are conducted to demonstrate that this is the case.

Among the many statistical tools available for validation are control charts, capability studies, designed experiments, analysis of means, failure modes and effects analysis, and mistake proofing. However, the choice of a validation tool depends on the type of instrument to be validated and the researcher’s preference. The validation in this study was done in three phases:

1. Analysis of Means (ANOM): In the validation of a new process, the determination of whether a process or tool is reproducible by another process is vital [33]. This is achieved by carrying out an analysis of means between outputs of the two instruments to determine whether significant differences exist between the instruments. The instruments to be analyzed simultaneously include the new tool developed and the one thought to perform the same process. In this study, an analysis of means between the neural network model developed and the RIDS software was undertaken. This helped determine if the methodology in the RIDS software was reproducible by the model.

2. Accuracy of the Automated Tool (RIDS): This section documents evidence on the accuracy of the automated tool (RIDS) with regard to:

a) Classification Accuracy – verification of the classification rate of the software.

b) ANOVA outputs in which validation was also necessary.

3. Simple Failure Mode and Effects Analysis (FMEA): This is the identification of potential failures in the automated tool.

The ANN Model

To determine the optimal number of hidden nodes and learning rate for the back propagation algorithm, many trials were carried out throughout the training phase of the modeling process. Across the iterations run for different learning rates and numbers of hidden nodes, the optimal cross-entropy error was $6.1 \times 10^{-5}$. Initially, the learning rate was adjusted while maintaining a fixed number of hidden nodes. Next, while maintaining a constant learning rate, the number of hidden nodes was changed. The optimal learning rate and number of hidden nodes (NH) were found to be $0.01$ and $10$, respectively. Since the scores obtained were used to categorize an SFO into either a Novice, Intermediate or Mature group, the model had three output nodes representing the three categories of SFOs, each with a bias $b_{k}$ for $k = 1, 2, 3$. An R visual of the model developed is given in Fig. 2.

The results from training, testing and validation were compared to verify consistency in the model developed and are summarized in Table VI. In training, a $100\%$ classification rate is in most cases expected, since the same data segment that is used to train the model is run repeatedly with the achieved weights. This is evident from the fact that all 134 SFOs were accurately classified in training, the majority of which were Intermediate (99). In ANN testing, four Intermediate groups were misclassified (three as Novice and one as Mature), whereas in validation two Novice groups were misclassified (one as Intermediate and one as Mature).

Table VI. Confusion Matrix on Results from ANN Training, Testing and Validation
		ANN training			ANN testing			ANN validation
		N	I	M	N	I	M	N	I	M
Original output	N	13	0	0	2	0	0	11	1	1
	I	0	99	0	3	51	1	0	49	0
	M	0	0	22	0	0	10	0	0	6

Accuracy, recall (the true rate) and precision were used to gauge the model. Accuracy, the proportion of all predictions that the model classified correctly, was obtained by dividing the sum of the main diagonal of the confusion matrix by the total number of elements in that segment. The accuracy was 94.03% in testing and 97.06% in validation, with an overall accuracy of 95.56% when the two segments are pooled. The performance measures from testing and validation of the model are summarized in Table VII.

Table VII. Confusion Matrix on ANN Performance Measures in Training, Testing and Validation
		ANN testing			ANN validation
		Accuracy	True rate	Precision	Accuracy	True rate	Precision
Expected output	N	94.03%	1.0000	40%	97.06%	0.8462	100%
	I		0.9273	100%		1.0000	98%
	M		1.0000	90.9%		1.0000	85.71%
		ANN overall results			ANN overall performance
		N	I	M	Accuracy	TR	Precision
Expected output	N	13	1	1	95.56%	0.8667	81.25%
	I	3	100	1		0.9615	99.00%
	M	0	0	16		1.0000	88.88%

The true rate (TR), also known as recall, is computed for each SFO category and is the proportion of that class of SFOs that was correctly classified. The lowest true rate for the model was 0.8462 (Novice groups in validation), which still indicates a good predictive model.
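The three measures can be recomputed directly from a confusion matrix. The sketch below uses the testing segment of Table VI, with rows taken as the original class and columns as the class predicted by the ANN. The function names are illustrative only and are not part of RIDS.

```python
LABELS = ["N", "I", "M"]

# Testing segment of Table VI: rows = original class, columns = predicted class.
testing = [
    [2, 0, 0],
    [3, 51, 1],
    [0, 0, 10],
]


def accuracy(cm):
    # Sum of the main diagonal divided by the total number of patterns.
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total


def recall(cm, k):
    # True rate: correctly classified members of class k over all members of class k.
    return cm[k][k] / sum(cm[k])


def precision(cm, k):
    # Correct predictions of class k over all predictions of class k.
    return cm[k][k] / sum(row[k] for row in cm)


for k, label in enumerate(LABELS):
    print(label, round(recall(testing, k), 4), round(precision(testing, k), 4))
print("accuracy", round(accuracy(testing), 4))
```

For this matrix the accuracy is 63/67 (94.03%), the Intermediate true rate is 51/55 (0.9273) and the Novice precision is 2/5 (40%), in agreement with the testing columns of Table VII.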

The ANN Model and the RIDS Software

The neural network model developed was applied in the validation of the RIDS software through an analysis of means of the two outputs. Fig. 3 illustrates an ANOM chart for the RIDS software and the ANN model produced at alpha = 0.05. The central line denotes the overall average. The instruments’ means, plotted as deviations from the overall average, are compared with the upper decision limit (UDL = 1.0383) and the lower decision limit (LDL = 0.9044) to identify any that differ significantly from the overall mean (in this case, the means corresponding to the RIDS software and the ANN model, respectively). As shown in Fig. 3, there was no significant difference between the two instruments; hence, the RIDS instrument is mathematically reproducible by the model. This adds weight to the existence of a logical mathematical relationship between the indicators of performance and the maturity levels.
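The decision rule behind an ANOM chart can be sketched as follows, assuming a balanced design in which the limits are the overall mean plus or minus h times the pooled standard deviation times the square root of (k - 1)/(kn). The critical value h must be looked up in ANOM tables for the chosen alpha, number of groups and error degrees of freedom; it is not computed here. All numeric inputs in the example are hypothetical placeholders and do not reproduce the limits reported for Fig. 3.

```python
import math


def anom_limits(grand_mean, pooled_sd, n_per_group, k, h):
    """Lower and upper decision limits for a balanced ANOM with k groups.

    h is the tabulated ANOM critical value (supplied by the caller).
    """
    half_width = h * pooled_sd * math.sqrt((k - 1) / (k * n_per_group))
    return grand_mean - half_width, grand_mean + half_width


def anom_flags(group_means, ldl, udl):
    # True marks a group mean that falls outside the decision limits.
    return {name: not (ldl <= mean <= udl) for name, mean in group_means.items()}


# Hypothetical inputs for two instruments (not the study's data).
group_means = {"RIDS": 0.98, "ANN": 0.96}
grand_mean = sum(group_means.values()) / len(group_means)
ldl, udl = anom_limits(grand_mean, pooled_sd=0.5, n_per_group=135, k=2, h=1.97)
flags = anom_flags(group_means, ldl, udl)
```

When no instrument is flagged, as in this made-up example, the instruments are treated as not significantly different from the overall mean, which is the reading applied to Fig. 3.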

Accuracy of the RIDS Software

For a predictive model such as the ANN model in this study, deviations are expected, and the model developer strives to minimize the differences between the predictions and the measured values. Other tools are designed to strictly reproduce a particular output, e.g. when a manual process is computerized by adopting the same steps that were conducted manually. In such a case, the accuracy of the automated tool needs to be exactly 100%, i.e. a replication of the manually generated results, so that even a slight deviation from the original values is an error. It is therefore important that any validation process takes into account the nature of the device, model or tool being validated.

Classification Accuracy

A confusion matrix was used to compare the output generated by the RIDS software with the manual output generated from the survey. The results are summarized in Table VIII. The comparison shows that indeed the RIDS software conforms to the methodological requirements. The accuracy of the tool was 100%, hence a perfect replica and automation of the manual process.

Table VIII. Confusion Matrix on Performance Measures of the Automated RIDS Software
		RIDS output			Performance measures
		N	I	M	Accuracy	TR	Precision
Expected Output	N	28	0	0	100%	1	100%
	I	0	202	0	100%	1	100%
	M	0	0	38	100%	1	100%
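For a tool that is meant to replicate a manual process, the check reduces to an exact match between paired labels. The sketch below shows one way to express that check; the label lists are hypothetical and the function names are not part of RIDS.

```python
from collections import Counter


def confusion_counts(expected, produced):
    """Count (expected, produced) label pairs."""
    return Counter(zip(expected, produced))


def is_exact_replica(expected, produced):
    """True only if every automated label equals its manual counterpart."""
    return len(expected) == len(produced) and all(
        e == p for e, p in zip(expected, produced)
    )


manual = ["N", "I", "I", "M", "I"]     # hypothetical manually derived levels
automated = ["N", "I", "I", "M", "I"]  # hypothetical software output

counts = confusion_counts(manual, automated)
replica = is_exact_replica(manual, automated)
```

An exact replica places every count on the diagonal of the confusion matrix, which is the pattern shown in Table VIII.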

Accuracy in ANOVA and PCA Output

The RIDS software is also programmed to perform an analysis of variance to establish whether significant differences exist between different categories of SFOs, maturity levels, countries, regions and research sites. Table IX gives the results of the analysis of variance between the three countries involved in the survey. The default alpha value in the RIDS software is 0.05. Significant differences existed between the performance of farmer groups in Kenya and Tanzania, as well as between those in Tanzania and Uganda. However, there was no significant difference between farmer groups in Kenya and Uganda. The means and the mean differences also show that Tanzanian farmer groups performed best (19.0885), followed by farmer groups in Uganda (13.5762), with those in Kenya performing least well (11.4498). The ANOVA outputs were incorporated in the software to help depict differences in performance between SFOs. The software allows the user to select the categories on which the ANOVA is based.

Table IX. RIDS ANOVA Output
Source	df	SS	MS	F	Pr > F
Between groups	2	1127.84	563.92	13.68	0.00
Within groups	107	4410.06	41.26
Total	109	5537.90	605.14
Group	Mean difference	Lower bound	Upper bound	P
Kenya-Tanzania	−7.6387	−10.7125	−4.5649	0.0000
Kenya-Uganda	−2.1264	−5.2002	0.9474	0.1731
Tanzania-Uganda	5.5123	2.6665	8.3581	0.0002
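The quantities in the upper part of Table IX follow from the usual one-way ANOVA decomposition. The sketch below computes them from raw scores; it is a generic illustration rather than the routine implemented in RIDS, and the country scores are hypothetical.

```python
def one_way_anova(groups):
    """Return (df_between, df_within, ss_between, ss_within, F) for a dict of samples."""
    all_values = [v for values in groups.values() for v in values]
    grand_mean = sum(all_values) / len(all_values)
    ss_between = 0.0
    ss_within = 0.0
    for values in groups.values():
        mean = sum(values) / len(values)
        ss_between += len(values) * (mean - grand_mean) ** 2
        ss_within += sum((v - mean) ** 2 for v in values)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    # F is the ratio of the between-group to the within-group mean square.
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return df_between, df_within, ss_between, ss_within, f_stat


# Hypothetical scores, three groups per country (not the survey data).
scores = {
    "Kenya": [1.0, 2.0, 3.0],
    "Uganda": [2.0, 3.0, 4.0],
    "Tanzania": [5.0, 6.0, 7.0],
}
df_b, df_w, ss_b, ss_w, f_stat = one_way_anova(scores)
```

The same arithmetic links the entries of Table IX: the between-groups mean square is 1127.84/2 = 563.92, and dividing it by the within-groups mean square of 41.26 gives an F of about 13.67, which matches the reported 13.68 up to rounding.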

The RIDS tool allows analysis of variance to be done according to maturity level, country, region and individual research site, all of which were verified. The RIDS ANOVA outputs for all 12 possible variable-filter combinations were checked for accuracy against the R/SPSS software, as summarized in Table X.

Table X. Summary of the Validity of the Various ANOVA Outputs in RIDS
		Variable
		Maturity	Country	Region	Sites
Filter	Maturity		✓	✓	✓
	Country	✓			✓
	Region	✓	✓		✓
	Site	✓

Failure Mode and Effects Analysis (FMEA)

Failure mode and effects analysis is a systematically structured way to detect and address potential failures of a system and their resultant effects [34]. The methodology is more widely practiced in the medical field because of the serious consequences that the failure of a system could have there. For a newly formed process, it identifies potential bottlenecks or unintended consequences prior to implementation. Unlike Root Cause Analysis (RCA), FMEA is proactive rather than reactive, as it asks “How would the system fail?” rather than “Why did the system fail?”.

An FMEA was conducted on the RIDS software to identify the potential failures in the system and their causes and the possible control measures in the application of the tool. The results are summarized in Table XI.

Table XI. Simple FMEA for the RIDS Software
Section	Failure mode	Failure causes	Failure effects	Actions to reduce the odds of failure
Reports	PCA	A group failing to respond to two consecutive criteria hence rendering a zero score for the two criteria.	Lack of results for principal component analysis	The researcher should ensure all questionnaires are duly filled and also ensure uniformity of the criteria per research site.
Reports	ANOVA	When there is not enough data for one or more categories selected	Lack of results for analysis of variance	The researcher should have a substantive number of subjects/questionnaires. Other failures in this section, such as when maturity levels are selected as variables, may not be controllable, as this classification depends on overall site performance.

Although this FMEA was less extensive than those conducted on medical instruments, where the odds of occurrence of failures and their effects usually have to be assessed, it aided in identifying the potential areas of failure summarized in Table XI. It was noted, however, that the failures are mainly caused by poorly structured data being run through the system. Researchers applying the system should therefore take the actions listed in Table XI to reduce the odds of these failures occurring.
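The two failure causes in Table XI lend themselves to simple checks run on the data before reports are requested. The sketch below is one possible form of such checks and is not a feature of RIDS; the function names, the use of a zero score to mark a missing response and the minimum category size of 2 are assumptions.

```python
def consecutive_zero_criteria(scores):
    """Return True if a group's criterion scores contain two zeros in a row."""
    return any(a == 0 and b == 0 for a, b in zip(scores, scores[1:]))


def categories_too_small(labels, min_size=2):
    """Return the categories with fewer than min_size groups (assumed threshold)."""
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return sorted(c for c, n in counts.items() if n < min_size)


# Hypothetical inputs.
pca_risk = consecutive_zero_criteria([3, 0, 0, 2])              # PCA failure cause
anova_risk = categories_too_small(["Kenya", "Kenya", "Uganda"])  # ANOVA failure cause
```

Flagged questionnaires or categories can then be corrected or excluded before the PCA and ANOVA reports are generated.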

Maturity Levels and Variable Scores

The RIDS methodology implies that, on average, an SFO at a higher level of maturity will have higher scores for governance, management, leadership, capacity development and resilience. Fig. 4 shows the quartiles for each of the five variables in each of the three maturity levels.
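A summary of this kind can be produced by grouping the scores by maturity level and taking quartiles within each group. The sketch below uses Python's `statistics.quantiles` with its default method; the record layout and the scores are hypothetical and are not the survey data behind Fig. 4.

```python
from statistics import quantiles


def quartiles_by_level(records, variable):
    """Return {maturity level: [Q1, median, Q3]} for one variable."""
    by_level = {}
    for rec in records:
        by_level.setdefault(rec["level"], []).append(rec[variable])
    return {level: quantiles(values, n=4) for level, values in by_level.items()}


# Hypothetical governance scores for a handful of groups.
records = [
    {"level": "Novice", "governance": 10.0},
    {"level": "Novice", "governance": 20.0},
    {"level": "Novice", "governance": 30.0},
    {"level": "Mature", "governance": 60.0},
    {"level": "Mature", "governance": 70.0},
    {"level": "Mature", "governance": 80.0},
]
summary = quartiles_by_level(records, "governance")
```

Repeating the call for each of the five variables gives the values needed for a per-level box plot.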

Novice groups score lowest on all five variables, whilst Mature groups score highest on all five. The results corroborate the findings made by SRI in the capacity needs analysis survey, which aimed at assessing and enhancing the capacity of SFOs and examined whether capacity needs differed between the Novice groups, considered to have lower capacities, and the Mature groups. Though some needs were common across the maturity levels, the extent of the needs varied with maturity level. For instance, the Novice groups had greater need of basic skills such as business planning, while the Mature groups had greater need of financial management.

Conclusion

This paper presents empirical evidence of a logical statistical relationship between governance, management, leadership, capacity development and resilience and the efficiency of SFOs, and consequently their levels of maturity defined in terms of their effectiveness. Moreover, it validates the automated methodology for assessing SFOs and classifying them into three distinct maturity levels. Results indicate that the software conforms to the requirements and to the methodology designed. The analysis of means also indicates that the software is reproducible by an ANN model with particular weights and architecture, adding weight to the existence of a mathematical logic in the methodology used as well as a logical relationship between the five variables and the maturity levels. The tool also has the advantage of being self-adjusting in its parameters, because it classifies using the normal curve, which can shift position as the variable scores change. This renders the tool time-adaptive in the assessment of the farmer groups.

RIDS proved effective in assessing the capacity development needs of SFOs, since the areas in which the SFOs were deficient could be easily identified and capacity development interventions directed towards them. For instance, the Novice groups have greater need of basic skills such as business planning, while the Mature groups have greater need of financial management. This study therefore advocates the use of the RIDS tool for the assessment of SFOs to help direct capacity building towards stronger SFOs and greater collective action benefits. The relationship between farmer group maturity and group or member incomes, food security and other related indicators of livelihoods is an area for future research.

References

[1] Kavoi JM, Mwangi JG, Kamau GM. Challenges faced by small land holder farmer regarding decision making in innovative agricultural development: an empirical analysis from Kenya. Int J Agric Ext. 2014;2(2):101–8.

[2] Anh NH, Cuong TH, Nga BT. Production and marketing constraints of dairy farmers in Son La milk value chain, Vietnam. Greener J Business Manage Business Studies. 2013;3(1):031–7.

[3] Tanui J, Wendy M, Laura G, Douglas B, Verrah O, Mieke B, et al. Strategies for effective capacity building of grassroots communities. Nairobi, Kenya: World Agroforestry Centre; 2014, pp. 162.

[4] Tukahirwa J, Tenywa MKR, K R, N H. Scaling sustainable land management (SLM) innovations: Insights and lessons from rural grassroots initiatives in Eastern Africa. 4th Biannual Landcare Conference, pp. 12–6. Limpopo-South Africa, 2009 Jul.

[5] Ragasa C, Golana J. The role of rural producer organizations for agricultural service provision in fragile states development strategy and governance division. In International Food Policy Research Institute (IFPRI). 2012: IFPRI discussion papers 1235 [cited 2025 Jun 10]. 2012 Dec. Available from: https://ideas.repec.org/p/fpr/ifprid/1235.html.

[6] Shiferaw B, Obare G, Muricho G. Rural institutions and producer organizations in imperfect markets: experiences from producer marketing groups in semi-arid Eastern Kenya. CAPRi. 2006;60.

[7] Ramdwar MN a, Ganpat WG, Bridgemohan P. Exploring the barriers and opportunities to the development of farmers’ groups in selected Caribbean countries. Int J Rural Manag. 2013 Oct 21;9(2):135–49.

[8] Barham J, Chitemi C. Collective action initiatives to improve marketing performance: lessons from farmer groups in Tanzania. Food Policy. 2009 Feb;34(1):53–9.

[9] Sonam T, Martwanna N. Performance of smallholder dairy farmers’ groups in the East and West central regions of Bhutan: members’ perspective. J Agric Ext Rural Dev. 2011 Jan 5;4(1):23–9.

[10] Westermann O, Ashby J, Pretty J. Gender and social capital: the importance of gender differences for the maturity and effectiveness of natural resource management groups. World Dev. 2005 Nov;33(11):1783–99.

[11] Ampaire EL, Machethe CL, Birachi E. The role of rural producer organizations in enhancing market participation of smallholder farmers in Uganda: enabling and disabling factors. Afr J Trop Agric. 2013;1(3):030–6.

[12] Araral E. What explains collective action in the commons? Theory and evidence from the Philippines. World Dev. 2009 Mar;37(3):687–97.

[13] Pandolfelli L, Meinzen-Dick R, Dohrn S. Introduction gender and collective action: motivations, effectiveness and impact. J Int Dev. 2008;20(1):1–11.

[14] Fischer E, Qaim M. Linking smallholders to markets: determinants and impacts of farmer collective action in Kenya. World Dev. 2012;40(6):1255–68.

[15] Shiferaw B, Hellin J, Muricho G. Improving market access and agricultural productivity growth in Africa: what role for producer organizations and collective action institutions? Food Secur. 2011 Nov 26;3(4):475–89.

[16] Ampaire E, Development R, Africa S. Factors influencing effectiveness in second-tier marketing RPOs in Uganda. 2012;14:1–6.

[17] Fischer E, Qaim M. Smallholder farmers and collective action: what determines the intensity of participation? J Agric Econ. 2014 Jan 22;65(3):683–702.

[18] Njoku M. Factors influencing role performance of community based organisations in agricultural development. 2009;4(6):313–7.

[19] Ray B, Bhattacharya RN. Transaction costs, collective action and survival of heterogeneous co-management institutions: case study of forest management organisations in West Bengal, India. J Dev Stud. 2011 Feb;47(2):253–73.

[20] Markelova H, Meinzen-Dick R, Hellin J, Dohrn S. Collective action for smallholder market access. Food Policy. 2009 Feb;34(1):1–7.

[21] Latif M, Tariq JA. Performance assessment of irrigation management transfer from government-managed to farmer-managed irrigation system: a case study. Wiley Intersci. 2009;286(July 2008):275–86.

[22] Mushtaq S, Dawe D, Lin H, Moya P. An assessment of collective action for pond management in Zhanghe Irrigation System (ZIS), China. Agric Syst. 2007 Jan;92(1–3):140–56.

[23] Allahyari MS, Noorhosseini SA. Agro-economic factors determining on adoption of rice-fish farming. An Application for Artificial Neural Networks. 2014;1(2):151–6.

[24] Vladimir K, Oldrich T, Eliska S. Classification of companies with the assistance of self-learning neural networks [Klasifikace podniků za pomoci samoučících se neuronových sítí]. Agric Econ. 2010;56(2):51–8.

[25] Asnaashari A, Edward AM, Bahram G, Donald T. Forecasting watermain failure using artificial neural network modelling. Can Water Res J. 2013;38(1):37–41.

[26] Despagne F, Massart DL. Tutorial review: neural networks in multivariate calibration. The Analyst. 1998;123(11):157R–78R.

[27] Maier HR, Dandy G. Neural networks for the prediction and forecasting of water resources variables: a review of modeling issues and applications. Environ Model Software. 2000;15(1):101–24.

[28] Shahin MA, Jaksa MB, Maier H. Artificial neural network based settlement prediction formula for shallow foundations on granular soils. Aust Geomech. 2002;37(4):45–52.

[29] Salchenberger LM, Cinar EM, Lash N. Neural networks: a new tool for predicting thrift failures. Decis Sci. 2007;23(4):899–916.

[30] Hellin J, Lundy M, Meijer M. Farmer organization, collective action and market access in Meso-America. Food Policy. 2009 Feb;34(1):16–22.

[31] Hecht-Nielsen R. Theory of the back-propagation neural network. International Joint Conference on Neural Networks, pp. 593–606, Washington, DC: IEEE TAB Neural Network Committee, 1989.

[32] Caudill M. Neural networks primer. AI Expert. 1988;3(6):53–9.

[33] Taylor WA. Methods and tools for process validation. Taylor Enterprises. 1998 [cited 2025 Jun 10]. Available from: https://variation.com/methods-and-tools-for-process-validation/.

[34] Latino JR. Optimizing FMEA and RCA efforts in health care. J Healthc Risk Manag. 2004;24(3):21–8.

How to Cite

Application of Machine Learning in the Validation of the RIDS Software—A Tool for Assessing the Maturity Levels of Smallholder Farmer Organizations. (2025). European Journal of Mathematics and Statistics, 6(3), 6-18. https://doi.org/10.24018/ejmath.2025.6.3.390

Issue

Vol. 6 No. 3 (2025)

License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

[1] Kavoi JM, Mwangi JG, Kamau GM. Challenges faced by small land holder farmer regarding decision making in innovative agricultural development: an empirical analysis from kenya. Int J Agric Ext. 2014;2(2):101–8.
Google Scholar

[2] Anh NH, Cuong TH, Nga BT. Production and marketing constraints of dairy farmers in Son La milk value chain, Vietnam. Greener J Business Manage Business Studies. 2013;3(1):031–7.
Google Scholar

[3] Tanui J, Wendy M, Laura G, Douglas B, Verrah O, Mieke B, et al. Strategies for effective capacity building of grassroots communities. Nairobi, Kenya: World Agroforestry Centre; 2014, pp. 162.
Google Scholar

[4] Tukahirwa J, Tenywa MKR, K R, N H. Scaling sustainable land management (SLM) innovations: Insights and lessons from rural grassroots initiatives in Eastern Africa. 4th Biannual Landcare Conference, pp. 12–6. Limpopo-South Africa, 2009 Jul.
Google Scholar

[5] Ragasa C, Golana J. The role of rural producer organizations for agricultural service provision in fragile states development strategy and governance division. In International Food Policy Research Institute (IFPRI). 2012: IFPRI discussion papers 1235 [cited 2025 Jun 10]. 2012 Dec. Available from: https://ideas.repec.org/p/fpr/ifprid/1235.html.
Google Scholar

[6] Shiferaw B, Obare G, Muricho G. Rural institutions and producer organizations in imperfect markets: experiences from producer marketing groups in semi-arid Eastern Kenya. CAPRi. 2006;60.
Google Scholar

[7] Ramdwar MN a, Ganpat WG, Bridgemohan P. Exploring the barriers and opportunities to the development of farmers’ groups in selected caribbean countries. Int J Rural Manag. 2013 Oct 21;9(2):135–49.
Google Scholar

[8] Barham J, Chitemi C. Collective action initiatives to improve marketing performance: lessons from farmer groups in Tanzania. Food Policy. 2009 Feb;34(1):53–9.
Google Scholar

[9] Sonam T, Martwanna N. Performance of smallholder dairy farmers’ groups in the East and West central regions of Bhutan: members’ perspective. J Agric Ext Rural Dev. 2011 Jan 5;4(1):23–9.
Google Scholar

[10] Westermann O, Ashby J, Pretty J. Gender and social capital: the importance of gender differences for the maturity and effectiveness of natural resource management groups. World Dev. 2005 Nov;33(11):1783–99.
Google Scholar

[11] Ampaire EL, Machethe CL, Birachi E. The role of rural producer organizations in enhancing market participation of smallholder farmers in Uganda: enabling and disabling factors. Afr J Trop Agric. 2013;1(3):030–6.
Google Scholar

[12] Araral E. What explains collective action in the commons? Theory and evidence from the philippines. World Dev. 2009 Mar;37(3):687–97.
Google Scholar

[13] Pandolfelli L, Meinzen-Dick R, Dohrn S. Introduction gender and collective action: motivations, effectiveness and impact. J Int Dev. 2008;20(1):1–11.
Google Scholar

[14] Fischer E, Qaim M. Linking smallholders to markets: determinants and impacts of farmer collective action in Kenya. World Dev. 2012;40(6):1255–68.
Google Scholar

[15] Shiferaw B, Hellin J, Muricho G. Improving market access and agricultural productivity growth in Africa: what role for producer organizations and collective action institutions? Food Secur. 2011 Nov 26;3(4):475–89.
Google Scholar

[16] Ampaire E, Development R, Africa S. Factors influencing effectiveness in second-tier marketing RPOs in Uganda. 2012;14:1–6.
Google Scholar

[17] Fischer E, Qaim M. Smallholder farmers and collective action: what determines the intensity of participation? J Agric Econ. 2014 Jan 22;65(3):683–702.
Google Scholar

[18] Njoku M. Factors influencing role performance of community based organisations in agricultural development. 2009;4(6):313–7.
Google Scholar

[19] Ray B, Bhattacharya RN. Transaction costs, collective action and survival of heterogeneous co-management institutions: case study of forest management organisations in West Bengal. India J Dev Stud. 2011 Feb;47(2):253–73.
Google Scholar

[20] Markelova H, Meinzen-Dick R, Hellin J, Dohrn S. Collective action for smallholder market access. Food Policy. 2009 Feb;34(1):1–7.
Google Scholar

[21] Latif M, Tariq JA. Performance assessment of irrigation management transfer from government-managed to farmermanaged irrigation system : a case study. Wiley Intersci. 2009;286(July 2008):275–86.
Google Scholar

[22] Mushtaq S, Dawe D, Lin H, Moya P. An assessment of collective action for pond management in Zhanghe Irrigation System (ZIS). China Agric Syst. 2007 Jan;92(1–3):140–56.
Google Scholar

[23] Allahyari MS, Noorhosseini SA. Agro-economic factors determining on adoption of rice-fish farming. An Application for Artificial Neural Networks. 2014;1(2):151–6.
Google Scholar

[24] Vladimir K, Oldrich T, Eliska S. Classification of companies with theassistance of self-learning neural networks Klasifikace podniku za pomoci samou ˚ cících se neuronovtfytfch sítí. ˇ Agric Econ. 2010;56(2):51–8.
Google Scholar

[25] Asnaashari A, Edward AM, Bahram G, Donald T. Forecasting watermain failure using artificial neural network modelling. Can Water Res J. 2013;38(1):37–41.
Google Scholar

[26] Despagne F, Massart DL. Tutorial review: neural networks in multivariate calibration. The Analyst. 1998;123(11):157R–78R.
Google Scholar

[27] Maier HR, Dandy G. Neural networks for the prediction and forecasting of water resources variables: a review of modeling issues and applications. Environ Model Software. 2000;15(1):101–24.
Google Scholar

[28] Shahin MA, Jaksa MB, Maier H. Artificial neural network based settlement prediction formula for shallow foundations on granular soils. Aust Geomech. 2002;37(4):45–52.
Google Scholar

[29] Salchenberger LM, Cinar EM, Lash N. Neural networks: a new tool for predicting thrift failures. Decis Sci. 2007;23(4):899–916.
Google Scholar

[30] Hellin J, Lundy M, Meijer M. Farmer organization, collective action and market access in Meso-America. Food Policy. 2009 Feb;34(1):16–22.
Google Scholar

[31] Hecht-Nielsen R. Theory of the back-propagation neural network. International Joint Conference on Neural Networks, pp. 593–606, Washington, DC: IEEE TAB Neural Network Committee, 1989.
Google Scholar

[32] Caudill M. Neural networks primer. AI Expert. 1988;3(6):53–9.
Google Scholar

[33] Taylor WA. Methods and tools for process validation. Taylor Enterprises. 1998 [cited 2025 Jun 10]. Available from: https://variation.com/methods-and-tools-for-process-validation/.
Google Scholar

[34] Latino JR. Optimizing FMEA and RCA efforts in health care. J Healthc Risk Manag. 2004;24(3):21–8.
Google Scholar