Introduction
A proper experimental design plays a vital role in developing every product or process. A good experimental design requires a strong understanding of the system under study.1 In factorial designs, the variables involved may be independent (factors) or dependent (responses). Once they are identified, experiments are designed and analyzed to determine the conditions that give the maximum response. Optimization is the widely used process of finding the best among all available alternatives.2 Several optimization techniques are now used to find the most suitable outcome. Response Surface Methodology (RSM) is one of the most frequently used experimental design approaches for optimization.3 RSM plays a significant role in analyzing, designing and developing new processes and products. It is a collection of statistical and mathematical techniques used to set up a series of experiments, fit an empirical model, and determine the settings of the model input variables that give the maximum or minimum response within a region of interest.4, 5 RSM is widely used in situations where multiple input variables influence the performance measure of a process.6
RSM attempts to relate a response to the levels of the variables or factors that influence it through appropriate experimental design and analysis. RSM uses one or more polynomial regression equations to fit the functional relationships between factors and response values. Regression analysis is then used to optimize process parameters and predict response values.7 By focusing on predictive model building, RSM delivers greater reproducibility of results and process improvement.8 Since RSM can analyze the effects of multiple factors and their interactions on more than one response variable, it is used in a wide range of optimization scenarios.
Approximating response functions and the experimental strategy of RSM
Suppose the researcher is interested in a system in which the response y depends on the controllable input variables ξ₁, ξ₂, …, ξₖ.5, 6 Expressed in terms of the corresponding coded variables x₁, x₂, …, xₖ, the true response function is
η = f(x₁, x₂, …, xₖ)
We must approximate the true response function f because its form is unknown. In general, a low-degree polynomial model can approximate such a relationship over a relatively small region of the independent variable space. A first-order or second-order model is typically used.5, 6, 9
When there are two independent variables, the first-order model can be written in terms of the coded variables as
η = β₀ + β₁x₁ + β₂x₂
Because it only includes the main effects of the two variables, this model is known as the main effects model.5, 6
The first-order model is unsuitable for analyzing maxima, minima, and ridge lines. When there is an interaction between the variables, it can be incorporated into the model as shown below.
η = β₀ + β₁x₁ + β₂x₂ + β₁₂x₁x₂
Adding the interaction term introduces curvature into the response function. When the curvature of the true response surface is strong, a first-order model, even with the interaction term, is insufficient, and a second-order model will almost certainly be needed.5, 6
η = β₀ + β₁x₁ + β₂x₂ + β₁₁x₁² + β₂₂x₂² + β₁₂x₁x₂
The second-order model is very flexible, and the parameters in the second-order model are easy to estimate. Second-order models can describe quadratic surfaces whose stationary point is a minimum, a maximum, or a saddle point, as well as ridge systems.5, 6, 9
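As a minimal sketch of how a fitted second-order model might be analysed, the Python snippet below locates the stationary point of a two-variable quadratic model and classifies it from the eigenvalues of the matrix of second-order coefficients. The coefficient values are hypothetical and chosen only for illustration, and the function name is an assumption rather than part of any RSM package.

```python
import numpy as np

def stationary_point(b0, b, B):
    """Stationary point of y = b0 + x'b + x'Bx and its classification.

    b : first-order coefficients (beta_1, ..., beta_k)
    B : symmetric k x k matrix with beta_ii on the diagonal and
        beta_ij / 2 off the diagonal (assumed non-singular here).
    """
    b = np.asarray(b, dtype=float)
    B = np.asarray(B, dtype=float)
    xs = -0.5 * np.linalg.solve(B, b)   # solves the gradient equation b + 2Bx = 0
    ys = b0 + 0.5 * xs @ b              # predicted response at the stationary point
    eig = np.linalg.eigvalsh(B)
    if np.all(eig < 0):
        kind = "maximum"
    elif np.all(eig > 0):
        kind = "minimum"
    else:
        kind = "saddle point"
    return xs, ys, kind

# Hypothetical model: eta = 80 + 4*x1 + 2*x2 - 3*x1^2 - 2*x2^2 + 1*x1*x2
b0 = 80.0
b = [4.0, 2.0]
B = [[-3.0, 0.5],
     [0.5, -2.0]]
xs, ys, kind = stationary_point(b0, b, B)
print(np.round(xs, 4), round(float(ys), 4), kind)
```

Both eigenvalues of B are negative for these illustrative coefficients, so the stationary point is a maximum; mixed signs would indicate a saddle point.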
The response can be represented graphically as contour plots and three-dimensional surface plots, which help illustrate the shape of the response surface. Unfortunately, because the response surface then extends beyond three dimensions, graphs are hard to use when more than two independent variables are involved. The three-dimensional response surface plot helps to reveal the interaction effects of the independent variables (factors). The two-dimensional contour plot, on the other hand, displays the response values visually.10
Major steps involved in RSM
Identification of the problem: Establish the case in which RSM is to be employed, which is usually one where several input variables may affect the quality attributes of the product or process.1
Selection of factor levels by screening experiments: A screening method should be employed to determine the factors that have a significant impact on the responses of interest.5, 11
Determination of the independent variables (factors): These are input variables that can be changed independently.5, 6, 11
Determination of the dependent variables (responses): The response is the performance measure or quality characteristic.5, 11
Selection of the appropriate experimental design: Choosing an appropriate experimental design is an important aspect of using RSM. The number of runs and blocks, as well as the experimental points used, differ between designs.11
Selection of a regression model: The model that most closely represents the system should be based on data collected from the identified process or system. The model with the highest precision for efficient use and the simplest form for easy operation is typically preferred.11
Mathematical–statistical treatment of data: After data have been gathered for each experimental point of the chosen design, a mathematical equation must be fitted to describe the behaviour of the response over the levels studied. Once the data have been collected, the least squares method is used to estimate the parameters of the polynomial model.6
Verification of the fitted model: One must evaluate whether the model accurately represents the relationship between the dependent and independent variables using standard techniques such as residual analysis, prediction error sum of squares (PRESS) residuals, and lack-of-fit testing using analysis of variance (ANOVA).11
Graphical presentation of the model equation: Response surface plots can be used to illustrate the fitted model equation. The predictive models are used to build contours and response surfaces within the range of observations.11
Prediction of optimal operating conditions: Determination of the control variable settings that result in a maximum or minimum response over a specific region of interest.9
Optimization of the model: Optimization provides additional information about the level combinations of the independent variables that will produce the best product/process features.11
Validation of model: The optimum conditions estimated through RSM are checked experimentally to confirm that the predicted response is actually obtained.
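The fitting and verification steps listed above can be sketched in Python with NumPy alone. The example assumes a two-factor central composite layout in coded units and uses synthetic responses generated from a known quadratic surface plus noise (not data from any cited study). It estimates the coefficients by least squares and then computes R² and the PRESS statistic from the diagonal of the hat matrix.

```python
import numpy as np

def quadratic_model_matrix(X):
    """Columns: 1, x1, x2, x1^2, x2^2, x1*x2 for a two-factor design."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([np.ones(len(X)), x1, x2, x1**2, x2**2, x1 * x2])

# Two-factor central composite design in coded units (assumed layout).
a = np.sqrt(2.0)
design = np.array([[-1, -1], [1, -1], [-1, 1], [1, 1],
                   [-a, 0], [a, 0], [0, -a], [0, a],
                   [0, 0], [0, 0], [0, 0], [0, 0], [0, 0]])

# Synthetic response: a known quadratic surface plus random noise.
rng = np.random.default_rng(0)
true_beta = np.array([50.0, 3.0, -2.0, -4.0, -1.5, 1.0])
M = quadratic_model_matrix(design)
y = M @ true_beta + rng.normal(scale=0.5, size=len(design))

# Least squares estimate of the model coefficients.
beta_hat, *_ = np.linalg.lstsq(M, y, rcond=None)

# Ordinary residuals and coefficient of determination.
resid = y - M @ beta_hat
ss_res = np.sum(resid**2)
ss_tot = np.sum((y - y.mean())**2)
r2 = 1.0 - ss_res / ss_tot

# PRESS residuals are e_i / (1 - h_ii), with h_ii the hat-matrix diagonal.
H = M @ np.linalg.solve(M.T @ M, M.T)
press = np.sum((resid / (1.0 - np.diag(H)))**2)

print(np.round(beta_hat, 2), round(r2, 4), round(press, 3))
```

PRESS is never smaller than the residual sum of squares, so a PRESS value far above it is one sign that individual runs are driving the fit.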
Design of experiments in RSM
The design of experiments (DoE) is an approach for determining the relationship between processing parameters and process output. DoE seeks to identify the design variables with a significant influence for further investigation.11 The most prevalent first-order designs are the 2ᵏ factorial, simplex and Plackett–Burman designs, while the most prevalent second-order designs are the central composite, 3ᵏ factorial, and Box–Behnken designs.9
The 2ᵏ factorial design
In a 2ᵏ factorial design, each of the k variables is evaluated at two levels, coded as -1 and +1, which correspond to the lower and higher levels of each variable. These designs, known as screening designs, are frequently employed when the main effects and interactions are assumed to be roughly linear in the interval of interest.12
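A two-level full factorial design in coded units can be generated in a few lines. The sketch below is illustrative only; the function name is an assumption and the design is shown for k = 3.

```python
from itertools import product
import numpy as np

def full_factorial_2k(k):
    """All 2**k combinations of the coded levels -1 and +1 for k factors."""
    return np.array(list(product([-1, 1], repeat=k)))

design = full_factorial_2k(3)
print(design.shape)   # (8, 3)
print(design)

# Orthogonality check: off-diagonal entries of X'X are zero.
print(design.T @ design)
```

Coded levels can be mapped back to natural units with natural = centre + half_range * coded for each factor.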
The Plackett–Burman design
The Plackett–Burman design, like the 2ᵏ design, allows two levels for each of the k control variables but requires far fewer experimental runs, especially if k is large.7 As a result, it costs less than the 2ᵏ design. These designs are used to investigate up to n - 1 variables in n experimental runs, where n is a multiple of 4, and are recommended for studies with more than seven factors. Because the number of design points equals the number of parameters estimated in the model, these designs are regarded as saturated.9
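As an illustration, the 12-run Plackett–Burman design can be built by cyclically shifting the commonly tabulated generator row and appending a run with every factor at its low level. This is a sketch under that assumption; the function name is hypothetical, and the final check confirms that the columns are balanced and mutually orthogonal.

```python
import numpy as np

def plackett_burman_12():
    """12-run Plackett-Burman design for up to 11 two-level factors."""
    generator = np.array([1, 1, -1, 1, 1, 1, -1, -1, -1, 1, -1])
    rows = [np.roll(generator, shift) for shift in range(11)]  # cyclic shifts
    rows.append(-np.ones(11, dtype=int))                       # all-low run
    return np.array(rows)

pb = plackett_burman_12()
print(pb.shape)
# True when every pair of columns is orthogonal (X'X = 12 * I).
print(np.array_equal(pb.T @ pb, 12 * np.eye(11, dtype=int)))
```

With 11 factors plus an intercept there are 12 parameters for 12 runs, which is what makes the design saturated.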
The simplex design
With n = k + 1 points, the simplex design is also a saturated design.9, 13 Its design points lie at the vertices of a k-dimensional regular-sided figure, characterized by the property that the angle θ that any two points make with the design centre satisfies cos θ = -1/k.9
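The angle property can be checked numerically. The sketch below uses one possible construction of a regular simplex, assumed here for illustration: it centres the k + 1 unit vectors of a (k + 1)-dimensional space and expresses them in k coordinates. It then prints the cosines between all pairs of design points.

```python
import numpy as np

def simplex_design(k):
    """k + 1 points at the vertices of a regular simplex in k dimensions."""
    # Centre the k + 1 unit vectors so they lie in a k-dimensional subspace.
    P = np.eye(k + 1) - 1.0 / (k + 1)
    # Express the points in an orthonormal basis of that subspace.
    U, s, _ = np.linalg.svd(P)
    X = U[:, :k] * s[:k]
    # Scale every point to unit distance from the design centre.
    return X / np.linalg.norm(X, axis=1, keepdims=True)

k = 3
X = simplex_design(k)
cosines = X @ X.T          # off-diagonal entries are cos(theta)
print(X.shape)
print(np.round(cosines, 3))
```

For k = 3 the off-diagonal cosines equal -1/3, in line with cos θ = -1/k.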
The 3ᵏ factorial design
The 3ᵏ factorial design consists of all combinations of the levels of the k control variables, each of which has three levels. For such a design, the number of experimental runs is 3ᵏ, which can be extremely large for large k. The cost of conducting such an experiment can be reduced by using fractions of a 3ᵏ design.9, 14
The central composite design (CCD)
The most commonly preferred design is the Box–Wilson central composite design (CCD).5, 15 A CCD consists of centre points located at the centre of the design space, factorial points with factor levels coded as -1 and +1, and axial points arranged symmetrically on the coordinate axes with respect to the centre point.15 Central composite designs are advantageous in sequential experimentation since they frequently allow one to build on earlier factorial experiments by adding axial and centre points.
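The three groups of points in a CCD can be assembled as in the following sketch. The function name, the default of six centre points, and the default axial distance (the fourth root of the number of factorial points, which gives a rotatable design) are assumptions made for illustration.

```python
from itertools import product
import numpy as np

def central_composite(k, n_center=6, alpha=None):
    """Central composite design in coded units.

    alpha defaults to (2**k) ** 0.25, the axial distance that makes a
    design with a full 2**k factorial portion rotatable.
    """
    if alpha is None:
        alpha = (2 ** k) ** 0.25
    factorial = np.array(list(product([-1.0, 1.0], repeat=k)))
    axial = np.vstack([alpha * np.eye(k), -alpha * np.eye(k)])
    center = np.zeros((n_center, k))
    return np.vstack([factorial, axial, center])

ccd = central_composite(3)
print(ccd.shape)                     # (20, 3): 8 factorial + 6 axial + 6 centre runs
print(round((2 ** 3) ** 0.25, 3))    # 1.682
```

With three factors and six centre points the design has 20 runs. Starting from an existing 2ᵏ factorial, only the axial and centre runs need to be added, which is the basis of the sequential use described above.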
The Box–Behnken design (BBD)
Box and Behnken proposed three-level designs, each made up of a particular subset of the factorial combinations from the 3ᵏ factorial design.5, 9, 14 The effects of the various design parameters can be analyzed sequentially with these designs if the other factors are kept constant while the first factors are examined. The Box–Behnken design is popular in industrial research because it is a low-cost design that requires only three coded levels (-1, 0, +1) for every factor.5, 14
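For a small number of factors, a Box–Behnken design can be sketched by running a 2² factorial on every pair of factors while the remaining factors are held at the centre level. The function below is an illustrative assumption restricted to k = 3, 4 or 5, where this pairwise construction applies; the default of three centre points is also an assumption.

```python
from itertools import combinations, product
import numpy as np

def box_behnken(k, n_center=3):
    """Box-Behnken design built from 2**2 factorials on every pair of factors.

    This pairwise construction applies for k = 3, 4, 5; designs for
    larger k use other arrangements that are not covered here.
    """
    if k not in (3, 4, 5):
        raise ValueError("pairwise construction shown only for k = 3, 4, 5")
    runs = []
    for i, j in combinations(range(k), 2):
        for a, b in product([-1, 1], repeat=2):
            row = [0] * k
            row[i], row[j] = a, b
            runs.append(row)
    runs.extend([[0] * k] * n_center)   # centre points
    return np.array(runs)

bbd = box_behnken(3)
print(bbd.shape)                            # (15, 3)
print(sorted(set(bbd.ravel().tolist())))    # [-1, 0, 1]
```

No run places all factors at their extreme levels at once, which can be useful when such corner combinations are expensive or infeasible.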
Choice of a response surface design
Permits sequential assembly: Designs are often used sequentially, so a design should allow runs to be performed in sequence and the experimenter to move through the space of the variables.16
Robustness to extreme observations and violations of normal theory assumptions: A design is regarded as robust if it helps reduce the impact of non-ideal conditions on the data analysis.4, 5, 6
The ability to conduct experiments in blocks: In most experiments, the available experimental units are grouped into blocks with more or less identical characteristics so that the block effect can be removed from the experimental error.4
Improved detectability of lack of fit: These are designs that provide a certain level of sensitivity to the potential inadequacy of the fitted model.5, 16
Guarantees rotatability: A design is rotatable if the prediction variance Var[ŷ(x)] is constant at all points on the surface of a hypersphere centred at the origin.4, 5, 9
Higher-order design extensibility: These designs frequently include a lack-of-fit check that helps determine when a higher-order model is required. Some designs do not meet the requirements for higher-order extensibility.4, 5
Requires only a few experimental runs: A small number of runs allows experiments to be conducted in a cost-effective and time-effective manner while providing the maximum amount of information with the minimum effort.4, 5, 6
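The rotatability criterion in the list above can be checked numerically. The sketch below assumes a two-factor central composite design with axial distance √2 and three centre points, fits the full second-order model, and evaluates the scaled prediction variance at points on a circle of radius 1 around the design centre. The function names are illustrative.

```python
import numpy as np

def model_matrix(X):
    """Second-order model columns for two factors: 1, x1, x2, x1^2, x2^2, x1*x2."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([np.ones(len(X)), x1, x2, x1**2, x2**2, x1 * x2])

def scaled_prediction_variance(design, points):
    """N * x'(X'X)^-1 x at each point, i.e. N * Var[y_hat(x)] / sigma^2."""
    M = model_matrix(design)
    XtX_inv = np.linalg.inv(M.T @ M)
    F = model_matrix(points)
    return len(design) * np.einsum("ij,jk,ik->i", F, XtX_inv, F)

a = np.sqrt(2.0)   # rotatable axial distance for k = 2: (2**2) ** 0.25
ccd = np.array([[-1, -1], [1, -1], [-1, 1], [1, 1],
                [-a, 0], [a, 0], [0, -a], [0, a],
                [0, 0], [0, 0], [0, 0]], dtype=float)

# Points on a circle of radius 1 around the design centre.
angles = np.linspace(0, 2 * np.pi, 12, endpoint=False)
circle = np.column_stack([np.cos(angles), np.sin(angles)])
spv = scaled_prediction_variance(ccd, circle)
print(np.round(spv, 4))   # the same value at every angle
```

Moving the axial points to distance 1 (a face-centred design) makes the printed values depend on the angle, so that design is not rotatable.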
RSM tools and components
Design of experiments (DoE): One of the essential aspects of RSM. DoE aims to select the most appropriate points where the response should be well examined.17
Experimental domain: The experimental field bounded by the upper and lower limits of the independent variables.14, 18
Runs: These are a series of tests that form an experiment.5, 19
Variables: Quantities that can take a variety of values in a particular problem.19
Independent variables (factors): These are input variables that can be changed independently of each other.14
Dependent variables (responses): These are output variables that are influenced by several independent variables.14
Design points: The values of the factors at which the experiment is conducted.20
Experimental design: This is the specific system of experiments defined by a matrix created with the different level combinations of the independent variables.14, 16
Design space: The range of values in which the factors vary.
Residual: The difference between the calculated and observed results for a given set of conditions. A mathematical model that is well fitted to the experimental data must have low residual values.14, 18
Levels of a variable: These are different values of a variable at which the experiments must be carried out.14
Controlled experiment: A study in which treatments are imposed on experimental units in order to observe a response.5
Effect: The change in the response produced by modifying a factor's values. The relationship between various factors and levels can be described in this way.19
Interaction: Like an effect, an interaction describes the joint influence of two or more variables (factors) on a response.19
DoE matrix: A collection of coded settings of the process variables, at the levels whose effect on the output is of interest, arranged in matrix form.9
Response surface: Represents the mean response at any given level of the factors in the design space.5
Center point: Used to measure process stability/variability and to check for curvature of the response surface.19
Contour plot: Geometric illustration of a response obtained by plotting one independent variable against another, while holding the magnitude of response and other variables constant.5
Some widely used software for the design of experiments
Design-Expert: Design-Expert is a statistical software package from Stat-Ease Inc. dedicated to the design of experiments (DoE).7, 17, 21
ECHIP: ECHIP is a state-of-the-art software package that offers a user-friendly interface for conducting statistically planned experiments.22
Nemrodw: This software provides a wide choice of experiment matrices to perfectly satisfy your needs while taking into account your experimental constraints, both technical and financial.23
Minitab: Minitab is a programme for statistical analysis. It can be used for both learning and conducting statistical research.24
Systat: Systat provides an unparalleled selection of scientific and technical graphing possibilities. Your results will be more meaningful if you create individual graphs.25
GraphPad Prism: GraphPad Prism is a commercially available scientific 2D graphing and statistics software for both Windows and Macintosh systems.26
MultiSimplex: MultiSimplex is Windows-based software for the sequential design of experiments and optimization. It is mainly used to improve the quality of products, the productivity of processes and the performance of analytical instruments.27
SAS: SAS is a command-driven software package for statistical analysis and data visualization.28 It is available for Windows as well as for UNIX and Linux operating systems.
Applications of RSM
Response surface methodology is used as a statistical tool for optimization.3
It is efficient in improving existing studies and products because RSM yields the maximum amount of information with the minimum effort.29
RSM is important in designing, developing, and examining specific scientific studies and products.29
RSM is used to figure out the topography of the response surface and to determine the region with the best response.6
The RSM can be used with various large-scale simulation systems, including Bio War, ORA, Vista, Construct, and DyNet.9
Advantages and disadvantages of RSM
Advantages of RSM
A relatively small number of trials can yield a tremendous amount of knowledge in a cost effective manner.
Can be used to determine the interaction effects of the independent input parameters.
The data-driven model equation can be utilized to illustrate the different combinations of independent input factors that affect the outcome of a process/product.
Both experimental and numerical responses can be approximated using RSM.30
Helps maintain a high level of efficiency with respect to cost, time, and other constraints.
Compared with the Taguchi and one-factor-at-a-time methods, the RSM technique appears to be more promising in mathematical modeling for forecasting responses.5
Disadvantages of RSM
It cannot be utilized to explain why an interaction has developed.31
This method necessitates the selection of appropriate operating parameter ranges, and the optimization result is limited to specific scales.
RSM is not good at predicting outcomes for a system operated outside the range covered by a particular study.5
RSM cannot operate with larger models.6
The more responses that have to be optimized at the same time, the more likely it is that the optimization results will be poor.32
Discussion
Finding the condition that gives the best output for a system is the primary purpose of optimization. Evaluating the validity of the optimum conditions estimated through RSM is a crucial part of the RSM approach. The optimization of variables comprises seven main steps:4, 5, 6, 14 selection of responses; selection of variables and assignment of codes to them; development of the experimental design; regression analysis; formation of a quadratic polynomial (response model development); creation of a 2D contour plot or 3D surface plot of the examined response surface; and validation of the optimum operating conditions.
RSM as a tool for optimization: Examples taken from research manuscripts
Example 1
This experiment aimed to develop and optimize bisoprolol fumarate matrix tablets for sustained release application using response surface methodology based on a 2³ factorial design.
The study looked at the impact of the independent factors (calcium alginate, Carbopol 943 and HPMC K4M) on cumulative drug release after 6 hours (R6h, %) and hardness (kg/cm²) as optimization response parameters.33 The 2³ factorial design proposed a total of 8 trial formulations of bisoprolol fumarate matrix tablets for the three independent variables, and the Design-Expert 8.0.6.1 software generated appropriate polynomial model equations incorporating the individual main factors and interaction factors.33 The following mathematical model equation, involving the independent variables and their interactions, was used to model the impact of the independent variables on the responses measured in the 2³ factorial design:33
Y = b₀ + b₁A + b₂B + b₃C + b₄AB + b₅AC + b₆BC
Where Y is the dependent variable, b₀ is the intercept, b₁, b₂, b₃, b₄, b₅, and b₆ are regression coefficients, A, B, and C are independent variables, and AB, AC, and BC are interactions between variables. The significance of the model and of the individual response parameters was estimated using one-way ANOVA. Different kinetic models, such as zero-order, first-order, Higuchi, and Korsmeyer-Peppas, were used to evaluate the in vitro drug release data from the various bisoprolol fumarate matrix tablets.33
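To make the structure of the model equation above concrete, the following sketch builds its model matrix for a 2³ design in coded units and estimates the coefficients by least squares. The response values are synthetic and generated only for illustration; they are not the data reported in the bisoprolol study, and the coded -1/+1 levels are an assumption.

```python
from itertools import product
import numpy as np

# 2^3 design in coded units for factors A, B, C (8 trial formulations).
design = np.array(list(product([-1.0, 1.0], repeat=3)))
A, B, C = design.T

# Model matrix for Y = b0 + b1*A + b2*B + b3*C + b4*AB + b5*AC + b6*BC.
X = np.column_stack([np.ones(8), A, B, C, A * B, A * C, B * C])

# Synthetic responses for illustration only (not data from the cited study).
rng = np.random.default_rng(1)
b_true = np.array([45.0, -5.0, -3.0, -2.0, 1.0, 0.5, -0.5])
y = X @ b_true + rng.normal(scale=0.3, size=8)

# Least squares fit; the columns are orthogonal, so b = X'y / 8 as well.
b_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(b_hat, 2))
print(np.allclose(b_hat, X.T @ y / 8))   # True
```

With 8 runs and 7 coefficients, a single degree of freedom remains for estimating error, which is worth keeping in mind when interpreting ANOVA results from such a design.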
To evaluate the optimization capability of the models generated from the results of the 2³ factorial design, optimized bisoprolol matrix tablets were prepared by the direct compression method using one of the optimal process variable settings proposed by the experimental design. A = 15.28 mg, B = 32.12 mg, and C = 30.31 mg were chosen as the optimal process variable values for the formulation of the optimized bisoprolol matrix tablets.33 Numerical analysis based on the desirability criterion was performed with the Design-Expert 8.0.6.1 software to obtain the optimal response values, which led to the development of the optimized bisoprolol fumarate matrix tablets (F-O). These tablets showed an R6h of 41.61 ± 1.97% and a hardness of 4.65 ± 0.07 kg/cm² with small error values (less than 5), indicating that the mathematical models obtained from the 2³ factorial design were well fitted.33
Example 2
The purpose of this study was to use permeate and its lactose as a sugar substitute, as well as to incorporate the beneficial permeate compounds into an optimal orange juice formulation.34 Milk permeate, a waste product of the dairy industry, was used in the production of orange juice as a less expensive substitute for water and sugar.34 The heated and unheated permeate samples were incubated with the glycosidase enzyme at three different temperatures (35, 40, and 45°C), for three different times (60, 150, and 240 min), and at three different enzyme levels (0%, 0.1%, and 0.2%). The MilkoScan analyzer was used to determine the degree of hydrolysis.34 The orange juice was then optimized using a mixture of sugar and hydrolyzed permeate with a specific Brix using an RSM statistical design.34 After 8 weeks of storage, the physicochemical properties were measured and a sensory evaluation was carried out.34 Response surface methodology was used to investigate the effects of three quantitative factors and one nominal factor on permeate lactose hydrolysis, as well as the effects of storage time and permeate amount in an orange beverage prepared using a treatment from the first stage. The resulting data were subjected to analysis of variance and regression analysis, with the Fisher distribution used to determine significant effects.34
Example 3
In this study, RSM was used to optimize the arginine deiminase (ADI) production medium for Enterococcus faecium sp. GR7.7 To improve enzyme activity and cell density in the lactic acid bacteria (LAB) isolate E. faecium sp. GR7, the fermentation medium and environmental conditions were optimized using independent experiments and RSM (central composite design) with the trial version of Design-Expert 8.0.2 statistical software (Stat-Ease Inc., Minneapolis, MN, USA).7 A polynomial model derived by multiple regression was studied at five different levels of the factors tryptone, lactose, and arginine, together with four constant variables, so that interactions among these variables at different levels could be studied for two responses, namely ADI activity and biomass.7 In the CCD, a total of 20 experiments were used to estimate the curvature and interaction effects of the selected variables; the significance of the obtained model was checked using the F-test, and the goodness of fit using the multiple correlation and determination coefficients.7 All design matrices were generated and analyzed using Design-Expert 8.0.2, and the results were displayed as 2D contour plots to illustrate the relationships between experimental and predicted values. In this study, the bioprocess was optimized for future scale-up of the ADI production process in E. faecium sp. GR7.7
Conclusion
Previously, processes and products were optimized by examining the influence of changes in one parameter on a response while the others were kept constant. The main disadvantage of this method is that it does not consider the interactive effects among the variables, which are crucial for finding the output-input relationship. Nor can this method explain the full effect of the factors on the response. In addition, this strategy increases the number of experiments required to complete the research, resulting in higher costs and longer times. A sequential RSM approach involves executing the relevant experimental design, estimating the coefficients in the relevant response surface equation, verifying the adequacy of the equation's fit, and examining the response surface to identify and evaluate the regions of interest. The applications of RSM described above, taken from three different research manuscripts, briefly illustrate the role played by RSM in different contexts. In the optimization of bisoprolol fumarate matrix tablets for sustained drug release (SDR), the applicability of factorial design to the development of pharmaceutical formulations and the link between the independent variables and the responses were well examined.33 In the second study, RSM was used to study the effects of three quantitative factors and one nominal factor on permeate lactose hydrolysis and to evaluate the effect of storage time and permeate amount in the orange beverage prepared using selected treatments.34 In the third, a CCD statistical strategy was successfully used to determine the optimum values of the significant factors, resulting in a 15-fold increase in ADI production in RSM-optimized media over basal media in E. faecium sp. GR7.7 All these studies demonstrate the importance of RSM for improving, developing and optimizing processes and products in science and industry.
Future Directions
RSM is widely used as an alternative methodology for reducing variance and improving processes.35 While computer-generated design technology has helped those interested in creating RSM designs, adjustments are required to consider design robustness rather than design optimality.36 Box and Wilson developed and characterized response surface methodology as an experimental strategy, and it has been used successfully in various settings, especially in the industrial sciences and chemical engineering. One of the most promising future directions for simulation-oriented RSM research and development appears to be the integration of induced-correlation methods with the method of control variates.36 RSM may also be used as a labour-saving method for model assessment and validation, particularly for the modern computational multi-agent large-scale social network platforms that are increasingly being used to model and simulate complex social networks.6 In its broadest sense, RSM has become the core of industrial experimentation in a short period.29 RSM is expanding into domains that require generalized linear models (GLMs), where users may find it difficult or impossible to apply optimal RSM designs.34 Professionals in the biological, biomedical and rapidly growing biopharmaceutical fields are increasingly turning to response surface ideas. The number and variety of practitioners interested in RSM will keep increasing because of its wide range of applications and its advantages over other methods.