The statistical procedure of regression analysis studies the relationship between two or more numerical variables. Researchers often use the method in its simplest form, analyzing the relationship between one independent variable and one dependent variable. And while this process is important for observing past and current trends, the patterns found in existing data can also be used to make meaningful predictions.
This last aspect of regression analysis, predicting future relationships, makes the technique especially important for market research. Market researchers often assign their intended outcome, such as sales, as the dependent variable and treat the means by which they pursue that outcome as the independent variable. Knowing the relationship between these two variables can inform decisions about whether to invest further in the independent variable tested, which can lead to increased revenue.
Regression analysis is also essential to many fields of social science. The flexible model gives scientists a reliable way to test their hypotheses against empirical data and to present their research visually. As with market research, the regression line can also be used to make predictions in social science.
These two examples alone show that regression analysis has varied applications. But even when different variables are used for different purposes, the procedure should generally follow the same methodology. A new standard, ASTM E3080-17 – Standard Practice for Regression Analysis, has been published to define the concepts used in this method.
The method covered in ASTM E3080-17 is basic straight-line regression, in which the dependent variable (Y), which is always treated as a random variable, is regressed on the independent variable (X). The standard gives the following example of a complete regression plot:
This graph shows the effect of weld diameter on shear strength. Using the data behind the calculations that precede the scatter plot, it plots 95 percent confidence interval (CI) and prediction interval (PI) limits together.
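CI and PI limits like those in the plot can be computed from the textbook formulas for simple linear regression. The sketch below is a hypothetical illustration, not the standard's own procedure: the function names `fit_line` and `interval_limits` are invented here, and the caller must supply the two-sided t critical value with n - 2 degrees of freedom, because Python's standard library has no t distribution. The CI bounds the mean response at a chosen x0, while the wider PI bounds a single new observation at x0.

```python
import math


def fit_line(x, y):
    """Least-squares intercept a, slope b, and residual SD s for Y = a + bX."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))  # estimate of sigma with n - 2 degrees of freedom
    return a, b, s, x_bar, sxx


def interval_limits(x, y, x0, t_crit):
    """Return (y_hat, CI limits, PI limits) at x0.

    t_crit is the two-sided t critical value with n - 2 degrees of freedom,
    looked up by the caller (assumption: e.g., about 4.303 for 95 percent, n = 4).
    """
    n = len(x)
    a, b, s, x_bar, sxx = fit_line(x, y)
    y_hat = a + b * x0
    lever = 1 / n + (x0 - x_bar) ** 2 / sxx
    ci_half = t_crit * s * math.sqrt(lever)      # mean response at x0
    pi_half = t_crit * s * math.sqrt(1 + lever)  # single new observation at x0
    ci = (y_hat - ci_half, y_hat + ci_half)
    pi = (y_hat - pi_half, y_hat + pi_half)
    return y_hat, ci, pi
```

For example, with the made-up data x = [1, 2, 3, 4], y = [1, 3, 2, 4] and t ≈ 4.303 (95 percent, 2 degrees of freedom), `interval_limits(x, y, 2.5, 4.303)` centers both intervals on ŷ = 2.5, with the PI wider than the CI. Both intervals widen as x0 moves away from the mean of X.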
As detailed in ASTM E3080-17, the regression analysis practitioner seeking the relationship between X and Y most commonly uses a simple linear model of the form Y = α + βX + ε. Here, α and β are model coefficients, and ε is a random error term representing variation in the observed value of Y at a given X; it is assumed to have a mean of 0 and some unknown standard deviation σ.
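To make the roles of α, β, and σ concrete, the following sketch generates synthetic observations from this model. It is an illustration only: the function name `simulate` is hypothetical, and it assumes normally distributed errors, which is a stronger assumption than the mean-0, standard-deviation-σ condition stated above.

```python
import random


def simulate(alpha, beta, sigma, xs, seed=None):
    """Draw one Y for each X from Y = alpha + beta*X + eps.

    Assumption: eps is normal with mean 0 and standard deviation sigma.
    The regression model itself only requires mean 0 and some unknown sigma.
    """
    rng = random.Random(seed)  # seeded generator for reproducible draws
    return [alpha + beta * x + rng.gauss(0.0, sigma) for x in xs]
```

Setting `sigma` to 0 puts every point exactly on the line α + βX; larger values scatter the points around it, and that scatter is what the regression later estimates as σ.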
The goal of the regression problem is to determine estimates of the coefficients α and β that “best” fit the data and allow estimation of σ. Another important concept in this methodology is the coefficient of determination (r²), which indicates the proportion of the variation in Y that is explained by its linear relationship with X.
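For a straight-line fit, r² can be computed directly from the centered sums of squares and cross-products of the data, because in simple linear regression it equals the square of Pearson's correlation coefficient between X and Y. The helper below is a hypothetical sketch based on that textbook identity, not code from the standard.

```python
def r_squared(x, y):
    """Coefficient of determination for a straight-line fit of y on x.

    In simple linear regression this equals the square of Pearson's r:
    r^2 = Sxy^2 / (Sxx * Syy).
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)   # spread of X
    syy = sum((yi - y_bar) ** 2 for yi in y)   # total variation in Y
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy ** 2 / (sxx * syy)
```

With the made-up data x = [1, 2, 3, 4], y = [1, 3, 2, 4], `r_squared` returns 0.64, meaning the line accounts for about 64 percent of the variation in Y; perfectly linear data give 1.0. On Python 3.10 or later, `statistics.correlation(x, y) ** 2` should give the same value.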
The specific methodology considered in ASTM E3080-17 is the method of least squares, which estimates the model parameters α and β by choosing the line that minimizes the sum of the squared vertical deviations of the observed Y values from the line. The best-fitting line in this analysis is denoted Y = a + bX, where a and b are the estimates of α and β, respectively.
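The least-squares estimates have a closed form: b = Sxy / Sxx and a = ȳ - b·x̄, where Sxx and Sxy are the centered sums of squares and cross-products. The sketch below computes them, along with s = √(SSE / (n - 2)), the usual estimate of σ from the residuals. The function name `least_squares` is hypothetical, and the code is a plain-Python illustration rather than the standard's worked procedure.

```python
import math


def least_squares(x, y):
    """Estimate a, b in Y = a + bX by least squares, plus s (estimate of sigma)."""
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 points to estimate sigma")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx          # slope: estimate of beta
    a = y_bar - b * x_bar  # intercept: estimate of alpha; line passes through (x_bar, y_bar)
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    sse = sum(e ** 2 for e in residuals)
    s = math.sqrt(sse / (n - 2))  # a and b use up two degrees of freedom
    return a, b, s
```

For the made-up data x = [1, 2, 3, 4], y = [1, 3, 2, 4], it returns a = 0.5, b = 0.8, and s ≈ 0.949. On Python 3.10 or later, `statistics.linear_regression(x, y)` returns the same slope and intercept and can serve as a cross-check.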
The standard also provides further guidance on terminology, concepts, and execution of the regression analysis procedure, walking the user through the complete process.
ASTM E3080-17 – Standard Practice for Regression Analysis is now available on the ANSI Webstore.