Statistical Approach To Performance Comparison Of Predictive Algorithms: Application In Resistance Spot Welding
Resistance Spot Welding (RSW) is the dominant process for fabricating body closures and structural components in the automotive industry. RSW is a complex process with inconsistent data and highly non-linear relationships among process parameters. Several machine learning algorithms have been used to construct predictive models that assess the weldability condition of RSW joints. However, to the best of our knowledge, a comprehensive analysis comparing the performance of RSW weldability predictive models is lacking. In this investigation, a statistical framework is developed to assess the performance superiority (high accuracy and low variability) of several machine learning algorithms in RSW applications. First, machine learning algorithms popular in the RSW literature are selected and pooled. As our contribution to this pool, a state-of-the-art Deep Neural Net (DNN) algorithm is added. Second, using a ten-fold cross-validation scheme, predictive models are constructed from Advanced High Strength Steel (AHSS) welding data provided by a major automotive original equipment manufacturer. Third, using Monte Carlo statistical simulation, the original and bootstrapped test sets are applied to the pool of constructed models to generate the sampling distribution of the estimates, i.e., the accuracy measure for each algorithm. Finally, statistical comparative experiments are used to determine the superior predictive algorithm(s); the results indicate that the DNN model outperforms the other models. Our study shows that the DNN model improves accuracy and variability by approximately 19% and 7% on average, respectively. To extend the research to big data scenarios, DATAVIEW, a big data infrastructure, is used in place of a traditional data analytics framework built on R. DATAVIEW and R are compared through full-factorial statistical experiments.
Our research indicates that DATAVIEW significantly outperforms R in terms of computational cost and performance efficiency.
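The bootstrapping step described above, in which resampled test sets yield a sampling distribution of accuracy for each algorithm, can be illustrated with a minimal Python sketch. This is not the study's implementation: the function names (`bootstrap_accuracy`, `summarize`), the toy labels, and the two hypothetical prediction vectors are all assumptions made for illustration, and the sketch does not train any model or use the AHSS data.

```python
import random
import statistics


def bootstrap_accuracy(y_true, y_pred, n_resamples=1000, seed=0):
    """Return the accuracy on each bootstrapped test set.

    Each resample draws len(y_true) (label, prediction) pairs
    with replacement from the original test set.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    rng = random.Random(seed)
    n = len(y_true)
    # 1 where the prediction matches the label, 0 otherwise
    correct = [int(t == p) for t, p in zip(y_true, y_pred)]
    accuracies = []
    for _ in range(n_resamples):
        sample = [correct[rng.randrange(n)] for _ in range(n)]
        accuracies.append(sum(sample) / n)
    return accuracies


def summarize(accuracies):
    """Mean (accuracy level) and standard deviation (variability)."""
    return statistics.mean(accuracies), statistics.stdev(accuracies)


# Hypothetical toy test set and predictions from two hypothetical models
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
pred_a = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1]  # 9 of 10 correct
pred_b = [1, 1, 1, 0, 0, 1, 0, 1, 1, 0]  # 6 of 10 correct

dist_a = bootstrap_accuracy(y_true, pred_a)
dist_b = bootstrap_accuracy(y_true, pred_b)
mean_a, sd_a = summarize(dist_a)
mean_b, sd_b = summarize(dist_b)
print(f"model A: mean={mean_a:.3f}, sd={sd_a:.3f}")
print(f"model B: mean={mean_b:.3f}, sd={sd_b:.3f}")
```

The two resulting distributions are the kind of input a comparative statistical test would take. A higher mean corresponds to better accuracy and a smaller standard deviation to lower variability, the two criteria the abstract uses to define performance superiority.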