Multiple regression analysis of copper prices

Fundamentals: Copper prices are determined by many fundamental factors, including the dollar index, copper consumption, the housing index, industrial production, world copper stocks, copper ore production and copper imports by different countries. In the past few years, China and the USA have been the largest importers of copper in the world, and import quantities also affect copper prices. This study estimates the impact of these variables on copper prices using multiple regression analysis.
In this regression analysis, copper price is the dependent variable, and the dollar index (DX), China's copper imports, USA copper imports, total copper stock and world copper consumption are the independent variables. The variables were selected on the basis of a correlation analysis: the candidates that are least correlated with one another were taken into the analysis as independent variables.
The process flow for the analysis is as follows:

Data partition: The data set is divided into three partitions: 80% training, 10% testing and 10% validation. After partitioning, an Insight analysis of the data is carried out in SAS Enterprise Miner. The data is checked against the assumptions of linear regression, which involves testing for normality, detecting outliers and transforming variables, before it is passed to the regression analysis.
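The 80/10/10 split described above can be sketched in a few lines of Python. This is only an illustration: the function name `partition`, the random shuffle and the seed are assumptions, since the source does not say how Enterprise Miner assigned rows to partitions. The count of 127 rows is the number of observations shown in the White test output, and an 80% share of it gives the 102 training observations reported by the DMREG procedure.

```python
import random

def partition(rows, fractions=(0.8, 0.1, 0.1), seed=0):
    """Shuffle rows and split them into training, testing and validation lists.

    Random assignment is an assumption; the source does not state the method.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = round(n * fractions[0])
    n_test = round(n * fractions[1])
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    validate = shuffled[n_train + n_test:]  # whatever remains
    return train, test, validate

# Row indices stand in for the actual observations.
train, test, validate = partition(range(127))
print(len(train), len(test), len(validate))  # prints: 102 13 12
```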

The table above shows the transformations applied to the variables used in the analysis. After transformation, the skewness of most variables decreases considerably. Once the variables are transformed, an outlier filter is run, which removes the outliers that the distribution analysis revealed in some of the variables.
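To illustrate how a transformation can reduce skewness, the sketch below computes the moment-based skewness of a series before and after a log transform. Both the data and the choice of the log transform are assumptions made for illustration: the source only says that variables were transformed to maximize normality and does not state which transformation was chosen for each variable.

```python
import math

def skewness(values):
    """Moment-based skewness: third central moment over m2 ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical right-skewed series, not taken from the copper data set.
raw = [1.0, 1.2, 1.5, 2.0, 2.4, 3.1, 4.0, 6.5, 9.8, 20.0]
logged = [math.log(v) for v in raw]

print("skewness before:", round(skewness(raw), 3))
print("skewness after: ", round(skewness(logged), 3))
```

On this made-up series the log transform pulls the long right tail in, so the skewness moves closer to zero, which is the effect the transformation table reports for most variables.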
The regression analysis gave the following results:

In the regression results, the dollar index has the largest impact on copper prices, followed by China's copper imports. Total copper consumption does not pass the t-test here and therefore cannot be classified as a variable that affects copper prices. Fundamentally, too, Chinese imports and the dollar index already discount the impact of total consumption, since the USA and China are the largest users of copper.
Time also stands out as a variable with some impact on copper prices. Total copper stock has the least impact on prices. Overall, the regression model explains close to 90% of the variation in copper prices.
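The figure of roughly 90% can be checked against the sums of squares in the Analysis of Variance table of the DMREG output reproduced further down. The short calculation below recomputes R-square, adjusted R-square and the F value from those reported numbers; the variable names are illustrative only.

```python
# Values copied from the Analysis of Variance table in the DMREG output.
ss_model = 35.966193
ss_error = 3.992232
df_model = 4
df_error = 97

ss_total = ss_model + ss_error      # corrected total sum of squares
n = df_model + df_error + 1         # number of observations

r_squared = ss_model / ss_total
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / df_error
f_value = (ss_model / df_model) / (ss_error / df_error)

print("observations :", n)
print("R-square     : %.4f" % r_squared)
print("Adj R-square : %.4f" % adj_r_squared)
print("F value      : %.2f" % f_value)
```

The recomputed values agree with the reported R-Square of 0.9001, Adj R-Sq of 0.8960 and F Value of 218.47.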

The SAS System 23:51 Friday, October 31, 2008 7

The DMREG Procedure

Model Information

Training Data Set _EMSPDE.SP_DGM00001.DATA
DMDB Catalog EMPROJ.SP_DGM00001
Target Variable PRIC_9VG (Prices: Maximize normality)
Target Measurement Level Interval
Error Normal
Link Function Identity
Number of Model Parameters 5
Number of Observations 102

Analysis of Variance

                               Sum of
Source              DF        Squares    Mean Square    F Value    Pr > F

Model                4      35.966193       8.991548     218.47    <.0001
Error               97       3.992232       0.041157
Corrected Total    101      39.958425

Model Fit Statistics

R-Square     0.9001     Adj R-Sq     0.8960
AIC       -320.5435     BIC       -318.0333
SBC       -307.4186     C(p)         5.0000

Analysis of Maximum Likelihood Estimates

                                 Standard
Parameter    DF     Estimate        Error    t Value    Pr > |t|

Intercept     1     -12.2845       6.0880      -2.02      0.0464
CHIN_SE3      1     -0.00175     0.000234      -7.48      <.0001
DX_3RMO7      1      -1.9050       0.2611      -7.30      <.0001
WRCO_904      1       2.0530       0.4243       4.84      <.0001
Time          1      0.00744      0.00143       5.21      <.0001

To remove world consumption from the list of independent variables, the regression analysis was run again with four variables, and this run was selected as the final model. The F-test gives a value of 218.47 with a p-value below 0.0001, which rejects the null hypothesis that the model is not a good fit, that is, that the independent variables jointly explain none of the variation in prices. Hence we accept the alternative hypothesis that the model is a good fit.

The table above gives the results for the different partitions of the data. The results are quite close across partitions for average standard error and maximum absolute error. It is therefore concluded that China's imports and the dollar index are the top two variables affecting copper prices. They are followed by the time factor and world total stocks of copper.

Jarque-Bera test for normality of residuals: The JB test is applied to the residuals to check the normality assumption. Normally distributed residuals are an important assumption of the model. The JB test does not reject the null hypothesis that the residuals are normally distributed, so the assumption holds.

White heteroskedasticity test for residuals: The White test checks for heteroskedasticity, that is, non-constant variance in the residual terms. If heteroskedasticity is present, forecasting with the model becomes a problem because the variance of the error terms keeps changing. The error terms should therefore stay within a stable range. The null hypothesis of the test is homoskedasticity, and a high p-value with a small F-statistic means the null hypothesis is not rejected. In this test the null hypothesis is not rejected at conventional significance levels, so the assumption of homoskedastic residuals holds.
White Heteroskedasticity Test:

F-statistic        1.556945    Probability    0.102774
Obs*R-squared      20.68987    Probability    0.109848

Test Equation:
Dependent Variable: RESID^2
Method: Least Squares
Date: 11/04/10 Time: 10:20
Sample: 1 127
Included observations: 127
Newey-West HAC Standard Errors & Covariance (lag truncation=4)

Variable       Coefficient    Std. Error    t-Statistic     Prob.

C                -83.13552      332.3502      -0.250144    0.8029
CHINA            -0.031927      0.020330      -1.570442    0.1191
CHINA^2          -3.55E-07      6.41E-07      -0.554688    0.5802
CHINA*LW          0.002098      0.001367       1.534175    0.1278
CHINA*LDX         0.000573      0.000788       0.726770    0.4689
CHINA*TIME       -2.93E-06      5.33E-06      -0.550778    0.5829
LW                10.22270      46.34195       0.220593    0.8258
LW^2             -0.426168      1.641817      -0.259571    0.7957
LW*LDX            0.288200      1.327384       0.217119    0.8285
LW*TIME          -0.002887      0.009718      -0.297043    0.7670
LDX               6.001456      18.48112       0.324734    0.7460
LDX^2            -1.069046      0.839763      -1.273033    0.2056
LDX*TIME         -0.006501      0.007509      -0.865759    0.3885
TIME              0.072793      0.137160       0.530716    0.5967
TIME^2           -6.93E-06      1.89E-05      -0.367772    0.7137

R-squared             0.162912    Mean dependent var        0.040863
Adjusted R-squared    0.058276    S.D. dependent var        0.047923
S.E. of regression    0.046506    Akaike info criterion    -3.187958
Sum squared resid     0.242230    Schwarz criterion        -2.852030
Log likelihood        217.4353    F-statistic               1.556945
Durbin-Watson stat    1.147569    Prob(F-statistic)         0.102774
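The two residual diagnostics can be sketched with the Python standard library. This is a minimal illustration, not the procedure used to produce the output above: the function names are hypothetical, the Jarque-Bera example uses made-up residuals because the source does not list the actual ones, and the chi-square tail probability uses a closed form that is valid only for even degrees of freedom. The White test part takes the R-squared (0.162912), the 127 observations and the 14 regressors of the test equation and recomputes the two reported statistics.

```python
import math

def jarque_bera(residuals):
    """Return the JB statistic and its chi-square(2) p-value."""
    n = len(residuals)
    mean = sum(residuals) / n
    m2 = sum((e - mean) ** 2 for e in residuals) / n
    m3 = sum((e - mean) ** 3 for e in residuals) / n
    m4 = sum((e - mean) ** 4 for e in residuals) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    # With 2 degrees of freedom the chi-square upper tail is exp(-x / 2).
    return jb, math.exp(-jb / 2.0)

def chi2_sf_even_df(x, df):
    """Chi-square upper-tail probability; closed form for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

def white_statistics(r_squared, n_obs, n_regressors):
    """LM statistic (Obs*R-squared) and F-statistic of the auxiliary regression."""
    lm = n_obs * r_squared
    f_stat = (r_squared / n_regressors) / (
        (1.0 - r_squared) / (n_obs - n_regressors - 1))
    return lm, f_stat

# White test: values taken from the test equation output above.
lm, f_stat = white_statistics(0.162912, 127, 14)
p_lm = chi2_sf_even_df(lm, 14)
print("Obs*R-squared: %.4f  p-value: %.4f  F-statistic: %.4f" % (lm, p_lm, f_stat))

# Jarque-Bera: hypothetical residuals, for illustration only.
jb, p_jb = jarque_bera([-2.0, -1.0, 0.0, 1.0, 2.0])
print("JB: %.4f  p-value: %.4f" % (jb, p_jb))
```

The recomputed Obs*R-squared and F-statistic match the reported 20.68987 and 1.556945 up to the rounding of the R-squared, and the chi-square p-value comes out close to the reported 0.109848, which is consistent with not rejecting homoskedasticity.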