Multiple regression analysis of copper prices
Fundamentals: Copper prices are driven by many fundamentals, such as the dollar index, copper consumption, the housing index, industrial production, world copper stocks, copper ore production, and copper imports by individual countries. For the past few years, China and the USA have been the largest importers of copper in the world, and import quantities also affect copper prices. This study estimates the impact of these variables on copper prices using multiple regression analysis.
In this regression analysis, copper price is the dependent variable, and the dollar index (DX), China's copper imports, USA copper imports, total copper stock, and world copper consumption are the independent variables. The variables were selected on the basis of a correlation analysis: the least correlated variables were taken into the analysis as independent variables.
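The correlation screening described above can be sketched as follows. This is a minimal illustration, assuming that "least correlated" means low pairwise correlation among the candidate predictors. The 0.8 threshold, the column names, and the toy series are hypothetical and are not taken from the study.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def highly_correlated_pairs(columns, threshold=0.8):
    """Return the pairs of column names whose |r| exceeds the threshold."""
    return [
        (a, b)
        for a, b in combinations(columns, 2)
        if abs(pearson(columns[a], columns[b])) > threshold
    ]


# Hypothetical toy series, not the study's data.
candidates = {
    "dollar_index": [1.0, 2.0, 3.0, 4.0, 5.0],
    "housing_index": [2.0, 4.0, 6.0, 8.0, 10.5],
    "china_imports": [5.0, 1.0, 4.0, 2.0, 3.0],
}
flagged = highly_correlated_pairs(candidates)
print(flagged)
```

A pair that is flagged would be a candidate for dropping one of its two members, so that the remaining independent variables carry less overlapping information.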
The process flow for the analysis is as follows:
Data partition: The data is divided into three partitions: 80% training, 10% testing, and 10% validation. After partitioning, the data is explored using SAS Enterprise Miner. Before the regression is run, the data is checked against the assumptions of linear regression, which involves checking normality, detecting outliers, and transforming variables.
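An 80/10/10 partition of this kind can be sketched in a few lines. This is only an illustration: the study does not state how Enterprise Miner assigned rows to partitions, so the random shuffle, the rounding rule, and the `partition` helper are assumptions.

```python
import random


def partition(rows, train_frac=0.8, valid_frac=0.1, seed=0):
    """Shuffle rows and split them into train, validation, and test lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_valid = round(len(shuffled) * valid_frac)
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]  # whatever remains after the first two
    return train, valid, test


# 127 is the observation count reported in the White test output;
# the integers are stand-ins for the actual rows.
rows = list(range(127))
train, valid, test = partition(rows)
print(len(train), len(valid), len(test))
```

With 127 rows this split yields 102 training, 13 validation, and 12 test rows. The 102 training rows agree with the number of observations reported by the DMREG procedure.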
The table of variable transformations shows the transformation applied to each variable used in the analysis. After transformation, the skewness of the data decreases to a great extent for most of the variables. Once the variables are transformed, an outlier filter is run to remove the outliers seen in some of the variables in the distribution analysis.
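The effect of a transformation on skewness, and a simple outlier filter, can be illustrated as below. This sketch assumes a natural-log transformation and a filter based on standard deviations from the mean. The study does not state which transformation or filter rule was used for each variable, and the sample values are hypothetical.

```python
from math import log, sqrt


def skewness(values):
    """Moment-based sample skewness: m3 / m2^1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def filter_outliers(values, z_max=3.0):
    """Keep only values within z_max standard deviations of the mean."""
    n = len(values)
    mean = sum(values) / n
    std = sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [v for v in values if abs(v - mean) <= z_max * std]


# Hypothetical right-skewed series, not the study's data.
raw = [1, 2, 2, 3, 3, 4, 5, 8, 13, 40]
logged = [log(v) for v in raw]

print(skewness(raw), skewness(logged))
print(filter_outliers(raw, z_max=2.5))
```

On this toy series the log transformation lowers the skewness, and the filter with a 2.5 standard deviation cutoff drops the single extreme value.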
The regression analysis when run gave the following results:
The dollar index came out as having the largest impact on copper prices, followed by China's copper imports. Total consumption of copper does not pass the t-test here and thus cannot be classified as a variable having an impact on copper prices. Fundamentally, too, Chinese imports and the dollar index already discount the impact of total consumption, since the USA and China are the largest users of copper.
Time also turns out to be a variable with some impact on copper prices, while the total stock of copper has the least impact. Overall, the regression model explains close to 90% of the variation in copper prices (R-Square 0.9001).
The SAS System 23:51 Friday, October 31, 2008 7
The DMREG Procedure
Model Information
Training Data Set _EMSPDE.SP_DGM00001.DATA
DMDB Catalog EMPROJ.SP_DGM00001
Target Variable PRIC_9VG (Prices: Maximize normality)
Target Measurement Level Interval
Error Normal
Link Function Identity
Number of Model Parameters 5
Number of Observations 102
Analysis of Variance

Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               4        35.966193      8.991548    218.47   <.0001
Error              97         3.992232      0.041157
Corrected Total   101        39.958425

Model Fit Statistics

R-Square     0.9001     Adj R-Sq     0.8960
AIC       -320.5435     BIC       -318.0333
SBC       -307.4186     C(p)         5.0000

Analysis of Maximum Likelihood Estimates

Parameter    DF    Estimate    Standard Error    t Value    Pr > |t|
Intercept     1    -12.2845            6.0880      -2.02      0.0464
CHIN_SE3      1    -0.00175          0.000234      -7.48      <.0001
DX_3RMO7      1     -1.9050            0.2611      -7.30      <.0001
WRCO_904      1      2.0530            0.4243       4.84      <.0001
Time          1     0.00744           0.00143       5.21      <.0001
To remove world consumption from the list of independent variables, the regression was run again with four variables, and this is the final model selected. The F-test gives a value of 218.47 with a p-value below 0.0001, which rejects the null hypothesis that the independent variables jointly have no explanatory power. Hence we accept the alternative hypothesis that the model is a good fit.
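The F value and the R-square figures follow directly from the sums of squares in the Analysis of Variance output. The short check below recomputes them. The inputs are copied from the DMREG listing; the variable names are only illustrative.

```python
# Sums of squares and degrees of freedom from the Analysis of Variance output.
ss_model, df_model = 35.966193, 4
ss_error, df_error = 3.992232, 97
ss_total = ss_model + ss_error    # corrected total sum of squares
df_total = df_model + df_error    # n - 1 = 101

ms_model = ss_model / df_model    # mean square for the model
ms_error = ss_error / df_error    # mean square for the error
f_value = ms_model / ms_error     # F = MS(model) / MS(error)

r_square = ss_model / ss_total
adj_r_square = 1 - (1 - r_square) * df_total / df_error

print(round(f_value, 2), round(r_square, 4), round(adj_r_square, 4))
```

The recomputed values agree with the reported F Value of 218.47, R-Square of 0.9001, and Adj R-Sq of 0.8960 to the precision shown in the listing.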
The table of results for the different data partitions shows that the average standard error and maximum absolute error are quite close across partitions. Thus it is concluded that China's imports and the dollar index are the top two variables having an impact on copper prices, followed by the time factor and world total stocks of copper.
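One way to reproduce this ordering is to rank the variables by the absolute value of their t statistics, each of which is the estimate divided by its standard error. The sketch below does that with the figures from the DMREG listing. Ranking by |t| is an assumption about how "impact" was judged; it measures statistical strength rather than the size of the effect, and the study does not state its criterion.

```python
# (estimate, standard error) pairs copied from the DMREG parameter table.
estimates = {
    "CHIN_SE3": (-0.00175, 0.000234),
    "DX_3RMO7": (-1.9050, 0.2611),
    "WRCO_904": (2.0530, 0.4243),
    "Time": (0.00744, 0.00143),
}

# t value = estimate / standard error
t_values = {name: est / se for name, (est, se) in estimates.items()}

# Order variables from strongest to weakest by |t|.
ranking = sorted(t_values, key=lambda name: abs(t_values[name]), reverse=True)

for name in ranking:
    print(name, round(t_values[name], 2))
```

The recomputed t values match the listing up to rounding of the displayed estimates. CHIN_SE3 and DX_3RMO7 come out close together at the top, followed by Time and then WRCO_904.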
Jarque-Bera test for normality of residuals: The JB test checks the assumption that the residuals are normally distributed, which is an important assumption for the regression. Here the JB test does not reject the null hypothesis of normally distributed residuals, so the assumption holds.
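For reference, the JB statistic combines the sample skewness S and kurtosis K of the residuals as JB = n/6 * (S^2 + (K - 3)^2 / 4) and is compared against a chi-square distribution with 2 degrees of freedom. The sketch below is a minimal standard-library version. The residuals used are hypothetical, since the study does not list its residuals or its JB statistic.

```python
from math import exp


def jarque_bera(residuals):
    """Return the Jarque-Bera statistic and its asymptotic p-value."""
    n = len(residuals)
    mean = sum(residuals) / n
    m2 = sum((r - mean) ** 2 for r in residuals) / n
    m3 = sum((r - mean) ** 3 for r in residuals) / n
    m4 = sum((r - mean) ** 4 for r in residuals) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)
    # For 2 degrees of freedom the chi-square survival function is exp(-x/2).
    p_value = exp(-jb / 2)
    return jb, p_value


# Hypothetical residuals, not the study's data.
jb, p_value = jarque_bera([-2.0, -1.0, 0.0, 1.0, 2.0])
print(jb, p_value)
```

A large p-value means the null hypothesis of normality is not rejected. The chi-square approximation is asymptotic, so it is rough for a sample as small as this toy example.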
White heteroskedasticity test for residuals: The White test checks for heteroskedasticity, that is, non-constant variance in the residual terms. If heteroskedasticity is present, forecasting with the model becomes a problem because the variance of the error terms keeps changing; the error terms should instead stay within a stable range. The null hypothesis is homoskedasticity, and it is retained when the test gives a small F-statistic with a high p-value. In this test the p-values (0.1028 for the F-statistic and 0.1098 for Obs*R-squared) are above 0.05, so the assumption of homoskedastic residuals holds.
White Heteroskedasticity Test:

F-statistic      1.556945    Probability    0.102774
Obs*R-squared    20.68987    Probability    0.109848

Test Equation:
Dependent Variable: RESID^2
Method: Least Squares
Date: 11/04/10 Time: 10:20
Sample: 1 127
Included observations: 127
Newey-West HAC Standard Errors & Covariance (lag truncation=4)

Variable      Coefficient    Std. Error    t-Statistic    Prob.
C             -83.13552      332.3502      -0.250144      0.8029
CHINA         -0.031927      0.020330      -1.570442      0.1191
CHINA^2       -3.55E-07      6.41E-07      -0.554688      0.5802
CHINA*LW       0.002098      0.001367       1.534175      0.1278
CHINA*LDX      0.000573      0.000788       0.726770      0.4689
CHINA*TIME    -2.93E-06      5.33E-06      -0.550778      0.5829
LW             10.22270      46.34195       0.220593      0.8258
LW^2          -0.426168      1.641817      -0.259571      0.7957
LW*LDX         0.288200      1.327384       0.217119      0.8285
LW*TIME       -0.002887      0.009718      -0.297043      0.7670
LDX            6.001456      18.48112       0.324734      0.7460
LDX^2         -1.069046      0.839763      -1.273033      0.2056
LDX*TIME      -0.006501      0.007509      -0.865759      0.3885
TIME           0.072793      0.137160       0.530716      0.5967
TIME^2        -6.93E-06      1.89E-05      -0.367772      0.7137

R-squared             0.162912    Mean dependent var       0.040863
Adjusted R-squared    0.058276    S.D. dependent var       0.047923
S.E. of regression    0.046506    Akaike info criterion   -3.187958
Sum squared resid     0.242230    Schwarz criterion       -2.852030
Log likelihood        217.4353    F-statistic              1.556945
Durbin-Watson stat    1.147569    Prob(F-statistic)        0.102774
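The two White test statistics can be recomputed from the auxiliary regression output. Obs*R-squared is the number of observations times the R-squared of the test equation, compared against a chi-square distribution whose degrees of freedom equal the number of regressors excluding the constant (14 here). The sketch below uses only the standard library; `chi2_sf_even_df` is a hypothetical helper that relies on the closed form of the chi-square tail for even degrees of freedom.

```python
from math import exp, factorial


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable with even df."""
    if df % 2 != 0:
        raise ValueError("closed form used here requires even df")
    half = x / 2
    return exp(-half) * sum(half ** i / factorial(i) for i in range(df // 2))


# Figures copied from the test equation output.
n_obs = 127
r_squared = 0.162912
n_regressors = 14  # terms in the test equation, excluding the constant C

# LM form of the test: Obs*R-squared with its chi-square p-value.
lm_stat = n_obs * r_squared
lm_p_value = chi2_sf_even_df(lm_stat, n_regressors)

# F form of the test, from the same R-squared.
f_stat = (r_squared / n_regressors) / (
    (1 - r_squared) / (n_obs - n_regressors - 1)
)

print(round(lm_stat, 4), round(lm_p_value, 4), round(f_stat, 4))
```

The recomputed values agree with the reported Obs*R-squared of 20.68987 (probability 0.109848) and F-statistic of 1.556945, up to rounding of the displayed R-squared.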