class Statsample::Regression::Multiple::MatrixEngine
Pure Ruby Class for Multiple Regression Analysis, based on a covariance or correlation matrix.
Use Statsample::Regression::Multiple::RubyEngine if you have a Dataset, to avoid setting all details.
Remember: NEVER use a covariance matrix if you have missing data. Use only a correlation matrix in that case.
Example:
matrix=[[1.0, 0.5, 0.2], [0.5, 1.0, 0.7], [0.2, 0.7, 1.0]]
lr=Statsample::Regression::Multiple::MatrixEngine.new(matrix,2)
Attributes
Number of cases
Hash of means for the predictors. By default, each mean is set to 0.
Mean of the criterion. By default, set to 0.
Standard deviation of the criterion. Only useful with a correlation matrix, because in that case it is set to 1 by default.
Public Class Methods
Create object
# File lib/statsample/regression/multiple/matrixengine.rb, line 36
def initialize(matrix,y_var, opts=Hash.new)
  matrix.extend Statsample::CovariateMatrix
  raise "#{y_var} variable should be on data" unless matrix.fields.include? y_var
  if matrix._type==:covariance
    @matrix_cov=matrix
    @matrix_cor=matrix.correlation
    @no_covariance=false
  else
    @matrix_cor=matrix
    @matrix_cov=matrix
    @no_covariance=true
  end
  @y_var=y_var
  @fields=matrix.fields-[y_var]
  @n_predictors=@fields.size
  @predictors_n=@n_predictors
  @matrix_x= @matrix_cor.submatrix(@fields)
  @matrix_x_cov= @matrix_cov.submatrix(@fields)
  raise LinearDependency, "Regressors are linearly dependent" if @matrix_x.determinant<1e-15
  @matrix_y = @matrix_cor.submatrix(@fields, [y_var])
  @matrix_y_cov = @matrix_cov.submatrix(@fields, [y_var])
  @y_sd=Math::sqrt(@matrix_cov.submatrix([y_var])[0,0])
  @x_sd=@n_predictors.times.inject({}) {|ac,i|
    ac[@matrix_x_cov.fields[i]]=Math::sqrt(@matrix_x_cov[i,i])
    ac;
  }
  @cases=nil
  @x_mean=@fields.inject({}) {|ac,f|
    ac[f]=0.0
    ac;
  }
  @y_mean=0.0
  @name=_("Multiple reggresion of %s on %s") % [@fields.join(","), @y_var]
  opts_default={:digits=>3}
  opts=opts_default.merge opts
  opts.each{|k,v|
    self.send("#{k}=",v) if self.respond_to? k
  }
  result_matrix=@matrix_x_cov.inverse * @matrix_y_cov
  if matrix._type==:covariance
    @coeffs=result_matrix.column(0).to_a
    @coeffs_stan=coeffs.collect {|k,v|
      coeffs[k]*@x_sd[k].quo(@y_sd)
    }
  else
    @coeffs_stan=result_matrix.column(0).to_a
    @coeffs=standarized_coeffs.collect {|k,v|
      standarized_coeffs[k]*@y_sd.quo(@x_sd[k])
    }
  end
  @total_cases=@valid_cases=@cases
end
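The core computation in the constructor can be sketched without Statsample, assuming only Ruby's standard `matrix` library. The sketch below reuses the correlation matrix from the class example with variable 2 as the criterion. The names `r_xx`, `r_xy`, `betas` and the standard deviations are illustrative assumptions, not part of the library.

```ruby
require "matrix"

# Correlation matrix from the class example; variable 2 is the criterion.
r = Matrix[[1.0, 0.5, 0.2],
           [0.5, 1.0, 0.7],
           [0.2, 0.7, 1.0]]
y_index = 2
x_indexes = [0, 1]

# Predictor intercorrelations and predictor-criterion correlations.
r_xx = Matrix.build(2, 2) { |i, j| r[x_indexes[i], x_indexes[j]] }
r_xy = Matrix.column_vector(x_indexes.map { |i| r[i, y_index] })

# Standardized coefficients: beta = R_xx^-1 * r_xy
betas = (r_xx.inverse * r_xy).column(0).to_a

# Raw coefficients: b_k = beta_k * sd_y / sd_x_k
# (standard deviations below are assumed values for illustration)
y_sd = 2.0
x_sd = [1.0, 4.0]
coeffs = betas.each_with_index.map { |beta, k| beta * y_sd / x_sd[k] }
```

With a correlation matrix the class performs the first step and then rescales by the standard deviations; with a covariance matrix it solves for the raw coefficients first and rescales in the opposite direction.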
Public Instance Methods
# File lib/statsample/regression/multiple/matrixengine.rb, line 100
def cases
  raise "You should define the number of valid cases first" if @cases.nil?
  @cases
end
Hash of b or raw coefficients
# File lib/statsample/regression/multiple/matrixengine.rb, line 123
def coeffs
  assign_names(@coeffs)
end
Standard error for the coefficients. The standard error of a coefficient depends on:
- Tolerance of the coefficient: lower tolerance implies higher error
- R^2 of the model: higher r2 implies lower error
Reference:
- Cohen et al. (2003). Applied Multiple Regression / Correlation Analysis for the Behavioral Sciences
# File lib/statsample/regression/multiple/matrixengine.rb, line 161
def coeffs_se
  out={}
  #mse=sse.quo(df_e)
  coeffs.each {|k,v|
    out[k]=@y_sd.quo(@x_sd[k])*Math::sqrt( 1.quo(tolerance(k)))*Math::sqrt((1-r2).quo(df_e))
  }
  out
end
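To see how the terms in this formula interact, here is a minimal standalone sketch. The helper `coefficient_se` and the input values are hypothetical and only mirror the expression used in `coeffs_se`.

```ruby
# Standard error of one raw coefficient, mirroring the expression in coeffs_se:
#   (sd_y / sd_x) * sqrt(1 / tolerance) * sqrt((1 - R^2) / df_e)
# coefficient_se is a hypothetical helper, not a Statsample method.
def coefficient_se(y_sd, x_sd, tolerance, r2, df_e)
  (y_sd.to_f / x_sd) * Math.sqrt(1.0 / tolerance) * Math.sqrt((1.0 - r2) / df_e)
end

# Same model fit and sample size, two different tolerances (assumed values).
se_high_tol = coefficient_se(1.0, 1.0, 0.75, 0.52, 97)
se_low_tol  = coefficient_se(1.0, 1.0, 0.25, 0.52, 97)

# Same tolerance, better model fit.
se_high_r2 = coefficient_se(1.0, 1.0, 0.75, 0.90, 97)
```

Because tolerance enters as `sqrt(1 / tolerance)`, a predictor that is highly collinear with the others (low tolerance) gets a larger standard error, while a larger R^2 shrinks it.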
Value of constant
# File lib/statsample/regression/multiple/matrixengine.rb, line 118
def constant
  c=coeffs
  @y_mean - @fields.inject(0){|a,k| a + (c[k] * @x_mean[k])}
end
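As a worked illustration of the same arithmetic, the constant is the criterion mean minus the sum of each raw coefficient times its predictor mean. The hashes below hold assumed values, not output from the library.

```ruby
# Assumed raw coefficients and means, keyed by predictor name.
coeffs = { "x1" => -0.4, "x2" => 0.4 }
x_mean = { "x1" => 10.0, "x2" => 5.0 }
y_mean = 3.0

# constant = y_mean - sum(b_k * mean(x_k))
constant = y_mean - coeffs.keys.inject(0) { |acc, k| acc + coeffs[k] * x_mean[k] }
```

Since the means default to 0, the constant is 0 unless the predictor and criterion means are set explicitly.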
Standard error for the constant. This method recreates the estimated variance-covariance matrix using the means, standard deviations and covariance matrix, so it requires a covariance matrix.
# File lib/statsample/regression/multiple/matrixengine.rb, line 178
def constant_se
  return nil if @no_covariance
  means=@x_mean
  #means[@y_var]=@y_mean
  means[:constant]=1
  sd=@x_sd
  #sd[@y_var]=@y_sd
  sd[:constant]=0
  fields=[:constant]+@matrix_cov.fields-[@y_var]
  # Recreate X'X using the variance-covariance matrix
  xt_x=Matrix.rows(fields.collect {|i|
    fields.collect {|j|
      if i==:constant or j==:constant
        cov=0
      elsif i==j
        cov=sd[i]**2
      else
        cov=@matrix_cov.submatrix(i..i,j..j)[0,0]
      end
      cov*(@cases-1)+@cases*means[i]*means[j]
    }
  })
  matrix=xt_x.inverse * mse
  matrix.collect {|i| Math::sqrt(i) if i>0 }[0,0]
end
t value for constant
# File lib/statsample/regression/multiple/matrixengine.rb, line 170
def constant_t
  return nil if constant_se.nil?
  constant.to_f / constant_se
end
Degrees of freedom for error
# File lib/statsample/regression/multiple/matrixengine.rb, line 141
def df_e
  cases-@n_predictors-1
end
Degrees of freedom for regression
# File lib/statsample/regression/multiple/matrixengine.rb, line 137
def df_r
  @n_predictors
end
Multiple correlation, on random models.
# File lib/statsample/regression/multiple/matrixengine.rb, line 114
def r
  Math::sqrt(r2)
end
Get R^2 for the regression. For fixed models, it is the coefficient of determination. On random models, it is the 'squared multiple correlation'. Equal to:
- 1-(|R| / |R_x|) or
- Sum(b_i*r_yi) <- used
# File lib/statsample/regression/multiple/matrixengine.rb, line 110
def r2
  @n_predictors.times.inject(0) {|ac,i| ac+@coeffs_stan[i]* @matrix_y[i,0]}
end
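The two expressions for R^2 can be checked against each other with Ruby's standard `matrix` library. This is only a sketch on the correlation matrix from the class example, with variable 2 as the criterion; the variable names are illustrative.

```ruby
require "matrix"

# Full correlation matrix R (predictors 0 and 1, criterion 2).
r = Matrix[[1.0, 0.5, 0.2],
           [0.5, 1.0, 0.7],
           [0.2, 0.7, 1.0]]

# R_x: correlations among the predictors only (top-left 2x2 block).
r_x = r.minor(0, 2, 0, 2)

# r_xy: correlations between each predictor and the criterion.
r_xy = [r[0, 2], r[1, 2]]

# Standardized coefficients, then R^2 = Sum(beta_i * r_yi).
betas = (r_x.inverse * Matrix.column_vector(r_xy)).column(0).to_a
r2_from_betas = betas.zip(r_xy).sum { |beta, r_yi| beta * r_yi }

# Determinant form: R^2 = 1 - |R| / |R_x|
r2_from_dets = 1.0 - r.determinant / r_x.determinant
```

Both routes give the same value here; the class uses the sum form because the standardized coefficients are already available after construction.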
Total sum of squares
# File lib/statsample/regression/multiple/matrixengine.rb, line 132
def sst
  @y_sd**2*(cases-1.0)
end
Hash of beta or standardized coefficients
# File lib/statsample/regression/multiple/matrixengine.rb, line 128
def standarized_coeffs
  assign_names(@coeffs_stan)
end
Tolerance for a given variable, defined as (1-R^2) of the regression of the selected variable on the other independent variables.
# File lib/statsample/regression/multiple/matrixengine.rb, line 149
def tolerance(var)
  return 1 if @matrix_x.column_size==1
  lr=Statsample::Regression::Multiple::MatrixEngine.new(@matrix_x, var)
  1-lr.r2
end
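The same idea can be sketched with the standard `matrix` library alone: treat the selected predictor as the criterion, regress it on the remaining predictors, and return 1 minus that R^2. The helper `predictor_tolerance` is hypothetical and works on positional indexes rather than field names.

```ruby
require "matrix"

# Tolerance of predictor k given the predictor correlation matrix r_xx.
# predictor_tolerance is a hypothetical helper, not a Statsample method.
def predictor_tolerance(r_xx, k)
  n = r_xx.row_count
  return 1.0 if n == 1 # a single predictor has nothing to be collinear with

  others = (0...n).to_a - [k]
  # Correlations among the other predictors, and between them and predictor k.
  r_oo = Matrix.build(others.size, others.size) { |i, j| r_xx[others[i], others[j]] }
  r_ok = others.map { |i| r_xx[i, k] }

  # R^2 of predictor k on the others, as Sum(beta_i * r_i)
  betas = (r_oo.inverse * Matrix.column_vector(r_ok)).column(0).to_a
  r2 = betas.zip(r_ok).sum { |beta, r_i| beta * r_i }
  1.0 - r2
end

r_xx = Matrix[[1.0, 0.5],
              [0.5, 1.0]]
tol = predictor_tolerance(r_xx, 0)
```

With two predictors correlated at 0.5, the R^2 of one on the other is 0.25, so the tolerance is 0.75.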