class Statsample::Factor::ParallelAnalysis
Performs Horn's 'parallel analysis' on a principal components analysis to adjust for sample bias in the retention of components. The simulated samples can be created from random data (using only the number of cases and variables), from the parameters of the actual data (mean and standard deviation of each variable), or by bootstrap sampling of the actual data.
Description
“PA involves the construction of a number of correlation matrices of random variables based on the same sample size and number of variables in the real data set. The average eigenvalues from the random correlation matrices are then compared to the eigenvalues from the real data correlation matrix, such that the first observed eigenvalue is compared to the first random eigenvalue, the second observed eigenvalue is compared to the second random eigenvalue, and so on.” (Hayton, Allen & Scarpello, 2004, p.194)
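The procedure quoted above can be sketched in plain Ruby, without statsample. This is an illustrative sketch under stated assumptions, not the library's implementation: the helper names (`correlation`, `sorted_eigenvalues`, `normal_draw`, `mean_random_eigenvalues`) are hypothetical, random data is drawn with a Box-Muller transform, and eigenvalues come from Ruby's `matrix` library.

```ruby
require 'matrix'

# Pearson correlation between two equally sized numeric arrays.
def correlation(x, y)
  n = x.size.to_f
  mx = x.sum / n
  my = y.sum / n
  cov = x.zip(y).sum { |a, b| (a - mx) * (b - my) }
  sx = Math.sqrt(x.sum { |a| (a - mx)**2 })
  sy = Math.sqrt(y.sum { |b| (b - my)**2 })
  cov / (sx * sy)
end

# Eigenvalues of the correlation matrix of `columns`, largest first.
def sorted_eigenvalues(columns)
  k = columns.size
  r = Matrix.build(k, k) { |i, j| i == j ? 1.0 : correlation(columns[i], columns[j]) }
  r.eigen.eigenvalues.sort.reverse
end

# Standard normal draw via the Box-Muller transform (hypothetical helper).
def normal_draw(rng)
  Math.sqrt(-2.0 * Math.log(1.0 - rng.rand)) * Math.cos(2.0 * Math::PI * rng.rand)
end

# Mean eigenvalue at each position over `iterations` random data sets
# with the same number of cases and variables as the real data.
def mean_random_eigenvalues(n_cases, n_vars, iterations, rng)
  sums = Array.new(n_vars, 0.0)
  iterations.times do
    columns = Array.new(n_vars) { Array.new(n_cases) { normal_draw(rng) } }
    sorted_eigenvalues(columns).each_with_index { |ev, i| sums[i] += ev }
  end
  sums.map { |s| s / iterations }
end

rng = Random.new(42)
random_means = mean_random_eigenvalues(100, 8, 20, rng)
```

Each observed eigenvalue is then compared with the entry of `random_means` at the same position: the first with the first, the second with the second, and so on.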
Usage
*With real dataset*
# ds should be any valid dataset
pa=Statsample::Factor::ParallelAnalysis.new(ds, :iterations=>100, :bootstrap_method=>:data)
*With number of cases and variables*
pa=Statsample::Factor::ParallelAnalysis.with_random_data(100,8)
Reference
- Hayton, J., Allen, D. & Scarpello, V. (2004). Factor Retention Decisions in Exploratory Factor Analysis: a Tutorial on Parallel Analysis. Organizational Research Methods, 7(2), 191-205.
- O'Connor, B. (2000). SPSS and SAS programs for determining the number of components using parallel analysis and Velicer's MAP test. Behavior Research Methods, Instruments, & Computers, 32(3), 396-402.
- Liu, O., & Rijmen, F. (2008). A modified procedure for parallel analysis of ordered categorical data. Behavior Research Methods, 40(2), 556-562.
Attributes
Bootstrap method. :random is used by default.
- :random: uses the number of variables and cases of the dataset.
- :data: samples with replacement from the actual data.
Show extra information if true
Dataset. You can use mock vectors when using the :random bootstrap method.
Dataset with bootstrapped eigenvalues
Number of random sets to produce. 50 by default
Method used to compute the correlation matrix with :raw_data. :correlation_matrix is used by default.
Number of eigenvalues to calculate. Should be set for Principal Axis Analysis.
Name of analysis
Perform analysis without actual data.
Percentile of the bootstrap eigenvalues that an observed eigenvalue must exceed to be accepted. 95 by default.
Uses squared multiple correlations (SMC) on the diagonal of the matrices to simulate a Principal Axis analysis. False by default.
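As a rough illustration of the smc option, the sketch below replaces the diagonal of a correlation matrix with squared multiple correlations, computed as 1 - 1/d for each diagonal element d of the inverse matrix. It uses Ruby's `matrix` library rather than statsample, and the helper names `smc_values` and `with_smc_diagonal` are hypothetical.

```ruby
require 'matrix'

# Squared multiple correlation of each variable with all the others:
# 1 - 1 / (corresponding diagonal element of the inverse correlation matrix).
def smc_values(r)
  inv = r.inverse
  Array.new(r.row_count) { |i| 1 - 1.quo(inv[i, i]) }
end

# Copy of `r` with the SMC values placed on the diagonal.
def with_smc_diagonal(r)
  smc = smc_values(r)
  Matrix.build(r.row_count, r.column_count) { |i, j| i == j ? smc[i] : r[i, j] }
end

# Two variables correlated at 0.5 (exact rationals keep the arithmetic exact).
r = Matrix[[1, Rational(1, 2)], [Rational(1, 2), 1]]
reduced = with_smc_diagonal(r)
```

With only two variables, each SMC equals the squared correlation between them, so both diagonal entries of `reduced` become 0.25 while the off-diagonal entries stay at 0.5.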
Public Class Methods
# File lib/statsample/factor/parallelanalysis.rb, line 62
def initialize(ds, opts=Hash.new)
  @ds=ds
  @fields=@ds.fields
  @n_variables=@fields.size
  @n_cases=ds.cases
  opts_default={
    :name=>_("Parallel Analysis"),
    :iterations=>50, # See Liu and Rijmen (2008)
    :bootstrap_method => :random,
    :smc=>false,
    :percentil=>95,
    :debug=>false,
    :no_data=>false,
    :matrix_method=>:correlation_matrix
  }
  @use_gsl=Statsample.has_gsl?
  @opts=opts_default.merge(opts)
  @opts[:matrix_method]==:correlation_matrix if @opts[:bootstrap_method]==:parameters
  opts_default.keys.each {|k| send("#{k}=", @opts[k]) }
end
# File lib/statsample/factor/parallelanalysis.rb, line 24
def self.with_random_data(cases,vars,opts=Hash.new)
  require 'ostruct'
  ds=OpenStruct.new
  ds.fields=vars.times.map {|i| "v#{i+1}"}
  ds.cases=cases
  opts=opts.merge({:bootstrap_method=> :random, :no_data=>true})
  new(ds, opts)
end
Public Instance Methods
Performs the calculation. Should not be called directly by the user.
# File lib/statsample/factor/parallelanalysis.rb, line 122
def compute
  @original=Statsample::Bivariate.send(matrix_method, @ds).eigenvalues unless no_data
  @ds_eigenvalues=Statsample::Dataset.new((1..@n_variables).map{|v| "ev_%05d" % v})
  @ds_eigenvalues.fields.each {|f| @ds_eigenvalues[f].type=:scale}
  if bootstrap_method==:parameter or bootstrap_method==:random
    rng = Distribution::Normal.rng
  end
  @iterations.times do |i|
    begin
      puts "#{@name}: Iteration #{i}" if $DEBUG or debug
      # Create a dataset of dummy values
      ds_bootstrap=Statsample::Dataset.new(@ds.fields)
      @fields.each do |f|
        if bootstrap_method==:random
          ds_bootstrap[f]=@n_cases.times.map {|c| rng.call}.to_scale
        elsif bootstrap_method==:data
          ds_bootstrap[f]=ds[f].sample_with_replacement(@n_cases)
        else
          raise "bootstrap_method doesn't recogniced"
        end
      end
      ds_bootstrap.update_valid_data
      matrix=Statsample::Bivariate.send(matrix_method, ds_bootstrap)
      matrix=matrix.to_gsl if @use_gsl
      if smc
        smc_v=matrix.inverse.diagonal.map{|ii| 1-(1.quo(ii))}
        smc_v.each_with_index do |v,ii|
          matrix[ii,ii]=v
        end
      end
      ev=matrix.eigenvalues
      @ds_eigenvalues.add_case_array(ev)
    rescue Statsample::Bivariate::Tetrachoric::RequerimentNotMeet => e
      puts "Error: #{e}" if $DEBUG
      redo
    end
  end
  @ds_eigenvalues.update_valid_data
end
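The two ways a simulated column is filled in the listing above (:random and :data) can be illustrated without statsample. This is a minimal sketch with hypothetical helpers (`normal_draw`, `bootstrap_column`): it uses a Box-Muller transform in place of Distribution::Normal.rng and Array#sample in place of sample_with_replacement, so it approximates the idea rather than reproducing the library code.

```ruby
# Standard normal draw via the Box-Muller transform (hypothetical helper).
def normal_draw(rng)
  Math.sqrt(-2.0 * Math.log(1.0 - rng.rand)) * Math.cos(2.0 * Math::PI * rng.rand)
end

# Build one simulated column of `n_cases` values.
#   :random -> independent standard normal draws; the observed data is ignored
#   :data   -> values drawn with replacement from the observed column
def bootstrap_column(method, n_cases, rng, observed = nil)
  case method
  when :random
    Array.new(n_cases) { normal_draw(rng) }
  when :data
    Array.new(n_cases) { observed.sample(random: rng) }
  else
    raise ArgumentError, "unknown bootstrap method: #{method.inspect}"
  end
end

rng = Random.new(7)
observed = [2.0, 3.5, 3.5, 4.0, 6.0]
random_column = bootstrap_column(:random, 5, rng)
resampled_column = bootstrap_column(:data, 5, rng, observed)
```

Only the :data branch needs real values, which is why an analysis based on :random can run with nothing more than the number of cases and variables.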
Number of factors to retain.
# File lib/statsample/factor/parallelanalysis.rb, line 83
def number_of_factors
  total=0
  ds_eigenvalues.fields.each_with_index do |f,i|
    if (@original[i]>0 and @original[i]>ds_eigenvalues[f].percentil(percentil))
      total+=1
    else
      break
    end
  end
  total
end
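The retention rule in the listing above counts leading eigenvalues that are positive and exceed the chosen percentile of the simulated eigenvalues at the same position, stopping at the first failure. A plain-Ruby sketch of that rule follows. The helpers `percentile` and `factors_to_retain` are hypothetical, and the nearest-rank percentile used here is an assumption that may differ from the definition statsample uses.

```ruby
# Nearest-rank percentile of a list of numbers (assumed definition).
def percentile(values, pct)
  sorted = values.sort
  rank = (pct / 100.0 * sorted.size).ceil
  sorted[[rank, 1].max - 1]
end

# Count leading observed eigenvalues that are positive and exceed the
# percentile of the simulated eigenvalues at the same position.
# `simulated` holds one array of eigenvalues per simulated data set.
def factors_to_retain(observed, simulated, pct = 95)
  total = 0
  observed.each_with_index do |ev, i|
    threshold = percentile(simulated.map { |row| row[i] }, pct)
    break unless ev > 0 && ev > threshold
    total += 1
  end
  total
end

observed = [3.1, 1.6, 0.9, 0.4]
simulated = [
  [1.4, 1.1, 0.9, 0.6],
  [1.5, 1.2, 0.8, 0.5],
  [1.3, 1.1, 1.0, 0.6]
]
retained = factors_to_retain(observed, simulated) # => 2
```

Because the loop breaks at the first eigenvalue that fails the test, a later eigenvalue that happens to exceed its threshold is not counted.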