class Statsample::Factor::ParallelAnalysis

Applies Horn's parallel analysis to a principal components analysis, to adjust for sampling bias when deciding how many components to retain. The bootstrap samples can be built from random data (using only the number of cases and variables), from parameters of the actual data (mean and standard deviation of each variable), or by resampling the actual data with replacement.

Description

“PA involves the construction of a number of correlation matrices of random variables based on the same sample size and number of variables in the real data set. The average eigenvalues from the random correlation matrices are then compared to the eigenvalues from the real data correlation matrix, such that the first observed eigenvalue is compared to the first random eigenvalue, the second observed eigenvalue is compared to the second random eigenvalue, and so on.” (Hayton, Allen & Scarpello, 2004, p.194)
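The comparison described in the quotation can be sketched in plain Ruby. This is only an illustration: the eigenvalues are made-up numbers rather than results from any dataset, and mean_eigenvalues and components_to_retain are hypothetical helper names that are not part of Statsample.

```ruby
# Average the eigenvalues of several random correlation matrices,
# position by position (first with first, second with second, ...).
def mean_eigenvalues(random_sets)
  random_sets.transpose.map { |column| column.sum / column.size.to_f }
end

# Count the leading observed eigenvalues that exceed their random
# counterpart, stopping at the first one that does not.
def components_to_retain(observed, random_sets)
  means = mean_eigenvalues(random_sets)
  observed.zip(means).take_while { |obs, rnd| obs > rnd }.size
end

# Illustrative values only (assumed, not computed from real data).
observed = [2.8, 1.3, 0.9, 0.6, 0.4]
random_sets = [
  [1.4, 1.2, 1.0, 0.8, 0.6],
  [1.6, 1.2, 1.0, 0.8, 0.4]
]

retained = components_to_retain(observed, random_sets)
puts "Components to retain: #{retained}"
```

Note that the class documented here compares against a percentile of the random eigenvalues (see percentil) rather than their mean.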

Usage

*With real dataset*

# ds should be any valid dataset
pa=Statsample::Factor::ParallelAnalysis.new(ds, :iterations=>100, :bootstrap_method=>:data)

*With number of cases and variables*

pa=Statsample::Factor::ParallelAnalysis.with_random_data(100,8)

Reference

Attributes

bootstrap_method[RW]

Bootstrap method. :random is used by default.

  • :random: generates random data using only the number of variables and cases of the dataset.

  • :data: samples with replacement from the actual data.
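The difference between the two methods can be sketched without Statsample. The helper names below (standard_normal, random_column, resampled_column) are hypothetical, the Box-Muller draw is a stand-in for the library's Distribution::Normal.rng, and the observed values are invented for illustration.

```ruby
# Standard normal draw via the Box-Muller transform (stand-in for the
# normal generator used by the library).
def standard_normal(random)
  radius = Math.sqrt(-2.0 * Math.log(1.0 - random.rand))
  radius * Math.cos(2.0 * Math::PI * random.rand)
end

# :random - only the number of cases matters; the values are pure noise.
def random_column(n_cases, random)
  Array.new(n_cases) { standard_normal(random) }
end

# :data - draw n_cases values from the observed column, with replacement.
def resampled_column(values, n_cases, random)
  Array.new(n_cases) { values.sample(random: random) }
end

random = Random.new(42)                # fixed seed so runs are repeatable
observed = [4.0, 5.5, 6.1, 7.3, 8.8]   # hypothetical observed variable

noise = random_column(observed.size, random)
resampled = resampled_column(observed, observed.size, random)
```

Each variable is resampled on its own here, mirroring the per-field loop in compute.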

debug[RW]

Show extra information if true

ds[R]

Dataset. With the :random bootstrap method and no_data set, a mock object that only provides fields and cases is enough (see with_random_data).

ds_eigenvalues[R]

Dataset with bootstrapped eigenvalues

iterations[RW]

Number of random sets to produce. 50 by default

matrix_method[RW]

Statsample::Bivariate method used to build the matrix from the raw data. :correlation_matrix is used by default.

n_variables[RW]

Number of eigenvalues to calculate. Should be set for Principal Axis Analysis.

name[RW]

Name of analysis

no_data[RW]

Perform analysis without actual data.

percentil[RW]

Percentile of the bootstrapped eigenvalues that an observed eigenvalue must exceed for its factor to be retained. 95 by default.

smc[RW]

Places squared multiple correlations (SMC) on the diagonal of each matrix, to simulate a Principal Axis analysis. False by default.
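As a rough sketch of what the smc option does, the snippet below replaces the unit diagonal of a correlation matrix with 1 - 1/d, where d is the matching diagonal entry of the inverse matrix. It uses Ruby's own Matrix class instead of Statsample or GSL, and with_smc_diagonal is a hypothetical helper name. For two variables the SMC reduces to the squared correlation, which makes the result easy to check by hand.

```ruby
require 'matrix'

# Replace the diagonal of a correlation matrix with squared multiple
# correlations: smc_i = 1 - 1 / (R^-1)[i, i].
def with_smc_diagonal(correlation)
  inverse_diagonal = correlation.inverse.each(:diagonal).to_a
  smc = inverse_diagonal.map { |d| 1 - 1.quo(d) }
  Matrix.build(correlation.row_count, correlation.column_count) do |i, j|
    i == j ? smc[i] : correlation[i, j]
  end
end

# Hypothetical 2x2 correlation matrix with r = 1/2 (exact rationals).
r = Rational(1, 2)
reduced = with_smc_diagonal(Matrix[[1, r], [r, 1]])
puts reduced[0, 0]   # the SMC of the first variable, equal to r squared
```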

use_gsl[RW]

Public Class Methods

new(ds, opts=Hash.new)
# File lib/statsample/factor/parallelanalysis.rb, line 62
def initialize(ds, opts=Hash.new)
  @ds=ds
  @fields=@ds.fields
  @n_variables=@fields.size
  @n_cases=ds.cases
  opts_default={
    :name=>_("Parallel Analysis"),
    :iterations=>50, # See Liu and Rijmen (2008)
    :bootstrap_method => :random,
    :smc=>false,
    :percentil=>95, 
    :debug=>false,
    :no_data=>false,
    :matrix_method=>:correlation_matrix
  }
  @use_gsl=Statsample.has_gsl?
  @opts=opts_default.merge(opts)
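  # Force a plain correlation matrix when bootstrapping from parameters
  # (assignment; a bare == comparison here would have no effect)
  @opts[:matrix_method]=:correlation_matrix if @opts[:bootstrap_method]==:parameters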
  opts_default.keys.each {|k| send("#{k}=", @opts[k]) }
end
with_random_data(cases,vars,opts=Hash.new)
# File lib/statsample/factor/parallelanalysis.rb, line 24
def self.with_random_data(cases,vars,opts=Hash.new)
  require 'ostruct'
  ds=OpenStruct.new
  ds.fields=vars.times.map {|i| "v#{i+1}"}
  ds.cases=cases
  opts=opts.merge({:bootstrap_method=> :random, :no_data=>true})
  new(ds, opts)
end

Public Instance Methods

compute()

Performs the calculation. Should not be called directly by the user.

# File lib/statsample/factor/parallelanalysis.rb, line 122
def compute
  @original=Statsample::Bivariate.send(matrix_method, @ds).eigenvalues unless no_data
  @ds_eigenvalues=Statsample::Dataset.new((1..@n_variables).map{|v| "ev_%05d" % v})
  @ds_eigenvalues.fields.each {|f| @ds_eigenvalues[f].type=:scale}
  if bootstrap_method==:parameter or bootstrap_method==:random
    rng = Distribution::Normal.rng
  end
  
  @iterations.times do |i|
    begin
      puts "#{@name}: Iteration #{i}" if $DEBUG or debug
      # Create a dataset of dummy values
      ds_bootstrap=Statsample::Dataset.new(@ds.fields)
      
      @fields.each do |f|
        if bootstrap_method==:random
          ds_bootstrap[f]=@n_cases.times.map {|c| rng.call}.to_scale
        elsif bootstrap_method==:data
          ds_bootstrap[f]=ds[f].sample_with_replacement(@n_cases)
        else
          raise "bootstrap_method not recognized"
        end
      end
      ds_bootstrap.update_valid_data
      
      matrix=Statsample::Bivariate.send(matrix_method, ds_bootstrap)
      matrix=matrix.to_gsl if @use_gsl
      if smc
        # Put squared multiple correlations on the diagonal
        smc_v=matrix.inverse.diagonal.map{|ii| 1-(1.quo(ii))}
        smc_v.each_with_index do |v,ii|
          matrix[ii,ii]=v
        end
      end
      ev=matrix.eigenvalues
      @ds_eigenvalues.add_case_array(ev)
    rescue Statsample::Bivariate::Tetrachoric::RequerimentNotMeet => e
      puts "Error: #{e}" if $DEBUG
      redo
    end
  end
  @ds_eigenvalues.update_valid_data
end
number_of_factors()

Number of factors to retain.

# File lib/statsample/factor/parallelanalysis.rb, line 83
def number_of_factors
  total=0
  ds_eigenvalues.fields.each_with_index do |f,i|
    if (@original[i]>0 and @original[i]>ds_eigenvalues[f].percentil(percentil))
      total+=1
    else
      break
    end
  end
  total
end
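
The retention rule above can be tried on plain arrays. The following is a minimal sketch, not the library's code: percentile is a simple nearest-rank helper (Statsample's own percentil may interpolate differently), factors_to_retain is a hypothetical name, and all eigenvalues are invented for illustration.

```ruby
# Nearest-rank percentile of a list of values (assumed definition).
def percentile(values, pct)
  sorted = values.sort
  rank = (pct / 100.0 * sorted.size).ceil
  sorted[[rank, 1].max - 1]
end

# Count leading observed eigenvalues that are positive and exceed the
# chosen percentile of the bootstrapped eigenvalues in the same position.
def factors_to_retain(observed, bootstrap_columns, pct = 95)
  total = 0
  observed.each_with_index do |ev, i|
    break unless ev > 0 && ev > percentile(bootstrap_columns[i], pct)
    total += 1
  end
  total
end

observed = [3.1, 1.4, 0.7]   # hypothetical observed eigenvalues
bootstrap_columns = [
  [1.2, 1.3, 1.5, 1.4],      # first eigenvalue over 4 made-up iterations
  [1.0, 1.1, 1.2, 1.1],      # second eigenvalue
  [0.8, 0.9, 0.9, 1.0]       # third eigenvalue
]

retained = factors_to_retain(observed, bootstrap_columns)
puts "Factors to retain: #{retained}"
```

Counting stops at the first eigenvalue that fails the test, so a later eigenvalue that happens to exceed its threshold is never counted.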