class Statsample::DominanceAnalysis

Dominance Analysis is a procedure that examines the R² values of all possible subset models to identify the relevance of one or more predictors in the prediction of the criterion.

See Budescu (1993) and Azen & Budescu (2003, 2006) for more information.

Usage:

a=1000.times.collect {rand}.to_scale
b=1000.times.collect {rand}.to_scale
c=1000.times.collect {rand}.to_scale
ds={'a'=>a,'b'=>b,'c'=>c}.to_dataset
ds['y']=ds.collect{|row| row['a']*5+row['b']*3+row['c']*2+rand()}
da=Statsample::DominanceAnalysis.new(ds,'y')
puts da.summary

Output:

Report: Report 2010-02-08 19:10:11 -0300
Table: Dominance Analysis result
------------------------------------------------------------
|                  | r2    | sign  | a     | b     | c     |
------------------------------------------------------------
| Model 0          |       |       | 0.648 | 0.265 | 0.109 |
------------------------------------------------------------
| a                | 0.648 | 0.000 | --    | 0.229 | 0.104 |
| b                | 0.265 | 0.000 | 0.612 | --    | 0.104 |
| c                | 0.109 | 0.000 | 0.643 | 0.260 | --    |
------------------------------------------------------------
| k=1 Average      |       |       | 0.627 | 0.244 | 0.104 |
------------------------------------------------------------
| a*b              | 0.877 | 0.000 | --    | --    | 0.099 |
| a*c              | 0.752 | 0.000 | --    | 0.224 | --    |
| b*c              | 0.369 | 0.000 | 0.607 | --    | --    |
------------------------------------------------------------
| k=2 Average      |       |       | 0.607 | 0.224 | 0.099 |
------------------------------------------------------------
| a*b*c            | 0.976 | 0.000 | --    | --    | --    |
------------------------------------------------------------
| Overall averages |       |       | 0.628 | 0.245 | 0.104 |
------------------------------------------------------------

Table: Pairwise dominance
-----------------------------------------
| Pairs | Total | Conditional | General |
-----------------------------------------
| a - b | 1.0   | 1.0         | 1.0     |
| a - c | 1.0   | 1.0         | 1.0     |
| b - c | 1.0   | 1.0         | 1.0     |
-----------------------------------------
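
The contribution columns of the first table can be reproduced from the r2 column alone: each cell is the gain in R² obtained by adding the column's predictor to the row's model. The following is a minimal sketch in plain Ruby, using the rounded values printed above. The hash `r2` and the lambda `contribution` are illustrative names and are not part of Statsample.

```ruby
# R² of every subset model, copied from the report above (3 decimals).
# The empty model is assumed to explain nothing.
r2 = {
  []        => 0.0,
  %w[a]     => 0.648,
  %w[b]     => 0.265,
  %w[c]     => 0.109,
  %w[a b]   => 0.877,
  %w[a c]   => 0.752,
  %w[b c]   => 0.369,
  %w[a b c] => 0.976
}

# Gain in R² from adding +predictor+ to +model+ (hypothetical helper).
contribution = lambda do |model, predictor|
  (r2[(model + [predictor]).sort] - r2[model.sort]).round(3)
end

puts contribution.call(%w[b], 'a')    # row "b",   column "a"
puts contribution.call(%w[a b], 'c')  # row "a*b", column "c"
puts contribution.call([], 'b')       # row "Model 0", column "b"
```

Because the inputs are already rounded, averages computed from them can differ from the report in the last digit.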

Reference:

Public Class Methods

new(input, dependent, opts=Hash.new)

Creates a new DominanceAnalysis object.

Parameters:

  • input: A Matrix or Dataset object.

  • dependent: Name of the dependent variable. Can be an array, if you want to
    do a multivariate regression analysis. If nil, set to all
    fields on input, except criteria.

# File lib/statsample/dominanceanalysis.rb, line 102
def initialize(input, dependent, opts=Hash.new)
  @build_from_dataset=false
  if dependent.is_a? Array
    @regression_class= MULTIVARIATE_REGRESSION_CLASS
    @method_association=:r2yx
  else
    @regression_class= UNIVARIATE_REGRESSION_CLASS
    @method_association=:r2
  end
  
  @name=nil
  opts.each{|k,v|
    self.send("#{k}=",v) if self.respond_to? k
  }
  @dependent=dependent
  @dependent=[@dependent] unless @dependent.is_a? Array
  
  @predictors ||= input.fields-@dependent
  
  @name=_("Dominance Analysis:  %s over %s") % [ @predictors.flatten.join(",") , @dependent.join(",")] if @name.nil?
  
  if input.is_a? Statsample::Dataset
    @ds=input
    @matrix=Statsample::Bivariate.correlation_matrix(input)
    @cases=Statsample::Bivariate.min_n_valid(input)
  elsif input.is_a? ::Matrix
    @ds=nil
    @matrix=input
  else
    raise ArgumentError.new("You should use a Matrix or a Dataset")
  end
  @models=nil
  @models_data=nil
  @general_averages=nil
end
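
The options loop in the constructor assigns each key through its writer method, but only when the object responds to a method named after the key. A stand-alone sketch of that pattern follows. The class `Configurable` and its attributes are hypothetical and only illustrate the loop; they say nothing about which options DominanceAnalysis accepts.

```ruby
# Hypothetical class showing the "assign known options, skip the rest" loop.
class Configurable
  attr_accessor :name, :note

  def initialize(opts = {})
    @name = nil
    opts.each do |k, v|
      # Keys without a matching method are silently ignored.
      send("#{k}=", v) if respond_to?(k)
    end
  end
end

obj = Configurable.new(:name => 'My analysis', :unknown => 42)
puts obj.name
puts obj.note.inspect
```

One consequence of this design is that a misspelled option name raises no error, so it is worth checking the resulting object when an option seems to have no effect.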
predictor_name(variable)
# File lib/statsample/dominanceanalysis.rb, line 88
def self.predictor_name(variable)
  if variable.is_a? Array
    sprintf("(%s)", variable.join(","))
  else
    variable
  end
end

Public Instance Methods

average_k(k)

Hash with the average contribution of each predictor over all models of size k. Returns nil when k equals the number of predictors.

# File lib/statsample/dominanceanalysis.rb, line 286
def average_k(k)
  return nil if k==@predictors.size
  models=md_k(k)
  averages=@predictors.inject({}) {|a,v| a[v]=[];a}
  models.each do |m|
    @predictors.each do |f|
      averages[f].push(m.contributions[f]) unless m.contributions[f].nil?
    end
  end
  get_averages(averages)
end
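
The averaging step can be sketched without Statsample. The example below assumes each model of size 1 is represented by a plain hash of contributions, with nil for the predictor already in the model. The values are the rounded k=1 rows of the report, and the variable names are illustrative.

```ruby
predictors = %w[a b c]

# Contributions of each predictor to the three models of size 1.
size1_models = [
  { 'a' => nil,   'b' => 0.229, 'c' => 0.104 }, # model "a"
  { 'a' => 0.612, 'b' => nil,   'c' => 0.104 }, # model "b"
  { 'a' => 0.643, 'b' => 0.260, 'c' => nil   }  # model "c"
]

# Collect the non-nil contributions per predictor, then take the mean.
collected = predictors.each_with_object({}) { |p, h| h[p] = [] }
size1_models.each do |contrib|
  predictors.each { |p| collected[p] << contrib[p] unless contrib[p].nil? }
end
averages = collected.transform_values { |vals| vals.sum / vals.size }

averages.each { |p, mean| puts format('%s: %.4f', p, mean) }
```

The results agree with the "k=1 Average" row up to the rounding of the inputs.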
compute()

Compute models.

# File lib/statsample/dominanceanalysis.rb, line 138
def compute
  create_models
  fill_models
end
conditional_dominance()
# File lib/statsample/dominanceanalysis.rb, line 255
def conditional_dominance
  pairs.inject({}){|a,pair| a[pair]=conditional_dominance_pairwise(pair[0], pair[1])
  a
  }
end
conditional_dominance_pairwise(i,j)

Returns 1 if i conditionally dominates j (i cD j), 0 if j cD i, and 0.5 if undetermined.

# File lib/statsample/dominanceanalysis.rb, line 218
def conditional_dominance_pairwise(i,j)
  dm=dominance_for_nil_model(i,j)
  return 0.5 if dm==0.5
  dominances=[dm]
  for k in 1...@predictors.size
    a=average_k(k)
    if a[i]>a[j]
        dominances.push(1)
    elsif a[i]<a[j]
        dominances.push(0)
    else
      return 0.5
        #dominances.push(0.5)
    end                 
  end
  final=dominances.uniq
  final.size>1 ? 0.5 : final[0]            
end
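
Conditional dominance compares the two predictors level by level: first in the null model, then on the average contribution for each model size. A minimal sketch under the assumption that the per-level averages are already available as plain hashes follows. The values are the "Model 0", "k=1 Average" and "k=2 Average" rows of the report, and `conditional` is a hypothetical helper.

```ruby
# Per-level averages, from the null model up to k=2.
LEVELS = [
  { 'a' => 0.648, 'b' => 0.265, 'c' => 0.109 },
  { 'a' => 0.627, 'b' => 0.244, 'c' => 0.104 },
  { 'a' => 0.607, 'b' => 0.224, 'c' => 0.099 }
].freeze

# 1 if i wins at every level, 0 if j wins at every level, 0.5 otherwise.
def conditional(levels, i, j)
  verdicts = levels.map do |avg|
    return 0.5 if avg[i] == avg[j] # a tie at any level is undetermined
    avg[i] > avg[j] ? 1 : 0
  end
  verdicts.uniq.size == 1 ? verdicts.first : 0.5
end

puts conditional(LEVELS, 'a', 'b')
puts conditional(LEVELS, 'c', 'a')
```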
dominance_for_nil_model(i,j)
# File lib/statsample/dominanceanalysis.rb, line 187
def dominance_for_nil_model(i,j)
  if md([i]).r2>md([j]).r2
    1
  elsif md([i]).r2<md([j]).r2
    0
  else
    0.5
  end           
end
general_averages()
# File lib/statsample/dominanceanalysis.rb, line 297
def general_averages
  if @general_averages.nil?
    averages=@predictors.inject({}) {|a,v| a[v]=[md([v]).r2];a}
    for k in 1...@predictors.size
      ak=average_k(k)
      @predictors.each do |f|
        averages[f].push(ak[f])
      end
    end
    @general_averages=get_averages(averages)
  end
  @general_averages
end
general_dominance()
# File lib/statsample/dominanceanalysis.rb, line 260
def general_dominance
  pairs.inject({}){|a,pair| a[pair]=general_dominance_pairwise(pair[0], pair[1])
  a
  }
end
general_dominance_pairwise(i,j)

Returns 1 if i generally dominates j (i gD j), 0 if j gD i, and 0.5 if undetermined.

# File lib/statsample/dominanceanalysis.rb, line 237
def general_dominance_pairwise(i,j)
  ga=general_averages
  if ga[i]>ga[j]
    1
  elsif ga[i]<ga[j]
    0
  else
    0.5
  end                 
end
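
General dominance reduces each predictor to a single number, the mean of its per-level averages, and compares those numbers. The sketch below assumes the level averages are given as plain hashes (the rounded "Model 0", "k=1 Average" and "k=2 Average" rows of the report). The names `levels`, `general` and `general_dominance` are illustrative.

```ruby
levels = [
  { 'a' => 0.648, 'b' => 0.265, 'c' => 0.109 },
  { 'a' => 0.627, 'b' => 0.244, 'c' => 0.104 },
  { 'a' => 0.607, 'b' => 0.224, 'c' => 0.099 }
]

# Overall average per predictor: the mean of its level averages.
general = levels.first.keys.each_with_object({}) do |p, out|
  out[p] = levels.sum { |avg| avg[p] } / levels.size
end

# 1 if i has the larger overall average, 0 if j does, 0.5 on a tie.
def general_dominance(general, i, j)
  return 0.5 if general[i] == general[j]
  general[i] > general[j] ? 1 : 0
end

general.each { |p, mean| puts format('%s: %.3f', p, mean) }
puts general_dominance(general, 'a', 'b')
```

Since the inputs are rounded, the computed overall averages can differ from the "Overall averages" row by about 0.001.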
get_averages(averages)

Takes a hash whose values are arrays of numbers and returns a hash with the same keys, where each value is the mean of the corresponding array.

# File lib/statsample/dominanceanalysis.rb, line 280
def get_averages(averages)
  out={}
  averages.each{|key,val| out[key]=val.to_vector(:scale).mean }
  out
end
md(m)
# File lib/statsample/dominanceanalysis.rb, line 266
def md(m)
  models_data[m.sort {|a,b| a.to_s<=>b.to_s}]
end
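
The lookup above normalizes the model key by sorting its elements on their string form. A plausible reason, sketched here in plain Ruby, is that a predictor may itself be an array (see predictor_name), and a plain `sort` cannot compare a String with an Array. The lambda `normalize` and the sample data are illustrative only.

```ruby
# Sort predictors by their string representation, as md does.
normalize = ->(model) { model.sort { |a, b| a.to_s <=> b.to_s } }

# A hash keyed by normalized models can be queried in any order.
models_data = { normalize.call(%w[b a]) => :model_ab }
puts models_data[normalize.call(%w[a b])].inspect
puts models_data[normalize.call(%w[b a])].inspect

# Mixed String/Array predictors still get a stable order.
mixed_one = normalize.call(['z', %w[x y]])
mixed_two = normalize.call([%w[x y], 'z'])
puts mixed_one == mixed_two
```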
md_k(k)

Get all models of size k.

# File lib/statsample/dominanceanalysis.rb, line 270
def md_k(k)
  out=[]
  @models.each{|m| out.push(md(m)) if m.size==k }
  out
end
models()
# File lib/statsample/dominanceanalysis.rb, line 142
def models
  if @models.nil?
    compute
  end
  @models
end
models_data()
# File lib/statsample/dominanceanalysis.rb, line 149
def models_data
  if @models_data.nil?
    compute
  end
  @models_data
end
pairs()
# File lib/statsample/dominanceanalysis.rb, line 247
def pairs
  models.find_all{|m| m.size==2}
end
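
The pairs are simply the models of size 2. How the model list itself is built is not shown on this page; the sketch below assumes it is the set of all non-empty predictor subsets and generates it with Array#combination. This is an assumption for illustration, not the library's create_models.

```ruby
predictors = %w[a b c]

# All non-empty subsets, grouped by size (assumed model list).
models = (1..predictors.size).flat_map { |k| predictors.combination(k).to_a }

# Same filter as the pairs method above.
pairs = models.find_all { |m| m.size == 2 }

puts models.size
pairs.each { |pair| puts pair.join(' - ') }
```

For p predictors this yields 2^p - 1 models, which is why dominance analysis becomes expensive as the number of predictors grows.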
report_building(g)
# File lib/statsample/dominanceanalysis.rb, line 312
def report_building(g)
  compute if @models.nil?
  g.section(:name=>@name) do |generator|
    header=["","r2",_("sign")]+@predictors.collect {|c| DominanceAnalysis.predictor_name(c) }
    
    generator.table(:name=>_("Dominance Analysis result"), :header=>header) do |t|
      row=[_("Model 0"),"",""]+@predictors.collect{|f|
        sprintf("%0.3f",md([f]).r2)
      }
      
      t.row(row)
      t.hr
      for i in 1..@predictors.size
        mk=md_k(i)
        mk.each{|m|
          t.row(m.add_table_row)
        }
        # Report averages
        a=average_k(i)
        if !a.nil?
            t.hr
            row=[_("k=%d Average") % i,"",""] + @predictors.collect{|f|
                sprintf("%0.3f",a[f])
            }
            t.row(row)
            t.hr
            
        end
      end
      
      g=general_averages
      t.hr
      
      row=[_("Overall averages"),"",""]+@predictors.collect{|f|
                sprintf("%0.3f",g[f])
      }
      t.row(row)
    end
    
    td=total_dominance
    cd=conditional_dominance
    gd=general_dominance
    generator.table(:name=>_("Pairwise dominance"), :header=>[_("Pairs"),_("Total"),_("Conditional"),_("General")]) do |t|
      pairs.each{|pair|
        name=pair.map{|v| v.is_a?(Array) ? "("+v.join("-")+")" : v}.join(" - ")
        row=[name, sprintf("%0.1f",td[pair]), sprintf("%0.1f",cd[pair]), sprintf("%0.1f",gd[pair])]
        t.row(row)
      }
    end
  end
end
total_dominance()
# File lib/statsample/dominanceanalysis.rb, line 250
def total_dominance
  pairs.inject({}){|a,pair| a[pair]=total_dominance_pairwise(pair[0], pair[1])
  a
  }
end
total_dominance_pairwise(i,j)

Returns 1 if i totally dominates j (i D j), 0 if j D i, and 0.5 if undetermined.

# File lib/statsample/dominanceanalysis.rb, line 197
def total_dominance_pairwise(i,j)
  dm=dominance_for_nil_model(i,j)
  return 0.5 if dm==0.5
  dominances=[dm]
  models_data.each do |k,m|
    if !m.contributions[i].nil? and !m.contributions[j].nil?
      if m.contributions[i]>m.contributions[j]
          dominances.push(1)
      elsif m.contributions[i]<m.contributions[j]
          dominances.push(0)
      else
        return 0.5
          #dominances.push(0.5)
      end
    end
  end
  final=dominances.uniq
  final.size>1 ? 0.5 : final[0]
end
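
Total dominance requires one predictor to contribute more than the other in every model that contains neither of them, including the null model. The sketch below uses plain hashes filled with the rounded contributions from the report. For brevity the null model is stored in the same hash, whereas the method above handles it separately through dominance_for_nil_model. The constant `CONTRIBUTIONS` and the method `total_dominance_for` are illustrative names.

```ruby
# Contributions per model; nil marks a predictor already in the model.
CONTRIBUTIONS = {
  []        => { 'a' => 0.648, 'b' => 0.265, 'c' => 0.109 },
  %w[a]     => { 'a' => nil,   'b' => 0.229, 'c' => 0.104 },
  %w[b]     => { 'a' => 0.612, 'b' => nil,   'c' => 0.104 },
  %w[c]     => { 'a' => 0.643, 'b' => 0.260, 'c' => nil   },
  %w[a b]   => { 'a' => nil,   'b' => nil,   'c' => 0.099 },
  %w[a c]   => { 'a' => nil,   'b' => 0.224, 'c' => nil   },
  %w[b c]   => { 'a' => 0.607, 'b' => nil,   'c' => nil   }
}.freeze

# 1 if i beats j in every comparable model, 0 if j always wins, else 0.5.
def total_dominance_for(models, i, j)
  verdicts = []
  models.each_value do |c|
    next if c[i].nil? || c[j].nil? # model already contains i or j
    return 0.5 if c[i] == c[j]
    verdicts << (c[i] > c[j] ? 1 : 0)
  end
  verdicts.uniq.size == 1 ? verdicts.first : 0.5
end

%w[a b c].combination(2) do |i, j|
  puts "#{i} - #{j}: #{total_dominance_for(CONTRIBUTIONS, i, j)}"
end
```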