class Statsample::Test::UMannWhitney

U Mann-Whitney test¶ ↑

Non-parametric test for assessing whether two independent samples of observations come from the same distribution.

Assumptions¶ ↑

The two samples under investigation in the test are independent of each other and the observations within each sample are independent.
The observations are comparable (i.e., for any two observations, one can assess whether they are equal or, if not, which one is greater).
The variances in the two groups are approximately equal.

Higher differences of distributions correspond to to lower values of U.

U Mann-Whitney test¶ ↑

Non-parametric test for assessing whether two independent samples of observations come from the same distribution.

Assumptions¶ ↑

The two samples under investigation in the test are independent of each other and the observations within each sample are independent.
The observations are comparable (i.e., for any two observations, one can assess whether they are equal or, if not, which one is greater).
The variances in the two groups are approximately equal.

Higher differences of distributions correspond to to lower values of U.

Constants

MAX_MN_EXACT: Max for m*n allowed for exact calculation of probability

Attributes

name[RW]

Name of test

r1[R]

Sample 1 Rank sum

r2[R]

Sample 2 Rank sum

t[R]

Value of compensation for ties (useful for demostration)

u[R]

U Value

u1[R]

Sample 1 U (useful for demostration)

u2[R]

Sample 2 U (useful for demostration)

Public Class Methods

distribution_permutations(n1,n2) click to toggle source

Generate distribution for permutations. Very expensive, but useful for demostrations

# File lib/statsample/test/umannwhitney.rb, line 78
def self.distribution_permutations(n1,n2)
  base=[0]*n1+[1]*n2
  po=Statsample::Permutation.new(base)
  
  total=n1*n2
  req={}
  po.each do |perm|
    r0,s0=0,0
    perm.each_index {|c_i|
      if perm[c_i]==0
        r0+=c_i+1
        s0+=1
      end
    }
    u1=r0-((s0*(s0+1)).quo(2))
    u2=total-u1
    temp_u= (u1 <= u2) ? u1 : u2
    req[perm]=temp_u
  end
  req
end

new(v1,v2, opts=Hash.new) click to toggle source

Create a new U Mann-Whitney test Params: Two Statsample::Vectors

# File lib/statsample/test/umannwhitney.rb, line 118
def initialize(v1,v2, opts=Hash.new)
  @v1=v1
  @v2=v2
  @n1=v1.valid_data.size
  @n2=v2.valid_data.size
  data=(v1.valid_data+v2.valid_data).to_scale
  groups=(([0]*@n1)+([1]*@n2)).to_vector
  ds={'g'=>groups, 'data'=>data}.to_dataset
  @t=nil
  @ties=data.data.size!=data.data.uniq.size        
  if(@ties)
    adjust_for_ties(ds['data'])
  end
  ds['ranked']=ds['data'].ranked(:scale)
  
  @n=ds.cases
    
  @r1=ds.filter{|r| r['g']==0}['ranked'].sum
  @r2=((ds.cases*(ds.cases+1)).quo(2))-r1
  @u1=r1-((@n1*(@n1+1)).quo(2))
  @u2=r2-((@n2*(@n2+1)).quo(2))
  @u=(u1<u2) ? u1 : u2
  opts_default={:name=>_("Mann-Whitney's U")}
  @opts=opts_default.merge(opts)
  opts_default.keys.each {|k|
    send("#{k}=", @opts[k])
  }
    
end

u_sampling_distribution_as62(n1,n2) click to toggle source

U sampling distribution, based on Dinneen & Blakesley (1973) algorithm. This is the algorithm used on SPSS.

Parameters:

n1: group 1 size
n2: group 2 size

Reference: ¶ ↑

Dinneen, L., & Blakesley, B. (1973). Algorithm AS 62: A Generator for the Sampling Distribution of the Mann- Whitney U Statistic. Journal of the Royal Statistical Society, 22(2), 269-273

# File lib/statsample/test/umannwhitney.rb, line 31
def self.u_sampling_distribution_as62(n1,n2)

  freq=[]
  work=[]
  mn1=n1*n2+1
  max_u=n1*n2
  minmn=n1<n2 ? n1 : n2
  maxmn=n1>n2 ? n1 : n2
  n1=maxmn+1
  (1..n1).each{|i| freq[i]=1}
  n1+=1
  (n1..mn1).each{|i| freq[i]=0}
  work[1]=0
  xin=maxmn
  (2..minmn).each do |i|
    work[i]=0
    xin=xin+maxmn
    n1=xin+2
    l=1+xin.quo(2)
    k=i
    (1..l).each do |j|
      k=k+1
      n1=n1-1
      sum=freq[j]+work[j]
      freq[j]=sum
      work[k]=sum-freq[n1]
      freq[n1]=sum
    end
  end
  
  # Generate percentages for normal U
  dist=(1+max_u/2).to_i
  freq.shift
  total=freq.inject(0) {|a,v| a+v }
  (0...dist).collect {|i|
    if i!=max_u-i
      ues=freq[i]*2
    else
      ues=freq[i]
    end
    ues.quo(total)
  }
end

Public Instance Methods

probability_exact() click to toggle source

Exact probability of finding values of U lower or equal to sample on U distribution. Use with caution with m*n>100000. Uses ::u_sampling_distribution_as62

# File lib/statsample/test/umannwhitney.rb, line 162
def probability_exact
  dist=UMannWhitney.u_sampling_distribution_as62(@n1,@n2)
  sum=0
  (0..@u.to_i).each {|i|
    sum+=dist[i]
  }
  sum
end

probability_z() click to toggle source

Assuming H_0, the proportion of cdf with values of U lower than the sample, using normal approximation. Use with more than 30 cases per group.

# File lib/statsample/test/umannwhitney.rb, line 202
def probability_z
  (1-Distribution::Normal.cdf(z.abs()))*2
end

z() click to toggle source

Z value for U, with adjust for ties. For large samples, U is approximately normally distributed. In that case, you can use z to obtain probabily for U.

Reference: ¶ ↑

SPSS Manual

# File lib/statsample/test/umannwhitney.rb, line 187
def z
  mu=(@n1*@n2).quo(2)
  if(!@ties)
    ou=Math::sqrt(((@n1*@n2)*(@n1+@n2+1)).quo(12))
  else
    n=@n1+@n2
    first=(@n1*@n2).quo(n*(n-1))
    second=((n**3-n).quo(12))-@t
    ou=Math::sqrt(first*second)
  end
  (@u-mu).quo(ou)
end