# Mann-Whitney U test

Non-parametric test for assessing whether two independent samples of observations come from the same distribution.

## Assumptions

• The two samples under investigation in the test are independent of each other and the observations within each sample are independent.

• The observations are comparable (i.e., for any two observations, one can assess whether they are equal or, if not, which one is greater).

• The variances in the two groups are approximately equal.

Larger differences between the two distributions correspond to lower values of U.
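A minimal plain-Ruby sketch of how U is obtained from rank sums may help here. It does not depend on Statsample, and the method names `midranks` and `mann_whitney_u` are hypothetical. It pools both samples, gives tied values their average rank, and takes the smaller of U1 and U2. Completely separated samples give U = 0, while interleaved samples give a larger U.

```ruby
# Hypothetical helper: 1-based ranks of values; tied values share
# the average of the rank positions they occupy (kept as Rationals).
def midranks(values)
  order = (0...values.size).sort_by { |i| values[i] }
  ranks = Array.new(values.size)
  i = 0
  while i < order.size
    j = i
    # extend j over the run of values equal to values[order[i]]
    j += 1 while j + 1 < order.size && values[order[j + 1]] == values[order[i]]
    avg = Rational(i + j + 2, 2) # mean of ranks i+1 .. j+1
    (i..j).each { |k| ranks[order[k]] = avg }
    i = j + 1
  end
  ranks
end

# Hypothetical helper: U = min(U1, U2) for samples a and b.
def mann_whitney_u(a, b)
  n1, n2 = a.size, b.size
  ranks = midranks(a + b)
  r1 = ranks.first(n1).sum                 # rank sum of sample a
  u1 = r1 - Rational(n1 * (n1 + 1), 2)
  u2 = n1 * n2 - u1                        # since U1 + U2 = n1 * n2
  [u1, u2].min
end

mann_whitney_u([1, 2, 3], [4, 5, 6]) # complete separation
mann_whitney_u([1, 3, 5], [2, 4, 6]) # interleaved samples
```

Here `u2` is taken from the identity U1 + U2 = n1 * n2. This is equivalent to computing it from the second rank sum, as the constructor below does.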

### Constants

MAX_MN_EXACT

Maximum value of m*n (the product of the two sample sizes) for which exact calculation of the probability is allowed.

### Attributes

name[RW]

Name of the test

r1[R]

Rank sum of sample 1

r2[R]

Rank sum of sample 2

t[R]

Tie correction term (useful for demonstration)

u[R]

U value: the smaller of u1 and u2

u1[R]

U for sample 1 (useful for demonstration)

u2[R]

U for sample 2 (useful for demonstration)

### Public Class Methods

distribution_permutations(n1,n2)

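Generates the distribution of U over all permutations of the group labels. It returns a hash that maps each permutation to its smaller U value. Very expensive, but useful for demonstrations.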

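```ruby
# File lib/statsample/test/umannwhitney.rb, line 78
def self.distribution_permutations(n1,n2)
  # n1 positions labelled 0 (sample 1) and n2 labelled 1 (sample 2)
  base=[0]*n1+[1]*n2
  po=Statsample::Permutation.new(base)

  total=n1*n2
  req={}
  po.each do |perm|
    # r0: rank sum of the 0-labelled positions; s0: how many there are
    r0,s0=0,0
    perm.each_index {|c_i|
      if perm[c_i]==0
        r0+=c_i+1
        s0+=1
      end
    }
    u1=r0-((s0*(s0+1)).quo(2))
    u2=total-u1
    # Store the smaller U for this permutation
    temp_u= (u1 <= u2) ? u1 : u2
    req[perm]=temp_u
  end
  req
end
```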
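The same enumeration can be sketched in plain Ruby without `Statsample::Permutation`. The method `permutation_u_counts` is hypothetical. It differs from `distribution_permutations` in one respect: it tallies how often each value of min(U1, U2) occurs, instead of returning a hash keyed by permutation. It walks over all C(n1+n2, n1) ways of assigning rank positions to sample 1 with `Array#combination`, which is enough because permutations that differ only within a group give the same U. No ties are assumed.

```ruby
# Hypothetical helper: frequency of each min(U1, U2) value under H0,
# by brute force over every choice of rank positions for sample 1.
def permutation_u_counts(n1, n2)
  total = n1 * n2
  counts = Hash.new(0)
  (1..(n1 + n2)).to_a.combination(n1) do |positions|
    r1 = positions.sum                 # rank sum of sample 1
    u1 = r1 - n1 * (n1 + 1) / 2        # n1*(n1+1) is even, so exact
    counts[[u1, total - u1].min] += 1
  end
  counts
end

counts = permutation_u_counts(3, 3)
total = counts.values.sum              # C(6, 3) arrangements
counts.sort.each { |u, c| puts "U=#{u}: #{c}/#{total}" }
```

The cost grows with C(n1+n2, n1), so this is only practical for very small groups, which matches the "very expensive" warning above.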
new(v1,v2, opts=Hash.new)

Creates a new Mann-Whitney U test. Parameters: two Statsample::Vector objects (`v1`, `v2`) and an optional hash of options (`opts`), such as `:name`.

```ruby
# File lib/statsample/test/umannwhitney.rb, line 118
def initialize(v1,v2, opts=Hash.new)
  @v1=v1
  @v2=v2
  @n1=v1.valid_data.size
  @n2=v2.valid_data.size
  # Pool both samples and record group membership (0 or 1)
  data=(v1.valid_data+v2.valid_data).to_scale
  groups=(([0]*@n1)+([1]*@n2)).to_vector
  ds={'g'=>groups, 'data'=>data}.to_dataset
  @t=nil
  @ties=data.data.size!=data.data.uniq.size
  if(@ties)
    # Tie correction used by #z: sum of (t**3 - t)/12 over every group
    # of t equal observations (untied values contribute 0)
    @t=data.data.group_by {|x| x}.values.inject(0) {|a,g|
      a+(g.size**3-g.size).quo(12)
    }
  end
  # Rank the pooled data
  ds['ranked']=ds['data'].ranked(:scale)

  @n=ds.cases

  @r1=ds.filter{|r| r['g']==0}['ranked'].sum
  @r2=((ds.cases*(ds.cases+1)).quo(2))-r1
  @u1=r1-((@n1*(@n1+1)).quo(2))
  @u2=r2-((@n2*(@n2+1)).quo(2))
  @u=(u1<u2) ? u1 : u2
  opts_default={:name=>_("Mann-Whitney's U")}
  @opts=opts_default.merge(opts)
  opts_default.keys.each {|k|
    send("#{k}=", @opts[k])
  }
end
```
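When ties are present, `z` subtracts a tie term `@t` from (n^3 - n)/12. Here is a minimal sketch of the conventional form of that term: for every value shared by t_j observations, add (t_j^3 - t_j)/12. The name `tie_correction` is hypothetical, and the sketch assumes the pooled values are plain Ruby numbers of one type. Grouping goes through `eql?`, so `2` and `2.0` would land in separate groups.

```ruby
# Hypothetical helper: sum over tied groups of (t_j**3 - t_j) / 12,
# where t_j is how many pooled observations share the j-th value.
def tie_correction(values)
  values.group_by { |v| v }.values.sum do |group|
    tj = group.size
    Rational(tj**3 - tj, 12)   # zero for untied values (tj == 1)
  end
end

tie_correction([1, 2, 2, 3, 3, 3]) # one pair and one triple of ties
```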
u_sampling_distribution_as62(n1,n2)

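Sampling distribution of U, based on the Dinneen & Blakesley (1973) algorithm (AS 62). This is the algorithm used by SPSS. It returns an array whose element i is the probability, under the null hypothesis, that min(U1, U2) equals i, for i from 0 to floor(n1*n2/2).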

Parameters:

• `n1`: group 1 size

• `n2`: group 2 size

Reference:

• Dinneen, L., & Blakesley, B. (1973). Algorithm AS 62: A Generator for the Sampling Distribution of the Mann-Whitney U Statistic. Journal of the Royal Statistical Society. Series C (Applied Statistics), 22(2), 269-273.

```ruby
# File lib/statsample/test/umannwhitney.rb, line 31
def self.u_sampling_distribution_as62(n1,n2)

  freq=[]
  work=[]
  mn1=n1*n2+1
  max_u=n1*n2
  minmn=n1<n2 ? n1 : n2
  maxmn=n1>n2 ? n1 : n2
  # From here on n1 is reused as a working index
  n1=maxmn+1
  (1..n1).each{|i| freq[i]=1}
  n1+=1
  (n1..mn1).each{|i| freq[i]=0}
  work[1]=0
  xin=maxmn
  # Generate successively higher-order distributions
  (2..minmn).each do |i|
    work[i]=0
    xin=xin+maxmn
    n1=xin+2
    l=1+xin/2
    k=i
    # Build the complete distribution from the outside inwards
    (1..l).each do |j|
      k=k+1
      n1=n1-1
      sum=freq[j]+work[j]
      freq[j]=sum
      work[k]=sum-freq[n1]
      freq[n1]=sum
    end
  end

  # Convert frequencies into probabilities of min(U1, U2) = i
  dist=(1+max_u/2).to_i
  freq.shift
  total=freq.inject(0) {|a,v| a+v }
  (0...dist).collect {|i|
    # By symmetry, P(min U = i) = 2*P(U = i), except at the midpoint
    if i!=max_u-i
      ues=freq[i]*2
    else
      ues=freq[i]
    end
    ues.quo(total)
  }
end
```
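For small samples, the output can be cross-checked against a different and simpler technique: a counting recurrence, not AS 62. Condition on whether the largest observation belongs to sample 1, which adds n to U, or to sample 2, which adds nothing. This gives c(u; m, n) = c(u - n; m - 1, n) + c(u; m, n - 1). The names `u_counts` and `min_u_probabilities` are hypothetical, and this version keeps every intermediate distribution in memory. `min_u_probabilities` folds the counts in the same way as the method above.

```ruby
# Hypothetical helper: counts of U = 0..m*n under H0 (no ties), from
# c(u; m, n) = c(u - n; m - 1, n) + c(u; m, n - 1).
def u_counts(m, n, memo = {})
  return [1] if m.zero? || n.zero?    # only U = 0 is possible
  memo[[m, n]] ||= begin
    with_x = u_counts(m - 1, n, memo) # largest value is from sample 1
    with_y = u_counts(m, n - 1, memo) # largest value is from sample 2
    (0..m * n).map do |u|
      a = u >= n ? with_x[u - n] : 0
      b = u < with_y.size ? with_y[u] : 0
      a + b
    end
  end
end

# Fold into P(min(U1, U2) = i) for i = 0..floor(m*n/2).
def min_u_probabilities(m, n)
  counts = u_counts(m, n)
  max_u = m * n
  total = counts.sum
  (0..max_u / 2).map do |i|
    c = i == max_u - i ? counts[i] : 2 * counts[i]
    Rational(c, total)
  end
end

min_u_probabilities(2, 2)
```

For n1 = n2 = 2, the U counts are 1, 1, 2, 1, 1 over six arrangements, so each of the three min-U values has probability 1/3.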

### Public Instance Methods

probability_exact()

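Exact probability, under the null hypothesis, of obtaining a value of U less than or equal to the sample U. Because the distribution used is that of min(U1, U2), this is a two-tailed probability. Use with caution when m*n > 100000, since the computation grows with m*n. Uses ::u_sampling_distribution_as62.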

```ruby
# File lib/statsample/test/umannwhitney.rb, line 162
def probability_exact
  dist=UMannWhitney.u_sampling_distribution_as62(@n1,@n2)
  # Add P(min U = i) for i = 0 up to the observed U
  sum=0
  (0..@u.to_i).each {|i|
    sum+=dist[i]
  }
  sum
end
```
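The summation in `probability_exact` can be sketched on its own. The function `exact_p_value` is hypothetical. It assumes its input has the format returned by `u_sampling_distribution_as62`, meaning element i holds P(min(U1, U2) = i). For illustration, the distribution for n1 = n2 = 3 is written out by hand from the U counts 1, 1, 2, 3, 3, 3, 3, 2, 1, 1 over 20 arrangements.

```ruby
# Hypothetical helper: two-tailed exact p-value, summing P(min U = i)
# for i = 0 .. u (u truncated to an integer, as probability_exact does).
def exact_p_value(min_u_dist, u)
  last = u.to_i
  if last < 0 || last >= min_u_dist.size
    raise ArgumentError, "u is outside the distribution"
  end
  min_u_dist[0..last].sum
end

# P(min(U1, U2) = i) for n1 = n2 = 3, i = 0..4
DIST_3_3 = [2, 2, 4, 6, 6].map { |c| Rational(c, 20) }

exact_p_value(DIST_3_3, 0) # complete separation
```

With three observations per group, even complete separation (U = 0) gives p = 1/10. At this sample size, the exact test cannot reach the 0.05 level.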
probability_z()

Two-tailed probability, under the null hypothesis, of obtaining a U at least as extreme as the sample value, using the normal approximation. Use when each group has more than 30 cases.

```ruby
# File lib/statsample/test/umannwhitney.rb, line 202
def probability_z
  # Two-tailed p-value from the standard normal distribution
  (1-Distribution::Normal.cdf(z.abs()))*2
end
```
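The library takes the normal CDF from the `Distribution` gem. The sketch below swaps that in for Ruby's core `Math.erfc`, which is an assumption made to avoid the dependency. The identity 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)) gives the same two-tailed value. It also avoids the loss of precision from computing `1 - cdf` when |z| is large. The name `two_tailed_p` is hypothetical.

```ruby
# Hypothetical helper: two-tailed normal p-value using only core Math.
# 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2))
def two_tailed_p(z)
  Math.erfc(z.abs / Math.sqrt(2))
end

two_tailed_p(1.959964) # close to 0.05
```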
z()

Z value for U, adjusted for ties. For large samples, U is approximately normally distributed, so z can be used to obtain the probability of U (as probability_z does).


```ruby
# File lib/statsample/test/umannwhitney.rb, line 187
def z
  mu=(@n1*@n2).quo(2)
  if(!@ties)
    ou=Math::sqrt(((@n1*@n2)*(@n1+@n2+1)).quo(12))
  else
    # Standard deviation of U corrected for ties (@t is the tie term)
    n=@n1+@n2
    first=(@n1*@n2).quo(n*(n-1))
    second=((n**3-n).quo(12))-@t
    ou=Math::sqrt(first*second)
  end
  (@u-mu).quo(ou)
end
```
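The formula in `z` above can be checked on its own in a small standalone sketch. The function `u_z` and its tie-term argument `t` are hypothetical. The sketch always uses the tie-corrected form: with t = 0, n1*n2/(n(n-1)) * (n^3 - n)/12 simplifies to n1*n2*(n+1)/12, which is exactly the untied variance. Keeping one form is only a design choice made here for brevity.

```ruby
# Hypothetical helper: z for U with an optional tie term t, where
#   mu    = n1*n2/2
#   sigma = sqrt(n1*n2/(n*(n-1)) * ((n**3 - n)/12 - t)),  n = n1 + n2
def u_z(u, n1, n2, t = 0)
  n = n1 + n2
  mu = n1 * n2 / 2.0
  # With t = 0 this equals sqrt(n1*n2*(n + 1)/12), the untied formula
  sigma = Math.sqrt((n1 * n2).fdiv(n * (n - 1)) * ((n**3 - n) / 12.0 - t))
  (u - mu) / sigma
end

u_z(0, 3, 3)          # no ties, complete separation
u_z(0.5, 2, 2, 0.5)   # samples [1, 2] and [2, 3]: one pair of ties
```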