class Statsample::Vector

A collection of values along one dimension. Works like a column in a spreadsheet.

Usage

The quickest way to create a vector is Statsample::VectorShorthands#to_vector or Statsample::VectorShorthands#to_scale:

v=[1,2,3,4].to_vector(:scale)
v=[1,2,3,4].to_scale

Attributes

data[R]

Original data.

data_with_nils[R]

Original data, with all missing values replaced by nils

date_data_with_nils[R]

Date data, with all missing values replaced by nils

labels[RW]

Hash of labels for specific values

missing_data[R]

Missing values array

missing_values[R]

Array of values considered as missing. Nil is a missing value, by default

name[RW]

Name of vector. Should be used for output by many classes

today_values[R]

Array of values considered as “Today”, with date type. “NOW”, “TODAY”, :NOW and :TODAY are 'today' values, by default

type[R]

Level of measurement. Could be :nominal, :ordinal or :scale (the source below also handles :date)

valid_data[R]

Valid data. Equal to data, minus values assigned as missing values

Public Class Methods

[](*args)

Create a vector using (almost) any object

# File lib/statsample/vector.rb, line 104
def self.[](*args)
  values=[]
  args.each do |a|
    case a
    when Array
      values.concat a.flatten
    when Statsample::Vector
      values.concat a.to_a
    when Range
      values.concat  a.to_a
    else
      values << a
    end
  end
  vector=new(values)
  vector.type=:scale if vector.can_be_scale?
  vector
end
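
The dispatch above can be tried without the library. A minimal sketch in plain Ruby, where collect_values is a hypothetical helper (not part of Statsample) that mirrors the Array, Range and fallback branches; the Statsample::Vector branch is left out so the snippet runs standalone:

```ruby
# Hypothetical helper mirroring the argument handling of Vector.[]
def collect_values(*args)
  values = []
  args.each do |a|
    case a
    when Array then values.concat(a.flatten) # nested arrays are flattened
    when Range then values.concat(a.to_a)    # ranges are expanded
    else            values << a              # anything else is appended as-is
    end
  end
  values
end

p collect_values(1, [2, [3, 4]], 5..7)
```

Because every collected value here is Numeric, the real method would go on to mark the resulting vector as :scale.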
new(data=[], type=:nominal, opts=Hash.new)

Creates a new Vector object.

  • data Any data which can be converted to an Array

  • type Level of measurement. See #type

  • opts Hash of options

# File lib/statsample/vector.rb, line 71
def initialize(data=[], type=:nominal, opts=Hash.new)
  @data=data.is_a?(Array) ? data : data.to_a
  @type=type
  opts_default={
    :missing_values=>[],
    :today_values=>['NOW','TODAY', :NOW, :TODAY],
    :labels=>{},
    :name=>nil
  }
  @opts=opts_default.merge(opts)
  if  @opts[:name].nil?
    @@n_table||=0
    @@n_table+=1
    @opts[:name]="Vector #{@@n_table}"
  end
  @missing_values=@opts[:missing_values]
  @labels=@opts[:labels]
  @today_values=@opts[:today_values]
  @name=@opts[:name]
  @valid_data=[]
  @data_with_nils=[]
  @date_data_with_nils=[]
  @missing_data=[]
  @has_missing_data=nil
  @scale_data=nil
  set_valid_data
  self.type=type
end
new_scale(n,val=nil, &block)

Creates a new scale-type vector. Parameters:

n

Size of the vector

val

Value assigned to every element

&block

If a block is provided, it is called with each index to set the values of the vector

# File lib/statsample/vector.rb, line 127
def self.new_scale(n,val=nil, &block)
  if block
    vector=n.times.map {|i| block.call(i)}.to_scale
  else
    vector=n.times.map { val}.to_scale
  end
  vector.type=:scale
  vector
end

Public Instance Methods

*(v)
# File lib/statsample/vector.rb, line 424
def *(v)
  _vector_ari("*",v)
end
+(v)

Vector sum.

  • If v is a scalar, adds this value to all elements

  • If v is an Array or a Vector, it should be the same size as this vector; each item of this vector is added to the item at the same position in the other vector

# File lib/statsample/vector.rb, line 410
def +(v)
_vector_ari("+",v)
end
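
The two cases described for + can be illustrated on plain arrays. This is only a sketch: add_elements is a hypothetical helper, and raising ArgumentError on a size mismatch is an assumption of the sketch, since the body of _vector_ari is not shown here.

```ruby
# Hypothetical helper: scalar or elementwise addition over plain arrays
def add_elements(data, v)
  if v.is_a?(Numeric)
    data.map { |x| x + v }                 # scalar: added to every element
  else
    # Assumption: a size mismatch is treated as an error
    raise ArgumentError, "sizes differ" unless v.size == data.size
    data.zip(v.to_a).map { |x, y| x + y }  # pairwise: same positions are added
  end
end

p add_elements([1, 2, 3], 10)
p add_elements([1, 2, 3], [10, 20, 30])
```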
-(v)

Vector subtraction.

  • If v is a scalar, subtracts this value from all elements

  • If v is an Array or a Vector, it should be the same size as this vector; the item at the same position in the other vector is subtracted from each item of this vector

# File lib/statsample/vector.rb, line 420
def -(v)
_vector_ari("-",v)
end
==(v2)

Vector equality. Two vectors are equal if their data, missing values, type and labels are equal.

# File lib/statsample/vector.rb, line 221
def ==(v2)
  raise TypeError,"Argument should be a Vector" unless v2.instance_of? Statsample::Vector
  @data==v2.data and @missing_values==v2.missing_values and @type==v2.type and @labels==v2.labels
end
[](i)

Retrieves the i-th element of data

# File lib/statsample/vector.rb, line 367
def [](i)
  @data[i]
end
[]=(i,v)

Sets the i-th element of data. Note: call #set_valid_data afterwards if you include missing values

# File lib/statsample/vector.rb, line 372
def []=(i,v)
  @data[i]=v
end
add(v,update_valid=true)

Adds a value at the end of the vector. If the second argument is set to false, you should update the Vector using #set_valid_data at the end of your insertion cycle

# File lib/statsample/vector.rb, line 287
def add(v,update_valid=true)
  @data.push(v)
  set_valid_data if update_valid
end
adp( m = nil )
Alias for: average_deviation_population
average_deviation_population( m = nil )

Population average deviation (denominator N). Author: Al Chou

# File lib/statsample/vector.rb, line 947
def average_deviation_population( m = nil )
  check_type :scale
  m ||= mean
  ( @scale_data.inject( 0 ) { |a, x| ( x - m ).abs + a } ).quo( n_valid )
end
Also aliased as: adp
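
As a worked illustration of the formula, here is a sketch on a plain array. mean_abs_deviation is a hypothetical name, and the sketch assumes the input has no missing values.

```ruby
# Hypothetical helper: mean absolute deviation around m (denominator N)
def mean_abs_deviation(xs, m = nil)
  m ||= xs.sum.to_f / xs.size              # default centre is the mean
  xs.sum { |x| (x - m).abs }.quo(xs.size)  # average distance from the centre
end

p mean_abs_deviation([1, 2, 3, 4, 5])      # deviations 2,1,0,1,2 average to 1.2
```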
bootstrap(estimators, nr, s=nil)

Bootstrap

Generates nr resamples (with replacement) of size s from the vector, computing each estimate in estimators over each resample. estimators could be:

a) A Hash with variable names as keys and lambdas as values

a.bootstrap({:log_s2=>lambda {|v| Math.log(v.variance)}},1000)

b) An Array with the names of the methods to bootstrap

a.bootstrap([:mean, :sd],1000)

c) A single method to bootstrap

a.bootstrap(:mean, 1000)

If s is nil, it defaults to the vector size.

Returns a dataset where each vector has length nr and contains the computed resample estimates.

# File lib/statsample/vector.rb, line 538
def bootstrap(estimators, nr, s=nil)
  s||=n
  
  h_est, es, bss= prepare_bootstrap(estimators)
  
 
  nr.times do |i|
    bs=sample_with_replacement(s)
    es.each do |estimator|          
      # Add bootstrap
      bss[estimator].push(h_est[estimator].call(bs))
    end
  end
  
  es.each do |est|
    bss[est]=bss[est].to_scale
    bss[est].type=:scale
  end
  bss.to_dataset
  
end
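
The resampling loop can be sketched without the library. bootstrap_estimates below is a hypothetical helper that takes a single estimator as a block and a plain array instead of a Vector; the rng parameter is an addition of this sketch so that runs can be made repeatable.

```ruby
# Hypothetical helper: nr bootstrap estimates from a plain array
def bootstrap_estimates(data, nr, s = nil, rng = Random.new)
  s ||= data.size                          # default resample size is n
  Array.new(nr) do
    # draw s items with replacement, each with equal probability
    resample = Array.new(s) { data[rng.rand(data.size)] }
    yield resample                         # apply the estimator
  end
end

data  = [2, 4, 4, 5, 7, 9]
means = bootstrap_estimates(data, 200, nil, Random.new(42)) { |r| r.sum.to_f / r.size }
puts means.size
```

The spread of the resulting estimates is what is normally used as an approximation of the estimator's sampling variability.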
can_be_date?()

Returns true if all data are Dates, “today” values or nil

# File lib/statsample/vector.rb, line 692
def can_be_date?
if @data.find {|v|       
!v.nil? and !v.is_a? Date and !v.is_a? Time and (v.is_a? String and !@today_values.include? v) and (v.is_a? String and !(v=~/\d{4,4}[-\/]\d{1,2}[-\/]\d{1,2}/))}
  false
else
  true
end
end
can_be_scale?()

Returns true if all data are Numeric, nil or declared missing values

# File lib/statsample/vector.rb, line 701
def can_be_scale?
  if @data.find {|v| !v.nil? and !v.is_a? Numeric and !@missing_values.include? v}
    false
  else
    true
  end
end
centered()
Alias for: vector_centered
check_type(t)

Raises an exception if the vector's level of measurement is lower than t

# File lib/statsample/vector.rb, line 150
def check_type(t)
  Statsample::STATSAMPLE__.check_type(self,t)
end
coefficient_of_variation()

Coefficient of variation, calculated with the sample standard deviation

# File lib/statsample/vector.rb, line 1019
def coefficient_of_variation
    check_type :scale
    standard_deviation_sample.quo(mean)
end
Also aliased as: cov
count(x=false) { |i| ... }

Retrieves the number of cases which meet a condition. If a block is given, retrieves the number of items for which the block returns true. If a value is given instead, retrieves the frequency of that value.

# File lib/statsample/vector.rb, line 665
def count(x=false)
if block_given?
  r=@data.inject(0) {|s, i|
    r=yield i
    s+(r ? 1 : 0)
  }
  r.nil? ? 0 : r
else
  frequencies[x].nil? ? 0 : frequencies[x]
end
end
cov()
Alias for: coefficient_of_variation
db_type(dbs='mysql')

Returns the database type for the vector, according to its content

# File lib/statsample/vector.rb, line 679
def db_type(dbs='mysql')
# first, detect any character not number
if @data.find {|v|  v.to_s=~/\d{2,2}-\d{2,2}-\d{4,4}/} or @data.find {|v|  v.to_s=~/\d{4,4}-\d{2,2}-\d{2,2}/}
  return "DATE"
elsif @data.find {|v|  v.to_s=~/[^0-9e.-]/ }
  return "VARCHAR (255)"
elsif @data.find {|v| v.to_s=~/\./}
  return "DOUBLE"
else
  return "INTEGER"
end
end
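
The order of the checks matters: dates first, then any non-numeric character, then a decimal point. A standalone sketch that reuses the same regular expressions on a plain array (guess_db_type is a hypothetical name, and the unused dbs argument is dropped):

```ruby
# Hypothetical helper applying the same regex cascade to a plain array
def guess_db_type(data)
  strs = data.map(&:to_s)
  if strs.any? { |s| s =~ /\d{2}-\d{2}-\d{4}/ || s =~ /\d{4}-\d{2}-\d{2}/ }
    "DATE"                                  # dd-mm-yyyy or yyyy-mm-dd found
  elsif strs.any? { |s| s =~ /[^0-9e.-]/ }
    "VARCHAR (255)"                         # some non-numeric character found
  elsif strs.any? { |s| s =~ /\./ }
    "DOUBLE"                                # numeric with a decimal point
  else
    "INTEGER"
  end
end

puts guess_db_type([1, 2, 3])
puts guess_db_type([1.5, 2])
puts guess_db_type(["2010-01-31", "2010-02-01"])
```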
dichotomize(low=nil)

Dichotomizes the vector into 0 and 1, based on the lowest value. If the parameter is defined, this value and lower values become 0, and higher values become 1.

# File lib/statsample/vector.rb, line 257
def dichotomize(low=nil)
  fs=factors
  low||=factors.min
  @data_with_nils.collect{|x|
    if x.nil?
      nil
    elsif x>low
      1
    else
      0
    end
  }.to_scale
end
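
The cut rule can be checked on a plain array. A small sketch, with dichotomize_values as a hypothetical stand-in that treats nil as the only missing value:

```ruby
# Hypothetical helper: 0 for values <= low, 1 for values > low, nil kept as nil
def dichotomize_values(values, low = nil)
  low ||= values.compact.min               # default cut point is the minimum
  values.map { |x| x.nil? ? nil : (x > low ? 1 : 0) }
end

p dichotomize_values([1, 2, 3, nil, 1])
p dichotomize_values([1, 2, 3, 4], 2)
```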
dup()

Creates a duplicate of the Vector. Note: data, #missing_values and labels are duplicated, so changes to the original vector don't propagate to copies.

# File lib/statsample/vector.rb, line 139
def dup
  Vector.new(@data.dup,@type, :missing_values => @missing_values.dup, :labels => @labels.dup, :name=>@name)
end
dup_empty()

Returns an empty duplicate of the vector. Maintains the type, missing values and labels.

# File lib/statsample/vector.rb, line 144
def dup_empty
  Vector.new([],@type, :missing_values => @missing_values.dup, :labels => @labels.dup, :name=> @name)
end
each() { |x| ... }

Iterates over each item. Equivalent to

@data.each{|x| yield x}
# File lib/statsample/vector.rb, line 273
def each
  @data.each{|x| yield(x) }
end
each_index() { |i| ... }

Iterates over each item, yielding its index

# File lib/statsample/vector.rb, line 278
def each_index
(0...@data.size).each {|i|
  yield(i)
}
end
factors()

Retrieves the unique values of the data, sorted.

# File lib/statsample/vector.rb, line 726
def factors
  if @type==:scale
    @scale_data.uniq.sort
  elsif @type==:date
    @date_data_with_nils.uniq.sort
  else
    @valid_data.uniq.sort
  end
end
flawed?()
Alias for: has_missing_data?
frequencies()

Returns a hash with the distribution of frequencies for the sample

# File lib/statsample/vector.rb, line 738
def frequencies
  Statsample::STATSAMPLE__.frequencies(@valid_data)
end
has_missing_data?()

Returns true if the data has one or more missing values

# File lib/statsample/vector.rb, line 338
def has_missing_data?
  @has_missing_data
end
Also aliased as: flawed?
histogram(bins=10)

With an Integer, creates that many bins within the range of the data. With an Array, each value is used as a cut point.

# File lib/statsample/vector.rb, line 994
def histogram(bins=10)
  check_type :scale
  
  if bins.is_a? Array
    #h=Statsample::Histogram.new(self, bins)
    h=Statsample::Histogram.alloc(bins)                        
  else
    # ugly patch. The upper limit for a bin has the form
    # x < range
    #h=Statsample::Histogram.new(self, bins)
    min,max=Statsample::Util.nice(@valid_data.min,@valid_data.max)
    # fix last data
    if max==@valid_data.max
      max+=1e-10
    end
    h=Statsample::Histogram.alloc(bins,[min,max])
    # Fix last bin

  end
  h.increment(@valid_data)
  h
end
inspect()
# File lib/statsample/vector.rb, line 722
def inspect
  self.to_s
end
is_valid?(x)

Returns true if a value is valid (not nil and not included in the missing values)

# File lib/statsample/vector.rb, line 376
def is_valid?(x)
  !(x.nil? or @missing_values.include? x)
end
jacknife(estimators, k=1)

Jackknife

Returns a dataset with jackknife delete-k estimators. estimators could be:

a) A Hash with variable names as keys and lambdas as values

a.jacknife(:log_s2=>lambda {|v| Math.log(v.variance)})

b) An Array with the names of the methods to jackknife

a.jacknife([:mean, :sd])

c) A single method to jackknife

a.jacknife(:mean)

k represents the block size for the block jackknife. It defaults to 1, the classic delete-one jackknife.

Returns a dataset where each vector has length cases/k and contains the computed jackknife pseudovalues.

Reference:

  • Sawyer, S. (2005). Resampling Data: Using a Statistical Jacknife.

# File lib/statsample/vector.rb, line 577
def jacknife(estimators, k=1)
  raise "n should be divisible by k:#{k}" unless n%k==0
  
  nb=(n / k).to_i
  
  
  h_est, es, ps= prepare_bootstrap(estimators)

  est_n=es.inject({}) {|h,v|
    h[v]=h_est[v].call(self)
    h
  }
  
  
  nb.times do |i|
    other=@data_with_nils.dup
    other.slice!(i*k,k)
    other=other.to_scale
    es.each do |estimator|
      # Add pseudovalue
      ps[estimator].push( nb * est_n[estimator] - (nb-1) * h_est[estimator].call(other))
    end
  end
  
  
  es.each do |est|
    ps[est]=ps[est].to_scale
    ps[est].type=:scale
  end
  ps.to_dataset
end
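
The pseudovalue formula used in the loop, nb * full_estimate - (nb - 1) * estimate_without_block, can be sketched on a plain array. jackknife_pseudovalues is a hypothetical helper taking one estimator as a block; it is not the library method.

```ruby
# Hypothetical helper: delete-k jackknife pseudovalues for a plain array
def jackknife_pseudovalues(data, k = 1)
  raise ArgumentError, "n should be divisible by k" unless (data.size % k).zero?
  nb   = data.size / k                     # number of blocks
  full = yield data                        # estimate on the complete data
  Array.new(nb) do |i|
    rest = data.dup
    rest.slice!(i * k, k)                  # drop block i
    nb * full - (nb - 1) * (yield rest)    # pseudovalue for block i
  end
end

ps = jackknife_pseudovalues([1.0, 2.0, 3.0, 4.0]) { |xs| xs.sum / xs.size }
p ps
```

For the mean with k=1, each pseudovalue reduces to the deleted observation itself, which makes a convenient sanity check.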
kurtosis(m=nil)

Kurtosis of the sample

# File lib/statsample/vector.rb, line 978
def kurtosis(m=nil)
    check_type :scale
    m||=mean
    fo=@scale_data.inject(0){|a,x| a+((x-m)**4)}
    fo.quo((@scale_data.size)*sd(m)**4)-3
    
end
label(x)
Alias for: labeling
labeling(x)

Retrieves label for value x. Retrieves x if no label defined.

# File lib/statsample/vector.rb, line 345
def labeling(x)
  @labels.has_key?(x) ? @labels[x].to_s : x.to_s
end
Also aliased as: label
mad()
Alias for: median_absolute_deviation
max()

Maximum value

# File lib/statsample/vector.rb, line 864
def max
  check_type :ordinal
  @valid_data.max
end
mean()

The arithmetic mean of the data

# File lib/statsample/vector.rb, line 910
def mean
  check_type :scale
  sum.to_f.quo(n_valid)
end
median()

Returns the median (percentile 50)

# File lib/statsample/vector.rb, line 854
def median
  check_type :ordinal
  percentil(50)
end
median_absolute_deviation()
# File lib/statsample/vector.rb, line 952
def median_absolute_deviation
  med=median
  recode {|x| (x-med).abs}.median
end
Also aliased as: mad
min()

Minimum value

# File lib/statsample/vector.rb, line 859
def min 
  check_type :ordinal
  @valid_data.min
end
missing_values=(vals)

Sets missing_values. #set_valid_data is called after the change

# File lib/statsample/vector.rb, line 381
def missing_values=(vals)
  @missing_values = vals
  set_valid_data
end
mode()

Returns the most frequent item.

# File lib/statsample/vector.rb, line 757
def mode
  frequencies.max{|a,b| a[1]<=>b[1]}.first
end
n()
Alias for: size
n_valid()

The number of items with valid data.

# File lib/statsample/vector.rb, line 761
def n_valid
  @valid_data.size
end
percentil(q)

Returns the value at percentile q

# File lib/statsample/vector.rb, line 832
def percentil(q)
  check_type :ordinal
  sorted=@valid_data.sort
  v= (n_valid * q).quo(100)
  if(v.to_i!=v)
    sorted[v.to_i]
  else
    (sorted[(v-0.5).to_i].to_f + sorted[(v+0.5).to_i]).quo(2)
  end
end
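
The position rule can be followed by hand: the position is n*q/100; if it is not a whole number the value at its integer part is taken, otherwise the two neighbouring values are averaged. A sketch on a plain array, with percentile_of as a hypothetical stand-in that skips the type check:

```ruby
# Hypothetical helper reproducing the position rule on a plain array
def percentile_of(values, q)
  sorted = values.sort
  v = (sorted.size * q).quo(100)           # exact (Rational) position
  if v.to_i != v
    sorted[v.to_i]                         # fractional position: take that item
  else
    # whole position: average the items on both sides of it
    (sorted[(v - 0.5).to_i].to_f + sorted[(v + 0.5).to_i]).quo(2)
  end
end

p percentile_of([4, 1, 3, 2], 50)          # position 2: average of 2 and 3
p percentile_of([5, 1, 4, 2, 3], 50)       # position 2.5: third sorted value
```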
product()

Product of all values in the sample

# File lib/statsample/vector.rb, line 987
def product
    check_type :scale
    @scale_data.inject(1){|a,x| a*x }
end
proportion(v=1)

Proportion of a given value.

# File lib/statsample/vector.rb, line 773
def proportion(v=1)
    frequencies[v].quo(@valid_data.size)
end
proportion_confidence_interval_t(n_poblation,margin=0.95,v=1)
# File lib/statsample/vector.rb, line 813
def proportion_confidence_interval_t(n_poblation,margin=0.95,v=1)
  Statsample::proportion_confidence_interval_t(proportion(v), @valid_data.size, n_poblation, margin)
end
proportion_confidence_interval_z(n_poblation,margin=0.95,v=1)
# File lib/statsample/vector.rb, line 816
def proportion_confidence_interval_z(n_poblation,margin=0.95,v=1)
  Statsample::proportion_confidence_interval_z(proportion(v), @valid_data.size, n_poblation, margin)
end
proportions()

Returns a hash with the distribution of proportions of the sample.

# File lib/statsample/vector.rb, line 766
def proportions
    frequencies.inject({}){|a,v|
        a[v[0]] = v[1].quo(n_valid)
        a
    }
end
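
A sketch of the same computation on a plain array: count each valid value, then divide by the number of valid items. proportions_of is a hypothetical helper that treats nil as the only missing value.

```ruby
# Hypothetical helper: value => share of valid cases, as exact Rationals
def proportions_of(values)
  valid = values.compact
  freqs = valid.each_with_object(Hash.new(0)) { |x, h| h[x] += 1 }
  freqs.transform_values { |count| count.quo(valid.size) }
end

p proportions_of([1, 1, 2, nil])
```

Using quo keeps the shares exact, so they add up to exactly 1.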
push(v)
# File lib/statsample/vector.rb, line 250
def push(v)
  @data.push(v)
  set_valid_data
end
range()

The range of the data (max - min)

# File lib/statsample/vector.rb, line 900
def range; 
  check_type :scale
  @scale_data.max - @scale_data.min
end
ranked(type=:ordinal)

Returns a ranked vector.

# File lib/statsample/vector.rb, line 843
def ranked(type=:ordinal)
  check_type :ordinal
  i=0
  r=frequencies.sort.inject({}){|a,v|
    a[v[0]]=(i+1 + i+v[1]).quo(2)
    i+=v[1]
    a
  }
  @data.collect {|c| r[c] }.to_vector(type)
end
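
The loop above assigns tied values the average of the ranks they span. A standalone sketch of that rule on a plain array (average_ranks is a hypothetical name):

```ruby
# Hypothetical helper: ranks with ties sharing their average rank
def average_ranks(values)
  freqs = values.each_with_object(Hash.new(0)) { |x, h| h[x] += 1 }
  i = 0                                    # items already ranked
  rank_of = {}
  freqs.sort.each do |value, count|
    # ranks i+1 .. i+count are shared; store their midpoint
    rank_of[value] = (i + 1 + i + count).quo(2)
    i += count
  end
  values.map { |x| rank_of[x] }
end

p average_ranks([10, 20, 20, 30]).map(&:to_f)
```

Here the two 20s would occupy ranks 2 and 3, so both receive 2.5.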
recode(type=nil) { |x| ... }

Returns a new vector, with data modified by the block. Equivalent to creating a Vector from the result of collect on data

# File lib/statsample/vector.rb, line 236
def recode(type=nil)
  type||=@type
  @data.collect{|x|
    yield x
  }.to_vector(type)
end
recode!() { |x| ... }

Modifies the current vector, with data modified by the block. Equivalent to collect! on @data

# File lib/statsample/vector.rb, line 244
def recode!
@data.collect!{|x|
  yield x
}
set_valid_data
end
report_building(b)
# File lib/statsample/vector.rb, line 776
def report_building(b)
  b.section(:name=>name) do |s|
    s.text _("n :%d") % n        
    s.text _("n valid:%d") % n_valid
    if @type==:nominal
      s.text  _("factors:%s") % factors.join(",") 
      s.text   _("mode: %s") % mode 
      
      s.table(:name=>_("Distribution")) do |t|
        frequencies.sort.each do |k,v|
          key=labels.has_key?(k) ? labels[k]:k
          t.row [key, v , ("%0.2f%%" % (v.quo(n_valid)*100))]
        end
      end
    end
    
    s.text _("median: %s") % median.to_s if(@type==:ordinal or @type==:scale)
    if(@type==:scale)
      s.text _("mean: %0.4f") % mean
      if sd
        s.text _("std.dev.: %0.4f") % sd
        s.text _("std.err.: %0.4f") % se
        s.text _("skew: %0.4f") % skew
        s.text _("kurtosis: %0.4f") % kurtosis
      end
    end
  end
end
sample_with_replacement(sample=1)

Returns a random sample of the given size, with replacement, drawn only from valid data.

In every trial, every item has the same probability of being selected.

# File lib/statsample/vector.rb, line 639
def sample_with_replacement(sample=1)
  vds=@valid_data.size
  (0...sample).collect{ @valid_data[rand(vds)] }
end
sample_without_replacement(sample=1)

Returns a random sample of the given size, without replacement, drawn only from valid data.

Every element can be selected only once.

A sample of the same size as the vector is the vector itself.

# File lib/statsample/vector.rb, line 650
def sample_without_replacement(sample=1)
  raise ArgumentError, "Sample size couldn't be greater than n" if sample>@valid_data.size
  out=[]
  size=@valid_data.size
  while out.size<sample
    value=rand(size)
    out.push(value) if !out.include?value
  end
  out.collect{|i| @data[i]}
end
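sd(m=nil)
Alias for: standard_deviation_sample
sdp(m=nil)
Alias for: standard_deviation_population
sds(m=nil)
Alias for: standard_deviation_sample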
se()
Alias for: standard_error
set_valid_data()

Updates #valid_data, #missing_data, #data_with_nils and gsl at the end of an insertion.

Use after #add(v,false). Usage:

v=Statsample::Vector.new
v.add(2,false)
v.add(4,false)
v.data
=> [2,4]
v.valid_data
=> []
v.set_valid_data
v.valid_data
=> [2,4]
# File lib/statsample/vector.rb, line 306
def set_valid_data
  @valid_data.clear
  @missing_data.clear
  @data_with_nils.clear
  @date_data_with_nils.clear
  set_valid_data_intern
  set_scale_data if(@type==:scale)
  set_date_data if(@type==:date)
end
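
The body of set_valid_data_intern is not shown on this page. Going only by the attribute descriptions and by is_valid?, the split it performs can be sketched as follows; partition_valid is a hypothetical helper and the exact behaviour of the library may differ.

```ruby
# Hypothetical helper: split data into valid and missing parts,
# following the rule in is_valid? (nil or listed in missing_values)
def partition_valid(data, missing_values = [])
  is_missing = lambda { |x| x.nil? || missing_values.include?(x) }
  {
    valid:     data.reject(&is_missing),                     # like valid_data
    missing:   data.select(&is_missing),                     # like missing_data
    with_nils: data.map { |x| is_missing.call(x) ? nil : x } # like data_with_nils
  }
end

p partition_valid([1, nil, 99, 2], [99])
```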
size()

Size of total data

# File lib/statsample/vector.rb, line 361
def size
  @data.size
end
Also aliased as: n
skew(m=nil)

Skewness of the sample

# File lib/statsample/vector.rb, line 971
def skew(m=nil)
    check_type :scale
    m||=mean
    th=@scale_data.inject(0){|a,x| a+((x-m)**3)}
    th.quo((@scale_data.size)*sd(m)**3)
end
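
Both skew and kurtosis follow one pattern in the listings on this page: the sum of powered deviations divided by n times the sample sd raised to the same power, with 3 subtracted for kurtosis. A sketch on plain arrays; all helper names are hypothetical and the input is assumed to have no missing values.

```ruby
# Hypothetical helpers reproducing the skew and kurtosis formulas
def sample_sd_of(xs)
  m = xs.sum.to_f / xs.size
  Math.sqrt(xs.sum { |x| (x - m)**2 } / (xs.size - 1))  # denominator n-1
end

def standardized_moment(xs, power)
  m = xs.sum.to_f / xs.size
  xs.sum { |x| (x - m)**power } / (xs.size * sample_sd_of(xs)**power)
end

def skew_of(xs)
  standardized_moment(xs, 3)
end

def kurtosis_of(xs)
  standardized_moment(xs, 4) - 3           # excess kurtosis
end

puts skew_of([1, 2, 3, 4, 5])              # symmetric data gives 0.0
puts kurtosis_of([1, 2, 3, 4, 5])
```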
split_by_separator(sep=Statsample::SPLIT_TOKEN)

Returns a hash of Vectors, one for each distinct value found after splitting the fields. Each vector holds 1 for the rows that contain the value and 0 for the rows that don't. Example:

a=Vector.new(["a,b","c,d","a,b"])
a.split_by_separator
=>  {"a"=>Vector(type:nominal, n:3)[1,0,1],
     "b"=>Vector(type:nominal, n:3)[1,0,1],
     "c"=>Vector(type:nominal, n:3)[0,1,0],
     "d"=>Vector(type:nominal, n:3)[0,1,0]}
# File lib/statsample/vector.rb, line 493
def split_by_separator(sep=Statsample::SPLIT_TOKEN)
split_data=splitted(sep)
factors=split_data.flatten.uniq.compact
out=factors.inject({}) {|a,x|
  a[x]=[]
  a
}
split_data.each do |r|
  if r.nil?
    factors.each do |f|
      out[f].push(nil)
    end
  else
    factors.each do |f|
      out[f].push(r.include?(f) ? 1:0) 
    end
  end
end
out.inject({}){|s,v|
  s[v[0]]=Vector.new(v[1],:nominal)
  s
}
end
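
The indicator construction can be reproduced on plain arrays. In this sketch split_indicators is a hypothetical helper, and the "," default separator is an assumption based on the usage example for this method; the value of Statsample::SPLIT_TOKEN is not shown on this page.

```ruby
# Hypothetical helper: one 0/1 indicator array per distinct token
def split_indicators(data, sep = ",")      # assumed separator
  rows    = data.map { |x| x.nil? ? nil : x.split(sep) }
  factors = rows.flatten.uniq.compact      # distinct tokens, nils dropped
  factors.each_with_object({}) do |f, out|
    # nil rows stay nil; other rows get 1 if they contain the token
    out[f] = rows.map { |r| r.nil? ? nil : (r.include?(f) ? 1 : 0) }
  end
end

p split_indicators(["a,b", "c,d", "a,b"])
```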
split_by_separator_freq(sep=Statsample::SPLIT_TOKEN)
# File lib/statsample/vector.rb, line 516
def split_by_separator_freq(sep=Statsample::SPLIT_TOKEN)
  split_by_separator(sep).inject({}) {|a,v|
    a[v[0]]=v[1].inject {|s,x| s+x.to_i}
    a
  }
end
splitted(sep=Statsample::SPLIT_TOKEN)

Returns an array with the data split by a separator.

a=Vector.new(["a,b","c,d","a,b","d"])
a.splitted
  =>
[["a","b"],["c","d"],["a","b"],["d"]]
# File lib/statsample/vector.rb, line 469
def splitted(sep=Statsample::SPLIT_TOKEN)
@data.collect{|x|
  if x.nil?
    nil
  elsif (x.respond_to? :split)
    x.split(sep)
  else
    [x]
  end
}
end
ss(m=nil)
Alias for: sum_of_squares
standard_deviation_population(m=nil)

Population Standard deviation (denominator N)

# File lib/statsample/vector.rb, line 939
def standard_deviation_population(m=nil)
  check_type :scale
  Math::sqrt( variance_population(m) )
end
Also aliased as: sdp
standard_deviation_sample(m=nil)

Sample Standard deviation (denominator n-1)

# File lib/statsample/vector.rb, line 965
def standard_deviation_sample(m=nil)
    check_type :scale
    m||=mean
    Math::sqrt(variance_sample(m))
end
Also aliased as: sds, sd
standard_error()

Standard error of the mean, calculated as sd/sqrt(n)

# File lib/statsample/vector.rb, line 1025
def standard_error
  standard_deviation_sample.quo(Math.sqrt(valid_data.size))
end
Also aliased as: se
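
A worked sketch of sd/sqrt(n) on a plain array, where n counts only valid items as in the listing above. standard_error_of is a hypothetical helper and nil is the only value treated as missing.

```ruby
# Hypothetical helper: sample sd divided by the square root of valid n
def standard_error_of(xs)
  valid = xs.compact
  m  = valid.sum.to_f / valid.size
  sd = Math.sqrt(valid.sum { |x| (x - m)**2 } / (valid.size - 1))
  sd / Math.sqrt(valid.size)
end

puts standard_error_of([1, 2, 3, 4, 5])    # sqrt(2.5) / sqrt(5)
```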
standarized(use_population=false)
Alias for: vector_standarized
sum()

The sum of values for the data

# File lib/statsample/vector.rb, line 905
def sum
  check_type :scale
  @scale_data.inject(0){|a,x|x+a} ; 
end
sum_of_squared_deviation()

Sum of squared deviation

# File lib/statsample/vector.rb, line 924
def sum_of_squared_deviation
  check_type :scale
  @scale_data.inject(0) {|a,x| x.square+a} - (sum.square.quo(n_valid))
end
sum_of_squares(m=nil)

Sum of squares for the data around a value. By default, this value is the mean

ss= sum{(xi-m)^2}
# File lib/statsample/vector.rb, line 918
def sum_of_squares(m=nil)
  check_type :scale
  m||=mean
  @scale_data.inject(0){|a,x| a+(x-m).square}
end
Also aliased as: ss
to_REXP()
# File lib/statsample/rserve_extension.rb, line 6
def to_REXP
  Rserve::REXP::Wrapper.wrap(data_with_nils)
end
to_a()
# File lib/statsample/vector.rb, line 396
def to_a
  if @data.is_a? Array
    @data.dup
  else
    @data.to_a
  end
end
Also aliased as: to_ary
to_ary()
Alias for: to_a
to_matrix(dir=:horizontal)

Ugly name. Really, creates a single-row or single-column Matrix from the standard 'matrix' library. dir could be :horizontal or :vertical

# File lib/statsample/vector.rb, line 714
def to_matrix(dir=:horizontal)
  case dir
  when :horizontal
    Matrix[@data]
  when :vertical
    Matrix.columns([@data])
  end
end
to_s()
# File lib/statsample/vector.rb, line 709
def to_s
  sprintf("Vector(type:%s, n:%d)[%s]",@type.to_s,@data.size, @data.collect{|d| d.nil? ? "nil":d}.join(","))
end
today_values=(vals)

Sets the values considered as “today” on date vectors

# File lib/statsample/vector.rb, line 386
def today_values=(vals)
  @today_values = vals
  set_valid_data
end
type=(t)

Set level of measurement.

# File lib/statsample/vector.rb, line 391
def type=(t)
  @type=t   
  set_scale_data if(t==:scale)
  set_date_data if (t==:date)
end
variance(m=nil)
Alias for: variance_sample
variance_population(m=nil)

Population variance (denominator N)

# File lib/statsample/vector.rb, line 930
def variance_population(m=nil)
  check_type :scale
  m||=mean
  squares=@scale_data.inject(0){|a,x| x.square+a}
  squares.quo(n_valid) - m.square
end
variance_proportion(n_poblation, v=1)

Variance of p, according to population size

# File lib/statsample/vector.rb, line 806
def variance_proportion(n_poblation, v=1)
  Statsample::proportion_variance_sample(self.proportion(v), @valid_data.size, n_poblation)
end
variance_sample(m=nil)

Sample Variance (denominator n-1)

# File lib/statsample/vector.rb, line 958
def variance_sample(m=nil)
  check_type :scale
  m||=mean
  sum_of_squares(m).quo(n_valid - 1)
end
Also aliased as: variance
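
The difference between the two variances is only the denominator. A sketch on a plain array; the helper names are hypothetical, and both variants are written here in the sum-of-squares form, which is algebraically the same as the mean-of-squares form used by variance_population when m is the mean.

```ruby
# Hypothetical helpers: sum of squared deviations and the two variances
def squared_deviations(xs, m = nil)
  m ||= xs.sum.to_f / xs.size
  xs.sum { |x| (x - m)**2 }
end

def sample_variance_of(xs)
  squared_deviations(xs) / (xs.size - 1)   # denominator n-1
end

def population_variance_of(xs)
  squared_deviations(xs) / xs.size         # denominator N
end

data = [2, 4, 4, 4, 5, 5, 7, 9]            # mean 5, squared deviations sum to 32
puts population_variance_of(data)
puts sample_variance_of(data)
```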
variance_total(n_poblation, v=1)

Variance of p, according to population size

# File lib/statsample/vector.rb, line 810
def variance_total(n_poblation, v=1)
  Statsample::total_variance_sample(self.proportion(v), @valid_data.size, n_poblation)
end
vector_centered()

Returns a centered vector

# File lib/statsample/vector.rb, line 184
def vector_centered
  check_type :scale
  m=mean
  return ([nil]*size).to_scale if mean.nil?
  vector=vector_centered_compute(m)
  vector.name=_("%s(centered)") % @name
  vector
end
Also aliased as: centered
vector_labeled()

Returns a Vector in which every value that has a label is replaced by that label.

# File lib/statsample/vector.rb, line 350
def vector_labeled
  d=@data.collect{|x|
    if @labels.has_key? x
      @labels[x]
    else
      x
    end
  }
  Vector.new(d,@type)
end
vector_percentil()

Returns a vector in which each value is replaced by its percentile

# File lib/statsample/vector.rb, line 197
def vector_percentil
  check_type :ordinal
  c=@valid_data.size
  vector=ranked.map {|i| i.nil? ? nil : (i.quo(c)*100).to_f }.to_vector(@type)
  vector.name=_("%s(percentil)")  % @name
  vector
end
vector_standarized(use_population=false)

Returns a vector with the standardized values of the data, using the sd with denominator n-1 by default (the population sd if use_population is true). If the sd is 0 or the mean is nil, returns a vector of equal size full of nils

# File lib/statsample/vector.rb, line 171
def vector_standarized(use_population=false)
  check_type :scale
  m=mean
  sd=use_population ? sdp : sds
  return ([nil]*size).to_scale if mean.nil? or sd==0.0 
  vector=vector_standarized_compute(m,sd)
  vector.name=_("%s(standarized)")  % @name
  vector
end
Also aliased as: standarized
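
vector_standarized_compute is not shown on this page, so the following is only a sketch of the usual z-score computation under the rules stated above. standardize_values is a hypothetical helper; keeping nil entries as nil and returning nils when fewer than two valid values exist are assumptions of the sketch.

```ruby
# Hypothetical helper: z-scores for a plain array, nils kept in place
def standardize_values(xs, use_population = false)
  valid = xs.compact
  return [nil] * xs.size if valid.empty?
  m     = valid.sum.to_f / valid.size
  denom = use_population ? valid.size : valid.size - 1
  sd    = Math.sqrt(valid.sum { |x| (x - m)**2 } / denom)
  # zero spread (or an undefined sd) cannot be standardized
  return [nil] * xs.size if sd.nan? || sd.zero?
  xs.map { |x| x.nil? ? nil : (x - m) / sd }
end

p standardize_values([1, 2, 3])
p standardize_values([5, 5, 5])
```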
verify() { |data| ... }

Reports all values that don't comply with a condition. Returns a hash with the index of each offending item as key and the invalid value as value.

# File lib/statsample/vector.rb, line 429
def verify
h={}
(0...@data.size).to_a.each{|i|
  if !(yield @data[i])
    h[i]=@data[i]
  end
}
h
end
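
The returned hash is easiest to see with a small case. A sketch on a plain array, with failing_items as a hypothetical stand-in for the method above:

```ruby
# Hypothetical helper: index => value for every item that fails the block
def failing_items(data)
  failures = {}
  data.each_with_index { |x, i| failures[i] = x unless yield(x) }
  failures
end

p failing_items([1, -2, 3, -4]) { |x| x > 0 }
```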