Statsample¶ ↑
Homepage :: github.com/clbustos/statsample
DESCRIPTION¶ ↑
A suite for basic and advanced statistics on Ruby. Tested on Ruby 2.1.1p76 (June 2014), 1.8.7, 1.9.1, 1.9.2 (April, 2010), ruby-head(June, 2011) and JRuby 1.4 (Ruby 1.8.7 compatible).
Include: * Descriptive statistics: frequencies, median, mean, standard
error, skew, kurtosis (and many others). * Imports and exports datasets
from and to Excel, CSV and plain text files. * Correlations: Pearson's
r, Spearman's rank correlation (rho), point biserial, tau a, tau b and
gamma. Tetrachoric and Polychoric correlation provides by
statsample-bivariate-extension gem. * Intra-class correlation
* Anova: generic and vector-based One-way ANOVA and Two-way ANOVA, with
contrasts for One-way ANOVA. * Tests: F, T, Levene, U-Mannwhitney. *
Regression: Simple, Multiple (OLS), Probit and Logit * Factorial Analysis:
Extraction (PCA and Principal Axis), Rotation (Varimax, Equimax, Quartimax)
and Parallel Analysis and Velicer's MAP test, for estimation of number
of factors. * Reliability analysis for simple scale and a DSL to easily
analyze multiple scales using factor analysis and correlations, if you want
it. * Basic time series support * Dominance Analysis, with multivariate
dependent and bootstrap (Azen & Budescu) * Sample calculation related
formulas * Structural Equation Modeling (SEM), using R libraries
sem and OpenMx * Creates reports on text, html
and rtf, using ReportBuilder gem * Graphics: Histogram, Boxplot and
Scatterplot
Principles¶ ↑
- 
Software Design: 
- 
One module/class for each type of analysis 
- 
Options can be set as hash on initialize() or as setters methods 
- 
Clean API for interactive sessions 
- 
summary() returns all necessary informacion for interactive sessions 
- 
All statistical data available though methods on objects 
- 
All (important) methods should be tested. Better with random data. 
- 
Statistical Design 
- 
Results are tested against text results, SPSS and R outputs. 
- 
Go beyond Null Hiphotesis Testing, using confidence intervals and effect sizes when possible 
- 
(When possible) All references for methods are documented, providing sensible information on documentation 
Features¶ ↑
- 
Classes for manipulation and storage of data: 
- 
Statsample::Vector: An extension of an array, with statistical methods like sum, mean and standard deviation 
- 
Statsample::Dataset: a group of Statsample::Vector, analog to a excel spreadsheet or a dataframe on R. The base of almost all operations on statsample. 
- 
Statsample::Multiset: multiple datasets with same fields and type of vectors 
- 
Anova module provides generic Statsample::Anova::OneWay and vector based Statsample::Anova::OneWayWithVectors. Also you can create contrast using Statsample::Anova::Contrast 
- 
Module Statsample::Bivariate provides covariance and pearson, spearman, point biserial, tau a, tau b, gamma, tetrachoric (see Bivariate::Tetrachoric) and polychoric (see Bivariate::Polychoric) correlations. Include methods to create correlation and covariance matrices 
- 
Multiple types of regression. 
- 
Simple Regression : Statsample::Regression::Simple 
- 
Multiple Regression: Statsample::Regression::Multiple 
- 
Logit Regression: Statsample::Regression::Binomial::Logit 
- 
Probit Regression: Statsample::Regression::Binomial::Probit 
- 
Factorial Analysis algorithms on Statsample::Factor module. 
- 
Classes for Extraction of factors: 
- 
Classes for Rotation of factors: 
- 
Classes for calculation of factors to retain - 
Statsample::Factor::ParallelAnalysis performs Horn's 'parallel analysis' to a principal components analysis to adjust for sample bias in the retention of components. 
- 
Statsample::Factor::MAP performs Velicer's Minimum Average Partial (MAP) test, which retain components as long as the variance in the correlation matrix represents systematic variance. 
 
- 
- 
Dominance Analysis. Based on Budescu and Azen papers, dominance analysis is a method to analyze the relative importance of one predictor relative to another on multiple regression 
- 
Statsample::DominanceAnalysis class can report dominance analysis for a sample, using uni or multivariate dependent variables 
- 
Statsample::DominanceAnalysis::Bootstrap can execute bootstrap analysis to determine dominance stability, as recomended by Azen & Budescu (2003) link. 
- 
Module Statsample::Codification, to help to codify open questions 
- 
Converters to import and export data: 
- 
Statsample::Database : Can create sql to create tables, read and insert data 
- 
Statsample::CSV : Read and write CSV files 
- 
Statsample::Excel : Read and write Excel files 
- 
Statsample::Mx : Write Mx Files 
- 
Statsample::GGobi : Write Ggobi files 
- 
Module Statsample::Crosstab provides function to create crosstab for categorical data 
- 
Module Statsample::Reliability provides functions to analyze scales with psychometric methods. 
- 
Class Statsample::Reliability::ScaleAnalysis provides statistics like mean, standard deviation for a scale, Cronbach's alpha and standarized Cronbach's alpha, and for each item: mean, correlation with total scale, mean if deleted, Cronbach's alpha is deleted. 
- 
Class Statsample::Reliability::MultiScaleAnalysis provides a DSL to easily analyze reliability of multiple scales and retrieve correlation matrix and factor analysis of them. 
- 
Class Statsample::Reliability::ICC provides intra-class correlation, using Shrout & Fleiss(1979) and McGraw & Wong (1996) formulations. 
- 
Module Statsample::SRS (Simple Random Sampling) provides a lot of functions to estimate standard error for several type of samples 
- 
Module Statsample::Test provides several methods and classes to perform inferencial statistics 
- 
Statsample::Test::ChiSquare 
- 
Statsample::Test::KolmogorovSmirnov (only D value) 
- 
Module Graph provides several classes to create beautiful graphs using rubyvis 
- 
Gem +bio-statsample-timeseries- provides module Statsample::TimeSeries with support for time series, including ARIMA estimation using Kalman-Filter. 
- 
Gem statsample-semprovides a DSL to R librariessemandOpenMx
- 
Close integration with gem reportbuilder, to easily create reports on text, html and rtf formats.
Examples of use:¶ ↑
See the examples folder too.
Boxplot¶ ↑
require 'statsample' ss_analysis(Statsample::Graph::Boxplot) do n=30 a=rnorm(n-1,50,10) b=rnorm(n, 30,5) c=rnorm(n,5,1) a.push(2) boxplot(:vectors=>[a,b,c], :width=>300, :height=>300, :groups=>%w{first first second}, :minimum=>0) end Statsample::Analysis.run # Open svg file on *nix application defined
Correlation matrix¶ ↑
require 'statsample' # Note R like generation of random gaussian variable # and correlation matrix ss_analysis("Statsample::Bivariate.correlation_matrix") do samples=1000 ds=data_frame( 'a'=>rnorm(samples), 'b'=>rnorm(samples), 'c'=>rnorm(samples), 'd'=>rnorm(samples)) cm=cor(ds) summary(cm) end Statsample::Analysis.run_batch # Echo output to console
Requirements¶ ↑
Optional:
- 
Plotting: gnuplot and rbgnuplot, SVG::Graph 
- 
Factorial analysis and polychorical correlation(joint estimate and polychoric series): gsl library and rb-gsl (rubygems.org/gems/rb-gsl/). You should install it using gem install rb-gsl.
Note: Use gsl 1.12.109 or later.
Resources¶ ↑
- 
Source code on github :: github.com/clbustos/statsample 
- 
Docs :: rubydoc.info/gems/statsample/ 
- 
Bug report and feature request :: github.com/clbustos/statsample/issues 
- 
E-mailing list :: groups.google.com/group/statsample 
Installation¶ ↑
$ sudo gem install statsample
On *nix, you should install statsample-optimization to retrieve gems gsl, statistics2 and a C extension to speed some methods.
There are available precompiled version for Ruby 1.9 on x86, x86_64 and mingw32 archs.
$ sudo gem install statsample-optimization
If you use Ruby 1.8, you should compile statsample-optimization, usign
parameter --platform ruby
$ sudo gem install statsample-optimization --platform ruby
If you need to work on Structural Equation Modeling, you could see
statsample-sem. You need R with sem or
OpenMx [openmx.psyc.virginia.edu/]
libraries installed
$ sudo gem install statsample-sem
Available setup.rb file
sudo gem ruby setup.rb
License¶ ↑
GPL-2 (See LICENSE.txt)