Title: | Regression and Classification Tools |
---|---|
Description: | Tools for linear, nonlinear and nonparametric regression and classification. Novel graphical methods for assessment of parametric models using nonparametric methods. One vs. All and All vs. All multiclass classification, with optional adjustment of class probabilities. Nonparametric regression (k-NN) for general dimension, with a local-linear option. Nonlinear regression with the Eicker-White method for dealing with heteroscedasticity. Utilities for converting time series to rectangular form. Utilities for conversion between factors and indicator variables. Some code related to "Statistical Regression and Classification: from Linear Models to Machine Learning", N. Matloff, 2017, CRC, ISBN 9781498710916. |
Authors: | Norm Matloff [aut, cre] |
Maintainer: | Norm Matloff <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.7.3 |
Built: | 2025-01-21 21:30:14 UTC |
Source: | https://github.com/matloff/regtools |
This package provides a broad collection of functions useful for regression and classification analysis, and machine learning.
Parametric modeling:
nonlinear regression: nlshc
ridge regression: ridgelm, plot.rlm
missing values (also see our toweranNA package): lmac, makeNA, coef.lmac, vcov.lmac, pcac
Diagnostic plots:
regression diagnostics: parvsnonparplot, nonparvsxplot, nonparvarplot
other: boundaryplot, nonparvsxplot
Classification:
unbalanced data: classadjust (see UnbalancedClasses.md)
All vs. All: avalogtrn, avalogpred
k-NN reweighting: exploreExpVars, plotExpVars, knnFineTune
Machine learning (also see qeML package):
k-NN: kNN, kmin, knnest, knntrn, preprocessx, meany, vary, loclin, predict.knn, pwplot, bestKperPoint, knnFineTune
neural networks: krsFit, multCol
advanced grid search: fineTuning, fineTuningPar, plot.tuner, knnFineTune
loss: l1, l2, MAPE, ROC
Utilities for dummies and R factors:
conversion between factors and dummies: dummiesToFactor, dummiesToInt, factorsToDummies, factorToDummies, factorTo012etc, hasFactors, charsToFactors, makeAllNumeric
dealing with superset and subsets of factors: toSuperFactor, toSubFactor
Statistics:
mm
Matrix:
multCols, constCols
Time series:
convert time series to rectangular form: TStoX
Text processing:
textToXY
Misc.:
scaling: mmscale, unscale
data frames: catDFRow, tbltofakedf
R: getNamedArgs, ulist
discretize
The data are in the form of an R list. Each element of the list corresponds to one offering of the course. Fields are: Class level; major (two different computer science majors, LCSI in Letters and Science and ECSE in engineering); quiz grade average (scale of 4.0, A+ counting as 4.3); homework grade average (same scale); and course letter grade.
From Wai Mun Fong and Sam Ouliaris, "Spectral Tests of the Martingale Hypothesis for Exchange Rates", Journal of Applied Econometrics, Vol. 10, No. 3, 1995, pp. 255-271. Weekly exchange rates against US dollar, over the period 7 August 1974 to 29 March 1989.
This is the Bike Sharing dataset (day records only) from the UC Irvine Machine Learning Dataset Repository. Included here with permission of Dr. Hadi Fanaee.
The day data is as on UCI; day1 is modified so that the numeric weather variables are on their original scale. The day2 data is the same as day1, except that dteday has been removed, and season, mnth, weekday and weathersit have been converted to R factors.
See https://archive.ics.uci.edu/ml/datasets/bike+sharing+dataset for details.
The Stanford WordBank data on vocabulary acquisition in young children. The file consists of about 5500 rows. (There are many NA values, though, and only about 2800 complete cases.) Variables are age, birth order, sex, mother's education and vocabulary size.
Utilities for converting back and forth between factors and dummy variables.
xyDataframeToMatrix(xy)
dummiesToInt(dms,inclLast=FALSE)
factorToDummies(f,fname,omitLast=FALSE,factorInfo=NULL)
factorsToDummies(dfr,omitLast=FALSE,factorsInfo=NULL,dfOut=FALSE)
dummiesToFactor(dms,inclLast=FALSE)
charsToFactors(dtaf)
factorTo012etc(f,earlierLevels = NULL)
discretize(x,endpts)
getDFclasses(dframe)
hasCharacters(dfr)
hasFactors(x)
toAllNumeric(w,factorsInfo=NULL)
toSubFactor(f,saveLevels,lumpedLevel="zzzOther")
toSuperFactor(inFactor,superLevels)
dfOut: If TRUE, return a data frame, otherwise a matrix.
dms: Matrix or data frame of dummy columns.
inclLast: When forming a factor from dummies, include the last dummy as a level if this is TRUE.
xy: A data frame intended for prediction, "Y" in the last column.
saveLevels: In collapsing a factor, which levels to retain.
lumpedLevel: Name of the new level to be created from the levels not retained.
x: A numeric vector, except in hasFactors.
endpts: Vector of endpoints of the intervals used by discretize.
f: A factor.
inFactor: Original factor, to be extended.
superLevels: New levels to be added to the original factor.
earlierLevels: Previous levels found for this factor.
fname: A factor name.
dfr: A data frame.
w: A data frame.
dframe: A data frame, for which we wish to find the column classes.
omitLast: If TRUE, then generate only k-1 dummies from k factor levels.
factorsInfo: Attribute from the output of factorsToDummies.
factorInfo: Attribute from the output of factorToDummies.
dtaf: A data frame.
Many R users prefer to express categorical data as R factors, or often work with data that is of this type to begin with. On the other hand, many regression packages, e.g. lars, disallow factors. These utilities facilitate conversion from one form to another.
Here is an overview of the roles of the various functions:
factorToDummies: Convert one factor to dummies, yielding a matrix of dummies corresponding to that factor.
factorsToDummies: Convert all factors to dummies, yielding a matrix of dummies corresponding to all factors in the input data frame.
dummiesToFactor: Convert a set of related dummies to a factor.
factorTo012etc: Convert a factor to a numeric code, starting at 0.
dummiesToInt: Convert a related set of dummies to a numeric code, starting at 0.
charsToFactors: Convert all character columns in a data frame to factors.
toAllNumeric: Convert all factors in a data frame to dummies, yielding a new version of the data frame, including its original nonfactor columns.
toSubFactor: Coalesce some levels of a factor, yielding a new factor.
toSuperFactor: Add levels to a factor. Typically used in prediction contexts, in which a factor in a data point to be predicted does not have all the levels of the same factor in the training set.
xyDataframeToMatrix: Given a data frame to be used in a training set, with "Y" a factor in the last column, change to all numeric, with dummies in place of all "X" factors and in place of the "Y" factor.
The optional argument factorsInfo
is intended for use in prediction
contexts. Typically a set of new cases will not have all levels of the
factor in the training set. Without this argument, only an incomplete
set of dummies would be generated for the set of new cases.
A key point about changing factors to dummies is that, for later
prediction after fitting a model in our training set, one needs to use
the same transformations. Say a factor has levels 'abc', 'de' and 'f'
(and omitLast = FALSE
). If we later have a set of say two new
cases to predict, and their values for this factor are 'de' and 'f', we
would generate dummies for them but not for 'abc', incompatible with the
three dummies used in the training set.
Thus the factor names and levels are saved in attributes, and can be used as input. The relations are as follows:
factorsToDummies calls factorToDummies on each factor it finds in its input data frame.
factorToDummies outputs, and later inputs, factorInfo.
factorsToDummies outputs, and later inputs, factorsInfo.
Other functions:
getDFclasses: Return a vector of the classes of the columns of a data frame.
discretize: Partition the range of a vector into (not necessarily equal-length) intervals, and construct a factor from the labels of the intervals that the input elements fall into. (A small sketch appears below.)
hasCharacters, hasFactors: Logical scalars, TRUE if the input data frame has any character or factor columns.
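A minimal sketch of discretize, assuming that endpts gives the interval endpoints and that the return value is a factor of interval labels (both assumptions here, not taken from the formal documentation):

x <- c(0.3, 1.7, 3.2, 0.9)
xf <- discretize(x, endpts=c(0,1,2,4))  # roughly, the intervals (0,1], (1,2], (2,4]
xf         # a factor, one level per interval
table(xf)  # how many elements fell into each interval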
The function factorToDummies returns a matrix of dummy variables, while factorsToDummies returns a new version of the input data frame, in which each factor is replaced by columns of dummies. The function charsToFactors is similar, but changes character vectors to factors.
Norm Matloff
x <- factor(c('abc','de','f','de'))
xd <- factorToDummies(x,'x')
xd
#      x.abc x.de
# [1,]     1    0
# [2,]     0    1
# [3,]     0    0
# [4,]     0    1
# attr(,"factorInfo")
# attr(,"factorInfo")$fname
# [1] "x"
#
# attr(,"factorInfo")$omitLast
# [1] TRUE
#
# attr(,"factorInfo")$fullLvls
# [1] "abc" "de"  "f"
w <- factor(c('de','abc','abc'))
wd <- factorToDummies(w,'x',factorInfo=attr(xd,'factorInfo'))
wd
#      x.abc x.de
# [1,]     0    1
# [2,]     1    0
# [3,]     1    0
# attr(,"factorInfo")
# attr(,"factorInfo")$fname
# [1] "x"
#
# attr(,"factorInfo")$omitLast
# [1] TRUE
#
# attr(,"factorInfo")$fullLvls
# [1] "abc" "de"  "f"
Detection of falls in the elderly via physiological measurements. Obtained from Kaggle, but no longer available there.
Adds various extra features to grid search for specified tuning parameter/hyperparameter combinations: There is a plot() function, using parallel coordinates graphs to show trends among the different combinations; and Bonferroni confidence intervals are computed to avoid p-hacking. An experimental smoothing facility is also included.
fineTuning(dataset,pars,regCall,nCombs=NULL,specCombs=NULL,nTst=500,
   nXval=1,up=TRUE,k=NULL,dispOrderSmoothed=FALSE,
   showProgress=TRUE,...)
fineTuningMult(dataset,pars,regCall,nCombs=NULL,
   nTst=500,nXval=1,up=TRUE,k=NULL,dispOrderSmoothed=FALSE,
   showProgress=TRUE,outDim=1,...)
## S3 method for class 'tuner'
plot(x,...)
knnFineTune(data,yName,k,expandVars,ws,classif=FALSE,seed=9999)
fineTuningPar(cls,dataset,pars,regCall,nCombs=NULL,specCombs=NULL,
   nTst=500,nXval=1,up=TRUE,k=NULL,dispOrderSmoothed=FALSE)
...: Arguments to be passed on by the above functions.
x: Output object from fineTuning.
cls: A cluster, in the sense of the parallel package.
dataset: Data frame etc. containing the data to be analyzed.
data: The data to be analyzed.
yName: Quoted name of "Y" in the column names of data.
expandVars: Indices of columns in data to be weighted up or down.
ws: Weights to be used for expandVars.
classif: Set to TRUE for classification problems.
seed: Seed for random number generation.
pars: R list, showing the desired tuning parameter values.
regCall: Function to be called at each parameter combination, performing the model fit etc.
nCombs: Number of parameter combinations to run. If NULL, all will be run.
nTst: Number of data points to be in each holdout set.
nXval: Number of holdout sets/folds to be run for a given data partition and parameter combination.
k: Nearest-neighbor smoothing parameter.
up: If TRUE, display results in ascending order of performance value.
dispOrderSmoothed: Display in order of smoothed results.
showProgress: If TRUE, print each output line as it becomes ready.
specCombs: A data frame in which the user specifies the hyperparameter combinations to evaluate.
outDim: Number of components in the value returned by regCall.
The user specifies the values for each tuning parameter in
pars
. This leads to a number of possible combinations of the
parameters. In many cases, there are more combinations than the user
wishes to try, so nCombs
of them will be chosen at random.
For each combination, the function will run the analysis specified by
the user in regCall
. The latter must have the call form ftnName(dtrn,dtst,cmbi,...)
Again, note that it is fineTuning
that calls this function. It
will provide the training and test sets dtrn
and dtst
, as
well as cmbi
("combination i"), the particular parameter
combination to be run at this moment.
Each chosen combination is run in nXval
folds. All specified
combinations are run fully, as opposed to a directional "hill descent"
search that hopes it might eliminate poor combinations early in the process.
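For concreteness, here is a minimal sketch of a user-supplied regCall function; the data set and the column names wageinc and age, as well as the use of lm, are hypothetical and not part of the package:

tuneCall <- function(dtrn,dtst,cmbi)
{
   # cmbi holds one combination of the tuning parameters named in pars;
   # here we pretend there is a single parameter, a polynomial degree
   deg <- as.integer(cmbi[1])
   fit <- lm(wageinc ~ poly(age,deg), data=dtrn)
   preds <- predict(fit,dtst)
   mean(abs(preds - dtst$wageinc))  # return mean absolute prediction error
}
# fineTuning(myData,list(deg=1:3),tuneCall,nTst=200,nXval=5)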
The function knnFineTune
is a wrapper for fineTuning
for
k-NN problems.
The function plot.tuner
draws a parallel coordinates plot to
visualize the grid. The argument x
is the output of
fineTuning
. Arguments to specify in the ellipsis are:
col
is the column to be plotted;
disp
is the number to display, with 0
, -m
and
+m
meaning cases with the m
smallest 'smoothed' values, all
cases and the m
largest values of 'smoothed', respectively;
jit
avoids plotting coincident lines by adding jitter in the
amount jit * range(x) * runif(n,-0.5,0.5)
.
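A sketch of such calls, assuming ftout holds the output of a fineTuning run (the argument values are illustrative only):

plot(ftout)                      # parallel coordinates plot of all combinations
plot(ftout, disp=10)             # restrict the display, per the disp coding above
plot(ftout, disp=-10, jit=0.02)  # the opposite tail, with a little jitter added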
Object of class 'tuner'. Contains the grid results, including upper bounds of approximate one-sided 95% confidence intervals, both univariate and Bonferroni-Dunn (adjusted for the number of parameter combinations).
Norm Matloff
# mlb data set, predict weight using k-NN, try various values of k
tc <- function(dtrn,dtst,cmbi,...)
{
   knnout <- kNN(dtrn[,-10],dtrn[,10],dtst[,-10],as.integer(cmbi[1]))
   preds <- knnout$regests
   mean(abs(preds - dtst[,10]))
}
data(mlb)
mlb <- mlb[,3:6]
mlb.d <- factorsToDummies(mlb)
fineTuning(mlb.d,list(k=c(5,25)),tc,nTst=100,nXval=2)
tc <- function(dtrn,dtst,cmbi,...)
{
   knnout <- kNN(dtrn[,-10],dtrn[,10],dtst[,-10],as.integer(cmbi[1]))
   preds <- knnout$regests
   mean(abs(preds - dtst[,10]))
}
fineTuningMult(mlb.d,list(k=c(5,25)),tc,nTst=100,nXval=2)
## Not run:
library(qeML)
data(svcensus)
tc1 <- function(dtrn,dtst,cmbi,...)
{
   knnout <- qeKNN(dtrn,'wageinc',as.integer(cmbi[1]),holdout=NULL)
   preds <- predict(knnout,dtst[,-4])
   mape <- mean(abs(preds - dtst[,4]))
   bigprobs75 <- mean(preds > 75000)
   c(mape,bigprobs75)
}
fineTuningMult(svcensus,list(k = c(10,25)),tc1,outDim=2)
## End(Not run)
Full set of tools for k-NN regression and classification, including both for direct usage and as tools for assessing the fit of parametric models.
kNN(x,y,newx=x,kmax,scaleX=TRUE,PCAcomps=0,expandVars=NULL,expandVals=NULL,
   smoothingFtn=mean,allK=FALSE,leave1out=FALSE,classif=FALSE,
   startAt1=TRUE,saveNhbrs=FALSE,savedNhbrs=NULL)
knnest(y,xdata,k,nearf=meany)
preprocessx(x,kmax,xval=FALSE)
meany(nearIdxs,x,y,predpt)
mediany(nearIdxs,x,y,predpt)
vary(nearIdxs,x,y,predpt)
loclin(nearIdxs,x,y,predpt)
## S3 method for class 'knn'
predict(object,...)
kmin(y,xdata,lossftn=l2,nk=5,nearf=meany)
parvsnonparplot(lmout,knnout,cex=1.0)
nonparvsxplot(knnout,lmout=NULL)
nonparvarplot(knnout,returnPts=FALSE)
l2(y,muhat)
l1(y,muhat)
MAPE(yhat,y)
bestKperPoint(x,y,maxK,lossFtn="MAPE",classif=FALSE)
kNNallK(x,y,newx=x,kmax,scaleX=TRUE,PCAcomps=0,
   expandVars=NULL,expandVals=NULL,smoothingFtn=mean,
   allK=FALSE,leave1out=FALSE,classif=FALSE,startAt1=TRUE)
kNNxv(x,y,k,scaleX=TRUE,PCAcomps=0,smoothingFtn=mean,
   nSubSam=500)
knnest(y,xdata,k,nearf=meany)
loclogit(nearIdxs,x,y,predpt)
mediany(nearIdxs,x,y,predpt)
exploreExpVars(xtrn, ytrn, xtst, ytst, k, eVar, maxEVal, lossFtn,
   eValIncr = 0.05, classif = FALSE, leave1out = FALSE)
plotExpVars(xtrn,ytrn,xtst,ytst,k,eVars,maxEVal,lossFtn,
   ylim,eValIncr=0.05,classif=FALSE,leave1out=FALSE)
nearf: Function to be applied to a neighborhood.
ylim: Range of Y values for the plot.
lossFtn: Loss function for the plot.
eVar: Variable to be expanded.
eVars: Variables to be expanded.
maxEVal: Maximum expansion value.
eValIncr: Increment in the range of expansion values.
xtrn: Training set for X.
ytrn: Training set for Y.
xtst: Test set for X.
ytst: Test set for Y.
nearIdxs: Indices of the neighbors.
nSubSam: Number of folds.
x: "X" data, predictors, one row per data point, in the training set.
y: Response variable data in the training set. Vector or matrix, the latter case for vector-valued responses, e.g. multiclass classification. In that case, can be a vector, either (0,1,2,...) or (1,2,3,...), which is automatically converted into a matrix of dummies.
newx: New data points to be predicted. If NULL in kNN, no predictions are made; the fit is saved and new cases can be predicted later via predict.
scaleX: If TRUE, call scale() on the X data.
PCAcomps: If positive, transform the X data to its first PCAcomps principal components.
expandVars: Indices of columns in x to be weighted up or down.
expandVals: The corresponding expansion values.
smoothingFtn: Function to apply to the "Y" values in the set of nearest neighbors. Built-in choices are meany, mediany, vary, loclin and loclogit.
allK: If TRUE, find regression estimates for all values of k up through kmax.
leave1out: If TRUE, omit the 1-nearest neighbor from the analysis.
classif: If TRUE, compute the predicted class labels, not just the regression function values.
startAt1: If TRUE, class labels start at 1, else 0.
k: Number of nearest neighbors.
saveNhbrs: If TRUE, save the nearest-neighbor information in the nhbrs component of the return value.
savedNhbrs: If non-NULL, the nhbrs component from a previous call, to be reused.
...: Needed for consistency with the generic. See Details below for the arguments.
xdata: X and associated neighbor indices. Output of preprocessx.
object: Output of kNN.
predpt: One point on which to predict, as a vector.
kmax: Maximal number of nearest neighbors to find.
maxK: Maximal number of nearest neighbors to find.
xval: Cross-validation flag. If TRUE, then the set of nearest neighbors of a point will not include the point itself.
lossftn: Loss function to be used in cross-validation determination of the "best" k.
nk: Number of values of k to be examined.
lmout: Output of lm.
knnout: Output of kNN or knnest.
cex: R parameter to control dot size in the plot.
muhat: Vector of estimated regression function values.
yhat: Vector of estimated regression function values.
returnPts: If TRUE, return the matrix of plotted points.
The kNN
function is the main tool here; knnest
is being
deprecated. (Note too qeKNN
, a wrapper for kNN
; more
on this below.) Here are the capabilities:
In its most basic form, the function will input training data and
output predictions for new cases newx
. By default this is
done for a single value of the number k
of nearest neighbors,
but by setting allK
to TRUE, the user can request that it be
done for all k
through the specified maximum.
In the second form, newx
is set to NULL in the call to
kNN
. No predictions are made; instead, the regression function
is estimated on all data points in x
, which are saved in the return
value. Future new cases can then be predicted from this saved object,
via predict.kNN
(called via the generic predict
).
The call form is predict(knnout,newx,newxK), with a default value of 1 for newxK.
In this second form, the closest k
points to the newx
in
x
are determined as usual, but instead of averaging their Y
values, the average is taken over the fitted regression estimates at
those points. In this manner, there is almost no computational cost
in the prediction stage.
The second form is intended more for production use, so that neighbor distances need not be repeatedly recomputed.
Nearest-neighbor computation can be time-consuming. If more than one
value of k
is anticipated, for the same x
, y
and
newx
, first run with the largest anticipated value of
k
, with saveNhbrs
set to TRUE. Then for other values
of k
, set savedNhbrs
to the nhbrs
component in
the return value of the first call.
In addition, a novel feature allows the user to weight some
predictors more than others. This is done by scaling the given
predictor up or down, according to a specified value. Normally, this
should be done with scaleX = TRUE
, which applies
scale()
to the data. In other words, first we create a "level
playing field" in which all predictors have standard deviation 1.0,
then scale some of them up or down.
Alternatives are provided to calculating the mean Y in the given neighborhood, such as the median and the variance, the latter of possible use in dealing with heteroscedasticity in linear models.
Another choice of note is to allow local-linear smoothing, by
setting smoothingFtn
to loclin
. Here the value of the
regression function at a point is predicted from a linear fit to the
point's neighbors. This may be especially helpful to counteract bias
near the edges of the data. As in any regression fit, the number of
predictors should be considerably less than the number of neighbors.
Custom functions for smoothing can easily be written, say following
the pattern of loclin
.
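As an illustration, here is a sketch of a custom smoothing function, following the signature of meany and loclin shown in Usage; the trimmed-mean idea itself is hypothetical, not a built-in option:

trimmedMeanY <- function(nearIdxs,x,y,predpt)
{
   # y assumed here to be a numeric vector (regression case)
   mean(y[nearIdxs], trim=0.1)  # trimmed mean of the neighbors' Y values
}
# kNN(x,y,newx,kmax=25,smoothingFtn=trimmedMeanY)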
The main alternative to kNN
is qeKNN
in the qe* ("quick
and easy") series. It is more convenient, e.g. allowing factor
inputs, but less flexible.
The functions ovaknntrn
and ovaknnpred
are multiclass
wrappers for knnest
and knnpred
, thus also deprecated.
Here y
is coded 0,1,...,m
-1 for the m
classes.
The tools here can be useful for fit assessment of parametric models. The parvsnonparplot function plots the fitted values of the parametric model against the k-NN fitted values; nonparvsxplot plots the k-NN fitted values against each predictor, one by one.
The functions l2
and l1
are used to define L2 and L1
loss.
Norm Matloff
x <- rbind(c(1,0),c(2,5),c(0,5),c(3,3),c(6,3))
y <- c(8,3,10,11,4)
newx <- c(0,0)
kNN(x,y,newx,2,scaleX=FALSE)
# $whichClosest
#      [,1] [,2]
# [1,]    1    4
# $regests
# [1] 9.5
kNN(x,y,newx,3,scaleX=FALSE,smoothingFtn=loclin)$regests
# 7.307692
knnout <- kNN(x,y,newx,2,scaleX=FALSE)
knnout
# $whichClosest
#      [,1] [,2]
# [1,]    1    4
# ...
## Not run:
data(mlb)
mlb <- mlb[,c(4,6,5)]  # height, age, weight
# fit, then predict 75", age 21, and 72", age 32
knnout <- kNN(mlb[,1:2],mlb[,3],rbind(c(75,21),c(72,32)),25)
knnout$regests
# [1] 202.72 195.72
# fit now, predict later
knnout <- kNN(mlb[,1:2],mlb[,3],NULL,25)
predict(knnout,c(70,28))
# [1] 186.48
data(peDumms)
names(peDumms)
ped <- peDumms[,c(1,20,22:27,29,31,32)]
names(ped)
# fit, and predict income of a 35-year-old man, MS degree, occupation 101,
# worked 50 weeks, using 25 nearest neighbors
kNN(ped[,-10],ped[,10],c(35,1,0,0,1,0,0,0,1,50),25)$regests
# [1] 67540
# fit, and predict occupation 101 for a 35-year-old man, MS degree,
# wage $55K, worked 50 weeks, using 25 nearest neighbors
z <- kNN(ped[,-c(4:8)],ped[,4],c(35,1,0,1,55,50),25,classif=TRUE)
z$regests
# [1] 0.16
z$ypreds
# [1] 0   class 0, i.e. not occupation 101; round(0.16) = 0,
#         computed by user request, classif = TRUE
# the y argument must be either a vector (2-class setting) or a matrix
# (multiclass setting)
occs <- as.matrix(ped[, 4:8])
z <- kNN(ped[,-c(4:8)],occs,c(35,1,0,1,72000,50),25,classif=TRUE)
z$ypreds
# [1] 3   occupation 3, i.e. 102, is predicted
# predict occupation in general; let's bring occ.141 back in (was
# excluded as a predictor due to redundancy)
names(peDumms)
#  [1] "age"     "cit.1"   "cit.2"   "cit.3"   "cit.4"   "cit.5"   "educ.1"
#  [8] "educ.2"  "educ.3"  "educ.4"  "educ.5"  "educ.6"  "educ.7"  "educ.8"
# [15] "educ.9"  "educ.10" "educ.11" "educ.12" "educ.13" "educ.14" "educ.15"
# [22] "educ.16" "occ.100" "occ.101" "occ.102" "occ.106" "occ.140" "occ.141"
# [29] "sex.1"   "sex.2"   "wageinc" "wkswrkd" "yrentry"
occs <- as.matrix(peDumms[,23:28])
z <- kNN(ped[,-c(4:8)],occs,c(35,1,0,1,72000,50),25,classif=TRUE)
z$ypreds
# [1] 3   prediction is occ.102
# use leave1out to avoid overfit
knnout <- kNN(ped[,-10],ped[,10],ped[,-10],25,leave1out=TRUE)
mean(abs(knnout$regests - ped[,10]))
# [1] 25341.6
# use of the weighted distance feature; deweight age by a factor of 0.5,
# put increased weight on weeks worked, factor of 1.5
knnout <- kNN(ped[,-10],ped[,10],ped[,-10],25,
   expandVars=c(1,10),expandVals=c(0.5,1.5),leave1out=TRUE)
mean(abs(knnout$regests - ped[,10]))
# [1] 25196.61
## End(Not run)
Tools to complement existing neural networks software, notably a more "R-like" wrapper to fitting data with R's keras package.
krsFit(x,y,hidden,acts=rep("relu",length(hidden)),learnRate=0.001,
   conv=NULL,xShape=NULL,classif=TRUE,nClass=NULL,nEpoch=30,
   scaleX=TRUE,scaleY=TRUE)
krsFitImg(x,y,hidden=c(100,100),acts=rep("relu",length(hidden)),
   nClass,nEpoch=30)
## S3 method for class 'krsFit'
predict(object,...)
diagNeural(krsFitOut)
object: An object of class 'krsFit'.
...: Data points to be predicted, 'newx'.
x: X data, predictors, one row per data point, in the training set. Must be a matrix.
y: Numeric vector of Y values. In the classification case, must be integers, not an R factor, taking on the values 0,1,2,...,nClass-1.
hidden: Vector of the number of units per hidden layer, or the rate for a dropout layer.
acts: Vector of names of the activation functions, one per hidden layer. Choices include 'relu', 'sigmoid', 'tanh', 'softmax', 'elu', 'selu'.
learnRate: Learning rate.
conv: R list specifying the convolutional layers, if any.
xShape: Vector giving the number of rows and columns, and in the convolutional case with multiple channels, the number of channels.
classif: If TRUE, indicates a classification problem.
nClass: Number of classes.
nEpoch: Number of epochs.
krsFitOut: An object returned by krsFit.
scaleX: If TRUE, scale the X columns.
scaleY: If TRUE, scale the Y columns.
The krsFit function is a wrapper for the entire pipeline in fitting the R keras package to a dataset: defining the model, compiling, stating the inputs and so on. As a result, the wrapper allows the user to skip those details (or even not need to know them), and define the model in a manner more familiar to R users.
The paired predict.krsFit takes as its first argument the output of krsFit, and newx, the points to be predicted.
Norm Matloff
## Not run:
library(keras)
data(peDumms)
ped <- peDumms[,c(1,20,22:27,29,32,31)]
# predict wage income
x <- ped[,-11]
y <- ped[,11]
z <- krsFit(x,y,c(50,50,50),classif=FALSE,nEpoch=25)
preds <- predict(z,x)
mean(abs(preds-y))  # something like 25000
x <- ped[,-(4:8)]
y <- ped[,4:8]
y <- dummiesToInt(y,FALSE) - 1
z <- krsFit(x,y,c(50,50,0.20,50),classif=TRUE,nEpoch=175,nClass=6)
preds <- predict(z,x)
mean(preds == y)  # something like 0.39
# obtain MNIST training and test sets; the following then uses the
# example network of
# https://databricks-prod-cloudfront.cloud.databricks.com/
# public/4027ec902e239c93eaaa8714f173bcfc/2961012104553482/
# 4462572393058129/1806228006848429/latest.html
# converted to use the krsFit wrapper
x <- mntrn[,-785] / 255
y <- mntrn[,785]
xShape <- c(28,28)
# define convolutional layers
conv1 <- list(type='conv2d',filters=32,kern=3)
conv2 <- list(type='pool',kern=2)
conv3 <- list(type='conv2d',filters=64,kern=3)
conv4 <- list(type='pool',kern=2)
conv5 <- list(type='drop',drop=0.5)
# call wrapper, 1 dense hidden layer of 128 units, then dropout layer
# with proportion 0.5
z <- krsFit(x,y,conv=list(conv1,conv2,conv3,conv4,conv5),c(128,0.5),
   classif=TRUE,nClass=10,nEpoch=10,xShape=c(28,28),scaleX=FALSE,scaleY=FALSE)
# try on test set
preds <- predict(z,mntst[,-785]/255)
mean(preds == mntst[,785])  # 0.98 in my sample run
## End(Not run)
Various estimators that handle missing data via the Available Cases Method
lmac(xy,nboot=0)
makeNA(m,probna)
NAsTo0s(x)
ZerosToNAs(x,replaceVal=0)
## S3 method for class 'lmac'
coef(object,...)
## S3 method for class 'lmac'
vcov(object,...)
pcac(indata,scale=FALSE)
loglinac(x,margin)
tbltofakedf(tbl)
replaceVal: Value to be replaced by NA.
xy: Matrix or data frame, X values in the first columns, Y in the last column.
indata: Matrix or data frame.
x: Matrix or data frame, one column per variable.
nboot: If positive, number of bootstrap samples to take.
probna: Probability that an element will be NA.
scale: If TRUE, scale the data before computing the principal components.
tbl: An R table.
m: Number of synthetic NAs to insert.
object: Output from lmac.
...: Needed for consistency with the generic function. Not used.
margin: A list of vectors specifying the model, as in loglin.
The Available Cases (AC) approach applies to statistical methods that depend only on products of k of the variables, so that cases having non-NA values for those k variables can be used, as opposed to using only cases that are fully intact in all variables, the Complete Cases (CC) approach. In the case of linear regression, for instance, the estimated coefficients depend only on covariances between the variables (both predictors and response). This approach assumes that the cases with missing values have the same distribution as the intact cases.
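To illustrate the principle (this is not the package's internal code), the slope and intercept of a simple linear regression can be computed from pairwise-complete, i.e. available-case, means and covariances:

set.seed(1)
n <- 1000
x <- rnorm(n)
y <- 2 + 3*x + rnorm(n)
x[sample(n,100)] <- NA
y[sample(n,100)] <- NA
covAC <- cov(cbind(x,y), use='pairwise.complete.obs')
bhat1 <- covAC['x','y'] / covAC['x','x']                  # slope from AC covariances
bhat0 <- mean(y,na.rm=TRUE) - bhat1 * mean(x,na.rm=TRUE)  # intercept
c(bhat0,bhat1)  # should be near the population values 2 and 3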
The lmac
function forms OLS estimates as with lm
, but
applying AC, in contrast to lm
, which uses the CC method.
The pcac
function is an AC substitute for prcomp
. The
data is centered, corresponding to a fixed value of center =
TRUE
in prcomp
. It is also scaled if scale
is TRUE,
corresponding to scale = TRUE
in prcomp
. Due to AC,
there is a small chance of negative eigenvalues, in which case
stop
will be called.
The loglinac
function is an AC substitute for loglin
.
The latter takes tables as input, but loglinac
takes the raw
data. If you have just the table, use tbltofakedf
to
regenerate a usable data frame.
The makeNA
function is used to insert random NA values into
data, for testing purposes.
For lmac, an object of class 'lmac', with components:
coefficients, as with lm; accessible directly or by calling coef, as with lm
fitted.values, as with lm
residuals, as with lm
r2, (unadjusted) R-squared
cov, for nboot > 0, the estimated covariance matrix of the vector of estimated regression coefficients; accessible directly or by calling vcov, as with lm
For pcac, an R list, with components:
sdev, as with prcomp
rotation, as with prcomp
For loglinac, an R list, with components:
param, estimated coefficients, as in loglin
fit, estimated expected cell counts, as in loglin
Norm Matloff
n <- 25000
w <- matrix(rnorm(2*n),ncol=2)  # x and epsilon
x <- w[,1]
y <- x + w[,2]
# insert some missing values
nmiss <- round(0.1*n)
x[sample(1:n,nmiss)] <- NA
nmiss <- round(0.2*n)
y[sample(1:n,nmiss)] <- NA
acout <- lmac(cbind(x,y))
coef(acout)  # should be near pop. values 0 and 1
This data consists of capital letter frequencies, obtained from http://www.math.cornell.edu/~mec/2003-2004/cryptography/subs/frequencies.html
Various helper functions.
replicMeans(nrep,toReplic,timing=FALSE)
stdErrPred(regObj,xnew)
pythonBlankSplit(s)
stopBrowser(msg = stop("msg not supplied"))
doPCA(x,pcaProp)
PCAwithFactors(x, nComps = ncol(x))
ulist(lst)
prToFile(filename)
partTrnTst(fullData,nTest=min(1000,round(0.2*nrow(fullData))))
findOverallLoss(regests,y,lossFtn = MAPE)
getNamedArgs(argVec)
multCols(x,cols,vals)
probIncorrectClass(yhat, y, startAt1 = TRUE)
propMisclass(y,yhat)
regests: Fitted regression estimates, training set.
y: Y values, training set.
yhat: Predicted Y values.
startAt1: TRUE if class labeling starts at 1, FALSE if it starts at 0.
lossFtn: Loss function.
fullData: A data frame or matrix.
nTest: Number of rows for the test set.
filename: Name of the output file.
lst: An R list.
x: Matrix or data frame.
pcaProp: Fraction in [0,1], specifying the number of PCA components to compute, in terms of fraction of total variance.
nComps: Number of PCA components.
regObj: An object of class 'lm' or similar regression output.
xnew: New X value to be predicted.
nrep: Number of replications.
s: A character string.
toReplic: Function call(s), as a quoted string, separated by semicolons if more than one call.
timing: If TRUE, find the average elapsed time over the replicates.
msg: Character string, error message for the existing debug browser.
argVec: R list or vector with named elements.
cols: A set of column numbers.
vals: A set of positive expansion numbers.
The function PCAwithFactors is a wrapper for stats::prcomp, to be used on data frames that contain at least one R factor.
The function PCAwithFactors returns an object of class 'PCAwithFactors', with components pcout, the object returned by the wrapped call to prcomp; factorsInfo, factor conversion information to be used with predict; and preds, the PCA version of x.
The function getNamedArgs
will assign in the caller's space
variables with the names and values in argVec
.
Norm Matloff
w <- list(a=3,b=8)
getNamedArgs(w)
a
b
u <- c(5,12,13)
names(u) <- c('x','y','z')
getNamedArgs(u)
x
y
z
Heights, weights, ages etc. of major league baseball players. A new variable has been added, consolidating positions into Infielders, Outfielders, Catchers and Pitchers.
Included here with the permission of the UCLA Statistics Department.
The MovieLens dataset, https://grouplens.org/, is a standard example in the recommender systems literature. Here we give demographic data for each user, plus the mean rating and number of ratings. One may explore, for instance, the relation between ratings and age.
Method of Moments computation for almost any statistical problem that has derivatives with respect to theta. Capable of handling models that include parametric regression terms, but the problem need not be a regression one. (This is not Generalized Method of Moments; see the package gmm for the latter.)
mm(m,g,x,init=rep(0.5,length(m)),eps=0.0001,maxiters=1000)
m: Vector of sample moments, the "left-hand sides" of the moment equations.
g: Function of the parameter estimates, forming the "right-hand sides." This is a multivariate-valued function, of dimensionality equal to that of m.
init: Vector of initial guesses for the parameter estimates. If components are named, these will be used as labels in the output.
eps: Convergence criterion.
maxiters: Maximum number of iterations.
x: Input data.
Standard Newton-Raphson methods are used to solve for the parameter
estimates, with numericDeriv
being used to find the
approximate derivatives.
R list consisting of components tht
, the vector of parameter
estimates, and numiters
, the number of iterations performed.
Norm Matloff
x <- rgamma(1000,2)
m <- c(mean(x),var(x))
g <- function(x,theta) {  # from theoretical properties of gamma distr.
   g1 <- theta[1] / theta[2]
   g2 <- theta[1] / theta[2]^2
   c(g1,g2)
}
# should output about 2 and 1
mm(m,g,x)
## Not run:
library(mfp)
data(bodyfat)
# model as a beta distribution
g <- function(x,theta) {
   t1 <- theta[1]
   t2 <- theta[2]
   t12 <- t1 + t2
   meanb <- t1 / t12
   m1 <- meanb
   m2 <- t1*t2 / (t12^2 * (t12+1))
   c(m1,m2)
}
x <- bodyfat$brozek/100
m <- c(mean(x),var(x))
# about 4.65 and 19.89
mm(m,g,x)
## End(Not run)
Tools for multiclass classification, parametric and nonparametric.
avalogtrn(trnxy,yname)
ovaknntrn(trnxy,yname,k,xval=FALSE)
avalogpred()
classadjust(econdprobs,wrongprob1,trueprob1)
boundaryplot(y01,x,regests,pairs=combn(ncol(x),2),pchvals=2+y01,cex=0.5,band=0.10)
pchvals: Plotting symbol values in base-R graphics.
trnxy: Data matrix, Y in the last column.
xval: If TRUE, use the leave-one-out method.
y01: Y vector (1s and 0s).
regests: Estimated regression function values.
x: X data frame or matrix.
pairs: Two-row matrix, column i of which is a pair of predictor variables to graph.
cex: Symbol size for plotting.
band: Only points whose estimated regression value is within band of 0.5 are plotted, to avoid clutter.
yname: Name of the Y column.
k: Number of nearest neighbors.
econdprobs: Estimated conditional class probabilities, given the predictors.
wrongprob1: Incorrect, data-provenanced, unconditional P(Y = 1).
trueprob1: Correct unconditional P(Y = 1).
These functions aid classification in the multiclass setting.
The function boundaryplot
serves as a visualization technique,
for the two-class setting. It draws the boundary between predicted Y =
1 and predicted Y = 0 data points in 2-dimensional feature space, as
determined by the argument regests
. Used to visually assess
goodness of fit, typically running this function twice, say once for glm and then for kNN
. If there is much discrepancy and the
analyst wishes to still use glm(), he/she may wish to add polynomial
terms.
The functions not listed above are largely deprecated, e.g. in favor of
qeLogit
and the other qe
-series functions.
Norm Matloff
## Not run:
data(oliveoils)
oo <- oliveoils[,-1]
# toy example
set.seed(9999)
x <- runif(25)
y <- sample(0:2,25,replace=TRUE)
xd <- preprocessx(x,2,xval=FALSE)
kout <- ovaknntrn(y,xd,m=3,k=2)
kout$regest  # row 2:  0.0,0.5,0.5
predict(kout,predpts=matrix(c(0.81,0.55,0.15),ncol=1))  # 0,2,0or2
yd <- factorToDummies(as.factor(y),'y',FALSE)
kNN(x,yd,c(0.81,0.55,0.15),2)  # predicts 0, 1or2, 2
data(peDumms)  # prog/engr data
ped <- peDumms[,-33]
ped <- as.matrix(ped)
x <- ped[,-(23:28)]
y <- ped[,23:28]
knnout <- kNN(x,y,x,25,leave1out=TRUE)
truey <- apply(y,1,which.max) - 1
mean(knnout$ypreds == truey)  # about 0.37
xd <- preprocessx(x,25,xval=TRUE)
kout <- knnest(y,xd,25)
preds <- predict(kout,predpts=x)
yhats <- apply(preds,1,which.max) - 1
mean(yhats == truey)  # about 0.37
data(peFactors)
# discard the lower educ-level cases, which are rare
edu <- peFactors$educ
numedu <- as.numeric(edu)
idxs <- numedu >= 12
pef <- peFactors[idxs,]
numedu <- numedu[idxs]
pef$educ <- as.factor(numedu)
pef1 <- pef[,c(1,3,5,7:9)]
# ovalog
ovaout <- ovalogtrn(pef1,"occ")
preds <- predict(ovaout,predpts=pef1[,-3])
mean(preds == factorTo012etc(pef1$occ))  # about 0.39
# avalog
avaout <- avalogtrn(pef1,"occ")
preds <- predict(avaout,predpts=pef1[,-3])
mean(preds == factorTo012etc(pef1$occ))  # about 0.39
# knn
knnout <- ovaknntrn(pef1,"occ",25)
preds <- predict(knnout,predpts=pef1[,-3])
mean(preds == factorTo012etc(pef1$occ))  # about 0.43
data(oliveoils)
oo <- oliveoils
oo <- oo[,-1]
knnout <- ovaknntrn(oo,'Region',10)
# predict a new case that is like oo1[1,] but with palmitic = 950
newx <- oo[1,2:9,drop=FALSE]
newx[,1] <- 950
predict(knnout,predpts=newx)  # predicts class 2, South
## End(Not run)
This data set is adapted from the Adult data from the UCI Machine Learning Repository, which was in turn adapted from Census data on adult incomes and other demographic variables. The UCI data is used here with permission from Ronny Kohavi.
The variables are:
gt50, which converts the original >50K variable to an indicator variable; 1 for income greater than $50,000, else 0
edu, which converts a set of education levels to approximate number of years of schooling
age
gender, 1 for male, 0 for female
mar, 1 for married, 0 for single
Note that the education variable is now numeric.
Extension of nls
to the heteroscedastic case.
nlshc(nlsout,type='HC')
nlsout: Object of type 'nls'.
type: Eicker-White algorithm to use. See the documentation for the sandwich package.
Calls nls
but then forms a different estimated covariance
matrix for the estimated regression coefficients, applying the
Eicker-White technique to handle heteroscedasticity. This then
gives valid statistical inference in that setting.
Some users may prefer to use nlsLM
of the package
minpack.lm instead of nls
. This is fine, as both
functions return objects of class 'nls'.
Estimated covariance matrix
Norm Matloff
Zeileis A (2006), Object-Oriented Computation of Sandwich Estimators. Journal of Statistical Software, 16(9), 1–16, https://www.jstatsoft.org/v16/i09/.
# simulate data from a setting in which mean Y is
# 1 / (b1 * X1 + b2 * X2)
n <- 250
b <- 1:2
x <- matrix(rexp(2*n),ncol=2)
meany <- 1 / (x %*% b)                  # reg ftn
y <- meany + (runif(n) - 0.5) * meany   # heterosced epsilon
xy <- cbind(x,y)
xy <- data.frame(xy)
# see nls() docs
nlout <- nls(X3 ~ 1 / (b1*X1+b2*X2), data=xy,start=list(b1 = 1,b2=1))
nlshc(nlout)
Italian olive oils data set, as used in Graphics of Large Datasets: Visualizing a Million, by Antony Unwin, Martin Theus and Heike Hofmann, Springer, 2006. Included here with permission of Dr. Martin Theus.
Provides minimum-norm solutions to linear models, identical to OLS in standard situations, but allowing exploration of overfitting in the overparameterized case. Also provides a wrapper for the polynomial case.
penroseLM(d,yName)
penrosePoly(d,yName,deg,maxInteractDeg=deg)
ridgePoly(d,yName,deg,maxInteractDeg=deg)
## S3 method for class 'penroseLM'
predict(object,...)
## S3 method for class 'penrosePoly'
predict(object,...)
...: Arguments for the predict methods.
d: Data frame, the training set.
yName: Name of the class labels column.
deg: Polynomial degree.
maxInteractDeg: Maximum degree of interaction terms.
object: A value returned by penroseLM or penrosePoly.
First, provides a convenient wrapper to the polyreg package for
polynomial regression. (See qePoly
here for an even higher-level
wrapper.) Note that this computes true polynomials, with
cross-product/interaction terms rather than just powers, and that dummy
variables are handled properly (to NOT compute powers).
Second, provides a tool for exploring the "double descent" phenomenon, in which prediction error may improve upon fitting past the interpolation point.
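A minimal sketch (not from the package's own examples); mtcars is used purely as a stand-in data set, and it is assumed here that the predict method takes the new X data as its second argument:

data(mtcars)
pfit <- penrosePoly(mtcars[,c('disp','hp','mpg')], 'mpg', deg=2)
preds <- predict(pfit, mtcars[,c('disp','hp')])
mean(abs(preds - mtcars$mpg))  # mean absolute error on the training data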
Norm Matloff
Phoneme detection, 2 types. Features are from harmonic analysis of the voice. From OpenML, https://www.openml.org/d/1489.
This data set is adapted from the 2000 Census (5% sample, person records). It is mainly restricted to programmers and engineers in the Silicon Valley area. (Apparently due to errors, there are some from other ZIP codes.)
There are several versions:
prgeng, the original data, with categorical variables, e.g. occupation, in their original codes
peDumms, same but with categorical variables converted to dummies; due to the large number of levels, the birth and PUMA data are not included
peFactors, same but with categorical variables converted to factors
pef, same as peFactors, but having only columns for age, education, occupation, gender, wage income and weeks worked; the education column has been collapsed to Master's degree, PhD and other
The variable codes, e.g. occupational codes, are available from https://usa.ipums.org/usa/volii/occ2000.shtml. (Short code lists are given in the record layout, but longer ones are in the appendix Code Lists.)
The variables are:
age, with a U(0,1) variate added for jitter
cit, citizenship; 1-4 code various categories of citizens; 5 means noncitizen (including permanent residents)
educ: 01-09 code no college; 10-12 means some college; 13 is a bachelor's degree, 14 a master's, 15 a professional degree and 16 is a doctorate
occ, occupation
birth, place of birth
wageinc, wage income
wkswrkd, number of weeks worked
yrentry, year of entry to the U.S. (0 for natives)
powpuma, location of work
gender, 1 for male, 2 for female
data(prgeng)
data(peDumms)
data(peFactors)
This data is suitable for NLP analysis. It consists of all the quizzes I've given in undergraduate courses, 143 quizzes in all.
It is available in two forms. First, quizzes
is a data.frame,
143 rows and 2 columns. Row i consists of a single character vector
comprising the entire quiz i, followed by the course name (as an R
factor). The second form is an R list, 143 elements. Each list element
is a character vector, one vector element per line of the quiz.
The original documents were LaTeX files. They have been run through the
detex
utility to remove most LaTeX commands, as well as removing
the LaTeX preambles separately.
The names of the list elements are the course names, as follows:
ECS 50: a course in machine organization
ECS 132: an undergraduate course in probabilistic modeling
ECS 145: a course in scripting languages (Python, R)
ECS 158: an undergraduate course in parallel computation
ECS 256: a graduate course in probabilistic modeling
Similar to lm.ridge in the MASS package included with R, but with a different kind of scaling and a little nicer plotting.
ridgelm(xy,lambda = seq(0.01,1,0.01),mapback=TRUE)
## S3 method for class 'rlm'
plot(x,y,...)
xy: Data, response variable in the last column.
lambda: Vector of desired values for the ridge parameter.
mapback: If TRUE, the scaling that had been applied to the original data will be mapped back to the original scale, so that the estimated regression coefficients are now on the scale of the original data.
x: Object of type 'rlm', output of ridgelm.
y: Needed for consistency with the generic. Not used.
...: Needed for consistency with the generic. Not used.
Centers and scales the predictors X, and centers the response variable Y. Computes X'X and then solves [(X'X)/n + lambda I]b = X'Y/n for b. The 1/n factors are important, making the diagonal elements of (X'X)/n all 1s and thus facilitating choices for the lambdas in a manner independent of the data.
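The computation just described can be illustrated in a few lines of base R (again, an illustration, not the package's internal code):

ridgeCoefs <- function(x,y,lambda)
{
   x <- scale(x)            # center and scale the predictors
   y <- y - mean(y)         # center the response
   n <- nrow(x)
   xtx <- crossprod(x) / n  # (X'X)/n; diagonal elements are essentially 1
   solve(xtx + lambda * diag(ncol(x)), crossprod(x,y) / n)
}
# ridgeCoefs(mtcars[,c('disp','hp')], mtcars$mpg, lambda=0.1)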
Calling plot on the output of ridgelm dispatches to plot.rlm, thus displaying the ridge traces.
The function ridgelm
returns an object of class 'rlm', with
components bhats
, the estimated beta vectors, one column per
lambda value, and lambda
, a copy of the input.
Norm Matloff
See http://people.cs.uchicago.edu/~dinoj/manifold/swissroll.html for this version of Swiss Roll.
Running data(SwissRoll)
produces an object sw
.
"R-style," classification-oriented wrappers for the text2vec package.
textToXY(docs,labels,kTop=50,stopWords='a')
textToXYpred(ttXYout,predDocs)
docs: Character vector, one element per document.
predDocs: Character vector, one element per document.
labels: Class labels, as numeric, character or factor. NULL is used at the prediction stage.
kTop: The number of most-frequent words to retain; 0 means retain all.
stopWords: Character vector of common words, e.g. prepositions, to delete.
ttXYout: Output object from textToXY.
A typical classification/machine learning package will have as arguments a feature matrix X and a labels vector/factor Y. For a "bag of words" analysis in the text case, each row of X would be a document and each column a word.
The functions here are basically wrappers for generating X. Wrappers are convenient in that:
The text2vec package is rather arcane, so an "R-style" wrapper would be useful.
The text2vec functions are not directly set up to do classification, so the functions here provide the "glue" to do that.
The typical usage pattern is thus (see the sketch below):
Run the documents vector and labels vector/factor through textToXY, generating X and Y.
Apply your favorite classification/machine learning package p to X and Y, returning o.
When predicting a new document d, run d and the output of textToXY through textToXYpred, producing x.
Run x through p's predict function.
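A sketch of that pattern, with tiny hypothetical documents and glm standing in for "your favorite" classifier; the exact shape of the returned X matrix is assumed here, so treat this as illustrative only:

docs <- c('the cat sat on the mat','dogs chase cats',
          'stocks fell sharply','the market rallied today')
labels <- c(0,0,1,1)  # 0 = pets, 1 = finance
xy <- textToXY(docs,labels,kTop=20)
trn <- data.frame(y=xy$y, xy$x)
fit <- glm(y ~ ., data=trn, family=binomial)   # the classifier "p"
newx <- textToXYpred(xy, 'cats and dogs')
predict(fit, data.frame(newx), type='response')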
The function textToXY returns an R list with components x and y for X and Y, and a copy of the input stopWords.
The function textToXYpred returns the X matrix for the documents to be predicted.
Norm Matloff
Input a time series and transform it to a form suitable for prediction
using lm
etc.
TStoX(x,lg)
TStoXmv(xmat,lg,y)
x: A vector.
lg: Lag, a positive integer.
xmat: A matrix, data frame etc., a multivariate time series. Each column is a time series, over a common time period.
y: A time series, again on that common time period.
Similar to stats::embed
, but in lagged form, with applications
such as lm
in mind.
TStoX
is for transforming vectors, while TStoXmv
handles the multivariate time series case. Intended for use with
lm
or other regression/machine learning model, predicting
y[i]
from observations i-lg, i-lg+1,...,i-1
.
As noted, the idea is to set up something like lm(Y ~ X)
.
Let m
denote length of x
, and in the matrix input
case, the number of rows in xmat
. Let p
be 1 in the
vector case, ncol(xmat)
in the matrix case. The return value
is a matrix with m-lg
rows. There will be p*lg+1
columns, with "Y," the numbers to be predicted in the last column.
In the output in the multivariate case, let k denote
ncol(xmat)
. Then the first k columns of the output will be
the k series at lag lg
, the second k columns will be the k
series at lag lg-1
, ..., and the lg
-th set of k
columns will be the k series at lag 1.
Norm Matloff
x1 <- c(5,12,13,8,88,6)
x2 <- c(5,4,3,18,168,0)
y <- 1:6
xmat <- cbind(x1,x2)
TStoX(x1,2)
#      [,1] [,2] [,3]
# [1,]    5   12   13
# [2,]   12   13    8
# [3,]   13    8   88
# [4,]    8   88    6
xy <- TStoXmv(xmat,2,y)
xy
#      [,1] [,2] [,3] [,4] [,5]
# [1,]    5    5   12    4    3
# [2,]   12    4   13    3    4
# [3,]   13    3    8   18    5
# [4,]    8   18   88  168    6
lm(xy[,5] ~ xy[,-5])
# Coefficients:
# (Intercept)    xy[, -5]1    xy[, -5]2    xy[, -5]3    xy[, -5]4
#       -65.6          3.2         18.2         -3.2           NA
# need n > 7 here for useful lm() call, but this illustrates the idea
Utilities.
unscale(scaledx,ctrs=NULL,sds=NULL)
mmscale(m,scalePars=NULL,p=NULL)
catDFRow(dfRow)
constCols(d)
allNumeric(lst)
scaledx: A matrix.
m: A matrix.
ctrs: Take the original means to be these values.
lst: An R list.
sds: Take the original standard deviations to be these values.
dfRow: A row in a data frame.
d: A data frame or matrix.
scalePars: If not NULL, a 2-row matrix, with column i giving the minimum and maximum values to use in scaling column i of m.
p: If m is a vector, the number of columns it should have when treated as a matrix.
The function mmscale
is meant as a better-behaved alternative to
scale
. Using minimum and maximum values, it maps variables to
[0,1], thus avoiding the problems arising from very small standard
deviations in scale
.
The function catDFRow
nicely prints a row of a data frame.
The function constCols
determines which columns of a data frame
or matrix are constant, if any.
The function unscale
returns the original object to which
scale
had been applied. Or, the attributes ctrs
and
sds
can be specified by the user.
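For instance, unscale applied to the output of scale should essentially recover the original matrix:

m <- matrix(rnorm(12, mean=50, sd=10), ncol=3)
ms <- scale(m)              # centering/scaling info retained as attributes
max(abs(unscale(ms) - m))   # essentially 0
# mmscale(m), by contrast, maps each column to [0,1] via its min and max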
Norm Matloff
Various measurements on weather variables collected by NASA. Downloaded
via nasapower
; see that package for documentation.
Graphics utilities.
xyzPlot(xyz,clrs=NULL,cexText=1.0,xlim=NULL,ylim=NULL, xlab=NULL,ylab=NULL,legendPos=NULL,plotType='l')
xyz: A matrix or data frame of at least 3 columns, the first 3 serving as 'x', 'y' and 'z' coordinates of the points to be plotted. Grouping, if any, is specified in column 4, in which case the colors in clrs are used to distinguish the groups.
clrs: Colors to be used in the grouped case.
cexText: Text size, proportional to the standard size.
xlim: As in plot().
ylim: As in plot().
xlab: As in plot().
ylab: As in plot().
legendPos: As in legend().
plotType: Coded 'l' for lines, 'p' for points.
A way to display 3-dimensional data in 2 dimensions. For each plotted point (x,y), a z value is written in text over the point. A grouping variable is also allowed, with different colors used to plot different groups.
A group (including the entire data in the case of one group) can be displayed either as a polygonal line, or just as a point cloud. The user should experiment with different argument settings to get the most visually impactful plot.
Norm Matloff
## Not run:
xyzPlot(mtcars[,c(3,6,1)],plotType='l',cexText=0.75)
xyzPlot(mtcars[,c(3,6,1)],plotType='p',cexText=0.75)
xyzPlot(mtcars[,c(3,6,1)],plotType='l',cexText=0.75)
xyzPlot(mtcars[,c(3,6,1,2)],clrs=c('red','darkgreen','blue'),plotType='l',cexText=0.75)
## End(Not run)