For finite mixtures, consistent estimation of the number of components, known as the mixture complexity, is considered based on a random sample of counts distributed according to a probability mass function whose exact form is unknown but is postulated to be close to members of some parametric family of finite mixtures. Following a recent approach of Woo and Sriram (2004), we develop a robust estimator of mixture complexity using the minimum Hellinger distance when all the parameters associated with the mixture model are unknown. The estimator is shown to be consistent.
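As a rough illustration of the distance underlying this abstract, the sketch below computes a Hellinger distance between the empirical probability mass function of a count sample and a candidate finite mixture. It is not the estimator from the paper. The Poisson mixture family, the truncation of the support at `support_size`, the unnormalized form of the distance (no factor of 1/2), and all function names are assumptions made for illustration.

```python
import math
from collections import Counter


def empirical_pmf(counts, support_size):
    """Relative frequencies of 0, 1, ..., support_size - 1 in the sample.

    Observations >= support_size are ignored (illustrative truncation).
    """
    n = len(counts)
    freq = Counter(counts)
    return [freq.get(k, 0) / n for k in range(support_size)]


def poisson_mixture_pmf(weights, rates, support_size):
    """PMF of a finite Poisson mixture on 0, ..., support_size - 1.

    The Poisson family is an assumed example; the abstract only refers to
    'some parametric family of finite mixtures'.
    """
    return [
        sum(w * math.exp(-lam) * lam**k / math.factorial(k)
            for w, lam in zip(weights, rates))
        for k in range(support_size)
    ]


def hellinger_distance(p, q):
    """Hellinger distance sqrt(sum_k (sqrt(p_k) - sqrt(q_k))^2).

    Some references include a factor 1/2 under the square root; it is
    omitted here, which is an assumption about the convention.
    """
    return math.sqrt(sum((math.sqrt(a) - math.sqrt(b)) ** 2
                         for a, b in zip(p, q)))
```

A minimum Hellinger distance fit for a fixed number of components would minimize `hellinger_distance(empirical_pmf(...), poisson_mixture_pmf(...))` over the weights and rates; how the number of components itself is then chosen is the subject of the paper and is not reproduced here.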
This paper studies the ordinary least squares trend estimator in a simple linear regression under the setting of multiple known changepoint times. The error component in the model is allowed to be a general short-memory stationary autocorrelated series. Consistency and asymptotic normality of the estimator are established and its limiting properties are quantified. An example in climatology is given where the multiple changepoint aspect is key.
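The following is a minimal sketch of an ordinary least squares trend estimate with known changepoints. It assumes a model with one common slope and a separate intercept (level) on each segment between changepoints; the abstract does not spell out the model, so this form, the zero-based time index, and the function name `trend_with_level_shifts` are assumptions. Under that model the OLS slope equals the pooled within-segment estimator computed below.

```python
def trend_with_level_shifts(y, changepoints):
    """OLS slope for y[t] = mu_segment + beta * t + error, t = 0, ..., n - 1.

    `changepoints` lists the known indices at which a new segment (with its
    own intercept) starts; each must lie strictly between 0 and len(y).
    """
    n = len(y)
    bounds = [0] + sorted(changepoints) + [n]
    numerator = 0.0
    denominator = 0.0
    for start, end in zip(bounds[:-1], bounds[1:]):
        times = range(start, end)
        segment = y[start:end]
        t_bar = sum(times) / len(segment)
        y_bar = sum(segment) / len(segment)
        # Centering within each segment removes the segment-specific level.
        numerator += sum((t - t_bar) * (v - y_bar)
                         for t, v in zip(times, segment))
        denominator += sum((t - t_bar) ** 2 for t in times)
    return numerator / denominator
```

For example, a noise-free series with slope 0.5 and a level shift of 3 at time 5 recovers the slope, whereas a single straight-line fit that ignores the shift would overstate the trend.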
The asymptotic distribution of the test statistic for testing the dimensionality in the SIR-II method is derived and shown to be a linear combination of chi-squared random variables under weak assumptions. This statistic is based on Li's (1991) sequential test statistic for sliced inverse regression (SIR). A simulation study of the result is also presented.
We consider the problem of predicting cancer patient survival time from the gene expression profile of their tumor samples. The partial least squares (PLS) methodology has been modified to account for right censoring. The performance of three approaches to handling right-censored data (reweighting, mean imputation, and multiple imputation) is studied in a detailed simulation study against the benchmark of standard PLS had there been no censoring. It is shown that both imputation schemes perform very similarly and are better than reweighting.
Rogers gives three cases of infinite continued fractions which terminate for certain parameter values. We have analyzed the associated integrals and produced equivalent rational factor ratios.
A simple graphic, the "inverted q-q plot," enables visualization of the monotonic function that transforms data to a desired target distribution. An important special case is the use of the Box-Cox family to transform data to a normal distribution. This graphic can be used to develop novel estimates of parameters in a transformation. We describe here the asymptotic properties of these parameter estimates.
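For reference, the Box-Cox family mentioned above can be sketched as follows. This shows only the standard one-parameter transformation, not the inverted q-q plot or the parameter estimates from the paper; the function name `box_cox` is an assumption.

```python
import math


def box_cox(x, lam):
    """Standard one-parameter Box-Cox transform of a positive value x.

    Returns (x**lam - 1) / lam for lam != 0 and log(x) for lam == 0,
    which is the limit of the first expression as lam tends to 0.
    """
    if x <= 0:
        raise ValueError("Box-Cox requires strictly positive data")
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam
```

Applying `box_cox` to each observation for a chosen `lam` gives the transformed sample whose closeness to normality the transformation parameter is meant to control.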
Kernel smoothing methods are widely used in many areas of statistics with great success. In particular, minimum distance procedures depend heavily on kernel density estimators. It has been argued that when estimating mixture parameters in finite mixture models, adaptive kernel density estimators would be preferable to non-adaptive kernel density estimators. Cutler and Cordero-Brana (1996) introduced such an adaptive kernel density estimator for minimum Hellinger distance estimation in finite mixture models.
Motivation: Cluster analysis is the most commonly performed procedure (often regarded as a first step) on a set of gene expression profiles. Numerous attempts have been made in the literature to validate the results of an existing or a novel clustering algorithm, often within the context of microarray data analysis. A closely related problem is that of selecting a clustering algorithm that is optimal in some way from the rather impressive list of clustering algorithms that currently exist.
Suppose that data on (X, Y) are collected from C independent but closely related populations and one is interested in measuring the strength of the relationship between the sets of variables Y and X within each population. Goria and Flury (1996) argued that in these situations it is more meaningful to construct common canonical variates that are identical across populations, while the canonical correlations themselves may vary across populations. Their method of constructing common canonical variates is based on classical normal theory and is suited only to measuring linear relationships.