Regression splines are smooth, flexible, and parsimonious nonparametric function estimators. They are known to be sensitive to knot number and placement, but if assumptions such as monotonicity or convexity may be imposed on the regression function, the shape-restricted regression splines are much more robust to knot choices. Monotone regression splines were introduced by Ramsay (1988). In this paper a more numerically efficient computational method is developed, and the method is extended to convex constraints.
First we provide a simple derivation of the density of a chi-square random variable. Then we provide another simple proof of the statistical independence of the sample mean and the sample variance of a random sample from a normal distribution. The technique proving independence readily extends to the multivariate case.
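The two facts named in this abstract can be checked numerically. The following is a minimal simulation sketch, not the paper's proof: it draws repeated normal samples and looks at the correlation between the sample mean and the sample variance, and at the average of (n-1)S²/σ², which should be close to n-1 if that quantity is chi-square with n-1 degrees of freedom. The function name, sample size, replicate count, and seed are all illustrative assumptions. Near-zero correlation is consistent with independence but does not by itself establish it.

```python
import random

def simulate_mean_variance(n=10, reps=20000, mu=0.0, sigma=1.0, seed=0):
    """Return lists of sample means and unbiased sample variances (hypothetical helper)."""
    rng = random.Random(seed)
    means, variances = [], []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        m = sum(sample) / n
        s2 = sum((x - m) ** 2 for x in sample) / (n - 1)
        means.append(m)
        variances.append(s2)
    return means, variances

def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

n = 10
means, variances = simulate_mean_variance(n=n)

# Correlation between sample mean and sample variance: expected near 0.
corr = pearson(means, variances)

# Average of (n-1) * S^2 / sigma^2 with sigma = 1: expected near n - 1.
scaled_mean = sum((n - 1) * v for v in variances) / len(variances)
```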
Genetic algorithms (GAs) are a popular technique for searching for an optimum in a large search space. Using the new concepts of a forbidden array and weighted mutation, Mandal, Wu and Johnson (2006) used elements of GAs to introduce a new global optimization technique called sequential elimination of level combinations (SELC), which efficiently finds optima. A SAS macro and Matlab and R functions are developed to implement the SELC algorithm.
Quantile estimation for discrete distributions has not been well studied, although discrete data are common in practice. Under the assumption that data are drawn from a discrete distribution, we examine the consistency of the maximum empirical likelihood estimator (MELE) of the pth population quantile µp, with the assistance of a jittering method and results for continuous distributions. The MELE and the sample quantile estimator are closely related, and they may or may not be consistent for µp, depending on whether or not the underlying distribution has a plateau at the level of p.
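The plateau effect described in this abstract can be illustrated with a small simulation. This is a sketch under stated assumptions, not the paper's MELE or its jittering method: it uses the ordinary sample quantile, defined here as the smallest x with empirical CDF at least p, on fair Bernoulli data, whose CDF is flat at level 0.5 between 0 and 1. The function names, sample size, and seed are hypothetical choices.

```python
import math
import random

def sample_quantile(data, p):
    """Smallest observed x whose empirical CDF is at least p."""
    xs = sorted(data)
    k = max(math.ceil(p * len(xs)), 1)
    return xs[k - 1]

def quantile_values(p, n=2000, reps=200, seed=1):
    """Sample quantiles at level p over repeated fair Bernoulli samples."""
    rng = random.Random(seed)
    values = []
    for _ in range(reps):
        data = [rng.randint(0, 1) for _ in range(n)]
        values.append(sample_quantile(data, p))
    return values

# p = 0.5 sits on the plateau of the Bernoulli(0.5) CDF,
# so the estimate keeps switching between 0 and 1 across samples.
at_plateau = quantile_values(0.5)

# p = 0.3 is away from the plateau, so the estimate settles at 0.
off_plateau = quantile_values(0.3)
```

At the plateau level the estimator does not settle on a single value even for large samples, while away from it the estimate is stable.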
Identifying promising compounds from a vast collection of feasible compounds is an important yet challenging problem in the pharmaceutical industry. An efficient solution to this problem will help reduce expenditure at the early stages of drug discovery. In an attempt to solve this problem, Mandal, Wu and Johnson (2006) proposed the SELC algorithm, which was motivated by the SEL algorithm of Wu, Mao and Ma (1990). However, SELC fails to extract substantial information from the data to guide the search efficiently, because the methodology is not based on any statistical modeling of the data.
The nature of fMRI studies impels the need for multi-objective designs that simultaneously accomplish statistical goals, circumvent psychological confounds, and fulfill customized requirements. Incorporating knowledge about fMRI designs, we propose an efficient algorithm to search for optimal multi-objective designs. This algorithm significantly outperforms previous search algorithms in terms of achieved efficiency, computation time and convergence rate. Furthermore, our design criterion allows fair, consistent design comparisons.
We consider the problem of estimating the Hurst parameter for long-range dependent processes using wavelets. Wavelet techniques have been shown to effectively exploit the asymptotic linear relationship that forms the basis of constructing an estimator. However, it has been noticed that the commonly adopted standard wavelet estimator is vulnerable to various non-stationary phenomena that increasingly occur in practice, and thus leads to unreliable results.
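The linear relationship mentioned in this abstract can be sketched as follows. For a stationary long-range dependent series, the log2 of the average squared wavelet detail coefficient at scale j grows roughly linearly in j with slope 2H - 1, so H can be read off a fitted slope. The code below is a simplified illustration of that standard idea, assuming a Haar wavelet and an unweighted least-squares fit; it is not the estimator studied in the paper, and the function names are hypothetical. It is run on white noise, for which H = 0.5.

```python
import math
import random

def haar_detail_energies(x, levels):
    """Mean squared orthonormal Haar detail coefficient at scales 1..levels."""
    approx = list(x)
    energies = []
    root2 = math.sqrt(2)
    for _ in range(levels):
        pairs = len(approx) // 2
        details = [(approx[2 * k] - approx[2 * k + 1]) / root2 for k in range(pairs)]
        approx = [(approx[2 * k] + approx[2 * k + 1]) / root2 for k in range(pairs)]
        energies.append(sum(d * d for d in details) / pairs)
    return energies

def hurst_from_energies(energies):
    """Unweighted least-squares slope of log2(energy) on scale j, mapped to H."""
    js = list(range(1, len(energies) + 1))
    ys = [math.log2(e) for e in energies]
    jbar = sum(js) / len(js)
    ybar = sum(ys) / len(ys)
    slope = sum((j - jbar) * (y - ybar) for j, y in zip(js, ys)) / sum(
        (j - jbar) ** 2 for j in js
    )
    return (slope + 1) / 2  # slope = 2H - 1

rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(2 ** 14)]  # white noise, H = 0.5
h_hat = hurst_from_energies(haar_detail_energies(noise, levels=8))
```

An unweighted fit is used only to keep the sketch short; coarse scales have fewer coefficients and noisier energies, so practical estimators typically weight the scales.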
In this paper, we conduct an investigation of the null hypothesis distribution for functional magnetic resonance imaging (fMRI) data using multiscale analysis. Most current approaches to the analysis of fMRI data assume temporal independence, or, at best, simple models for temporal (short term or long term) dependence structure. The spatial structure of fMRI data is commonly assumed to be independent or weakly spatially dependent.
In this paper we examine rigorously the evidence for correlations among data size, transfer rate, and duration in Internet flows. We emphasize various statistical approaches for studying correlations, including computing Pearson's correlation coefficient and using the extremal dependence analysis (EDA) method. We apply these methods to three large data sets of packet traces from a diverse set of networks. Our major results show that correlation between size and duration is much weaker than one might expect.
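As a small illustration of the first approach named in this abstract, the sketch below computes Pearson's correlation coefficient between pairs of flow quantities. The flow records are made-up values for demonstration only, not data from the paper, and the log transform is an assumption added here because flow sizes and durations tend to span several orders of magnitude. The extremal dependence analysis (EDA) method is not covered.

```python
import math

def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical flow records: (size in bytes, duration in seconds).
flows = [
    (1_200, 0.4),
    (58_000, 2.1),
    (3_400, 5.0),
    (910_000, 12.5),
    (15_000, 0.9),
    (240_000, 30.0),
]

sizes = [math.log10(s) for s, _ in flows]
durations = [math.log10(d) for _, d in flows]
rates = [math.log10(s / d) for s, d in flows]  # rate = size / duration

r_size_duration = pearson(sizes, durations)
r_size_rate = pearson(sizes, rates)
r_duration_rate = pearson(durations, rates)
```

Because rate is defined as size divided by duration, the three quantities are tied together by construction, which is one reason pairwise correlations among them need careful interpretation.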