Dynamic treatment regimes (DTRs) are sequential decision rules for individual patients that can adapt over time to an evolving illness. Discovering DTRs from a sequential multiple assignment randomized trial (SMART) is challenging because the data are high-dimensional and a patient's time-varying characteristics interact with treatments in complex ways. In this work, we introduce a new statistical learning method, called outcome weighted learning (O-learning), for estimating the optimal DTR. The approach converts individualized treatment selection into a sequential statistical learning problem that can be solved with modified support vector machines.
We prove that the resulting rules are consistent and provide finite-sample bounds for the errors of the estimated rules. Simulation results suggest that the proposed method outperforms the Q-learning method commonly used in this field. We illustrate our method using data from a smoking cessation study.
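To make the idea of recasting treatment selection as a weighted classification problem concrete, the following is a minimal single-stage sketch, not the speaker's implementation. It assumes a two-arm trial randomized with probability 0.5, a single covariate, strictly positive rewards, and a linear rule fitted by plain subgradient descent on a reward-weighted hinge loss in place of a full support vector machine solver. The data-generating model and all function names (`simulate_trial`, `fit_owl_linear`, `rule`) are hypothetical, and the sketch does not attempt the sequential, multi-stage setting described in the abstract.

```python
import random


def simulate_trial(n, seed=0):
    """Synthetic single-stage trial (assumed model, for illustration only).

    Treatment a is +1 or -1 with probability 0.5 each. The reward is larger
    when a has the same sign as the covariate x, so the best rule is sign(x).
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        a = rng.choice([-1, 1])
        r = 2.0 + a * x + rng.uniform(-0.5, 0.5)  # always >= 0.5
        data.append((x, a, r))
    return data


def fit_owl_linear(data, propensity=0.5, lam=0.01, lr=0.1, n_iter=500):
    """Minimize (1/n) * sum (r / propensity) * hinge(a * f(x)) + lam * w^2,
    with f(x) = w * x + b, by full-batch subgradient descent."""
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(n_iter):
        grad_w, grad_b = 2.0 * lam * w, 0.0
        for x, a, r in data:
            if a * (w * x + b) < 1.0:  # hinge term is active
                weight = r / propensity  # outcome weight
                grad_w -= weight * a * x / n
                grad_b -= weight * a / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def rule(x, w, b):
    """Estimated treatment rule: the sign of the fitted decision function."""
    return 1 if w * x + b >= 0 else -1


data = simulate_trial(500)
w, b = fit_owl_linear(data)
print("recommended for x = 0.8:", rule(0.8, w, b))
print("recommended for x = -0.8:", rule(-0.8, w, b))
```

Patients whose observed treatment led to a larger reward receive more weight, so the classifier is pushed toward reproducing the treatments that worked well. In this toy model the fitted rule should recommend treatment +1 for clearly positive x and -1 for clearly negative x.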
More information on Donglin Zeng may be found at http://www.bios.unc.edu/~dzeng/
This Colloquium is sponsored jointly by the University of Georgia Department of Statistics and the University of Georgia Department of Epidemiology and Biostatistics.