Prof. Fu Wenjiang: Variable Selection via the Lasso, 2:00-3:00 p.m.
2006-12-26  Source: Center of Mathematical Sciences  Venue:
Event type: Academic lecture
Speaker: Fu Wenjiang
Time:
Content:
Center of Mathematical Sciences at Zhejiang University
Academic Lecture, 2006
Title: Variable Selection via the Lasso
Speaker: Fu Wenjiang (Department of Epidemiology, Michigan)
Abstract: Information technology (IT) has generated massive data sets in our work and life, such as financial data from stock markets, customer data in marketing and economics, medical imaging data, and genetic/genomic data in biological and biomedical research. These massive data sets not only provide opportunities for quantitative research, but also challenge us with unprecedented demands on model sophistication and computational power. Very often, these data sets involve hundreds or even thousands of variables, collected in order to understand the relationship between the occurrence of major events of interest (response variables) and the variables that may potentially explain those events (explanatory variables). Statisticians have contributed greatly to the modeling and analysis of massive data in the post-IT and post-genome era, working with other scientists in data mining, bioinformatics, and related fields. One of the major statistical tasks is dealing with a large number of variables. Although many variables are collected in such studies, not all of them contribute to the event under investigation. For example, microarray studies of breast cancer often collect expression data from thousands of genes/probes, yet it has been shown that as few as 70 genes may determine the prognosis of a patient's survival after surgery (van de Vijver et al. 2002). The statistical challenge is how to select these significant/important variables from the large or huge number of variables collected; this is the so-called variable selection problem. Traditional variable selection was achieved with forward and backward selection, adding significant variables to the statistical model and dropping insignificant ones. Such a procedure can be unstable due to its discrete nature. The lasso method proposed by Tibshirani (1996) provides an alternative.
It achieves variable selection in regression models in a continuous way, shrinking small parameters to zero while leaving large parameters in the model, and is thus more stable. Recent studies show that although the lasso may produce biased estimates, it can provide asymptotically unbiased estimation and variable selection when applied in an adaptive fashion. In this talk, I will discuss the properties of the lasso and its applications.
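The continuous shrinkage described in the abstract can be illustrated with a small coordinate-descent sketch of the lasso. This is not the speaker's code; the toy data, the penalty level `lam`, and the function names are made up for illustration, and the update used is the standard soft-thresholding step for the penalized form (1/2n)‖y − Xβ‖² + λΣ|βⱼ|.

```python
import random

def soft_threshold(z, lam):
    """Soft-thresholding operator: shrinks z toward zero by lam, zeroing small values."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * sum_j |b_j|."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual: response minus the fit from all coordinates except j.
            r = [y[i] - sum(X[i][k] * b[k] for k in range(p) if k != j)
                 for i in range(n)]
            rho = sum(X[i][j] * r[i] for i in range(n)) / n
            denom = sum(X[i][j] ** 2 for i in range(n)) / n
            # Small correlations with the residual are thresholded exactly to zero.
            b[j] = soft_threshold(rho, lam) / denom
    return b

# Toy data (hypothetical): y depends only on the first two of four predictors.
rng = random.Random(0)
n, p = 30, 4
X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
y = [3.0 * row[0] - 2.0 * row[1] for row in X]

coef = lasso_cd(X, y, lam=0.1)
```

On this toy data the two informative coefficients stay large (slightly shrunk toward zero, which is the bias the abstract mentions), while the two noise coefficients are driven to zero, which is how the lasso performs selection continuously rather than by discrete add/drop steps.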