Is brute force the best option for multiple regression using Python?
In the linear model y = β0 + β1·xi + β2·xj + β3·xk + ε, which values of i, j, k ∈ [1, 100] result in the model with the highest R-squared?
The data set consists of 100 independent variables and 1 dependent variable; each variable has 50 observations.
My guess is to loop through all possible combinations of 3 variables and compare the R-squared of each combination. The way I have done this in Python is:
import itertools as itr
import pandas as pd
import time as t
from sklearn import linear_model as lm

start = t.time()
# linear regression model
lr = lm.LinearRegression()
# import data
data = pd.read_csv('csv_file')
# all possible combinations of 3 of the 100 predictor columns
combs = [comb for comb in itr.combinations(range(1, 101), 3)]
target = data.iloc[:, 0]
hi_r2 = 0
for comb in combs:
    # iloc needs a list, not a tuple, for column selection
    variables = data.iloc[:, list(comb)]
    r2 = lr.fit(variables, target).score(variables, target)
    if r2 > hi_r2:
        hi_r2 = r2
        indices = comb
end = t.time()
elapsed = (end - start) / 60
print('variables: {}\nr2 = {:.2f}\ntime: {:.1f} mins'.format(indices, hi_r2, elapsed))
It took 4.3 minutes to complete. I believe this method would not be efficient for a data set with thousands of observations per variable. What method would you suggest instead?
Thank you.
Exhaustive search is going to be the slowest way of doing this.
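To put a number on "slowest": the brute-force loop fits one separate regression per 3-variable combination, and the count of those combinations can be checked directly (a quick sketch, not from the original post):

```python
import math

# number of regressions the exhaustive search over 100 predictors must fit
n_models = math.comb(100, 3)
print(n_models)
```

That is 161,700 model fits for just three predictors out of 100, and the count grows combinatorially with the subset size, which is why the 4.3-minute runtime balloons quickly.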
The fastest way was mentioned in one of the comments: you should pre-specify the model based on theory/intuition/logic and come up with a set of variables you hypothesize to be predictors of the outcome.
The difference between the two extremes is that an exhaustive search may leave you with a model that doesn't make sense, since it will use whatever variables it has access to, even if they are unrelated to the question of interest.
If, however, you don't want to specify a model and still want to use an automated technique to build the "best" model, a middle ground might be stepwise regression.
There are a few different ways of doing this (e.g. forward or backward elimination). In the case of forward selection, for example, you start by adding in one variable at a time and testing the significance of its coefficient. If the variable improves model fit (determined either through the individual regression coefficient or the R-squared of the model), you keep it and add another. If it doesn't aid prediction, you throw it away. You repeat the process until you've found the best predictors.
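The forward-selection loop described above can be sketched in a few lines. This is a minimal illustration on synthetic data (the data, the R-squared-based selection criterion, and the cap of 3 predictors are my assumptions, not part of the original answer); it greedily adds whichever remaining variable raises R-squared the most and stops when nothing improves the fit:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
# synthetic stand-in for the question's data: 50 observations, 100 candidate predictors,
# with only columns 3, 17 and 42 actually driving the outcome
X = rng.normal(size=(50, 100))
y = X[:, 3] + 0.5 * X[:, 17] - 2 * X[:, 42] + rng.normal(scale=0.1, size=50)

selected = []                       # indices chosen so far
remaining = list(range(X.shape[1]))
best_r2 = 0.0

for _ in range(3):                  # grow the model up to 3 predictors
    # score every candidate when added to the current set
    scores = []
    for j in remaining:
        cols = selected + [j]
        r2 = LinearRegression().fit(X[:, cols], y).score(X[:, cols], y)
        scores.append((r2, j))
    r2, j = max(scores)
    if r2 <= best_r2:               # no candidate improves the fit; stop early
        break
    best_r2 = r2
    selected.append(j)
    remaining.remove(j)

print('selected:', sorted(selected), 'R2 = {:.3f}'.format(best_r2))
```

For 3 predictors out of 100 this fits at most 3 × 100 = 300 regressions instead of the 161,700 an exhaustive search requires, at the cost of a greedy (not guaranteed optimal) answer.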