지방이의 Data Science Lab

[python] Splitting X and y, and splitting train and test


[지현] 2020. 2. 9. 15:41

How to split the data while preserving the class strata (stratified splitting), for training on imbalanced data

# Split the features (X) from the target (y); 'KIS_credit_&_2018' is the label column
X = flatten.drop('KIS_credit_&_2018', axis=1)
y = flatten['KIS_credit_&_2018']
 
 
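# Method 1
from sklearn.model_selection import train_test_split

# stratify=y keeps the class proportions of y in both splits
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0, stratify=y, shuffle=True)

# Rows of the full frame for each split (assumes flatten has a unique index)
train = flatten.loc[X_train.index]
test = flatten.loc[X_test.index]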
 
# Method 2
from sklearn.model_selection import GroupShuffleSplit

# Keeps all rows that share a 'Name' in the same split; this splits by group and does not stratify on y
# next() takes only the first of the 10 splits
train_inds, test_inds = next(GroupShuffleSplit(test_size=.3, n_splits=10, random_state=7).split(flatten, groups=flatten['Name']))
train = flatten.iloc[train_inds]
test = flatten.iloc[test_inds]
 
 
# Method 3 (recommended)
from sklearn.model_selection import StratifiedShuffleSplit
split = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=42)

# split() yields positional indices, so select rows with iloc rather than loc
for train_index, test_index in split.split(X, y):
    strat_train_set = flatten.iloc[train_index]
    strat_test_set = flatten.iloc[test_index]
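
The stratified methods (1 and 3) and the group-based Method 2 guarantee different things, and toy data makes this easy to see. The sketch below is only an illustration: the `toy` frame, its `label` column, and the company names are hypothetical stand-ins, not the real `flatten` data.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit, train_test_split

# Hypothetical toy data: 100 companies, 10 rows each, roughly 10% positive labels
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "Name": np.repeat([f"company_{i}" for i in range(100)], 10),
    "label": rng.choice([0, 1], size=1000, p=[0.9, 0.1]),
})

# Stratified row-level split: the class ratio is preserved, but one company
# can have rows in both train and test
strat_train, strat_test = train_test_split(
    toy, test_size=0.3, random_state=0, stratify=toy["label"]
)

# Group-level split: every company lands entirely in train or entirely in test,
# but the class ratio is not controlled
gss = GroupShuffleSplit(n_splits=1, test_size=0.3, random_state=7)
train_idx, test_idx = next(gss.split(toy, groups=toy["Name"]))
group_train, group_test = toy.iloc[train_idx], toy.iloc[test_idx]

shared_names_strat = set(strat_train["Name"]) & set(strat_test["Name"])
shared_names_group = set(group_train["Name"]) & set(group_test["Name"])
ratio_gap_strat = abs(strat_train["label"].mean() - strat_test["label"].mean())

print("companies in both splits (stratified):", len(shared_names_strat))
print("companies in both splits (grouped):", len(shared_names_group))
print("positive-rate gap (stratified):", round(ratio_gap_strat, 4))
```

If the same `Name` appears in many rows and leakage between train and test is a concern, the group split is probably the safer choice. If class imbalance is the main concern, the stratified split fits better.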

Code to check the result after splitting:

strat_test_set.groupby(['KIS_credit_&_2018'])['Name'].count()
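
Raw counts are hard to compare between splits of different sizes, so looking at proportions can make the check easier to read. Below is a minimal sketch, assuming a hypothetical `toy` frame with a `grade` column in place of `flatten` and `'KIS_credit_&_2018'`.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import StratifiedShuffleSplit

# Hypothetical stand-in for the real data: three imbalanced credit grades
rng = np.random.default_rng(42)
toy = pd.DataFrame({
    "Name": [f"company_{i}" for i in range(1000)],
    "grade": rng.choice(["A", "B", "C"], size=1000, p=[0.7, 0.25, 0.05]),
})

split = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_index, test_index = next(split.split(toy, toy["grade"]))
strat_train_set = toy.iloc[train_index]
strat_test_set = toy.iloc[test_index]

# Share of each class in the full data and in each split, side by side
proportions = pd.DataFrame({
    "overall": toy["grade"].value_counts(normalize=True),
    "train": strat_train_set["grade"].value_counts(normalize=True),
    "test": strat_test_set["grade"].value_counts(normalize=True),
}).sort_index()
print(proportions)
```

With a stratified split, the three columns should be nearly identical for every class, including the rare one.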
 
 