[python] How to load every data file in a folder with glob.glob

지방이의 Data Science Lab

[지현] 2020. 2. 8. 15:15

import glob
# Returns a list of paths that match the pattern.
glob.glob("../data/x/*/*.csv")

What the code above means

==> Go into every folder directly inside ../data/x, find every file whose name ends in .csv, and return the paths of those files as a list.
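To make the pattern concrete, here is a minimal sketch that builds a throwaway folder tree and runs the same kind of pattern against it. The folder and file names (a, b, one.csv, and so on) are made up for illustration; only glob.glob itself comes from the post.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Hypothetical layout: <root>/x/<subfolder>/<file>
    for rel in ["x/a/one.csv", "x/b/two.csv", "x/b/notes.txt", "x/top.csv"]:
        path = os.path.join(root, *rel.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()

    # Same shape as "../data/x/*/*.csv": exactly one folder level, then *.csv
    pattern = os.path.join(root, "x", "*", "*.csv")
    # glob does not guarantee any order, so sort before comparing
    matches = sorted(os.path.basename(p) for p in glob.glob(pattern))

print(matches)  # ['one.csv', 'two.csv']
```

notes.txt is skipped because of its extension, and top.csv is skipped because it sits directly in x rather than one folder deeper.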

Application: preprocessing data with ravel and glob (turning values that are spread sideways across columns into rows, i.e. wide to long)

https://jlim0316.tistory.com/122
(continues from the ravel('C') post)
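Before the full script, a small sketch of the reshaping idea on a toy table may help. The table, its column names, and the two years are invented for illustration; the technique (ravel('C') on the values, index.repeat for the id rows, cycle for the year labels) is the one used in this post.

```python
from itertools import cycle

import pandas as pd

# Hypothetical wide table: one row per company, one column per year.
wide = pd.DataFrame({
    "Name": ["A", "B"],
    "2014": [1, 3],
    "2015": [2, 4],
})
years = [2014, 2015]

# ravel('C') walks the values row by row: A-2014, A-2015, B-2014, B-2015.
values = wide[["2014", "2015"]].values.ravel("C")

# Repeat each company row once per year so it lines up with the values.
long_df = wide.loc[wide.index.repeat(len(years)), ["Name"]].reset_index(drop=True)
long_df["value"] = values

# cycle() repeats the year labels in the same order as the flattened values.
year = cycle(years)
long_df["year"] = [next(year) for _ in range(len(long_df))]

print(long_df)
```

The result has four rows: A/1/2014, A/2/2015, B/3/2014, B/4/2015. The row-major order of ravel('C') is what keeps each value next to the right company and year.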

import glob
import os
from itertools import cycle

import pandas as pd

# Target table: skip the first 6 lines of the cp949-encoded file.
y = pd.read_csv("../data/y.csv", encoding="cp949", skiprows=6)
colnames = list(y.columns)
company = colnames[0:3]  # first three columns identify the company
del colnames[0:3]        # remaining columns hold the values (five years assumed)
# Flatten the value columns row by row ('C' order) into one long Series.
credit = pd.Series(y.set_index(company)[colnames].values.ravel('C'))
credit = pd.DataFrame({'KIS_credit': credit})
# Repeat each company row 5 times, once per year.
company = y.loc[y.index.repeat(5)].reset_index(drop=True)[['KIS', 'Stock', 'Name']]
data = pd.concat([company, credit], axis=1)
year = cycle([2014, 2015, 2016, 2017, 2018])
data['year'] = [next(year) for _ in range(data.shape[0])]

#-------------------------------------------------------------------------#
def data_preprocessing(readcsv_string, col_string):
    """Reshape one wide CSV into long form; the value column is named col_string."""
    y = pd.read_csv(readcsv_string, encoding="cp949", skiprows=6)
    colnames = list(y.columns)
    company = colnames[0:3]
    del colnames[0:3]
    credit = pd.Series(y.set_index(company)[colnames].values.ravel('C'))
    credit = pd.DataFrame({col_string: credit})
    company = y.loc[y.index.repeat(5)].reset_index(drop=True)[['KIS', 'Stock', 'Name']]
    y = pd.concat([company, credit], axis=1)
    year = cycle([2014, 2015, 2016, 2017, 2018])
    y['year'] = [next(year) for _ in range(y.shape[0])]
    return y

#-----------------------------------------------------------------------#
filename = glob.glob("../data/x/*/*.csv")
for f in filename:
    # Use the file name without its extension as the column name.
    colname = os.path.splitext(os.path.basename(f))[0]
    temp = data_preprocessing(f, colname)
    data[colname] = temp[colname]
data
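The loop above names each new column after its file. Splitting the path on a hard-coded separator only works on the platform that produced the path, so here is a minimal sketch of a separator-agnostic alternative. The helper name column_name and the sample paths are hypothetical; PureWindowsPath is part of the standard pathlib module and accepts both backslashes and forward slashes as separators.

```python
from pathlib import PureWindowsPath


def column_name(path: str) -> str:
    """Hypothetical helper: file name without directory or extension."""
    # PureWindowsPath splits on both '\\' and '/', so paths returned by
    # glob on Windows and on Linux/macOS are parsed the same way.
    return PureWindowsPath(path).stem


windows_style = column_name("../data/x\\ratio\\debt.csv")
posix_style = column_name("../data/x/ratio/debt.csv")
print(windows_style, posix_style)  # debt debt
```

One caveat of this choice: a Linux or macOS file whose name itself contains a backslash would be split at that character, so this sketch assumes ordinary file names.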

License: Attribution, NonCommercial, ShareAlike
