Posts in the 'Data Analysis' category (Page 11)


List: Data Analysis (116)

지방이의 Data Science Lab

[R] geom_line (geom_path: Each group consists of only one observation.)

geom_path: Each group consists of only one observation. Do you need to adjust the group aesthetic?
Why this message appears: when even the most basic line graph, with no groups at all, refuses to draw, it is because group=1 was not specified. This typically happens when x is a factor or character column, so ggplot2 treats each x value as its own group.
library(ggplot2)
ggplot(data, aes(x=X1, y=상관계수, group=1)) + geom_line(colour='#68C8CB') + theme_bw() + xlab("")

Data Analysis/깨R지식 2020. 1. 2. 15:42

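[R] Extracting specific rows that match a condition

data[(data$group %in% 'c'),]
The underlined part (the condition inside the brackets) returns a logical index of TRUE or FALSE values. When the code runs, only the rows whose index is TRUE are extracted from the data frame. In other words, use it when you want to look only at group c in the data frame data.
data = subset(data, !(data$시작 %in% 20193025))
20193025 is obviously a date that does not exist. Use this when you want to drop the erroneous rows before looking at the data.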

Data Analysis/깨R지식 2019. 12. 30. 00:52
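
For readers coming from Python (the language of this blog's NLP posts), the row filter in the post above can be sketched with a plain boolean mask. The rows list, its keys, and its values are made-up illustration data, not the post's dataset.

```python
# Hypothetical rows standing in for the data frame in the post.
rows = [
    {"group": "a", "start": 20190301},
    {"group": "c", "start": 20193025},  # impossible date, treated as an error row
    {"group": "c", "start": 20190315},
]

# Boolean mask: one True/False per row, like data$group %in% 'c' in R.
mask = [row["group"] in {"c"} for row in rows]

# Keep only the rows whose mask value is True.
group_c = [row for row, keep in zip(rows, mask) if keep]

# A negated membership test drops the error rows,
# like subset(data, !(... %in% ...)) in R.
bad_dates = {20193025}
cleaned = [row for row in rows if row["start"] not in bad_dates]
```

The mask is kept as a separate list only to show the intermediate TRUE/FALSE index; in practice the condition can go straight into the comprehension.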

[R] Handling outliers or NA

["NA" to NA]
library(dplyr)
data = mutate_all(data, ~ replace(., . == 'NA', NA))

[Mean Imputation]
data$AGE[data$USERID=="홍길동"] = mean(data$AGE, na.rm = TRUE)

[Outliers]
#1. Delete outlier rows where a value that cannot be below 0 came out below 0
idx = which(S_table$Sales.M.2015

Data Analysis/Data Preprocessing 2019. 12. 28. 17:19
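
The first two steps of the post above ("NA" string to a real missing value, then mean imputation) can be sketched in plain Python. The ages list is hypothetical, and this sketch fills every missing entry, whereas the post's R line assigns the mean to one user's row.

```python
from statistics import mean

# Hypothetical AGE column; the string "NA" marks a missing value read as text.
ages = [31, "NA", 45, None, 26]

# Step 1: turn the literal string "NA" into a real missing value (None).
ages = [None if a == "NA" else a for a in ages]

# Step 2: compute the mean over the observed values only.
observed = [a for a in ages if a is not None]
age_mean = mean(observed)

# Step 3: fill every missing entry with that mean.
imputed = [age_mean if a is None else a for a in ages]
```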

[R] Visualizing data with dates on the x-axis

https://www.r-graph-gallery.com/316-possible-inputs-for-the-dygraphs-library.html
An introduction to interactive time series with R and dygraphs: This post is an introduction to the dygraphs package for interactive time series visualization with R. It shows how to deal with various input formats, and what are the main chart types offered.
The site above has many good visualization methods. While studying, I found a good..

Data Analysis/깨R지식 2019. 12. 28. 16:34

[R] Ways to manage a date column

(As the data grows, keeping dates as strings is far lighter than managing them with lubridate functions. Extracting by month is also much faster when done by slicing strings.)

[2019.12.25 => 20191225]
library(stringr)
monthly$관리년월 = str_replace_all(monthly$관리년월, "[.]", "")
monthly$yearmonth = str_sub(monthly$yearmonth,1,7)

[201912 => 2019-12-01]
Sometimes a 'day' component has to be added (for visualization), for example when you want to put dates on the x-axis. The code usually used in that case is:
temp$yearmonth = as.Date(ymd(paste0(temp$ye..

Data Analysis/깨R지식 2019. 12. 28. 16:09
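
The string-first approach in the post above can be sketched in Python as follows. The sample value and variable names are illustrative assumptions; the idea is to stay with string operations and parse into a real date only when a plot axis needs one.

```python
from datetime import datetime

# "2019.12.25" -> "20191225": remove the dots with a plain string replace.
raw = "2019.12.25"
compact = raw.replace(".", "")

# Month key by slicing; no date parsing is needed for this step.
yearmonth = compact[:6]

# "201912" -> 2019-12-01: append a day so the value can become a real date.
first_day = datetime.strptime(yearmonth + "01", "%Y%m%d").date()
```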

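[R] An equivalent of get_dummies

Method 1 (recommended):
temp2 = summarise(group_by(data, yearmonth, target_column))%>%arrange(yearmonth)
temp3 = model.matrix(~target_column, temp2)%>%data.frame()
temp3[,1]=NULL
Here target_column stands for the column you want to encode, and temp3[,1]=NULL drops the intercept column that model.matrix adds. Use this when you want to turn the character values held in one column into several dummy columns.

Method 2 (not recommended):
library(mlr) df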

Data Analysis/깨R지식 2019. 12. 28. 16:02
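
To make the target shape concrete, here is a minimal stdlib-only sketch of the dummy encoding that the post above is aiming for. The one_hot helper and its input are hypothetical; unlike the model.matrix approach, this sketch keeps one column for every level.

```python
def one_hot(values):
    """Return (levels, rows) where each row has a 1 in its level's position."""
    levels = sorted(set(values))
    rows = [[1 if v == level else 0 for level in levels] for v in values]
    return levels, rows

# Hypothetical category column.
levels, rows = one_hot(["b", "a", "c", "a"])
```

Each inner list in rows lines up with levels, so levels can serve as the column names of the dummy table.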

[R] Adding a derived season column

library(stringr)
# Characters 5-6 of a "YYYYMM" string are the month
temp1$month = str_sub(temp1$yearmonth,5,6)
temp1$month = as.numeric(temp1$month)
# Map a month number to a season name
seasons = function(x){
  if(x %in% 2:4) return('Spring')
  if(x %in% 5:7) return('Summer')
  if(x %in% 8:10) return('Fall')
  if(x %in% c(11,12,1)) return('Winter')
}
temp1$season = sapply(temp1$month, seasons)

Data Analysis/깨R지식 2019. 12. 27. 02:45
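
The same month-to-season mapping from the post above can be sketched in Python with a lookup table instead of chained if statements. The table uses the post's own month ranges; the names SEASON_BY_MONTH and season_of and the sample inputs are assumptions for illustration.

```python
# Month-to-season table using the same ranges as the post (2-4, 5-7, 8-10, 11-1).
SEASON_BY_MONTH = {}
for months, name in [
    (range(2, 5), "Spring"),
    (range(5, 8), "Summer"),
    (range(8, 11), "Fall"),
    ((11, 12, 1), "Winter"),
]:
    for m in months:
        SEASON_BY_MONTH[m] = name

def season_of(yearmonth):
    """Map a 'YYYYMM' string to a season name via its month digits."""
    return SEASON_BY_MONTH[int(yearmonth[4:6])]

seasons = [season_of(ym) for ym in ["201901", "201904", "201907", "201910"]]
```

A dictionary lookup raises a KeyError on an invalid month rather than silently returning nothing, which makes bad input easier to notice.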

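[python] NLP (natural language processing) 3: extracting common words for stopword removal

https://jlim0316.tistory.com/97
In the vocab built earlier, some words will appear in both spam and non-spam messages. Take the word '하다' ("to do") as an example. Suppose it appears once in spam messages and 100 times in non-spam messages. It is hard to call that a common word, because it occurs far more often in non-spam messages. For such cases, I take the min value, sort from the largest number down, cut at the 75% quantile, and treat those words as stopwords. I also take the max value, sort from the smallest number up, cut at the 25% quantile, and treat those words as stopwords. The result is as follows:
common = [min_common,max_common]
common = sum(common, [])
co..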

Data Analysis/Natural language processing 2019. 12. 6. 18:41
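
The excerpt above is truncated, so the following is only one possible reading of its min-based rule, not the author's code. It assumes per-class word counts in two dictionaries (made-up values), a nearest-rank percentile, and that "cut at the 75% quantile" means keeping words whose smaller count is at or above that quantile. The max-based list is left empty because the excerpt does not show enough to reconstruct it.

```python
import math

# Hypothetical word counts per class; the post's real vocab is not shown here.
spam_counts = {"do": 1, "free": 40, "the": 50, "call": 30}
ham_counts = {"do": 100, "the": 60, "call": 2, "hello": 20}

# Words that occur in both classes.
shared = set(spam_counts) & set(ham_counts)

# A word is only as "common" as its smaller count: "do" (1 vs 100) -> 1.
min_count = {w: min(spam_counts[w], ham_counts[w]) for w in shared}

def percentile(values, pct):
    """Nearest-rank percentile (an assumed choice of quantile method)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

# Keep words whose smaller count reaches the 75th percentile.
threshold = percentile(min_count.values(), 75)
min_common = sorted(w for w, c in min_count.items() if c >= threshold)

# The max-based list is omitted in this sketch.
max_common = []

# Flatten a list of lists, as the post does with sum(common, []).
common = sum([min_common, max_common], [])
```

With these counts, "do" is shared by both classes but drops out because its smaller count is 1, which matches the post's point that a 1-versus-100 word is not really common.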
