
1-sample vs 2-sample chi-square code_Day7(3)

by khalidpark 2021. 1. 17.

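[1-sample chi-square]

Purpose: check whether the observed counts are consistent with a uniform distribution (goodness of fit)

Expected = sum of the observed counts / number of categories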

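import numpy as np
from scipy import stats

data = np.array([10, 11, 10, 12, 10, 11])

# Expected count per category under a uniform distribution:
# 64 / 6 ≈ 10.67 (a single scalar, broadcast against data)
exp = np.sum(data) / len(data)

chi = np.sum(np.power(data - exp, 2) / exp)  # chi-square statistic = 0.3125
print(chi)

# p-value from the chi-square distribution with (number of categories - 1) degrees of freedom
print(1 - stats.chi2.cdf(chi, df=len(data) - 1))  # p-value: 0.9974013615235537

# Same test with scipy.stats.chisquare (expected counts default to uniform)
from scipy.stats import chisquare

data = np.array([10, 11, 10, 12, 10, 11])

print(chisquare(data))  # statistic=0.3125, pvalue=0.9974013615235537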
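The manual calculation and the SciPy call above can be wrapped together and compared. This is a minimal sketch: the helper name `uniform_chisquare` and the significance level `alpha = 0.05` are assumptions for illustration, not part of the original notes.

```python
import numpy as np
from scipy import stats


def uniform_chisquare(observed):
    """Goodness-of-fit test against equal expected counts (hypothetical helper)."""
    observed = np.asarray(observed, dtype=float)
    k = observed.size                      # number of categories
    expected = observed.sum() / k          # same expected count for every category
    statistic = np.sum((observed - expected) ** 2 / expected)
    p_value = stats.chi2.sf(statistic, df=k - 1)  # sf(x) is 1 - cdf(x)
    return statistic, p_value


data = [10, 11, 10, 12, 10, 11]
statistic, p_value = uniform_chisquare(data)
reference = stats.chisquare(data)          # SciPy's built-in version of the same test

alpha = 0.05                               # assumed significance level
reject_uniform = p_value < alpha           # True would mean "not uniform"
print(statistic, p_value, reject_uniform)
```

With a p-value this close to 1, the null hypothesis of a uniform distribution is not rejected at the assumed level.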

 


[2-sample chi-square]

Purpose: check whether two sets of data are associated (frequency-based test of independence on a contingency table)

Expected = row sum * column sum / total sum

import numpy as np
from scipy import stats

data = np.array([
    [10, 12],
    [14, 16],
])

# Expected count of each cell = row sum * column sum / total sum
exp = np.array([
    [(10 + 12) * (10 + 14), (10 + 12) * (12 + 16)],
    [(14 + 16) * (10 + 14), (14 + 16) * (12 + 16)],
]) / np.sum(data)

chi = np.sum(np.power(data - exp, 2) / exp)  # 0.007503607503607451

# Degrees of freedom = (rows - 1) * (columns - 1)
print(1 - stats.chi2.cdf(chi, df=(2 - 1) * (2 - 1)))  # 0.9309708924815491

# Same test with scipy.stats.chi2_contingency
from scipy.stats import chi2_contingency

data = np.array([
    [10, 12],
    [14, 16],
])
print(chi2_contingency(data, correction=False))  # chi = 0.007503607503607451, pvalue = 0.9309708924815491
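The expected table can also be built without typing each cell by hand. The sketch below uses `np.outer` on the row and column sums and checks it against the expected table that `chi2_contingency` returns. The variable names are illustrative assumptions. It also shows why `correction=False` is passed above: for a 2x2 table (1 degree of freedom) SciPy applies Yates' continuity correction by default, so the default result would not match the hand calculation.

```python
import numpy as np
from scipy.stats import chi2_contingency

data = np.array([[10, 12], [14, 16]])

# Expected counts: outer product of row sums and column sums, divided by the total
row_sums = data.sum(axis=1)
col_sums = data.sum(axis=0)
expected_manual = np.outer(row_sums, col_sums) / data.sum()

# Without the continuity correction, SciPy matches the manual formula
chi2_plain, p_plain, dof, expected_scipy = chi2_contingency(data, correction=False)

# Default behaviour for a 2x2 table: Yates' correction is applied
chi2_yates, p_yates, _, _ = chi2_contingency(data, correction=True)

print(expected_manual)
print(chi2_plain, p_plain, dof)
print(chi2_yates, p_yates)
```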

 
