본문 바로가기
코딩월드/🕹️토이프로젝트

E-Commerce 결제 내역 분석 EDA

by khalidpark 2021. 2. 6.

 

 

 

In [38]:
!pip install nbconvert
 
Requirement already satisfied: nbconvert in /usr/local/lib/python3.6/dist-packages (5.6.1)
Requirement already satisfied: mistune<2,>=0.8.1 in /usr/local/lib/python3.6/dist-packages (from nbconvert) (0.8.4)
Requirement already satisfied: entrypoints>=0.2.2 in /usr/local/lib/python3.6/dist-packages (from nbconvert) (0.3)
Requirement already satisfied: jupyter-core in /usr/local/lib/python3.6/dist-packages (from nbconvert) (4.7.1)
Requirement already satisfied: pygments in /usr/local/lib/python3.6/dist-packages (from nbconvert) (2.6.1)
Requirement already satisfied: defusedxml in /usr/local/lib/python3.6/dist-packages (from nbconvert) (0.6.0)
Requirement already satisfied: jinja2>=2.4 in /usr/local/lib/python3.6/dist-packages (from nbconvert) (2.11.3)
Requirement already satisfied: testpath in /usr/local/lib/python3.6/dist-packages (from nbconvert) (0.4.4)
Requirement already satisfied: nbformat>=4.4 in /usr/local/lib/python3.6/dist-packages (from nbconvert) (5.1.2)
Requirement already satisfied: pandocfilters>=1.4.1 in /usr/local/lib/python3.6/dist-packages (from nbconvert) (1.4.3)
Requirement already satisfied: bleach in /usr/local/lib/python3.6/dist-packages (from nbconvert) (3.3.0)
Requirement already satisfied: traitlets>=4.2 in /usr/local/lib/python3.6/dist-packages (from nbconvert) (4.3.3)
Requirement already satisfied: MarkupSafe>=0.23 in /usr/local/lib/python3.6/dist-packages (from jinja2>=2.4->nbconvert) (1.1.1)
Requirement already satisfied: ipython-genutils in /usr/local/lib/python3.6/dist-packages (from nbformat>=4.4->nbconvert) (0.2.0)
Requirement already satisfied: jsonschema!=2.5.0,>=2.4 in /usr/local/lib/python3.6/dist-packages (from nbformat>=4.4->nbconvert) (2.6.0)
Requirement already satisfied: webencodings in /usr/local/lib/python3.6/dist-packages (from bleach->nbconvert) (0.5.1)
Requirement already satisfied: packaging in /usr/local/lib/python3.6/dist-packages (from bleach->nbconvert) (20.9)
Requirement already satisfied: six>=1.9.0 in /usr/local/lib/python3.6/dist-packages (from bleach->nbconvert) (1.15.0)
Requirement already satisfied: decorator in /usr/local/lib/python3.6/dist-packages (from traitlets>=4.2->nbconvert) (4.4.2)
Requirement already satisfied: pyparsing>=2.0.2 in /usr/local/lib/python3.6/dist-packages (from packaging->bleach->nbconvert) (2.4.7)
In [4]:
import pandas as pd
In [5]:
import matplotlib as mpl
import matplotlib.pyplot as plt
 
%config InlineBackend.figure_format = 'retina'
 
!apt -qq -y install fonts-nanum
 
import matplotlib.font_manager as fm
fontpath = '/usr/share/fonts/truetype/nanum/NanumBarunGothic.ttf'
font = fm.FontProperties(fname=fontpath, size=9)
plt.rc('font', family='NanumBarunGothic') 
mpl.font_manager._rebuild()
 
fonts-nanum is already the newest version (20170925-1).
0 upgraded, 0 newly installed, 0 to remove and 15 not upgraded.
In [6]:
df = pd.read_csv('/content/Download.CSV')
In [7]:
df.info()
 
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 417 entries, 0 to 416
Data columns (total 41 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   날짜                 417 non-null    object 
 1   시간                 417 non-null    object 
 2   시간대                417 non-null    object 
 3   이름                 363 non-null    object 
 4   유형                 417 non-null    object 
 5   상태                 417 non-null    object 
 6   통화                 417 non-null    object 
 7   총액                 417 non-null    object 
 8   수수료                417 non-null    float64
 9   순액                 417 non-null    object 
 10  보내는 이메일 주소         399 non-null    object 
 11  받는 이메일 주소          381 non-null    object 
 12  거래 ID              417 non-null    object 
 13  배송 주소              352 non-null    object 
 14  주소 상태              363 non-null    object 
 15  상품 이름              363 non-null    object 
 16  상품 ID              0 non-null      float64
 17  배송 및 처리 금액         330 non-null    float64
 18  보험 금액              0 non-null      float64
 19  판매세                337 non-null    float64
 20  옵션 1 이름            0 non-null      float64
 21  옵션 1 값             0 non-null      float64
 22  옵션 2 이름            0 non-null      float64
 23  옵션 2 값             0 non-null      float64
 24  참조 납세 번호           70 non-null     object 
 25  인보이스 번호            363 non-null    object 
 26  맞춤형 번호             359 non-null    object 
 27  수량                 363 non-null    float64
 28  영수증 ID             3 non-null      float64
 29  잔액                 417 non-null    object 
 30  주소 입력란 1           352 non-null    object 
 31  주소 입력란 2/지역/동      171 non-null    object 
 32  시/읍                352 non-null    object 
 33  주/도/지역/군/준주/현/공화국  128 non-null    object 
 34  우편번호               203 non-null    object 
 35  국가                 352 non-null    object 
 36  연락처 전화번호           360 non-null    float64
 37  제목                 363 non-null    object 
 38  메모                 3 non-null      object 
 39  국가번호               352 non-null    object 
 40  잔액 영향              417 non-null    object 
dtypes: float64(12), object(29)
memory usage: 133.7+ KB
 

중요한 해당 컬럼만 따로 빼서 df로 활용하고자 함

'날짜','시간,'이름','유형','상태','통화' ,'총액' ,'국가번호'


시간대별 결제빈도 및 금액 파악 -> 해당 시간대에 반짝이벤트

날짜별 결제빈도 및 금액 파악 -> 급여날? 월초월중월말 어떤 날로 마케팅할지

국가별 평균결제금액 -> 파악 후 다음 상품 적정한 가격 선정

In [8]:
df = df.loc[:,['날짜','시간','이름','유형' ,'총액' ,'국가번호']]
df.head()
Out[8]:
  날짜 시간 이름 유형 총액 국가번호
0 2020.01.01 10:22:55 NADIA ALSHAIJI 웹사이트 결제 280.00 KW
1 2020.01.01 16:19:39 Reem al teneiji 웹사이트 결제 2,550.00 AE
2 2020.01.02 11:53:27 Bothaina Al-Thani 결제 환불 -180.00 NaN
3 2020.01.02 14:46:27 humaid bin hendi 웹사이트 결제 150.00 US
4 2020.01.03 13:27:49 NADIA ALSHAIJI 웹사이트 결제 140.00 KW
In [9]:
df.info()
 
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 417 entries, 0 to 416
Data columns (total 6 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   날짜      417 non-null    object
 1   시간      417 non-null    object
 2   이름      363 non-null    object
 3   유형      417 non-null    object
 4   총액      417 non-null    object
 5   국가번호    352 non-null    object
dtypes: object(6)
memory usage: 19.7+ KB
In [10]:
# 유형 컬럼에서 '익스프레스 체크아웃 결제'만 선택

df = df[df['유형'] == '익스프레스 체크아웃 결제']
In [11]:
df.head()
Out[11]:
  날짜 시간 이름 유형 총액 국가번호
34 2020.03.27 15:44:15 Yousef alamid 익스프레스 체크아웃 결제 0.10 SA
37 2020.05.21 07:34:07 Fayeza Ahmed 익스프레스 체크아웃 결제 85.00 AE
38 2020.05.22 21:35:54 Wafa Al Mahrami 익스프레스 체크아웃 결제 85.00 OM
39 2020.05.23 01:49:32 Sarah ALshamari 익스프레스 체크아웃 결제 85.00 QA
40 2020.05.24 01:25:09 wadha alraqeeb 익스프레스 체크아웃 결제 85.00 KW
In [12]:
import sys
if 'google.colab' in sys.modules:
    !pip install category_encoders==2.*
    !pip install pandas-profiling==2.*
    !pip install plotly==4.*
 
Collecting category_encoders==2.*
  Downloading https://files.pythonhosted.org/packages/44/57/fcef41c248701ee62e8325026b90c432adea35555cbc870aff9cfba23727/category_encoders-2.2.2-py2.py3-none-any.whl (80kB)
     |████████████████████████████████| 81kB 3.4MB/s 
Requirement already satisfied: pandas>=0.21.1 in /usr/local/lib/python3.6/dist-packages (from category_encoders==2.*) (1.1.5)
Requirement already satisfied: patsy>=0.5.1 in /usr/local/lib/python3.6/dist-packages (from category_encoders==2.*) (0.5.1)
Requirement already satisfied: numpy>=1.14.0 in /usr/local/lib/python3.6/dist-packages (from category_encoders==2.*) (1.19.5)
Requirement already satisfied: scipy>=1.0.0 in /usr/local/lib/python3.6/dist-packages (from category_encoders==2.*) (1.4.1)
Requirement already satisfied: scikit-learn>=0.20.0 in /usr/local/lib/python3.6/dist-packages (from category_encoders==2.*) (0.22.2.post1)
Requirement already satisfied: statsmodels>=0.9.0 in /usr/local/lib/python3.6/dist-packages (from category_encoders==2.*) (0.10.2)
Requirement already satisfied: pytz>=2017.2 in /usr/local/lib/python3.6/dist-packages (from pandas>=0.21.1->category_encoders==2.*) (2018.9)
Requirement already satisfied: python-dateutil>=2.7.3 in /usr/local/lib/python3.6/dist-packages (from pandas>=0.21.1->category_encoders==2.*) (2.8.1)
Requirement already satisfied: six in /usr/local/lib/python3.6/dist-packages (from patsy>=0.5.1->category_encoders==2.*) (1.15.0)
Requirement already satisfied: joblib>=0.11 in /usr/local/lib/python3.6/dist-packages (from scikit-learn>=0.20.0->category_encoders==2.*) (1.0.0)
Installing collected packages: category-encoders
Successfully installed category-encoders-2.2.2
Collecting pandas-profiling==2.*
  Downloading https://files.pythonhosted.org/packages/3d/8e/645ad7f304dd8d6d7181d22d4bd3d6356331c80c2944a25be3ebe617ec38/pandas_profiling-2.10.0-py2.py3-none-any.whl (239kB)
     |████████████████████████████████| 245kB 4.2MB/s 
Collecting tqdm>=4.48.2
  Downloading https://files.pythonhosted.org/packages/80/02/8f8880a4fd6625461833abcf679d4c12a44c76f9925f92bf212bb6cefaad/tqdm-4.56.0-py2.py3-none-any.whl (72kB)
     |████████████████████████████████| 81kB 6.3MB/s 
Requirement already satisfied: pandas!=1.0.0,!=1.0.1,!=1.0.2,!=1.1.0,>=0.25.3 in /usr/local/lib/python3.6/dist-packages (from pandas-profiling==2.*) (1.1.5)
Collecting htmlmin>=0.1.12
  Downloading https://files.pythonhosted.org/packages/b3/e7/fcd59e12169de19f0131ff2812077f964c6b960e7c09804d30a7bf2ab461/htmlmin-0.1.12.tar.gz
Collecting phik>=0.10.0
  Downloading https://files.pythonhosted.org/packages/d9/27/d4197ed93c26d9eeedb7c73c0f24462a65c617807c3140e012950c35ccf9/phik-0.11.0.tar.gz (594kB)
     |████████████████████████████████| 604kB 8.9MB/s 
Requirement already satisfied: ipywidgets>=7.5.1 in /usr/local/lib/python3.6/dist-packages (from pandas-profiling==2.*) (7.6.3)
Collecting visions[type_image_path]==0.6.0
  Downloading https://files.pythonhosted.org/packages/98/30/b1e70bc55962239c4c3c9660e892be2d8247a882135a3035c10ff7f02cde/visions-0.6.0-py3-none-any.whl (75kB)
     |████████████████████████████████| 81kB 6.8MB/s 
Requirement already satisfied: missingno>=0.4.2 in /usr/local/lib/python3.6/dist-packages (from pandas-profiling==2.*) (0.4.2)
Requirement already satisfied: scipy>=1.4.1 in /usr/local/lib/python3.6/dist-packages (from pandas-profiling==2.*) (1.4.1)
Requirement already satisfied: numpy>=1.16.0 in /usr/local/lib/python3.6/dist-packages (from pandas-profiling==2.*) (1.19.5)
Requirement already satisfied: attrs>=19.3.0 in /usr/local/lib/python3.6/dist-packages (from pandas-profiling==2.*) (20.3.0)
Requirement already satisfied: matplotlib>=3.2.0 in /usr/local/lib/python3.6/dist-packages (from pandas-profiling==2.*) (3.2.2)
Collecting tangled-up-in-unicode>=0.0.6
  Downloading https://files.pythonhosted.org/packages/4a/e2/e588ab9298d4989ce7fdb2b97d18aac878d99dbdc379a4476a09d9271b68/tangled_up_in_unicode-0.0.6-py3-none-any.whl (3.1MB)
     |████████████████████████████████| 3.1MB 17.1MB/s 
Collecting requests>=2.24.0
  Downloading https://files.pythonhosted.org/packages/29/c1/24814557f1d22c56d50280771a17307e6bf87b70727d975fd6b2ce6b014a/requests-2.25.1-py2.py3-none-any.whl (61kB)
     |████████████████████████████████| 61kB 6.8MB/s 
Requirement already satisfied: seaborn>=0.10.1 in /usr/local/lib/python3.6/dist-packages (from pandas-profiling==2.*) (0.11.1)
Requirement already satisfied: jinja2>=2.11.1 in /usr/local/lib/python3.6/dist-packages (from pandas-profiling==2.*) (2.11.3)
Requirement already satisfied: joblib in /usr/local/lib/python3.6/dist-packages (from pandas-profiling==2.*) (1.0.0)
Collecting confuse>=1.0.0
  Downloading https://files.pythonhosted.org/packages/6d/55/b4726d81e5d6509fa3441f770f8a9524612627dc1b2a7d6209d1d20083fe/confuse-1.4.0-py2.py3-none-any.whl
Requirement already satisfied: pytz>=2017.2 in /usr/local/lib/python3.6/dist-packages (from pandas!=1.0.0,!=1.0.1,!=1.0.2,!=1.1.0,>=0.25.3->pandas-profiling==2.*) (2018.9)
Requirement already satisfied: python-dateutil>=2.7.3 in /usr/local/lib/python3.6/dist-packages (from pandas!=1.0.0,!=1.0.1,!=1.0.2,!=1.1.0,>=0.25.3->pandas-profiling==2.*) (2.8.1)
Requirement already satisfied: numba>=0.38.1 in /usr/local/lib/python3.6/dist-packages (from phik>=0.10.0->pandas-profiling==2.*) (0.51.2)
Requirement already satisfied: nbformat>=4.2.0 in /usr/local/lib/python3.6/dist-packages (from ipywidgets>=7.5.1->pandas-profiling==2.*) (5.1.2)
Requirement already satisfied: jupyterlab-widgets>=1.0.0; python_version >= "3.6" in /usr/local/lib/python3.6/dist-packages (from ipywidgets>=7.5.1->pandas-profiling==2.*) (1.0.0)
Requirement already satisfied: ipykernel>=4.5.1 in /usr/local/lib/python3.6/dist-packages (from ipywidgets>=7.5.1->pandas-profiling==2.*) (4.10.1)
Requirement already satisfied: widgetsnbextension~=3.5.0 in /usr/local/lib/python3.6/dist-packages (from ipywidgets>=7.5.1->pandas-profiling==2.*) (3.5.1)
Requirement already satisfied: ipython>=4.0.0; python_version >= "3.3" in /usr/local/lib/python3.6/dist-packages (from ipywidgets>=7.5.1->pandas-profiling==2.*) (5.5.0)
Requirement already satisfied: traitlets>=4.3.1 in /usr/local/lib/python3.6/dist-packages (from ipywidgets>=7.5.1->pandas-profiling==2.*) (4.3.3)
Requirement already satisfied: networkx>=2.4 in /usr/local/lib/python3.6/dist-packages (from visions[type_image_path]==0.6.0->pandas-profiling==2.*) (2.5)
Requirement already satisfied: Pillow; extra == "type_image_path" in /usr/local/lib/python3.6/dist-packages (from visions[type_image_path]==0.6.0->pandas-profiling==2.*) (7.0.0)
Collecting imagehash; extra == "type_image_path"
  Downloading https://files.pythonhosted.org/packages/8e/18/9dbb772b5ef73a3069c66bb5bf29b9fb4dd57af0d5790c781c3f559bcca6/ImageHash-4.2.0-py2.py3-none-any.whl (295kB)
     |████████████████████████████████| 296kB 35.6MB/s 
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /usr/local/lib/python3.6/dist-packages (from matplotlib>=3.2.0->pandas-profiling==2.*) (2.4.7)
Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.6/dist-packages (from matplotlib>=3.2.0->pandas-profiling==2.*) (1.3.1)
Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.6/dist-packages (from matplotlib>=3.2.0->pandas-profiling==2.*) (0.10.0)
Requirement already satisfied: chardet<5,>=3.0.2 in /usr/local/lib/python3.6/dist-packages (from requests>=2.24.0->pandas-profiling==2.*) (3.0.4)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.6/dist-packages (from requests>=2.24.0->pandas-profiling==2.*) (2020.12.5)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /usr/local/lib/python3.6/dist-packages (from requests>=2.24.0->pandas-profiling==2.*) (1.24.3)
Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.6/dist-packages (from requests>=2.24.0->pandas-profiling==2.*) (2.10)
Requirement already satisfied: MarkupSafe>=0.23 in /usr/local/lib/python3.6/dist-packages (from jinja2>=2.11.1->pandas-profiling==2.*) (1.1.1)
Requirement already satisfied: pyyaml in /usr/local/lib/python3.6/dist-packages (from confuse>=1.0.0->pandas-profiling==2.*) (3.13)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.6/dist-packages (from python-dateutil>=2.7.3->pandas!=1.0.0,!=1.0.1,!=1.0.2,!=1.1.0,>=0.25.3->pandas-profiling==2.*) (1.15.0)
Requirement already satisfied: setuptools in /usr/local/lib/python3.6/dist-packages (from numba>=0.38.1->phik>=0.10.0->pandas-profiling==2.*) (53.0.0)
Requirement already satisfied: llvmlite<0.35,>=0.34.0.dev0 in /usr/local/lib/python3.6/dist-packages (from numba>=0.38.1->phik>=0.10.0->pandas-profiling==2.*) (0.34.0)
Requirement already satisfied: ipython-genutils in /usr/local/lib/python3.6/dist-packages (from nbformat>=4.2.0->ipywidgets>=7.5.1->pandas-profiling==2.*) (0.2.0)
Requirement already satisfied: jupyter-core in /usr/local/lib/python3.6/dist-packages (from nbformat>=4.2.0->ipywidgets>=7.5.1->pandas-profiling==2.*) (4.7.1)
Requirement already satisfied: jsonschema!=2.5.0,>=2.4 in /usr/local/lib/python3.6/dist-packages (from nbformat>=4.2.0->ipywidgets>=7.5.1->pandas-profiling==2.*) (2.6.0)
Requirement already satisfied: jupyter-client in /usr/local/lib/python3.6/dist-packages (from ipykernel>=4.5.1->ipywidgets>=7.5.1->pandas-profiling==2.*) (5.3.5)
Requirement already satisfied: tornado>=4.0 in /usr/local/lib/python3.6/dist-packages (from ipykernel>=4.5.1->ipywidgets>=7.5.1->pandas-profiling==2.*) (5.1.1)
Requirement already satisfied: notebook>=4.4.1 in /usr/local/lib/python3.6/dist-packages (from widgetsnbextension~=3.5.0->ipywidgets>=7.5.1->pandas-profiling==2.*) (5.3.1)
Requirement already satisfied: pygments in /usr/local/lib/python3.6/dist-packages (from ipython>=4.0.0; python_version >= "3.3"->ipywidgets>=7.5.1->pandas-profiling==2.*) (2.6.1)
Requirement already satisfied: pexpect; sys_platform != "win32" in /usr/local/lib/python3.6/dist-packages (from ipython>=4.0.0; python_version >= "3.3"->ipywidgets>=7.5.1->pandas-profiling==2.*) (4.8.0)
Requirement already satisfied: pickleshare in /usr/local/lib/python3.6/dist-packages (from ipython>=4.0.0; python_version >= "3.3"->ipywidgets>=7.5.1->pandas-profiling==2.*) (0.7.5)
Requirement already satisfied: decorator in /usr/local/lib/python3.6/dist-packages (from ipython>=4.0.0; python_version >= "3.3"->ipywidgets>=7.5.1->pandas-profiling==2.*) (4.4.2)
Requirement already satisfied: prompt-toolkit<2.0.0,>=1.0.4 in /usr/local/lib/python3.6/dist-packages (from ipython>=4.0.0; python_version >= "3.3"->ipywidgets>=7.5.1->pandas-profiling==2.*) (1.0.18)
Requirement already satisfied: simplegeneric>0.8 in /usr/local/lib/python3.6/dist-packages (from ipython>=4.0.0; python_version >= "3.3"->ipywidgets>=7.5.1->pandas-profiling==2.*) (0.8.1)
Requirement already satisfied: PyWavelets in /usr/local/lib/python3.6/dist-packages (from imagehash; extra == "type_image_path"->visions[type_image_path]==0.6.0->pandas-profiling==2.*) (1.1.1)
Requirement already satisfied: pyzmq>=13 in /usr/local/lib/python3.6/dist-packages (from jupyter-client->ipykernel>=4.5.1->ipywidgets>=7.5.1->pandas-profiling==2.*) (22.0.2)
Requirement already satisfied: terminado>=0.8.1 in /usr/local/lib/python3.6/dist-packages (from notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.5.1->pandas-profiling==2.*) (0.9.2)
Requirement already satisfied: Send2Trash in /usr/local/lib/python3.6/dist-packages (from notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.5.1->pandas-profiling==2.*) (1.5.0)
Requirement already satisfied: nbconvert in /usr/local/lib/python3.6/dist-packages (from notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.5.1->pandas-profiling==2.*) (5.6.1)
Requirement already satisfied: ptyprocess>=0.5 in /usr/local/lib/python3.6/dist-packages (from pexpect; sys_platform != "win32"->ipython>=4.0.0; python_version >= "3.3"->ipywidgets>=7.5.1->pandas-profiling==2.*) (0.7.0)
Requirement already satisfied: wcwidth in /usr/local/lib/python3.6/dist-packages (from prompt-toolkit<2.0.0,>=1.0.4->ipython>=4.0.0; python_version >= "3.3"->ipywidgets>=7.5.1->pandas-profiling==2.*) (0.2.5)
Requirement already satisfied: defusedxml in /usr/local/lib/python3.6/dist-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.5.1->pandas-profiling==2.*) (0.6.0)
Requirement already satisfied: entrypoints>=0.2.2 in /usr/local/lib/python3.6/dist-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.5.1->pandas-profiling==2.*) (0.3)
Requirement already satisfied: mistune<2,>=0.8.1 in /usr/local/lib/python3.6/dist-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.5.1->pandas-profiling==2.*) (0.8.4)
Requirement already satisfied: testpath in /usr/local/lib/python3.6/dist-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.5.1->pandas-profiling==2.*) (0.4.4)
Requirement already satisfied: pandocfilters>=1.4.1 in /usr/local/lib/python3.6/dist-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.5.1->pandas-profiling==2.*) (1.4.3)
Requirement already satisfied: bleach in /usr/local/lib/python3.6/dist-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.5.1->pandas-profiling==2.*) (3.3.0)
Requirement already satisfied: packaging in /usr/local/lib/python3.6/dist-packages (from bleach->nbconvert->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.5.1->pandas-profiling==2.*) (20.9)
Requirement already satisfied: webencodings in /usr/local/lib/python3.6/dist-packages (from bleach->nbconvert->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.5.1->pandas-profiling==2.*) (0.5.1)
Building wheels for collected packages: htmlmin, phik
  Building wheel for htmlmin (setup.py) ... done
  Created wheel for htmlmin: filename=htmlmin-0.1.12-cp36-none-any.whl size=27085 sha256=5edd414b693da6ab0ae09f82e0eb8601e5d00e171420615c311dd3eaf746ccee
  Stored in directory: /root/.cache/pip/wheels/43/07/ac/7c5a9d708d65247ac1f94066cf1db075540b85716c30255459
  Building wheel for phik (setup.py) ... done
  Created wheel for phik: filename=phik-0.11.0-cp36-none-any.whl size=599738 sha256=8e809ac205b91ae9d091425a3fface82fb9d65e7c38d4f436fad16ee17faf964
  Stored in directory: /root/.cache/pip/wheels/af/54/11/aba77f21075918de02f7964eabfe8c10d5542df9e6ad10b225
Successfully built htmlmin phik
ERROR: google-colab 1.0.0 has requirement requests~=2.23.0, but you'll have requests 2.25.1 which is incompatible.
ERROR: datascience 0.10.6 has requirement folium==0.2.1, but you'll have folium 0.8.3 which is incompatible.
Installing collected packages: tqdm, htmlmin, phik, tangled-up-in-unicode, imagehash, visions, requests, confuse, pandas-profiling
  Found existing installation: tqdm 4.41.1
    Uninstalling tqdm-4.41.1:
      Successfully uninstalled tqdm-4.41.1
  Found existing installation: requests 2.23.0
    Uninstalling requests-2.23.0:
      Successfully uninstalled requests-2.23.0
  Found existing installation: pandas-profiling 1.4.1
    Uninstalling pandas-profiling-1.4.1:
      Successfully uninstalled pandas-profiling-1.4.1
Successfully installed confuse-1.4.0 htmlmin-0.1.12 imagehash-4.2.0 pandas-profiling-2.10.0 phik-0.11.0 requests-2.25.1 tangled-up-in-unicode-0.0.6 tqdm-4.56.0 visions-0.6.0
Requirement already satisfied: plotly==4.* in /usr/local/lib/python3.6/dist-packages (4.4.1)
Requirement already satisfied: retrying>=1.3.3 in /usr/local/lib/python3.6/dist-packages (from plotly==4.*) (1.3.3)
Requirement already satisfied: six in /usr/local/lib/python3.6/dist-packages (from plotly==4.*) (1.15.0)
In [13]:
from pandas_profiling import ProfileReport

profile = ProfileReport(df, title="dataset")
profile
 
 
 
 
 
 
 
In [14]:

 

df['총액'] = pd.to_numeric(df['총액'])
In [15]:
df.info()
 
<class 'pandas.core.frame.DataFrame'>
Int64Index: 329 entries, 34 to 413
Data columns (total 6 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   날짜      329 non-null    object 
 1   시간      329 non-null    object 
 2   이름      329 non-null    object 
 3   유형      329 non-null    object 
 4   총액      329 non-null    float64
 5   국가번호    329 non-null    object 
dtypes: float64(1), object(5)
memory usage: 18.0+ KB
In [16]:
import seaborn as sns
import matplotlib.pyplot as plt

%matplotlib inline

sns.set_theme(style="whitegrid")
ax = sns.barplot(x="국가번호", y="총액", data=df)
ax.set(xlabel='country', ylabel='payment per person')

for p in ax.patches:
    ax.annotate(format(p.get_height(), '.1f'), 
                   (p.get_x() + p.get_width() / 2., p.get_height()), 
                   ha = 'center', va = 'center', 
                   xytext = (0, 9), 
                   textcoords = 'offset points')

plt.show()
 
 

최저평균가격은 사우디아라비아의 118.5불

바레인은 결제건이 많지 않으므로 그다음 카타르에 의미를 두면

최고 평균가격은 151.7불

GB는 예외국가이므로 제외

 

In [17]:
#시간 데이터 object 를 datetime 형식으로 변경

df['시간'] = pd.to_datetime(df['시간'].str.strip(), format='%H:%M:%S', errors='raise')
In [18]:
# 1시간 단위로 끊어서 총 결제금액을 합침

df2 = df.groupby(df.시간.dt.to_period('H')).agg('sum')
In [20]:
df2.head()
Out[20]:
  총액
시간  
1900-01-01 00:00 2219.69
1900-01-01 01:00 2844.34
1900-01-01 02:00 2701.15
1900-01-01 03:00 3962.65
1900-01-01 04:00 2955.30
In [24]:
df2.reset_index()
Out[24]:
  시간 총액
0 1900-01-01 00:00 2219.69
1 1900-01-01 01:00 2844.34
2 1900-01-01 02:00 2701.15
3 1900-01-01 03:00 3962.65
4 1900-01-01 04:00 2955.30
5 1900-01-01 05:00 1919.26
6 1900-01-01 06:00 1431.84
7 1900-01-01 07:00 760.95
8 1900-01-01 08:00 435.19
9 1900-01-01 09:00 265.80
10 1900-01-01 10:00 1278.95
11 1900-01-01 11:00 1091.96
12 1900-01-01 12:00 474.82
13 1900-01-01 13:00 428.20
14 1900-01-01 14:00 599.10
15 1900-01-01 15:00 959.76
16 1900-01-01 16:00 1753.68
17 1900-01-01 17:00 2157.51
18 1900-01-01 18:00 3232.22
19 1900-01-01 19:00 2696.92
20 1900-01-01 20:00 1620.58
21 1900-01-01 21:00 2484.14
22 1900-01-01 22:00 3360.01
23 1900-01-01 23:00 2533.30
In [32]:
df2.info()
 
<class 'pandas.core.frame.DataFrame'>
PeriodIndex: 24 entries, 1900-01-01 00:00 to 1900-01-01 23:00
Freq: H
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   총액      24 non-null     float64
dtypes: float64(1)
memory usage: 384.0 bytes

 

728x90

댓글