
FINAL PROJECT 2_PDF parser

by khalidpark 2021. 6. 24.

Step 2

Finding a suitable PDF parser and locating the data

(0) PyPDF2

https://pythonhosted.org/PyPDF2/

PyPDF2 1.26.0 documentation

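As a rough illustration of what text extraction with this library might look like, here is a minimal sketch. It assumes the PyPDF2 1.26 API linked above (`PdfFileReader`, `getNumPages`, `getPage`, `extractText`); later PyPDF2 releases renamed these calls. The helper names `pages_to_text` and `extract_text` are hypothetical and not part of the library.

```python
def pages_to_text(reader):
    """Concatenate the extracted text of every page.

    `reader` is expected to behave like a PyPDF2 1.x PdfFileReader
    (getNumPages / getPage / extractText). Hypothetical helper.
    """
    texts = []
    for index in range(reader.getNumPages()):
        page = reader.getPage(index)
        # extractText() may return an empty string for scanned or image-only pages.
        texts.append(page.extractText())
    return "\n".join(texts)


def extract_text(path):
    """Open the PDF at `path` and return its text (hypothetical helper)."""
    # Imported lazily so this module loads even when PyPDF2 is not installed.
    # Assumption: PyPDF2 1.26 naming; newer releases use different names.
    from PyPDF2 import PdfFileReader

    with open(path, "rb") as handle:
        return pages_to_text(PdfFileReader(handle))
```

Splitting the page loop from the file handling keeps the joining logic easy to check without a real PDF at hand.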

(1) docparser

https://docparser.com/

Docparser: document parser software that extracts data from PDF to Excel or JSON, or updates apps via webhooks.

https://pypi.org/project/PyDocParser/

PyDocParser: a Python client for the Docparser API.

(2) nanonets

https://nanonets.com/documentation/#

NanoNets API Reference

(3) PeePDF

https://pypi.org/project/peepdf/0.3.2/

(4) py-pdf-parser

https://pypi.org/project/py-pdf-parser/

py-pdf-parser: a tool to help extract information from structured PDFs.

(5) PikePDF

https://pikepdf.readthedocs.io/en/latest/

pikepdf 2.12.1 documentation

pikepdf is a library intended for developers who want to create, manipulate, parse, repair, and abuse the PDF format. It supports reading and write PDFs, including creating from scratch. Thanks to QPDF, it supports linearizing PDFs and access to encrypted

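To give a feel for the "parse" side of that description, the sketch below opens a file with pikepdf and reports its page count and PDF version. It assumes `pikepdf.open`, `Pdf.pages`, and `Pdf.pdf_version` behave as in the linked documentation; `describe` and `describe_file` are hypothetical helper names. Note that pikepdf works on PDF structure and is not a text extractor.

```python
def describe(pdf):
    """Summarize an opened pikepdf.Pdf-like object (hypothetical helper)."""
    return {
        "pages": len(pdf.pages),         # number of pages in the document
        "pdf_version": pdf.pdf_version,  # version string from the file header
    }


def describe_file(path):
    """Open the PDF at `path` with pikepdf and summarize it (hypothetical helper)."""
    # Imported lazily so this module loads even when pikepdf is not installed.
    import pikepdf

    with pikepdf.open(path) as pdf:
        return describe(pdf)
```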


References:

https://towardsdatascience.com/pdf-preprocessing-with-python-19829752af9f

PDF Processing with Python: the way to extract text from your PDF documents.

https://www.youtube.com/watch?v=UmPe07a3bWs 
