PDF EXTRACTION

PDF Extraction consists of PDF to Text, PDF to Text (compatible with PDFs containing images), PDF Form to JSON, and PDF to CSV.

1. PDF TO TEXT

Introduction

This section describes how to use the API to extract text from PDFs. The API supports two input types: a PDF file URL and a PDF file. Each input type takes different parameters and uses a different request format.

1.1. Extract Text from PDF by PDF File URL

AVAILABLE METHODS

Method POST
URL /extract-text-from-pdf
Request Type urlencoded

QS Parameters

Attribute     Description                                           Type  Required  Option
fileURL       URL of the PDF file.                                  text  true
pdf_password  Enter the password if the PDF is password protected.  text  false
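
A minimal Python sketch of the URL-input request, using only the standard library. The host and the appName query parameter are taken from this page's curl example; the helper name build_extract_text_request is hypothetical, and the request is only built here, not sent. It assumes the server decodes a percent-encoded fileURL in the usual way.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://pdf.msquare.pro"  # host as used in this page's curl example


def build_extract_text_request(file_url, auth_token, pdf_password=None, app_name="DocCrafter"):
    """Build (but do not send) a POST request for /extract-text-from-pdf."""
    params = {"appName": app_name, "fileURL": file_url}
    if pdf_password:
        # Only needed when the PDF is password protected
        params["pdf_password"] = pdf_password
    # urlencode percent-encodes the nested URL so its own ':' and '/'
    # cannot be confused with the outer query string
    url = f"{BASE_URL}/extract-text-from-pdf?{urlencode(params)}"
    return Request(
        url,
        data=b"",  # parameters travel in the query string, so the body is empty
        method="POST",
        headers={
            "Authorization": auth_token,
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )


req = build_extract_text_request(
    "https://www.africau.edu/images/default/sample.pdf",
    "YourAuthTokenHere",
    pdf_password="1234",
)
print(req.full_url)
```

Passing the returned object to urllib.request.urlopen would send it. The response format is not described on this page, so handling of the reply is left out.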

1.2. Extract Text from PDF by PDF File

AVAILABLE METHODS

Method POST
URL /extract-text-from-pdf
Headers Content-Type: application/octet-stream
QS Parameters
Attribute     Description                                           Type  Required  Option
pdf_password  Enter the password if the PDF is password protected.  text  false
Body Parameters
Attribute  Description                  Type    Required  Option
PDF File   Binary data of the PDF file  binary  true
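
A minimal Python sketch of the file-upload variant, assuming the raw PDF bytes are sent as the whole request body, as the Body Parameters table suggests. The helper name build_binary_extract_request is hypothetical, the appName parameter and host are copied from this page's curl example, and placeholder bytes stand in for a real file.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_binary_extract_request(pdf_bytes, auth_token, pdf_password=None, app_name="DocCrafter"):
    """Build (but do not send) an octet-stream POST for /extract-text-from-pdf."""
    params = {"appName": app_name}
    if pdf_password:
        # The password stays in the query string; the body is reserved for the file
        params["pdf_password"] = pdf_password
    url = "https://pdf.msquare.pro/extract-text-from-pdf?" + urlencode(params)
    return Request(
        url,
        data=pdf_bytes,  # raw binary content of the PDF
        method="POST",
        headers={
            "Authorization": auth_token,
            "Content-Type": "application/octet-stream",
        },
    )


# Placeholder bytes; in practice read them with open("sample.pdf", "rb").read()
pdf_bytes = b"%PDF-1.4 placeholder"
req = build_binary_extract_request(pdf_bytes, "YourAuthTokenHere")
print(req.get_method(), req.full_url)
```

Keeping pdf_password in the query string matches the QS Parameters table, while the body carries nothing but the file.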

Example Request (PDF File URL input)

curl -X POST 'https://pdf.msquare.pro/extract-text-from-pdf?appName=DocCrafter&fileURL=https://www.africau.edu/images/default/sample.pdf&pdf_password=1234' \
            -H 'authorization: YourAuthTokenHere' \
            -H 'content-type: application/x-www-form-urlencoded' \
            -d ''
             

2. PDF TO TEXT (Compatible with PDFs containing images)

Introduction

This API extracts text from PDFs that contain images. Two input methods are supported: data (using multipart/form-data) and url (using application/json).

Data

Available Method

Method POST
URL /pdfWithImages-to-text
Headers: Content-Type: multipart/form-data
type: multipart/form-data

Body Parameters
Attribute      Description                                      Type  Required  Option
file           PDF file                                         file  true
text_language  Select image text language for better accuracy.  text  false     One of the languages listed below:
Afrikaans
Amharic
Arabic
Assamese
Azerbaijani
Azerbaijani (Cyrillic)
Belarusian
Bengali
Tibetan
Bosnian
Breton
Bulgarian
Catalan
Cebuano
Czech
Chinese Simplified
Chinese Simplified (Vertical)
Chinese Traditional
Chinese Traditional (Vertical)
Cherokee
Corsican
Welsh
German
Divehi
Dzongkha
Greek
English
Middle English
Esperanto
Math / Equation Detection
Estonian
Basque
Faroese
Persian
Filipino
Finnish
French
Frankish
Middle French
Frisian
Scottish Gaelic
Irish
Galician
Ancient Greek
Gujarati
Haitian Creole
Hebrew
Hindi
Croatian
Hungarian
Armenian
Inuktitut
Indonesian
Icelandic
Italian
Old Italian
Javanese
Japanese
Japanese (Vertical)
Kannada
Georgian
Old Georgian
Kazakh
Khmer
Kyrgyz
Kurdish (Kurmanji)
Korean
Lao
Latin
Latvian
Lithuanian
Luxembourgish
Malayalam
Marathi
Macedonian
Maltese
Mongolian
Maori
Malay
Burmese
Nepali
Dutch
Norwegian
Occitan
Oriya
Orientation and script detection (OSD)
Punjabi
Polish
Portuguese
Pashto
Quechua
Romanian
Russian
Sanskrit
Arabic Script
Armenian Script
Bengali Script
Canadian Aboriginal Script
Cherokee Script
Cyrillic Script
Devanagari Script
Ethiopic Script
Fraktur Script
Georgian Script
Greek Script
Gujarati Script
Gurmukhi Script
HanS Script
HanS (Vertical) Script
HanT Script
HanT (Vertical) Script
Hangul Script
Hangul (Vertical) Script
Hebrew Script
Japanese Script
Japanese (Vertical) Script
Kannada Script
Khmer Script
Lao Script
Latin Script
Malayalam Script
Myanmar Script
Oriya Script
Sinhala Script
Syriac Script
Tamil Script
Telugu Script
Thaana Script
Thai Script
Tibetan Script
Vietnamese Script
Sinhalese
Slovak
Slovenian
Sindhi
Spanish
Old Spanish
Albanian
Serbian
Serbian (Latin)
Sundanese
Swahili
Swedish
Syriac
Tamil
Tatar
Telugu
Tajik
Thai
Tigrinya
Tongan
Turkish
Uighur
Ukrainian
Urdu
Uzbek
Uzbek (Cyrillic)
Vietnamese
Yiddish
Yoruba
pdf_password   Enter the password if the PDF is password protected.  text  false
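
A minimal Python sketch of the multipart/form-data variant, built with the standard library only. The helper name encode_multipart and the file name scan.pdf are hypothetical, and the text_language value "eng" is borrowed from this page's JSON example on the assumption that the same value format applies here. The request is constructed but not sent.

```python
import uuid
from urllib.request import Request


def encode_multipart(fields, file_field, filename, file_bytes):
    """Return (body, content_type) for a multipart/form-data payload."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        # One part per plain text field
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    # The file part carries a filename and its own content type
    parts.append(
        f'--{boundary}\r\nContent-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"\r\nContent-Type: application/pdf\r\n\r\n'.encode()
    )
    parts.append(file_bytes)
    # Closing boundary terminates the payload
    parts.append(f"\r\n--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


# Placeholder bytes; in practice read them with open("scan.pdf", "rb").read()
body, content_type = encode_multipart(
    {"text_language": "eng"}, "file", "scan.pdf", b"%PDF-1.4 placeholder"
)
req = Request(
    "https://pdf.msquare.pro/pdfWithImages-to-text?appName=DocCrafter",
    data=body,
    method="POST",
    headers={"Authorization": "YourAuthTokenHere", "Content-Type": content_type},
)
print(content_type)
```

The boundary is generated per request and repeated in the Content-Type header, which is how the server can split the body into the file and text fields.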

URL

Available Method

Method POST
URL /pdfWithImages-to-text
Headers: Content-Type: application/json
type: json

Body Parameters
Attribute      Description                                           Type  Required  Option
pdf_url        URL of the PDF file.                                  text  true
text_language  Select image text language for better accuracy.       text  false     Same options as the text_language list under Data above.
pdf_password   Enter the password if the PDF is password protected.  text  false

Example Request

curl -X POST 'https://pdf.msquare.pro/pdfWithImages-to-text?appName=DocCrafter' \
            -H 'content-type: application/json' \
            -H 'authorization: YourAuthTokenHere' \
            -d '{"pdf_url":"https://www.africau.edu/images/default/sample.pdf","text_language":"eng"}'
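
The same JSON request can be sketched in Python with the standard library. The helper name build_ocr_url_request is hypothetical, the host, appName parameter, and "eng" value are copied from the curl example above, and the request is only built here, not sent.

```python
import json
from urllib.request import Request


def build_ocr_url_request(pdf_url, auth_token, text_language=None, pdf_password=None):
    """Build (but do not send) a JSON POST for /pdfWithImages-to-text."""
    payload = {"pdf_url": pdf_url}
    # Optional fields are left out entirely when not provided
    if text_language:
        payload["text_language"] = text_language
    if pdf_password:
        payload["pdf_password"] = pdf_password
    return Request(
        "https://pdf.msquare.pro/pdfWithImages-to-text?appName=DocCrafter",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": auth_token,
            "Content-Type": "application/json",
        },
    )


req = build_ocr_url_request(
    "https://www.africau.edu/images/default/sample.pdf",
    "YourAuthTokenHere",
    text_language="eng",
)
print(req.data.decode("utf-8"))
```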
             

3. PDF FORM TO JSON


3.1. PDF Form File URL as Input


Available Method

Method POST
URL /pdfFormToJSON
Request Type urlencoded
Query Parameters
Attribute     Description                                           Type  Required  Option
fileURL       PDF form file URL                                     text  true
pdf_password  Enter the password if the PDF is password protected.  text  false

3.2. PDF Form File as Input


Available Method

Method POST
URL /pdfFormToJSON
Headers Content-Type: multipart/form-data
Body Parameters
Attribute     Description                                           Type  Required  Option
file          PDF form file                                         file  true
pdf_password  Enter the password if the PDF is password protected.  text  false

Example Request

curl -X POST \
            'https://pdf.msquare.pro/pdfFormToJSON?appName=DocCrafter&fileURL=https://royalegroupnyc.com/wp-content/uploads/seating_areas/sample_pdf.pdf' \
            -H 'Authorization: ***' \
            -H 'Content-Type: application/x-www-form-urlencoded' \
            -d ''
          
             


4. PDF TO CSV


The PDF to CSV Conversion Service API converts PDF files to CSV format.
The service supports two input types: a PDF file URL and a PDF file.

Request

PDF URL Input Type

Available Method

Method POST
URL /convertPDFToCSV
Headers: Content-Type: application/x-www-form-urlencoded
Query Parameters
Attribute     Description                                           Type  Required  Option
fileURL       URL of the PDF file.                                  text  true
pdf_password  Enter the password if the PDF is password protected.  text  false

PDF File Input Type

Available Method

Method POST
URL /convertPDFToCSV
Headers: Content-Type: multipart/form-data
Body Parameters
Attribute     Description                                           Type  Required  Option
file          PDF file                                              file  true
pdf_password  Enter the password if the PDF is password protected.  text  false

Example Request

curl -X POST 'https://pdf.msquare.pro/convertPDFToCSV?appName=DocCrafter&fileURL=https://file-examples.com/storage/fe165662b6657a0549b59ee/2017/10/file-sample_150kB.pdf' \
            -H 'authorization: ***' \
            -H 'content-type: application/x-www-form-urlencoded' \
            -d ''