text-extraction

Retrieve data from two different websites, load it into a PostgreSQL database using Python, and combine the two sources to derive and present new information.

postgresql text-extraction constraints data-statictics extract-data data-conversion python-scrapy python-connector categorize-products join-query

Updated Dec 5, 2023
Python

Jaha96 / tesseract-quick-implementation

Quick Tesseract OCR implementation, linked to a Stack Overflow question.

tesseract text-extraction tesseract-ocr pyinstaller tesseract-4 tesseract-python

Updated Nov 26, 2019
HTML

Aalaa4444 / Text_Processing-and-Unique_Word_Extraction_fromHTML

Extract text content from an HTML page, process it, and extract the unique words from the processed text. This notebook uses several text-processing techniques, including cleaning, normalization, tokenization, lemmatization or stemming, and stop-word removal.

tokenizer text-extraction requests data-extraction beautifulsoup text-processing tokenization stemming lemmatization stopwords-removal text-cleaning text-normalization extract-html text-tokenization text-lemmatization

Updated Apr 5, 2024
Jupyter Notebook

Lanjkn / Text-Extractor

API for extracting text from multiple file types.

api text-extraction file-processing

Updated Mar 14, 2024
Python

nikolay-malygin / snap-text

A simple web application built with React that lets you upload images containing text, select the language of the text for recognition, and extract the text from the image. As quick as a finger snap: SnapText.

react reactjs web-application text-extraction text-recognition copy-to-clipboard multi-language-support simple-app copy-text-to-clipboard text-extraction-from-image copy-result

Updated Dec 10, 2023
HTML

swingfox / ViTeX

[Thesis] Video Text Extraction

image-processing text-extraction ocr-recognition image-filters blob-detection

Updated Mar 6, 2016
C#

prateeksahu147 / OCR-PDF-Web-Scraper

Engine that automates scraping PDFs to local storage and converting them to text using OCR.

opencv ocr python3 text-extraction data-preprocessing webscraping data-preparation

Updated Jul 14, 2022

jhw296 / BookScanner

A simple book scanner project built with PyQt5 (searches for and displays book information using barcode recognition and text extraction)

opencv pyqt5 image-processing text-extraction recognizes-images barcode-scanner

Updated Jun 15, 2023
Python

pedrocardoz0 / body-snatcher

Custom GitHub Action to parse an issue body.

workflow automation text-extraction issue-parser github-actions

Updated Feb 12, 2023
TypeScript

Asraf2asif / SummifyAI

Uses the OpenAI API to summarize content, so you can get a quick, comprehensive understanding of a text instead of wading through it. The AI-powered content summarizer extracts the key insights from any text, letting you focus on what matters most.

nodejs app ai postcss chatbot text-extraction openai summarizer summarize vitejs openai-api ai-powered content-summarizer summifyai

Updated Aug 17, 2023
JavaScript

Nishant2018 / Text-Extraction-OCR-OpenCV

Text extraction is the process of automatically extracting text from images or documents. Optical Character Recognition (OCR) is a technology that enables computers to convert images of text into machine-readable text.

python opencv ocr text-extraction

Updated Jun 10, 2024
Jupyter Notebook

divyeshBhartiya / InvoiceReader.OCR

A small repository of Python image parsers that extract text from images. They are used to extract text from invoices and bills, and are based on OCR.

python ocr text-extraction optical-character-recognition

Updated Aug 11, 2021
Python

akomarla / slack_msg_processing

Processing and hashing Slack communication to enable language modelling

nlp slack text-extraction hashing-algorithm md-5

Updated Jun 7, 2023
Python

dataiku / dss-plugin-tesseract-ocr

Dataiku DSS plugin to perform optical character recognition (OCR) using the Tesseract engine.

ocr tesseract text-extraction tesseract-ocr optical-character-recognition dataiku dss-plugin

Updated Apr 18, 2024
Python

OwenOrcan / YiraBot-Crawler

YiraBot: Simplifying Web Scraping for All. A user-friendly tool for developers and enthusiasts, offering command-line ease and Python integration. Ideal for research, SEO, and data collection.

open-source machine-learning data-mining scraping python3 text-extraction web-scraping html-parser robots-txt data-extraction seotools command-line-tool beginner-friendly contributions-welcome big-data-analytics seo-analysis good-first-issue sitemap-parser web-crawlers

Updated Mar 3, 2024
Python

cassarpacerobert / PokerStarsTextExtract

A Python script that reads all of the input text files from a PokerStars poker game and extracts each player's cards by hand ID.

automation python3 text-extraction

Updated Jun 19, 2020
Python

greed2411 / tokyo

tokyo is a REST API that, given any type of document 📄, identifies its MIME type 🧐, suggests a file extension 🦔, and finally extracts its text 💪.

clojure extension filetype text-extraction ring mime-types text-parser extract-text apache-tika document-processing text-parsing

Updated Jun 13, 2020
Clojure

Here are 213 public repositories matching this topic...

Vineeth-Ellore / text-extraction.github.io

juliandavidmr / text2locale

arachnio / arachnio4j

ParisaArbab / Data-Modeling
