primelookup

A quick document retrieval system built on the technique of mapping words within a corpus into 4n+3 prime based vectors.

###Rationale: We represent each word in the corpus by a unique vector identity i.e. vector containing mapping of the word's characters to the primes of the form 4n+3. Any finite subset of these primes have elements which if multiplied together, can produce unique results, which help us identifying the closest or corresponding keywords.

###Usage:

>>> from primelookup import primelookup
>>> data = [(id_1, 'document_1'), (id_2, 'document_2'), .. , (id_n, 'document_n')]
>>> pl = primelookup(data)
>>> pl.map_primes()

This will generate mappings and the time depends on the size of the corpus. Once the mappings are generated, a document can be queried via a keyword like:

>>> pl.search(keyword)
>>> [(doc_id, score), (doc_id, score)..]

Note: Currently, only a single keyword based lookup is supported. Very soon, multiple keywords based and sentence based lookups will be added in.

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
10.json		10.json
LICENSE		LICENSE
README.md		README.md
__init__.py		__init__.py
prime_gen.py		prime_gen.py
primelookup.py		primelookup.py
test.py		test.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

10.json

10.json

LICENSE

LICENSE

README.md

README.md

init.py

init.py

prime_gen.py

prime_gen.py

primelookup.py

primelookup.py

test.py

test.py

Repository files navigation

primelookup

About

Releases

Packages

Languages

License

saifuddin778/primelookup

Folders and files

Latest commit

History

Repository files navigation

primelookup

About

Resources

License

Stars

Watchers

Forks

Languages