Skip to content

Latest commit

 

History

History
22 lines (19 loc) · 1.03 KB

README.md

File metadata and controls

22 lines (19 loc) · 1.03 KB

primelookup

A quick document retrieval system built on the technique of mapping words within a corpus into 4n+3 prime based vectors.

###Rationale: We represent each word in the corpus by a unique vector identity i.e. vector containing mapping of the word's characters to the primes of the form 4n+3. Any finite subset of these primes have elements which if multiplied together, can produce unique results, which help us identifying the closest or corresponding keywords.

###Usage:

>>> from primelookup import primelookup
>>> data = [(id_1, 'document_1'), (id_2, 'document_2'), .. , (id_n, 'document_n')]
>>> pl = primelookup(data)
>>> pl.map_primes()

This will generate mappings and the time depends on the size of the corpus. Once the mappings are generated, a document can be queried via a keyword like:

>>> pl.search(keyword)
>>> [(doc_id, score), (doc_id, score)..]

Note: Currently, only a single keyword based lookup is supported. Very soon, multiple keywords based and sentence based lookups will be added in.