primelookup

A quick document retrieval system built on the technique of mapping words within a corpus into 4n+3 prime based vectors.

###Rationale: We represent each word in the corpus by a unique vector identity i.e. vector containing mapping of the word's characters to the primes of the form 4n+3. Any finite subset of these primes have elements which if multiplied together, can produce unique results, which help us identifying the closest or corresponding keywords.

###Usage:

>>> from primelookup import primelookup
>>> data = [(id_1, 'document_1'), (id_2, 'document_2'), .. , (id_n, 'document_n')]
>>> pl = primelookup(data)
>>> pl.map_primes()

This will generate mappings and the time depends on the size of the corpus. Once the mappings are generated, a document can be queried via a keyword like:

>>> pl.search(keyword)
>>> [(doc_id, score), (doc_id, score)..]

Note: Currently, only a single keyword based lookup is supported. Very soon, multiple keywords based and sentence based lookups will be added in.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

primelookup

Files

README.md

Latest commit

History

README.md

File metadata and controls

primelookup