saifuddin778/primelookup

primelookup

A fast document retrieval system that maps each word in a corpus to a vector of primes of the form 4n+3.

### Rationale

Each word in the corpus is represented by a unique vector identity, i.e. a vector that maps each of the word's characters to a prime of the form 4n+3. By unique factorization, multiplying the elements of any finite collection of these primes yields a product that no other collection can produce, which helps us identify the closest or corresponding keywords.
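To make the idea concrete, here is a minimal sketch, not the library's actual code. It assumes that lowercase ASCII letters are assigned, in alphabetical order, to the first 26 primes of the form 4n+3; the names `is_prime`, `primes_4n3`, `word_vector`, and `word_product` are hypothetical helpers introduced only for illustration.

```python
from math import prod
from string import ascii_lowercase


def is_prime(n):
    """Trial-division primality test; adequate for the small values used here."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def primes_4n3(k):
    """Return the first k primes of the form 4n+3 (3, 7, 11, 19, 23, ...)."""
    found = []
    candidate = 3
    while len(found) < k:
        if is_prime(candidate):
            found.append(candidate)
        candidate += 4  # stay on the 4n+3 progression
    return found


# Assumed mapping for this sketch: 'a' -> 3, 'b' -> 7, 'c' -> 11, ...
CHAR_TO_PRIME = dict(zip(ascii_lowercase, primes_4n3(26)))


def word_vector(word):
    """Map each known character of a word to its prime, preserving order."""
    return [CHAR_TO_PRIME[c] for c in word.lower() if c in CHAR_TO_PRIME]


def word_product(word):
    """Multiply the word's primes together."""
    return prod(word_vector(word))


print(word_vector("cab"))   # [11, 3, 7]
print(word_product("cab"))  # 231
```

Under this assumed mapping the product depends only on which letters a word contains and how often, so anagrams such as "cab" and "abc" share a product, while the ordered vector still tells them apart.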

### Usage

>>> from primelookup import primelookup
>>> data = [(id_1, 'document_1'), (id_2, 'document_2'), ..., (id_n, 'document_n')]
>>> pl = primelookup(data)
>>> pl.map_primes()

This generates the mappings; the time taken depends on the size of the corpus. Once the mappings are generated, documents can be queried by keyword:

>>> pl.search(keyword)
[(doc_id, score), (doc_id, score), ...]

Note: Currently, only single-keyword lookup is supported. Multi-keyword and sentence-based lookups will be added soon.
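For intuition about how a single-keyword lookup over prime products might return `(doc_id, score)` pairs, here is a toy sketch. It is not primelookup's implementation: the class `ToyLookup`, the helpers `signature` and `shared_letters`, the letter-to-prime assignment, and the scoring rule (shared letters divided by the length of the longer word, taking the best-matching word per document) are all assumptions made for illustration.

```python
from collections import defaultdict
from math import gcd, prod
from string import ascii_lowercase


def first_4n3_primes(k):
    """Return the first k primes of the form 4n+3 by trial division."""
    found, candidate = [], 3
    while len(found) < k:
        if all(candidate % d for d in range(2, int(candidate ** 0.5) + 1)):
            found.append(candidate)
        candidate += 4
    return found


# Assumed mapping: one 4n+3 prime per lowercase letter.
CHAR_TO_PRIME = dict(zip(ascii_lowercase, first_4n3_primes(26)))


def signature(word):
    """Product of the primes assigned to the word's letters."""
    return prod(CHAR_TO_PRIME[c] for c in word.lower() if c in CHAR_TO_PRIME)


def shared_letters(sig_a, sig_b):
    """Count the letters two words share, read off the gcd of their signatures."""
    common, total = gcd(sig_a, sig_b), 0
    for p in CHAR_TO_PRIME.values():
        while common % p == 0:
            common //= p
            total += 1
    return total


class ToyLookup:
    """Hypothetical stand-in that mirrors the usage shown above."""

    def __init__(self, data):
        self.data = data
        self.index = defaultdict(set)  # word signature -> ids of documents

    def map_primes(self):
        for doc_id, text in self.data:
            for word in text.split():
                self.index[signature(word)].add(doc_id)

    def search(self, keyword):
        query = signature(keyword)
        query_len = shared_letters(query, query)
        scores = {}
        for sig, doc_ids in self.index.items():
            longer = max(query_len, shared_letters(sig, sig), 1)
            score = shared_letters(query, sig) / longer  # assumed scoring rule
            for doc_id in doc_ids:
                scores[doc_id] = max(scores.get(doc_id, 0.0), score)
        ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
        return [(doc_id, score) for doc_id, score in ranked if score > 0]


tl = ToyLookup([(1, "prime numbers"), (2, "vector lookup")])
tl.map_primes()
print(tl.search("prime"))
```

Because the gcd of two signatures is the product of the primes they have in common, partial matches fall out of plain integer arithmetic. One consequence of this toy scoring is that anagrams of the keyword score as exact matches.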
