-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce custom hash table data structures. #3940
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to give feedback on map.h early, before you update set.h
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, definitely useful to get map.h
tidied up before I port it to set.h
. I think I've responded to all the comments there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sending what comments I have so far.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for all the feedback so far, PTAL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finished a pass through raw_hashtable.h
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PTAL!
Beyond inline replies, also was able to simplify the specialization strategy for the table type and improved some other comments spotted as I was working.
Also, think this is far enough along in review I'm moving it out of draft.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PTAL!
Beyond inline replies, also was able to simplify the specialization strategy for the table type and improved some other comments spotted as I was working.
Also, think this is far enough along in review I'm moving it out of draft.
The hash table design is heavily based on Abseil's ["Swiss Tables"][swiss-tables] design. It uses an array of bytes storing metadata about each entry and an array of entries where each is a pair of key and value. The metadata byte consists of 7-bits of hash of the key (distinct from the bits used to index the table), and one bit indicating the presence of a special entry -- either empty or deleted. [swiss-tables]: https://abseil.io/about/design/swisstables See the comments in `raw_hashtable.h` for a detailed overview of the design. Co-authored-by: josh11b <15258583+josh11b@users.noreply.github.com>
So, I'm most of the way through adding really nice support for handling stateful keys like indices into vectors and such. It turned out to be necessary to even move the toolchain over to these data structures as it also lets us use something other than
Wanted to both let you know @josh11b about my progress there, and also ask whether you'd like me to merge this into this code review or keep it as a follow-up. I can easily manage either way. |
I'm fine either way. If it simplifies the files haven't reviewed yet, it would be a benefit. |
This allows customizing both the hash and equality comparison. It also does so in a way that enables stateful key contexts such as indices into a specific vector. This works by requiring the context to be packaged on every API accepting a key. When the context is stateless and can be default constructed, this works via a default argument. When it is stateful, the argument must be provided explicitly. This allows us to add a stress test that forces colliding hashes in egregious ways, as well as allowing the staeful hashing. There are also a number of basic cleanups to code that were uncovered while working on this.
After discussion, it seems a bit borderline with pros and cons, but on the whole probably simplifies the review. Added and pushed. I did make it a clean follow-up commit so that it is easy to see the diff: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finished a pass on raw_hashtable_metadata_group.h
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect many of the comments about set.h also apply to map.h
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Read through set_test.cpp and started on map_test.cpp
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've now looked at everything except for files with names containing "benchmark"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finished a complete pass
Co-authored-by: josh11b <15258583+josh11b@users.noreply.github.com>
Co-authored-by: josh11b <15258583+josh11b@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed most of the metadata_group stuff. Sorry my search-and-replace for mask
with bits
missed so much, I think I must have messed up that search really badly. =[
Still need to go through comments on other benchmark files, but figured I'd post these updates.
Co-authored-by: josh11b <15258583+josh11b@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I think I got all the benchmark comments now as well as a collection of other comments I had missed that were still open. PTAL!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is still a "TODO" in the PR summary, though hopefully that has been largely addressed by now.
Co-authored-by: josh11b <15258583+josh11b@users.noreply.github.com>
Huge thanks to Josh for all the review on this ... really extensive PR. Also thanks to all the folks who gave feedback on the design or approach or discussed! |
The hash table design is heavily based on Abseil's "Swiss Tables" design. It uses an array of bytes storing metadata about each entry and an array of entries where each is a pair of key and value. The metadata byte consists of 7-bits of hash of the key (distinct from the bits used to index the table), and one bit indicating the presence of a special entry -- either empty or deleted.
There are a large range of optimizations and other nuanced aspects of this hash table design and implementation, a good point to understand that context is
raw_hashtable.h
which has an overview of the design and references to various other files for relevant details.