Bounded-index Extensible Hashing (bihash)
=========================================

Vpp uses bounded-index extensible hashing to solve a variety of
exact-match (key, value) lookup problems. Benefits of the current
implementation:

* Very high record count scaling, tested to 100,000,000 records.
* Lookup performance degrades gracefully as the number of records increases.
* No reader locking required.
* Template implementation; it's easy to support arbitrary (key, value) types.

Bounded-index extensible hashing has been widely used in databases for
decades.

Bihash uses a two-level data structure:

```
    +-----------------+
    | bucket-0        |
    |  log2_size      |
    |  backing store  |
    +-----------------+
    | bucket-1        |
    |  log2_size      |      +--------------------------------+
    |  backing store  | ---> | KVP_PER_PAGE * key-value-pairs |
    +-----------------+      | page 0                         |
          ...                +--------------------------------+
    +-----------------+      | KVP_PER_PAGE * key-value-pairs |
    | bucket-2**N-1   |      | page 1                         |
    |  log2_size      |      +--------------------------------+
    |  backing store  |                    ...
    +-----------------+      +--------------------------------+
                             | KVP_PER_PAGE * key-value-pairs |
                             | page 2**log2_size - 1          |
                             +--------------------------------+
```

Discussion of the algorithm
---------------------------

This structure has a couple of major advantages. In practice, each
bucket entry fits into a 64-bit integer. Coincidentally, vpp's target
CPU architectures support 64-bit atomic operations.
When modifying the contents of a specific bucket, we do the following:

* Make a working copy of the bucket's backing storage
* Atomically swap a pointer to the working copy into the bucket array
* Change the original backing store data
* Atomically swap back to the original

So, no reader locking is required to search a bihash table.

At lookup time, the implementation computes a key hash code. We use
the least-significant N bits of the hash to select the bucket.

With the bucket in hand, we learn log2 (nBackingPages) for the
selected bucket. At this point, we use the next log2_size bits from
the hash code to select the specific backing page in which the
(key,value) pair will be found.

Net result: we search **one** backing page, not 2**log2_size
pages. This is a key property of the algorithm.

When sufficient collisions occur to fill the backing pages for a given
bucket, we double the bucket size, rehash, and deal the bucket
contents into a double-sized set of backing pages. In the future, we
may represent the size as a linear combination of two powers-of-two,
to increase space efficiency.

To solve the "jackpot case," in which a set of records collides badly
under hashing, the implementation falls back to linear search across
the 2**log2_size backing pages on a per-bucket basis.

To maintain *space* efficiency, we should configure the bucket array
so that backing pages are effectively utilized. Lookup performance
tends to change *very little* whether the bucket array is too small or
too large.

Bihash depends on selecting an effective hash function. If one were to
use a truly broken hash function such as "return 1ULL;", bihash would
still work, but it would be equivalent to poorly-programmed linear
search.

We often use CPU intrinsic functions (think crc32) to rapidly compute
a hash code which has decent statistical properties.
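
The bucket and page selection described above can be sketched in a few
lines of C. This is an illustrative stand-alone model, not the actual
vppinfra implementation; the function names `select_bucket` and
`select_page` are invented for the example.

```c
#include <stdint.h>

/* Illustrative model of bihash index selection, NOT the vppinfra code.
   A table with 2**n_bucket_bits buckets; the selected bucket currently
   has 2**log2_pages backing pages. */

/* Bucket index: the least-significant n_bucket_bits of the hash. */
static inline uint64_t
select_bucket (uint64_t hash, unsigned n_bucket_bits)
{
  return hash & ((1ULL << n_bucket_bits) - 1);
}

/* Page index within the bucket: the next log2_pages bits of the hash.
   Only this ONE page is searched, not all 2**log2_pages pages. */
static inline uint64_t
select_page (uint64_t hash, unsigned n_bucket_bits, unsigned log2_pages)
{
  return (hash >> n_bucket_bits) & ((1ULL << log2_pages) - 1);
}
```

Note that when a bucket doubles in size, log2_pages grows by one, so one
additional hash bit participates in page selection; this is why records
must be rehashed into the new page set.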

Bihash Cookbook
---------------

### Using current (key,value) template instance types

It's quite easy to use one of the template instance types. As of this
writing, .../src/vppinfra provides pre-built templates for 8, 16, 20,
24, 40, and 48 byte keys, u8 * vector keys, and 8 byte values.

See .../src/vppinfra/{bihash_<key-size>_8}.h

To define the data types, #include a specific template instance, most
often in a subsystem header file:

```c
  #include <vppinfra/bihash_8_8.h>
```

If you're building a standalone application, you'll need to define the
various functions by #including the method implementation file in a C
source file:

```c
  #include <vppinfra/bihash_template.c>
```

The core vpp engine currently uses most if not all of the known bihash
types, so within vpp itself you probably won't need to #include the
method implementation file.

Add an instance of the selected bihash data structure to e.g. a
"main_t" structure:

```c
  typedef struct
  {
    ...
    BVT (clib_bihash) hash_table;
    /* or */
    clib_bihash_8_8_t hash_table;
    ...
  } my_main_t;
```

The BV macro concatenates its argument with the value of the
preprocessor symbol BIHASH_TYPE. The BVT macro concatenates its
argument with the value of BIHASH_TYPE and the fixed string "_t". So
in the above example, BVT (clib_bihash) generates "clib_bihash_8_8_t".

If you're sure you won't decide to change the template / type name
later, it's perfectly OK to code "clib_bihash_8_8_t" and so forth.

In fact, if you #include multiple template instances in a single
source file, you **must** use fully-enumerated type names; the macros
stand no chance of working.

### Initializing a bihash table

Call the init function as shown. As a rough guide, pick a number of
buckets which is approximately
number_of_expected_records/BIHASH_KVP_PER_PAGE, using the
BIHASH_KVP_PER_PAGE value from the relevant template instance header
file. See the previous discussion.
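
As a sketch of that sizing rule, one might compute the bucket count as
follows. `pick_nbuckets` is a hypothetical helper, not part of the
bihash API, and the BIHASH_KVP_PER_PAGE value of 4 is only an example;
the real value comes from the template instance header.

```c
#include <stdint.h>

/* Example value only; in real code this comes from the template
   instance header, e.g. <vppinfra/bihash_8_8.h>. */
#define BIHASH_KVP_PER_PAGE 4

/* Hypothetical helper: pick roughly
   expected_records / BIHASH_KVP_PER_PAGE buckets, rounded up to a
   power of two so the low bits of the hash select a bucket cleanly. */
static uint32_t
pick_nbuckets (uint64_t expected_records)
{
  uint64_t target = expected_records / BIHASH_KVP_PER_PAGE;
  uint32_t nbuckets = 1;
  while (nbuckets < target)
    nbuckets <<= 1;
  return nbuckets;
}
```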

The amount of memory selected should easily contain all of the
records, with a generous allowance for hash collisions. Bihash memory
is allocated separately from the main heap, and won't cost anything
except kernel PTEs until touched, so it's OK to be reasonably
generous.

For example:

```c
  my_main_t *mm = &my_main;
  clib_bihash_8_8_t *h;

  h = &mm->hash_table;

  clib_bihash_init_8_8 (h, "test", (u32) number_of_buckets,
                        (uword) memory_size);
```

### Add or delete a key/value pair

Use BV(clib_bihash_add_del), or the explicit type variant:

```c
  my_main_t *mm = &my_main;
  clib_bihash_8_8_t *h;
  clib_bihash_kv_8_8_t kv;

  h = &mm->hash_table;
  kv.key = key_to_add_or_delete;
  kv.value = value_to_add_or_delete;

  clib_bihash_add_del_8_8 (h, &kv, is_add /* 1=add, 0=delete */);
```

In the delete case, kv.value is irrelevant. To change the value
associated with an existing key, simply re-add the pair with the new
value.

### Simple search

The simplest possible (key, value) search goes like so:

```c
  my_main_t *mm = &my_main;
  clib_bihash_8_8_t *h;
  clib_bihash_kv_8_8_t search_kv, return_kv;

  h = &mm->hash_table;
  search_kv.key = key_to_search_for;

  if (clib_bihash_search_8_8 (h, &search_kv, &return_kv) < 0)
    key_not_found ();
  else
    key_found ();
```

Note that it's perfectly fine to collect the lookup result into the
search key-value pair:

```c
  if (clib_bihash_search_8_8 (h, &search_kv, &search_kv))
    key_not_found ();
  etc.
```

### Bihash vector processing

When processing a vector of packets which all need a certain lookup
performed, it's worth the trouble to compute the key hash and prefetch
the correct bucket ahead of time.

Here's a sketch of one way to write the required code:

Dual-loop:

* 6 packets ahead, prefetch 2x vlib_buffer_t's and 2x packet data
  required to form the record keys
* 4 packets ahead, form 2x record keys and call BV(clib_bihash_hash)
  or the explicit hash function to calculate the record hashes.
  Call 2x BV(clib_bihash_prefetch_bucket) to prefetch the buckets
* 2 packets ahead, call 2x BV(clib_bihash_prefetch_data) to prefetch
  2x (key,value) data pages
* In the processing section, call 2x
  BV(clib_bihash_search_inline_with_hash) to perform the search

It's the programmer's choice whether to stash the hash code somewhere
in the vnet_buffer(b) metadata, or to use local variables.

Single-loop:

* Use simple search as shown above.

### Walking a bihash table

A fairly common scenario, e.g. when building "show" commands, involves
walking a bihash table. It's simple enough:

```c
  my_main_t *mm = &my_main;
  clib_bihash_8_8_t *h;
  void callback_fn (clib_bihash_kv_8_8_t *, void *);

  h = &mm->hash_table;

  BV (clib_bihash_foreach_key_value_pair) (h, callback_fn, (void *) arg);
```

To nobody's great surprise, clib_bihash_foreach_key_value_pair
iterates across the entire table, calling callback_fn with each active
entry.

#### Bihash table iteration safety

The iterator template "clib_bihash_foreach_key_value_pair" must be
used with a certain amount of care. For one thing, the iterator
template does _not_ take the bihash table writer lock. If your
use-case requires it, lock the table.

For another, the iterator template is not safe under all conditions:

* It's __OK to delete__ bihash table entries during a table-walk. The
  iterator checks whether the current bucket has been freed after each
  _callback_fn(...)_ invocation.

* It is __not OK to add__ entries during a table-walk.

The add-during-walk case involves a jackpot: while processing a
key-value pair in a particular bucket, the callback adds a number of
entries.
By bad luck, one or more of the added entries causes the
__current bucket__ to split-and-rehash.

Since we rehash KVPs to different pages based on what amounts to a
different hash function, either of these things can go wrong:

* We may revisit previously-visited entries. Depending on how the
  use-case is coded, we could end up in a recursive-add situation.

* We may skip entries that have not yet been visited.

One could build an add-safe iterator, at a significant cost in
performance: copy the entire bucket, and walk the copy.

It's hard to imagine a worthwhile add-during-walk use-case in the
first place, let alone one which couldn't be implemented by walking
the table without modifying it, then adding a set of records
afterwards.

### Creating a new template instance

Creating a new template is easy. Use one of the existing templates as
a model, and make the obvious changes. The hash and key_compare
methods are performance-critical in multiple senses.

If the key_compare method is slow, every lookup will be slow. If the
hash function is slow, same story. If the hash function has poor
statistical properties, space efficiency will suffer. In the limit, a
bad enough hash function will cause large portions of the table to
revert to linear search.

Use of the best available vector unit is well worth the trouble in the
hash and key_compare functions.
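
As a plain-C illustration of the kind of key_compare method involved,
here is a branch-light compare for a 16-byte key. The name
`key_compare_16` is invented for this sketch; the real templates use
vector compares (e.g. SSE or NEON) where available.

```c
#include <stdint.h>
#include <string.h>

/* Illustrative 16-byte key compare (invented name, not the template
   method). Returns 1 when the keys match, 0 otherwise. */
static inline int
key_compare_16 (const void *a, const void *b)
{
  uint64_t a64[2], b64[2];
  memcpy (a64, a, sizeof (a64)); /* memcpy avoids unaligned-access UB */
  memcpy (b64, b, sizeof (b64));
  /* OR together the XOR of both 64-bit lanes: zero iff all 16 bytes
     match, so there is a single branch at the very end. */
  return ((a64[0] ^ b64[0]) | (a64[1] ^ b64[1])) == 0;
}
```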