c - Comparing numbers of bits per position in two lists of integers -
i have run problem can circumvent arranging algorithm differently, it's quite interesting , maybe 1 of has idea.
the situation follows: have 2 lists of unsigned long integers, both lists have same size, , if helpful can assume size power of two. size of these lists in range of several hundred. want compute integer has set bit in every position in first list has more set bits second list.
speed everything.
simplified example: list1 list2 1010 0101 1111 0000 1100 0011 1010 0101 result: 1010 because of 4>0, 2<=2, 3>1, 1<=3
edit: alternative arrangement of data result in bit vectors contain bits of position in several different vectors. in case use bit counting algorithm , compare, amount less 30 operations per 64 bits in both lists. have matrix of bits , can use bit vectors columns or rows.
additional structure: john willemse's comment made me realise calculate third list, these 3 lists complement each other bitwise. though don't see how helpful.
you can transposed counters - instead of having int each bit position of data, uint each bit position of count. don't need many bits..
you can addition/subtraction way defined on bitvectors, each "bit" being slice of bit position across counts.
perhaps sounds vague, let's jump right in: (not tested)
// add in item list2 carry0 = count0 & item2; count0 ^= item2; carry1 = count1 & carry0; count1 ^= carry0; .. etc many bits need in counters // subtract item list1 borrow0 = ~count0 & item1; count0 ^= item1; borrow1 = ~count1 & borrow0; count1 ^= borrow0; .. etc
the result signs, last counter you're using.
or, different: maybe can use sub-fields of int, swar style. works if fields small or don't need many, because there isn't space. 4-bit items it's not bad, uint32_t
offering 4 counters range -128 127, might enough (the final difference must in range, intermediate results can wrap safely)
anyway how work spread bits out either lookup table or pdep
, (not tested)
uint32_t spread = _pdep_u32(item, 0x01010101); // or uint32_t table[] = { 0x00000000, 0x00000001, 0x00000100, 0x00000101, 0x00010000, 0x00010001, 0x00010100, 0x00000101, 0x01000000, 0x01000001, 0x01000100, 0x00000101, 0x01010000, 0x01010001, 0x01010100, 0x01010101 }; uint32_t spread = table[item];
then swar addition or subtraction, can optimized bit, because know they're increments or decrements or no change, (not tested)
// add in spread item 2 uint32_t h = 0x80808080; count = ((count &~h) + sp2) ^ (count & h); // subtract spread item 1 count = ((count | h) - sp1) ^ (~count & h);
the result sign of every sub-field, easy extract annoying compress (unless have pext
).
Comments
Post a Comment