Making sense of Ethereum log bloom filters

Posted in code crypto ethereum bloom filter

Ethereum blocks and transaction receipts both carry a bloom filter. They provide a shortcut to check whether a particular event of a particular contract was triggered. The filter in the transaction indexes all events that occurred in the transaction. The filter in the block carries all the filter bits of the individual transactions within the block.

Somewhat surprisingly, there are practically no friendly descriptions out there that tell you how to generate filters to match them with. In fact, the Ethereum Yellow Paper was the only source I found.

So let's make a friendly description

Before we get to the friendliness, though, an ever so slight dose of brain melt is due:

The formalities

The formal definition of the filter calculation in the Ethereum Yellow Paper section 4.3.1 is (here cited in painfully inadequate math formatting):

M(O) ≡ x ∈ {Oa}∪Ot(M3 : 2048(x))

Where Oa is the contract address and Ot is all of the topics.

In the context of Solidity, "all" of the topics means the actual event signature, along with all the indexed parameters to the event.

The M is the filter bit generator. The filter 2048 bits long, and for each entry that is added, 3 bits need to be set.

Reading further, we learn that the bits to set are the "first three" pairs of bytes of the keccak256 hash [1], modulo the length of the filter - 2048.

In the context of hashes, I usually assume "first" means from the left. That happens to be the case here, too. In other words, the "first" bytes are actually the bytes of most significance in big-endian order.

What's less clear, however, is which bits to set in the filter. The paper says:

is the bit reference function such that j(x) equals the bit of index j (indexed from 0) in the byte array x.

Turns out after some trial and error that "index 0" means the right end of the byte array. So when setting the bits, we have to count from the right side.

The friendly recipe

Now let's try to bring this down to a mortal level:

  1. Take the keccak256 hash of the contract address
  2. Take the keccak256 hash of the first topic value
  3. Optionally, take the keccak256 hash of the remaining topic values (the indexed parameters of the event) [2]
  4. Instantiate a byte array with length 256
  5. Now, for each of the hashes:
    1. Get the big-endian integer of the first two bytes of the hash
    2. Calculate the 2048-modulo of that integer [3]
    3. Bitwise or the filter bit at the index corresponding to the modulated value, counting from right to left (i.e. the value 0 corresponds to the rightmost bit).
    4. Repeat 2 and 3 for the two next pairs of bytes of the hash.

The code

All that taken into account, we can take a stab at implementing a filter generator:

 0 # external imports
 1 import sha3
 4 class LogBloom:
 6     def __init__(self):
 7         self.content = bytearray(256)
10     def add(self, element):
11         if not isinstance(element, bytes):
12             raise ValueError('element must be bytes')
13         h = sha3.keccak_256()
14         h.update(element)
15         z = h.digest()
17         for j in range(3):
18             c = j * 2
19             v = int.from_bytes(z[c:c+2], byteorder='big')
20             v &= 0x07ff
21             m = 255 - int(v / 8)
22             n = v % 8
23             self.content[m] |= (1 << n)

Let's say we'd want to check if an Solidity-defined event Foo(uint256,bytes32) emitted by a contract at address 0xeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee exists in a filter.

Using the class above, we would have to do something like this:

# external imports
import sha3
import some_block_getter

# local imports
from the_above import LogBloom

# all set bits in our filter must be set in theirs
def filter_match(theirs, ours):
        for i in range(len(ours)):
                if ours[i] & theirs[i] != ours[i]:
                        return False
        return True

fltr = LogBloom()

# add the contract address to the filter
address = bytes.fromhex('eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee')
h = sha3.keccak_256()
address_element = h.digest()

# create the topic signature for the event
h = sha3.keccak_256()
topic = h.digest()

# add the topic to the filter
h = sha3.keccak_256()
topic_element = h.digest()

# get the filter from a block
block = some_block_getter()
block_bloom = bytes.fromhex(block['logsBloom'][2:]) # assumes there is a 0x-prefix

# check if it matches
match = filter_match(block_bloom, fltr.contents)
[1]Remember the keccak hash used in Ethereum/Bitcoin is _not_ the official sha3 hash. The padding parameter is different. More on that here
[2]Solidity supports up to 3 indexed parameters, which maps to the evm opcodes LOGn where n ∈ {1, 2, 3, 4}
[3]Since 211 = 2048, this is the same as zeroing out all bits above bit number 11; x∧2048. In other words, the result is an 11 bit integer.