Space-efficient probabilistic set with configurable false positive rate
gem install philiprehberger-bloom_filterSpace-efficient probabilistic set with configurable false positive rate
Add to your Gemfile:
gem "philiprehberger-bloom_filter"
Or install directly:
gem install philiprehberger-bloom_filter
require "philiprehberger/bloom_filter"
filter = Philiprehberger::BloomFilter.new(expected_items: 10_000, false_positive_rate: 0.01)
filter.add('hello')
filter.include?('hello') # => true
filter.include?('world') # => false
Compute the optimal bit-array size and hash-function count before allocating a filter — useful for budgeting memory or sharing parameters between processes.
bits = Philiprehberger::BloomFilter.optimal_size(expected_items: 10_000, false_positive_rate: 0.01)
# => 95851
hash_count = Philiprehberger::BloomFilter.optimal_hash_count(size: bits, expected_items: 10_000)
# => 7
Estimate the false-positive rate for any custom size, expected-item count, and hash-function count — useful when you've fixed the bit budget or hash count and want to see what FPR you'll actually get.
Philiprehberger::BloomFilter.expected_false_positive_rate(
size: 95_851,
expected_items: 10_000,
hash_count: 7
)
# => ~0.01
a = Philiprehberger::BloomFilter.new(expected_items: 1000)
b = Philiprehberger::BloomFilter.new(expected_items: 1000)
a.add('alpha')
b.add('beta')
a.merge(b)
a.include?('beta') # => true
filter = Philiprehberger::BloomFilter.new(expected_items: 10_000)
filter.bulk_add(%w[alpha beta gamma delta])
filter.include?('beta') # => true
filter = Philiprehberger::BloomFilter.new(expected_items: 10_000)
filter.bulk_add(%w[alpha beta gamma])
filter.bulk_include?(%w[alpha beta unseen]) # => [true, true, false]
filter.count_estimate # => ~4.0 (estimated unique items)
a = Philiprehberger::BloomFilter.new(expected_items: 1000)
b = Philiprehberger::BloomFilter.new(expected_items: 1000)
a.bulk_add(%w[shared only-a])
b.bulk_add(%w[shared only-b])
result = a.intersection(b)
result.include?('shared') # => true
result.include?('only-a') # => false
filter.fill_rate # => 0.023 (proportion of set bits)
a = Philiprehberger::BloomFilter.new(expected_items: 1000)
b = Philiprehberger::BloomFilter.new(expected_items: 1000)
a.add('hello')
b.add('hello')
a == b # => true
clone = a.copy
clone.add('world')
a.include?('world') # => false (independent copy)
filter = Philiprehberger::BloomFilter.new(expected_items: 1000, false_positive_rate: 0.01)
100.times { |i| filter.add("item-#{i}") }
filter.false_positive_rate # => ~0.0001 (actual rate based on current fill)
a = Philiprehberger::BloomFilter.new(expected_items: 1000)
b = Philiprehberger::BloomFilter.new(expected_items: 1000)
a.bulk_add(%w[alpha beta gamma])
b.add('alpha')
a.superset?(b) # => true
filter = Philiprehberger::BloomFilter.new(expected_items: 1000)
filter.empty? # => true
filter.add('hello')
filter.empty? # => false
a = Philiprehberger::BloomFilter.new(expected_items: 1000)
b = Philiprehberger::BloomFilter.new(expected_items: 1000)
a.add('alpha')
b.add('beta')
result = a.union(b)
result.include?('alpha') # => true
result.include?('beta') # => true
a.include?('beta') # => false (a is unchanged)
a = Philiprehberger::BloomFilter.new(expected_items: 1000)
b = Philiprehberger::BloomFilter.new(expected_items: 1000)
a.add('alpha')
b.bulk_add(%w[alpha beta gamma])
a.subset?(b) # => true
a = Philiprehberger::BloomFilter.new(expected_items: 1000)
b = Philiprehberger::BloomFilter.new(expected_items: 1000)
a.add('alpha')
b.add('beta')
union = a | b # same as a.union(b)
intersection = a & b # same as a.intersection(b)
a = Philiprehberger::BloomFilter.new(expected_items: 1000)
b = Philiprehberger::BloomFilter.new(expected_items: 1000)
a.compatible?(b) # => true, safe to merge / intersect / union
filter = Philiprehberger::BloomFilter.new(expected_items: 100)
filter.saturated? # => false
200.times { |i| filter.add("item-#{i}") }
filter.saturated?(threshold: 0.5) # => true
data = filter.serialize
restored = Philiprehberger::BloomFilter.deserialize(data)
restored.include?('hello') # => true
json = filter.to_json
restored = Philiprehberger::BloomFilter.from_json(json)
restored.include?('hello') # => true
| Method | Description |
|---|---|
.new(expected_items:, false_positive_rate:) | Create a new bloom filter |
.optimal_size(expected_items:, false_positive_rate:) | Compute optimal bit-array size for a target false positive rate |
.optimal_hash_count(size:, expected_items:) | Compute optimal hash-function count for a given bit-array size |
.expected_false_positive_rate(size:, expected_items:, hash_count:) | Estimate FPR for a custom size, item count, and hash-function count |
#add(item) | Add an item to the filter |
#include?(item) | Check if an item might be in the filter |
#merge(other) | Merge another compatible filter into this one |
#clear | Reset the filter |
#count | Number of items added |
#memory_usage | Bit array size in bytes |
#serialize | Serialize to a hash |
#bulk_add(items) | Add all items from an enumerable |
#bulk_include?(items) | Check membership for many items, returning an array of booleans |
#count_estimate | Estimate unique item count from fill rate |
#intersection(other) | Create filter matching items in both |
#fill_rate | Proportion of set bits (0.0 to 1.0) |
#==(other) | Structural equality comparison |
#copy | Create an independent deep clone |
#false_positive_rate | Actual FP rate based on current fill |
#to_json | Serialize to JSON string |
#superset?(other) | Check if self contains all bits of other |
#empty? | Check if no items have been added |
#union(other) | Non-mutating OR returning a new filter |
#subset?(other) | Check if every set bit in self is also set in other |
#compatible?(other) | Check structural compatibility with another filter |
#saturated?(threshold:) | True if fill rate is at/above threshold |
#|(other) | Operator alias for #union |
#&(other) | Operator alias for #intersection |
#hash / #eql? | Hash key support consistent with #== |
#inspect | Human-readable representation |
.deserialize(data) | Restore a filter from serialized data |
.from_json(str) | Restore a filter from a JSON string |
bundle install
bundle exec rspec
bundle exec rubocop
If you find this project useful: