Class BlockSplitBloomFilter
java.lang.Object
    org.apache.parquet.column.values.bloomfilter.BlockSplitBloomFilter

All Implemented Interfaces:
BloomFilter
public class BlockSplitBloomFilter extends Object implements BloomFilter
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from interface org.apache.parquet.column.values.bloomfilter.BloomFilter
BloomFilter.Algorithm, BloomFilter.Compression, BloomFilter.HashStrategy
-
-
Field Summary
Fields (Modifier and Type, Field)
- static double DEFAULT_FPP
- static int HEADER_SIZE
- static int LOWER_BOUND_BYTES
- static int UPPER_BOUND_BYTES
-
Constructor Summary
Constructors (Constructor: Description)
- BlockSplitBloomFilter(byte[] bitset): Constructs the Bloom filter from the given bitset; used when reconstructing a Bloom filter from a Parquet file.
- BlockSplitBloomFilter(int numBytes): Constructor of block-based Bloom filter.
- BlockSplitBloomFilter(int numBytes, int maximumBytes): Constructor of block-based Bloom filter.
- BlockSplitBloomFilter(int numBytes, int minimumBytes, int maximumBytes, BloomFilter.HashStrategy hashStrategy): Constructor of block-based Bloom filter.
-
Method Summary
Methods (Modifier and Type, Method: Description)
- boolean canMergeFrom(BloomFilter otherBloomFilter): Determines whether a given Bloom filter can be merged into this Bloom filter.
- boolean equals(Object object): Compares this Bloom filter to the specified object.
- boolean findHash(long hash): Determines whether an element is in the set.
- BloomFilter.Algorithm getAlgorithm(): Returns the algorithm that the Bloom filter uses.
- int getBitsetSize(): Gets the number of bytes of the bitset in this Bloom filter.
- BloomFilter.Compression getCompression(): Returns the compression algorithm that the Bloom filter uses.
- BloomFilter.HashStrategy getHashStrategy(): Returns the hash strategy that the Bloom filter uses.
- long hash(double value): Computes the hash of a double value from its plain encoding result.
- long hash(float value): Computes the hash of a float value from its plain encoding result.
- long hash(int value): Computes the hash of an int value from its plain encoding result.
- long hash(long value): Computes the hash of a long value from its plain encoding result.
- long hash(Object value): Computes the hash of an Object value from its plain encoding result.
- long hash(Binary value): Computes the hash of a Binary value from its plain encoding result.
- void insertHash(long hash): Inserts an element into the Bloom filter; the element is represented by the hash value of its plain encoding result.
- void merge(BloomFilter otherBloomFilter): Merges this Bloom filter with another Bloom filter by performing a bitwise OR of the underlying bitsets.
- static int optimalNumOfBits(long n, double p): Calculates the optimal size according to the number of distinct values and the false positive probability.
- void writeTo(OutputStream out): Writes the Bloom filter to an output stream.
-
-
-
Field Detail
-
LOWER_BOUND_BYTES
public static final int LOWER_BOUND_BYTES
- See Also:
- Constant Field Values
-
UPPER_BOUND_BYTES
public static final int UPPER_BOUND_BYTES
- See Also:
- Constant Field Values
-
HEADER_SIZE
public static final int HEADER_SIZE
- See Also:
- Constant Field Values
-
DEFAULT_FPP
public static final double DEFAULT_FPP
- See Also:
- Constant Field Values
-
-
Constructor Detail
-
BlockSplitBloomFilter
public BlockSplitBloomFilter(int numBytes)
Constructor of block-based Bloom filter.
- Parameters:
numBytes - The number of bytes for the Bloom filter bitset. The value should be within [DEFAULT_MINIMUM_BYTES, DEFAULT_MAXIMUM_BYTES]; a value below the lower bound is raised to it, and a value above the upper bound is lowered to it. The value is also rounded up to a power of 2. This constructor uses XXH64 as its default hash function.
-
BlockSplitBloomFilter
public BlockSplitBloomFilter(int numBytes, int maximumBytes)
Constructor of block-based Bloom filter.
- Parameters:
numBytes - The number of bytes for the Bloom filter bitset. The value should be within [DEFAULT_MINIMUM_BYTES, maximumBytes]; a value below the lower bound is raised to it, and a value above the upper bound is lowered to it. The value is also rounded up to a power of 2. This constructor uses XXH64 as its default hash function.
maximumBytes - The maximum number of bytes of the Bloom filter.
-
BlockSplitBloomFilter
public BlockSplitBloomFilter(int numBytes, int minimumBytes, int maximumBytes, BloomFilter.HashStrategy hashStrategy)
Constructor of block-based Bloom filter.
- Parameters:
numBytes - The number of bytes for the Bloom filter bitset. The value should be within [minimumBytes, maximumBytes]; a value below the lower bound is raised to it, and a value above the upper bound is lowered to it. The value is also rounded up to a power of 2.
minimumBytes - The minimum number of bytes of the Bloom filter.
maximumBytes - The maximum number of bytes of the Bloom filter.
hashStrategy - The hash strategy adopted by the Bloom filter.
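
The sizing rule described for numBytes can be sketched in plain Java as follows. This is a minimal illustration, not the library's code: the class SizingSketch and the method normalizeNumBytes are hypothetical names, and the order of the steps (raise to the minimum, round up to a power of 2, then cap at the maximum) is an assumption.

```java
final class SizingSketch {
    /**
     * Hypothetical helper: forces numBytes into [minimumBytes, maximumBytes]
     * and rounds it up to a power of 2 (step order is an assumption).
     */
    static int normalizeNumBytes(int numBytes, int minimumBytes, int maximumBytes) {
        // Raise values below the lower bound.
        int n = Math.max(numBytes, minimumBytes);
        // Round up to the next power of 2 if n is not one already.
        if ((n & (n - 1)) != 0) {
            n = Integer.highestOneBit(n) << 1;
        }
        // Cap at the upper bound; n < 0 catches overflow from the shift.
        if (n > maximumBytes || n < 0) {
            n = maximumBytes;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(normalizeNumBytes(10, 32, 1024));   // prints 32
        System.out.println(normalizeNumBytes(100, 32, 1024));  // prints 128
        System.out.println(normalizeNumBytes(5000, 32, 1024)); // prints 1024
    }
}
```

In this sketch the cap is applied last, so the result never exceeds maximumBytes; it is a power of 2 whenever maximumBytes is one.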
-
BlockSplitBloomFilter
public BlockSplitBloomFilter(byte[] bitset)
Constructs the Bloom filter from the given bitset. It is used when reconstructing a Bloom filter from a Parquet file. It uses XXH64 as its default hash function.
- Parameters:
bitset - The given bitset from which to construct the Bloom filter.
-
-
Method Detail
-
writeTo
public void writeTo(OutputStream out) throws IOException
Description copied from interface: BloomFilter
Write the Bloom filter to an output stream. It writes the Bloom filter header, including the bitset's length in bytes, the hash strategy, and the algorithm, followed by the bitset.
- Specified by:
writeTo in interface BloomFilter
- Parameters:
out - the output stream to write to
- Throws:
IOException
-
insertHash
public void insertHash(long hash)
Description copied from interface: BloomFilter
Insert an element into the Bloom filter. The element is represented by the hash value of its plain encoding result.
- Specified by:
insertHash in interface BloomFilter
- Parameters:
hash - the hash of the element.
-
findHash
public boolean findHash(long hash)
Description copied from interface: BloomFilter
Determine whether an element is in the set.
- Specified by:
findHash in interface BloomFilter
- Parameters:
hash - the hash value of the element's plain encoding result.
- Returns:
- false if the element is definitely not in the set, true if the element is probably in the set.
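
The insertHash/findHash contract (no false negatives, possible false positives) can be illustrated with a small self-contained block-based filter. This is a sketch under stated assumptions, not the library's implementation: the class TinyBlockBloomSketch is hypothetical, the 256-bit block layout and the salt constants follow the split-block description in the Parquet format documentation as understood here, and they should be checked against that specification before being relied on.

```java
final class TinyBlockBloomSketch {
    // Salt constants assumed from the Parquet split-block Bloom filter description.
    private static final int[] SALT = {
        0x47b6137b, 0x44974d91, 0x8824ad5b, 0xa2b7289d,
        0x705495c7, 0x2df1424b, 0x9efc4947, 0x5c6bfb31
    };

    // Each block is 8 ints (256 bits); blocks are stored back to back.
    private final int[] words;

    TinyBlockBloomSketch(int numBlocks) {
        if (numBlocks <= 0) {
            throw new IllegalArgumentException("numBlocks must be positive");
        }
        this.words = new int[numBlocks * 8];
    }

    // Maps the upper 32 bits of the hash onto [0, numBlocks).
    private int blockIndex(long hash) {
        long numBlocks = words.length / 8;
        return (int) (((hash >>> 32) * numBlocks) >>> 32);
    }

    // The top 5 bits of key * salt select one of the 32 bits in word i.
    private static int bitFor(int key, int i) {
        return 1 << ((key * SALT[i]) >>> 27);
    }

    void insertHash(long hash) {
        int base = blockIndex(hash) * 8;
        int key = (int) hash; // lower 32 bits of the hash
        for (int i = 0; i < 8; i++) {
            words[base + i] |= bitFor(key, i);
        }
    }

    boolean findHash(long hash) {
        int base = blockIndex(hash) * 8;
        int key = (int) hash;
        for (int i = 0; i < 8; i++) {
            if ((words[base + i] & bitFor(key, i)) == 0) {
                return false; // a required bit is unset: definitely absent
            }
        }
        return true; // all bits set: probably present
    }

    public static void main(String[] args) {
        TinyBlockBloomSketch filter = new TinyBlockBloomSketch(4);
        System.out.println(filter.findHash(123L)); // prints false (filter is empty)
        filter.insertHash(123L);
        System.out.println(filter.findHash(123L)); // prints true
    }
}
```

A true result only means that every bit for that hash is set, which other inserted elements can also cause; a false result is conclusive.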
-
optimalNumOfBits
public static int optimalNumOfBits(long n, double p)
Calculate the optimal size according to the number of distinct values and the false positive probability.
- Parameters:
n - The number of distinct values.
p - The false positive probability.
- Returns:
- optimal number of bits for the given n and p.
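
For intuition about how n and p drive the size, the textbook formula for a classic Bloom filter, m = -n ln(p) / (ln 2)^2, can be sketched as below. This is not claimed to be the formula that optimalNumOfBits uses: a block-based filter may size itself differently and may also clamp or round the result, so the values can differ. The class and method names are hypothetical.

```java
final class BloomSizingFormulaSketch {
    /**
     * Textbook sizing for a classic Bloom filter (an assumption for
     * illustration, not the library's formula): m = -n * ln(p) / (ln 2)^2.
     */
    static long classicOptimalNumOfBits(long n, double p) {
        if (n <= 0 || !(p > 0.0 && p < 1.0)) {
            throw new IllegalArgumentException("n must be positive and p must be in (0, 1)");
        }
        double ln2 = Math.log(2.0);
        return (long) Math.ceil(-n * Math.log(p) / (ln2 * ln2));
    }

    public static void main(String[] args) {
        // Roughly 9.6 bits per distinct value at a 1% false positive probability.
        System.out.println(classicOptimalNumOfBits(1000, 0.01));
    }
}
```

Lowering p or raising n increases the required number of bits; the size grows linearly in n and logarithmically in 1/p.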
-
getBitsetSize
public int getBitsetSize()
Description copied from interface: BloomFilter
Get the number of bytes for bitset in this Bloom filter.
- Specified by:
getBitsetSize in interface BloomFilter
- Returns:
- The number of bytes for bitset in this Bloom filter.
-
hash
public long hash(Object value)
Description copied from interface: BloomFilter
Compute the hash of an Object value from its plain encoding result.
- Specified by:
hash in interface BloomFilter
- Parameters:
value - the value to hash
- Returns:
- hash result
-
equals
public boolean equals(Object object)
Description copied from interface: BloomFilter
Compare this Bloom filter to the specified object.
- Specified by:
equals in interface BloomFilter
- Overrides:
equals in class Object
- Returns:
- true if the given object represents a Bloom filter equivalent to this Bloom filter, false otherwise.
-
getHashStrategy
public BloomFilter.HashStrategy getHashStrategy()
Description copied from interface: BloomFilter
Return the hash strategy that the Bloom filter uses.
- Specified by:
getHashStrategy in interface BloomFilter
- Returns:
- hash strategy that the Bloom filter uses
-
getAlgorithm
public BloomFilter.Algorithm getAlgorithm()
Description copied from interface: BloomFilter
Return the algorithm that the Bloom filter uses.
- Specified by:
getAlgorithm in interface BloomFilter
- Returns:
- algorithm that the Bloom filter uses
-
getCompression
public BloomFilter.Compression getCompression()
Description copied from interface: BloomFilter
Return the compression algorithm that the Bloom filter uses.
- Specified by:
getCompression in interface BloomFilter
- Returns:
- compression algorithm that the Bloom filter uses
-
hash
public long hash(int value)
Description copied from interface: BloomFilter
Compute the hash of an int value from its plain encoding result.
- Specified by:
hash in interface BloomFilter
- Parameters:
value - the value to hash
- Returns:
- hash result
-
hash
public long hash(long value)
Description copied from interface: BloomFilter
Compute the hash of a long value from its plain encoding result.
- Specified by:
hash in interface BloomFilter
- Parameters:
value - the value to hash
- Returns:
- hash result
-
hash
public long hash(double value)
Description copied from interface: BloomFilter
Compute the hash of a double value from its plain encoding result.
- Specified by:
hash in interface BloomFilter
- Parameters:
value - the value to hash
- Returns:
- hash result
-
hash
public long hash(float value)
Description copied from interface: BloomFilter
Compute the hash of a float value from its plain encoding result.
- Specified by:
hash in interface BloomFilter
- Parameters:
value - the value to hash
- Returns:
- hash result
-
hash
public long hash(Binary value)
Description copied from interface: BloomFilter
Compute the hash of a Binary value from its plain encoding result.
- Specified by:
hash in interface BloomFilter
- Parameters:
value - the value to hash
- Returns:
- hash result
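
The hash overloads operate on a value's "plain encoding result". For the fixed-width primitive types, Parquet's PLAIN encoding is the little-endian byte representation of the value, which can be sketched as below. The class PlainEncodingSketch and its methods are hypothetical names for illustration; the hash function itself (XXH64 by default, per the constructor documentation) is not part of the Java standard library and is omitted here.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class PlainEncodingSketch {
    // 4-byte little-endian representation of an int.
    static byte[] plainInt(int value) {
        return ByteBuffer.allocate(Integer.BYTES).order(ByteOrder.LITTLE_ENDIAN).putInt(value).array();
    }

    // 8-byte little-endian representation of a long.
    static byte[] plainLong(long value) {
        return ByteBuffer.allocate(Long.BYTES).order(ByteOrder.LITTLE_ENDIAN).putLong(value).array();
    }

    // 4-byte little-endian IEEE 754 representation of a float.
    static byte[] plainFloat(float value) {
        return ByteBuffer.allocate(Float.BYTES).order(ByteOrder.LITTLE_ENDIAN).putFloat(value).array();
    }

    // 8-byte little-endian IEEE 754 representation of a double.
    static byte[] plainDouble(double value) {
        return ByteBuffer.allocate(Double.BYTES).order(ByteOrder.LITTLE_ENDIAN).putDouble(value).array();
    }

    public static void main(String[] args) {
        // The int 1 and the long 1 encode to different byte sequences (4 vs. 8 bytes).
        System.out.println(java.util.Arrays.toString(plainInt(1)));
        System.out.println(java.util.Arrays.toString(plainLong(1L)));
    }
}
```

Because the encoded bytes differ by type, the same numeric value hashed as an int and as a long should not be expected to produce the same hash; the overload used at lookup time needs to match the column's type.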
-
canMergeFrom
public boolean canMergeFrom(BloomFilter otherBloomFilter)
Description copied from interface: BloomFilter
Determines whether a given Bloom filter can be merged into this Bloom filter. For two Bloom filters to merge, they must:
- have the same bit size
- have the same algorithm
- have the same hash strategy
- Specified by:
canMergeFrom in interface BloomFilter
- Parameters:
otherBloomFilter - The Bloom filter to merge this Bloom filter with.
-
merge
public void merge(BloomFilter otherBloomFilter) throws IOException
Description copied from interface: BloomFilter
Merges this Bloom filter with another Bloom filter by performing a bitwise OR of the underlying bitsets.
- Specified by:
merge in interface BloomFilter
- Parameters:
otherBloomFilter - The Bloom filter to merge this Bloom filter with.
- Throws:
IOException
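
The bitwise OR merge can be sketched on raw byte arrays as below. This is an illustration of the technique only: the class BitsetMergeSketch and its methods are hypothetical, and the compatibility check covers just the size condition, whereas canMergeFrom also requires the same algorithm and hash strategy.

```java
final class BitsetMergeSketch {
    // Size check only; algorithm and hash strategy checks are out of scope here.
    static boolean sameSize(byte[] a, byte[] b) {
        return a.length == b.length;
    }

    // ORs every byte of other into target, so target ends up with the union of set bits.
    static void mergeInto(byte[] target, byte[] other) {
        if (!sameSize(target, other)) {
            throw new IllegalArgumentException("bitsets must have the same size");
        }
        for (int i = 0; i < target.length; i++) {
            target[i] |= other[i];
        }
    }

    public static void main(String[] args) {
        byte[] a = {0b0101, 0};
        byte[] b = {0b0011, (byte) 0x80};
        mergeInto(a, b);
        System.out.println(a[0]); // prints 7 (0b0111)
    }
}
```

After the merge, any hash found in either input filter is also found in the merged one, because a bit set in either bitset stays set in the result.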
-
-