Class BlockSplitBloomFilter

    • Constructor Detail

      • BlockSplitBloomFilter

        public BlockSplitBloomFilter​(int numBytes)
        Constructor of block-based Bloom filter.
        Parameters:
        numBytes - The number of bytes for the Bloom filter bitset. The value should be within [DEFAULT_MINIMUM_BYTES, DEFAULT_MAXIMUM_BYTES]; if it is out of range, it is raised to the lower bound or lowered to the upper bound. It is also rounded up to a power of 2. The filter uses XXH64 as its default hash function.
      • BlockSplitBloomFilter

        public BlockSplitBloomFilter​(int numBytes,
                                     int maximumBytes)
        Constructor of block-based Bloom filter.
        Parameters:
        numBytes - The number of bytes for the Bloom filter bitset. The value should be within [DEFAULT_MINIMUM_BYTES, maximumBytes]; if it is out of range, it is raised to the lower bound or lowered to the upper bound. It is also rounded up to a power of 2. The filter uses XXH64 as its default hash function.
        maximumBytes - The maximum bytes of the Bloom filter.
      • BlockSplitBloomFilter

        public BlockSplitBloomFilter​(int numBytes,
                                     int minimumBytes,
                                     int maximumBytes,
                                     BloomFilter.HashStrategy hashStrategy)
        Constructor of block-based Bloom filter.
        Parameters:
        numBytes - The number of bytes for the Bloom filter bitset. The value should be within [minimumBytes, maximumBytes]; if it is out of range, it is raised to the lower bound or lowered to the upper bound. It is also rounded up to a power of 2.
        minimumBytes - The minimum size of the Bloom filter bitset, in bytes.
        maximumBytes - The maximum size of the Bloom filter bitset, in bytes.
        hashStrategy - The hash strategy used by the Bloom filter.
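        The clamping and rounding described for numBytes can be sketched as follows. This is a minimal illustration, not the library's code: the class and method names are hypothetical, and the order of the steps (raise to the minimum, round up to a power of 2, then cap at the maximum) is an assumption.

```java
// Hypothetical helper illustrating how a requested bitset size might be normalized.
final class BloomSizeClampSketch {
    private BloomSizeClampSketch() {}

    /**
     * Returns a size in bytes that is at least minimumBytes, at most maximumBytes,
     * and a power of 2 (assuming both bounds are themselves powers of 2).
     */
    static int clampAndRound(int numBytes, int minimumBytes, int maximumBytes) {
        // Step 1 (assumed order): raise to the lower bound.
        int size = Math.max(numBytes, minimumBytes);

        // Step 2: round up to the next power of 2 if it is not one already.
        if ((size & (size - 1)) != 0) {
            size = Integer.highestOneBit(size) << 1;
        }

        // Step 3: cap at the upper bound; a negative value means the shift overflowed.
        if (size > maximumBytes || size < 0) {
            size = maximumBytes;
        }
        return size;
    }
}
```

        With bounds of 32 and 1024 bytes, a request for 100 bytes would become 128 under these assumptions, and a request for 5000 bytes would be capped at 1024.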
      • BlockSplitBloomFilter

        public BlockSplitBloomFilter​(byte[] bitset)
        Constructs the Bloom filter from a given bitset. It is used when reconstructing a Bloom filter from a Parquet file. It uses XXH64 as its default hash function.
        Parameters:
        bitset - The bitset from which to construct the Bloom filter.
    • Method Detail

      • writeTo

        public void writeTo​(OutputStream out)
                     throws IOException
        Description copied from interface: BloomFilter
        Write the Bloom filter to an output stream. It writes the Bloom filter header including the bitset's length in bytes, the hash strategy, the algorithm, and the bitset.
        Specified by:
        writeTo in interface BloomFilter
        Parameters:
        out - the output stream to write to
        Throws:
        IOException
      • insertHash

        public void insertHash​(long hash)
        Description copied from interface: BloomFilter
        Insert an element into the Bloom filter. The element is represented by the hash value of its plain encoding result.
        Specified by:
        insertHash in interface BloomFilter
        Parameters:
        hash - the hash result of the element.
      • findHash

        public boolean findHash​(long hash)
        Description copied from interface: BloomFilter
        Determine whether an element is in the set.
        Specified by:
        findHash in interface BloomFilter
        Parameters:
        hash - the hash value of the element's plain encoding result.
        Returns:
        false if the element is definitely not in the set, true if the element is probably in the set.
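        The insert and lookup behaviour can be sketched with a small split-block filter. This is a simplified illustration, not the library's code: the class name is hypothetical, the layout (256-bit blocks of eight 32-bit words, one bit set per word) and the salt constants follow the split-block design in the Parquet format description as understood here, and numBytes is assumed to be a non-zero multiple of 32.

```java
// Hypothetical, simplified split-block Bloom filter operating on precomputed 64-bit hashes.
final class SplitBlockSketch {
    // Salt constants assumed from the Parquet format's Bloom filter description.
    private static final int[] SALT = {
        0x47b6137b, 0x44974d91, 0x8824ad5b, 0xa2b7289d,
        0x705495c7, 0x2df1424b, 0x9efc4947, 0x5c6bfb31
    };
    private static final int WORDS_PER_BLOCK = 8; // 8 x 32 bits = one 256-bit block

    private final int[] words;
    private final int numBlocks;

    SplitBlockSketch(int numBytes) {
        if (numBytes < 32 || numBytes % 32 != 0) {
            throw new IllegalArgumentException("numBytes must be a non-zero multiple of 32");
        }
        this.words = new int[numBytes / 4];
        this.numBlocks = numBytes / 32;
    }

    // The upper 32 bits of the hash select the block.
    private int blockIndex(long hash) {
        return (int) (((hash >>> 32) * numBlocks) >>> 32);
    }

    // The top 5 bits of (key * salt) select one of the 32 bits in word i.
    private static int bitFor(int key, int i) {
        return 1 << ((key * SALT[i]) >>> 27);
    }

    void insertHash(long hash) {
        int base = blockIndex(hash) * WORDS_PER_BLOCK;
        int key = (int) hash; // lower 32 bits
        for (int i = 0; i < WORDS_PER_BLOCK; i++) {
            words[base + i] |= bitFor(key, i);
        }
    }

    boolean findHash(long hash) {
        int base = blockIndex(hash) * WORDS_PER_BLOCK;
        int key = (int) hash;
        for (int i = 0; i < WORDS_PER_BLOCK; i++) {
            if ((words[base + i] & bitFor(key, i)) == 0) {
                return false; // a required bit is clear: definitely not inserted
            }
        }
        return true; // all bits set: probably inserted (false positives are possible)
    }
}
```

        A hash that was inserted is always found again, so a false result is conclusive. A true result only means that all eight bits happen to be set, which can also occur for values that were never inserted.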
      • optimalNumOfBits

        public static int optimalNumOfBits​(long n,
                                           double p)
        Calculate optimal size according to the number of distinct values and false positive probability.
        Parameters:
        n - The number of distinct values.
        p - The false positive probability.
        Returns:
        The optimal number of bits for the given n and p.
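        The relationship between n, p and the bitset size can be sketched with the standard Bloom filter estimate for a fixed number k of bits set per element: p ≈ (1 - e^(-kn/m))^k, which solves to m = -kn / ln(1 - p^(1/k)). The sketch below assumes k = 8 (one bit per word of a 256-bit block). The class and method names are hypothetical, and any rounding or bounds the library applies to the result are not shown.

```java
// Hypothetical sizing helper based on the textbook fixed-k Bloom filter estimate.
final class BloomSizingSketch {
    private BloomSizingSketch() {}

    /**
     * Estimates the number of bits m needed for n distinct values at false
     * positive probability p when each element sets k bits:
     * m = -k * n / ln(1 - p^(1/k)).
     */
    static long bitsForFixedK(long n, double p, int k) {
        if (!(p > 0.0 && p < 1.0)) {
            throw new IllegalArgumentException("p must be strictly between 0 and 1");
        }
        if (n < 0 || k <= 0) {
            throw new IllegalArgumentException("n must be non-negative and k positive");
        }
        double m = -k * (double) n / Math.log(1 - Math.pow(p, 1.0 / k));
        return (long) Math.ceil(m);
    }
}
```

        Under this estimate, lowering p or raising n increases the required number of bits. For n = 1000 and p = 0.01 with k = 8, the result is a little under 10 bits per distinct value.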
      • getBitsetSize

        public int getBitsetSize()
        Description copied from interface: BloomFilter
        Get the number of bytes for bitset in this Bloom filter.
        Specified by:
        getBitsetSize in interface BloomFilter
        Returns:
        The number of bytes for bitset in this Bloom filter.
      • hash

        public long hash​(Object value)
        Description copied from interface: BloomFilter
        Compute hash for Object value by using its plain encoding result.
        Specified by:
        hash in interface BloomFilter
        Parameters:
        value - the value to hash
        Returns:
        hash result
      • equals

        public boolean equals​(Object object)
        Description copied from interface: BloomFilter
        Compare this Bloom filter to the specified object.
        Specified by:
        equals in interface BloomFilter
        Overrides:
        equals in class Object
        Returns:
        true if the given object represents a Bloom filter equivalent to this Bloom filter, false otherwise.
      • hash

        public long hash​(int value)
        Description copied from interface: BloomFilter
        Compute hash for int value by using its plain encoding result.
        Specified by:
        hash in interface BloomFilter
        Parameters:
        value - the value to hash
        Returns:
        hash result
      • hash

        public long hash​(long value)
        Description copied from interface: BloomFilter
        Compute hash for long value by using its plain encoding result.
        Specified by:
        hash in interface BloomFilter
        Parameters:
        value - the value to hash
        Returns:
        hash result
      • hash

        public long hash​(double value)
        Description copied from interface: BloomFilter
        Compute hash for double value by using its plain encoding result.
        Specified by:
        hash in interface BloomFilter
        Parameters:
        value - the value to hash
        Returns:
        hash result
      • hash

        public long hash​(float value)
        Description copied from interface: BloomFilter
        Compute hash for float value by using its plain encoding result.
        Specified by:
        hash in interface BloomFilter
        Parameters:
        value - the value to hash
        Returns:
        hash result
      • hash

        public long hash​(Binary value)
        Description copied from interface: BloomFilter
        Compute hash for Binary value by using its plain encoding result.
        Specified by:
        hash in interface BloomFilter
        Parameters:
        value - the value to hash
        Returns:
        hash result
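        The "plain encoding result" that the hash overloads operate on can be illustrated for the numeric types. Parquet's PLAIN encoding stores fixed-width numbers as little-endian bytes, and the hash function (XXH64 by default, not implemented here) is then applied to those bytes. The class and method names below are hypothetical, and the Object and Binary overloads are not covered.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper producing the little-endian PLAIN bytes of numeric values.
final class PlainEncodingSketch {
    private PlainEncodingSketch() {}

    // 4-byte little-endian two's complement.
    static byte[] plain(int value) {
        return ByteBuffer.allocate(Integer.BYTES).order(ByteOrder.LITTLE_ENDIAN).putInt(value).array();
    }

    // 8-byte little-endian two's complement.
    static byte[] plain(long value) {
        return ByteBuffer.allocate(Long.BYTES).order(ByteOrder.LITTLE_ENDIAN).putLong(value).array();
    }

    // 4-byte little-endian IEEE 754 single precision.
    static byte[] plain(float value) {
        return ByteBuffer.allocate(Float.BYTES).order(ByteOrder.LITTLE_ENDIAN).putFloat(value).array();
    }

    // 8-byte little-endian IEEE 754 double precision.
    static byte[] plain(double value) {
        return ByteBuffer.allocate(Double.BYTES).order(ByteOrder.LITTLE_ENDIAN).putDouble(value).array();
    }
}
```

        Because the byte width differs by type, the int value 1 and the long value 1 produce different byte sequences and would therefore generally hash to different values.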
      • canMergeFrom

        public boolean canMergeFrom​(BloomFilter otherBloomFilter)
        Description copied from interface: BloomFilter
        Determines whether a given Bloom filter can be merged into this Bloom filter. For two Bloom filters to merge, they must:
        • have the same bit size
        • have the same algorithm
        • have the same hash strategy
        Specified by:
        canMergeFrom in interface BloomFilter
        Parameters:
        otherBloomFilter - The Bloom filter to merge this Bloom filter with.
      • merge

        public void merge​(BloomFilter otherBloomFilter)
                   throws IOException
        Description copied from interface: BloomFilter
        Merges this Bloom filter with another Bloom filter by performing a bitwise OR of the underlying bitsets.
        Specified by:
        merge in interface BloomFilter
        Parameters:
        otherBloomFilter - The Bloom filter to merge this Bloom filter with.
        Throws:
        IOException
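        The compatibility check and the bitwise-OR merge described for canMergeFrom and merge can be sketched as follows. This is an illustration only: the class is hypothetical, the algorithm and hash strategy are modelled as plain strings, and an incompatible merge is reported with IllegalArgumentException here, which is an assumption and not necessarily how the library signals it.

```java
// Hypothetical stand-in for a Bloom filter, reduced to what merging needs.
final class MergeSketch {
    final byte[] bitset;
    final String algorithm;    // illustrative label, e.g. "BLOCK"
    final String hashStrategy; // illustrative label, e.g. "XXH64"

    MergeSketch(byte[] bitset, String algorithm, String hashStrategy) {
        this.bitset = bitset;
        this.algorithm = algorithm;
        this.hashStrategy = hashStrategy;
    }

    // Two filters are mergeable only if size, algorithm and hash strategy all match.
    boolean canMergeFrom(MergeSketch other) {
        return other != null
            && bitset.length == other.bitset.length
            && algorithm.equals(other.algorithm)
            && hashStrategy.equals(other.hashStrategy);
    }

    // ORs the other filter's bits into this one, so this filter covers both sets.
    void merge(MergeSketch other) {
        if (!canMergeFrom(other)) {
            throw new IllegalArgumentException("Bloom filters are not compatible for merging");
        }
        for (int i = 0; i < bitset.length; i++) {
            bitset[i] |= other.bitset[i];
        }
    }
}
```

        After a merge, every hash that either filter would have reported as present is still reported as present, because no bit is ever cleared. The false positive rate of the merged filter can only stay the same or increase.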