Class RunLengthBitPackingHybridEncoder


  • public class RunLengthBitPackingHybridEncoder
    extends Object
    Encodes values using a combination of run length encoding and bit packing, according to the following grammar:
     
     rle-bit-packed-hybrid: <length> <encoded-data>
     length := length of the <encoded-data> in bytes, stored as 4 bytes, little-endian
     encoded-data := <run>*
     run := <bit-packed-run> | <rle-run>
     bit-packed-run := <bit-packed-header> <bit-packed-values>
     bit-packed-header := varint-encode(<bit-pack-count> << 1 | 1)
     // we always bit-pack a multiple of 8 values at a time, so we only store the number of values / 8
     bit-pack-count := (number of values in this run) / 8
     bit-packed-values := the values, bit-packed back to back, from LSB to MSB
     rle-run := <rle-header> <repeated-value>
     rle-header := varint-encode( (number of times repeated) << 1)
     repeated-value := value that is repeated, using a fixed-width of round-up-to-next-byte(bit-width)
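
    The run productions above can be illustrated with a minimal sketch. This is not the code of RunLengthBitPackingHybridEncoder itself: the class HybridRunSketch and its methods writeVarInt, rleRun and bitPackedRun are hypothetical names introduced only for illustration. The sketch assumes varint-encode is the usual unsigned 7-bits-per-byte varint and that the repeated value is written low byte first; the grammar above does not state that byte order.

```java
import java.io.ByteArrayOutputStream;

// Illustrative sketch only; hypothetical helper, not part of parquet-mr.
public class HybridRunSketch {

    // Unsigned varint: 7 payload bits per byte, low bits first,
    // high bit set on every byte except the last.
    static void writeVarInt(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    // rle-run := varint(repeatCount << 1), then the value in ceil(bitWidth / 8) bytes.
    // Assumption: the value bytes are written low byte first.
    static byte[] rleRun(int value, int repeatCount, int bitWidth) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVarInt(out, repeatCount << 1);
        int byteWidth = (bitWidth + 7) / 8;
        for (int i = 0; i < byteWidth; i++) {
            out.write((value >>> (8 * i)) & 0xFF);
        }
        return out.toByteArray();
    }

    // bit-packed-run := varint(((count / 8) << 1) | 1), then the values packed LSB first.
    static byte[] bitPackedRun(int[] values, int bitWidth) {
        if (values.length % 8 != 0) {
            throw new IllegalArgumentException("value count must be a multiple of 8");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVarInt(out, ((values.length / 8) << 1) | 1);
        long mask = (1L << bitWidth) - 1;
        long buffer = 0;      // pending bits, lowest bit is the next bit to emit
        int bitsInBuffer = 0;
        for (int v : values) {
            buffer |= (v & mask) << bitsInBuffer;
            bitsInBuffer += bitWidth;
            while (bitsInBuffer >= 8) {
                out.write((int) (buffer & 0xFF));
                buffer >>>= 8;
                bitsInBuffer -= 8;
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        StringBuilder hex = new StringBuilder();
        for (byte b : bitPackedRun(new int[] {0, 1, 2, 3, 4, 5, 6, 7}, 3)) {
            hex.append(String.format("%02X ", b & 0xFF));
        }
        System.out.println(hex.toString().trim());
    }
}
```

    With a bit width of 3, the eight values 0 through 7 pack into the header byte 03 followed by 88 C6 FA, and the value 4 repeated 100 times becomes C8 01 04 under the assumptions stated above.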
     
     
    NOTE: this class is only responsible for creating and returning the <encoded-data> portion of the above grammar. The <length> portion is written by RunLengthBitPackingHybridValuesWriter.
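
    The <length> production can be sketched as follows. This only illustrates the layout described by the grammar (a 4-byte little-endian byte count followed by the encoded data); it is not the implementation of RunLengthBitPackingHybridValuesWriter, and LengthPrefixSketch and withLengthPrefix are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Illustrative sketch only; hypothetical helper, not part of parquet-mr.
public class LengthPrefixSketch {

    // Prepends the byte length of encodedData as a 4-byte little-endian integer.
    static byte[] withLengthPrefix(byte[] encodedData) {
        ByteBuffer buf = ByteBuffer.allocate(4 + encodedData.length)
                .order(ByteOrder.LITTLE_ENDIAN);
        buf.putInt(encodedData.length);
        buf.put(encodedData);
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] framed = withLengthPrefix(new byte[] {(byte) 0xC8, 0x01, 0x04});
        System.out.println(framed.length); // 4 length bytes + 3 data bytes
    }
}
```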

    Only supports non-negative values (zero and positive). TODO: is that OK? Should we make a signed version?

    • Constructor Detail

      • RunLengthBitPackingHybridEncoder

        public RunLengthBitPackingHybridEncoder(int bitWidth,
                                                int initialCapacity,
                                                int pageSize,
                                                org.apache.parquet.bytes.ByteBufferAllocator allocator)
    • Method Detail

      • reset

        public void reset()
        Resets this encoder for reuse.
      • close

        public void close()
      • getBufferedSize

        public long getBufferedSize()
      • getAllocatedSize

        public long getAllocatedSize()