Class DeltaBinaryPackingValuesWriter

  • Direct Known Subclasses:
    DeltaBinaryPackingValuesWriterForInteger, DeltaBinaryPackingValuesWriterForLong

    public abstract class DeltaBinaryPackingValuesWriter
    extends ValuesWriter
    Writes integers using delta encoding and binary packing. The format is as follows:
       
         delta-binary-packing: <page-header> <block>*
         page-header := <block size in values> <number of miniblocks in a block> <total value count> <first value>
         block := <min delta> <list of bitwidths of miniblocks> <miniblocks>
    
         min delta, first value : zig-zag varint
         bit width of each miniblock : 1 byte little endian
         block size in values, number of miniblocks in a block, total value count : unsigned varint
       
     

    The algorithm and format are inspired by D. Lemire's paper: http://lemire.me/blog/archives/2012/09/12/fast-integer-compression-decoding-billions-of-integers-per-second/
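
The per-block quantities named in the format above (min delta, miniblock bit width, zig-zag and unsigned varints) can be illustrated with a minimal sketch. It is not the parquet-mr implementation: the class DeltaBlockSketch and all of its method names are hypothetical, it treats the whole input as a single miniblock, and it ignores integer overflow when subtracting.

```java
import java.io.ByteArrayOutputStream;

// Illustrative sketch only; class and method names are hypothetical, not parquet-mr APIs.
final class DeltaBlockSketch {

    // Differences between consecutive values: deltas[i] = values[i + 1] - values[i].
    static int[] deltas(int[] values) {
        int[] d = new int[Math.max(0, values.length - 1)];
        for (int i = 0; i < d.length; i++) {
            d[i] = values[i + 1] - values[i];
        }
        return d;
    }

    // Smallest delta in the block; subtracting it makes every stored delta non-negative.
    static int minDelta(int[] deltas) {
        int min = Integer.MAX_VALUE;
        for (int d : deltas) {
            min = Math.min(min, d);
        }
        return min;
    }

    // Number of bits needed for the largest (delta - minDelta) in deltas[from, to).
    static int bitWidth(int[] deltas, int from, int to, int minDelta) {
        int max = 0;
        for (int i = from; i < to; i++) {
            max = Math.max(max, deltas[i] - minDelta);
        }
        return 32 - Integer.numberOfLeadingZeros(max);
    }

    // Zig-zag maps signed to unsigned so small magnitudes stay small: 0->0, -1->1, 1->2, -2->3.
    static int zigZag(int v) {
        return (v << 1) ^ (v >> 31);
    }

    // Unsigned varint: 7 payload bits per byte, high bit set while more bytes follow.
    static byte[] unsignedVarInt(int v) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((v & ~0x7F) != 0) {
            out.write((v & 0x7F) | 0x80);
            v >>>= 7;
        }
        out.write(v);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        int[] values = {7, 5, 3, 1, 2, 3, 4, 5};
        int[] d = deltas(values);
        int min = minDelta(d);
        int width = bitWidth(d, 0, d.length, min);
        System.out.println("min delta = " + min + ", bit width = " + width);
        System.out.println("zig-zag(min delta) = " + zigZag(min));
    }
}
```

For the values 7, 5, 3, 1, 2, 3, 4, 5 the deltas are -2, -2, -2, 1, 1, 1, 1, so the min delta is -2, the stored deltas become 0, 0, 0, 3, 3, 3, 3, and each fits in 2 bits.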

    • Field Detail

      • DEFAULT_NUM_BLOCK_VALUES

        public static final int DEFAULT_NUM_BLOCK_VALUES
        See Also:
        Constant Field Values
      • baos

        protected final org.apache.parquet.bytes.CapacityByteArrayOutputStream baos
      • config

        protected final org.apache.parquet.column.values.delta.DeltaBinaryPackingConfig config
        stores blockSizeInValues, miniBlockNumInABlock and miniBlockSizeInValues
      • bitWidths

        protected final int[] bitWidths
        bit width for each mini block, reused between flushes
      • totalValueCount

        protected int totalValueCount
      • deltaValuesToFlush

        protected int deltaValuesToFlush
        the number of values in deltaBlockBuffer that have not yet been flushed to baos; it also serves as a pointer to the end of deltaBlockBuffer and is reset after each flush
      • miniBlockByteBuffer

        protected byte[] miniBlockByteBuffer
        byte buffer for a mini block, reused for each mini block; it is therefore allocated at the size of the largest possible mini block, one with a bit width of MAX_BITWIDTH
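
The relationship between the config values and the size of the reusable mini block buffer can be sketched as follows. This is an assumption-laden illustration, not library code: MiniBlockSizingSketch and its methods are hypothetical, and the numbers used (128 values per block, 4 miniblocks, a maximum bit width of 32) are example inputs only.

```java
// Illustrative sketch only; names are hypothetical and not part of parquet-mr.
final class MiniBlockSizingSketch {

    // Values per miniblock, assuming the block divides evenly into miniblocks.
    static int miniBlockSizeInValues(int blockSizeInValues, int miniBlockNum) {
        if (miniBlockNum <= 0 || blockSizeInValues % miniBlockNum != 0) {
            throw new IllegalArgumentException("block size must be a multiple of the miniblock count");
        }
        return blockSizeInValues / miniBlockNum;
    }

    // Worst-case bytes for one packed miniblock: every value stored at the maximum bit width.
    // Assumes miniBlockSizeInValues is a multiple of 8 so the result is a whole number of bytes.
    static int miniBlockBufferBytes(int miniBlockSizeInValues, int maxBitWidth) {
        return miniBlockSizeInValues * maxBitWidth / 8;
    }

    public static void main(String[] args) {
        int perMiniBlock = miniBlockSizeInValues(128, 4);
        System.out.println("values per miniblock = " + perMiniBlock);
        System.out.println("buffer bytes = " + miniBlockBufferBytes(perMiniBlock, 32));
    }
}
```

Sizing the buffer for the worst case once is what allows it to be reused for every mini block without reallocation.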
    • Constructor Detail

      • DeltaBinaryPackingValuesWriter

        public DeltaBinaryPackingValuesWriter(int slabSize,
                                              int pageSize,
                                              org.apache.parquet.bytes.ByteBufferAllocator allocator)
      • DeltaBinaryPackingValuesWriter

        public DeltaBinaryPackingValuesWriter(int blockSizeInValues,
                                              int miniBlockNum,
                                              int slabSize,
                                              int pageSize,
                                              org.apache.parquet.bytes.ByteBufferAllocator allocator)
    • Method Detail

      • getBufferedSize

        public long getBufferedSize()
        Description copied from class: ValuesWriter
        used to decide whether to move on to the next page
        Specified by:
        getBufferedSize in class ValuesWriter
        Returns:
        the size of the currently buffered data (in bytes)
      • writeBitWidthForMiniBlock

        protected void writeBitWidthForMiniBlock(int i)
      • getMiniBlockCountToFlush

        protected int getMiniBlockCountToFlush(double numberCount)
      • getEncoding

        public Encoding getEncoding()
        Description copied from class: ValuesWriter
        called after getBytes() and before reset()
        Specified by:
        getEncoding in class ValuesWriter
        Returns:
        the encoding that was used to encode the bytes
      • reset

        public void reset()
        Description copied from class: ValuesWriter
        called after getBytes() to reset the current buffer and start writing the next page
        Specified by:
        reset in class ValuesWriter
      • close

        public void close()
        Description copied from class: ValuesWriter
        Called to close the values writer. Any output stream is closed and can no longer be used. All resources are released.
        Overrides:
        close in class ValuesWriter
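
The call order implied by the method descriptions above (check getBufferedSize, then getBytes, then getEncoding, then reset, and close at the end) can be sketched with a stand-in writer. ToyWriter is hypothetical and only mimics the method names; it stores raw 4-byte integers, returns a String where the real method returns an Encoding, and does no delta encoding.

```java
import java.io.ByteArrayOutputStream;

// Illustrative sketch only; ToyWriter is a hypothetical stand-in, not a parquet-mr class.
final class PageLifecycleSketch {

    static final class ToyWriter {
        private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        private boolean closed;

        // Stores each int as 4 raw little-endian bytes (no delta encoding in this toy).
        void writeInteger(int v) {
            if (closed) {
                throw new IllegalStateException("writer is closed");
            }
            for (int shift = 0; shift < 32; shift += 8) {
                buffer.write(v >>> shift);
            }
        }

        long getBufferedSize() { return buffer.size(); }
        byte[] getBytes() { return buffer.toByteArray(); }
        String getEncoding() { return "TOY"; }
        void reset() { buffer.reset(); }
        void close() { closed = true; }
    }

    // Finishes one page in the documented order: getBytes, getEncoding, reset.
    static void flushPage(ToyWriter writer) {
        byte[] page = writer.getBytes();
        String encoding = writer.getEncoding();
        writer.reset();
    }

    // Writes all values, cutting a page whenever the buffered size reaches the threshold.
    // Returns the number of pages produced.
    static int writePages(int[] values, long pageThresholdBytes) {
        ToyWriter writer = new ToyWriter();
        int pages = 0;
        for (int v : values) {
            writer.writeInteger(v);
            if (writer.getBufferedSize() >= pageThresholdBytes) {
                flushPage(writer);
                pages++;
            }
        }
        if (writer.getBufferedSize() > 0) {
            flushPage(writer);
            pages++;
        }
        writer.close();
        return pages;
    }

    public static void main(String[] args) {
        System.out.println("pages = " + writePages(new int[10], 16));
    }
}
```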