Class DeltaBinaryPackingValuesWriter
- java.lang.Object
  - org.apache.parquet.column.values.ValuesWriter
    - org.apache.parquet.column.values.delta.DeltaBinaryPackingValuesWriter
- Direct Known Subclasses:
DeltaBinaryPackingValuesWriterForInteger, DeltaBinaryPackingValuesWriterForLong
public abstract class DeltaBinaryPackingValuesWriter extends ValuesWriter
Write integers with delta encoding and binary packing. The format is as follows:

  delta-binary-packing: <page-header> <block>*
  page-header := <block size in values> <number of miniblocks in a block> <total value count> <first value>
  block := <min delta> <list of bitwidths of miniblocks> <miniblocks>

  min delta : zig-zag var int encoded
  bitWidthsOfMiniBlock : 1 byte little endian
  blockSizeInValues, miniBlockNumInABlock, totalValueCount, firstValue : unsigned varint

The algorithm and format are inspired by D. Lemire's paper: http://lemire.me/blog/archives/2012/09/12/fast-integer-compression-decoding-billions-of-integers-per-second/
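The per-block quantities named in the format above (deltas, min delta, miniblock bit width) and the two integer encodings it mentions (zig-zag, unsigned varint) can be illustrated with a minimal, self-contained sketch. The class and method names below are hypothetical and are not part of parquet; the sketch only shows the arithmetic, not how this writer lays out bytes internally.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical helper for illustration only; not part of the parquet API.
final class DeltaBlockSketch {

    /** Zig-zag maps signed values to unsigned ones so small magnitudes stay small. */
    static long zigZag(long v) {
        return (v << 1) ^ (v >> 63);
    }

    /** Writes v as an unsigned varint: 7 payload bits per byte, high bit set on all but the last byte. */
    static void writeUnsignedVarLong(long v, ByteArrayOutputStream out) {
        while ((v & ~0x7FL) != 0) {
            out.write((int) ((v & 0x7F) | 0x80));
            v >>>= 7;
        }
        out.write((int) v);
    }

    /** Differences between consecutive values; the first value itself is stored in the header. */
    static long[] deltas(long[] values) {
        long[] d = new long[values.length - 1];
        for (int i = 1; i < values.length; i++) {
            d[i - 1] = values[i] - values[i - 1];
        }
        return d;
    }

    /** Smallest delta in the block; subtracting it makes every stored delta non-negative. */
    static long minDelta(long[] deltas) {
        long min = Long.MAX_VALUE;
        for (long d : deltas) {
            min = Math.min(min, d);
        }
        return min;
    }

    /** Number of bits needed to store the largest (delta - minDelta). */
    static int bitWidth(long[] deltas, long minDelta) {
        long mask = 0;
        for (long d : deltas) {
            mask |= d - minDelta;
        }
        return 64 - Long.numberOfLeadingZeros(mask);
    }

    public static void main(String[] args) {
        long[] values = {7, 5, 3, 1, 2, 3, 4, 5};
        long[] d = deltas(values);
        long min = minDelta(d);
        System.out.println("min delta = " + min);
        System.out.println("zig-zag(min delta) = " + zigZag(min));
        System.out.println("bit width = " + bitWidth(d, min));
    }
}
```

Subtracting the minimum delta before packing is what lets a run of similar deltas, even negative ones, fit in a small fixed bit width per miniblock.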
-
-
Field Summary
Fields
Modifier and Type, Field, Description:
- protected org.apache.parquet.bytes.CapacityByteArrayOutputStream baos
- protected int[] bitWidths: bit width for each mini block, reused between flushes
- protected org.apache.parquet.column.values.delta.DeltaBinaryPackingConfig config: stores blockSizeInValues, miniBlockNumInABlock and miniBlockSizeInValues
- static int DEFAULT_NUM_BLOCK_VALUES
- static int DEFAULT_NUM_MINIBLOCKS
- protected int deltaValuesToFlush: a pointer to the end of deltaBlockBuffer, i.e. the number of values in deltaBlockBuffer that have not yet been flushed to baos; it is reset after each flush
- protected byte[] miniBlockByteBuffer: bytes buffer for a mini block; it is reused for each mini block
- protected int totalValueCount
-
Constructor Summary
Constructors
- DeltaBinaryPackingValuesWriter(int blockSizeInValues, int miniBlockNum, int slabSize, int pageSize, org.apache.parquet.bytes.ByteBufferAllocator allocator)
- DeltaBinaryPackingValuesWriter(int slabSize, int pageSize, org.apache.parquet.bytes.ByteBufferAllocator allocator)
-
Method Summary
Modifier and Type, Method, Description:
- void close(): Called to close the values writer.
- long getAllocatedSize()
- long getBufferedSize(): used to decide whether to move on to the next page
- Encoding getEncoding(): called after getBytes() and before reset()
- protected int getMiniBlockCountToFlush(double numberCount)
- String memUsageString(String prefix)
- void reset(): called after getBytes() to reset the current buffer and start writing the next page
- protected void writeBitWidthForMiniBlock(int i)
-
Methods inherited from class org.apache.parquet.column.values.ValuesWriter
getBytes, resetDictionary, toDictPageAndClose, writeBoolean, writeByte, writeBytes, writeDouble, writeFloat, writeInteger, writeLong
-
-
-
-
Field Detail
-
DEFAULT_NUM_BLOCK_VALUES
public static final int DEFAULT_NUM_BLOCK_VALUES
- See Also:
- Constant Field Values
-
DEFAULT_NUM_MINIBLOCKS
public static final int DEFAULT_NUM_MINIBLOCKS
- See Also:
- Constant Field Values
-
baos
protected final org.apache.parquet.bytes.CapacityByteArrayOutputStream baos
-
config
protected final org.apache.parquet.column.values.delta.DeltaBinaryPackingConfig config
stores blockSizeInValues, miniBlockNumInABlock and miniBlockSizeInValues
-
bitWidths
protected final int[] bitWidths
bit width for each mini block, reused between flushes
-
totalValueCount
protected int totalValueCount
-
deltaValuesToFlush
protected int deltaValuesToFlush
a pointer to the end of deltaBlockBuffer, i.e. the number of values in deltaBlockBuffer that have not yet been flushed to baos; it is reset after each flush
-
miniBlockByteBuffer
protected byte[] miniBlockByteBuffer
bytes buffer for a mini block; it is reused for each mini block. It is therefore allocated at the size of the biggest possible miniblock, i.e. one with a bit width of MAX_BITWIDTH
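To make the sizing rule concrete, here is a minimal sketch of the arithmetic behind a reusable miniblock buffer. The helper name is hypothetical, and the numbers in `main` (128 values per block, 4 miniblocks, a maximum bit width of 32) are assumptions chosen for illustration rather than values stated on this page; the real writer may allocate differently.

```java
// Hypothetical helper for illustration only; not part of the parquet API.
final class MiniBlockBufferSketch {

    /** Bytes needed to bit-pack valueCount values at bitWidth bits each, rounded up to a whole byte. */
    static int packedSizeInBytes(int valueCount, int bitWidth) {
        return (valueCount * bitWidth + 7) / 8;
    }

    public static void main(String[] args) {
        int miniBlockSizeInValues = 128 / 4; // assumed: 128 values per block split into 4 miniblocks
        int maxBitWidth = 32;                // assumed: widest possible delta for an int writer

        // Allocating for the worst case once means the buffer never has to grow.
        byte[] reusable = new byte[packedSizeInBytes(miniBlockSizeInValues, maxBitWidth)];
        System.out.println("buffer bytes = " + reusable.length);
    }
}
```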
-
-
Constructor Detail
-
DeltaBinaryPackingValuesWriter
public DeltaBinaryPackingValuesWriter(int slabSize, int pageSize, org.apache.parquet.bytes.ByteBufferAllocator allocator)
-
DeltaBinaryPackingValuesWriter
public DeltaBinaryPackingValuesWriter(int blockSizeInValues, int miniBlockNum, int slabSize, int pageSize, org.apache.parquet.bytes.ByteBufferAllocator allocator)
-
-
Method Detail
-
getBufferedSize
public long getBufferedSize()
Description copied from class: ValuesWriter
used to decide whether to move on to the next page
- Specified by:
- getBufferedSize in class ValuesWriter
- Returns:
- the size of the currently buffered data (in bytes)
-
writeBitWidthForMiniBlock
protected void writeBitWidthForMiniBlock(int i)
-
getMiniBlockCountToFlush
protected int getMiniBlockCountToFlush(double numberCount)
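This page gives no description for getMiniBlockCountToFlush. One plausible reading of its name and its double parameter, offered here as an assumption and not as the library's confirmed behavior, is a ceiling division of the pending value count by the miniblock size, so that a partially filled miniblock still gets flushed. The class and the explicit miniBlockSizeInValues parameter below are hypothetical.

```java
// Hypothetical helper for illustration only; not part of the parquet API.
final class MiniBlockCountSketch {

    /**
     * Number of miniblocks needed to hold numberCount values,
     * counting a partially filled miniblock as a whole one.
     */
    static int miniBlockCountToFlush(double numberCount, int miniBlockSizeInValues) {
        return (int) Math.ceil(numberCount / miniBlockSizeInValues);
    }

    public static void main(String[] args) {
        int miniBlockSizeInValues = 32; // assumed miniblock size
        for (int pending : new int[] {0, 1, 32, 33, 128}) {
            System.out.println(pending + " values -> "
                    + miniBlockCountToFlush(pending, miniBlockSizeInValues) + " miniblock(s)");
        }
    }
}
```

Taking the count as a double keeps the division in floating point, so Math.ceil can round a fractional miniblock up instead of integer division truncating it.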
-
getEncoding
public Encoding getEncoding()
Description copied from class: ValuesWriter
called after getBytes() and before reset()
- Specified by:
- getEncoding in class ValuesWriter
- Returns:
- the encoding that was used to encode the bytes
-
reset
public void reset()
Description copied from class: ValuesWriter
called after getBytes() to reset the current buffer and start writing the next page
- Specified by:
- reset in class ValuesWriter
-
close
public void close()
Description copied from class: ValuesWriter
Called to close the values writer. Any output stream is closed and can no longer be used. All resources are released.
- Overrides:
- close in class ValuesWriter
-
getAllocatedSize
public long getAllocatedSize()
Description copied from class: ValuesWriter
- Specified by:
- getAllocatedSize in class ValuesWriter
- Returns:
- the allocated size of the buffer
-
memUsageString
public String memUsageString(String prefix)
- Specified by:
- memUsageString in class ValuesWriter
-
-