Class DictionaryValuesWriter
- java.lang.Object
-
- org.apache.parquet.column.values.ValuesWriter
-
- org.apache.parquet.column.values.dictionary.DictionaryValuesWriter
-
- All Implemented Interfaces:
RequiresFallback
- Direct Known Subclasses:
DictionaryValuesWriter.PlainBinaryDictionaryValuesWriter, DictionaryValuesWriter.PlainDoubleDictionaryValuesWriter, DictionaryValuesWriter.PlainFloatDictionaryValuesWriter, DictionaryValuesWriter.PlainIntegerDictionaryValuesWriter, DictionaryValuesWriter.PlainLongDictionaryValuesWriter
public abstract class DictionaryValuesWriter extends ValuesWriter implements RequiresFallback
Attempts to encode values using a dictionary and falls back to plain encoding if the dictionary gets too big.
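The behaviour described above can be illustrated with a minimal, self-contained sketch. It is a toy model and not the Parquet implementation: the class `ToyDictionaryEncoder` and its `encodedIds()` accessor are hypothetical, the field names merely mirror those listed in the Field Summary, and the exact threshold comparison is an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model of dictionary encoding with a size limit.
// It does not use or reproduce the real DictionaryValuesWriter.
final class ToyDictionaryEncoder {
    private final int maxDictionaryByteSize;
    // Maps each distinct value to its dictionary id, in insertion order.
    private final Map<Long, Integer> dictionary = new LinkedHashMap<>();
    // One dictionary id per value written.
    private final List<Integer> encodedIds = new ArrayList<>();
    private long dictionaryByteSize = 0;
    private boolean dictionaryTooBig = false;

    ToyDictionaryEncoder(int maxDictionaryByteSize) {
        this.maxDictionaryByteSize = maxDictionaryByteSize;
    }

    void writeLong(long value) {
        Integer id = dictionary.get(value);
        if (id == null) {
            // First time this value is seen: add it to the dictionary.
            id = dictionary.size();
            dictionary.put(value, id);
            dictionaryByteSize += Long.BYTES;
            // Assumption: the limit is exceeded only when strictly greater.
            if (dictionaryByteSize > maxDictionaryByteSize) {
                dictionaryTooBig = true;
            }
        }
        encodedIds.add(id);
    }

    // True once the dictionary has outgrown the configured limit.
    boolean shouldFallBack() {
        return dictionaryTooBig;
    }

    // Number of distinct entries in the dictionary.
    int getDictionarySize() {
        return dictionary.size();
    }

    List<Integer> encodedIds() {
        return encodedIds;
    }
}
```

In this sketch, repeated values cost only one small id each, which is the point of dictionary encoding; once too many distinct values arrive, the caller is expected to switch to another encoding.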
-
-
Nested Class Summary
Nested Classes
static class DictionaryValuesWriter.PlainBinaryDictionaryValuesWriter
static class DictionaryValuesWriter.PlainDoubleDictionaryValuesWriter
static class DictionaryValuesWriter.PlainFixedLenArrayDictionaryValuesWriter
static class DictionaryValuesWriter.PlainFloatDictionaryValuesWriter
static class DictionaryValuesWriter.PlainIntegerDictionaryValuesWriter
static class DictionaryValuesWriter.PlainLongDictionaryValuesWriter
-
Field Summary
Fields
protected org.apache.parquet.bytes.ByteBufferAllocator allocator
protected long dictionaryByteSize
protected boolean dictionaryTooBig
protected IntList encodedValues
protected Encoding encodingForDictionaryPage
protected boolean firstPage - indicates if this is the first page being processed
protected int lastUsedDictionaryByteSize
protected int lastUsedDictionarySize
protected int maxDictionaryByteSize
-
Constructor Summary
Constructors
protected DictionaryValuesWriter(int maxDictionaryByteSize, Encoding encodingForDataPage, Encoding encodingForDictionaryPage, org.apache.parquet.bytes.ByteBufferAllocator allocator)
-
Method Summary
Methods
protected abstract void clearDictionaryContent() - clear/free the underlying dictionary content
void close() - Called to close the values writer.
protected DictionaryPage dictPage(ValuesWriter dictPageWriter)
void fallBackAllValuesTo(ValuesWriter writer) - When falling back to a different encoding, we must re-encode all the values seen so far
protected abstract void fallBackDictionaryEncodedData(ValuesWriter writer)
long getAllocatedSize()
long getBufferedSize() - used to decide whether to move on to the next page
org.apache.parquet.bytes.BytesInput getBytes()
protected abstract int getDictionarySize()
Encoding getEncoding() - called after getBytes() and before reset()
boolean isCompressionSatisfying(long rawSize, long encodedSize) - Before writing the first page, we verify whether the encoding is worth it.
String memUsageString(String prefix)
void reset() - called after getBytes() to reset the current buffer and start writing the next page
void resetDictionary() - reset the dictionary when a new block starts
boolean shouldFallBack() - In the case of a dictionary-based encoding, we fall back if the dictionary becomes too big
-
Methods inherited from class org.apache.parquet.column.values.ValuesWriter
toDictPageAndClose, writeBoolean, writeByte, writeBytes, writeDouble, writeFloat, writeInteger, writeLong
-
-
-
-
Field Detail
-
encodingForDictionaryPage
protected final Encoding encodingForDictionaryPage
-
maxDictionaryByteSize
protected final int maxDictionaryByteSize
-
dictionaryTooBig
protected boolean dictionaryTooBig
-
dictionaryByteSize
protected long dictionaryByteSize
-
lastUsedDictionaryByteSize
protected int lastUsedDictionaryByteSize
-
lastUsedDictionarySize
protected int lastUsedDictionarySize
-
encodedValues
protected IntList encodedValues
-
firstPage
protected boolean firstPage
indicates if this is the first page being processed
-
allocator
protected org.apache.parquet.bytes.ByteBufferAllocator allocator
-
-
Method Detail
-
dictPage
protected DictionaryPage dictPage(ValuesWriter dictPageWriter)
-
shouldFallBack
public boolean shouldFallBack()
Description copied from interface: RequiresFallback
In the case of a dictionary-based encoding, we fall back if the dictionary becomes too big
- Specified by:
shouldFallBack in interface RequiresFallback
- Returns:
- true to notify the parent that we should fall back to another encoding
-
isCompressionSatisfying
public boolean isCompressionSatisfying(long rawSize, long encodedSize)
Description copied from interface: RequiresFallback
Before writing the first page, we verify whether the encoding is worth it, and fall back if a simpler encoding would be better.
- Specified by:
isCompressionSatisfying in interface RequiresFallback
- Parameters:
rawSize - the size if encoded with plain
encodedSize - the size as encoded by the current encoding
- Returns:
- true if we keep this encoding
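As an illustration of such a check, the sketch below compares the plain size against the encoded size plus the size of the dictionary itself. This criterion is an assumption made for the example, not a statement of what this class does; `CompressionCheckSketch` is a hypothetical name, and the third parameter stands in for the `dictionaryByteSize` field because the sketch has no writer state.

```java
// Hypothetical, standalone illustration of a "is the encoding worth it" check.
final class CompressionCheckSketch {
    /**
     * @param rawSize            size in bytes if the values were plain-encoded
     * @param encodedSize        size in bytes of the dictionary-encoded values
     * @param dictionaryByteSize size in bytes of the dictionary (stand-in for the field)
     * @return true if dictionary encoding is smaller overall and should be kept
     */
    static boolean isCompressionSatisfying(long rawSize, long encodedSize, long dictionaryByteSize) {
        // Assumed criterion: the dictionary must pay for itself.
        return encodedSize + dictionaryByteSize < rawSize;
    }
}
```

Under this assumed rule, a column with few distinct values keeps its dictionary, while a column whose dictionary is nearly as large as the plain data does not.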
-
fallBackAllValuesTo
public void fallBackAllValuesTo(ValuesWriter writer)
Description copied from interface: RequiresFallback
When falling back to a different encoding, we must re-encode all the values seen so far
- Specified by:
fallBackAllValuesTo in interface RequiresFallback
- Parameters:
writer - the new encoder to write the current values to
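Re-encoding on fallback amounts to replaying each buffered dictionary id through the dictionary and handing the original value to the new writer. The sketch below shows that idea under simplifying assumptions: `FallbackReencodeSketch` and `PlainSink` are hypothetical stand-ins, the dictionary is a plain `long[]`, and the real class works with `IntList` and `ValuesWriter` instead.

```java
// Hypothetical illustration of re-encoding dictionary ids as plain values.
final class FallbackReencodeSketch {
    /** Stand-in for the writer that receives the plain-encoded values. */
    interface PlainSink {
        void writeLong(long value);
    }

    /**
     * Replays every buffered id, in write order, as its original value.
     *
     * @param encodedIds dictionary ids buffered so far
     * @param dictionary value stored at each dictionary id
     * @param sink       destination for the decoded values
     */
    static void fallBackAllValuesTo(int[] encodedIds, long[] dictionary, PlainSink sink) {
        for (int id : encodedIds) {
            // Look the value up and write it with the new encoding.
            sink.writeLong(dictionary[id]);
        }
    }
}
```

A caller could pass a lambda or method reference as the sink, for example one that appends each value to a list.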
-
fallBackDictionaryEncodedData
protected abstract void fallBackDictionaryEncodedData(ValuesWriter writer)
-
getBufferedSize
public long getBufferedSize()
Description copied from class: ValuesWriter
used to decide whether to move on to the next page
- Specified by:
getBufferedSize in class ValuesWriter
- Returns:
- the size of the currently buffered data (in bytes)
-
getAllocatedSize
public long getAllocatedSize()
Description copied from class: ValuesWriter
- Specified by:
getAllocatedSize in class ValuesWriter
- Returns:
- the allocated size of the buffer
-
getBytes
public org.apache.parquet.bytes.BytesInput getBytes()
- Specified by:
getBytes in class ValuesWriter
- Returns:
- the bytes buffered so far to write to the current page
-
getEncoding
public Encoding getEncoding()
Description copied from class: ValuesWriter
called after getBytes() and before reset()
- Specified by:
getEncoding in class ValuesWriter
- Returns:
- the encoding that was used to encode the bytes
-
reset
public void reset()
Description copied from class: ValuesWriter
called after getBytes() to reset the current buffer and start writing the next page
- Specified by:
reset in class ValuesWriter
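The call order implied by the descriptions of getBufferedSize(), getBytes(), getEncoding() and reset() can be sketched as follows. `PageLifecycleSketch` and `ToyWriter` are hypothetical; the toy writer returns a list in place of `BytesInput` and a string in place of `Encoding`, and the page size threshold is an assumed parameter of the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the per-page call order on a values writer.
final class PageLifecycleSketch {
    /** Toy writer that buffers ints for one page at a time. */
    static final class ToyWriter {
        private final List<Integer> buffer = new ArrayList<>();

        void writeInteger(int v) { buffer.add(v); }

        // Size of the currently buffered data, in bytes.
        long getBufferedSize() { return (long) buffer.size() * Integer.BYTES; }

        // Stand-in for BytesInput: a copy of the buffered values.
        List<Integer> getBytes() { return new ArrayList<>(buffer); }

        // Stand-in for Encoding.
        String getEncoding() { return "PLAIN"; }

        // Clears the page buffer so the next page can start.
        void reset() { buffer.clear(); }
    }

    /** Writes values, cutting a page whenever the buffer reaches the threshold. */
    static List<String> writePages(int[] values, long pageSizeThreshold) {
        ToyWriter writer = new ToyWriter();
        List<String> pages = new ArrayList<>();
        for (int v : values) {
            writer.writeInteger(v);
            if (writer.getBufferedSize() >= pageSizeThreshold) {
                flush(writer, pages);
            }
        }
        if (writer.getBufferedSize() > 0) {
            flush(writer, pages); // final, partially filled page
        }
        return pages;
    }

    private static void flush(ToyWriter writer, List<String> pages) {
        List<Integer> bytes = writer.getBytes();  // 1. collect the page content
        String encoding = writer.getEncoding();   // 2. after getBytes(), before reset()
        pages.add(encoding + bytes);
        writer.reset();                           // 3. start the next page
    }
}
```

Querying the encoding only after the bytes have been collected matters for a writer that may fall back, since the encoding actually used is only known once the page content has been produced.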
-
close
public void close()
Description copied from class: ValuesWriter
Called to close the values writer. Any output stream is closed and can no longer be used. All resources are released.
- Overrides:
close in class ValuesWriter
-
resetDictionary
public void resetDictionary()
Description copied from class: ValuesWriter
reset the dictionary when a new block starts
- Overrides:
resetDictionary in class ValuesWriter
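The difference in scope between reset() (per page) and resetDictionary() (per block) can be sketched with a toy model. `DictionaryResetSketch` and `bufferedValueCount()` are hypothetical, and the sketch assumes that the dictionary is shared by all pages of a block and survives reset().

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model contrasting the page-level and block-level resets.
final class DictionaryResetSketch {
    private final Map<String, Integer> dictionary = new HashMap<>();
    private final List<Integer> encodedValues = new ArrayList<>();

    void write(String value) {
        Integer id = dictionary.get(value);
        if (id == null) {
            id = dictionary.size();
            dictionary.put(value, id);
        }
        encodedValues.add(id);
    }

    // Page boundary: drop the buffered ids, keep the dictionary (assumption).
    void reset() {
        encodedValues.clear();
    }

    // Block boundary: the dictionary itself starts over.
    void resetDictionary() {
        dictionary.clear();
    }

    int getDictionarySize() {
        return dictionary.size();
    }

    int bufferedValueCount() {
        return encodedValues.size();
    }
}
```

In this model, ids written after reset() still refer to entries added on earlier pages, which is why the dictionary may only be cleared at a block boundary.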
-
clearDictionaryContent
protected abstract void clearDictionaryContent()
clear/free the underlying dictionary content
-
getDictionarySize
protected abstract int getDictionarySize()
- Returns:
- size in items
-
memUsageString
public String memUsageString(String prefix)
- Specified by:
memUsageString in class ValuesWriter
-
-