org.apache.spark.sql.execution.datasources.parquet

VectorizedParquetRecordReader

class VectorizedParquetRecordReader extends SpecificParquetRecordReaderBase[AnyRef]

A specialized RecordReader that reads into InternalRows or ColumnarBatches directly using the Parquet column APIs. This is somewhat based on parquet-mr's ColumnReader.

TODO: handle decimals requiring more than 8 bytes, INT96, and schema mismatches. All of these can be handled efficiently and easily with codegen.

This class can return either InternalRows or ColumnarBatches. With whole-stage codegen enabled, this class returns ColumnarBatches, which offer significant performance gains. TODO: make this always return ColumnarBatches.
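The row-at-a-time mode can be exercised as follows. This is a minimal sketch, assuming Spark 3.x with spark-sql on the classpath. A local SparkSession is used only to write a sample file, and the names RowModeSketch and writeSample are hypothetical. Passing null as the column list follows usage in Spark's own test suites and is assumed to read every column. Because enableReturningBatches() is never called, getCurrentValue() is expected to return an InternalRow. Note that this class lives in an internal package, so its signatures may change between Spark releases.

```scala
import java.io.File
import java.nio.file.Files

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader

object RowModeSketch {
  // Hypothetical helper: writes ids 0..99 into a single Parquet file and returns its path.
  def writeSample(spark: SparkSession): String = {
    val out = new File(Files.createTempDirectory("vpr-rows").toFile, "data")
    spark.range(100).coalesce(1).write.parquet(out.getPath)
    out.listFiles().filter(_.getName.endsWith(".parquet")).head.getPath
  }

  // Reads the file one InternalRow at a time and sums the "id" column.
  def run(): Long = {
    val spark = SparkSession.builder().master("local[1]").appName("vpr-rows").getOrCreate()
    try {
      // On-heap column vectors, up to 4096 rows per internal batch.
      val reader = new VectorizedParquetRecordReader(false, 4096)
      try {
        reader.initialize(writeSample(spark), null) // null: no projection (assumed: all columns)
        var sum = 0L
        // Batches are not enabled, so each nextKeyValue() advances by one row
        // and getCurrentValue() returns that row as an InternalRow.
        while (reader.nextKeyValue()) {
          val row = reader.getCurrentValue().asInstanceOf[InternalRow]
          sum += row.getLong(0)
        }
        sum
      } finally reader.close()
    } finally spark.stop()
  }

  def main(args: Array[String]): Unit = println(run()) // prints 4950
}
```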

Linear Supertypes
SpecificParquetRecordReaderBase[AnyRef], RecordReader[Void, AnyRef], Closeable, AutoCloseable, AnyRef, Any

Instance Constructors

  1. new VectorizedParquetRecordReader(useOffHeap: Boolean, capacity: Int)
  2. new VectorizedParquetRecordReader(convertTz: ZoneId, datetimeRebaseMode: String, datetimeRebaseTz: String, int96RebaseMode: String, int96RebaseTz: String, useOffHeap: Boolean, capacity: Int)
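The capacity argument bounds how many rows each batch holds. The following is a minimal sketch under the same assumptions (Spark 3.x on the classpath, hypothetical object and helper names, an illustrative capacity of 32). It calls resultBatch() once, as the reader requires before nextBatch(), and then records the row count of each batch.

```scala
import java.io.File
import java.nio.file.Files

import scala.collection.mutable.ArrayBuffer

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader

object CapacitySketch {
  // Hypothetical helper: writes ids 0..99 into a single Parquet file and returns its path.
  def writeSample(spark: SparkSession): String = {
    val out = new File(Files.createTempDirectory("vpr-cap").toFile, "data")
    spark.range(100).coalesce(1).write.parquet(out.getPath)
    out.listFiles().filter(_.getName.endsWith(".parquet")).head.getPath
  }

  // Returns the number of rows in each batch produced with the given capacity.
  def batchSizes(capacity: Int): List[Int] = {
    val spark = SparkSession.builder().master("local[1]").appName("vpr-cap").getOrCreate()
    try {
      val reader = new VectorizedParquetRecordReader(false, capacity)
      try {
        reader.initialize(writeSample(spark), null)
        val batch = reader.resultBatch() // allocated once, then refilled by every nextBatch()
        val sizes = ArrayBuffer.empty[Int]
        while (reader.nextBatch()) sizes += batch.numRows()
        sizes.toList
      } finally reader.close()
    } finally spark.stop()
  }

  // With one row group of 100 rows this should print List(32, 32, 32, 4).
  def main(args: Array[String]): Unit = println(batchSizes(32))
}
```

Because every nextBatch() call refills the same ColumnarBatch, numRows() (and any values) must be read before the next call.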

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##: Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  5. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.CloneNotSupportedException]) @native()
  6. def close(): Unit
    Definition Classes
    VectorizedParquetRecordReader → SpecificParquetRecordReaderBase → RecordReader → Closeable → AutoCloseable
    Annotations
    @Override()
  7. def enableReturningBatches(): Unit

    Can be called before any rows are returned to enable returning columnar batches directly.

  8. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  9. def equals(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef → Any
  10. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.Throwable])
  11. final def getClass(): Class[_ <: AnyRef]
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  12. def getCurrentKey(): Void
    Definition Classes
    SpecificParquetRecordReaderBase → RecordReader
    Annotations
    @Override()
  13. def getCurrentValue(): AnyRef
    Definition Classes
    VectorizedParquetRecordReader → RecordReader
    Annotations
    @Override()
  14. def getProgress(): Float
    Definition Classes
    VectorizedParquetRecordReader → RecordReader
    Annotations
    @Override()
  15. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  16. def initBatch(partitionColumns: StructType, partitionValues: InternalRow): Unit
  17. def initialize(fileSchema: MessageType, requestedSchema: MessageType, rowGroupReader: ParquetRowGroupReader, totalRowCount: Int): Unit
    Definition Classes
    VectorizedParquetRecordReader → SpecificParquetRecordReaderBase
    Annotations
    @VisibleForTesting() @Override()
  18. def initialize(path: String, columns: List[String]): Unit

    Utility API that will read all the data in path. This circumvents the need to create Hadoop objects to use this class. columns can contain the list of columns to project.

    Definition Classes
    VectorizedParquetRecordReader → SpecificParquetRecordReaderBase
    Annotations
    @Override()
  19. def initialize(inputSplit: InputSplit, taskAttemptContext: TaskAttemptContext, fileFooter: Option[ParquetMetadata]): Unit
    Definition Classes
    VectorizedParquetRecordReader → SpecificParquetRecordReaderBase
    Annotations
    @Override()
  20. def initialize(inputSplit: InputSplit, taskAttemptContext: TaskAttemptContext): Unit

    Implementation of the RecordReader API.

    Definition Classes
    VectorizedParquetRecordReader → SpecificParquetRecordReaderBase → RecordReader
    Annotations
    @Override()
  21. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  22. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  23. def nextBatch(): Boolean

    Advances to the next batch of rows. Returns false if there are no more rows.

  24. def nextKeyValue(): Boolean
    Definition Classes
    VectorizedParquetRecordReader → RecordReader
    Annotations
    @Override()
  25. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  26. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  27. def resultBatch(): ColumnarBatch

    Returns the ColumnarBatch object that will be used for all rows returned by this reader. This object is reused. Calling this enables the vectorized reader. This should be called before any calls to nextKeyValue/nextBatch.

  28. final def synchronized[T0](arg0: => T0): T0
    Definition Classes
    AnyRef
  29. def toString(): String
    Definition Classes
    AnyRef → Any
  30. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException])
  31. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException])
  32. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException]) @native()
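In batch mode, enableReturningBatches(), nextKeyValue(), getCurrentValue() and resultBatch() work together. The following is a minimal sketch, assuming Spark 3.x on the classpath; the object and helper names are hypothetical and the capacity of 32 is illustrative. It checks that every value returned by getCurrentValue() is the single reused batch that resultBatch() hands out.

```scala
import java.io.File
import java.nio.file.Files

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader
import org.apache.spark.sql.vectorized.ColumnarBatch

object BatchModeSketch {
  // Hypothetical helper: writes ids 0..99 into a single Parquet file and returns its path.
  def writeSample(spark: SparkSession): String = {
    val out = new File(Files.createTempDirectory("vpr-batch").toFile, "data")
    spark.range(100).coalesce(1).write.parquet(out.getPath)
    out.listFiles().filter(_.getName.endsWith(".parquet")).head.getPath
  }

  // Returns (total rows seen, whether every value was the reader's reused batch).
  def run(): (Int, Boolean) = {
    val spark = SparkSession.builder().master("local[1]").appName("vpr-batch").getOrCreate()
    try {
      val reader = new VectorizedParquetRecordReader(false, 32)
      try {
        reader.initialize(writeSample(spark), null)
        reader.enableReturningBatches() // must happen before the first nextKeyValue()
        var rows = 0
        var allReused = true
        // Each nextKeyValue() now loads a whole batch, and getCurrentValue()
        // returns a ColumnarBatch instead of a single row.
        while (reader.nextKeyValue()) {
          val batch = reader.getCurrentValue().asInstanceOf[ColumnarBatch]
          allReused = allReused && (batch eq reader.resultBatch())
          rows += batch.numRows()
        }
        (rows, allReused)
      } finally reader.close()
    } finally spark.stop()
  }

  def main(args: Array[String]): Unit = println(run()) // prints (100,true)
}
```

Since the batch object is reused, any values that must outlive the current iteration should be copied out before nextKeyValue() is called again.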

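initialize(path, columns) and initBatch(partitionColumns, partitionValues) can be combined to project a subset of a file's columns and append constant partition values. The following is a minimal sketch, assuming Spark 3.x on the classpath. The partition column p = 7, the helper writeSample, and the object name are hypothetical. It also assumes that partition columns are appended after the projected data columns, so p is read as column 1.

```scala
import java.io.File
import java.nio.file.Files

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.expressions.GenericInternalRow
import org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader
import org.apache.spark.sql.types.{IntegerType, StructField, StructType}

object ProjectionAndPartitionSketch {
  // Hypothetical helper: writes columns id and doubled (= 2 * id) for ids 0..9.
  def writeSample(spark: SparkSession): String = {
    val out = new File(Files.createTempDirectory("vpr-part").toFile, "data")
    spark.range(10).selectExpr("id", "id * 2 AS doubled").coalesce(1).write.parquet(out.getPath)
    out.listFiles().filter(_.getName.endsWith(".parquet")).head.getPath
  }

  // Returns (column count, sum of "doubled", partition value seen in row 0).
  def run(): (Int, Long, Int) = {
    val spark = SparkSession.builder().master("local[1]").appName("vpr-part").getOrCreate()
    try {
      val reader = new VectorizedParquetRecordReader(false, 4096)
      try {
        // Project only "doubled"; the "id" column is not read.
        reader.initialize(writeSample(spark), java.util.Arrays.asList("doubled"))
        // Append a hypothetical partition column p with the constant value 7.
        val partitionSchema = StructType(Seq(StructField("p", IntegerType)))
        reader.initBatch(partitionSchema, new GenericInternalRow(Array[Any](7)))
        val batch = reader.resultBatch()
        var sum = 0L
        var p = -1
        while (reader.nextBatch()) {
          val data = batch.column(0) // the projected "doubled" column
          var i = 0
          while (i < batch.numRows()) { sum += data.getLong(i); i += 1 }
          p = batch.column(1).getInt(0) // the appended partition column
        }
        (batch.numCols(), sum, p)
      } finally reader.close()
    } finally spark.stop()
  }

  def main(args: Array[String]): Unit = println(run()) // prints (2,90,7)
}
```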