Packages

package state


Type Members

  1. case class AcquiredThreadInfo() extends Product with Serializable
  2. abstract class BaseStateStoreRDD[T, U] extends RDD[U]
  3. class ByteArrayPair extends AnyRef

    Mutable and reusable pair of byte arrays

  4. trait HDFSBackedStateStoreMap extends AnyRef
  5. class InvalidUnsafeRowException extends RuntimeException

    An exception thrown when an invalid UnsafeRow is detected in the state store.

  6. class NoPrefixHDFSBackedStateStoreMap extends HDFSBackedStateStoreMap
  7. class NoPrefixKeyStateEncoder extends RocksDBStateEncoder

    Encodes/decodes UnsafeRows to versioned byte arrays. It uses the first byte of the generated byte array to store the version that describes how the row is encoded in the rest of the byte array. Currently, the default version is 0.

    VERSION 0: [ VERSION (1 byte) | ROW (N bytes) ] The bytes of an UnsafeRow are written unmodified starting from offset 1 (offset 0 is the version byte, with value 0). That is, if the UnsafeRow has N bytes, the generated byte array will be N+1 bytes.
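
    The VERSION 0 layout above can be sketched as follows. This is a minimal, illustrative sketch: the object and method names are made up, not the actual RocksDBStateEncoder API.

```scala
// Hypothetical sketch of the VERSION 0 layout: [ VERSION (1 byte) | ROW (N bytes) ]
object Version0Encoding {
  val StateEncodingVersion: Byte = 0

  def encode(rowBytes: Array[Byte]): Array[Byte] = {
    val out = new Array[Byte](rowBytes.length + 1)         // N + 1 bytes total
    out(0) = StateEncodingVersion                          // offset 0: version byte
    System.arraycopy(rowBytes, 0, out, 1, rowBytes.length) // offsets 1..N: row bytes
    out
  }

  def decode(encoded: Array[Byte]): Array[Byte] = {
    require(encoded(0) == StateEncodingVersion, s"Unknown version: ${encoded(0)}")
    java.util.Arrays.copyOfRange(encoded, 1, encoded.length)
  }
}
```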

  8. class PrefixKeyScanStateEncoder extends RocksDBStateEncoder
  9. class PrefixScannableHDFSBackedStateStoreMap extends HDFSBackedStateStoreMap
  10. trait ReadStateStore extends AnyRef

    Base trait for a versioned key-value store which provides read operations. Each instance of a ReadStateStore represents a specific version of state data, and such instances are created through a StateStoreProvider.

    The abort method will be called when the task completes; clean up resources in that method.

    IMPLEMENTATION NOTES: An implementation may throw an exception from the prefixScan method if that functionality is not yet supported. Note that some stateful operations will not work if prefixScan is disabled.

  11. class ReadStateStoreRDD[T, U] extends BaseStateStoreRDD[T, U]

    An RDD that allows computations to be executed against ReadStateStores. It uses the StateStoreCoordinator to get the locations of loaded state stores and uses them as the preferred locations.

  12. class RocksDB extends Logging

    Class representing a RocksDB instance that checkpoints versions of data to DFS. After a set of updates, a new version can be committed by calling commit(). Any past version can be loaded by calling load(version).

    Note

    This class is not thread-safe, so use it only from one thread.

    See also

    RocksDBFileManager to see how the files are laid out in local disk and DFS.
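
    The load/commit versioning contract described above can be sketched with a minimal in-memory stand-in. This is illustrative only; the real class wraps a native RocksDB instance and checkpoints each committed version to DFS.

```scala
import scala.collection.mutable

// Minimal in-memory sketch of the versioning contract: commit() publishes a new
// version, and load(version) restores any previously committed version.
class VersionedStoreSketch {
  private val versions = mutable.Map[Long, Map[String, String]](0L -> Map.empty)
  private var working: Map[String, String] = Map.empty
  private var loadedVersion: Long = 0L

  def load(version: Long): Unit = {   // load any past committed version
    working = versions(version)
    loadedVersion = version
  }

  def put(key: String, value: String): Unit = working += (key -> value)

  def get(key: String): Option[String] = working.get(key)

  def commit(): Long = {              // publish pending updates as a new version
    loadedVersion += 1
    versions(loadedVersion) = working
    loadedVersion
  }
}
```

    Like the real class, this sketch is not thread-safe and is meant to be driven from a single thread.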

  13. case class RocksDBCheckpointMetadata(sstFiles: Seq[RocksDBSstFile], logFiles: Seq[RocksDBLogFile], numKeys: Long) extends Product with Serializable

    Classes to represent metadata of checkpoints saved to DFS. Since this is converted to JSON, any changes to this MUST be backward-compatible.

  14. case class RocksDBConf(minVersionsToRetain: Int, minDeltasForSnapshot: Int, compactOnCommit: Boolean, enableChangelogCheckpointing: Boolean, blockSizeKB: Long, blockCacheSizeMB: Long, lockAcquireTimeoutMs: Long, resetStatsOnLoad: Boolean, formatVersion: Int, trackTotalNumberOfRows: Boolean, maxOpenFiles: Int, writeBufferSizeMB: Long, maxWriteBufferNumber: Int, boundedMemoryUsage: Boolean, totalMemoryUsageMB: Long, writeBufferCacheRatio: Double, highPriorityPoolRatio: Double, compressionCodec: String) extends Product with Serializable

    Configurations for optimizing RocksDB

    compactOnCommit

    Whether to compact RocksDB data before commit / checkpointing

  15. class RocksDBFileManager extends Logging

    Class responsible for syncing RocksDB checkpoint files from local disk to DFS. For each version, the checkpoint is saved in a specific directory structure that allows successive versions to reuse SST data files and archived log files. This makes each commit incremental: only the new SST files and archived log files generated by RocksDB are uploaded. The directory structures on local disk and in DFS are as follows.

    Local checkpoint dir structure: RocksDB generates a number of files in the local checkpoint directory. The most important among them are the SST files; they are the actual log-structured data files. The rest of the files contain the metadata necessary for RocksDB to read the SST files and start from the checkpoint. Note that the SST files are hard links to files in RocksDB's working directory, and therefore successive checkpoints can share some of the SST files. These SST files have to be copied to a shared directory in DFS so that different committed versions can reference them.

    We consider both SST files and archived log files as immutable files which can be shared between different checkpoints.

    localCheckpointDir
     |
     +-- OPTIONS-000005
     +-- MANIFEST-000008
     +-- CURRENT
     +-- 00007.sst
     +-- 00011.sst
     +-- archive
     |   +-- 00008.log
     |   +-- 00013.log
     ...

    DFS directory structure after saving to DFS as version 10: The SST and archived log files are given unique file names and copied to the shared subdirectory. Every version maintains a mapping of local immutable file name to the unique file name in DFS. This mapping is saved in a JSON file (named metadata), which is zipped along with other checkpoint files into a single file, [version].zip.

    dfsRootDir
     |
     +-- SSTs
     |   +-- 00007-[uuid1].sst
     |   +-- 00011-[uuid2].sst
     +-- logs
     |   +-- 00008-[uuid3].log
     |   +-- 00013-[uuid4].log
     +-- 10.zip
     |   +-- metadata  <--- contains the mapping between 00007.sst and [uuid1].sst,
     |   |                  and the mapping between 00008.log and [uuid3].log
     |   +-- OPTIONS-000005
     |   +-- MANIFEST-000008
     |   +-- CURRENT
     |   ...
     +-- 9.zip
     +-- 8.zip
     ...

    Note the following:

    - Each [version].zip is a complete description of all the data and metadata needed to recover a RocksDB instance at the corresponding version. The SST files and log files are not included in the zip files; they can be shared across different versions. This is unlike the [version].delta files of HDFSBackedStateStore, where previous delta files need to be read for recovery.
    - This is safe with respect to speculatively executed tasks running concurrently in different executors, as each task uploads a different copy of the generated immutable files and atomically updates the [version].zip.
    - Immutable files are identified uniquely based on their file name and file size.
    - Immutable files can be reused only across adjacent checkpoints/versions.
    - This class is thread-safe. Specifically, it is safe to concurrently delete old files from a different thread than the task thread saving files.
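
    The unique-naming scheme described above (e.g. 00007-[uuid1].sst) can be sketched as follows. The helper name is made up for illustration; it simply shows why speculative tasks uploading the same local file never collide in DFS.

```scala
import java.util.UUID

// Illustrative sketch: a local immutable file such as 00007.sst is uploaded
// as 00007-<uuid>.sst, so two tasks uploading the same file get distinct names.
def dfsFileName(localFileName: String): String = {
  val dot = localFileName.lastIndexOf('.')
  val (base, ext) = localFileName.splitAt(dot) // ("00007", ".sst")
  s"$base-${UUID.randomUUID()}$ext"
}
```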

  16. case class RocksDBFileManagerMetrics(filesCopied: Long, bytesCopied: Long, filesReused: Long, zipFileBytesUncompressed: Option[Long] = None) extends Product with Serializable

    Metrics regarding RocksDB file sync between local and DFS.

  17. case class RocksDBFileMappings(versionToRocksDBFiles: ConcurrentHashMap[Long, Seq[RocksDBImmutableFile]], localFilesToDfsFiles: ConcurrentHashMap[String, RocksDBImmutableFile]) extends Product with Serializable

    Track file mappings in RocksDB across local and remote directories

    versionToRocksDBFiles

    Mapping of RocksDB files used across versions for maintenance

    localFilesToDfsFiles

    Mapping of the exact DFS file used to create a local SST file. localFilesToDfsFiles is a separate map because versionToRocksDBFiles can map multiple similar DFS files to a particular local file (for example, 1.sst can map to 1-UUID1.sst in v1 and 1-UUID2.sst in v2). We need to capture the exact file used to ensure version-ID compatibility across SST files and the RocksDB manifest.
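
    The need for two maps can be sketched as below; the file names are made up for the example.

```scala
import java.util.concurrent.ConcurrentHashMap

// The same local SST file may map to different DFS copies in different
// versions, so a second map pins the exact copy currently on local disk.
val versionToRocksDBFiles = new ConcurrentHashMap[Long, Seq[String]]()
versionToRocksDBFiles.put(1L, Seq("00001-uuid1.sst")) // local 00001.sst in v1
versionToRocksDBFiles.put(2L, Seq("00001-uuid2.sst")) // same local file, new copy in v2

val localFilesToDfsFiles = new ConcurrentHashMap[String, String]()
localFilesToDfsFiles.put("00001.sst", "00001-uuid2.sst") // the copy actually in use
```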

  18. sealed trait RocksDBImmutableFile extends AnyRef

    A RocksDBImmutableFile maintains a mapping between a local RocksDB file name and the name of its copy on DFS. Since these files are immutable, their DFS copies can be reused.

  19. case class RocksDBMetrics(numCommittedKeys: Long, numUncommittedKeys: Long, totalMemUsageBytes: Long, pinnedBlocksMemUsage: Long, totalSSTFilesBytes: Long, nativeOpsHistograms: Map[String, RocksDBNativeHistogram], lastCommitLatencyMs: Map[String, Long], filesCopied: Long, bytesCopied: Long, filesReused: Long, zipFileBytesUncompressed: Option[Long], nativeOpsMetrics: Map[String, Long]) extends Product with Serializable

    Class to represent stats from each commit.

  20. case class RocksDBNativeHistogram(sum: Long, avg: Double, stddev: Double, median: Double, p95: Double, p99: Double, count: Long) extends Product with Serializable

    Class to wrap RocksDB's native histogram

  21. sealed trait RocksDBStateEncoder extends AnyRef
  22. class StateSchemaCompatibilityChecker extends Logging
  23. case class StateSchemaNotCompatible(message: String) extends Exception with Product with Serializable
  24. trait StateStore extends ReadStateStore

    Base trait for a versioned key-value store which provides both read and write operations. Each instance of a StateStore represents a specific version of state data, and such instances are created through a StateStoreProvider.

    Unlike ReadStateStore, the abort method may not be called if the commit method succeeds in committing the change (hasCommitted returns true); otherwise, the abort method will be called. Implementations should handle resource cleanup in both methods, and must also guard against double resource cleanup.
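
    The commit/abort contract above can be sketched as follows. The store class here is a stand-in, not the real StateStore API: abort is skipped once commit succeeds, and otherwise runs in the cleanup path.

```scala
// Stand-in store tracking whether commit/abort were invoked.
class SketchStore {
  var committed = false
  var aborted = false
  def commit(): Unit = committed = true
  def abort(): Unit = aborted = true
  def hasCommitted: Boolean = committed
}

// Run a task body against the store: commit on success, abort on failure.
def runWithStore[A](store: SketchStore)(body: => A): A =
  try {
    val result = body
    store.commit()
    result
  } finally {
    if (!store.hasCommitted) store.abort() // abort only if commit did not succeed
  }
```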

  25. class StateStoreChangelogReader extends NextIterator[(Array[Byte], Array[Byte])] with Logging

    Reads an iterator of change records from the changelog file. A record is represented by ByteArrayPair(key: Array[Byte], value: Array[Byte]). A put record is returned as ByteArrayPair(key, value); a delete record is returned as ByteArrayPair(key, null).

  26. class StateStoreChangelogWriter extends Logging

    Writes changes to a key-value state store instance to a changelog file. There are two types of records, put and delete. A put record is written as: | key length | key content | value length | value content |. A delete record is written as: | key length | key content | -1 |. An Int of -1 is written to signal the end of file. The overall changelog format is: | put record | delete record | ... | put record | -1 |
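
    The record layout above can be sketched with plain DataOutputStream calls. This is illustrative, not the actual StateStoreChangelogWriter API: put records store both lengths, delete records store -1 in place of the value length, and a final -1 marks end of file.

```scala
import java.io.{ByteArrayOutputStream, DataOutputStream}

// Serialize a sequence of put (Some(value)) and delete (None) records
// using the changelog layout described above.
def writeChangelog(records: Seq[(Array[Byte], Option[Array[Byte]])]): Array[Byte] = {
  val bytes = new ByteArrayOutputStream()
  val out = new DataOutputStream(bytes)
  records.foreach { case (key, value) =>
    out.writeInt(key.length)                              // | key length |
    out.write(key)                                        // | key content |
    value match {
      case Some(v) => out.writeInt(v.length); out.write(v) // put: | value length | value content |
      case None    => out.writeInt(-1)                     // delete: | -1 |
    }
  }
  out.writeInt(-1) // end-of-file marker
  bytes.toByteArray
}
```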

  27. class StateStoreConf extends Serializable

    A class that contains configuration parameters for StateStores.

  28. class StateStoreCoordinatorRef extends AnyRef

    Reference to a StateStoreCoordinator that can be used to coordinate instances of StateStores across all the executors, and get their locations for job scheduling.

  29. trait StateStoreCustomMetric extends AnyRef

    Name and description of custom implementation-specific metrics that a state store may wish to expose.

    Name and description of custom implementation-specific metrics that a state store may wish to expose. Also provides SQLMetric instance to show the metric in UI and accumulate it at the query level.

  30. case class StateStoreCustomSizeMetric(name: String, desc: String) extends StateStoreCustomMetric with Product with Serializable
  31. case class StateStoreCustomSumMetric(name: String, desc: String) extends StateStoreCustomMetric with Product with Serializable
  32. case class StateStoreCustomTimingMetric(name: String, desc: String) extends StateStoreCustomMetric with Product with Serializable
  33. case class StateStoreId(checkpointRootLocation: String, operatorId: Long, partitionId: Int, storeName: String = StateStoreId.DEFAULT_STORE_NAME) extends Product with Serializable

    Unique identifier for a bunch of keyed state data.

    checkpointRootLocation

    Root directory where all the state data of a query is stored

    operatorId

    Unique id of a stateful operator

    partitionId

    Index of the partition of an operator's state data

    storeName

    Optional, name of the store. Each partition can optionally use multiple state stores, but they have to be identified by distinct names.

  34. case class StateStoreMetrics(numKeys: Long, memoryUsedBytes: Long, customMetrics: Map[StateStoreCustomMetric, Long]) extends Product with Serializable

    Metrics reported by a state store

    numKeys

    Number of keys in the state store

    memoryUsedBytes

    Memory used by the state store

    customMetrics

    Custom implementation-specific metrics. The metrics reported through this must have the same name as those reported by StateStoreProvider.customMetrics.

  35. implicit class StateStoreOps[T] extends AnyRef
  36. trait StateStoreProvider extends AnyRef

    Trait representing a provider that provides StateStore instances representing versions of state data.

    The life cycle of a provider and its provided stores is as follows.

    - A StateStoreProvider is created on an executor for each unique StateStoreId when the first batch of a streaming query is executed on the executor. All subsequent batches reuse this provider instance until the query is stopped.

    - Every batch of streaming data requests a specific version of the state data by invoking getStore(version), which returns an instance of StateStore through which the required version of the data can be accessed. It is the responsibility of the provider to populate this store with context information such as the schema of keys and values.

    - After the streaming query is stopped, the created provider instances are lazily disposed of.

  37. case class StateStoreProviderId(storeId: StateStoreId, queryRunId: UUID) extends Product with Serializable

    Unique identifier for a provider, used to identify when providers can be reused. Note that queryRunId is used to uniquely identify a provider, so that the same provider instance is not reused across query restarts.

  38. class StateStoreRDD[T, U] extends BaseStateStoreRDD[T, U]

    An RDD that allows computations to be executed against StateStores. It uses the StateStoreCoordinator to get the locations of loaded state stores and uses them as the preferred locations.

  39. sealed trait StreamingAggregationStateManager extends Serializable

    Base trait for state manager purposed to be used from streaming aggregations.

  40. abstract class StreamingAggregationStateManagerBaseImpl extends StreamingAggregationStateManager
  41. class StreamingAggregationStateManagerImplV1 extends StreamingAggregationStateManagerBaseImpl

    The implementation of StreamingAggregationStateManager for state version 1. In state version 1, the schemas of key and value in state are as follows:

    - key: Same as key expressions.
    - value: Same as input row attributes. The schema of value contains key expressions as well.

  42. class StreamingAggregationStateManagerImplV2 extends StreamingAggregationStateManagerBaseImpl

    The implementation of StreamingAggregationStateManager for state version 2. In state version 2, the schemas of key and value in state are as follows:

    - key: Same as key expressions.
    - value: The diff between input row attributes and key expressions.

    The schema of value is changed to optimize the memory/space usage in state by removing duplicated columns in the key-value pair. Hence key columns are excluded from the schema of value.
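
    The version 2 value schema can be sketched as a simple diff. The field names below are made up for the example, not taken from any real operator.

```scala
// Hypothetical aggregation state: key columns are dropped from the value
// schema in state version 2 to avoid storing them twice.
val inputRowFields = Seq("userId", "window", "count", "sum") // full input row
val keyFields      = Seq("userId", "window")                 // key expressions

val valueSchemaV1 = inputRowFields                               // V1: duplicates key columns
val valueSchemaV2 = inputRowFields.filterNot(keyFields.contains) // V2: diff only
```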

  43. class StreamingSessionWindowHelper extends AnyRef
  44. sealed trait StreamingSessionWindowStateManager extends Serializable
  45. class StreamingSessionWindowStateManagerImplV1 extends StreamingSessionWindowStateManager with Logging
  46. class SymmetricHashJoinStateManager extends Logging

    Helper class to manage state required by a single side of org.apache.spark.sql.execution.streaming.StreamingSymmetricHashJoinExec. The interface of this class is basically that of a multi-map:

    - Get: Returns an iterator of multiple values for a given key
    - Append: Appends a new value to the given key
    - Remove data by predicate: Drops any state using a predicate condition on keys or values

  47. class UnsafeRowPair extends AnyRef

    Mutable and reusable class for representing a pair of UnsafeRows.

  48. class WrappedReadStateStore extends ReadStateStore

    Wraps a StateStore instance to make it read-only.

Value Members

  1. object FlatMapGroupsWithStateExecHelper
  2. object HDFSBackedStateStoreMap
  3. object RocksDBCheckpointMetadata extends Serializable

    Helper class for RocksDBCheckpointMetadata

  4. object RocksDBConf extends Serializable
  5. object RocksDBFileManagerMetrics extends Serializable

    Metrics to return when requested but no operation has been performed.

  6. object RocksDBImmutableFile
  7. object RocksDBLoader extends Logging

    A wrapper for RocksDB library loading using an uninterruptible thread, as the native RocksDB code will throw an error when interrupted.

  8. object RocksDBMemoryManager extends Logging

    Singleton responsible for managing the cache and write buffer manager associated with all RocksDB state store instances running on a single executor if boundedMemoryUsage is enabled for RocksDB. If boundedMemoryUsage is disabled, a new cache object is returned.

  9. object RocksDBMetrics extends Serializable
  10. object RocksDBNativeHistogram extends Serializable
  11. object RocksDBStateEncoder
  12. object RocksDBStateStoreProvider
  13. object SchemaHelper

    Helper classes for reading/writing state schema.

  14. object StateSchemaCompatibilityChecker
  15. object StateStore extends Logging

    Companion object to StateStore that provides helper methods to create and retrieve stores by their unique ids. In addition, when a SparkContext is active (i.e. SparkEnv.get is not null), it also runs a periodic background task to do maintenance on the loaded stores. For each store, it uses the StateStoreCoordinator to check whether the currently loaded instance of the store is the active instance. Accordingly, it either keeps it loaded and performs maintenance, or unloads the store.

  16. object StateStoreConf extends Serializable
  17. object StateStoreCoordinatorRef extends Logging

    Helper object used to create reference to StateStoreCoordinator.

  18. object StateStoreId extends Serializable
  19. object StateStoreMetrics extends Serializable
  20. object StateStoreProvider
  21. object StateStoreProviderId extends Serializable
  22. object StreamingAggregationStateManager extends Logging with Serializable
  23. object StreamingSessionWindowStateManager extends Serializable
  24. object SymmetricHashJoinStateManager
