com.datastax.spark.connector.rdd

CassandraRDD

class CassandraRDD[R] extends RDD[R] with Logging

RDD representing a Cassandra table. This class is the main entry point for analyzing data in a Cassandra database with Spark. Obtain instances of this class by calling cassandraTable.
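A minimal sketch of obtaining a CassandraRDD. The keyspace "test", table "kv", and the pre-configured SparkContext sc are assumptions for illustration:

```scala
import com.datastax.spark.connector._  // adds cassandraTable to SparkContext

// Read the whole table as a CassandraRDD[CassandraRow]
val rdd = sc.cassandraTable("test", "kv")
println(rdd.count())
```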

Configuration properties should be passed in the SparkConf of the SparkContext. Because CassandraRDD needs to open a connection to Cassandra, the appropriate connection property values must be present in SparkConf. For the list of required and available properties, see CassandraConnector.
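For example, the Cassandra contact point can be supplied when building the SparkContext (the application name and host address below are placeholders):

```scala
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("cassandra-demo")
  // address of one Cassandra node; the connector discovers the rest
  .set("spark.cassandra.connection.host", "127.0.0.1")

val sc = new SparkContext(conf)
```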

CassandraRDD divides the dataset into smaller partitions, processed locally on every cluster node. A data partition consists of one or more contiguous token ranges. To reduce the number of round-trips to Cassandra, every partition is fetched in batches. The number of partitions and the fetch size are controlled by the splitSize and fetchSize properties described below.

A CassandraRDD object gets serialized and sent to every Spark executor. By default, reads are performed at ConsistencyLevel.LOCAL_ONE in order to leverage data locality and minimize network traffic. The read consistency level is controlled by the inputConsistencyLevel property.

If a Cassandra node fails or gets overloaded during read, queries are retried to a different node.

Linear Supertypes
RDD[R], Logging, Serializable, Serializable, AnyRef, Any
Known Subclasses

Value Members

  1. final def !=(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  2. final def !=(arg0: Any): Boolean

    Definition Classes
    Any
  3. final def ##(): Int

    Definition Classes
    AnyRef → Any
  4. def ++(other: RDD[R]): RDD[R]

    Definition Classes
    RDD
  5. final def ==(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  6. final def ==(arg0: Any): Boolean

    Definition Classes
    Any
  7. def aggregate[U](zeroValue: U)(seqOp: (U, R) ⇒ U, combOp: (U, U) ⇒ U)(implicit arg0: ClassTag[U]): U

    Definition Classes
    RDD
  8. def as[B, A0, A1, A2, A3, A4, A5, A6, A7, A8, A9, A10, A11](f: (A0, A1, A2, A3, A4, A5, A6, A7, A8, A9, A10, A11) ⇒ B)(implicit arg0: ClassTag[B], arg1: TypeConverter[A0], arg2: TypeConverter[A1], arg3: TypeConverter[A2], arg4: TypeConverter[A3], arg5: TypeConverter[A4], arg6: TypeConverter[A5], arg7: TypeConverter[A6], arg8: TypeConverter[A7], arg9: TypeConverter[A8], arg10: TypeConverter[A9], arg11: TypeConverter[A10], arg12: TypeConverter[A11]): CassandraRDD[B]

  9. def as[B, A0, A1, A2, A3, A4, A5, A6, A7, A8, A9, A10](f: (A0, A1, A2, A3, A4, A5, A6, A7, A8, A9, A10) ⇒ B)(implicit arg0: ClassTag[B], arg1: TypeConverter[A0], arg2: TypeConverter[A1], arg3: TypeConverter[A2], arg4: TypeConverter[A3], arg5: TypeConverter[A4], arg6: TypeConverter[A5], arg7: TypeConverter[A6], arg8: TypeConverter[A7], arg9: TypeConverter[A8], arg10: TypeConverter[A9], arg11: TypeConverter[A10]): CassandraRDD[B]

  10. def as[B, A0, A1, A2, A3, A4, A5, A6, A7, A8, A9](f: (A0, A1, A2, A3, A4, A5, A6, A7, A8, A9) ⇒ B)(implicit arg0: ClassTag[B], arg1: TypeConverter[A0], arg2: TypeConverter[A1], arg3: TypeConverter[A2], arg4: TypeConverter[A3], arg5: TypeConverter[A4], arg6: TypeConverter[A5], arg7: TypeConverter[A6], arg8: TypeConverter[A7], arg9: TypeConverter[A8], arg10: TypeConverter[A9]): CassandraRDD[B]

  11. def as[B, A0, A1, A2, A3, A4, A5, A6, A7, A8](f: (A0, A1, A2, A3, A4, A5, A6, A7, A8) ⇒ B)(implicit arg0: ClassTag[B], arg1: TypeConverter[A0], arg2: TypeConverter[A1], arg3: TypeConverter[A2], arg4: TypeConverter[A3], arg5: TypeConverter[A4], arg6: TypeConverter[A5], arg7: TypeConverter[A6], arg8: TypeConverter[A7], arg9: TypeConverter[A8]): CassandraRDD[B]

  12. def as[B, A0, A1, A2, A3, A4, A5, A6, A7](f: (A0, A1, A2, A3, A4, A5, A6, A7) ⇒ B)(implicit arg0: ClassTag[B], arg1: TypeConverter[A0], arg2: TypeConverter[A1], arg3: TypeConverter[A2], arg4: TypeConverter[A3], arg5: TypeConverter[A4], arg6: TypeConverter[A5], arg7: TypeConverter[A6], arg8: TypeConverter[A7]): CassandraRDD[B]

  13. def as[B, A0, A1, A2, A3, A4, A5, A6](f: (A0, A1, A2, A3, A4, A5, A6) ⇒ B)(implicit arg0: ClassTag[B], arg1: TypeConverter[A0], arg2: TypeConverter[A1], arg3: TypeConverter[A2], arg4: TypeConverter[A3], arg5: TypeConverter[A4], arg6: TypeConverter[A5], arg7: TypeConverter[A6]): CassandraRDD[B]

  14. def as[B, A0, A1, A2, A3, A4, A5](f: (A0, A1, A2, A3, A4, A5) ⇒ B)(implicit arg0: ClassTag[B], arg1: TypeConverter[A0], arg2: TypeConverter[A1], arg3: TypeConverter[A2], arg4: TypeConverter[A3], arg5: TypeConverter[A4], arg6: TypeConverter[A5]): CassandraRDD[B]

  15. def as[B, A0, A1, A2, A3, A4](f: (A0, A1, A2, A3, A4) ⇒ B)(implicit arg0: ClassTag[B], arg1: TypeConverter[A0], arg2: TypeConverter[A1], arg3: TypeConverter[A2], arg4: TypeConverter[A3], arg5: TypeConverter[A4]): CassandraRDD[B]

  16. def as[B, A0, A1, A2, A3](f: (A0, A1, A2, A3) ⇒ B)(implicit arg0: ClassTag[B], arg1: TypeConverter[A0], arg2: TypeConverter[A1], arg3: TypeConverter[A2], arg4: TypeConverter[A3]): CassandraRDD[B]

  17. def as[B, A0, A1, A2](f: (A0, A1, A2) ⇒ B)(implicit arg0: ClassTag[B], arg1: TypeConverter[A0], arg2: TypeConverter[A1], arg3: TypeConverter[A2]): CassandraRDD[B]

  18. def as[B, A0, A1](f: (A0, A1) ⇒ B)(implicit arg0: ClassTag[B], arg1: TypeConverter[A0], arg2: TypeConverter[A1]): CassandraRDD[B]

  19. def as[B, A0](f: (A0) ⇒ B)(implicit arg0: ClassTag[B], arg1: TypeConverter[A0]): CassandraRDD[B]

Maps each row into an object of a different type using the provided function, which takes the column value(s) as argument(s). Can be used to convert each row to a tuple or a case class object:

    sc.cassandraTable("ks", "table").select("column1").as((s: String) => s)                 // yields CassandraRDD[String]
    sc.cassandraTable("ks", "table").select("column1", "column2").as((_: String, _: Long))  // yields CassandraRDD[(String, Long)]
    
    case class MyRow(key: String, value: Long)
    sc.cassandraTable("ks", "table").select("column1", "column2").as(MyRow)                 // yields CassandraRDD[MyRow]
  20. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  21. def cache(): RDD[R]

    Definition Classes
    RDD
  22. def cartesian[U](other: RDD[U])(implicit arg0: ClassTag[U]): RDD[(R, U)]

    Definition Classes
    RDD
  23. def checkpoint(): Unit

    Definition Classes
    RDD
  24. def clearDependencies(): Unit

    Attributes
    protected
    Definition Classes
    RDD
  25. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  26. def coalesce(numPartitions: Int, shuffle: Boolean): RDD[R]

    Definition Classes
    RDD
  27. def collect[U](f: PartialFunction[R, U])(implicit arg0: ClassTag[U]): RDD[U]

    Definition Classes
    RDD
  28. def collect(): Array[R]

    Definition Classes
    RDD
  29. val columnNames: ColumnSelector

  30. def compute(split: Partition, context: TaskContext): Iterator[R]

    Definition Classes
    CassandraRDD → RDD
  31. def context: SparkContext

    Definition Classes
    RDD
  32. def count(): Long

    Definition Classes
    RDD
  33. def countApprox(timeout: Long, confidence: Double): PartialResult[BoundedDouble]

    Definition Classes
    RDD
  34. def countApproxDistinct(relativeSD: Double): Long

    Definition Classes
    RDD
  35. def countByValue(): Map[R, Long]

    Definition Classes
    RDD
  36. def countByValueApprox(timeout: Long, confidence: Double): PartialResult[Map[R, BoundedDouble]]

    Definition Classes
    RDD
  37. final def dependencies: Seq[Dependency[_]]

    Definition Classes
    RDD
  38. def distinct(): RDD[R]

    Definition Classes
    RDD
  39. def distinct(numPartitions: Int): RDD[R]

    Definition Classes
    RDD
  40. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  41. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  42. val fetchSize: Int

The number of rows fetched from the server in a single round-trip.

  43. def filter(f: (R) ⇒ Boolean): RDD[R]

    Definition Classes
    RDD
  44. def filterWith[A](constructA: (Int) ⇒ A)(p: (R, A) ⇒ Boolean)(implicit arg0: ClassTag[A]): RDD[R]

    Definition Classes
    RDD
  45. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  46. def first(): R

    Definition Classes
    RDD
  47. def firstParent[U](implicit arg0: ClassTag[U]): RDD[U]

    Attributes
    protected[org.apache.spark]
    Definition Classes
    RDD
  48. def flatMap[U](f: (R) ⇒ TraversableOnce[U])(implicit arg0: ClassTag[U]): RDD[U]

    Definition Classes
    RDD
  49. def flatMapWith[A, U](constructA: (Int) ⇒ A, preservesPartitioning: Boolean)(f: (R, A) ⇒ Seq[U])(implicit arg0: ClassTag[A], arg1: ClassTag[U]): RDD[U]

    Definition Classes
    RDD
  50. def fold(zeroValue: R)(op: (R, R) ⇒ R): R

    Definition Classes
    RDD
  51. def foreach(f: (R) ⇒ Unit): Unit

    Definition Classes
    RDD
  52. def foreachPartition(f: (Iterator[R]) ⇒ Unit): Unit

    Definition Classes
    RDD
  53. def foreachWith[A](constructA: (Int) ⇒ A)(f: (R, A) ⇒ Unit)(implicit arg0: ClassTag[A]): Unit

    Definition Classes
    RDD
  54. var generator: String

    Definition Classes
    RDD
  55. def getCheckpointFile: Option[String]

    Definition Classes
    RDD
  56. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  57. def getDependencies: Seq[Dependency[_]]

    Attributes
    protected
    Definition Classes
    RDD
  58. def getPartitions: Array[Partition]

    Definition Classes
    CassandraRDD → RDD
  59. def getPreferredLocations(split: Partition): Seq[String]

    Attributes
    protected
    Definition Classes
    RDD
  60. def getStorageLevel: StorageLevel

    Definition Classes
    RDD
  61. def glom(): RDD[Array[R]]

    Definition Classes
    RDD
  62. def groupBy[K](f: (R) ⇒ K, p: Partitioner)(implicit arg0: ClassTag[K]): RDD[(K, Seq[R])]

    Definition Classes
    RDD
  63. def groupBy[K](f: (R) ⇒ K, numPartitions: Int)(implicit arg0: ClassTag[K]): RDD[(K, Seq[R])]

    Definition Classes
    RDD
  64. def groupBy[K](f: (R) ⇒ K)(implicit arg0: ClassTag[K]): RDD[(K, Seq[R])]

    Definition Classes
    RDD
  65. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  66. val id: Int

    Definition Classes
    RDD
  67. val inputConsistencyLevel: ConsistencyLevel

The consistency level used for reads.

  68. def isCheckpointed: Boolean

    Definition Classes
    RDD
  69. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  70. def isTraceEnabled(): Boolean

    Attributes
    protected
    Definition Classes
    Logging
  71. final def iterator(split: Partition, context: TaskContext): Iterator[R]

    Definition Classes
    RDD
  72. def keyBy[K](f: (R) ⇒ K): RDD[(K, R)]

    Definition Classes
    RDD
  73. val keyspaceName: String

  74. def log: Logger

    Attributes
    protected
    Definition Classes
    Logging
  75. def logDebug(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  76. def logDebug(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  77. def logError(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  78. def logError(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  79. def logInfo(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  80. def logInfo(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  81. def logTrace(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  82. def logTrace(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  83. def logWarning(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  84. def logWarning(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  85. def map[U](f: (R) ⇒ U)(implicit arg0: ClassTag[U]): RDD[U]

    Definition Classes
    RDD
  86. def mapPartitions[U](f: (Iterator[R]) ⇒ Iterator[U], preservesPartitioning: Boolean)(implicit arg0: ClassTag[U]): RDD[U]

    Definition Classes
    RDD
  87. def mapPartitionsWithContext[U](f: (TaskContext, Iterator[R]) ⇒ Iterator[U], preservesPartitioning: Boolean)(implicit arg0: ClassTag[U]): RDD[U]

    Definition Classes
    RDD
  88. def mapPartitionsWithIndex[U](f: (Int, Iterator[R]) ⇒ Iterator[U], preservesPartitioning: Boolean)(implicit arg0: ClassTag[U]): RDD[U]

    Definition Classes
    RDD
  89. def mapWith[A, U](constructA: (Int) ⇒ A, preservesPartitioning: Boolean)(f: (R, A) ⇒ U)(implicit arg0: ClassTag[A], arg1: ClassTag[U]): RDD[U]

    Definition Classes
    RDD
  90. var name: String

    Definition Classes
    RDD
  91. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  92. final def notify(): Unit

    Definition Classes
    AnyRef
  93. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  94. val partitioner: Option[Partitioner]

    Definition Classes
    RDD
  95. final def partitions: Array[Partition]

    Definition Classes
    RDD
  96. def persist(): RDD[R]

    Definition Classes
    RDD
  97. def persist(newLevel: StorageLevel): RDD[R]

    Definition Classes
    RDD
  98. def pipe(command: Seq[String], env: Map[String, String], printPipeContext: ((String) ⇒ Unit) ⇒ Unit, printRDDElement: (R, (String) ⇒ Unit) ⇒ Unit): RDD[String]

    Definition Classes
    RDD
  99. def pipe(command: String, env: Map[String, String]): RDD[String]

    Definition Classes
    RDD
  100. def pipe(command: String): RDD[String]

    Definition Classes
    RDD
  101. final def preferredLocations(split: Partition): Seq[String]

    Definition Classes
    RDD
  102. def reduce(f: (R, R) ⇒ R): R

    Definition Classes
    RDD
  103. def repartition(numPartitions: Int): RDD[R]

    Definition Classes
    RDD
  104. def sample(withReplacement: Boolean, fraction: Double, seed: Int): RDD[R]

    Definition Classes
    RDD
  105. def saveAsObjectFile(path: String): Unit

    Definition Classes
    RDD
  106. def saveAsTextFile(path: String, codec: Class[_ <: CompressionCodec]): Unit

    Definition Classes
    RDD
  107. def saveAsTextFile(path: String): Unit

    Definition Classes
    RDD
  108. def select(columns: String*): CassandraRDD[R]

Narrows down the selected set of columns. Use this for better performance when you don't need all the columns in the result RDD. When called multiple times, it selects a subset of the already selected columns, so once a column has been removed by a previous select call, it cannot be added back.

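For instance, fetching only one column (the table and column names here are illustrative):

```scala
import com.datastax.spark.connector._

// Sketch: transfer only "username" from Cassandra, then extract it
// from each CassandraRow. A later .select could only narrow further.
val names = sc.cassandraTable("ks", "users")
  .select("username")
  .map(row => row.getString("username"))
```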
  109. lazy val selectedColumnNames: Seq[String]

    Returns the names of columns to be selected from the table.

  110. def setGenerator(_generator: String): Unit

    Definition Classes
    RDD
  111. def setName(_name: String): RDD[R]

    Definition Classes
    RDD
  112. def sparkContext: SparkContext

    Definition Classes
    RDD
  113. val splitSize: Int

The approximate number of rows fetched by a single Spark task, i.e. the size of one data partition.

  114. def subtract(other: RDD[R], p: Partitioner): RDD[R]

    Definition Classes
    RDD
  115. def subtract(other: RDD[R], numPartitions: Int): RDD[R]

    Definition Classes
    RDD
  116. def subtract(other: RDD[R]): RDD[R]

    Definition Classes
    RDD
  117. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  118. lazy val tableDef: TableDef

  119. val tableName: String

  120. def take(num: Int): Array[R]

    Definition Classes
    RDD
  121. def takeOrdered(num: Int)(implicit ord: Ordering[R]): Array[R]

    Definition Classes
    RDD
  122. def takeSample(withReplacement: Boolean, num: Int, seed: Int): Array[R]

    Definition Classes
    RDD
  123. def toArray(): Array[R]

    Definition Classes
    RDD
  124. def toDebugString: String

    Definition Classes
    RDD
  125. def toJavaRDD(): CassandraJavaRDD[R]

    Definition Classes
    CassandraRDD → RDD
  126. def toString(): String

    Definition Classes
    RDD → AnyRef → Any
  127. def top(num: Int)(implicit ord: Ordering[R]): Array[R]

    Definition Classes
    RDD
  128. def union(other: RDD[R]): RDD[R]

    Definition Classes
    RDD
  129. def unpersist(blocking: Boolean): RDD[R]

    Definition Classes
    RDD
  130. lazy val verify: Unit

  131. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  132. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  133. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  134. def where(cql: String, values: Any*): CassandraRDD[R]

Adds CQL WHERE predicates to the query. Useful for leveraging secondary indexes in Cassandra. Implicitly adds an ALLOW FILTERING clause to the WHERE clause; however, beware that some predicates may be rejected by Cassandra, particularly when they filter on an unindexed, non-clustering column.

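For example, with `?` placeholders bound to the trailing values (the table and column names are illustrative; "age" is assumed to be indexed or a clustering column):

```scala
import com.datastax.spark.connector._

// Sketch: filter rows server-side instead of pulling the whole table
val adults = sc.cassandraTable("ks", "users")
  .where("age > ?", 18)
```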
  135. val where: CqlWhereClause

  136. def zip[U](other: RDD[U])(implicit arg0: ClassTag[U]): RDD[(R, U)]

    Definition Classes
    RDD
  137. def zipPartitions[B, C, D, V](rdd2: RDD[B], rdd3: RDD[C], rdd4: RDD[D])(f: (Iterator[R], Iterator[B], Iterator[C], Iterator[D]) ⇒ Iterator[V])(implicit arg0: ClassTag[B], arg1: ClassTag[C], arg2: ClassTag[D], arg3: ClassTag[V]): RDD[V]

    Definition Classes
    RDD
  138. def zipPartitions[B, C, D, V](rdd2: RDD[B], rdd3: RDD[C], rdd4: RDD[D], preservesPartitioning: Boolean)(f: (Iterator[R], Iterator[B], Iterator[C], Iterator[D]) ⇒ Iterator[V])(implicit arg0: ClassTag[B], arg1: ClassTag[C], arg2: ClassTag[D], arg3: ClassTag[V]): RDD[V]

    Definition Classes
    RDD
  139. def zipPartitions[B, C, V](rdd2: RDD[B], rdd3: RDD[C])(f: (Iterator[R], Iterator[B], Iterator[C]) ⇒ Iterator[V])(implicit arg0: ClassTag[B], arg1: ClassTag[C], arg2: ClassTag[V]): RDD[V]

    Definition Classes
    RDD
  140. def zipPartitions[B, C, V](rdd2: RDD[B], rdd3: RDD[C], preservesPartitioning: Boolean)(f: (Iterator[R], Iterator[B], Iterator[C]) ⇒ Iterator[V])(implicit arg0: ClassTag[B], arg1: ClassTag[C], arg2: ClassTag[V]): RDD[V]

    Definition Classes
    RDD
  141. def zipPartitions[B, V](rdd2: RDD[B])(f: (Iterator[R], Iterator[B]) ⇒ Iterator[V])(implicit arg0: ClassTag[B], arg1: ClassTag[V]): RDD[V]

    Definition Classes
    RDD
  142. def zipPartitions[B, V](rdd2: RDD[B], preservesPartitioning: Boolean)(f: (Iterator[R], Iterator[B]) ⇒ Iterator[V])(implicit arg0: ClassTag[B], arg1: ClassTag[V]): RDD[V]

    Definition Classes
    RDD

Deprecated Value Members

  1. def mapPartitionsWithSplit[U](f: (Int, Iterator[R]) ⇒ Iterator[U], preservesPartitioning: Boolean)(implicit arg0: ClassTag[U]): RDD[U]

    Definition Classes
    RDD
    Annotations
    @deprecated
    Deprecated

    (Since version 0.7.0) use mapPartitionsWithIndex

Inherited from RDD[R]

Inherited from Logging

Inherited from Serializable

Inherited from Serializable

Inherited from AnyRef

Inherited from Any
