com.datastax.spark.connector

RDDFunctions

class RDDFunctions[T] extends Serializable

Provides Cassandra-specific methods on RDD.

Linear Supertypes
Serializable, Serializable, AnyRef, Any

Instance Constructors

  1. new RDDFunctions(rdd: RDD[T])(implicit arg0: ClassTag[T])

Value Members

  1. final def !=(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  2. final def !=(arg0: Any): Boolean

    Definition Classes
    Any
  3. final def ##(): Int

    Definition Classes
    AnyRef → Any
  4. final def ==(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  5. final def ==(arg0: Any): Boolean

    Definition Classes
    Any
  6. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  7. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  8. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  9. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  10. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  11. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  12. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  13. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  14. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  15. final def notify(): Unit

    Definition Classes
    AnyRef
  16. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  17. def saveToCassandra(keyspaceName: String, tableName: String, columnNames: Seq[String], batchSize: Int)(implicit rwf: RowWriterFactory[T]): Unit

    Saves the data from RDD to a Cassandra table in batches of the given size. Use this overload only if the automatically tuned batch size does not yield optimal performance.

    Larger batches increase memory use for temporary buffers and may put more GC pressure on the server. Smaller batches result in more round trips and lower throughput. Typically, sending a few kilobytes of data per batch is enough to achieve good performance.

    By default, writes are performed at ConsistencyLevel.ONE in order to leverage data-locality and minimize network traffic. This write consistency level is controlled by the following property:

    • spark.cassandra.output.consistency.level: consistency level for RDD writes, string matching the ConsistencyLevel enum name.
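
    As a usage sketch (the keyspace, table, and batch size of 64 are hypothetical; assumes a running SparkContext sc, a matching Cassandra table, and the connector's implicits in scope), the explicit batch-size overload can be invoked like this:

    ```scala
    import com.datastax.spark.connector._   // adds saveToCassandra to RDDs

    case class WordCount(word: String, count: Int)

    // Hypothetical example: write in explicit batches of 64 rows
    // instead of relying on the automatically tuned batch size.
    val rdd = sc.parallelize(Seq(WordCount("foo", 5), WordCount("baz", 1)))
    rdd.saveToCassandra("test", "words", Seq("word", "count"), batchSize = 64)
    ```

    Prefer the overloads without batchSize unless measurement shows the tuned batch size is suboptimal for your workload.
    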
  18. def saveToCassandra(keyspaceName: String, tableName: String, columnNames: Seq[String])(implicit rwf: RowWriterFactory[T]): Unit

    Saves the data from RDD to a Cassandra table. The RDD object properties must match the Cassandra table column names. Columns not listed in columnNames are left unchanged in Cassandra. All primary key columns must be selected.

    Example:

    CQL schema:

      CREATE KEYSPACE test WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
      CREATE TABLE test.words(word VARCHAR PRIMARY KEY, count INT, other VARCHAR);

    Scala:

      case class WordCount(word: String, count: Int, other: String)
      val rdd = sc.parallelize(Seq(WordCount("foo", 5, "bar")))
      rdd.saveToCassandra("test", "words", Seq("word", "count"))   // will not save the "other" column

    By default, writes are performed at ConsistencyLevel.ONE in order to leverage data-locality and minimize network traffic. This write consistency level is controlled by the following property:

    • spark.cassandra.output.consistency.level: consistency level for RDD writes, string matching the ConsistencyLevel enum name.
  19. def saveToCassandra(keyspaceName: String, tableName: String)(implicit rwf: RowWriterFactory[T]): Unit

    Saves the data from RDD to a Cassandra table, writing every property that has a corresponding Cassandra column. The underlying RDD class must provide data for all columns.

    Example:

    CQL schema:

      CREATE KEYSPACE test WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
      CREATE TABLE test.words(word VARCHAR PRIMARY KEY, count INT, other VARCHAR);

    Scala:

      case class WordCount(word: String, count: Int, other: String)
      val rdd = sc.parallelize(Seq(WordCount("foo", 5, "bar")))
      rdd.saveToCassandra("test", "words")

    By default, writes are performed at ConsistencyLevel.ONE in order to leverage data-locality and minimize network traffic. This write consistency level is controlled by the following property:

    • spark.cassandra.output.consistency.level: consistency level for RDD writes, string matching the ConsistencyLevel enum name.
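
    Since all three saveToCassandra overloads honor the spark.cassandra.output.consistency.level property, the write consistency can be raised when building the SparkConf (a sketch; the application name is arbitrary and LOCAL_QUORUM is just one valid ConsistencyLevel enum name):

    ```scala
    import org.apache.spark.{SparkConf, SparkContext}

    // Raise the write consistency from the default ONE to LOCAL_QUORUM.
    // The value must match a ConsistencyLevel enum name.
    val conf = new SparkConf()
      .setAppName("cassandra-writer")
      .set("spark.cassandra.output.consistency.level", "LOCAL_QUORUM")
    val sc = new SparkContext(conf)
    ```

    Note that stronger consistency levels trade write latency for durability guarantees, and forgo some of the data-locality benefit that motivates the ONE default.
    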
  20. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  21. def toString(): String

    Definition Classes
    AnyRef → Any
  22. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  23. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  24. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )

Inherited from Serializable

Inherited from Serializable

Inherited from AnyRef

Inherited from Any
