Maps each row into an object of a different type using the provided function, taking column value(s) as argument(s). Can be used to convert each row to a tuple or a case class object:
sc.cassandraTable("ks", "table").select("column1").as((s: String) => s)                // yields CassandraRDD[String]

sc.cassandraTable("ks", "table").select("column1", "column2").as((_: String, _: Long)) // yields CassandraRDD[(String, Long)]

case class MyRow(key: String, value: Long)
sc.cassandraTable("ks", "table").select("column1", "column2").as(MyRow)                // yields CassandraRDD[MyRow]
How many rows are fetched at once from the server.
Consistency level for reads.
Narrows down the selected set of columns.
Use this for better performance when you don't need all the columns in the result RDD.
When called multiple times, it selects a subset of the already-selected columns, so
once a column has been removed by a previous select call, it cannot be
added back.
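The narrowing behavior can be modeled as filtering over the currently selected columns. The following is a toy sketch of that contract only, not the connector's actual implementation:

```scala
// Toy model of select's narrowing semantics: each call keeps only the
// requested columns that are still in the current selection, so a column
// dropped by an earlier select cannot be re-added by a later one.
case class Selection(columns: Seq[String]) {
  def select(requested: String*): Selection =
    Selection(columns.filter(requested.contains(_)))
}

val s = Selection(Seq("column1", "column2", "column3"))
  .select("column1", "column2")   // narrows to column1, column2
  .select("column2", "column3")   // column3 was already dropped; only column2 survives

println(s.columns.mkString(","))  // prints "column2"
```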
Returns the names of columns to be selected from the table.
How many rows to fetch in a single Spark task.
Adds CQL WHERE predicates to the query.
Useful for leveraging secondary indexes in Cassandra.
Implicitly adds an ALLOW FILTERING clause to the WHERE clause; however, beware that some predicates
might be rejected by Cassandra, particularly when they filter on an unindexed, non-clustering column.
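How multiple predicates combine can be sketched as simple accumulation joined with AND. This toy model only illustrates the contract described above; it is not the connector's implementation, and the table and column names are made up:

```scala
// Toy model: each where call appends a CQL predicate, and the final
// clause joins them with AND, with ALLOW FILTERING added implicitly.
case class Query(table: String, predicates: Seq[String] = Seq.empty) {
  def where(cql: String): Query = copy(predicates = predicates :+ cql)
  def toCql: String =
    if (predicates.isEmpty) s"SELECT * FROM $table"
    else s"SELECT * FROM $table WHERE ${predicates.mkString(" AND ")} ALLOW FILTERING"
}

val q = Query("ks.table").where("color = 'blue'").where("priority > 3")
println(q.toCql)
// prints "SELECT * FROM ks.table WHERE color = 'blue' AND priority > 3 ALLOW FILTERING"
```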
RDD representing a Cassandra table. This class is the main entry point for analyzing data in a Cassandra database with Spark. Obtain objects of this class by calling cassandraTable.
Configuration properties should be passed in the SparkConf configuration of the SparkContext. CassandraRDD needs to open a connection to Cassandra, therefore it requires appropriate connection property values to be present in the SparkConf. For the list of required and available properties, see CassandraConnector.

CassandraRDD divides the dataset into smaller partitions, processed locally on every cluster node. A data partition consists of one or more contiguous token ranges. To reduce the number of roundtrips to Cassandra, every partition is fetched in batches. The following properties control the number of partitions and the fetch size:

A CassandraRDD object gets serialized and sent to every Spark executor.

By default, reads are performed at ConsistencyLevel.LOCAL_ONE in order to leverage data locality and minimize network traffic. This read consistency level is controlled by the following property:

If a Cassandra node fails or gets overloaded during a read, queries are retried against a different node.
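These properties are set on the SparkConf before the SparkContext is created. The sketch below uses a plain Map to stay dependency-free; the property names and values are assumptions based on typical connector versions, so check the documentation of the version you are running:

```scala
// Hypothetical connector properties as they would be set on a SparkConf
// (names are assumptions; verify against your connector version's docs).
val cassandraProps = Map(
  "spark.cassandra.connection.host"         -> "127.0.0.1", // Cassandra contact point
  "spark.cassandra.input.split.size"        -> "100000",    // rows per Spark partition
  "spark.cassandra.input.page.row.size"     -> "1000",      // rows fetched per roundtrip
  "spark.cassandra.input.consistency.level" -> "LOCAL_ONE"  // read consistency level
)

println(cassandraProps("spark.cassandra.input.consistency.level")) // prints "LOCAL_ONE"
```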