Class UserDefinedPredicate<T extends Comparable<T>>
java.lang.Object
    org.apache.parquet.filter2.predicate.UserDefinedPredicate<T>

Type Parameters:
T - The type of the column this predicate is applied to.
public abstract class UserDefinedPredicate<T extends Comparable<T>> extends Object
A UserDefinedPredicate decides whether a record should be kept or dropped. It first inspects metadata about a group of records to see whether the entire group can be dropped, then inspects the actual values of a single column. These predicates can be combined into a complex boolean expression via the FilterApi.
-
-
Constructor Summary
Constructor and Description
UserDefinedPredicate()
    A UDP must have a default constructor.
-
Method Summary
Modifier and Type, Method, and Description
boolean acceptsNullValue()
    Returns whether this predicate accepts null values.
abstract boolean canDrop(Statistics<T> statistics)
    Given information about a group of records (e.g., the min and max values), return true to drop all the records in this group, false to keep them for further inspection.
abstract boolean inverseCanDrop(Statistics<T> statistics)
    Same as canDrop(org.apache.parquet.filter2.predicate.Statistics<T>), except this method describes the logical inverse behavior of this predicate.
abstract boolean keep(T value)
    Return true to keep the record with this value, false to drop it.
-
-
-
Constructor Detail
-
UserDefinedPredicate
public UserDefinedPredicate()
A UDP must have a default constructor. The UDP passed to FilterApi will not be serialized along with its state. Only its class name will be recorded, and it will be instantiated reflectively via the default constructor.
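The consequence of recording only the class name can be sketched in plain Java. The example below is a minimal illustration, not Parquet code: ThresholdPredicate and rebuildFromClassName are hypothetical stand-ins, and the assumption is only that reconstruction goes through the no-argument constructor as described above. Any state set on the original instance is lost in the rebuilt one.

```java
// Stand-in demo: these types are hypothetical and are not Parquet classes.
final class ReflectiveInstantiationSketch {

    /** A predicate-like class with mutable state and a public no-arg constructor. */
    public static class ThresholdPredicate {
        private int threshold = 7; // value every freshly constructed instance starts with

        public ThresholdPredicate() {}

        public void setThreshold(int threshold) {
            this.threshold = threshold;
        }

        public boolean keep(Integer value) {
            return value != null && value > threshold;
        }
    }

    /** Records only the class name, then rebuilds the object via the no-arg constructor. */
    static ThresholdPredicate rebuildFromClassName(ThresholdPredicate original) {
        String className = original.getClass().getName(); // the only thing that is "recorded"
        try {
            Class<?> clazz = Class.forName(className);
            return (ThresholdPredicate) clazz.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not instantiate " + className, e);
        }
    }

    public static void main(String[] args) {
        ThresholdPredicate configured = new ThresholdPredicate();
        configured.setThreshold(100);
        ThresholdPredicate rebuilt = rebuildFromClassName(configured);
        System.out.println(configured.keep(50)); // false: threshold is 100
        System.out.println(rebuilt.keep(50));    // true: threshold is back to 7
    }
}
```

A predicate written this way should therefore keep its configuration in code (constants) rather than in fields set after construction.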
-
-
Method Detail
-
acceptsNullValue
public boolean acceptsNullValue()
Returns whether this predicate accepts null values.
Returns:
true if this predicate accepts null values, false otherwise
-
keep
public abstract boolean keep(T value)
Return true to keep the record with this value, false to drop it. This method must handle null values, returning whether this user-defined predicate accepts null values.
Parameters:
value - a value (might be null)
Returns:
true to keep the record with the value, false to drop it
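A null-aware keep can be sketched as follows. This is a simplified, self-contained illustration: NullAwareKeepSketch does not extend the real UserDefinedPredicate, and the filter helper is a hypothetical driver added only to show the effect. The point is that the method checks for null before unboxing, so a null value yields a decision instead of a NullPointerException.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stand-in sketch: this class does not extend the real UserDefinedPredicate.
final class NullAwareKeepSketch {

    /** Keeps values greater than 7 and rejects null explicitly instead of unboxing it. */
    static boolean keep(Integer value) {
        if (value == null) {
            return false; // this predicate does not accept null values
        }
        return value > 7;
    }

    /** Hypothetical driver: applies keep to every value, one at a time. */
    static List<Integer> filter(List<Integer> values) {
        List<Integer> kept = new ArrayList<>();
        for (Integer v : values) {
            if (keep(v)) {
                kept.add(v);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(filter(Arrays.asList(3, null, 8, 12))); // [8, 12]
    }
}
```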
-
canDrop
public abstract boolean canDrop(Statistics<T> statistics)
Given information about a group of records (e.g., the min and max values), return true to drop all the records in this group, false to keep them for further inspection. Returning false here will cause the records to be loaded, and each value will be passed to keep(T) to make the final decision. It is safe to always return false here if you simply want to visit each record via the keep(T) method, though it is much more efficient to drop entire chunks of records here if you can.
Parameters:
statistics - statistics for the column
Returns:
true if none of the values described by statistics can match the predicate
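The interplay between canDrop and keep(T) can be sketched with plain Java collections. This is a simplified model under stated assumptions: MinMax is a hypothetical stand-in for the Statistics class, the filter loop is an illustrative driver rather than Parquet's reader, and every group is assumed to be non-empty with no null values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Stand-in sketch: MinMax and the driver loop are hypothetical, not Parquet classes.
final class TwoPhaseFilterSketch {

    /** Minimal stand-in for the statistics of one group (non-empty, no nulls assumed). */
    static final class MinMax {
        final int min;
        final int max;

        MinMax(List<Integer> group) {
            this.min = Collections.min(group);
            this.max = Collections.max(group);
        }
    }

    /** Group-level check: no value can exceed 7 when the max is at most 7. */
    static boolean canDrop(MinMax stats) {
        return stats.max <= 7;
    }

    /** Value-level check, used only for groups that were not dropped. */
    static boolean keep(Integer value) {
        return value != null && value > 7;
    }

    /** Skips whole groups when canDrop allows it, otherwise inspects each value. */
    static List<Integer> filter(List<List<Integer>> groups) {
        List<Integer> kept = new ArrayList<>();
        for (List<Integer> group : groups) {
            if (canDrop(new MinMax(group))) {
                continue; // the group's values are never visited
            }
            for (Integer v : group) {
                if (keep(v)) {
                    kept.add(v);
                }
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<List<Integer>> groups = Arrays.asList(
                Arrays.asList(1, 2, 3),   // max 3: dropped as a whole
                Arrays.asList(5, 9, 7),   // max 9: loaded, only 9 is kept
                Arrays.asList(10, 11));   // max 11: loaded, both kept
        System.out.println(filter(groups)); // [9, 10, 11]
    }
}
```

Replacing the body of canDrop with return false would produce the same result while visiting every value, which is the "safe but slower" option described above.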
-
inverseCanDrop
public abstract boolean inverseCanDrop(Statistics<T> statistics)
Same as canDrop(org.apache.parquet.filter2.predicate.Statistics<T>), except this method describes the logical inverse behavior of this predicate. If this predicate is passed to the not() operator, then this method will be called instead of canDrop(org.apache.parquet.filter2.predicate.Statistics<T>). It is safe to always return false here if you simply want to visit each record via the keep(T) method, though it is much more efficient to drop entire chunks of records here if you can. It may be valid to simply return !canDrop(statistics), but that is not always the case. To illustrate, look at this re-implementation of a UDP that checks for values greater than 7:

    // This is just an example; you should use the built-in FilterApi.gt(C, T) operator
    // instead of implementing your own like this.
    import org.apache.parquet.filter2.predicate.Statistics;
    import org.apache.parquet.filter2.predicate.UserDefinedPredicate;

    public class IntGreaterThan7UDP extends UserDefinedPredicate<Integer> {
      @Override
      public boolean keep(Integer value) {
        // Here we just check whether the value is greater than 7.
        // Parquet knows that if the predicate not(columnX, IntGreaterThan7UDP) is being evaluated,
        // it is safe to simply use !IntGreaterThan7UDP.keep(value).
        return value > 7;
      }

      @Override
      public boolean canDrop(Statistics<Integer> statistics) {
        // Here we drop a group of records if they are all less than or equal to 7
        // (there can't possibly be any values greater than 7 in this group of records).
        return statistics.getMax() <= 7;
      }

      @Override
      public boolean inverseCanDrop(Statistics<Integer> statistics) {
        // Here the predicate not(columnX, IntGreaterThan7UDP) is being evaluated, which means we want
        // to keep all records whose value is not greater than 7, or, rephrased, whose value is
        // less than or equal to 7.
        // Notice what would happen if Parquet just tried to evaluate !IntGreaterThan7UDP.canDrop():
        //   !IntGreaterThan7UDP.canDrop(stats) == !(stats.getMax() <= 7) == (stats.getMax() > 7)
        // It would drop the following group of records: [100, 1, 2, 3], even though this group
        // contains values less than or equal to 7.
        // What we actually want to do is drop groups of records where the *min* is greater than 7
        // (not the max). For example, the group of records [100, 8, 9, 10] has a min of 8, so there
        // is no way there are going to be records with a value less than or equal to 7 in this group.
        return statistics.getMin() > 7;
      }
    }

Parameters:
statistics - statistics for the column
Returns:
false if none of the values described by statistics can match the predicate
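The difference between !canDrop and inverseCanDrop can be checked on the two groups mentioned in the example. The sketch below is self-contained and does not use the Parquet classes: canDrop and inverseCanDrop take a plain min and max instead of a Statistics object, and containsInverseMatch is a hypothetical helper that scans the group to show what the correct answer is.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Stand-in sketch: min and max are passed directly instead of a Parquet Statistics object.
final class InverseCanDropSketch {

    /** Drop the group for "value > 7" when even the max does not exceed 7. */
    static boolean canDrop(int min, int max) {
        return max <= 7;
    }

    /** Drop the group for "not (value > 7)" when even the min exceeds 7. */
    static boolean inverseCanDrop(int min, int max) {
        return min > 7;
    }

    /** Ground truth: does the group hold any value matching "not (value > 7)"? */
    static boolean containsInverseMatch(List<Integer> group) {
        for (int v : group) {
            if (v <= 7) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Integer> mixed = Arrays.asList(100, 1, 2, 3);
        List<Integer> allAbove = Arrays.asList(100, 8, 9, 10);
        for (List<Integer> group : Arrays.asList(mixed, allAbove)) {
            int min = Collections.min(group);
            int max = Collections.max(group);
            System.out.println(group
                    + " naive drop=" + !canDrop(min, max)
                    + " inverseCanDrop=" + inverseCanDrop(min, max)
                    + " has matches=" + containsInverseMatch(group));
        }
        // [100, 1, 2, 3] naive drop=true inverseCanDrop=false has matches=true
        // [100, 8, 9, 10] naive drop=true inverseCanDrop=true has matches=false
    }
}
```

For the first group the naive negation would discard records that the inverted predicate should keep, while inverseCanDrop correctly retains the group.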
-
-