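Type Parameters: T - The type of the column this predicate is applied to.

public abstract class UserDefinedPredicate<T extends Comparable<T>> extends Object

A UserDefinedPredicate decides whether a record should be kept or dropped. These predicates can be combined into a complex boolean expression via the FilterApi.

| Constructor and Description |
|---|
| UserDefinedPredicate(): A udp must have a default constructor. |
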
| Modifier and Type | Method and Description |
|---|---|
| boolean | acceptsNullValue(): Returns whether this predicate accepts null values. |
| abstract boolean | canDrop(Statistics<T> statistics): Given information about a group of records (eg, the min and max value), return true to drop all the records in this group, false to keep them for further inspection. |
| abstract boolean | inverseCanDrop(Statistics<T> statistics): Same as canDrop(org.apache.parquet.filter2.predicate.Statistics<T>), except this method describes the logical inverse behavior of this predicate. |
| abstract boolean | keep(T value): Return true to keep the record with this value, false to drop it. |

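public UserDefinedPredicate()
A udp must have a default constructor. The udp passed to FilterApi will not be serialized along with its state.
Only its class name will be recorded, and it will be instantiated reflectively via the default
constructor.

public boolean acceptsNullValue()
Returns whether this predicate accepts null values.
Returns: true if this predicate accepts null values, false otherwise

public abstract boolean keep(T value)
Return true to keep the record with this value, false to drop it.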
This method must handle null values, returning whether this user defined predicate accepts null
values or not.
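
As an illustration of the null-handling requirement, here is a minimal sketch of a null-safe keep. KeepOnlyPredicate is a hypothetical stand-in that models only the keep(T) method, so the snippet compiles without Parquet on the classpath; it is not the real UserDefinedPredicate class, and NullSafeGreaterThan7 is an invented example name.

```java
// Hypothetical stand-in that models only keep(T); not the Parquet class.
abstract class KeepOnlyPredicate<T extends Comparable<T>> {
  abstract boolean keep(T value);
}

// Keeps values strictly greater than 7 and rejects null instead of throwing.
class NullSafeGreaterThan7 extends KeepOnlyPredicate<Integer> {
  @Override
  boolean keep(Integer value) {
    // Unboxing a null Integer would throw NullPointerException,
    // so the null check has to come first.
    return value != null && value > 7;
  }
}

class NullSafeKeepDemo {
  public static void main(String[] args) {
    NullSafeGreaterThan7 udp = new NullSafeGreaterThan7();
    System.out.println("keep(8)    = " + udp.keep(8));
    System.out.println("keep(7)    = " + udp.keep(7));
    System.out.println("keep(null) = " + udp.keep(null));
  }
}
```

Whether null should be kept or dropped depends on the predicate; this sketch drops it, which is one reasonable choice for a "greater than" comparison.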
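Parameters: value - a value (might be null)

public abstract boolean canDrop(Statistics<T> statistics)
Given information about a group of records (eg, the min and max value),
return true to drop all the records in this group, false to keep them for further
inspection. Returning false here will cause the records to be loaded and each value to be passed to
keep(T) to make the final decision.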
It is safe to always return false here, if you simply want to visit each record via the keep(T) method,
though it is much more efficient to drop entire chunks of records here if you can.
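
The following sketch shows how a min/max check can rule out a whole group of records for a range predicate. MinMaxStats is a hypothetical stand-in that models only the getMin() and getMax() accessors of Statistics, and IntBetween10And20 is an invented example; neither is part of Parquet, and the real classes may expose more than is modeled here.

```java
// Hypothetical stand-in for Statistics<T>; only min and max are modeled.
final class MinMaxStats<T extends Comparable<T>> {
  private final T min;
  private final T max;

  MinMaxStats(T min, T max) {
    this.min = min;
    this.max = max;
  }

  T getMin() { return min; }
  T getMax() { return max; }
}

// Keeps values in the closed range [10, 20]. The bounds are constants
// because a udp is recreated through its default constructor.
class IntBetween10And20 {
  boolean keep(Integer value) {
    return value != null && value >= 10 && value <= 20;
  }

  boolean canDrop(MinMaxStats<Integer> stats) {
    // No value can fall in [10, 20] if the whole group lies below 10
    // or the whole group lies above 20.
    return stats.getMax() < 10 || stats.getMin() > 20;
  }
}

class RangeDropDemo {
  public static void main(String[] args) {
    IntBetween10And20 udp = new IntBetween10And20();
    System.out.println(udp.canDrop(new MinMaxStats<>(1, 5)));    // group entirely below the range
    System.out.println(udp.canDrop(new MinMaxStats<>(21, 30)));  // group entirely above the range
    System.out.println(udp.canDrop(new MinMaxStats<>(1, 100)));  // group overlaps the range
  }
}
```

A group whose min and max straddle the range may still contain no matching value, so returning false only means the records need to be inspected one by one through keep.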
Parameters: statistics - statistics for the column

public abstract boolean inverseCanDrop(Statistics<T> statistics)
Same as canDrop(org.apache.parquet.filter2.predicate.Statistics<T>), except this method describes the logical inverse
behavior of this predicate. If this predicate is passed to the not() operator, then
this method will be called instead of canDrop(org.apache.parquet.filter2.predicate.Statistics<T>).
It is safe to always return false here, if you simply want to visit each record via the keep(T) method,
though it is much more efficient to drop entire chunks of records here if you can.
It may be valid to simply return !canDrop(statistics) but that is not always the case. To illustrate, look at this re-implementation of a UDP that checks for values greater than 7:
// This is just an example, you should use the built in FilterApi.gt(C, T) operator instead of
// implementing your own like this.
import org.apache.parquet.filter2.predicate.Statistics;
import org.apache.parquet.filter2.predicate.UserDefinedPredicate;

public class IntGreaterThan7UDP extends UserDefinedPredicate<Integer> {
  @Override
  public boolean keep(Integer value) {
    // here we just check if the value is greater than 7.
    // here, parquet knows that if the predicate not(columnX, IntGreaterThan7UDP) is being evaluated,
    // it is safe to simply use !IntGreaterThan7UDP.keep(value)
    return value > 7;
  }

  @Override
  public boolean canDrop(Statistics<Integer> statistics) {
    // here we drop a group of records if they are all less than or equal to 7,
    // (there can't possibly be any values greater than 7 in this group of records)
    return statistics.getMax() <= 7;
  }

  @Override
  public boolean inverseCanDrop(Statistics<Integer> statistics) {
    // here the predicate not(columnX, IntGreaterThan7UDP) is being evaluated, which means we want
    // to keep all records whose value is not greater than 7, or, rephrased, whose value is less than or equal to 7.
    // notice what would happen if parquet just tried to evaluate !IntGreaterThan7UDP.canDrop():
    // !IntGreaterThan7UDP.canDrop(stats) == !(stats.getMax() <= 7) == (stats.getMax() > 7)
    // it would drop the following group of records: [100, 1, 2, 3], even though this group of records contains values
    // less than or equal to 7.
    // what we actually want to do is drop groups of records where the *min* is greater than 7, (not the max)
    // for example: the group of records: [100, 8, 9, 10] has a min of 8, so there's no way there are going
    // to be records with a value
    // less than or equal to 7 in this group.
    return statistics.getMin() > 7;
  }
}
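
To make the difference concrete, the sketch below evaluates the two groups mentioned in the comments with plain integers. InverseDropDemo and its static methods are hypothetical helpers that mirror the logic of the example above; they take the min and max directly rather than a Statistics object so that the snippet runs without Parquet.

```java
import java.util.Arrays;

class InverseDropDemo {
  // Mirrors IntGreaterThan7UDP.keep(value).
  static boolean keep(int value) {
    return value > 7;
  }

  // Mirrors canDrop: no value can exceed 7 when the max is at most 7.
  static boolean canDrop(int min, int max) {
    return max <= 7;
  }

  // Mirrors inverseCanDrop: no value can be <= 7 when the min exceeds 7.
  static boolean inverseCanDrop(int min, int max) {
    return min > 7;
  }

  // Ground truth for the negated predicate: true when at least one
  // value in the group fails keep(), i.e. is less than or equal to 7.
  static boolean hasMatchForNegation(int[] group) {
    return Arrays.stream(group).anyMatch(v -> !keep(v));
  }

  public static void main(String[] args) {
    int[][] groups = {{100, 1, 2, 3}, {100, 8, 9, 10}};
    for (int[] group : groups) {
      int min = Arrays.stream(group).min().getAsInt();
      int max = Arrays.stream(group).max().getAsInt();
      System.out.println(Arrays.toString(group)
          + " hasMatchForNegation=" + hasMatchForNegation(group)
          + " !canDrop=" + !canDrop(min, max)
          + " inverseCanDrop=" + inverseCanDrop(min, max));
    }
  }
}
```

For [100, 1, 2, 3] the naive !canDrop is true, which would discard a group that does contain values less than or equal to 7, while inverseCanDrop is false and keeps the group for inspection. For [100, 8, 9, 10] both agree that the group can be dropped.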
Parameters: statistics - statistics for the column

Copyright © 2024 The Apache Software Foundation. All rights reserved.