Class Types


  • public class Types
    extends Object
    This class provides fluent builders that produce Parquet schema Types.

    The most basic use is to build primitive types:

       Types.required(INT64).named("id");
       Types.optional(INT32).named("number");
     

    The required(PrimitiveType.PrimitiveTypeName) factory method returns a primitive type builder, and Types.Builder.named(String) builds the PrimitiveType. Between required and named, other builder methods can be called to add type annotations or other type metadata:

       Types.required(BINARY).as(UTF8).named("username");
       Types.optional(FIXED_LEN_BYTE_ARRAY).length(20).named("sha1");
     

    Optional types are built the same way, using optional(PrimitiveType.PrimitiveTypeName) to get the builder.

    Groups are built similarly, using requiredGroup() (or the optional version) to return a group builder. Group builders provide required and optional to add primitive types, which return primitive builders like the versions above.

       // This produces:
       // required group User {
       //   required int64 id;
       //   optional binary email (UTF8);
       // }
       Types.requiredGroup()
                .required(INT64).named("id")
                .optional(BINARY).as(UTF8).named("email")
            .named("User");
     

    When required is called on a group builder, the primitive builder it returns adds the type to the parent group when it is built, and named returns the parent group builder (instead of the type) so that more fields can be added.

    Sub-groups can be created using requiredGroup() to get a group builder that will create the group type, add it to the parent builder, and return the parent builder for more fields.

       // required group User {
       //   required int64 id;
       //   optional binary email (UTF8);
       //   optional group address {
       //     required binary street (UTF8);
       //     required int32 zipcode;
       //   }
       // }
       Types.requiredGroup()
                .required(INT64).named("id")
                .optional(BINARY).as(UTF8).named("email")
                .optionalGroup()
                    .required(BINARY).as(UTF8).named("street")
                    .required(INT32).named("zipcode")
                .named("address")
            .named("User");
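
    The way named hands control back to the parent builder can be illustrated with a small stand-alone model. This is only a sketch of the pattern, not Parquet's implementation: the class GroupSketch and its methods are hypothetical, types are rendered as plain strings, and required(type).named(name) is collapsed into a single call to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical toy model of a parent-returning group builder (not Parquet code). */
final class GroupSketch<P> {
  private final String repetition;
  // Decides what named() returns: the rendered type, or the parent builder.
  private final Function<String, P> onNamed;
  private final List<String> fields = new ArrayList<>();

  private GroupSketch(String repetition, Function<String, P> onNamed) {
    this.repetition = repetition;
    this.onNamed = onNamed;
  }

  /** Top-level builder: named() returns the rendered type itself. */
  static GroupSketch<String> requiredGroup() {
    return new GroupSketch<String>("required", rendered -> rendered);
  }

  /** Adds a required primitive field and returns this builder for chaining. */
  GroupSketch<P> required(String type, String name) {
    fields.add("required " + type + " " + name + ";");
    return this;
  }

  /** Adds an optional primitive field and returns this builder for chaining. */
  GroupSketch<P> optional(String type, String name) {
    fields.add("optional " + type + " " + name + ";");
    return this;
  }

  /** Nested builder: its named() adds the group to this builder and returns this builder. */
  GroupSketch<GroupSketch<P>> optionalGroup() {
    return new GroupSketch<GroupSketch<P>>("optional", rendered -> {
      fields.add(rendered);
      return this;
    });
  }

  /** Renders the group, then returns whatever the creator of this builder asked for. */
  P named(String name) {
    String rendered =
        repetition + " group " + name + " { " + String.join(" ", fields) + " }";
    return onNamed.apply(rendered);
  }

  public static void main(String[] args) {
    String user = GroupSketch.requiredGroup()
        .required("int64", "id")
        .optionalGroup()
            .required("int32", "zipcode")
        .named("address")  // returns the parent builder
        .named("User");    // returns the rendered type
    System.out.println(user);
  }
}
```

    In this model the return type of named is a type parameter, so the compiler knows whether a given call yields a finished type or a builder that still accepts fields.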
     

    Maps are built similarly, using requiredMap() (or the optionalMap() version) to return a map builder. Map builders provide key() to add a primitive key and groupKey() to add a group key. key() returns a MapKey builder, which extends a primitive builder; groupKey() returns a MapGroupKey builder, which extends a group builder. A map key is always required.

    Once a key is built, a primitive map value can be built using requiredValue() (or the optionalValue() version), which returns a MapValue builder. A group map value can be built using requiredGroupValue() (or the optionalGroupValue() version), which returns a MapGroupValue builder.

       // required group zipMap (MAP) {
       //   repeated group map (MAP_KEY_VALUE) {
       //     required float key;
       //     optional int32 value;
       //   }
       // }
       Types.requiredMap()
                .key(FLOAT)
                .optionalValue(INT32)
            .named("zipMap");
    
    
       // required group zipMap (MAP) {
       //   repeated group map (MAP_KEY_VALUE) {
       //     required group key {
       //       optional int64 first;
       //       required group second {
       //         required float inner_id_1;
       //         optional int32 inner_id_2;
       //       }
       //     }
       //     optional group value {
       //       optional group localGeoInfo {
       //         required float inner_value_1;
       //         optional int32 inner_value_2;
       //       }
       //       optional int32 zipcode;
       //     }
       //   }
       // }
       Types.requiredMap()
                .groupKey()
                  .optional(INT64).named("first")
                  .requiredGroup()
                    .required(FLOAT).named("inner_id_1")
                    .optional(INT32).named("inner_id_2")
                  .named("second")
                .optionalGroupValue()
                  .optionalGroup()
                    .required(FLOAT).named("inner_value_1")
                    .optional(INT32).named("inner_value_2")
                  .named("localGeoInfo")
                  .optional(INT32).named("zipcode")
            .named("zipMap");
     

    Message types are built using buildMessage(), which returns a builder that works just like a group builder.

       // message User {
       //   required int64 id;
       //   optional binary email (UTF8);
       //   optional group address {
       //     required binary street (UTF8);
       //     required int32 zipcode;
       //   }
       // }
       Types.buildMessage()
                .required(INT64).named("id")
                .optional(BINARY).as(UTF8).named("email")
                .optionalGroup()
                    .required(BINARY).as(UTF8).named("street")
                    .required(INT32).named("zipcode")
                .named("address")
            .named("User");
     

    These builders enforce consistency checks based on the specifications in the parquet-format documentation. For example, if DECIMAL is used to annotate a FIXED_LEN_BYTE_ARRAY that is not long enough for its maximum precision, these builders will throw an IllegalArgumentException:

       // throws IllegalArgumentException with message:
       // "FIXED(4) is not long enough to store 10 digits"
       Types.required(FIXED_LEN_BYTE_ARRAY).length(4)
            .as(DECIMAL).precision(10)
            .named("badDecimal");
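
    The arithmetic behind this particular check can be reproduced without Parquet. The sketch below assumes the parquet-format rule that a decimal's unscaled value is stored as a two's-complement signed integer, so a fixed array of n bytes can hold at most floor(log10(2^(8n-1) - 1)) decimal digits. The class and method names are hypothetical, and the exception message simply mirrors the one quoted above; this is not the library's own code.

```java
import java.math.BigInteger;

/** Hypothetical stand-alone model of the FIXED_LEN_BYTE_ARRAY decimal precision check. */
final class DecimalPrecisionSketch {

  /** Largest number of decimal digits that always fits in numBytes signed bytes. */
  static int maxPrecision(int numBytes) {
    // Largest positive two's-complement value: 2^(8n-1) - 1.
    BigInteger max = BigInteger.ONE.shiftLeft(8 * numBytes - 1).subtract(BigInteger.ONE);
    int digits = 0;
    // A value with d digits is at most 10^d - 1; grow d while that bound still fits.
    while (BigInteger.TEN.pow(digits + 1).subtract(BigInteger.ONE).compareTo(max) <= 0) {
      digits++;
    }
    return digits;
  }

  /** Throws if the requested precision cannot be stored in numBytes bytes. */
  static void checkDecimal(int numBytes, int precision) {
    if (precision > maxPrecision(numBytes)) {
      throw new IllegalArgumentException(
          "FIXED(" + numBytes + ") is not long enough to store " + precision + " digits");
    }
  }

  public static void main(String[] args) {
    System.out.println(maxPrecision(4));  // 2^31 - 1 = 2147483647, so 9 digits
    checkDecimal(4, 9);                   // fits, no exception
    try {
      checkDecimal(4, 10);                // one digit too many
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

    Under this rule, a 10-digit decimal needs a fixed length of at least 5 bytes, since 2^39 - 1 has 12 digits and can therefore hold any 11-digit value.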