- checkFieldPolicy(int, int, Object, String) - Method in class org.jwat.warc.WarcHeader
-
Given a WARC record type and a WARC field, looks up the policy in a
matrix built from the WARC ISO standard.
- checkFields() - Method in class org.jwat.warc.WarcHeader
-
Validate the WARC header relative to the WARC-Type and according to the
WARC ISO standard.
- close() - Method in class org.jwat.warc.WarcFileWriter
-
Close writer and release all resources.
- close() - Method in class org.jwat.warc.WarcReader
-
Close current record resource(s) and input stream(s).
- close() - Method in class org.jwat.warc.WarcReaderCompressed
-
- close() - Method in class org.jwat.warc.WarcReaderUncompressed
-
- close() - Method in class org.jwat.warc.WarcRecord
-
Close resources associated with the WARC record.
- close() - Method in class org.jwat.warc.WarcWriter
-
Close WARC writer and free its resources.
- close() - Method in class org.jwat.warc.WarcWriterCompressed
-
- close() - Method in class org.jwat.warc.WarcWriterUncompressed
-
- closeRecord() - Method in class org.jwat.warc.WarcWriter
-
Close the WARC record in an implementation specific way.
- closeRecord() - Method in class org.jwat.warc.WarcWriterCompressed
-
- closeRecord() - Method in class org.jwat.warc.WarcWriterUncompressed
-
- closeRecord_impl() - Method in class org.jwat.warc.WarcWriter
-
Closes the WARC record by writing two newlines and comparing the amount of
payload data streamed with the content-length supplied with the header.
- computedBlockDigest - Variable in class org.jwat.warc.WarcRecord
-
Computed block digest.
- computedPayloadDigest - Variable in class org.jwat.warc.WarcRecord
-
Computed payload digest.
- consumed - Variable in class org.jwat.warc.WarcReader
-
Number of bytes consumed by this reader.
- consumed - Variable in class org.jwat.warc.WarcRecord
-
Uncompressed bytes consumed while validating this record.
- CONTENT_TYPE_FORMAT - Static variable in class org.jwat.warc.WarcConstants
-
Content-type format string as specified in RFC2616.
- CONTENT_TYPE_METADATA - Static variable in class org.jwat.warc.WarcConstants
-
Suggested content-type for metadata records and others.
- contentLength - Variable in class org.jwat.warc.WarcHeader
-
Content-Length converted to a Long object, if valid.
- contentLengthStr - Variable in class org.jwat.warc.WarcHeader
-
Content-Length field string value.
- contentType - Variable in class org.jwat.warc.WarcHeader
-
Content-Type converted to a ContentType object, if valid.
- contentTypeStr - Variable in class org.jwat.warc.WarcHeader
-
Content-Type field string value.
- createRecord(WarcWriter) - Static method in class org.jwat.warc.WarcRecord
-
Create a WarcRecord and prepare it for writing.
- createWarcDigest(String, byte[], String, String) - Static method in class org.jwat.warc.WarcDigest
-
Create an object with the supplied parameters.
- CT_APP_WARC_FIELDS - Static variable in class org.jwat.warc.WarcConstants
-
Suggested content-type/media-type for metadata records and others.
- currentEntry - Variable in class org.jwat.warc.WarcReaderCompressed
-
GZip entry for the current record, if random access methods used.
- currentReader - Variable in class org.jwat.warc.WarcReaderCompressed
-
GZip reader used for the current record, if random access methods used.
- currentRecord - Variable in class org.jwat.warc.WarcReader
-
Current WARC record object.
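The closeRecord_impl() entry above describes writing two trailing newlines and comparing the streamed payload size with the declared content-length. A minimal sketch of that idea follows. It is not the JWAT implementation: the class name, method signature and the CRLF CRLF byte sequence are assumptions made for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of org.jwat.warc.
class RecordCloseSketch {
    // Assumed trailing sequence: two CRLF pairs after the record block.
    static final byte[] TRAILING_NEWLINES =
            "\r\n\r\n".getBytes(StandardCharsets.US_ASCII);

    // Writes the trailing newlines and reports whether the number of
    // payload bytes written matches the length declared in the header.
    static boolean closeRecord(OutputStream out, long declaredLength, long payloadWritten) {
        try {
            out.write(TRAILING_NEWLINES);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return declaredLength == payloadWritten;
    }

    public static void main(String[] args) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        boolean matches = closeRecord(out, 5L, 3L);
        System.out.println("length matches: " + matches);
    }
}
```

A caller could turn a false result into either a diagnostic or an exception, which is the kind of choice the setExceptionOnContentLengthMismatch(boolean) entry refers to.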
- FDT_CONTENTTYPE - Static variable in class org.jwat.warc.WarcConstants
-
WARC ContentType field datatype identifier.
- FDT_DATE - Static variable in class org.jwat.warc.WarcConstants
-
WARC Date field datatype identifier.
- FDT_DIGEST - Static variable in class org.jwat.warc.WarcConstants
-
WARC Digest field datatype identifier.
- FDT_IDX_STRINGS - Static variable in class org.jwat.warc.WarcConstants
-
WARC field datatype id to field datatype name mapping table.
- FDT_INETADDRESS - Static variable in class org.jwat.warc.WarcConstants
-
WARC InetAddress field datatype identifier.
- FDT_INTEGER - Static variable in class org.jwat.warc.WarcConstants
-
WARC Integer field datatype identifier.
- FDT_LONG - Static variable in class org.jwat.warc.WarcConstants
-
WARC Long field datatype identifier.
- FDT_STRING - Static variable in class org.jwat.warc.WarcConstants
-
WARC String field datatype identifier.
- FDT_URI - Static variable in class org.jwat.warc.WarcConstants
-
WARC URI field datatype identifier.
- field_policy - Static variable in class org.jwat.warc.WarcConstants
-
A (Warc-Types x Warc-Header-Fields) matrix used for policy validation.
- fieldNameIdxMap - Static variable in class org.jwat.warc.WarcConstants
-
Map used to identify known warc field names.
- fieldNamesRepeatableLookup - Static variable in class org.jwat.warc.WarcConstants
-
Lookup table of Warc fields that can have multiple occurrences.
- fieldParsers - Variable in class org.jwat.warc.WarcHeader
-
WARC field parser used.
- fieldParsers - Variable in class org.jwat.warc.WarcReader
-
WARC field parser used.
- fieldParsers - Variable in class org.jwat.warc.WarcWriter
-
WARC field parser used.
- filename - Variable in class org.jwat.warc.WarcFileNamingSingleFile
-
File name to use.
- filePrefix - Variable in class org.jwat.warc.WarcFileNamingDefault
-
Prefix component.
- FN_CONTENT_LENGTH - Static variable in class org.jwat.warc.WarcConstants
-
Content-length field name.
- FN_CONTENT_TYPE - Static variable in class org.jwat.warc.WarcConstants
-
Content-type field name.
- FN_IDX_CONTENT_LENGTH - Static variable in class org.jwat.warc.WarcConstants
-
Warc reader content-length field name id.
- FN_IDX_CONTENT_TYPE - Static variable in class org.jwat.warc.WarcConstants
-
Warc reader content-type field name id.
- FN_IDX_DT - Static variable in class org.jwat.warc.WarcConstants
-
Array to lookup WARC field datatypes.
- FN_IDX_STRINGS - Static variable in class org.jwat.warc.WarcConstants
-
WARC field name id to field name mapping table.
- FN_IDX_WARC_BLOCK_DIGEST - Static variable in class org.jwat.warc.WarcConstants
-
Warc reader warc-block-digest field name id.
- FN_IDX_WARC_CONCURRENT_TO - Static variable in class org.jwat.warc.WarcConstants
-
Warc reader warc-concurrent-to field name id.
- FN_IDX_WARC_DATE - Static variable in class org.jwat.warc.WarcConstants
-
Warc reader warc-date field name id.
- FN_IDX_WARC_FILENAME - Static variable in class org.jwat.warc.WarcConstants
-
Warc reader warc-filename field name id.
- FN_IDX_WARC_IDENTIFIED_PAYLOAD_TYPE - Static variable in class org.jwat.warc.WarcConstants
-
Warc reader warc-identified-payload-type field name id.
- FN_IDX_WARC_IP_ADDRESS - Static variable in class org.jwat.warc.WarcConstants
-
Warc reader warc-ip-address field name id.
- FN_IDX_WARC_PAYLOAD_DIGEST - Static variable in class org.jwat.warc.WarcConstants
-
Warc reader warc-payload-digest field name id.
- FN_IDX_WARC_PROFILE - Static variable in class org.jwat.warc.WarcConstants
-
Warc reader warc-profile field name id.
- FN_IDX_WARC_RECORD_ID - Static variable in class org.jwat.warc.WarcConstants
-
Warc reader warc-record-id field name id.
- FN_IDX_WARC_REFERS_TO - Static variable in class org.jwat.warc.WarcConstants
-
Warc reader warc-refers-to field name id.
- FN_IDX_WARC_REFERS_TO_DATE - Static variable in class org.jwat.warc.WarcConstants
-
WARC-Refers-To-Date field name id.
- FN_IDX_WARC_REFERS_TO_TARGET_URI - Static variable in class org.jwat.warc.WarcConstants
-
WARC-Refers-To-Target-URI field name id.
- FN_IDX_WARC_SEGMENT_NUMBER - Static variable in class org.jwat.warc.WarcConstants
-
Warc reader warc-segment-number field name id.
- FN_IDX_WARC_SEGMENT_ORIGIN_ID - Static variable in class org.jwat.warc.WarcConstants
-
Warc reader warc-segment-origin-id field name id.
- FN_IDX_WARC_SEGMENT_TOTAL_LENGTH - Static variable in class org.jwat.warc.WarcConstants
-
Warc reader warc-segment-total-length field name id.
- FN_IDX_WARC_TARGET_URI - Static variable in class org.jwat.warc.WarcConstants
-
Warc reader warc-target-uri field name id.
- FN_IDX_WARC_TRUNCATED - Static variable in class org.jwat.warc.WarcConstants
-
Warc reader warc-truncated field name id.
- FN_IDX_WARC_TYPE - Static variable in class org.jwat.warc.WarcConstants
-
Warc reader warc-type field name id.
- FN_IDX_WARC_WARCINFO_ID - Static variable in class org.jwat.warc.WarcConstants
-
Warc reader warc-warcinfo-id field name id.
- FN_INDEX_OF_LAST - Static variable in class org.jwat.warc.WarcConstants
-
Index of last WARC field (zero-indexed).
- FN_NUMBER - Static variable in class org.jwat.warc.WarcConstants
-
Number of WARC fields.
- FN_WARC_BLOCK_DIGEST - Static variable in class org.jwat.warc.WarcConstants
-
Warc-block-digest field name.
- FN_WARC_CONCURRENT_TO - Static variable in class org.jwat.warc.WarcConstants
-
Warc-concurrent-to field name.
- FN_WARC_DATE - Static variable in class org.jwat.warc.WarcConstants
-
Warc-date field name.
- FN_WARC_FILENAME - Static variable in class org.jwat.warc.WarcConstants
-
Warc-filename field name.
- FN_WARC_IDENTIFIED_PAYLOAD_TYPE - Static variable in class org.jwat.warc.WarcConstants
-
Warc-identified-payload-type field name.
- FN_WARC_IP_ADDRESS - Static variable in class org.jwat.warc.WarcConstants
-
Warc-ip-address field name.
- FN_WARC_PAYLOAD_DIGEST - Static variable in class org.jwat.warc.WarcConstants
-
Warc-payload-digest field name.
- FN_WARC_PROFILE - Static variable in class org.jwat.warc.WarcConstants
-
Warc-profile field name.
- FN_WARC_RECORD_ID - Static variable in class org.jwat.warc.WarcConstants
-
Warc-record-id field name.
- FN_WARC_REFERS_TO - Static variable in class org.jwat.warc.WarcConstants
-
Warc-refers-to field name.
- FN_WARC_REFERS_TO_DATE - Static variable in class org.jwat.warc.WarcConstants
-
WARC-Refers-To-Date field name.
- FN_WARC_REFERS_TO_TARGET_URI - Static variable in class org.jwat.warc.WarcConstants
-
WARC-Refers-To-Target-URI field name.
- FN_WARC_SEGMENT_NUMBER - Static variable in class org.jwat.warc.WarcConstants
-
Warc-segment-number field name.
- FN_WARC_SEGMENT_ORIGIN_ID - Static variable in class org.jwat.warc.WarcConstants
-
Warc-segment-origin-id field name.
- FN_WARC_SEGMENT_TOTAL_LENGTH - Static variable in class org.jwat.warc.WarcConstants
-
Warc-segment-total-length field name.
- FN_WARC_TARGET_URI - Static variable in class org.jwat.warc.WarcConstants
-
Warc-target-uri field name.
- FN_WARC_TRUNCATED - Static variable in class org.jwat.warc.WarcConstants
-
Warc-truncated field name.
- FN_WARC_TYPE - Static variable in class org.jwat.warc.WarcConstants
-
Warc-type field name.
- FN_WARC_WARCINFO_ID - Static variable in class org.jwat.warc.WarcConstants
-
Warc-warcinfo-id field name.
- getBlockDigestAlgorithm() - Method in class org.jwat.warc.WarcReader
-
Get the default block digest algorithm.
- getBlockDigestEnabled() - Method in class org.jwat.warc.WarcReader
-
Get the reader's block digest on/off status.
- getBlockDigestEncoding() - Method in class org.jwat.warc.WarcReader
-
Get the default block digest encoding scheme.
- getConsumed() - Method in class org.jwat.warc.WarcReader
-
Get number of bytes consumed by this reader.
- getConsumed() - Method in class org.jwat.warc.WarcReaderCompressed
-
Get number of bytes consumed by the WARC GzipReader.
- getConsumed() - Method in class org.jwat.warc.WarcReaderUncompressed
-
- getConsumed() - Method in class org.jwat.warc.WarcRecord
-
Return number of uncompressed bytes consumed validating this record.
- getDate(String) - Static method in class org.jwat.warc.WarcDateParser
-
Parses the date using the format "yyyy-MM-ddTHH:mm:ssZ".
- getDateFormat() - Static method in class org.jwat.warc.WarcDateParser
-
Return a DateFormat object which can be used to format
WARC dates as strings.
- getFile() - Method in class org.jwat.warc.WarcFileWriter
-
Returns the current WARC file object.
- getFilename(int, boolean) - Method in interface org.jwat.warc.WarcFileNaming
-
Return the next file name to use.
- getFilename(int, boolean) - Method in class org.jwat.warc.WarcFileNamingDefault
-
- getFilename(int, boolean) - Method in class org.jwat.warc.WarcFileNamingSingleFile
-
- getHeader(String) - Method in class org.jwat.warc.WarcHeader
-
Get a header line structure or null, if no header line structure is
stored with the given header name.
- getHeader(String) - Method in class org.jwat.warc.WarcRecord
-
Get a non-standard WARC header or null, if nothing is stored for this
header name.
- getHeaderList() - Method in class org.jwat.warc.WarcHeader
-
Get a List of all the headers found during parsing.
- getHeaderList() - Method in class org.jwat.warc.WarcRecord
-
Get a List of all the non-standard WARC headers found
during parsing.
- getHttpHeader() - Method in class org.jwat.warc.WarcRecord
-
Returns the HttpHeader object if identified in the payload,
or null.
- getIteratorExceptionThrown() - Method in class org.jwat.warc.WarcReader
-
Gets the exception thrown in the iterator, if any, or null.
- getNextRecord() - Method in class org.jwat.warc.WarcReader
-
Parses and gets the next record.
- getNextRecord() - Method in class org.jwat.warc.WarcReaderCompressed
-
- getNextRecord() - Method in class org.jwat.warc.WarcReaderUncompressed
-
- getNextRecordFrom(InputStream, long) - Method in class org.jwat.warc.WarcReader
-
Parses and gets the next record from an InputStream.
- getNextRecordFrom(InputStream, long, int) - Method in class org.jwat.warc.WarcReader
-
Parses and gets the next record from an InputStream wrapped
by a BufferedInputStream.
- getNextRecordFrom(InputStream, long) - Method in class org.jwat.warc.WarcReaderCompressed
-
- getNextRecordFrom(InputStream, long, int) - Method in class org.jwat.warc.WarcReaderCompressed
-
- getNextRecordFrom(InputStream, long) - Method in class org.jwat.warc.WarcReaderUncompressed
-
- getNextRecordFrom(InputStream, long, int) - Method in class org.jwat.warc.WarcReaderUncompressed
-
- getOffset() - Method in class org.jwat.warc.WarcReader
-
Get the current offset in the WARC InputStream.
- getOffset() - Method in class org.jwat.warc.WarcReaderCompressed
-
Get the current offset in the WARC GzipReader.
- getOffset() - Method in class org.jwat.warc.WarcReaderUncompressed
-
- getPayload() - Method in class org.jwat.warc.WarcRecord
-
Return Payload object.
- getPayloadContent() - Method in class org.jwat.warc.WarcRecord
-
Payload content InputStream getter.
- getPayloadDigestAlgorithm() - Method in class org.jwat.warc.WarcReader
-
Get the default payload digest algorithm.
- getPayloadDigestEnabled() - Method in class org.jwat.warc.WarcReader
-
Get the reader's payload digest on/off status.
- getPayloadDigestEncoding() - Method in class org.jwat.warc.WarcReader
-
Get the default payload digest encoding scheme.
- getPayloadHeaderMaxSize() - Method in class org.jwat.warc.WarcReader
-
Get the max size allowed for a payload header.
- getReader(InputStream, int) - Static method in class org.jwat.warc.WarcReaderFactory
-
Creates a new WarcReader from an InputStream
wrapped by a BufferedInputStream.
- getReader(InputStream) - Static method in class org.jwat.warc.WarcReaderFactory
-
Creates a new WarcReader from an InputStream.
- getReaderCompressed() - Static method in class org.jwat.warc.WarcReaderFactory
-
Creates a new WarcReader without any associated
InputStream for random access to GZip compressed records.
- getReaderCompressed(InputStream) - Static method in class org.jwat.warc.WarcReaderFactory
-
Creates a new WarcReader from an InputStream
primarily for random access to GZip compressed records.
- getReaderCompressed(InputStream, int) - Static method in class org.jwat.warc.WarcReaderFactory
-
Creates a new WarcReader from an InputStream
wrapped by a BufferedInputStream primarily for random
access to GZip compressed records.
- getReaderUncompressed() - Static method in class org.jwat.warc.WarcReaderFactory
-
Creates a new WarcReader without any associated
InputStream for random access to uncompressed records.
- getReaderUncompressed(InputStream) - Static method in class org.jwat.warc.WarcReaderFactory
-
Creates a new WarcReader from an InputStream
primarily for random access to uncompressed records.
- getReaderUncompressed(InputStream, int) - Static method in class org.jwat.warc.WarcReaderFactory
-
Creates a new WarcReader from an InputStream
wrapped by a BufferedInputStream primarily for random
access to uncompressed records.
- getRecordHeaderMaxSize() - Method in class org.jwat.warc.WarcReader
-
Get the max size allowed for a record header.
- getSequenceNr() - Method in class org.jwat.warc.WarcFileWriter
-
Returns the current sequence number.
- getStartOffset() - Method in class org.jwat.warc.WarcHeader
-
Returns the starting offset of the record in the containing WARC.
- getStartOffset() - Method in class org.jwat.warc.WarcReader
-
Get the offset of the current WARC record or -1 if none have been read.
- getStartOffset() - Method in class org.jwat.warc.WarcReaderCompressed
-
Get the offset of the current WARC record from the GZip entry or -1 if
no records have been read yet.
- getStartOffset() - Method in class org.jwat.warc.WarcReaderUncompressed
-
- getStartOffset() - Method in class org.jwat.warc.WarcRecord
-
Get the record offset relative to the start of the WARC file
InputStream.
- getUriProfile() - Method in class org.jwat.warc.WarcReader
-
Get the URI profile used to validate URIs.
- getUriProfile() - Method in class org.jwat.warc.WarcWriter
-
Get the URI profile used to validate URIs.
- getWarcTargetUriProfile() - Method in class org.jwat.warc.WarcReader
-
Get the URI profile used to validate WARC-Target URIs.
- getWarcTargetUriProfile() - Method in class org.jwat.warc.WarcWriter
-
Get the URI profile used to validate WARC-Target URIs.
- getWarcWriterInstance(WarcFileNaming, WarcFileWriterConfig) - Static method in class org.jwat.warc.WarcFileWriter
-
Returns a configured WARC file writer.
- getWriter() - Method in class org.jwat.warc.WarcFileWriter
-
Returns the current WARC writer object.
- getWriter(OutputStream, boolean) - Static method in class org.jwat.warc.WarcWriterFactory
-
Creates a new unbuffered WarcWriter from an
OutputStream.
- getWriter(OutputStream, int, boolean) - Static method in class org.jwat.warc.WarcWriterFactory
-
Creates a new buffered WarcWriter from an
OutputStream.
- getWriterCompressed(OutputStream) - Static method in class org.jwat.warc.WarcWriterFactory
-
Creates a new unbuffered compressing WarcWriter from an
OutputStream.
- getWriterCompressed(OutputStream, int) - Static method in class org.jwat.warc.WarcWriterFactory
-
Creates a new buffered compressing WarcWriter from an
OutputStream.
- getWriterUncompressed(OutputStream) - Static method in class org.jwat.warc.WarcWriterFactory
-
Creates a new unbuffered non-compressing WarcWriter from an
OutputStream.
- getWriterUncompressed(OutputStream, int) - Static method in class org.jwat.warc.WarcWriterFactory
-
Creates a new buffered non-compressing WarcWriter from an
OutputStream.
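The getDate(String) entry above names the format "yyyy-MM-ddTHH:mm:ssZ". The sketch below shows one way such a value could be parsed and formatted with the standard java.text classes. It is not the WarcDateParser source: the class name, the quoting of the literal T and Z, the UTC time zone and the strict (non-lenient) parsing are assumptions.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Hypothetical helper, not part of org.jwat.warc.
class WarcDateSketch {
    // 'T' and 'Z' are quoted so they are treated as literal characters.
    static final String PATTERN = "yyyy-MM-dd'T'HH:mm:ss'Z'";

    // SimpleDateFormat is not thread-safe, so a fresh instance is created per call.
    static SimpleDateFormat newFormat() {
        SimpleDateFormat format = new SimpleDateFormat(PATTERN);
        format.setLenient(false);
        format.setTimeZone(TimeZone.getTimeZone("UTC"));
        return format;
    }

    // Returns the parsed date, or null if the value is null, malformed,
    // or followed by trailing characters.
    static Date parse(String value) {
        if (value == null) {
            return null;
        }
        ParsePosition pos = new ParsePosition(0);
        Date date = newFormat().parse(value, pos);
        return (date != null && pos.getIndex() == value.length()) ? date : null;
    }

    public static void main(String[] args) {
        System.out.println(newFormat().format(new Date(0L))); // prints 1970-01-01T00:00:00Z
        System.out.println(parse("not a date"));              // prints null
    }
}
```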
- P_IDX_STRINGS - Static variable in class org.jwat.warc.WarcConstants
-
WARC profile id to field name mapping table.
- parseContentType(String, String) - Method in class org.jwat.warc.WarcFieldParsers
-
Parse and validate content-type string with optional parameters.
- parseDate(String, String) - Method in class org.jwat.warc.WarcFieldParsers
-
Parses WARC record date.
- parseDigest(String, String) - Method in class org.jwat.warc.WarcFieldParsers
-
Parse and validate WARC digest string.
- parseHeader(ByteCountingPushBackInputStream) - Method in class org.jwat.warc.WarcHeader
-
Try to parse a WARC header and return a boolean indicating success or
failure.
- parseHeaders(ByteCountingPushBackInputStream) - Method in class org.jwat.warc.WarcHeader
-
Reads WARC header lines one line at a time until an empty line is
encountered.
- parseInteger(String, String) - Method in class org.jwat.warc.WarcFieldParsers
-
Returns an Integer object holding the value of the specified string.
- parseIpAddress(String, String) - Method in class org.jwat.warc.WarcFieldParsers
-
Parse and validate an IP address.
- parseLong(String, String) - Method in class org.jwat.warc.WarcFieldParsers
-
Returns a Long object holding the value of the specified string.
- parseRecord(ByteCountingPushBackInputStream, WarcReader) - Static method in class org.jwat.warc.WarcRecord
-
Given an InputStream, tries to read and validate a WARC
header block.
- parseString(String, String) - Method in class org.jwat.warc.WarcFieldParsers
-
Validates that the string is not null.
- parseUri(String, boolean, UriProfile, String) - Method in class org.jwat.warc.WarcFieldParsers
-
Returns a URI object holding the value of the specified string.
- parseVersion(ByteCountingPushBackInputStream) - Method in class org.jwat.warc.WarcHeader
-
Looks forward in the input stream for a valid WARC version line.
- parseWarcDigest(String) - Static method in class org.jwat.warc.WarcDigest
-
Parse and validate the format of a WARC digest header value.
- payload - Variable in class org.jwat.warc.WarcRecord
-
Payload object if any exists.
- payloadClosed() - Method in class org.jwat.warc.WarcRecord
-
Called when the payload object is closed and final steps in the
validation process can be performed.
- payloadDigestAlgorithm - Variable in class org.jwat.warc.WarcReader
-
Default payload digest algorithm to use if none is present in the
record.
- payloadDigestEncoding - Variable in class org.jwat.warc.WarcReader
-
Default encoding scheme used to encode payload digest into a string,
if none is detected from the record.
- payloadHeaderMaxSize - Variable in class org.jwat.warc.WarcReader
-
Max size allowed for a payload header.
- payloadWrittenTotal - Variable in class org.jwat.warc.WarcWriter
-
Total bytes written for current record payload.
- POLICY_IGNORE - Static variable in class org.jwat.warc.WarcConstants
-
Warc header can be ignored.
- POLICY_MANDATORY - Static variable in class org.jwat.warc.WarcConstants
-
Warc header is mandatory (equal to shall).
- POLICY_MAY - Static variable in class org.jwat.warc.WarcConstants
-
Warc header can be present.
- POLICY_MAY_NOT - Static variable in class org.jwat.warc.WarcConstants
-
Warc header should not be present.
- POLICY_SHALL - Static variable in class org.jwat.warc.WarcConstants
-
Warc header must be present.
- POLICY_SHALL_NOT - Static variable in class org.jwat.warc.WarcConstants
-
Warc header must not be present.
- processComputedDigest(WarcDigest, String, String, String) - Method in class org.jwat.warc.WarcRecord
-
Adjust algorithm and encoding information about computed block digest.
- processWarcDigest(WarcDigest, WarcDigest, String) - Method in class org.jwat.warc.WarcRecord
-
Auto-detect encoding used in WARC digest header and compare it to the
internal one, if it has been computed.
- PROFILE_IDENTICAL_PAYLOAD_DIGEST - Static variable in class org.jwat.warc.WarcConstants
-
Revisit WARC-Profile id for identical payload digest.
- PROFILE_IDX_IDENTICAL_PAYLOAD_DIGEST - Static variable in class org.jwat.warc.WarcConstants
-
Warc reader id for identical payload digest profile.
- PROFILE_IDX_SERVER_NOT_MODIFIED - Static variable in class org.jwat.warc.WarcConstants
-
Warc reader id for server not modified profile.
- PROFILE_IDX_UNKNOWN - Static variable in class org.jwat.warc.WarcConstants
-
Warc reader id for unknown profile.
- PROFILE_SERVER_NOT_MODIFIED - Static variable in class org.jwat.warc.WarcConstants
-
Revisit WARC-Profile id for server not modified.
- profileIdxMap - Static variable in class org.jwat.warc.WarcConstants
-
Profile lookup map used to identify WARC-Profile values.
- PUSHBACK_BUFFER_SIZE - Static variable in class org.jwat.warc.WarcReaderCompressed
-
Buffer size used by PushbackInputStream.
- PUSHBACK_BUFFER_SIZE - Static variable in class org.jwat.warc.WarcReaderFactory
-
Buffer size used by PushbackInputStream.
- PUSHBACK_BUFFER_SIZE - Static variable in class org.jwat.warc.WarcReaderUncompressed
-
Buffer size used by PushbackInputStream.
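The POLICY_* entries above, together with field_policy and checkFieldPolicy(int, int, Object, String), describe a (record type x header field) matrix lookup. The sketch below illustrates that lookup pattern only. The numeric constant values, the record type and field ids, the matrix contents and the returned messages are all hypothetical and do not reproduce the WARC ISO standard or the JWAT tables.

```java
// Hypothetical illustration, not part of org.jwat.warc.
class FieldPolicySketch {
    // Hypothetical numeric values for a subset of the policies.
    static final int POLICY_IGNORE = 0;
    static final int POLICY_SHALL = 1;
    static final int POLICY_SHALL_NOT = 2;
    static final int POLICY_MAY = 3;

    // Hypothetical ids: two record types and three header fields.
    static final int TYPE_A = 0;
    static final int TYPE_B = 1;
    static final int FIELD_X = 0;
    static final int FIELD_Y = 1;
    static final int FIELD_Z = 2;

    // Rows are record types, columns are header fields.
    static final int[][] FIELD_POLICY = {
        /* TYPE_A */ { POLICY_SHALL, POLICY_MAY,   POLICY_SHALL_NOT },
        /* TYPE_B */ { POLICY_SHALL, POLICY_SHALL, POLICY_IGNORE },
    };

    // Returns a diagnostic message, or null if the field value is
    // acceptable for the given record type.
    static String check(int type, int field, Object value) {
        switch (FIELD_POLICY[type][field]) {
            case POLICY_SHALL:
                return value == null ? "missing required field" : null;
            case POLICY_SHALL_NOT:
                return value != null ? "forbidden field present" : null;
            default:
                // POLICY_MAY and POLICY_IGNORE accept presence or absence.
                return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(check(TYPE_A, FIELD_X, null));
        System.out.println(check(TYPE_A, FIELD_Z, "value"));
    }
}
```

Keeping the rules in a table means a validation pass is a loop over the fields of one row, rather than a separate branch per record type.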
- S_HEADER_WRITTEN - Static variable in class org.jwat.warc.WarcWriter
-
State after header has been written.
- S_INIT - Static variable in class org.jwat.warc.WarcWriter
-
State after writer has been constructed and before records have been written.
- S_PAYLOAD_WRITTEN - Static variable in class org.jwat.warc.WarcWriter
-
State after payload has been written.
- S_RECORD_CLOSED - Static variable in class org.jwat.warc.WarcWriter
-
State after record has been closed.
- seen - Variable in class org.jwat.warc.WarcHeader
-
Array used for duplicate header detection.
- sequenceNr - Variable in class org.jwat.warc.WarcFileWriter
-
Current sequence number.
- setBlockDigestAlgorithm(String) - Method in class org.jwat.warc.WarcReader
-
Tries to set the default block digest algorithm and returns a boolean
indicating whether the algorithm was accepted or not.
- setBlockDigestEnabled(boolean) - Method in class org.jwat.warc.WarcReader
-
Set the reader's block digest on/off status.
- setBlockDigestEncoding(String) - Method in class org.jwat.warc.WarcReader
-
Set the default block digest encoding scheme.
- setExceptionOnContentLengthMismatch(boolean) - Method in class org.jwat.warc.WarcWriter
-
Tell the writer what to do in case of a mismatch between content-length
and the amount of payload written.
- setPayloadDigestAlgorithm(String) - Method in class org.jwat.warc.WarcReader
-
Tries to set the default payload digest algorithm and returns a boolean
indicating whether the algorithm was accepted or not.
- setPayloadDigestEnabled(boolean) - Method in class org.jwat.warc.WarcReader
-
Set the reader's payload digest on/off status.
- setPayloadDigestEncoding(String) - Method in class org.jwat.warc.WarcReader
-
Set the default payload digest encoding scheme.
- setPayloadHeaderMaxSize(int) - Method in class org.jwat.warc.WarcReader
-
Set the max size allowed for a payload header.
- setRecordHeaderMaxSize(int) - Method in class org.jwat.warc.WarcReader
-
Set the max size allowed for a record header.
- setUriProfile(UriProfile) - Method in class org.jwat.warc.WarcReader
-
Set the URI profile used to validate URIs.
- setUriProfile(UriProfile) - Method in class org.jwat.warc.WarcWriter
-
Set the URI profile used to validate URIs.
- setWarcTargetUriProfile(UriProfile) - Method in class org.jwat.warc.WarcReader
-
Set the URI profile used to validate WARC-Target URIs.
- setWarcTargetUriProfile(UriProfile) - Method in class org.jwat.warc.WarcWriter
-
Set the URI profile used to validate WARC-Target URIs.
- startOffset - Variable in class org.jwat.warc.WarcHeader
-
WARC record starting offset relative to the source WARC file input
stream.
- startOffset - Variable in class org.jwat.warc.WarcReaderCompressed
-
Cached start offset used after the reader is closed.
- startOffset - Variable in class org.jwat.warc.WarcReaderUncompressed
-
Start offset of current or next valid record.
- startOffset - Variable in class org.jwat.warc.WarcRecord
-
WARC record parsing start offset relative to the source WARC file input
stream.
- state - Variable in class org.jwat.warc.WarcWriter
-
Current state of writer.
- stream_copy_buffer - Variable in class org.jwat.warc.WarcWriter
-
Buffer used by streamPayload() to copy from one stream to another.
- streamPayload(InputStream) - Method in class org.jwat.warc.WarcWriter
-
Stream the content of an input stream to the payload content.
- streamPayload(InputStream) - Method in class org.jwat.warc.WarcWriterCompressed
-
- supportMultipleFiles() - Method in interface org.jwat.warc.WarcFileNaming
-
Does this naming implementation support multiple files?
- supportMultipleFiles() - Method in class org.jwat.warc.WarcFileNamingDefault
-
- supportMultipleFiles() - Method in class org.jwat.warc.WarcFileNamingSingleFile
-
- WARC_DATE_FORMAT - Static variable in class org.jwat.warc.WarcConstants
-
WARC date format string as specified by the WARC ISO standard.
- WARC_DIGEST_FORMAT - Static variable in class org.jwat.warc.WarcConstants
-
WARC digest format string as specified by the WARC ISO standard.
- WARC_MAGIC_HEADER - Static variable in class org.jwat.warc.WarcConstants
-
A WARC header block starts with this string including trailing version
information.
- WARC_MIME_TYPE - Static variable in class org.jwat.warc.WarcConstants
-
WARC mime type.
- WARC_RECORD_TRAILING_NEWLINES - Static variable in class org.jwat.warc.WarcConstants
-
Trailing newlines after each record as per the WARC ISO standard.
- warcBlockDigest - Variable in class org.jwat.warc.WarcHeader
-
WARC-Block-Digest converted to a WarcDigest object, if valid.
- warcBlockDigestStr - Variable in class org.jwat.warc.WarcHeader
-
WARC-Block-Digest field string value.
- WarcConcurrentTo - Class in org.jwat.warc
-
Simple wrapper for a (non) validated WARC ConcurrentTo header.
- WarcConcurrentTo() - Constructor for class org.jwat.warc.WarcConcurrentTo
-
- warcConcurrentToList - Variable in class org.jwat.warc.WarcHeader
-
List of WARC-Concurrent-To field string values and converted URI objects, if valid.
- warcConcurrentToStr - Variable in class org.jwat.warc.WarcConcurrentTo
-
Warc-Concurrent-To string representation.
- warcConcurrentToUri - Variable in class org.jwat.warc.WarcConcurrentTo
-
Warc-Concurrent-To Uri object.
- WarcConstants - Class in org.jwat.warc
-
Class containing all relevant WARC constants and structures.
- WarcConstants() - Constructor for class org.jwat.warc.WarcConstants
-
This utility class does not require instantiation.
- warcDate - Variable in class org.jwat.warc.WarcHeader
-
WARC-Date converted to a Date object, if valid.
- warcDateFormat - Variable in class org.jwat.warc.WarcHeader
-
WARC DateFormat as specified by the WARC ISO standard.
- warcDateFormat - Variable in class org.jwat.warc.WarcWriter
-
WARC DateFormat as specified by the WARC ISO standard.
- WarcDateParser - Class in org.jwat.warc
-
WARC-Date parser and format validator.
- warcDateStr - Variable in class org.jwat.warc.WarcHeader
-
WARC-Date field string value.
- WarcDigest - Class in org.jwat.warc
-
This class represents the parsed and format validated information provided
from a WARC digest header value.
- WarcDigest() - Constructor for class org.jwat.warc.WarcDigest
-
Package level constructor.
- WarcDigest(String, String) - Constructor for class org.jwat.warc.WarcDigest
-
Construct an object with the supplied parameters.
- WarcFieldParsers - Class in org.jwat.warc
-
Separate class containing all the different types of field parser.
- WarcFieldParsers() - Constructor for class org.jwat.warc.WarcFieldParsers
-
- warcFileConfig - Variable in class org.jwat.warc.WarcFileWriter
-
Overall WARC file writer configuration.
- warcFilename - Variable in class org.jwat.warc.WarcHeader
-
WARC-Filename field string value.
- WarcFileNaming - Interface in org.jwat.warc
-
Implementations of this interface are used to name the WARC files written by the WarcFileWriter.
- warcFileNaming - Variable in class org.jwat.warc.WarcFileWriter
-
WARC file naming configuration.
- WarcFileNamingDefault - Class in org.jwat.warc
-
Default WARC file naming implementation used for writing to multiple files.
- WarcFileNamingDefault(String, Date, String, String) - Constructor for class org.jwat.warc.WarcFileNamingDefault
-
Construct file naming instance.
- WarcFileNamingSingleFile - Class in org.jwat.warc
-
Simple WARC file naming implementation used for writing to a single file only.
- WarcFileNamingSingleFile(String) - Constructor for class org.jwat.warc.WarcFileNamingSingleFile
-
Construct a new instance with the filename to return.
- WarcFileNamingSingleFile(File) - Constructor for class org.jwat.warc.WarcFileNamingSingleFile
-
Construct a new instance with the file whose filename to return.
- WarcFileWriter - Class in org.jwat.warc
-
Simple WARC file writer wrapping some of the trivial code related to writing records.
- WarcFileWriter() - Constructor for class org.jwat.warc.WarcFileWriter
-
Constructor for internal and unit test use.
- WarcFileWriterConfig - Class in org.jwat.warc
-
General configuration of WarcFileWriter.
- WarcFileWriterConfig() - Constructor for class org.jwat.warc.WarcFileWriterConfig
-
Construct instance with largely default values, except the targetDir which is null.
- WarcFileWriterConfig(File, boolean, long, boolean) - Constructor for class org.jwat.warc.WarcFileWriterConfig
-
Construct an instance with custom values.
- WarcHeader - Class in org.jwat.warc
-
Central class for working with WARC headers.
- WarcHeader() - Constructor for class org.jwat.warc.WarcHeader
-
Non-public constructor to allow unit testing.
- warcIdentifiedPayloadType - Variable in class org.jwat.warc.WarcHeader
-
WARC-Identified-Payload-Type converted to a ContentType object, if valid.
- warcIdentifiedPayloadTypeStr - Variable in class org.jwat.warc.WarcHeader
-
WARC-Identified-Payload-Type field string value.
- warcInetAddress - Variable in class org.jwat.warc.WarcHeader
-
WARC-IP-Address converted to an InetAddress object, if valid.
- warcinfoRecordId - Variable in class org.jwat.warc.WarcFileWriter
-
Generated WARC-Info-Record-ID for the current file.
- warcIpAddress - Variable in class org.jwat.warc.WarcHeader
-
WARC-IP-Address field string value.
- warcPayloadDigest - Variable in class org.jwat.warc.WarcHeader
-
WARC-Payload-Digest converted to a WarcDigest object, if valid.
- warcPayloadDigestStr - Variable in class org.jwat.warc.WarcHeader
-
WARC-Payload-Digest field string value.
- warcProfileIdx - Variable in class org.jwat.warc.WarcHeader
-
WARC-Profile converted to an integer id, if valid.
- warcProfileStr - Variable in class org.jwat.warc.WarcHeader
-
WARC-Profile field string value.
- warcProfileUri - Variable in class org.jwat.warc.WarcHeader
-
WARC-Profile field converted to a Uri object, if valid.
- WarcReader - Class in org.jwat.warc
-
Base class for WARC reader implementations.
- WarcReader() - Constructor for class org.jwat.warc.WarcReader
-
- WarcReaderCompressed - Class in org.jwat.warc
-
WARC Reader implementation for reading GZip compressed files.
- WarcReaderCompressed() - Constructor for class org.jwat.warc.WarcReaderCompressed
-
This constructor is used to get random access to records.
- WarcReaderCompressed(GzipReader) - Constructor for class org.jwat.warc.WarcReaderCompressed
-
Construct reader using the supplied input stream.
- WarcReaderCompressed(GzipReader, int) - Constructor for class org.jwat.warc.WarcReaderCompressed
-
Construct reader using the supplied GzipReader.
- WarcReaderFactory - Class in org.jwat.warc
-
Factory used for creating WarcReader instances.
- WarcReaderFactory() - Constructor for class org.jwat.warc.WarcReaderFactory
-
Private constructor to enforce factory methods.
- WarcReaderUncompressed - Class in org.jwat.warc
-
WARC Reader implementation for reading uncompressed files.
- WarcReaderUncompressed() - Constructor for class org.jwat.warc.WarcReaderUncompressed
-
This constructor is used to get random access to records.
- WarcReaderUncompressed(ByteCountingPushBackInputStream) - Constructor for class org.jwat.warc.WarcReaderUncompressed
-
Construct reader using the supplied input stream.
- WarcRecord - Class in org.jwat.warc
-
This class represents a parsed WARC record header block including
possible validation and format warnings/errors encountered in the process.
- WarcRecord() - Constructor for class org.jwat.warc.WarcRecord
-
Non-public constructor to allow unit testing.
- warcRecordIdStr - Variable in class org.jwat.warc.WarcHeader
-
WARC-Record-ID field string value.
- warcRecordIdUri - Variable in class org.jwat.warc.WarcHeader
-
WARC-Record-ID converted to a Uri object, if valid.
- warcRefersToDate - Variable in class org.jwat.warc.WarcHeader
-
WARC-Refers-To-Date converted to a Date object, if valid.
- warcRefersToDateStr - Variable in class org.jwat.warc.WarcHeader
-
WARC-Refers-To-Date field string value.
- warcRefersToStr - Variable in class org.jwat.warc.WarcHeader
-
WARC-Refers-To field string value.
- warcRefersToTargetUriStr - Variable in class org.jwat.warc.WarcHeader
-
WARC-Refers-To-Target-URI field string value.
- warcRefersToTargetUriUri - Variable in class org.jwat.warc.WarcHeader
-
WARC-Refers-To-Target-URI converted to a Uri object, if valid.
- warcRefersToUri - Variable in class org.jwat.warc.WarcHeader
-
WARC-Refers-To converted to a Uri object, if valid.
- warcSegmentNumber - Variable in class org.jwat.warc.WarcHeader
-
WARC-Segment-Number converted to an Integer object, if valid.
- warcSegmentNumberStr - Variable in class org.jwat.warc.WarcHeader
-
WARC-Segment-Number field string value.
- warcSegmentOriginIdStr - Variable in class org.jwat.warc.WarcHeader
-
WARC-Segment-Origin-ID field string value.
- warcSegmentOriginIdUrl - Variable in class org.jwat.warc.WarcHeader
-
WARC-Segment-Origin-ID converted to a Uri object, if valid.
- warcSegmentTotalLength - Variable in class org.jwat.warc.WarcHeader
-
WARC-Segment-Total-Length converted to a Long object, if valid.
- warcSegmentTotalLengthStr - Variable in class org.jwat.warc.WarcHeader
-
WARC-Segment-Total-Length field string value.
- warcTargetUriProfile - Variable in class org.jwat.warc.WarcHeader
-
WARC-Target-URI profile.
- warcTargetUriProfile - Variable in class org.jwat.warc.WarcReader
-
WARC-Target-URI profile.
- warcTargetUriProfile - Variable in class org.jwat.warc.WarcWriter
-
WARC-Target-URI profile.
- warcTargetUriStr - Variable in class org.jwat.warc.WarcHeader
-
WARC-Target-URI field string value.
- warcTargetUriUri - Variable in class org.jwat.warc.WarcHeader
-
WARC-Target-URI converted to a Uri object, if valid.
- warcTruncatedIdx - Variable in class org.jwat.warc.WarcHeader
-
WARC-Truncated converted to an integer id, if valid.
- warcTruncatedStr - Variable in class org.jwat.warc.WarcHeader
-
WARC-Truncated field string value.
- warcTypeIdx - Variable in class org.jwat.warc.WarcHeader
-
WARC-Type converted to an integer id, if identified.
- warcTypeStr - Variable in class org.jwat.warc.WarcHeader
-
WARC-Type field string value.
- warcWarcinfoIdStr - Variable in class org.jwat.warc.WarcHeader
-
WARC-Warcinfo-ID field string value.
- warcWarcinfoIdUri - Variable in class org.jwat.warc.WarcHeader
-
WARC-Warcinfo-ID converted to a Uri object, if valid.
- WarcWriter - Class in org.jwat.warc
-
Base class for WARC writer implementations.
- WarcWriter() - Constructor for class org.jwat.warc.WarcWriter
-
- WarcWriterCompressed - Class in org.jwat.warc
-
WARC Writer implementation for writing GZip compressed files.
- WarcWriterFactory - Class in org.jwat.warc
-
Factory used for creating WarcWriter instances.
- WarcWriterFactory() - Constructor for class org.jwat.warc.WarcWriterFactory
-
Private constructor to enforce factory methods.
- WarcWriterUncompressed - Class in org.jwat.warc
-
WARC Writer implementation for writing uncompressed files.
- warnings - Variable in class org.jwat.warc.WarcReader
-
Aggregate number of warnings encountered while parsing.
- writeHeader(WarcRecord) - Method in class org.jwat.warc.WarcWriter
-
Write a WARC header to the WARC output stream.
- writeHeader(WarcRecord) - Method in class org.jwat.warc.WarcWriterCompressed
-
- writeHeader(WarcRecord) - Method in class org.jwat.warc.WarcWriterUncompressed
-
- writeHeader_impl(WarcRecord) - Method in class org.jwat.warc.WarcWriter
-
Write a WARC header to the WARC output stream.
- writePayload(byte[]) - Method in class org.jwat.warc.WarcWriter
-
Append the content of a byte array to the payload content.
- writePayload(byte[], int, int) - Method in class org.jwat.warc.WarcWriter
-
Append the partial content of a byte array to the payload content.
- writePayload(byte[]) - Method in class org.jwat.warc.WarcWriterCompressed
-
- writePayload(byte[], int, int) - Method in class org.jwat.warc.WarcWriterCompressed
-
- writer - Variable in class org.jwat.warc.WarcFileWriter
-
Current WARC writer.
- writer - Variable in class org.jwat.warc.WarcWriterCompressed
-
GZip Writer used.
- writer_raf - Variable in class org.jwat.warc.WarcFileWriter
-
Current random access file.
- writer_rafout - Variable in class org.jwat.warc.WarcFileWriter
-
Current random access output stream.
- writeRawHeader(byte[], Long) - Method in class org.jwat.warc.WarcWriter
-
Write a raw WARC header to the WARC output stream.
- writeRawHeader(byte[], Long) - Method in class org.jwat.warc.WarcWriterCompressed
-
- writerFile - Variable in class org.jwat.warc.WarcFileWriter
-
Current WARC file.