public class WarcReaderCompressed extends WarcReader
| Modifier and Type | Field | Description |
|---|---|---|
| protected int | bufferSize | Buffer size, if any, to use on GZip entry InputStream. |
| protected GzipEntry | currentEntry | GZip entry for the current record, if random access methods used. |
| protected GzipReader | currentReader | GZip reader used for the current record, if random access methods used. |
| static int | PUSHBACK_BUFFER_SIZE | Buffer size used by PushbackInputStream. |
| protected GzipReader | reader | WARC file InputStream. |
| protected long | startOffset | Cached start offset used after the reader is closed. |

Fields inherited from class WarcReader: bBlockDigest, bIsCompliant, blockDigestAlgorithm, blockDigestEncoding, bPayloadDigest, consumed, currentRecord, diagnostics, errors, fieldParsers, headerLineReader, iteratorExceptionThrown, lineReader, payloadDigestAlgorithm, payloadDigestEncoding, payloadHeaderMaxSize, recordHeaderMaxSize, records, uriProfile, warcTargetUriProfile, warnings

| Constructor | Description |
|---|---|
| WarcReaderCompressed() | This constructor is used to get random access to records. |
| WarcReaderCompressed(GzipReader reader) | Construct reader using the supplied input stream. |
| WarcReaderCompressed(GzipReader reader, int buffer_size) | Construct reader using the supplied GzipReader. |

| Modifier and Type | Method | Description |
|---|---|---|
| void | close() | Close current record resource(s) and input stream(s). |
| long | getConsumed() | Get number of bytes consumed by the WARC GzipReader. |
| WarcRecord | getNextRecord() | Parses and gets the next record. |
| WarcRecord | getNextRecordFrom(InputStream rin, long offset) | Parses and gets the next record from an InputStream. |
| WarcRecord | getNextRecordFrom(InputStream rin, long offset, int buffer_size) | Parses and gets the next record from an InputStream wrapped by a BufferedInputStream. |
| long | getOffset() | Get the current offset in the WARC GzipReader. |
| long | getStartOffset() | Get the offset of the current WARC record from the GZip entry, or -1 if no records have been read yet. |
| boolean | isCompressed() | Indicates whether this reader assumes GZip-compressed input. |
| protected void | recordClosed() | Callback method called when the payload has been processed. |
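
The random access methods above take an offset supplied by the caller. The sketch below illustrates the underlying idea using only the JDK's java.util.zip classes, not this class or its API: it assumes each record is compressed as its own GZip member, so a member can be decompressed on its own once its start offset and length are known. The class name GzipMemberOffsetDemo and its methods are hypothetical and exist only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class; it does not use or reproduce the WarcReaderCompressed API.
final class GzipMemberOffsetDemo {

    /** Compresses one record as a standalone GZip member. */
    static byte[] gzipMember(String record) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
                gz.write(record.getBytes(StandardCharsets.UTF_8));
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Decompresses the single GZip member stored at [offset, offset + length). */
    static String readMemberAt(byte[] file, int offset, int length) {
        // Bounding the input to one member keeps GZIPInputStream from
        // continuing into the members that follow it.
        try (GZIPInputStream gz =
                 new GZIPInputStream(new ByteArrayInputStream(file, offset, length))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[512];
            int n;
            while ((n = gz.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Builds a three-member file in memory and reads back only the second member. */
    static String secondRecord() {
        String[] records = {"record-0", "record-1", "record-2"};
        ByteArrayOutputStream file = new ByteArrayOutputStream();
        int[] offsets = new int[records.length];
        int[] lengths = new int[records.length];
        for (int i = 0; i < records.length; i++) {
            byte[] member = gzipMember(records[i]);
            offsets[i] = file.size();      // start offset of this member
            lengths[i] = member.length;
            file.write(member, 0, member.length);
        }
        return readMemberAt(file.toByteArray(), offsets[1], lengths[1]);
    }

    public static void main(String[] args) {
        System.out.println(secondRecord());
    }
}
```

In this sketch the offsets are recorded while the file is written; a real application would more likely obtain them from an index built during an earlier sequential pass.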

Methods inherited from class WarcReader: getBlockDigestAlgorithm, getBlockDigestEnabled, getBlockDigestEncoding, getIteratorExceptionThrown, getPayloadDigestAlgorithm, getPayloadDigestEnabled, getPayloadDigestEncoding, getPayloadHeaderMaxSize, getRecordHeaderMaxSize, getUriProfile, getWarcTargetUriProfile, init, isCompliant, iterator, reset, setBlockDigestAlgorithm, setBlockDigestEnabled, setBlockDigestEncoding, setPayloadDigestAlgorithm, setPayloadDigestEnabled, setPayloadDigestEncoding, setPayloadHeaderMaxSize, setRecordHeaderMaxSize, setUriProfile, setWarcTargetUriProfile

public static final int PUSHBACK_BUFFER_SIZE
Buffer size used by PushbackInputStream.

protected GzipReader reader
WARC file InputStream.

protected int bufferSize
Buffer size, if any, to use on GZip entry InputStream.

protected GzipReader currentReader
GZip reader used for the current record, if random access methods used.

protected GzipEntry currentEntry
GZip entry for the current record, if random access methods used.

protected long startOffset
Cached start offset used after the reader is closed.
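
The getNextRecordFrom methods warn about side effects when several PushbackInputStream instances are involved. The sketch below shows one such effect using only JDK classes: bytes that are read and then pushed back live in the wrapper that unread them, not in the shared source, so a second wrapper over the same source never sees them. The class name PushbackSideEffectDemo and its method are hypothetical, and the sketch makes no claim about how this reader uses PushbackInputStream internally.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it only illustrates java.io.PushbackInputStream behaviour.
final class PushbackSideEffectDemo {

    /**
     * Peeks peekLen bytes (peekLen must be positive) through one wrapper,
     * pushes them back, then reads the rest of the shared source through a
     * second wrapper. Returns what the second wrapper sees.
     */
    static String readThroughSecondWrapper(String data, int peekLen) {
        try {
            InputStream shared =
                new ByteArrayInputStream(data.getBytes(StandardCharsets.US_ASCII));

            // The first wrapper consumes bytes from the shared source ...
            PushbackInputStream first = new PushbackInputStream(shared, peekLen);
            byte[] peeked = new byte[peekLen];
            int n = first.read(peeked, 0, peekLen);
            if (n > 0) {
                // ... and returns them to its own private buffer only.
                first.unread(peeked, 0, n);
            }

            // The second wrapper starts where the shared source now stands.
            PushbackInputStream second = new PushbackInputStream(shared, peekLen);
            ByteArrayOutputStream seen = new ByteArrayOutputStream();
            int b;
            while ((b = second.read()) != -1) {
                seen.write(b);
            }
            return new String(seen.toByteArray(), StandardCharsets.US_ASCII);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readThroughSecondWrapper("WARC/1.0", 4));
    }
}
```

Run as written, the second wrapper sees only "/1.0"; the leading "WARC" stays in the first wrapper's pushback buffer.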

public WarcReaderCompressed()
This constructor is used to get random access to records.

public WarcReaderCompressed(GzipReader reader)
Construct reader using the supplied input stream.
Parameters:
reader - GZip reader

public WarcReaderCompressed(GzipReader reader, int buffer_size)
Construct reader using the supplied GzipReader.
This constructor is primarily for sequential access to records.
Parameters:
reader - GZip reader
buffer_size - buffer size used on entries

public boolean isCompressed()
Description copied from class: WarcReader
Indicates whether this reader assumes GZip-compressed input.
isCompressed in class WarcReader

public void close()
Description copied from class: WarcReader
Close current record resource(s) and input stream(s).
close in interface Closeable
close in interface AutoCloseable
close in class WarcReader

protected void recordClosed()
Description copied from class: WarcReader
Callback method called when the payload has been processed.
recordClosed in class WarcReader

public long getStartOffset()
Get the offset of the current WARC record from the GZip entry, or -1 if no records have been read yet.
getStartOffset in class WarcReader

public long getOffset()
Get the current offset in the WARC GzipReader.
getOffset in class WarcReader

public long getConsumed()
Get number of bytes consumed by the WARC GzipReader.
getConsumed in class WarcReader

public WarcRecord getNextRecord() throws IOException
Description copied from class: WarcReader
Parses and gets the next record.
getNextRecord in class WarcReader
Throws:
IOException - I/O exception in parsing process

public WarcRecord getNextRecordFrom(InputStream rin, long offset) throws IOException
Description copied from class: WarcReader
Parses and gets the next record from an InputStream.
This method is mainly for random access use, since there are serious side effects involved in using multiple PushbackInputStream instances.
getNextRecordFrom in class WarcReader
Parameters:
rin - InputStream used to read next record
offset - offset provided by caller
Throws:
IOException - I/O exception in parsing process

public WarcRecord getNextRecordFrom(InputStream rin, long offset, int buffer_size) throws IOException
Description copied from class: WarcReader
Parses and gets the next record from an InputStream wrapped by a BufferedInputStream.
This method is mainly for random access use, since there are serious side effects involved in using multiple PushbackInputStream instances.
getNextRecordFrom in class WarcReader
Parameters:
rin - InputStream used to read next record
offset - offset provided by caller
buffer_size - buffer size to use
Throws:
IOException - I/O exception in parsing process

Copyright © 2011–2015. All rights reserved.