public class WarcReaderFactory extends ReaderFactoryAbstract
Factory for creating WarcReader instances.
The general getReader methods will auto-detect GZip compressed data
and return the appropriate WarcReader instances.
The other factory methods can be used to return specific
WarcReader instances for compressed or uncompressed records.
Readers are available for both sequential and random reading of records.
Use of buffered methods and/or buffering speeds up the reader considerably.

| Modifier and Type | Field and Description |
|---|---|
| static int | PUSHBACK_BUFFER_SIZE: Buffer size used by PushbackInputStream. |

Fields inherited from class ReaderFactoryAbstract: ARC_MAGIC_HEADER, GZIP_MAGIC, WARC_MAGIC_HEADER

| Modifier | Constructor and Description |
|---|---|
| protected | WarcReaderFactory(): Protected constructor to enforce use of the factory methods. |

| Modifier and Type | Method and Description |
|---|---|
| static WarcReader | getReader(InputStream in): Creates a new WarcReader from an InputStream. |
| static WarcReader | getReader(InputStream in, int buffer_size): Creates a new WarcReader from an InputStream wrapped by a BufferedInputStream. |
| static WarcReaderCompressed | getReaderCompressed(): Creates a new WarcReader without any associated InputStream for random access to GZip compressed records. |
| static WarcReaderCompressed | getReaderCompressed(InputStream in): Creates a new WarcReader from an InputStream primarily for random access to GZip compressed records. |
| static WarcReaderCompressed | getReaderCompressed(InputStream in, int buffer_size): Creates a new WarcReader from an InputStream wrapped by a BufferedInputStream primarily for random access to GZip compressed records. |
| static WarcReaderUncompressed | getReaderUncompressed(): Creates a new WarcReader without any associated InputStream for random access to uncompressed records. |
| static WarcReaderUncompressed | getReaderUncompressed(InputStream in): Creates a new WarcReader from an InputStream primarily for random access to uncompressed records. |
| static WarcReaderUncompressed | getReaderUncompressed(InputStream in, int buffer_size): Creates a new WarcReader from an InputStream wrapped by a BufferedInputStream primarily for random access to uncompressed records. |
| static boolean | isWarcFile(ByteCountingPushBackInputStream pbin): Check head of PushBackInputStream for a WARC file identifier. |
| static boolean | isWarcRecord(ByteCountingPushBackInputStream pbin): Check head of PushBackInputStream for a WARC record identifier. |
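
The general getReader methods choose a reader based on GZip auto-detection. The following is a minimal sketch of how such detection can work using only JDK classes, not the library's actual implementation. The class name GzipSniffSketch, its methods, and the SNIFF_SIZE constant are hypothetical names introduced for illustration. The only format fact it relies on is that a GZip stream starts with the two bytes 0x1f 0x8b.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

/** Hypothetical helper (not part of the library): sniffs a stream for the GZip magic bytes. */
final class GzipSniffSketch {

    /** Assumed pushback capacity; two bytes are enough for the GZip magic. */
    static final int SNIFF_SIZE = 2;

    /** Buffers the input and makes it possible to push peeked bytes back. */
    static PushbackInputStream wrap(InputStream in, int bufferSize) {
        return new PushbackInputStream(new BufferedInputStream(in, bufferSize), SNIFF_SIZE);
    }

    /** Returns true if the stream starts with 0x1f 0x8b; all peeked bytes are pushed back. */
    static boolean looksGzipped(PushbackInputStream pbin) throws IOException {
        byte[] head = new byte[SNIFF_SIZE];
        int read = 0;
        while (read < SNIFF_SIZE) {
            int n = pbin.read(head, read, SNIFF_SIZE - read);
            if (n < 0) {
                break; // stream ended before two bytes were available
            }
            read += n;
        }
        if (read > 0) {
            pbin.unread(head, 0, read); // restore the stream for the real reader
        }
        return read == SNIFF_SIZE
                && (head[0] & 0xff) == 0x1f
                && (head[1] & 0xff) == 0x8b;
    }

    /** Test helper: compresses a byte array into a single GZip member. */
    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(data);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] plain = "WARC/1.0\r\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(looksGzipped(wrap(new ByteArrayInputStream(plain), 8192)));
        System.out.println(looksGzipped(wrap(new ByteArrayInputStream(gzip(plain)), 8192)));
    }
}
```

Because the peeked bytes are pushed back, whichever reader is chosen afterwards still sees the stream from its first byte. The BufferedInputStream wrapper mirrors the buffer_size variants of the factory methods.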
Methods inherited from class ReaderFactoryAbstract: isArcFile, isArcRecord, isGzipped

public static final int PUSHBACK_BUFFER_SIZE
Buffer size used by PushbackInputStream.

protected WarcReaderFactory()
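
The isWarcFile and isWarcRecord checks both look for the "WARC/" identifier at the head of a pushback stream. As a rough illustration of that kind of check, the sketch below uses the JDK's PushbackInputStream rather than the library's ByteCountingPushBackInputStream. The class WarcMagicSketch, the constant WARC_MAGIC, and the method startsWithWarcMagic are hypothetical names, and the code is not the library's implementation.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper (not part of the library): checks a stream head for "WARC/". */
final class WarcMagicSketch {

    /** The identifier described in this documentation, as ASCII bytes. */
    static final byte[] WARC_MAGIC = "WARC/".getBytes(StandardCharsets.US_ASCII);

    /**
     * Returns true if the stream starts with "WARC/". The peeked bytes are pushed
     * back, so the pushback capacity must be at least WARC_MAGIC.length bytes.
     */
    static boolean startsWithWarcMagic(PushbackInputStream pbin) throws IOException {
        byte[] head = new byte[WARC_MAGIC.length];
        int read = 0;
        while (read < head.length) {
            int n = pbin.read(head, read, head.length - read);
            if (n < 0) {
                break; // fewer bytes available than the identifier needs
            }
            read += n;
        }
        if (read > 0) {
            pbin.unread(head, 0, read); // leave the stream untouched for the caller
        }
        if (read < head.length) {
            return false;
        }
        for (int i = 0; i < head.length; i++) {
            if (head[i] != WARC_MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "WARC/1.0\r\nWARC-Type: warcinfo\r\n".getBytes(StandardCharsets.US_ASCII);
        PushbackInputStream pbin = new PushbackInputStream(new ByteArrayInputStream(data), 16);
        System.out.println(startsWithWarcMagic(pbin));
    }
}
```

The pushback capacity of 16 in the usage example is an arbitrary choice for the sketch; any value of at least five bytes works for this identifier.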

public static boolean isWarcFile(ByteCountingPushBackInputStream pbin) throws IOException
Check head of PushBackInputStream for a WARC file identifier.
WARC files are identified by the string "WARC/" at the beginning of the stream.
Parameters: pbin - PushBackInputStream with WARC records
Throws: IOException - if an I/O error occurs while examining the head of the stream

public static boolean isWarcRecord(ByteCountingPushBackInputStream pbin) throws IOException
Check head of PushBackInputStream for a WARC record identifier.
WARC records are identified by the string "WARC/" at the beginning of the record.
Parameters: pbin - PushBackInputStream with WARC records
Throws: IOException - if an I/O error occurs while examining the head of the stream

public static WarcReader getReader(InputStream in, int buffer_size) throws IOException
Creates a new WarcReader from an InputStream wrapped by a BufferedInputStream.
The WarcReader implementation returned is chosen based on GZip auto detection.
Parameters: in - WARC File represented as InputStream
Parameters: buffer_size - buffer size to use
Returns: WarcReader based on data read from InputStream
Throws: IOException - if an I/O exception occurs during initialization

public static WarcReader getReader(InputStream in) throws IOException
Creates a new WarcReader from an InputStream.
The WarcReader implementation returned is chosen based on GZip auto detection.
Parameters: in - WARC File represented as InputStream
Returns: WarcReader based on data read from InputStream
Throws: IOException - if an I/O exception occurs during initialization

public static WarcReaderUncompressed getReaderUncompressed()
Creates a new WarcReader without any associated InputStream for random access to uncompressed records.
Returns: WarcReader for uncompressed records read from InputStream

public static WarcReaderUncompressed getReaderUncompressed(InputStream in) throws IOException
Creates a new WarcReader from an InputStream primarily for random access to uncompressed records.
Parameters: in - WARC File represented as InputStream
Returns: WarcReader for uncompressed records read from InputStream
Throws: IOException - I/O exception while initializing reader

public static WarcReaderUncompressed getReaderUncompressed(InputStream in, int buffer_size) throws IOException
Creates a new WarcReader from an InputStream wrapped by a BufferedInputStream primarily for random access to uncompressed records.
Parameters: in - WARC File represented as InputStream
Parameters: buffer_size - buffer size to use
Returns: WarcReader for uncompressed records read from InputStream
Throws: IOException - I/O exception while initializing reader

public static WarcReaderCompressed getReaderCompressed()
Creates a new WarcReader without any associated InputStream for random access to GZip compressed records.
Returns: WarcReader for GZip compressed records read from InputStream

public static WarcReaderCompressed getReaderCompressed(InputStream in) throws IOException
Creates a new WarcReader from an InputStream primarily for random access to GZip compressed records.
Parameters: in - WARC File represented as InputStream
Returns: WarcReader for GZip compressed records read from InputStream
Throws: IOException - I/O exception while initializing reader

public static WarcReaderCompressed getReaderCompressed(InputStream in, int buffer_size) throws IOException
Creates a new WarcReader from an InputStream wrapped by a BufferedInputStream primarily for random access to GZip compressed records.
Parameters: in - WARC File represented as InputStream
Parameters: buffer_size - buffer size to use
Returns: WarcReader for GZip compressed records read from InputStream
Throws: IOException - I/O exception while initializing reader

Copyright © 2011–2015. All rights reserved.