public class WarcRecord extends Object implements PayloadOnClosedHandler, Closeable
| Modifier and Type | Field and Description |
|---|---|
| protected boolean | bClosed - Has record been closed before. |
| protected boolean | bIsCompliant - Is this record compliant. |
| protected boolean | bPayloadClosed - Has payload been closed before. |
| WarcDigest | computedBlockDigest - Computed block digest. |
| WarcDigest | computedPayloadDigest - Computed payload digest. |
| protected long | consumed - Uncompressed bytes consumed while validating this record. |
| Diagnostics<Diagnosis> | diagnostics - Validation errors and warnings. |
| WarcHeader | header - WARC header. |
| protected HttpHeader | httpHeader - HTTP header content parsed from payload. |
| protected ByteCountingPushBackInputStream | in - Input stream used to read this record. |
| Boolean | isValidBlockDigest - Is Warc-Block-Digest valid. |
| Boolean | isValidPayloadDigest - Is Warc-Payload-Digest valid. |
| NewlineParser | nlp - Newline parser for counting/validating trailing newlines. |
| protected Payload | payload - Payload object if any exists. |
| protected WarcReader | reader - Reader instance used, required for file compliance. |
| protected long | startOffset - WARC record parsing start offset relative to the source WARC file input stream. |
| int | trailingNewlines - Number of trailing newlines after record. |
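The nlp and trailingNewlines fields above describe counting the newlines that follow a record. The WARC format terminates each record with two CRLF sequences, so a count other than two indicates a malformed record boundary. The following is a minimal sketch of that idea only; it is not the NewlineParser implementation. The class name TrailingNewlineSketch and the byte-array interface are assumptions made for illustration (the real parser reads from a stream).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; not part of the documented library.
final class TrailingNewlineSketch {

    /**
     * Counts consecutive CRLF pairs in data, starting at offset.
     * Counting stops at the first byte that does not continue a CRLF pair.
     */
    static int countTrailingNewlines(byte[] data, int offset) {
        int count = 0;
        int pos = offset;
        while (pos + 1 < data.length && data[pos] == '\r' && data[pos + 1] == '\n') {
            count++;
            pos += 2;
        }
        return count;
    }

    public static void main(String[] args) {
        // A record block followed by its terminator and the start of the next record.
        byte[] tail = "payload\r\n\r\nWARC/1.0\r\n".getBytes(StandardCharsets.US_ASCII);
        int found = countTrailingNewlines(tail, "payload".length());
        System.out.println("trailing newlines: " + found);
    }
}
```

A caller would compare the returned count against the expected value of two and record a diagnosis when they differ.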
| Modifier | Constructor and Description |
|---|---|
| protected | WarcRecord() - Non-public constructor to allow unit testing. |
| Modifier and Type | Method and Description |
|---|---|
| protected void | addErrorDiagnosis(DiagnosisType type, String entity, String... information) - Add an error diagnosis of the given type on a specific entity with optional extra information. |
| void | close() - Close resources associated with the WARC record. |
| static WarcRecord | createRecord(WarcWriter writer) - Create a WarcRecord and prepare it for writing. |
| long | getConsumed() - Return number of uncompressed bytes consumed validating this record. |
| HeaderLine | getHeader(String field) - Get a non-standard WARC header or null, if nothing is stored for this header name. |
| List<HeaderLine> | getHeaderList() - Get a List of all the non-standard WARC headers found during parsing. |
| HttpHeader | getHttpHeader() - Returns the HttpHeader object if identified in the payload, or null. |
| Payload | getPayload() - Return Payload object. |
| InputStream | getPayloadContent() - Payload content InputStream getter. |
| long | getStartOffset() - Get the record offset relative to the start of the WARC file InputStream. |
| boolean | hasPayload() - Specifies whether this record has a payload or not. |
| boolean | isClosed() - Check to see if the record has been closed. |
| boolean | isCompliant() - Returns a boolean indicating the ISO compliance status of this record. |
| static WarcRecord | parseRecord(ByteCountingPushBackInputStream in, WarcReader reader) - Given an InputStream it tries to read and validate a WARC header block. |
| void | payloadClosed() - Called when the payload object is closed and final steps in the validation process can be performed. |
| protected void | processComputedDigest(WarcDigest computedDigest, String digestAlgorithm, String digestEncoding, String digestName) - Adjust algorithm and encoding information about computed block digest. |
| protected Boolean | processWarcDigest(WarcDigest warcDigest, WarcDigest computedDigest, String digestName) - Auto-detect encoding used in WARC digest header and compare it to the internal one, if it has been computed. |
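The processWarcDigest entry above mentions auto-detecting the encoding of a digest header value and comparing it with the internally computed digest. One way this could work is sketched below: try base16, base32 and base64 in turn and accept the first decoding whose length equals the computed digest length. This is a sketch under stated assumptions, not the library's implementation. The class DigestEncodingSketch, its method names, and the meaning of a null result ("could not be checked") are all hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;
import java.util.Locale;

// Hypothetical illustration; not part of the documented library.
final class DigestEncodingSketch {

    private static final String BASE32_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567";

    /** Decodes base16 (hex), or returns null if the text is not valid hex. */
    static byte[] decodeBase16(String s) {
        if (s.length() % 2 != 0) return null;
        byte[] out = new byte[s.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = Character.digit(s.charAt(2 * i), 16);
            int lo = Character.digit(s.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) return null;
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }

    /** Decodes RFC 4648 base32 (padding optional), or returns null on an invalid character. */
    static byte[] decodeBase32(String s) {
        String text = s.toUpperCase(Locale.ROOT);
        int end = text.length();
        while (end > 0 && text.charAt(end - 1) == '=') end--; // strip padding
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int buffer = 0;
        int bits = 0;
        for (int i = 0; i < end; i++) {
            int value = BASE32_ALPHABET.indexOf(text.charAt(i));
            if (value < 0) return null;
            buffer = (buffer << 5) | value;
            bits += 5;
            if (bits >= 8) {
                bits -= 8;
                out.write((buffer >> bits) & 0xFF);
                buffer &= (1 << bits) - 1; // keep only the bits not yet emitted
            }
        }
        return out.toByteArray();
    }

    /** Decodes base64, or returns null if the text is not valid base64. */
    static byte[] decodeBase64(String s) {
        try {
            return Base64.getDecoder().decode(s);
        } catch (IllegalArgumentException e) {
            return null;
        }
    }

    /** Returns the first decoding that yields exactly digestLength bytes, or null. */
    static byte[] decodeAuto(String value, int digestLength) {
        byte[][] candidates = { decodeBase16(value), decodeBase32(value), decodeBase64(value) };
        for (byte[] candidate : candidates) {
            if (candidate != null && candidate.length == digestLength) return candidate;
        }
        return null;
    }

    /** TRUE or FALSE when a comparison was possible; null when it was not (assumed semantics). */
    static Boolean matches(String headerValue, byte[] computed) {
        if (computed == null) return null; // nothing was computed internally
        byte[] decoded = decodeAuto(headerValue, computed.length);
        if (decoded == null) return null;  // encoding could not be detected
        return MessageDigest.isEqual(decoded, computed);
    }

    /** Convenience SHA-1 helper for the demonstration. */
    static byte[] sha1(byte[] data) {
        try {
            return MessageDigest.getInstance("SHA-1").digest(data);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] computed = sha1(new byte[0]);
        System.out.println(matches("3I42H3S6NNFQ2MSVX7XZKYAYSCX5QBYJ", computed));
        System.out.println(matches("da39a3ee5e6b4b0d3255bfef95601890afd80709", computed));
    }
}
```

Checking the decoded length is what disambiguates the encodings here: a 20-byte SHA-1 digest is 40 characters in base16, 32 in base32 and 28 in base64, so a value that happens to be valid in several alphabets still decodes to the expected length in only one of them.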
protected WarcReader reader
protected ByteCountingPushBackInputStream in
protected boolean bIsCompliant
protected long startOffset
protected long consumed
public final Diagnostics<Diagnosis> diagnostics
public NewlineParser nlp
public Boolean isValidBlockDigest
public Boolean isValidPayloadDigest
public int trailingNewlines
public WarcHeader header
protected boolean bPayloadClosed
protected boolean bClosed
protected HttpHeader httpHeader
public WarcDigest computedBlockDigest
public WarcDigest computedPayloadDigest
protected WarcRecord()
public static WarcRecord createRecord(WarcWriter writer)
Create a WarcRecord and prepare it for writing.
Parameters:
writer - writer which will be used to write the record
Returns:
WarcRecord ready to be changed and then written

public static WarcRecord parseRecord(ByteCountingPushBackInputStream in, WarcReader reader) throws IOException
Given an InputStream it tries to read and validate a WARC header block.
Parameters:
in - InputStream containing WARC record data
reader - WarcReader used, with access to user defined options
Returns:
WarcRecord or null
Throws:
IOException - i/o exception in the process of reading record

public void payloadClosed() throws IOException
Called when the payload object is closed and final steps in the validation process can be performed.
Specified by:
payloadClosed in interface PayloadOnClosedHandler
Throws:
IOException - i/o exception in final validation processing

protected Boolean processWarcDigest(WarcDigest warcDigest, WarcDigest computedDigest, String digestName)
Auto-detect encoding used in WARC digest header and compare it to the internal one, if it has been computed.
Parameters:
warcDigest - digest from WARC header
computedDigest - internally computed digest
digestName - used to identify the digest ("block" or "payload")

protected void processComputedDigest(WarcDigest computedDigest, String digestAlgorithm, String digestEncoding, String digestName)
Adjust algorithm and encoding information about computed block digest.
Parameters:
computedDigest - internally computed digest
digestAlgorithm - default algorithm
digestEncoding - default encoding
digestName - used to identify the digest ("block" or "payload")

public boolean isClosed()
Check to see if the record has been closed.

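public void close() throws IOException
Close resources associated with the WARC record.
Specified by:
close in interface Closeable
Specified by:
close in interface AutoCloseable
Throws:
IOException - if unable to close resources

public boolean isCompliant()
Returns a boolean indicating the ISO compliance status of this record.
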
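The close(), isClosed() and payloadClosed() entries, together with the bClosed and bPayloadClosed flags, suggest a pattern in which closing is idempotent and the payload notifies its record exactly once so that final validation can run. The sketch below illustrates that pattern with hypothetical stand-in types (SketchOnClosedHandler, SketchPayload, SketchRecord). That closing the record also closes its payload is an assumption made here, not something the reference text states.

```java
import java.io.Closeable;

/** Hypothetical stand-in for PayloadOnClosedHandler. */
interface SketchOnClosedHandler {
    void payloadClosed();
}

/** Hypothetical payload that notifies its handler the first time it is closed. */
final class SketchPayload implements Closeable {
    private final SketchOnClosedHandler handler;
    private boolean closed;

    SketchPayload(SketchOnClosedHandler handler) {
        this.handler = handler;
    }

    @Override
    public void close() {
        if (!closed) {
            closed = true;
            handler.payloadClosed();
        }
    }
}

/** Hypothetical record using closed flags to make close() safe to call repeatedly. */
final class SketchRecord implements SketchOnClosedHandler, Closeable {
    private final SketchPayload payload = new SketchPayload(this);
    private boolean bPayloadClosed;
    private boolean bClosed;
    int finalValidationRuns; // counts how often the final validation step ran

    @Override
    public void payloadClosed() {
        if (!bPayloadClosed) {
            bPayloadClosed = true;
            finalValidationRuns++; // final validation would happen here
        }
    }

    public boolean isClosed() {
        return bClosed;
    }

    @Override
    public void close() {
        if (!bClosed) {
            payload.close(); // assumption: closing the record closes its payload
            bClosed = true;
        }
    }

    public static void main(String[] args) {
        SketchRecord record = new SketchRecord();
        try (SketchRecord r = record) {
            // read from the record here
        }
        record.close(); // second close is a no-op
        System.out.println("closed=" + record.isClosed()
                + ", validation runs=" + record.finalValidationRuns);
    }
}
```

Because the type implements Closeable, it also works with try-with-resources, as the main method shows.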
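public long getStartOffset()
Get the record offset relative to the start of the WARC file InputStream.

public long getConsumed()
Return number of uncompressed bytes consumed validating this record.

public List<HeaderLine> getHeaderList()
Get a List of all the non-standard WARC headers found during parsing.
Returns:
List of HeaderLine

public HeaderLine getHeader(String field)
Get a non-standard WARC header or null, if nothing is stored for this header name.
Parameters:
field - header name
Returns:
HeaderLine structure or null

public boolean hasPayload()
Specifies whether this record has a payload or not.
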
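For getHeaderList() and getHeader(String), a common way to support both "all headers in order" and "lookup by name or null" is to keep a list alongside a map. The following is only a sketch of that data structure. SketchHeaderLine and NonStandardHeaders are hypothetical names, and the case-insensitive lookup and last-value-wins behavior for repeated names are assumptions, not documented behavior of the library.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical stand-in for HeaderLine. */
final class SketchHeaderLine {
    final String name;
    final String value;

    SketchHeaderLine(String name, String value) {
        this.name = name;
        this.value = value;
    }
}

/** Hypothetical container for header lines that are not standard WARC fields. */
final class NonStandardHeaders {
    private final List<SketchHeaderLine> list = new ArrayList<>();
    private final Map<String, SketchHeaderLine> byName = new HashMap<>();

    /** Stores a header line in parse order and indexes it by lower-cased name. */
    void add(String name, String value) {
        SketchHeaderLine line = new SketchHeaderLine(name, value);
        list.add(line);
        byName.put(name.toLowerCase(Locale.ROOT), line); // assumption: last value wins
    }

    /** Returns all stored header lines in the order they were added. */
    List<SketchHeaderLine> getHeaderList() {
        return Collections.unmodifiableList(list);
    }

    /** Returns the header line for the given name, or null if none is stored. */
    SketchHeaderLine getHeader(String field) {
        if (field == null) return null;
        return byName.get(field.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        NonStandardHeaders headers = new NonStandardHeaders();
        headers.add("X-Crawl-Id", "42");
        SketchHeaderLine line = headers.getHeader("x-crawl-id");
        System.out.println(line == null ? "missing" : line.name + ": " + line.value);
    }
}
```

Callers should always check the result of the lookup for null before dereferencing it, as the summary for getHeader(String) indicates.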
public Payload getPayload()
Return Payload object.
Returns:
Payload object or null

public InputStream getPayloadContent()
Payload content InputStream getter.
Returns:
InputStream

public HttpHeader getHttpHeader()
Returns the HttpHeader object if identified in the payload, or null.
Returns:
HttpHeader object if identified or null

protected void addErrorDiagnosis(DiagnosisType type, String entity, String... information)
Add an error diagnosis of the given type on a specific entity with optional extra information.
Parameters:
type - diagnosis type
entity - entity examined
information - optional extra information

Copyright © 2011–2015. All rights reserved.