PDF Carver (pdfcarve) is hosted on GitHub.
You can clone the repository directly:
% git clone https://github.com/rondilley/pdfcarve.git
What is pdfcarve?
PDF Carver is a command-line PDF object decoder. It enumerates PDF objects and extracts streams.
Once built, just pass the name of a suspect PDF as an argument to pdfcarve and you are in business!
syntax: pdfcarve [options] {filename} [{filename} ...]
 -d|--debug (0-9)   enable debugging info
 -h|--help          this info
 -v|--version       display version information
 -w|--write         write streams to disk
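For batch triage, the syntax above can be driven from a script. The following is a minimal Python sketch and is not part of pdfcarve itself. `build_command` and `run_pdfcarve` are hypothetical helper names, and the sketch assumes the `pdfcarve` binary is on your PATH and accepts the debug level as a separate argument after `-d`.

```python
import subprocess


def build_command(filenames, write_streams=False, debug=None):
    """Build an argv list from the documented options (hypothetical helper)."""
    if not filenames:
        raise ValueError("at least one filename is required")
    cmd = ["pdfcarve"]
    if debug is not None:
        if not 0 <= debug <= 9:
            raise ValueError("debug level must be between 0 and 9")
        # Assumption: the level is passed as a separate argument after -d.
        cmd += ["-d", str(debug)]
    if write_streams:
        cmd.append("-w")
    return cmd + list(filenames)


def run_pdfcarve(filenames, **options):
    """Run pdfcarve and return its standard output as text."""
    result = subprocess.run(
        build_command(filenames, **options),
        capture_output=True,
        text=True,
        errors="replace",  # output may echo raw bytes from the PDF
    )
    return result.stdout
```

For example, `run_pdfcarve(["suspect.pdf"], write_streams=True)` would run the equivalent of `pdfcarve -w suspect.pdf` and hand back the decoded listing for further processing.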
The start of the output from a sample run is shown below in gory detail.
Comment: PDF-1.6
Comment: вгПУ
Object: 15 0
Name: Linearized
Integer: 1
Name: L
Integer: 9168
Name: O
Integer: 28
Name: E
Integer: 3653
Name: N
Integer: 1
Name: T
Integer: 8850
Name: H
Integer: 457
Integer: 182
Object: 22 0
Name: DecodeParms
Name: Columns
Integer: 4
Name: Predictor
Integer: 12
Name: Filter
Name: FlateDecode
Name: ID
HEX String: 582A3DB73C0EAB408F658D8613D1CACA
00000000 58 2a 3d b7 3c 0e ab 40 8f 65 8d 86 13 d1 ca ca X*=.<..@.e......
HEX String: 989BC77CF0D27C438C2D81FDE0BA3812
00000000 98 9b c7 7c f0 d2 7c 43 8c 2d 81 fd e0 ba 38 12 ...|..|C.-....8.
Name: Index
Integer: 25
Integer: 22
Name: Info
ObjRef: 24 0
Name: Length
Integer: 55
Name: Prev
Integer: 8851
Name: Root
ObjRef: 16 0
Name: Size
Integer: 47
Name: Type
Name: XRef
Name: W
Integer: 1
Integer: 2
Integer: 1
Stream: 27 bytes
Name: Linearized
Name: L
Name: O
Name: E
Name: N
Name: T
Name: H
Name: DecodeParms
Name: Columns
Name: Predictor
Name: Filter
Name: FlateDecode
Name: ID
Name: Index
Name: Info
Name: Length
Name: Prev
Name: Root
Name: Size
Name: Type
Name: XRef
Name: W
<... snip ...>
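Output like this is easy to post-process. As a rough sketch (not part of pdfcarve), the Python below tallies `Name:` entries so that unusual or repeated names stand out. It assumes pdfcarve prints one `Label: value` item per line. `count_names` is a hypothetical helper, and `sample` is a short hand-made excerpt rather than a full run.

```python
from collections import Counter


def count_names(output_text):
    """Count how often each 'Name:' value appears in pdfcarve-style output."""
    counts = Counter()
    for line in output_text.splitlines():
        # Split "Label: value" once; lines without ": " (or with another
        # label, such as hex dump rows) are ignored.
        label, sep, value = line.strip().partition(": ")
        if sep and label == "Name":
            counts[value] += 1
    return counts


sample = """\
Object: 22 0
Name: Filter
Name: FlateDecode
Name: Length
Integer: 55
Stream: 27 bytes
Name: Filter
Name: FlateDecode
Name: Length
"""

# Print the names from most to least frequent.
for name, count in count_names(sample).most_common():
    print(f"{count:3d}  {name}")
```

In practice you would pass the captured output of a real run in place of `sample`.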
Why use it?
I built this tool to help analyze suspect PDF files. I
have used it many times to find the mechanism the bad guys used to
execute code or compromise a system with a malicious payload. It
is also handy for extracting files from PDFs.
What is in the works?
I am working on improving the processing of dictionary objects and
adding some heuristics, including detection of stream anomalies, object
corruption, and object linkage and reference errors, as well as
object/document revision dissection.