Analyzing historical CPython vulnerabilities and writing a fuzzer for them

2021-03-02 安全客
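The historical vulnerabilities analyzed here mainly come from CPython's HackerOne reports.
This article first analyzes three historical CPython vulnerabilities. Once we are reasonably familiar with the layout of the CPython source, we write a fuzzer, or more precisely, add a fuzz target to the harness that already exists.

Integer overflow in _json_encode_unicode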
Test environment:
kali x86
GNU gdb (Debian 9.2-1) 9.2
gcc (Debian 9.3.0-13) 9.3.0
➜  cpython git:(master) git log
commit bdaeb7d237462a629e6c85001317faa85f94a0c6
Author: Victor Stinner <victor.stinner@gmail.com>
Date:   Mon Oct 16 08:44:31 2017 -0700

    bpo-31773: _PyTime_GetPerfCounter() uses _PyTime_t (GH-3983)

    * Rewrite win_perf_counter() to only use integers internally.
    * Add _PyTime_MulDiv() which compute "ticks * mul / div"
      in two parts (int part and remaining) to prevent integer overflow.
    * Clock frequency is checked at initialization for integer overflow.
    * Enhance also pymonotonic() to reduce the precision loss on macOS
      (mach_absolute_time() clock).

commit 7b78d4364da086baf77202e6e9f6839128a366ff
Author: Benjamin Peterson <benjamin@python.org>
Date:   Sat Jun 27 15:01:51 2015 -0500

    prevent integer overflow in escape_unicode (closes 

➜  cpython git:(master) git checkout -f 7b78d4364da086baf77202e6e9f6839128a366ff
➜  cpython git:(7b78d4364d) git log
commit 7b78d4364da086baf77202e6e9f6839128a366ff (HEAD)
Author: Benjamin Peterson <benjamin@python.org>
Date:   Sat Jun 27 15:01:51 2015 -0500

    prevent integer overflow in escape_unicode (closes 

commit 758d60baaa3c041d0982c84d514719ab197bd6ed //  not yet fixed
Merge: 7763c68dcd acac1e0e3b
Author: Benjamin Peterson <benjamin@python.org>
Date:   Sat Jun 27 14:26:21 2015 -0500

    merge 3.4

commit acac1e0e3bf564fbad2107d8f50d7e9c42e5ef22
Merge: ff0f322edb dac3ab84c7
Author: Benjamin Peterson <benjamin@python.org>
Date:   Sat Jun 27 14:26:15 2015 -0500

    merge 3.3

commit dac3ab84c73eb99265f0cf4863897c8e8302dbfd
Author: Benjamin Peterson <benjamin@python.org>
Date:   Sat Jun 27 14:25:50 2015 -0500
...
➜  cpython git:(7b78d4364d) git checkout -f 758d60baaa3c041d0982c84d514719ab197bd6ed
Previous HEAD position was 7b78d4364d prevent integer overflow in escape_unicode (closes 
HEAD is now at 758d60baaa merge 3.4
Commit used to reproduce the vulnerability: 758d60baaa3c041d0982c84d514719ab197bd6ed
Build that commit with gcc:
➜  cpython git:(7b78d4364d) export ASAN_OPTIONS=exitcode=0
➜  cpython git:(7b78d4364d) CC="gcc -g -fsanitize=address" ./configure --disable-ipv6
➜  cpython git:(7b78d4364d) make
➜  cpython git:(758d60baaa) ./python --version
Python 3.5.0b2+
PoC (poc.py):
import json

sp = "\x13" * 715827883
json.dumps([sp], ensure_ascii=False)
(gdb) b Modules/_json.c:265
No source file named Modules/_json.c.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (Modules/_json.c:265) pending.
(gdb) r poc.py
Starting program: /root/cpython/python poc.py
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/i386-linux-gnu/libthread_db.so.1".

Breakpoint 1, escape_unicode (pystr=0x85c54800) at /root/cpython/Modules/_json.c:265
265        rval = PyUnicode_New(output_size, maxchar);
(gdb) p output_size
$1 = <optimized out>
(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0xb6028131 in escape_unicode (pystr=0x85c54800) at /root/cpython/Modules/_json.c:302
302            ENCODE_OUTPUT;

The program does crash, but output_size has been optimized out, so we cannot see its value. To inspect it, change -O3 to -O0 in the Makefile, rebuild, and debug with gdb again:

(gdb) b Modules/_json.c:265
No source file named Modules/_json.c.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (Modules/_json.c:265) pending.
(gdb) r poc.py
Starting program: /root/cpython/python poc.py
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/i386-linux-gnu/libthread_db.so.1".

Breakpoint 1, escape_unicode (pystr=0x85c54800) at /root/cpython/Modules/_json.c:265
265        rval = PyUnicode_New(output_size, maxchar);
(gdb) p input_chars
$1 = 715827883
(gdb) p output_size
$2 = 4 <== integer overflow
Now for the cause of the overflow, which occurs in the escape_unicode function of _json.c:

maxchar = PyUnicode_MAX_CHAR_VALUE(pystr);
input_chars = PyUnicode_GET_LENGTH(pystr);
input = PyUnicode_DATA(pystr);
kind = PyUnicode_KIND(pystr);

/* Compute the output size */
for (i = 0, output_size = 2; i < input_chars; i++) {
    Py_UCS4 c = PyUnicode_READ(kind, input, i);
    switch (c) {
    case '\\': case '"': case '\b': case '\f':
    case '\n': case '\r': case '\t':
        output_size += 2;
        break;
    default:
        if (c <= 0x1f)
            output_size += 6; // overflows here; output_size is never checked before it is passed to PyUnicode_New below
        else
            output_size++;
    }
}

rval = PyUnicode_New(output_size, maxchar);
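The arithmetic behind the value 4 seen in gdb can be reproduced without allocating a huge string. The following is a minimal Python sketch, assuming a 32-bit Py_ssize_t that wraps around in practice (signed overflow is formally undefined in C). The helper names to_int32 and growth are made up for this illustration and do not exist in CPython.

```python
def to_int32(value):
    """Wrap an integer to a signed 32-bit value (assumed Py_ssize_t width on x86)."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value


def growth(ch):
    """Number of output characters the size loop adds for one input character."""
    if ch in '\\"\b\f\n\r\t':
        return 2          # two-character escapes such as \n
    if ord(ch) <= 0x1F:
        return 6          # \uXXXX escape for other control characters
    return 1              # copied through unchanged


input_chars = 715827883                      # length used by the PoC
true_size = 2 + growth("\x13") * input_chars # what the loop should compute
wrapped_size = to_int32(true_size)           # what a 32-bit counter ends up holding

print(true_size)     # 4294967300
print(wrapped_size)  # 4
```

The 4-character buffer is then filled with roughly 4 GiB worth of escaped output, which matches the crash inside ENCODE_OUTPUT.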
The fixed version (commit 7b78d4364d) computes the increment first and checks for overflow on every iteration:

maxchar = PyUnicode_MAX_CHAR_VALUE(pystr);
input_chars = PyUnicode_GET_LENGTH(pystr);
input = PyUnicode_DATA(pystr);
kind = PyUnicode_KIND(pystr);

/* Compute the output size */
for (i = 0, output_size = 2; i < input_chars; i++) {
    Py_UCS4 c = PyUnicode_READ(kind, input, i);
    Py_ssize_t d;
    switch (c) {
    case '\\': case '"': case '\b': case '\f':
    case '\n': case '\r': case '\t':
        d = 2;
        break;
    default:
        if (c <= 0x1f)
            d = 6;
        else
            d = 1;
    }
    if (output_size > PY_SSIZE_T_MAX - d) { // overflow check on every iteration
        PyErr_SetString(PyExc_OverflowError, "string is too long to escape");
        return NULL;
    }
    output_size += d;
}

rval = PyUnicode_New(output_size, maxchar);
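The guard in the patched loop is the standard "compare against MAX - d before adding" pattern. Below is a small Python sketch of it, again assuming a 32-bit PY_SSIZE_T_MAX; checked_add and first_rejected are illustrative names only, not CPython functions.

```python
PY_SSIZE_T_MAX = 2**31 - 1  # assumption: 32-bit build, as in the article's x86 setup


def checked_add(output_size, d):
    """Add d to output_size the way the patched loop does, refusing to overflow."""
    if output_size > PY_SSIZE_T_MAX - d:
        raise OverflowError("string is too long to escape")
    return output_size + d


# With every character costing 6, the running size after k characters is 2 + 6*k.
# The check first fires for the smallest k where 2 + 6*k > PY_SSIZE_T_MAX - 6.
first_rejected = (PY_SSIZE_T_MAX - 6 - 2) // 6 + 1
print(first_rejected)               # 357913940
print(first_rejected < 715827883)   # True: the PoC string is rejected midway
```

Because the comparison is done on PY_SSIZE_T_MAX - d rather than on output_size + d, the check itself can never overflow.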
Integer overflow in _pickle.c

See the official issue for this vulnerability.
Using the same method as above, we locate the most recent unfixed commit: 614bfcc953141cfdd38606f87a09d39f17367fa3
PoC (poc.py):
import pickle
pickle.loads(b'I1\nr\x00\x00\x00\x20\x2e')
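To see what the PoC bytes mean without unpickling them, the standard-library pickletools module can list the opcodes. This is a minimal sketch; the describe helper is a name invented here.

```python
import pickletools

# Same bytes as the PoC; b"." is the STOP opcode (0x2e).
POC = b"I1\nr\x00\x00\x00\x20."


def describe(payload):
    """Return (opcode name, decoded argument) pairs without executing the pickle."""
    return [(op.name, arg) for op, arg, _pos in pickletools.genops(payload)]


for name, arg in describe(POC):
    print(name, arg)
# INT 1
# LONG_BINPUT 536870912
# STOP None
```

The LONG_BINPUT argument, 536870912 (0x20000000), is the memo index that later shows up as idx in the backtrace.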
After building (this time without the -fsanitize option), debug the PoC directly in gdb:

(gdb) r poc.py
Starting program: /root/cpython/python poc.py
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/i386-linux-gnu/libthread_db.so.1".

Program received signal SIGSEGV, Segmentation fault.
0xb7875252 in _Unpickler_ResizeMemoList (self=0xb789c2fc, new_size=1073741824) at /root/cpython/Modules/_pickle.c:1069
1069            self->memo[i] = NULL;
(gdb) bt
#0  0xb7875252 in _Unpickler_ResizeMemoList (self=0xb789c2fc, new_size=1073741824) at /root/cpython/Modules/_pickle.c:1069
#1  0xb78752da in _Unpickler_MemoPut (self=0xb789c2fc, idx=536870912, value=0x664540 <small_ints+96>) at /root/cpython/Modules/_pickle.c:1092
#2  0xb787d75e in load_long_binput (self=0xb789c2fc) at /root/cpython/Modules/_pickle.c:5028
#3  0xb787e6bd in load (self=0xb789c2fc) at /root/cpython/Modules/_pickle.c:5409
#4  0xb78802e4 in pickle_loads (self=0xb78cb50c, args=0xb7931eac, kwds=0x0) at /root/cpython/Modules/_pickle.c:6336
#5  0x00569701 in PyCFunction_Call (func=0xb789d92c, arg=0xb7931eac, kw=0x0) at Objects/methodobject.c:84
#6  0x0048f744 in call_function (pp_stack=0xbfffeb80, oparg=1) at Python/ceval.c:4066
#7  0x0048b279 in PyEval_EvalFrameEx (f=0xb79b584c, throwflag=0) at Python/ceval.c:2679
#8  0x0048dc95 in PyEval_EvalCodeEx (_co=0xb79355c0, globals=0xb797666c, locals=0xb797666c, args=0x0, argcount=0, kws=0x0, kwcount=0, defs=0x0, defcount=0,
    kwdefs=0x0, closure=0x0) at Python/ceval.c:3436
#9  0x00482287 in PyEval_EvalCode (co=0xb79355c0, globals=0xb797666c, locals=0xb797666c) at Python/ceval.c:771
#10 0x004b464a in run_mod (mod=0x701b50, filename=0xb799bd98 "poc.py", globals=0xb797666c, locals=0xb797666c, flags=0xbffff478, arena=0x6aab10)
    at Python/pythonrun.c:1996
#11 0x004b44ba in PyRun_FileExFlags (fp=0x6f3e80, filename=0xb799bd98 "poc.py", start=257, globals=0xb797666c, locals=0xb797666c, closeit=1,
    flags=0xbffff478) at Python/pythonrun.c:1952
#12 0x004b3048 in PyRun_SimpleFileExFlags (fp=0x6f3e80, filename=0xb799bd98 "poc.py", closeit=1, flags=0xbffff478) at Python/pythonrun.c:1452
#13 0x004b251c in PyRun_AnyFileExFlags (fp=0x6f3e80, filename=0xb799bd98 "poc.py", closeit=1, flags=0xbffff478) at Python/pythonrun.c:1174
#14 0x004ccdc2 in run_file (fp=0x6f3e80, filename=0x6697d0 L"poc.py", p_cf=0xbffff478) at Modules/main.c:307
#15 0x004cd8e0 in Py_Main (argc=2, argv=0x6661a0) at Modules/main.c:744
#16 0x0042569a in main (argc=2, argv=0xbffff5d4) at ./Modules/python.c:62
(gdb) x/10x self->memo
0x6af900:    0x00000000    0x00000000    0x00000000    0x00000081
0x6af910:    0x006d2da8    0xb7e8e778    0x00000000    0x00000000
0x6af920:    0x00000000    0x00000000
(gdb) x/10x self->memo+i
0x73d000:    Cannot access memory at address 0x73d000
(gdb) p new_size
$3 = 1073741824
(gdb) p/x new_size
$4 = 0x40000000
(gdb) p PY_SSIZE_T_MAX
No symbol "PY_SSIZE_T_MAX" in current context.
(gdb) p new_size * sizeof(PyObject *)
$5 = 0 <== overflow
(gdb) p sizeof(PyObject *)
$6 = 4
(gdb) p memo
$7 = (PyObject **) 0x6af900
(gdb) p *memo
$8 = (PyObject *) 0x0
(gdb) p self->memo_size
$9 = 32
As we can see, the integer overflow has already turned into an out-of-bounds write.
Following the call stack, let us work out step by step where the overflow comes from.
First, the function in which the crash occurs:

static int
_Unpickler_ResizeMemoList(UnpicklerObject *self, Py_ssize_t new_size)
{
    Py_ssize_t i;
    PyObject **memo;

    assert(new_size > self->memo_size);

    memo = PyMem_REALLOC(self->memo, new_size * sizeof(PyObject *));
    if (memo == NULL) {
        PyErr_NoMemory();
        return -1;
    }
    self->memo = memo;
    for (i = self->memo_size; i < new_size; i++)
        self->memo[i] = NULL;
    self->memo_size = new_size;
    return 0;
}

According to gdb, the overflow makes new_size * sizeof(PyObject *) evaluate to 0, and that 0 is passed to

#define PyMem_REALLOC(p, n)    ((size_t)(n) > (size_t)PY_SSIZE_T_MAX  ? NULL \
                : realloc((p), (n) ? (n) : 1))

which turns the call into realloc(p, 1). That call succeeds, and the loop that follows then writes out of bounds.
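The path through the PyMem_REALLOC macro can be modeled in a few lines. This is a sketch under the assumption of a 32-bit size_t and 4-byte pointers, as on the x86 machine used above; realloc_request is a hypothetical helper, not a CPython API.

```python
SIZE_T_MASK = 0xFFFFFFFF        # assumption: 32-bit size_t
PY_SSIZE_T_MAX = 0x7FFFFFFF     # assumption: 32-bit Py_ssize_t
SIZEOF_POINTER = 4              # sizeof(PyObject *) reported by gdb


def realloc_request(new_size):
    """Byte count that PyMem_REALLOC(memo, new_size * sizeof(PyObject *)) hands to realloc.

    Returns None when the macro short-circuits to NULL.
    """
    n = (new_size * SIZEOF_POINTER) & SIZE_T_MASK  # unsigned multiplication wraps
    if n > PY_SSIZE_T_MAX:
        return None                                # macro yields NULL
    return n if n else 1                           # realloc((p), (n) ? (n) : 1)


print(realloc_request(64))           # 256: a normal resize
print(realloc_request(0x30000000))   # None: too large, rejected
print(realloc_request(0x40000000))   # 1: wrapped to 0, then bumped to 1 byte
```

With a 1-byte allocation in hand, the loop `for (i = self->memo_size; i < new_size; i++)` still runs up to new_size = 0x40000000 entries, which is the out-of-bounds write.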
One frame up, _Unpickler_MemoPut passes idx * 2 as new_size:

static int
_Unpickler_MemoPut(UnpicklerObject *self, Py_ssize_t idx, PyObject *value)
{
    PyObject *old_item;

    if (idx >= self->memo_size) {
        if (_Unpickler_ResizeMemoList(self, idx * 2) < 0)
            return -1;
        assert(idx < self->memo_size);
    }
    Py_INCREF(value);
    old_item = self->memo[idx];
    self->memo[idx] = value;
    Py_XDECREF(old_item);
    return 0;
}

idx in turn is read from the pickle stream by load_long_binput:

static int
load_long_binput(UnpicklerObject *self)
{
    PyObject *value;
    Py_ssize_t idx;
    char *s;

    if (_Unpickler_Read(self, &s, 4) < 0)
        return -1;

    if (Py_SIZE(self->stack) <= 0)
        return stack_underflow();
    value = self->stack->data[Py_SIZE(self->stack) - 1];

    idx = calc_binsize(s, 4);
    if (idx < 0) {
        PyErr_SetString(PyExc_ValueError,
                        "negative LONG_BINPUT argument");
        return -1;
    }

    return _Unpickler_MemoPut(self, idx, value);
}

static Py_ssize_t
calc_binsize(char *bytes, int size)
{
    unsigned char *s = (unsigned char *)bytes;
    size_t x = 0;

    assert(size == 4);

    x =  (size_t) s[0];
    x |= (size_t) s[1] << 8;
    x |= (size_t) s[2] << 16;
    x |= (size_t) s[3] << 24;

    if (x > PY_SSIZE_T_MAX)
        return -1;
    else
        return (Py_ssize_t) x;
}
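The value ultimately comes straight from our input, so by choosing the input bytes we can reliably trigger a heap-based out-of-bounds write.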
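The chain from input bytes to the wrapped allocation size can be traced in Python. The sketch below mirrors calc_binsize and the idx * 2 step, assuming a 32-bit build; trace is a hypothetical helper name used only here.

```python
PY_SSIZE_T_MAX = 0x7FFFFFFF   # assumption: 32-bit Py_ssize_t
SIZE_T_MASK = 0xFFFFFFFF      # assumption: 32-bit size_t
SIZEOF_POINTER = 4


def calc_binsize(data):
    """Python model of calc_binsize(s, 4): little-endian decode, -1 if too large."""
    x = int.from_bytes(data[:4], "little")
    return -1 if x > PY_SSIZE_T_MAX else x


def trace(arg_bytes):
    """Follow a LONG_BINPUT argument up to the byte count passed to PyMem_REALLOC."""
    idx = calc_binsize(arg_bytes)
    new_size = idx * 2                                   # _Unpickler_MemoPut
    nbytes = (new_size * SIZEOF_POINTER) & SIZE_T_MASK   # _Unpickler_ResizeMemoList
    return idx, new_size, nbytes


print(trace(b"\x00\x00\x00\x20"))      # (536870912, 1073741824, 0)
print(calc_binsize(b"\xff\xff\xff\xff"))  # -1, rejected as negative LONG_BINPUT argument
```

Since idx * 8 has to reach 2**32 for the product to wrap, 0x20000000 is the smallest index that does so under these assumptions, which is exactly the value the PoC encodes.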
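By contrast, the PyMem_RESIZE macro verifies that the multiplication cannot overflow before it calls PyMem_REALLOC:

#define PyMem_RESIZE(p, type, n) \
  ( (p) = ((size_t)(n) > PY_SSIZE_T_MAX / sizeof(type)) ? NULL : \
    (type *) PyMem_REALLOC((p), (n) * sizeof(type)) )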
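The effect of that division-based guard on the PoC's element count can be checked quickly. A minimal sketch, assuming the same 32-bit limits as before; resize_allowed is an illustrative name.

```python
PY_SSIZE_T_MAX = 0x7FFFFFFF  # assumption: 32-bit Py_ssize_t


def resize_allowed(n, sizeof_type=4):
    """Model of the PyMem_RESIZE guard: n elements are accepted only if
    n * sizeof(type) cannot exceed PY_SSIZE_T_MAX."""
    return n <= PY_SSIZE_T_MAX // sizeof_type


print(resize_allowed(64))           # True
print(resize_allowed(0x40000000))   # False: the PoC's new_size yields NULL instead of wrapping
```

Dividing the limit by the element size avoids performing the multiplication that might overflow in the first place.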
int and float constructing from non NUL-terminated buffer

Unfixed commit: 9ad0aae6566311c6982a20955381cda5a2954519
See the official issue. I found the commit and set up the environment for this one but did not manage to reproduce the crash. More importantly, it does not help much in choosing fuzzing targets. It is, however, very useful for understanding how dangerous string conversions can be, so we follow the source code to understand the principle.
Let us reproduce it on paper using the code given in the issue:

import array
float(array.array("B",b"A"*0x10))

STACK_TEXT:
0080f328 651ac6e9 ffffffff 000000c8 00000000 python35!unicode_fromformat_write_cstr+0x10
0080f384 651ac955 0080f39c 090a2fe8 65321778 python35!unicode_fromformat_arg+0x409
0080f3d8 651f1a1a 65321778 0080f404 090a2fe8 python35!PyUnicode_FromFormatV+0x65
0080f3f4 652070a9 6536bd38 65321778 090a2fe8 python35!PyErr_Format+0x1a
0080f42c 6516be70 090a2fe8 0080f484 00000000 python35!PyOS_string_to_double+0xa9
0080f4f4 6514808b 06116b00 6536d658 6536d658 python35!PyFloat_FromString+0x100
0080f554 6516e6e2 06116b00 06116b00 06116b00 python35!PyNumber_Float+0xcb
...
Going straight to the code, we start with PyFloat_FromString in floatobject.c:

PyObject *
PyFloat_FromString(PyObject *v)
{
    const char *s, *last, *end;
    double x;
    PyObject *s_buffer = NULL;
    Py_ssize_t len;
    Py_buffer view = {NULL, NULL};
    PyObject *result = NULL;

    if (PyUnicode_Check(v)) {
        s_buffer = _PyUnicode_TransformDecimalAndSpaceToASCII(v);
        if (s_buffer == NULL)
            return NULL;
        s = PyUnicode_AsUTF8AndSize(s_buffer, &len);
        if (s == NULL) {
            Py_DECREF(s_buffer);
            return NULL;
        }
    }
    else if (PyObject_GetBuffer(v, &view, PyBUF_SIMPLE) == 0) {
        s = (const char *)view.buf;    /* <<<<< this is where s gets its data */
        len = view.len;
    }
    else {
        PyErr_Format(PyExc_TypeError,
            "float() argument must be a string or a number, not '%.200s'",
            Py_TYPE(v)->tp_name);
        return NULL;
    }
    last = s + len;

    while (s < last && Py_ISSPACE(*s))
        s++;
    while (s < last - 1 && Py_ISSPACE(last[-1]))
        last--;

    x = PyOS_string_to_double(s, (char **)&end, NULL);
    ...
}

When the conversion fails, PyOS_string_to_double formats s into the error message with %.200s:

if (errno == ENOMEM) {
    PyErr_NoMemory();
    fail_pos = (char *)s;
}
else if (!endptr && (fail_pos == s || *fail_pos != '\0'))
    PyErr_Format(PyExc_ValueError,
                 "could not convert string to float: "
                 "%.200s", s);
else if (fail_pos == s)
    PyErr_Format(PyExc_ValueError,
                 "could not convert string to float: "
                 "%.200s", s);
else if (errno == ERANGE && fabs(x) >= 1.0 && overflow_exception)
    PyErr_Format(overflow_exception,
                 "value too large to convert to float: "
                 "%.200s", s);
else
    result = x;

PyObject *
PyErr_Format(PyObject *exception, const char *format, ...)
{
    va_list vargs;
    PyObject* string;

#ifdef HAVE_STDARG_PROTOTYPES
    va_start(vargs, format);
#else
    va_start(vargs);
#endif

#ifdef Py_DEBUG
    PyErr_Clear();
#endif

    string = PyUnicode_FromFormatV(format, vargs);
    PyErr_SetObject(exception, string);
    Py_XDECREF(string);
    va_end(vargs);
    return NULL;
}
Next, PyUnicode_FromFormatV:

PyObject *
PyUnicode_FromFormatV(const char *format, va_list vargs)
{
    va_list vargs2;
    const char *f;
    _PyUnicodeWriter writer;

    _PyUnicodeWriter_Init(&writer);
    writer.min_length = strlen(format) + 100;
    writer.overallocate = 1;

    Py_VA_COPY(vargs2, vargs);

    for (f = format; *f; ) {
        if (*f == '%') {
            f = unicode_fromformat_arg(&writer, f, &vargs2);
            if (f == NULL)
                goto fail;
        }
    ...
Following the call stack, we move on to unicode_fromformat_arg.
Because the format string uses %s, only the 's' case matters here.

unicode_fromformat_arg:

...
case 's':
    {
        const char *s = va_arg(*vargs, const char*);
        if (unicode_fromformat_write_cstr(writer, s, width, precision) < 0)
            return NULL;
        break;
    }
...

The argument is read directly with va_arg and the pointer s is set to that address. Next comes unicode_fromformat_write_cstr:

static int
unicode_fromformat_write_cstr(_PyUnicodeWriter *writer, const char *str,
                              Py_ssize_t width, Py_ssize_t precision)
{
    Py_ssize_t length;
    PyObject *unicode;
    int res;

    length = strlen(str);
    if (precision != -1)
        length = Py_MIN(length, precision);
    unicode = PyUnicode_DecodeUTF8Stateful(str, length, "replace", NULL);
    if (unicode == NULL)
        return -1;

    res = unicode_fromformat_write_str(writer, unicode, width, -1);
    Py_DECREF(unicode);
    return res;
}
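The length of the argument is computed directly with strlen. If str is not a '\0'-terminated string, strlen itself keeps reading past the end of the buffer, and any later access based on that length is out of bounds as well.
The root cause lies in the floatobject.c code: the data handed to %s comes from a plain cast of the buffer pointer:

else if (PyObject_GetBuffer(v, &view, PyBUF_SIMPLE) == 0) {
    s = (const char *)view.buf;    /* <<<<< plain cast */
    len = view.len;
}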
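Why strlen is the wrong tool for a length-delimited buffer can be illustrated with a toy memory model. In this sketch the "adjacent heap data" bytes are purely hypothetical stand-ins for whatever happens to follow the array in memory, and c_strlen is a made-up helper that imitates strlen.

```python
def c_strlen(memory, start):
    """Model strlen(): scan from start to the first NUL byte, ignoring buffer bounds."""
    return memory.index(b"\x00", start) - start


view_len = 0x10                       # len reported by the buffer (view.len)
buffer_data = b"A" * view_len         # array.array("B", b"A" * 0x10) has no NUL terminator
neighbor = b"adjacent heap data\x00"  # hypothetical bytes that follow in memory
memory = buffer_data + neighbor

measured = c_strlen(memory, 0)
print(view_len)   # 16
print(measured)   # 34: strlen ran 18 bytes past the end of the buffer
```

Code that trusts view.len stays inside the buffer; code that calls strlen on the same pointer depends entirely on where the next zero byte happens to be.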
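The lesson is that whenever we cast, we must check whether the conversion is actually valid and whether it can introduce a vulnerability.

Having analyzed three CPython vulnerabilities, we now know the code base reasonably well, so let us start writing the fuzzer.
Before writing anything, check whether CPython already ships fuzz tests. A quick search shows that fuzzing code lives under Modules/_xxtestfuzz/, which makes things easy: we simply add fuzz code for the module we want to test on top of it.
Reading through the logic of fuzzer.c shows that adding a fuzz target for a module is quite simple.
There are essentially two parts to write. Take struct.unpack as an example.
Step one: import the module and look up the function to fuzz together with its error object:

PyObject* struct_unpack_method = NULL;
PyObject* struct_error = NULL;
static int init_struct_unpack() {
    PyObject* struct_module = PyImport_ImportModule("struct");
    if (struct_module == NULL) {
        return 0;
    }
    struct_error = PyObject_GetAttrString(struct_module, "error");
    if (struct_error == NULL) {
        return 0;
    }
    struct_unpack_method = PyObject_GetAttrString(struct_module, "unpack");
    return struct_unpack_method != NULL;
}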
Step two: call the function under test and filter out the errors we do not care about:

static int fuzz_struct_unpack(const char* data, size_t size) {
    const char* first_null = memchr(data, '\0', size);
    if (first_null == NULL) {
        return 0;
    }

    size_t format_length = first_null - data;
    size_t buffer_length = size - format_length - 1;

    PyObject* pattern = PyBytes_FromStringAndSize(data, format_length);
    if (pattern == NULL) {
        return 0;
    }
    PyObject* buffer = PyBytes_FromStringAndSize(first_null + 1, buffer_length);
    if (buffer == NULL) {
        Py_DECREF(pattern);
        return 0;
    }

    PyObject* unpacked = PyObject_CallFunctionObjArgs(
        struct_unpack_method, pattern, buffer, NULL);
    if (unpacked == NULL && PyErr_ExceptionMatches(PyExc_OverflowError)) {
        PyErr_Clear();
    }
    if (unpacked == NULL && PyErr_ExceptionMatches(PyExc_SystemError)) {
        PyErr_Clear();
    }
    if (unpacked == NULL && PyErr_ExceptionMatches(struct_error)) {
        PyErr_Clear();
    }

    Py_XDECREF(unpacked);
    Py_DECREF(pattern);
    Py_DECREF(buffer);
    return 0;
}

The target is then hooked into LLVMFuzzerTestOneInput:

#if !defined(_Py_FUZZ_ONE) || defined(_Py_FUZZ_fuzz_struct_unpack)
    static int STRUCT_UNPACK_INITIALIZED = 0;
    if (!STRUCT_UNPACK_INITIALIZED && !init_struct_unpack()) {
        PyErr_Print();
        abort();
    } else {
        STRUCT_UNPACK_INITIALIZED = 1;
    }
    rv |= _run_fuzz(data, size, fuzz_struct_unpack);
#endif
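The input handling and exception filtering of fuzz_struct_unpack can be prototyped in Python before touching the C harness. This is only a rough sketch of the same logic: split_fuzz_input and run_one are hypothetical names, and the tuple of ignored exceptions simply copies the three checked in the C code above.

```python
import struct

# Exceptions the C harness clears; anything else would surface as a finding.
IGNORED = (OverflowError, SystemError, struct.error)


def split_fuzz_input(data):
    """Split at the first NUL byte: format before it, buffer after it."""
    sep = data.find(b"\x00")
    if sep < 0:
        return None
    return data[:sep], data[sep + 1:]


def run_one(data):
    """Run one input through struct.unpack and report how it ended."""
    parts = split_fuzz_input(data)
    if parts is None:
        return "skipped"
    fmt, buf = parts
    try:
        struct.unpack(fmt, buf)
    except IGNORED as exc:
        return type(exc).__name__
    return "ok"


print(run_one(b"<I\x00\x01\x00\x00\x00"))  # ok
print(run_one(b"<I\x00\x01"))              # error (struct.error: buffer too short)
print(run_one(b"no separator"))            # skipped
```

Running a few hand-written inputs this way is a cheap method for discovering which exception types belong in the filter before a long fuzzing session is started.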
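The tedious part is filtering the errors. You cannot necessarily know every exception the module under test may raise, so the filter is likely to be incomplete; the fuzzer then stops on an unfiltered exception, and you have to add another condition and start fuzzing again. I have no good shortcut for this. It is plain trial and error until every irrelevant exception is filtered out, and we run into exactly this problem below.

The first vulnerability analyzed above was in json, which already has a fuzz target, so we add fuzz code for the second one, the pickle module. A first version of the initialization code:

PyObject* pickle_loads_method = NULL;

static int init_pickle_loads() {
    PyObject* pickle_module = PyImport_ImportModule("pickle");
    if (pickle_module == NULL) {
        return 0;
    }
    pickle_loads_method = PyObject_GetAttrString(pickle_module, "loads");
    return pickle_loads_method != NULL;
}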
pickle's own exception objects have to be looked up in _pickle.c. The code that registers them is at the very end of that file:

PyMODINIT_FUNC
PyInit__pickle(void)
{
    PyObject *m;
    PickleState *st;

    m = PyState_FindModule(&_picklemodule);
    if (m) {
        Py_INCREF(m);
        return m;
    }

    if (PyType_Ready(&Pdata_Type) < 0)
        return NULL;
    if (PyType_Ready(&PicklerMemoProxyType) < 0)
        return NULL;
    if (PyType_Ready(&UnpicklerMemoProxyType) < 0)
        return NULL;

    m = PyModule_Create(&_picklemodule);
    if (m == NULL)
        return NULL;

    if (PyModule_AddType(m, &Pickler_Type) < 0) {
        return NULL;
    }
    if (PyModule_AddType(m, &Unpickler_Type) < 0) {
        return NULL;
    }
    if (PyModule_AddType(m, &PyPickleBuffer_Type) < 0) {
        return NULL;
    }

    st = _Pickle_GetState(m);

    st->PickleError = PyErr_NewException("_pickle.PickleError", NULL, NULL);
    if (st->PickleError == NULL)
        return NULL;
    st->PicklingError =
        PyErr_NewException("_pickle.PicklingError", st->PickleError, NULL);
    if (st->PicklingError == NULL)
        return NULL;
    st->UnpicklingError =
        PyErr_NewException("_pickle.UnpicklingError", st->PickleError, NULL);
    if (st->UnpicklingError == NULL)
        return NULL;

    Py_INCREF(st->PickleError);
    if (PyModule_AddObject(m, "PickleError", st->PickleError) < 0)
        return NULL;
    Py_INCREF(st->PicklingError);
    if (PyModule_AddObject(m, "PicklingError", st->PicklingError) < 0)
        return NULL;
    Py_INCREF(st->UnpicklingError);
    if (PyModule_AddObject(m, "UnpicklingError", st->UnpicklingError) < 0)
        return NULL;

    if (_Pickle_InitState(st) < 0)
        return NULL;
    return m;
}
The initialization function therefore also fetches these three exception objects:

PyObject* pickle_loads_method = NULL;
PyObject* pickle_error = NULL;
PyObject* pickling_error = NULL;
PyObject* unpickling_error = NULL;

static int init_pickle_loads() {
    PyObject* pickle_module = PyImport_ImportModule("pickle");
    if (pickle_module == NULL) {
        return 0;
    }
    pickle_error = PyObject_GetAttrString(pickle_module, "PickleError");
    if (pickle_error == NULL) {
        return 0;
    }
    pickling_error = PyObject_GetAttrString(pickle_module, "PicklingError");
    if (pickling_error == NULL) {
        return 0;
    }
    unpickling_error = PyObject_GetAttrString(pickle_module, "UnpicklingError");
    if (unpickling_error == NULL) {
        return 0;
    }
    pickle_loads_method = PyObject_GetAttrString(pickle_module, "loads");
    return pickle_loads_method != NULL;
}

The fuzz function calls pickle.loads and filters the expected exceptions:

#define MAX_PICKLE_TEST_SIZE 0x10000
static int fuzz_pickle_loads(const char* data, size_t size) {
    if (size > MAX_PICKLE_TEST_SIZE) {
        return 0;
    }
    PyObject* input_bytes = PyBytes_FromStringAndSize(data, size);
    if (input_bytes == NULL) {
        return 0;
    }
    PyObject* parsed = PyObject_CallOneArg(pickle_loads_method, input_bytes);
    if (parsed == NULL &&
            (PyErr_ExceptionMatches(PyExc_ValueError) ||
            PyErr_ExceptionMatches(PyExc_AttributeError) ||
            PyErr_ExceptionMatches(PyExc_KeyError) ||
            PyErr_ExceptionMatches(PyExc_TypeError) ||
            PyErr_ExceptionMatches(PyExc_OverflowError) ||
            PyErr_ExceptionMatches(PyExc_EOFError) ||
            PyErr_ExceptionMatches(PyExc_MemoryError) ||
            PyErr_ExceptionMatches(PyExc_ModuleNotFoundError) ||
            PyErr_ExceptionMatches(PyExc_IndexError) ||
            PyErr_ExceptionMatches(PyExc_UnicodeDecodeError)))
    {
        PyErr_Clear();
    }

    if (parsed == NULL && (
           PyErr_ExceptionMatches(pickle_error) ||
           PyErr_ExceptionMatches(pickling_error) ||
           PyErr_ExceptionMatches(unpickling_error)
    ))
    {
        PyErr_Clear();
    }
    Py_DECREF(input_bytes);
    Py_XDECREF(parsed);
    return 0;
}
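The exception filter can be checked from Python against a few harmless, hand-written inputs. The sketch below mirrors the list in fuzz_pickle_loads; classify is a hypothetical helper, and since pickle.loads can execute code, it should only ever be fed inputs you wrote yourself outside of a sandboxed fuzzing setup.

```python
import pickle

# Same built-in exceptions as the C harness, plus pickle's own hierarchy.
EXPECTED = (
    ValueError, AttributeError, KeyError, TypeError, OverflowError,
    EOFError, MemoryError, ModuleNotFoundError, IndexError,
    UnicodeDecodeError, pickle.PickleError,
)


def classify(payload):
    """Return 'ok' or the name of the filtered exception raised by pickle.loads."""
    try:
        pickle.loads(payload)
    except EXPECTED as exc:
        return type(exc).__name__
    return "ok"


print(classify(pickle.dumps(1)))  # ok
print(classify(b""))              # EOFError
print(classify(b"\x00"))          # UnpicklingError
print(issubclass(pickle.UnpicklingError, pickle.PickleError))  # True
```

Because PicklingError and UnpicklingError are both created with PickleError as their base, matching PickleError alone would already cover all three; listing them separately in the C code is harmless but redundant.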
Finally, the new target is registered in LLVMFuzzerTestOneInput:

#if !defined(_Py_FUZZ_ONE) || defined(_Py_FUZZ_fuzz_pickle_loads)
    static int PICKLE_LOADS_INITIALIZED = 0;
    if (!PICKLE_LOADS_INITIALIZED && !init_pickle_loads()) {
        PyErr_Print();
        abort();
    } else {
        PICKLE_LOADS_INITIALIZED = 1;
    }

    rv |= _run_fuzz(data, size, fuzz_pickle_loads);
#endif
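One thing needs attention here. If we build exactly as above, the target works, but fuzz_pickle_loads exits very quickly. The reason is libFuzzer's memory limit: even after raising the amount of memory libFuzzer may use, the run still fails for lack of memory as fuzzing goes deeper. This problem bothered me for a long time. After a lot of trial and error and debugging, I finally solved it by modifying the CPython source. The original definition

#define PY_SSIZE_T_MAX ((Py_ssize_t)(((size_t)-1)>>1))

is changed to

#define PY_SSIZE_T_MAX 838860800

This resolves the repeated fuzzing failures caused by libFuzzer's memory limit.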
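To put the patched constant in perspective, the following sketch converts it to MiB and compares it with the default value of the original macro. The pointer widths and the passes_size_check helper are assumptions made for illustration; the article does not state which width the fuzzing build used.

```python
MIB = 1024 * 1024
PATCHED_PY_SSIZE_T_MAX = 838860800  # value from the modified #define


def default_py_ssize_t_max(pointer_bits):
    """Value of ((Py_ssize_t)(((size_t)-1)>>1)) for a given pointer width."""
    return ((1 << pointer_bits) - 1) >> 1


def passes_size_check(nbytes, limit):
    """Model of a size check of the form `n > PY_SSIZE_T_MAX ? fail : allocate`."""
    return nbytes <= limit


print(PATCHED_PY_SSIZE_T_MAX // MIB)                          # 800
print(default_py_ssize_t_max(32))                             # 2147483647
print(passes_size_check(1 << 30, default_py_ssize_t_max(64))) # True: a 1 GiB request is attempted
print(passes_size_check(1 << 30, PATCHED_PY_SSIZE_T_MAX))     # False: rejected up front
```

With the cap at 800 MiB, size checks written against PY_SSIZE_T_MAX reject oversized requests before any allocation happens, so the interpreter raises an ordinary Python error rather than running into the fuzzer's memory limit.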
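After this change, some CPython modules may fail to build because the limit is now too small. That can be ignored as long as our fuzzer binary runs.
The whole process cost me two days of assorted build and runtime errors before it finally ran successfully:

tmux new -s fuzz_pickle
./out/fuzz_pickle_loads -jobs=60 -workers=6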
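I ran it with six workers for roughly a week and found no crashes at all. Top-tier open-source projects like this one really do have comparatively good code quality. If you are interested, run it yourself; maybe you will turn up a bug 🙂

Lately I have spent most of my time on vulnerabilities in open-source software, such as network components and open-source language implementations. The advantage of open source is that we can go straight from a commit to the vulnerability and understand both its root cause and its fix. The next step is to keep analyzing these bugs and then work out whether we can write a fuzzer that rediscovers them, steadily improving both our fuzzer-writing and our vulnerability-analysis skills. I expect this to become a series on fuzzing open-source vulnerabilities, and this is the first installment. If you are interested, follow my blog.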