[BTC-dev] util.cpp phexdigit array Bug Analysis and Fix

Shane Kinney modsix@gmail.com
Mon Apr 30 23:24:19 UTC 2018


URL: <http://therealbitcoin.org/ml/btc-dev/attachments/20180430/attachment.txt?sha1=f2ca38a70031dcbd3a24f7278566f5bb87eed450>
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

Hello,

The util.cpp phexdigit array bug analysis is attached to this email, along with
a fix vpatch for consideration.

Regards,
mod6
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (FreeBSD)

iQIcBAEBCgAGBQJa57FhAAoJEHIXBai3Hq2vYkcP/A/QweQgU6r+sF7xIjm6DMR9
c57jwm9cGJP7bPl45rdCTBIElVKUAbWm2T4KnKWuXtwsUtPiJ1MC0qJ0TrPc4CCC
+Wb3SZz/RJvIFZp6ZLTa5x4JytnHx5Z/y7jNst3UfoaxtpsGXeCN5pL0rg7pxy/9
BXjD22r5SsuAT5FdPqSpkJRtYrAlJWvwfswXd3bRoZuadwgCsU2Xd7095ITEstwJ
Mw7MV7M/rBWjcoRQ9yuli0kga8CjPpnr5HnGyCjvadAKGL3xVitL6GBXPHHr4/Tx
+Cbi8tU6iLt2Wz+Xsiav7x/adyqmoZxsXPX33ifH84lwcTZCNOEU6mQkPobC6s6I
N2Ps3JlFmOnfzygMoVJjY6g8dje8PupKZVcxGUL3Gns3bRni96Jo4uVnOxecYPN8
PpSDTiqanmRP9VbosP+GrA5t6Ohlkk/pHkdT0AFXJq1KxKkmG0dU3c5liTR+QHe6
LOxdKwndSMdSGobxhhzEsY+1MXtgkEAbGOp4JIvsMTjYPwNhlk4zdMCYrHxQm9bU
+tURbS2mTGTR5xgpX7tKZVCUgGlyM3PZY7F5tsGMYm+Xh6w0K1D6691m3XOJXBOC
vwgGcScPmR8QC+ftkdlcpEz4QBgdZR+tTkTNIgsYEu9JA/DqKqphbbKeMebAMj8m
nvfzJT9PVfMwJTWqkpqK
=Jrth
-----END PGP SIGNATURE-----
-------------- next part --------------
Name: phexdigit_post.txt
URL: <http://therealbitcoin.org/ml/btc-dev/attachments/20180430/phexdigit_post.txt?sha1=00b5bb007e1d0e1f268f0edf35e652295a9df516>
     ..::[ util.cpp phexdigit array Bug Analysis ]::..

         [ Date: 2018.04.29                      ]
         [ Block Height: 520373                  ]
         [ Author: mod6                          ]


0x00: [ Introduction ]

  Yesterday, I was adventuring in the TRB code, considering possible raw
transaction changes, when I happened across a bug in the wild.

  This nasty critter was easy to overlook in the forest of other commas used to
define the phexdigit[256] array; it is simply missing a ',' separating two of
it's values [R.01].

0x01: [ Investigation: Stage 1 (From whence it came) ]

  The first thing that came to mind was, "Where was this introduced?!".  I went
and looked first at The Bitcoin Foundation's 'genesis.vpatch', and sure enough
there it was, just waiting for us.  Next, I went up to shithub to see if I could
find the responsible party.  Now you would think that they, would have all of
the previous history easy to browse at shithub.com/bitcoin.  But au contraire!
They only keep history back to tag v0.7.1.  Derp.  

  I started with that tagged version (v0.7.1) and found it is indeed fixed
there.  I discovered that the fix for this bug was placed in prb [R.02] code in
v0.7.0.  But that didn't satisfy me yet, I wanted to know who was responsible.
I went around chasing down all of the older versions to see where it was
introduced.  

  The util.cpp phexdigit array was introduced, with bug included, in version
'v0.1.6test1'.  Worth noting, there is only one version I could even find before
'v0.1.6test1'; 'v0.1.5', which contained a util.cpp, but not a ParseHex function
or the phexdigit array.  Since all I could find were tarballs of the ancient
versions, there was obviously no file history, V had obviously not been invented
yet, so no one specifically at whom to point the finger.

Let's go over the places in the TRB v054 where ParseHex(const char* psz) from
util.cpp is called:

Line 1673 of bitcoinrpc.cpp in: 'Value getwork(const Array& params, bool fHelp)'
Line 1775 of bitcoinrpc.cpp in: 'Value getmemorypool(const Array& params, bool fHelp)'
Line 1475 of main.cpp in: 'bool LoadBlockIndex(bool fAllowNew)'

The other places are either it's own declaration in util.h and implementation
util.cpp.  As well, there are, ironically, tests around this in 
test/util_tests.cpp of this beginning on line 55.

  I did actually attempt to compile and run these tests; however, even though I
was able to compile the source, it seems, the linker failed on the boost unit
test framework.  This is a discussion for a different time perhaps.

  Interestingly enough, the unit tests are attempting to parse the same hex
value (the genesis block scriptPubKey) that I attempt further into this report.
This is pure accident as I didn't even examine the unit tests until after I had
created my own tests.




0x02: [ Investigation: Stage 2 (Basic Entomology) ]

  I started to walk down the mental path of what this actually means in terms of
the code, and I become worried that the ParseHex(const char* psz) in util.cpp
would be doing whatever arithmetic off-by-one.  And to make matters even worse
for the man adventuring in the forest, this is a double-headed bug: Not only is
the array length of phexdigit[256] actually shortened by one value, now instead
of using 0xff to do whatever arithmetic values, now we're using (-2) 0xfe some
where!

  The first thing that I wanted to do was to show that if I used the phexdigit
array in a piece of test code, we would get differing values; before the fix
and after the fix.  However, before I even jumped into that, I also wanted to
first show how missing a ',' in an array definition as this would effect the
array's values themselves:

  I'll start by showing that if we initialize an array with a missing comma,
how this will push a '-2' onto our stack by adding the values not separated
by comma together, followed by the comma separated '-1':

mod6@gentoo ~/phexdigit $ cat missing_comma.c    
#include <unistd.h>

void main() {
  const char bytes[] = { -1 -1, -1 };
  write(1, bytes, sizeof(bytes));
}
mod6@gentoo ~/phexdigit $ gcc -g -o missing_comma missing_comma.c 
mod6@gentoo ~/phexdigit $ gcc -S missing_comma.c && cat missing_comma.s
        .file   "missing_comma.c"
        .text
        .globl  main
        .type   main, @function
main:
.LFB0:
        .cfi_startproc
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        movq    %rsp, %rbp
        .cfi_def_cfa_register 6
        subq    $16, %rsp
        movb    $-2, -16(%rbp)
        movb    $-1, -15(%rbp)
        leaq    -16(%rbp), %rax
        movl    $2, %edx
        movq    %rax, %rsi
        movl    $1, %edi
        call    write
        leave
        .cfi_def_cfa 7, 8
        ret
        .cfi_endproc
.LFE0:
        .size   main, .-main
        .ident  "GCC: (Gentoo 4.8.4 p1.6, pie-0.6.1) 4.8.4"
        .section        .note.GNU-stack,"",@progbits

Now that we've seen that, what happens if we run 'missing_comma'?

mod6@gentoo ~/phexdigit $ ./missing_comma | xxd
0000000: feff                                     ..
mod6@gentoo ~/phexdigit $

As you can see, there we are, first value out is 0xfe (-2), followed by 0xff
(-1).

Let's show how this looks in its corrected form, where we do have a comma
separating all of our array values:

mod6@gentoo ~/phexdigit $ cat no_missing_comma.c 
#include <unistd.h>

void main() {
  const char bytes[] = { -1, -1, -1 };
  write(1, bytes, sizeof(bytes));
}
mod6@gentoo ~/phexdigit $ gcc -g -o no_missing_comma no_missing_comma.c 
mod6@gentoo ~/phexdigit $ gcc -S no_missing_comma.c && cat no_missing_comma.s
        .file   "no_missing_comma.c"
        .text
        .globl  main
        .type   main, @function
main:
.LFB0:
        .cfi_startproc
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        movq    %rsp, %rbp
        .cfi_def_cfa_register 6
        subq    $16, %rsp
        movb    $-1, -16(%rbp)
        movb    $-1, -15(%rbp)
        movb    $-1, -14(%rbp)
        leaq    -16(%rbp), %rax
        movl    $3, %edx
        movq    %rax, %rsi
        movl    $1, %edi
        call    write
        leave
        .cfi_def_cfa 7, 8
        ret
        .cfi_endproc
.LFE0:
        .size   main, .-main
        .ident  "GCC: (Gentoo 4.8.4 p1.6, pie-0.6.1) 4.8.4"
        .section        .note.GNU-stack,"",@progbits

Alright, as we can see, as expected, '-1' gets pushed onto our stack
three times.

What happens if we run no_missing_comma?

mod6@gentoo ~/phexdigit $ ./no_missing_comma | xxd
0000000: ffff ff                                  ...
mod6@gentoo ~/phexdigit $

Alright, as expected, we get 0xff (-1) output three times.

The next thing that I wanted to try was to take the whole existing (not fixed)
phexdigit array into a test program and output its hex values, then compare
those values with a fixed version to see the differences.

mod6@gentoo ~/phexdigit $ cat phexdigit_before.c 
#include <unistd.h>

void main() {
  static char phexdigit[256] =
    { -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
      -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
      -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
      0,1,2,3,4,5,6,7,8,9,-1,-1,-1,-1,-1,-1,
      -1,0xa,0xb,0xc,0xd,0xe,0xf,-1,-1,-1,-1,-1,-1,-1,-1,-1,
      -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
      -1,0xa,0xb,0xc,0xd,0xe,0xf,-1,-1,-1,-1,-1,-1,-1,-1,-1
      -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
      -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
      -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
      -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
      -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
      -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
      -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
      -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
      -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1, };

  write(1, phexdigit, sizeof(phexdigit));
}
mod6@gentoo ~/phexdigit $ cat phexdigit_after.c  
#include <unistd.h>

void main() {
  static char phexdigit[256] =
    { -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
      -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
      -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
      0,1,2,3,4,5,6,7,8,9,-1,-1,-1,-1,-1,-1,
      -1,0xa,0xb,0xc,0xd,0xe,0xf,-1,-1,-1,-1,-1,-1,-1,-1,-1,
      -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
      -1,0xa,0xb,0xc,0xd,0xe,0xf,-1,-1,-1,-1,-1,-1,-1,-1,-1,
      -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
      -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
      -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
      -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
      -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
      -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
      -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
      -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
      -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1, };

  write(1, phexdigit, sizeof(phexdigit));
}

mod6@gentoo ~/phexdigit $ gcc -g -o phexdigit_before phexdigit_before.c 
mod6@gentoo ~/phexdigit $ gcc -g -o phexdigit_after phexdigit_after.c  
mod6@gentoo ~/phexdigit $ ./phexdigit_before | xxd > phexdigit_before.hex
mod6@gentoo ~/phexdigit $ ./phexdigit_after  | xxd > phexdigit_after.hex

Let's take a look at the output from both: 

mod6@gentoo ~/phexdigit $ cat phexdigit_before.hex                     
0000000: ffff ffff ffff ffff ffff ffff ffff ffff  ................
0000010: ffff ffff ffff ffff ffff ffff ffff ffff  ................
0000020: ffff ffff ffff ffff ffff ffff ffff ffff  ................
0000030: 0001 0203 0405 0607 0809 ffff ffff ffff  ................
0000040: ff0a 0b0c 0d0e 0fff ffff ffff ffff ffff  ................
0000050: ffff ffff ffff ffff ffff ffff ffff ffff  ................
0000060: ff0a 0b0c 0d0e 0fff ffff ffff ffff fffe  ................
0000070: ffff ffff ffff ffff ffff ffff ffff ffff  ................
0000080: ffff ffff ffff ffff ffff ffff ffff ffff  ................
0000090: ffff ffff ffff ffff ffff ffff ffff ffff  ................
00000a0: ffff ffff ffff ffff ffff ffff ffff ffff  ................
00000b0: ffff ffff ffff ffff ffff ffff ffff ffff  ................
00000c0: ffff ffff ffff ffff ffff ffff ffff ffff  ................
00000d0: ffff ffff ffff ffff ffff ffff ffff ffff  ................
00000e0: ffff ffff ffff ffff ffff ffff ffff ffff  ................
00000f0: ffff ffff ffff ffff ffff ffff ffff ff00  ................
mod6@gentoo ~/phexdigit $ cat phexdigit_after.hex 
0000000: ffff ffff ffff ffff ffff ffff ffff ffff  ................
0000010: ffff ffff ffff ffff ffff ffff ffff ffff  ................
0000020: ffff ffff ffff ffff ffff ffff ffff ffff  ................
0000030: 0001 0203 0405 0607 0809 ffff ffff ffff  ................
0000040: ff0a 0b0c 0d0e 0fff ffff ffff ffff ffff  ................
0000050: ffff ffff ffff ffff ffff ffff ffff ffff  ................
0000060: ff0a 0b0c 0d0e 0fff ffff ffff ffff ffff  ................
0000070: ffff ffff ffff ffff ffff ffff ffff ffff  ................
0000080: ffff ffff ffff ffff ffff ffff ffff ffff  ................
0000090: ffff ffff ffff ffff ffff ffff ffff ffff  ................
00000a0: ffff ffff ffff ffff ffff ffff ffff ffff  ................
00000b0: ffff ffff ffff ffff ffff ffff ffff ffff  ................
00000c0: ffff ffff ffff ffff ffff ffff ffff ffff  ................
00000d0: ffff ffff ffff ffff ffff ffff ffff ffff  ................
00000e0: ffff ffff ffff ffff ffff ffff ffff ffff  ................
00000f0: ffff ffff ffff ffff ffff ffff ffff ffff  ................
mod6@gentoo ~/phexdigit $

As we can see in the file phexdigit_before.hex, we not only see the value 0xfe
contained in the output, but also a 0x00 at the end, indicating the
two-headedness of this bug.

In the second, corrected version, phexdigit_after.hex, we see that all the
values that are intended to be 0xff are, and the entire array is filled.




0x03: [ Investigation: Stage 3 (Advanced Entomology, Into The Guts) ]

ParseHex(const char* psz) is called from a few places, but one in particular
I thought was worth investigating, line 10920 of genesis.vpatch
where we set the scriptPubKey of the hard-coded genesis block data, as such:

txNew.vout[0].scriptPubKey = CScript() << ParseHex("04678afdb0fe5548271967f1a67130b7105cd6a828e03909a67962e0ea1f61deb649f6bc3f4cef38c4f35504e51ec112de5c384df7ba0b8d578a4c702b6bf11d5f") << OP_CHECKSIG;

Seeing as how correcting this bug changes how ParseHex(const char* psz) works,
I implemented a small test program to see outcomes from before the fix, and
after.  This is the entire ParseHex(const char* psz) function brought into my
test program, along with my driving code in main().

mod6@gentoo ~/phexdigit $ cat full_before.cpp 
#include <vector>
#include <stdio.h>
#include <ctype.h>

#define loop for(;;)

using namespace std;

vector<unsigned char> ParseHex(const char* psz)
{
    static char phexdigit[256] =
    { -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
      -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
      -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
      0,1,2,3,4,5,6,7,8,9,-1,-1,-1,-1,-1,-1,
      -1,0xa,0xb,0xc,0xd,0xe,0xf,-1,-1,-1,-1,-1,-1,-1,-1,-1,
      -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
      -1,0xa,0xb,0xc,0xd,0xe,0xf,-1,-1,-1,-1,-1,-1,-1,-1,-1
      -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
      -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
      -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
      -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
      -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
      -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
      -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
      -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
      -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1, };

    // convert hex dump to vector
    vector<unsigned char> vch;
    loop
    {
        while (isspace(*psz))
            psz++;
        char c = phexdigit[(unsigned char)*psz++];
        if (c == (char)-1)
            break;
        unsigned char n = (c << 4);
        c = phexdigit[(unsigned char)*psz++];
        if (c == (char)-1)
            break;
        n |= c;
        vch.push_back(n);
    }
    return vch;
}

int main() 
{
  const char *scriptPubKey = "04678afdb0fe5548271967f1a67130b7105cd6a828e03909a67962e0ea1f61deb649f6bc3f4cef38c4f35504e51ec112de5c384df7ba0b8d578a4c702b6bf11d5f";
  vector<unsigned char> vch;   
  vch = ParseHex(scriptPubKey); 
  for(vector<unsigned char>::iterator it = vch.begin(); it != vch.end(); ++it) 
  {
    printf("%c", (*it)); 
  }
  return 0;
}

mod6@gentoo ~/phexdigit $ cat full_after.cpp 
#include <vector>
#include <stdio.h>
#include <ctype.h>

#define loop for(;;)

using namespace std;

vector<unsigned char> ParseHex(const char* psz)
{
    static char phexdigit[256] =
    { -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
      -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
      -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
      0,1,2,3,4,5,6,7,8,9,-1,-1,-1,-1,-1,-1,
      -1,0xa,0xb,0xc,0xd,0xe,0xf,-1,-1,-1,-1,-1,-1,-1,-1,-1,
      -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
      -1,0xa,0xb,0xc,0xd,0xe,0xf,-1,-1,-1,-1,-1,-1,-1,-1,-1,
      -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
      -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
      -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
      -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
      -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
      -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
      -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
      -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
      -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1, };

    // convert hex dump to vector
    vector<unsigned char> vch;
    loop
    {
        while (isspace(*psz))
            psz++;
        char c = phexdigit[(unsigned char)*psz++];
        if (c == (char)-1)
            break;
        unsigned char n = (c << 4);
        c = phexdigit[(unsigned char)*psz++];
        if (c == (char)-1)
            break;
        n |= c;
        vch.push_back(n);
    }
    return vch;
}

int main() 
{
  const char *scriptPubKey = "04678afdb0fe5548271967f1a67130b7105cd6a828e03909a67962e0ea1f61deb649f6bc3f4cef38c4f35504e51ec112de5c384df7ba0b8d578a4c702b6bf11d5f";
  vector<unsigned char> vch;   
  vch = ParseHex(scriptPubKey); 
  for(vector<unsigned char>::iterator it = vch.begin(); it != vch.end(); ++it) 
  {
    printf("%c", (*it)); 
  }
  return 0;
}

Let's diff these two tests just to ensure that the only line that was changed
was indeed, just adding the missing comma to the phexdigit array:

mod6@gentoo ~/phexdigit $ diff -uNr full_before.cpp full_after.cpp 
--- full_before.cpp     2018-04-29 02:45:38.839025733 -0500
+++ full_after.cpp      2018-04-29 02:45:49.654026384 -0500
@@ -15,7 +15,7 @@
       0,1,2,3,4,5,6,7,8,9,-1,-1,-1,-1,-1,-1,
       -1,0xa,0xb,0xc,0xd,0xe,0xf,-1,-1,-1,-1,-1,-1,-1,-1,-1,
       -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
-      -1,0xa,0xb,0xc,0xd,0xe,0xf,-1,-1,-1,-1,-1,-1,-1,-1,-1
+      -1,0xa,0xb,0xc,0xd,0xe,0xf,-1,-1,-1,-1,-1,-1,-1,-1,-1,
       -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
       -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
       -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,


Ok, so we can see that is correct.  Now, let's do the same thing as above
in section 2: Compile, Execute, Compare.

mod6@gentoo ~/phexdigit $ g++ -g -o full_before full_before.cpp 
mod6@gentoo ~/phexdigit $ g++ -g -o full_after  full_after.cpp
mod6@gentoo ~/phexdigit $ ./full_before | xxd > full_before.hex 
mod6@gentoo ~/phexdigit $ ./full_after  | xxd > full_after.hex 
mod6@gentoo ~/phexdigit $ diff -uNr full_before.hex full_after.hex 
mod6@gentoo ~/phexdigit $ cat full_before.hex 
00000000: 0467 8afd b0fe 5548 2719 67f1 a671 30b7  .g....UH'.g..q0.
00000010: 105c d6a8 28e0 3909 a679 62e0 ea1f 61de  .\..(.9..yb...a.
00000020: b649 f6bc 3f4c ef38 c4f3 5504 e51e c112  .I..?L.8..U.....
00000030: de5c 384d f7ba 0b8d 578a 4c70 2b6b f11d  .\8M....W.Lp+k..
00000040: 5f                                       _
mod6@gentoo ~/phexdigit $ cat full_after.hex 
00000000: 0467 8afd b0fe 5548 2719 67f1 a671 30b7  .g....UH'.g..q0.
00000010: 105c d6a8 28e0 3909 a679 62e0 ea1f 61de  .\..(.9..yb...a.
00000020: b649 f6bc 3f4c ef38 c4f3 5504 e51e c112  .I..?L.8..U.....
00000030: de5c 384d f7ba 0b8d 578a 4c70 2b6b f11d  .\8M....W.Lp+k..
00000040: 5f                                       _

We get the same exact output for both before and after.  Perhaps the
scriptPubKey from the genesis block is a bad test case.  Let's do some
debugging to see where we would hit this landmine.

I have crafted a new test driver based on the full_before.cpp from above.
The difference here is that I know now that to hit the 0xfe (-2), you must
attempt to parse a letter outside of what would be normal hexadecimal values
[0-9A-F].  The landmine exists at array element one hundred eleven, 111.
Which corresponds to the lower case 'o' decimal value.  In my debugging, I'm
going to show when we pass as string, "ab6o78" to ParseHex, we step on the
landmine.

Before we debug, let's first show the code that I'm going to use:

mod6@gentoo ~/phexdigit $ cat debug_before.cpp 
#include <vector>
#include <stdio.h>
#include <ctype.h>

#define loop for(;;)

using namespace std;

vector<unsigned char> ParseHex(const char* psz)
{
    static char phexdigit[256] =
    { -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
      -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
      -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
      0,1,2,3,4,5,6,7,8,9,-1,-1,-1,-1,-1,-1,
      -1,0xa,0xb,0xc,0xd,0xe,0xf,-1,-1,-1,-1,-1,-1,-1,-1,-1,
      -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
      -1,0xa,0xb,0xc,0xd,0xe,0xf,-1,-1,-1,-1,-1,-1,-1,-1,-1
      -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
      -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
      -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
      -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
      -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
      -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
      -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
      -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
      -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1, };

    // convert hex dump to vector
    vector<unsigned char> vch;
    loop
    {
        while (isspace(*psz))
            psz++;
        char c = phexdigit[(unsigned char)*psz++];
        if (c == (char)-1)
            break;
        unsigned char n = (c << 4);
        c = phexdigit[(unsigned char)*psz++];
        if (c == (char)-1)
            break;
        n |= c;
        vch.push_back(n);
    }
    return vch;
}

int main() 
{
  //const char *scriptPubKey = "04678afdb0fe5548271967f1a67130b7105cd6a828e03909a67962e0ea1f61deb649f6bc3f4cef38c4f35504e51ec112de5c384df7ba0b8d578a4c702b6bf11d5f";
  const char *scriptPubKey = "ab6o78";  // XXX NOTE THE LOWERCASE 'o'
  vector<unsigned char> vch;   
  vch = ParseHex(scriptPubKey); 
  for(vector<unsigned char>::iterator it = vch.begin(); it != vch.end(); ++it) 
  {
    printf("%c", (*it)); 
  }
  return 0;
}

mod6@gentoo ~/phexdigit $ g++ -g -o debug_before debug_before.cpp 
mod6@gentoo ~/phexdigit $ gdb debug_before
GNU gdb (GDB) 7.8.90 for GNAT GPL 2015 [rev=gdb-head-ref-102395-gccbe2a7]
Copyright (C) 2015 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
See your support agreement for details of warranty and support.
If you do not have a current support agreement, then there is absolutely
no warranty for this version of GDB.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from debug_before...done.
(gdb) break main 
Breakpoint 1 at 0x400ae0: file debug_before.cpp, line 57.
(gdb) break ParseHex
Breakpoint 2 at 0x400a0e: file debug_before.cpp, line 35.
(gdb) run
Starting program: /home/mod6/phexdigit/debug_before 

Breakpoint 1, main () at debug_before.cpp:57
57        const char *scriptPubKey = "ab6o78";
(gdb) continue
Continuing.

Breakpoint 2, ParseHex (psz=0x401f84 "ab6o78") at debug_before.cpp:35
35          vector<unsigned char> vch;
(gdb) printf "%d", *psz
97(gdb) n
38              while (isspace(*psz))
(gdb) n
40              char c = phexdigit[(unsigned char)*psz++];
(gdb) printf "%d", *psz
(gdb) n
41              if (c == (char)-1)
(gdb) p/x c
$1 = 0xa
(gdb) n
43              unsigned char n = (c << 4);
(gdb) n
44              c = phexdigit[(unsigned char)*psz++];
(gdb) printf "%d", *psz
98(gdb) n
45              if (c == (char)-1)
(gdb) p/x c
$2 = 0xb
(gdb) n
47              n |= c;
(gdb) n
48              vch.push_back(n);
(gdb) p/x n
$3 = 0xab
(gdb) n
49          }
(gdb) n
40              char c = phexdigit[(unsigned char)*psz++];
(gdb) printf "%d", *psz
54(gdb) n
41              if (c == (char)-1)
(gdb) p/x c
$4 = 0x6
(gdb) n
43              unsigned char n = (c << 4);
(gdb) n
44              c = phexdigit[(unsigned char)*psz++];
(gdb) p/x n
$5 = 0x60
(gdb) printf "%d", *psz
111(gdb) n
45              if (c == (char)-1)
(gdb) p/x c
$6 = 0xfe
(gdb) p phexdigit[111]
$7 = -2 '\376'
(gdb) n
47              n |= c;
(gdb) p/x n
$8 = 0x60
(gdb) p/x c
$9 = 0xfe
(gdb) n
48              vch.push_back(n);
(gdb) p/x n 
$10 = 0xfe
(gdb) print "we should have bombed right after this, as after 'n |= c', n would have been FF, causing us to break"
$11 = "we should have bombed right after this, as after 'n |= c', n would have been FF, causing us to break"
(gdb) n
49          }
(gdb) n
40              char c = phexdigit[(unsigned char)*psz++];
(gdb) printf "%d", *psz
55(gdb) n
41              if (c == (char)-1)
(gdb) p/x c
$12 = 0x7
(gdb) n
43              unsigned char n = (c << 4);
(gdb) n
44              c = phexdigit[(unsigned char)*psz++];
(gdb) p/x n
$13 = 0x70
(gdb) printf "%d", *psz
56(gdb) n
45              if (c == (char)-1)
(gdb) p/x c
$14 = 0x8
(gdb) n
47              n |= c;
(gdb) p/x n
$15 = 0x70
(gdb) p/x c
$16 = 0x8
(gdb) n 
48              vch.push_back(n);
(gdb) p/x n
$17 = 0x78
(gdb) n
49          }
(gdb) n
40              char c = phexdigit[(unsigned char)*psz++];
(gdb) n
41              if (c == (char)-1)
(gdb) n
42                  break;
(gdb) p/x c
$18 = 0xff



0x04: [ Investigation: Conclusion ]

  Well, it took some time to get to the bottom of this defect, and its possible
impact.  However seeing as though you need to use a value outside the hexadecimal
range to step on the trap, it has a lower chance of being sprung.  Regardless, 
this is still a nasty bug, and it should get fixed.  As we saw above in my
debugging experience, the program ~did not~ break when it was supposed to break 
if the lower case 'o' was parsed.

I am attaching a vpatch for this problem to this Mailing List post for
consideration.

Happy Hunting,
mod6

[ References ]:
  [R.01]: http://btcbase.org/log/2018-04-29#1806228
  [R.02]: http://btcbase.org/log/2015-12-21#1349600
-------------- next part --------------
A non-text attachment was scrubbed...
Name: phexdigit_post.txt.mod6.sig
Type: application/pgp-signature
Size: 834 bytes
Desc: not available
URL: <http://therealbitcoin.org/ml/btc-dev/attachments/20180430/phexdigit_post.txt.mod6.sig?sha1=80947fff5c46b5d88023e0417cbcd694414ec389>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mod6_phexdigit_fix.vpatch
Type: application/octet-stream
Size: 843 bytes
Desc: not available
URL: <http://therealbitcoin.org/ml/btc-dev/attachments/20180430/mod6_phexdigit_fix.vpatch?sha1=3a69b3f8d425279b5e38f70238f508464dfecbfc>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mod6_phexdigit_fix.vpatch.mod6.sig
Type: application/pgp-signature
Size: 834 bytes
Desc: not available
URL: <http://therealbitcoin.org/ml/btc-dev/attachments/20180430/mod6_phexdigit_fix.vpatch.mod6.sig?sha1=d2970b5264ed72a7219d5cdc9a6324a24386fda4>


More information about the BTC-dev mailing list