As I have explained in great details in a previous post, the balance of a Bitcoin wallet differs from the balance that can be computed by looking at the blockchain. The reason is that practically every time you make a transaction, the Bitcoin software sends a fraction of the funds back to you, to a new address. Such addresses are invisible in the user interface, and for all practical purposes can be ignored by end users. These addresses are pre-allocated and constitute the key pool.
In this article I explain how you can compute the balance of a BitcoinQt
wallet.dat file using node.js and the JSON API of blockchain.info. I will show how you can extract the public addresses contained in the wallet, whether it is encrypted or not, and demonstrate how to use the blockchain.info API efficiently.
Extracting the Keys
Bitcoin relies on public/private key cryptography. The public key is what is turned into a public address through hashing (detailed below) and the private key is the one used to sign the transactions. In order to compute the balance of a wallet, we need to extract all the public keys from the key pool.
wallet.dat is a Berkley DB database. This is an old NoSQL embedded database from long before the term was coined. Unlike relational databases such as SQLite, it is not possible to make complicated queries in SQL. The wallet uses a single key/value dictionary to store all public and private keys, the last transactions, and all other account informations.
The keys from the key pool are stored directly into the database as individual records. The database keys are split into two parts. The first part of the DB key is some kind of identifier, that describes the type of record. Non encrypted Bitcoin keys are identified with
key while encrypted keys are identified with
ckey. The second part of the DB key is the Bitcoin public key itself, preceded by a single byte that indicates the length in bytes of the public key. The database record associated to the DB key is the private Bitcoin key, which may or may not be encrypted. This splitting of the DB key into two parts may sound odd, you have to remember that the DB keys must be unique to identify a record.
Here’s a drawing, in case you’re starting to be confused between the DB keys and the Bitcoin public and private keys.
You can see details about the key loarding algorithm in walletdb.cpp from the source code of Bitcoin. Look at the
I had no luck using node-bdb for reading the wallet with node.js, because cursors are not supported. Instead, I used a crude algorithm looking for the
key strings within the wallet. This works because BerkeleyDB does not use any compression, but that’s not ideal.
Computing Bitcoin public addresses
Once you have obtained the Bitcoin public keys from the pool, you can turn them into a Bitcoin addresses through a complicated process of hashing, which is well documented.
However, I found a discrepancy with the documentation. The length of the keys I read from my wallet were all 33 bytes instead of the expected 65 bytes. Bitcoin uses ECDSA, a variant of elliptic curve cryptography. Both public and private keys are points on a specific elliptic curve (Secp256k1) whose equation is
y^2 = x^3 + 7. The keys are the
y coordinates stored as 32 bit integers preceded by one extra header byte that OpenSSL uses to store format information. Since
y satisfy the known equation of the curve, you can store a single coordinate and a sign to compute the other coordinate with
y = +/- sqrt(x^3 + 7). This is what is called a compressed key, and it is 33 bytes long.
I initially thought that I would have to decompress the key, and convert it to the 65 bytes format in order to compute the address. But it turns out that Bitcoin hashes the OpenSSL ECDSA public key as it is. I suspect that means you could have multiple public addresses associated to the same public key: one public address for uncompressed 65 bytes key, and one public address for compressed 33 bytes keys…
You can simply hash the public keys with with the
sha256 algorithms as documented. The details of the hashing are strangely complicated, but easy to follow, and you end up with a 25 bytes binary address. This binary address is then converted into an ASCII string using the Base58 Check encoding. As explained in the source code and the docs, this encoding is used instead of the easier to calculate and more base64 encoding, to avoid confusion between similar characters like
Computing the balance with Blockchain.info
You can use the JSON API from blockchain.info to compute the balance of each address in the key pool, and sum them up to obtain the final balance. These APIs are public and can be used without authentication, but they are rate limited. This can be a problem if you use naively the single address API. Instead you should use the multi address API or alternatively the unspent outputs API.