Although bitcoin is often described as anonymous, it is widely agreed that it is not completely so: in most cases there are several ways to trace the sender of a bitcoin transaction. This article discusses the Bayesian method for identifying bitcoin addresses.
The Bayesian method for identifying bitcoin addresses can be broken down into three main steps:
1- Identification of the IP addresses linked to the transactions in question
2- Categorization of bitcoin addresses
3- Assignment of bitcoin addresses to users
First, messages propagating across the bitcoin network are observed and recorded by special monitoring clients, deployed so that as much of the network as possible is covered. These clients record which peers relay each transaction to them, and in particular which peers do so within the first time segment. After some analysis, more attention is directed towards the clients that are likely to be the originator of the transaction.
Then, the blockchain is analyzed to group bitcoin addresses that belong to the same user. This is facilitated by the inherent properties of the blockchain, which make the balances of bitcoin addresses visible.
Finally, obtaining multiple transactions that originate from the same bitcoin address, and grouping bitcoin addresses by user, enables us to combine data from multiple transactions and pinpoint users with relative accuracy. The monitoring clients are grouped by geolocation via IP addresses, which permits locating bitcoin addresses and estimating the geographical distribution and flow of bitcoin across the world.
Step 1: Setting Probabilities:
Suppose that a bitcoin transaction has been detected and recorded by a monitoring client. Even when a monitoring client has a connection to the originator, it will not necessarily receive the message from the originator first, because the message is occasionally relayed faster through an intermediary client.
If the originator has fewer than 20 active connections, it informs all of its connected clients within 2 seconds of the monitoring client first receiving the message. If we assume that there are around 50000 monitoring clients, around 92% of them will have no more than 20 connections. After the transaction is broadcast, each bitcoin client within the first time segment could be either the originator of the transaction or another client relaying it. On the other hand, the originator may be somewhere in the rest of the bitcoin network if it is not connected to the monitoring client.
To calculate the probability, we use the following notation: C denotes that the monitoring client has successfully connected to the originator of the transaction; O denotes that the originator relayed the message to the monitoring client within the first time segment; and F denotes that a randomly chosen client from the first time segment is the real originator of the transaction. In the formulas, |C| is the number of connected clients, |F| is the number of clients in the first time segment, and |A| is the number of all active clients.
P(C) = |C| / |A|
If the monitoring client is connected to the originator, the originator will inform it within the first time segment. Accordingly, all clients in the first time segment are equally likely to be the originator.
P(O|C) = 1
P(F|C) = 1 / |F|
If we then apply the law of total probability, we can get the following formula:
P(F) = P(F|C) P(C) + P(F|not C) P(not C) = |C| / (|F| |A|), taking P(F|not C) = 0, since a client in the first time segment cannot be the originator if the monitoring client is not connected to the originator.
The above formula gives the probability for each client belonging to the first time segment. Connected clients that are not part of the first time segment have a probability of zero. Each of the remaining active clients has a probability of 1 / |A|.
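The three cases above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the client labels and the set sizes are hypothetical, and a first-time-segment client is assigned P(C) / |F|, which is the value that makes the probabilities of all active clients add up to 1.

```python
def originator_priors(active, connected, first_segment):
    """Return the probability of being the originator for every active client.

    active:        all active clients (A)
    connected:     clients the monitoring client is connected to (C), a subset of A
    first_segment: clients that relayed in the first time segment (F), a non-empty subset of C
    """
    p_connected = len(connected) / len(active)  # P(C) = |C| / |A|
    priors = {}
    for client in active:
        if client in first_segment:
            # Assumed form: P(C) * P(F|C) = (|C| / |A|) * (1 / |F|)
            priors[client] = p_connected / len(first_segment)
        elif client in connected:
            # Connected, but not in the first time segment
            priors[client] = 0.0
        else:
            # Not connected to the monitoring client
            priors[client] = 1 / len(active)
    return priors


# Hypothetical example: 10 active clients, 6 connected, 3 in the first time segment
active = {f"ip{i}" for i in range(10)}
connected = {f"ip{i}" for i in range(6)}
first_segment = {"ip0", "ip1", "ip2"}
priors = originator_priors(active, connected, first_segment)
```

In this example each first-time-segment client receives 0.6 / 3, each of the four unconnected clients receives 1 / 10, and the three connected clients outside the first time segment receive zero.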
Step 2: Categorizing bitcoin addresses that belong to the same user:
The next step is to group bitcoin addresses according to the users who own them. In this way, every transaction can be linked to its sender by analyzing the originating address of the transaction. It is commonly assumed that bitcoin addresses appearing together on the input side of a given transaction belong to the same user. The figure below shows how bitcoin addresses are grouped. Transactions on the left side of the diagram are linked to their originating addresses, while the right side shows how bitcoin addresses are grouped.
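The grouping rule can be illustrated with a small union-find sketch in Python. It assumes each transaction is given simply as a list of its input addresses; the function name and the address labels are hypothetical, and real blockchain parsing is left out.

```python
def cluster_addresses(transactions):
    """Group addresses that appear together as inputs of any transaction.

    transactions: iterable of lists, each holding the input addresses of one transaction.
    Returns a sorted list of sorted address groups.
    """
    parent = {}

    def find(addr):
        # Register unseen addresses as their own group, then walk up to the root
        parent.setdefault(addr, addr)
        while parent[addr] != addr:
            parent[addr] = parent[parent[addr]]  # path halving
            addr = parent[addr]
        return addr

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for inputs in transactions:
        if not inputs:
            continue
        first = inputs[0]
        find(first)  # make sure single-input transactions are registered too
        for other in inputs[1:]:
            union(first, other)

    clusters = {}
    for addr in list(parent):
        clusters.setdefault(find(addr), set()).add(addr)
    return sorted(sorted(group) for group in clusters.values())


# Hypothetical example: A2 links A1 and A3 into one group
transactions = [["A1", "A2"], ["A2", "A3"], ["B1"], ["B1", "B2"]]
clusters = cluster_addresses(transactions)
```

Because A2 appears as an input alongside both A1 and A3, all three addresses end up in the same group even though A1 and A3 never share a transaction.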
Step 3: Correlating probabilities – Naive Bayes Classification:
Message propagation can be used to determine whether clients are the originators of bitcoin transactions. The collected IP addresses are divided into two classes: “originator” and “non-originator” IPs. In most cases, one IP can be linked to the originating address. However, as some users use more than one IP to create bitcoin transactions, more than one IP address can be linked to the originating address when the final results are obtained. By applying the naive Bayes classifier, the probability of a given IP address belonging to the originator class Co can be calculated by the following formula.
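As an illustration of how a naive Bayes classifier combines evidence from several transactions of the same user, the Python sketch below treats each transaction as one yes/no observation: was this IP in the first time segment or not? This is a generic naive Bayes model, not necessarily the exact formula used by the method, and the prior and the two likelihood values are hypothetical numbers chosen only for the example.

```python
import math


def originator_posterior(observations, prior=0.01,
                         p_seen_if_originator=0.9, p_seen_if_other=0.05):
    """Posterior probability that an IP belongs to the originator class.

    observations: list of booleans, one per transaction of the same user;
                  True if the IP was in the first time segment for that transaction.
    prior, p_seen_if_originator, p_seen_if_other: hypothetical model parameters.
    """
    # Start from the prior odds and add one log-likelihood ratio per observation;
    # treating the observations as independent is the "naive" assumption.
    log_odds = math.log(prior / (1 - prior))
    for seen in observations:
        if seen:
            log_odds += math.log(p_seen_if_originator / p_seen_if_other)
        else:
            log_odds += math.log((1 - p_seen_if_originator) / (1 - p_seen_if_other))
    return 1 / (1 + math.exp(-log_odds))


# Hypothetical example: the same IP shows up early for three transactions in a row
one_hit = originator_posterior([True])
three_hits = originator_posterior([True, True, True])
no_hits = originator_posterior([False, False, False])
```

With these assumed parameters, repeated appearances in the first time segment push the posterior well above the prior, while repeated absences push it below, which is the effect of combining data from multiple transactions.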
Bitcoin transactions are not completely anonymous, yet there are multiple ways to cover your tracks when using bitcoin, and with special techniques and precautions you can also receive bitcoin almost anonymously. The Bayesian approach has proven effective at deanonymising bitcoin transactions using simple techniques.