In 2013, I designed and built a cryptocurrency exchange for the China market. The basic concept & architecture only took a few days, but the full implementation required several years. I shared the basic architecture in 2013, and with the recent spike in interest in Bitcoin and Ethereum, I thought I would share additional details on the concept.
I wrote the “kernel” of the exchange over a long weekend in Shanghai. It turns out that building a large-scale cryptocurrency exchange is quite complex, and it finally (and successfully) launched in 2016. The notes below describe my design vision from 2013; the implementation followed this specification fairly closely. Many of the sections below have a lot more detail and documentation behind them, some of which I will elaborate on in future posts. If you want more detail on something specific, comment below or find me on Facebook.
1: Cryptocurrency Exchange Design Goals
A Bitcoin exchange is a 24/7 marketplace that matches buyers and sellers of various currencies. Some exchanges hide the underlying market (Coinbase), while others make the order book semi-transparent to their users (Bitstamp) and provide advanced options. Think of the difference between a money-changing desk at the airport and a professional forex currency exchange like Oanda or ForexTrader. Advanced exchanges typically allow buyers and sellers to specify a set price for their order, while simple exchanges process orders at the market price. This exchange is of the former variety: it provides advanced forex-style trading features for power users.
a: Key Customer Features
To attract customers, a currency exchange needs three main features. There are of course many other features, but I consider these essential:
- liquidity: there needs to be enough volume for buyers and sellers to find each other, in order to (a) keep the bid-offer spread low and (b) process large orders without a large move in the price. The available liquidity should be communicated to the customer via the order book, market depth chart, and developer API.
- speedy execution: orders need to be processed quickly, ideally instantaneously from the customer’s point of view. Orders with set prices may not execute immediately, but the status of every order should be shown to the customer immediately.
- asset safety: customers need to have confidence that the money they keep at the exchange will be safe, and of course the exchange needs to actually be safe to use. While customers can’t see directly into the backend operations of the exchange, it is important to convey safety through communications and functionality. An exchange faces many different kinds of risk; they are detailed below.
I group the scale of an exchange into three categories: small, medium, and large, or alternatively synchronous, asynchronous, and distributed.
- Small, synchronous exchanges are simple and easy to build: they process every customer request in a single transaction, from the interface to the order book to updating the customer’s account record. They have very limited ability to scale and will fail at large transaction volumes.
- Medium, asynchronous exchanges use independent layers to process requests. The interface takes user requests and submits them to a queue. Each queue is processed by a service, possibly on a different machine. The service executes the request, then communicates the status back to the interface. For example, when a customer places an order, the interface submits the request to the order queue, then polls for status updates. An order matching service processes the order and returns the response, which the interface displays. There is a service for each major feature of the exchange, as well as maintenance services that are independent of user-triggered requests. This kind of exchange can scale to much larger transaction volumes than a small exchange.
- A large, distributed exchange is like a medium exchange, but splits the customer activity into independent segments (shards). Because the shards are independent, the exchange can scale out by adding shards, for example with cloud auto-scaling.
This design describes a medium-sized exchange.
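A minimal sketch of the medium, asynchronous request flow, assuming an in-process `queue.Queue` and a worker thread stand in for the real message queue and order matching service. The names `submit_order`, `matching_service`, and `poll_status` are hypothetical:

```python
import queue
import threading
import uuid

# Stand-ins for the real infrastructure: in production the queue and the
# service would be separate components, possibly on different machines.
order_queue: queue.Queue = queue.Queue()
order_status: dict = {}
status_lock = threading.Lock()


def submit_order(side: str, amount: str) -> str:
    """Interface layer: enqueue the request and return an id to poll."""
    order_id = str(uuid.uuid4())
    with status_lock:
        order_status[order_id] = "pending"
    order_queue.put({"id": order_id, "side": side, "amount": amount})
    return order_id


def matching_service() -> None:
    """Service layer: take requests off the queue and publish each outcome."""
    while True:
        request = order_queue.get()
        try:
            if request is None:  # sentinel: shut the service down
                return
            # ... the actual order matching would happen here ...
            with status_lock:
                order_status[request["id"]] = "processed"
        finally:
            order_queue.task_done()


def poll_status(order_id: str) -> str:
    """Interface layer: read the latest status to show the customer."""
    with status_lock:
        return order_status[order_id]


if __name__ == "__main__":
    worker = threading.Thread(target=matching_service, daemon=True)
    worker.start()
    oid = submit_order("buy", "0.5")
    order_queue.join()        # wait until the service has handled the request
    print(poll_status(oid))   # processed
    order_queue.put(None)     # stop the service
    worker.join()
```

Because the interface only enqueues and polls, the service behind the queue can in principle be moved to another machine or scaled separately without changing the interface code.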
c: Component Isolation
- Customer interface layer (the UI layer is service-based using knockout.js and jqGrid to bind to JSON web services.)
- A trading (order processing) engine
- Bitcoin client interface
- Market maker
- Backend customer service interface
I won’t detail the design of each component here. The rest of the post elaborates on the three basic requirements described above, plus the advanced trading features developed for this exchange.
2: Advanced Trading Features
Most users (and indeed most cryptocurrency exchanges) don’t need the features in this section at all, but they were the most difficult component both conceptually and practically, so I’ll cover them first.
Market order: The basic way to buy or sell an asset is to place a “market order.” This is a request to buy or sell some asset at whatever is the best price a matching buyer or seller offers. If there is enough liquidity in the market, the order should execute immediately, either in one transaction or in multiple transactions. Many markets hide this detail by quoting a single market price and covering the associated risk by charging a fixed percentage per transaction.
Advanced orders: Power users don’t want to accept whatever the market rate happens to be. They only want to trade if the matching order meets a certain price floor or ceiling. Orders like this, and variations on the theme, are advanced orders. This exchange implements the following advanced order features:
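A minimal sketch of how a market buy order can fill across several price levels, assuming the ask side of the book is a list already sorted from lowest to highest price. `Ask` and `fill_market_buy` are hypothetical names, and `Decimal` is used to avoid floating-point rounding on money:

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class Ask:
    price: Decimal   # quote currency per unit of the base currency
    amount: Decimal  # base currency available at this price


def fill_market_buy(asks: list, amount: Decimal) -> list:
    """Fill a market buy against `asks`, best (lowest) price first.

    Returns the (price, quantity) fills. Consumed asks are reduced or
    removed from the list. If liquidity runs out, the fill is partial.
    """
    fills = []
    remaining = amount
    while remaining > 0 and asks:
        best = asks[0]
        qty = min(remaining, best.amount)
        fills.append((best.price, qty))
        best.amount -= qty
        remaining -= qty
        if best.amount == 0:
            asks.pop(0)  # this price level is exhausted
    return fills


book = [Ask(Decimal("100"), Decimal("1")), Ask(Decimal("101"), Decimal("2"))]
print(fill_market_buy(book, Decimal("2")))
# [(Decimal('100'), Decimal('1')), (Decimal('101'), Decimal('1'))]
```

The two fills at different prices are exactly the detail that a single quoted market price hides from the customer.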
- Set price order: order only executes if there is a match at a specific price
- Trigger orders (stop-loss/take-profit): coins are automatically sold if the price falls below a set price (stop-loss) or rises above a set price (take-profit)
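A minimal sketch of how a trigger order could be evaluated on each price update. The `TriggerOrder` type is hypothetical, and treating a price exactly at the stop-loss or take-profit level as triggering is an assumption:

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass
class TriggerOrder:
    amount: Decimal                        # coins to sell when triggered
    stop_loss: Optional[Decimal] = None    # sell if the price falls to this level
    take_profit: Optional[Decimal] = None  # sell if the price rises to this level


def should_trigger(order: TriggerOrder, last_price: Decimal) -> bool:
    """Return True if the latest market price activates the order."""
    if order.stop_loss is not None and last_price <= order.stop_loss:
        return True
    if order.take_profit is not None and last_price >= order.take_profit:
        return True
    return False


order = TriggerOrder(Decimal("1"), stop_loss=Decimal("90"), take_profit=Decimal("120"))
print([should_trigger(order, Decimal(p)) for p in ("100", "89", "121")])
# [False, True, True]
```

Once triggered, the sale would still have to be submitted to the matching engine like any other order; that hand-off is omitted here.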
Leverage: customers can request access to leverage. Leverage allows approved users to trade as if their wallet held more assets than it actually does. This is very dangerous! A customer can make a bad trade and end up owing money to the exchange. To mitigate this risk, leverage is limited in three ways: (1) it is capped at a fixed percentage (40% above the fiat balance); (2) it is available only to manually validated customers with a good history; and (3) a leveraged transaction is permitted only if the wallet balance after the transaction (with all assets converted to fiat) would be greater than zero. Because of the risk and processing complexity, only a small minority of customers get access to leverage.
Multi-currency support: There is no “primary” currency. Every standing order is created to buy or sell currency A for currency B. This allows an arbitrary number of currencies to be added to the exchange via a simple configuration change.
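A minimal sketch of the three leverage rules as a pre-trade check, assuming the 40% limit means buying power of 1.4 times the fiat balance and that the post-trade net wallet value (all assets converted to fiat) is computed elsewhere. The function and parameter names are hypothetical:

```python
from decimal import Decimal

LEVERAGE_LIMIT = Decimal("0.40")  # rule (1): at most 40% above the fiat balance


def leveraged_buy_allowed(
    approved_for_leverage: bool,  # rule (2): set by manual customer validation
    fiat_balance: Decimal,
    order_cost: Decimal,
    net_value_after: Decimal,     # rule (3): wallet value in fiat after the trade
) -> bool:
    """Pre-trade check for a buy order paid for in fiat."""
    if not approved_for_leverage:
        return order_cost <= fiat_balance  # no borrowing without approval
    if order_cost > fiat_balance * (1 + LEVERAGE_LIMIT):
        return False
    return net_value_after > 0


print(leveraged_buy_allowed(True, Decimal("1000"), Decimal("1400"), Decimal("950")))   # True
print(leveraged_buy_allowed(True, Decimal("1000"), Decimal("1500"), Decimal("950")))   # False
print(leveraged_buy_allowed(False, Decimal("1000"), Decimal("1200"), Decimal("950")))  # False
```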
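One way to model orders without a primary currency, as a sketch: each order states which currency it gives and which it gets, so every configured currency pair is a market in both directions. `CURRENCIES`, `Order`, and the helper names are hypothetical:

```python
from dataclasses import dataclass
from decimal import Decimal
from itertools import permutations

# Adding a currency to the exchange means adding one entry to this list.
CURRENCIES = ["BTC", "ETH", "CNY"]


@dataclass(frozen=True)
class Order:
    give: str        # currency the customer pays with
    get: str         # currency the customer receives
    amount: Decimal  # amount of `give` offered
    rate: Decimal    # minimum units of `get` wanted per unit of `give`


def markets(currencies: list) -> list:
    """Every directed (give, get) market; no currency is privileged."""
    return list(permutations(currencies, 2))


def can_match(a: Order, b: Order) -> bool:
    """True if `a` and `b` trade in opposite directions and their rates cross.

    `a` wants at least a.rate units of `a.get` per unit given; `b` pays at
    most 1 / b.rate of those units per unit it receives, so the orders
    cross when a.rate <= 1 / b.rate, i.e. a.rate * b.rate <= 1.
    """
    opposite = a.give == b.get and a.get == b.give
    return opposite and a.rate * b.rate <= 1


print(len(markets(CURRENCIES)))  # 6
sell_btc = Order("BTC", "CNY", Decimal("1"), Decimal("40000"))
buy_btc = Order("CNY", "BTC", Decimal("40000"), Decimal("0.000025"))
print(can_match(sell_btc, buy_btc))  # True
```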
Transparency: An order book and market depth chart are provided to customers—and also available via an API.
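A depth chart plots cumulative volume against price. A minimal sketch of deriving it from one side of the order book, assuming orders arrive as hypothetical (price, amount) pairs:

```python
from collections import defaultdict
from decimal import Decimal
from itertools import accumulate


def depth_curve(orders: list, side: str) -> list:
    """Return (price, cumulative amount) points for one side of the book.

    Orders at the same price are merged into one level, and levels are
    ordered from the best price outward: highest first for bids, lowest
    first for asks.
    """
    levels = defaultdict(Decimal)  # price -> total amount at that price
    for price, amount in orders:
        levels[price] += amount
    prices = sorted(levels, reverse=(side == "bid"))
    cumulative = accumulate(levels[p] for p in prices)
    return list(zip(prices, cumulative))


bids = [(Decimal("99"), Decimal("2")), (Decimal("100"), Decimal("1")), (Decimal("99"), Decimal("0.5"))]
print(depth_curve(bids, "bid"))
# [(Decimal('100'), Decimal('1')), (Decimal('99'), Decimal('3.5'))]
```

The same points can back both the chart and an API endpoint, so the UI and developers see the same picture of liquidity.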
3: Speedy Execution
Essential to the success of any exchange is the ability to quickly match buyers and sellers regardless of market conditions, without compromising the integrity of the system. To keep up with a high transaction volume, we rely on several traditional tricks, such as in-memory caching of the order book with transaction-triggered cache invalidation. Memory caching allows orders to be matched very quickly. However, each transaction must be an atomic operation that persists the outcome to the database. This requires low-level table locking to prevent double-spending of a customer’s wallet. The advanced order features add quite a bit of complexity to order processing, which must be designed very carefully to avoid database deadlocks. I describe the detailed order matching algorithm in a separate post.
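A minimal sketch of an atomic, double-spend-safe wallet update. It swaps in SQLite (via Python’s standard `sqlite3` module) for SQL Server and relies on a conditional `UPDATE` rather than explicit table locks; the `wallet` table and the `transfer` function are hypothetical. Amounts are integers in the smallest unit (for example satoshi) to avoid rounding:

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE wallet (id INTEGER PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )


def transfer(conn: sqlite3.Connection, src: int, dst: int, amount: int) -> bool:
    """Atomically move `amount` from wallet `src` to wallet `dst`.

    The debit only succeeds if the balance covers it, so two concurrent
    requests cannot both spend the same funds. Any error rolls back both
    the debit and the credit.
    """
    if amount <= 0:
        raise ValueError("amount must be positive")
    with conn:  # commit on success, roll back if an exception escapes
        debit = conn.execute(
            "UPDATE wallet SET balance = balance - ? WHERE id = ? AND balance >= ?",
            (amount, src, amount),
        )
        if debit.rowcount != 1:
            return False  # insufficient funds or unknown source: nothing changed
        credit = conn.execute(
            "UPDATE wallet SET balance = balance + ? WHERE id = ?", (amount, dst)
        )
        if credit.rowcount != 1:
            raise ValueError(f"unknown destination wallet {dst}")  # triggers rollback
    return True


conn = sqlite3.connect(":memory:")
create_schema(conn)
with conn:
    conn.executemany("INSERT INTO wallet VALUES (?, ?)", [(1, 100), (2, 0)])
print(transfer(conn, 1, 2, 60), transfer(conn, 1, 2, 60))  # True False
```

When a transaction has to touch several rows, updating them in a consistent order (for example by ascending wallet id) is a common way to reduce the deadlock risk mentioned above.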
4: Security, Safety, & Integrity
Most people imagine that hackers trying to steal or crash the exchange are the biggest threat to a business, but this is just one of many threats facing a cryptocurrency exchange.
A much more serious threat is financial exposure: accepting legitimate orders that, due to a flawed algorithm, cause the exchange to lose all its assets and even end up owing customers money. Most hacks and crashes can be recovered from, but a hidden bug in a trading algorithm can leave the exchange owing customers hundreds of millions of dollars and bankrupt even the best exchange. Such a bug cost Knight Capital Group $440 million in 2012 and pushed the firm to the brink of bankruptcy; it survived only through an emergency rescue.
The threats can be classified into three types: security, safety, and integrity.
- Security threats: harm caused by a malicious agent—an external or internal actor trying to steal assets or interfere with normal operation
- Safety risks: losses incurred during normal operation that impose financial or reputational costs. For example, unintentionally selling coins below their market rate, and thus making a loss on transactions.
- Integrity risks: costs incurred because of system malfunction—crashes, hardware damage, overload, etc. For example, losing access keys to a crypto wallet because they were not properly backed up and the disk storing them crashed.
The following sections summarize the categories of risk that we attempt to mitigate. Detailed descriptions are reserved for a dedicated post; please ask if you’re interested in the details of a specific risk.
A: Financial Exposure
This is the risk that the exchange will execute unprofitable transactions (for example, selling below market price). If customers are able to effectively perform arbitrage against another exchange, the exchange’s reserve may be rapidly depleted and can even go negative. There are many ways to get in trouble with financial exposure; here we’ll cover three:
- Market trading: if the market price does not reflect the actual order book, periods of high volatility can permit arbitrage with other exchanges, causing rapid depletion of fiat or cryptocurrency. If large orders are processed at a fixed rate and the volume is large enough that processing the entire order would cause a large move in the price, the exchange may incur a significant loss.
Mitigation: (1) The market price is updated in near real time. (2) Orders are processed at prices matching actual or virtual (see “market maker”) bids and offers. (3) Alerts notify admins when there is a large percentage move in a house account balance. (4) Deposit and withdrawal fees are designed to nullify price differences and make arbitrage unprofitable.
- Leveraged accounts: see “leverage” above. If the screening process and technical measures fail, the exchange must notify the account holder and use whatever means are available to recover the negative balance. A dedicated service monitors for a negative net account balance and notifies admins for follow up.
- Market maker: see “market maker” below. The market maker engages in automated trading to provide the exchange with volume. While its transactions are designed to fill actual orders at set prices, a bug or oversight in any automated trading system is highly risky. We mitigate this risk with a real-time admin dashboard that is open to admin staff at all times, and with alerts that fire when account balances exceed set limits.
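A minimal sketch of mitigation (4) under market trading: checking whether buying here and selling on another exchange still pays after fees, and finding the smallest withdrawal fee that removes the profit. The proportional fee model and all names are assumptions:

```python
from decimal import Decimal


def round_trip_profit(our_ask: Decimal, their_bid: Decimal,
                      withdrawal_fee: Decimal, their_fee: Decimal) -> Decimal:
    """Profit per coin from buying here at `our_ask`, withdrawing, and selling
    elsewhere at `their_bid`. Fees are fractions, e.g. Decimal("0.01") for 1%."""
    proceeds = their_bid * (1 - withdrawal_fee) * (1 - their_fee)
    return proceeds - our_ask


def break_even_withdrawal_fee(our_ask: Decimal, their_bid: Decimal,
                              their_fee: Decimal) -> Decimal:
    """Smallest withdrawal fee at which the round trip is no longer profitable."""
    fee = 1 - our_ask / (their_bid * (1 - their_fee))
    return max(Decimal(0), fee)


print(round_trip_profit(Decimal("100"), Decimal("102"), Decimal("0.01"), Decimal("0.01")))
# -0.0298
```

The reverse direction (buying elsewhere and depositing here to sell) is symmetric and would be checked the same way with the deposit fee.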
B: Customer Identity Theft
This is the risk that malicious users will impersonate legitimate users, whether by stealing credentials, bypassing the security controls, or through internal threats. For example, a malicious employee could change the withdrawal address in accounts and redirect bitcoin withdrawals to his own wallet.
The exchange controls identity theft by:
- Using a best-practice library for user account controls: the Microsoft ASP.NET membership provider.
- Requiring two-factor SMS authentication for all withdrawals.
- Requiring real ID validation before a customer can trade over X amount.
- Requiring administrator approval for deposits and withdrawals over $X.
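A minimal sketch of the withdrawal rules above as a single decision function. The `approval_threshold` parameter stands in for the unspecified $X limit, and all names are hypothetical:

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class WithdrawalRequest:
    amount_usd: Decimal
    sms_code_verified: bool       # outcome of the SMS two-factor step
    admin_approved: bool = False


def withdrawal_decision(req: WithdrawalRequest, approval_threshold: Decimal) -> str:
    """Return "reject", "needs_approval", or "approve"."""
    if not req.sms_code_verified:
        return "reject"  # two-factor authentication is mandatory for every withdrawal
    if req.amount_usd > approval_threshold and not req.admin_approved:
        return "needs_approval"  # queue for an administrator
    return "approve"


limit = Decimal("10000")  # illustrative value only
print(withdrawal_decision(WithdrawalRequest(Decimal("500"), True), limit))    # approve
print(withdrawal_decision(WithdrawalRequest(Decimal("50000"), True), limit))  # needs_approval
print(withdrawal_decision(WithdrawalRequest(Decimal("500"), False), limit))   # reject
```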
C: Security Risks
An agent who bypasses security measures and is able to perform unauthorized actions can do significant damage. These actions need not be technical: a malicious employee can copy the private key for a wallet in the course of normal operation and then transfer out its contents without any access to the system. In many scenarios it is very hard to identify the source of the leak.
The security measures consider five types of security risk:
- Application privilege escalation
- Server/cloud-service compromise
- Administrative account compromise
- Internal threats—malicious staff and developers
- External account (bank accounts, third party linked exchanges) compromise
Each risk will be discussed in detail in a later post. Here, let’s consider the general security principles used to mitigate them:
Internal Threat Mitigation
- The principle of least privilege: each administrative account only has access to the modules needed to operate it.
- Role separation: each administrative function is a separate page in the Admin Console and depends on a different role. Role separation prevents any one admin from being able to disguise a chain of undesired actions.
- Multiple signatures: certain actions, such as approving transfers from cold storage and bank transfers, require multiple employee signatures.
- Environment credential isolation: developers use an environment independent of the production system. Deployment is automated and performed only by the lead architect with supervision from business ops. All customer data is wiped when cloning production data into development. Debugging of issues in production is supervised on a case-by-case basis.
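A minimal sketch of the multiple-signature rule combined with role separation: an action is authorized only once enough distinct employees holding the right role have signed. Preventing the requester from signing their own request is an added assumption, and `SensitiveAction` is hypothetical:

```python
from dataclasses import dataclass, field


@dataclass
class SensitiveAction:
    description: str
    requested_by: str
    required_signatures: int    # e.g. 2 for a 2-of-N approval
    allowed_signers: frozenset  # employees holding the approving role
    signatures: set = field(default_factory=set)

    def sign(self, employee: str) -> None:
        if employee == self.requested_by:
            raise PermissionError("requesters cannot approve their own action")
        if employee not in self.allowed_signers:
            raise PermissionError(f"{employee} does not hold the approving role")
        self.signatures.add(employee)  # a set, so repeat signatures do not count twice

    def authorized(self) -> bool:
        return len(self.signatures) >= self.required_signatures


action = SensitiveAction("cold storage transfer", "alice", 2, frozenset({"alice", "bob", "carol"}))
action.sign("bob")
action.sign("bob")
print(action.authorized())  # False
action.sign("carol")
print(action.authorized())  # True
```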
External Threat Mitigation
- Component isolation: (see “component isolation” above) each system component (website, services, wallet) runs on a separate machine with different credentials
- Circuit breakers and alerting system: when certain thresholds are reached (low balances, high transaction volumes, etc.), an email/SMS alert is sent to admins
- Real-time visualizations: a reporting platform shows a visual color-coded display of the order book, market maker, & wallet activity.
- DDoS protection: an active firewall (Cloudflare Business) is used to obfuscate the origin web server. The origin server is in a separate AWS EC2 security group from the transaction server, which is separate from the wallet server. Only specified IP/port combinations are allowed past the firewalls.
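A minimal sketch of the threshold rules behind the circuit breakers and alerts. The metric names and threshold values are illustrative assumptions, and the code only returns the breached rules instead of sending email/SMS:

```python
import operator
from dataclasses import dataclass
from decimal import Decimal
from typing import Callable


@dataclass(frozen=True)
class AlertRule:
    name: str
    metric: str
    threshold: Decimal
    breached: Callable  # comparison applied as breached(value, threshold)


RULES = [
    AlertRule("hot wallet low", "hot_wallet_btc", Decimal("5"), operator.lt),
    AlertRule("transaction spike", "tx_per_minute", Decimal("500"), operator.gt),
]


def breached_rules(metrics: dict, rules: list = RULES) -> list:
    """Names of the rules whose metric crossed its threshold."""
    return [
        r.name
        for r in rules
        if r.metric in metrics and r.breached(metrics[r.metric], r.threshold)
    ]


print(breached_rules({"hot_wallet_btc": Decimal("3"), "tx_per_minute": Decimal("120")}))
# ['hot wallet low']
```

Keeping the rules as data makes it straightforward to add a threshold without touching the monitoring loop.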
Physical Security Measures (Cryptocurrency/Fiat Asset Protection)
- Air gaps: cold storage (with Trezor devices) is used to store the exchange’s reserves. The wallet is stored in a safe that only executives can access.
- Multiple signatures are required to access the “reserve” cold storage wallet. A separate cold wallet is used to store operating profits.
- All cloud servers are encrypted.
- Physical (paper) logs and signatures are required to authorize cold storage and large withdrawal operations
D: Systems/Data Integrity
This is the risk that due to some software or hardware malfunction, an inconsistent or corrupt state will result. For example, if a customer withdraws money from a wallet at the same instant that he makes a purchase larger than the remaining balance, both transactions may go through.
- Atomic Transactions: each market operation is contained in a database transaction. While the system is calculating changes, the changes remain tentative until all of them are complete. If the transaction fails to complete, the operation is rolled back.
- Sanity checking: market operations (orders) and transfer operations (withdrawing/depositing currency) use multiple sanity checks. For example, when buying BTC, the system checks for a sufficient balance in the interface layer, again when matching bids and offers, and again in the transaction itself. When making withdrawals, the amount to be transferred is checked at the interface layer and at the transfer service level.
- Integrity checks: a maintenance service periodically validates system integrity. For example, the account balance of a customer should match the total of his debits and credits.
- Backup automation: restoring from backups is a last resort; however, the account and order book databases use the SQL Server transaction log, with nightly point-in-time backups.
- Human escalation: if a transaction is out of scope (exceeds a set amount) or operating wallets fall below a certain minimum, or the exchange rate deviates too far from third party references, admins are notified.
- Auditing: a searchable audit log (using a separate data store from the trading system) records everything the system does. The admins can quickly filter the audit log by the customer to see all his activity.
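A minimal sketch of the integrity check described above: recompute each balance from the ledger and flag accounts whose stored balance disagrees. The data shapes are hypothetical (credits positive, debits negative):

```python
from collections import defaultdict
from decimal import Decimal


def inconsistent_accounts(balances: dict, ledger: list) -> list:
    """Return account ids whose stored balance != sum of their ledger entries.

    balances: {account_id: Decimal} as stored on the account record
    ledger:   [(account_id, signed Decimal amount), ...]
    """
    totals = defaultdict(Decimal)
    for account_id, amount in ledger:
        totals[account_id] += amount
    accounts = set(balances) | set(totals)
    return sorted(a for a in accounts if balances.get(a, Decimal(0)) != totals[a])


balances = {"alice": Decimal("50"), "bob": Decimal("10")}
ledger = [("alice", Decimal("100")), ("alice", Decimal("-50")), ("bob", Decimal("20"))]
print(inconsistent_accounts(balances, ledger))  # ['bob']
```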
5: Liquidity: the market-making service
Ideally, the exchange does not sell currency directly to customers, but simply matches buyers and sellers. Some exchanges hide this process by setting a price that matches current demand for specific currencies (like Coinbase), while others show the order book (Bitstamp). Either way, customers must actively be buying and selling cryptocurrency in order for the spread between the buy and sell price to be low, and for large orders to be processed without large movements in the price. This presents a chicken-and-egg problem: an exchange can’t attract customers without an existing active user base, but users won’t come to an exchange without existing liquidity.
The exchange operator can start with a reserve of cryptocurrency and fiat assets in order to provide initial volume for trading. But how can the operator price that volume when market prices are changing rapidly? If the price the operator offers deviates from other exchanges’ prices even for a short time, traders can rapidly arbitrage the difference, deplete the reserves, and bankrupt the exchange.
This is where the market-making system comes in. The market maker is a service run by the exchange that creates virtual orders. These orders are real in the sense that they represent transactions that the market maker is willing to make, but “fake” in the sense that they are created by the exchange itself, not by other customers. How does the market maker know how to price the orders? By pulling the live order book from the APIs of multiple third-party exchanges. The mirrored orders are executed on the first-party exchange; the resulting transactions are then rolled up and reconciled by executing equivalent transactions as a batch on the third-party exchange once a threshold is reached. This allows the exchange to generate volume without incurring excessive risk.
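A minimal sketch of the market maker’s two jobs, assuming reference quotes have already been fetched from the third-party APIs: quote mirrored prices with a safety margin, and accumulate the position it takes so that position can be offset in one batch on a third-party exchange once a threshold is reached. The margin, threshold, and class are hypothetical:

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass
class MarketMaker:
    margin: Decimal           # markup over reference prices, e.g. Decimal("0.005")
    hedge_threshold: Decimal  # unhedged BTC position that triggers a batch trade
    net_position: Decimal = Decimal(0)  # + BTC bought from customers, - BTC sold

    def quotes(self, reference: list) -> tuple:
        """Virtual bid/ask mirrored from third-party (bid, ask) quotes.

        Uses the best third-party prices (highest bid, lowest ask) and widens
        them by the margin, so that, ignoring fees and price moves before the
        batch executes, the offsetting trade elsewhere should not lose money.
        """
        best_bid = max(bid for bid, _ in reference)
        best_ask = min(ask for _, ask in reference)
        return best_bid * (1 - self.margin), best_ask * (1 + self.margin)

    def record_fill(self, btc_delta: Decimal) -> Optional[Decimal]:
        """Record a fill against a virtual order.

        Returns the BTC amount to trade on the third-party exchange
        (+ buy, - sell) once the accumulated position reaches the
        threshold, or None while it is still below it.
        """
        self.net_position += btc_delta
        if abs(self.net_position) < self.hedge_threshold:
            return None
        hedge = -self.net_position  # offset the whole accumulated position
        self.net_position = Decimal(0)
        return hedge


mm = MarketMaker(margin=Decimal("0.01"), hedge_threshold=Decimal("1"))
print(mm.quotes([(Decimal("100"), Decimal("101")), (Decimal("99"), Decimal("100.5"))]))
print(mm.record_fill(Decimal("0.6")), mm.record_fill(Decimal("0.5")))  # None -1.1
```

Batching the offsetting trades keeps third-party fees and API calls down, at the cost of carrying some price risk until the threshold is reached.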
That’s the high-level design view of the exchange. Contact me with questions or opportunities.