As we conclude Game of Stakes and march toward the Cosmos Mainnet launch, Delegators must perform the crucial function of selecting Validators with which to delegate. There are multiple dimensions to consider when delegating, and in this series of posts we’ll outline key areas for Delegators to learn about.
In this post we’ll dive into security and Private Key management, and discuss the tradeoffs of different approaches. As a Delegator, you should have a good understanding of your Validator’s approach so you can better gauge your risk of losing Atoms to slashing.
A critical part of being a Validator is proposing and signing blocks. This is done in a cryptographically secure way by using a unique “private key” that each Validator controls.
The private key is how a Validator proves who they are (identity), and therefore must be protected with the utmost security. If the private key is compromised or lost, the Validator must be abandoned.
The Cosmos/Tendermint private key is stored in a small JSON text file that contains a long string of characters. It looks like this:
priv_validator_key.json
"priv_key": {
  "type": "tendermint/PrivKeyEd25519",
  "value": "+QO5IbIlGMKEBIMaxQvhb+sZHKMvOtnhlwXSgDMB4V87BtUF+LkE0DYDQ5pa/hHhJyUHZ4nn4V/yWuQ8VnDiQg=="
}
The challenge for a Validator is: how to balance making their private key available to quickly sign blocks to keep the network humming along while at the same time securing and limiting access to it. Said differently: how to reliably access a file while severely restricting access to it?
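As a minimal sketch of the "restrict access" half of that challenge, the snippet below checks that a key file is readable only by its owner and that it contains the fields shown above. The function name `key_file_problems` is hypothetical, and the sketch assumes a POSIX file system and a complete JSON document with a top-level `priv_key` object; it is an illustration, not part of the Cosmos/Tendermint tooling.

```python
import json
import os
import stat


def key_file_problems(path):
    """Return a list of human-readable problems found with a key file.

    Hypothetical helper for illustration only.
    """
    problems = []

    # Permission bits only (e.g. 0o600), without the file-type bits.
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode & (stat.S_IRWXG | stat.S_IRWXO):
        problems.append(f"file mode {mode:o} grants access to group or others")

    with open(path) as f:
        doc = json.load(f)

    # Assumption: the key lives under a top-level "priv_key" object.
    key = doc.get("priv_key", {})
    if key.get("type") != "tendermint/PrivKeyEd25519":
        problems.append("unexpected or missing priv_key type")
    if not key.get("value"):
        problems.append("missing priv_key value")

    return problems
```

An empty list means the file passed these basic checks. File permissions are only a first layer: anyone with root access on the server can still read the file, which is why the hardware and remote-signer options discussed later exist.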
There are several approaches to securing a private key, each with its own benefits and tradeoffs. As a Delegator, you must understand the approach your Validator is taking, as the penalties for misusing or losing a private key are steep.
As we wrote about in Cosmos Delegator and Validator Economics, the biggest impact on Delegator returns is limiting slashing risk. One of the ways a Validator can get slashed is if they “double sign”: their private key is used to sign two conflicting blocks at the same height.
For the purposes of this discussion, we’ll focus on two ways that double signing can happen: accidents and a compromised private key.
As an accident, the Tendermint/Cosmos software may have a bug or an unexpected failure mode (aka crash). As the software restarts, previous blocks may be signed again, unbeknownst to the Validator, which could cause a double sign. These sorts of situations may be out of the Validator’s control, or they could be the result of misconfigured software or other operator error.
Another accident scenario arises when a Validator runs two servers with the Cosmos/Tendermint software simultaneously and both try to sign at the same block height. For example, to limit downtime, a Validator may keep an active server signing blocks and a backup server ready to take over if the active server fails. If the backup server starts signing blocks while the active server is still signing, the Validator will double sign.
If the private key is compromised and another party gets possession of it, they can very simply force a Validator to double sign by running a second instance of the Cosmos/Tendermint software with the private key. This kind of attack is far more nefarious and is usually a directed action against a Validator.
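To make the idea of a double sign concrete, here is a small sketch of the check: two signed votes from the same Validator for the same height, round and vote type, but for different blocks. The `Vote` class and its field names are simplified stand-ins we made up for illustration and do not mirror the actual Tendermint data structures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vote:
    """Simplified, hypothetical view of a signed consensus vote."""
    validator: str
    height: int
    round: int
    vote_type: str  # e.g. "prevote" or "precommit"
    block_id: str   # identifier of the block being voted for


def is_double_sign(a: Vote, b: Vote) -> bool:
    """True if the two votes conflict: same slot, different block."""
    same_slot = (a.validator, a.height, a.round, a.vote_type) == (
        b.validator, b.height, b.round, b.vote_type)
    return same_slot and a.block_id != b.block_id
```

Note that it does not matter whether the second vote came from a backup server or from an attacker holding a stolen key: the network only sees two conflicting signatures from the same key.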
In both cases (accident and compromise), the penalty for a Validator double signing is getting slashed and “tombstone jailed”. When a Validator gets slashed, all Delegators lose a percentage of their tokens. When a Validator is tombstone jailed, it means that the Validator can never use their current private key again on the network and their Validator is effectively dead. All Delegators must redelegate to another Validator.
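As a rough illustration of what "lose a percentage" means for a Delegator, the sketch below applies a slash fraction to a delegated balance. The function name is hypothetical and the 5% figure in the usage note is only an example value; the actual fraction is a network parameter, so check the current chain parameters rather than relying on this number.

```python
from decimal import Decimal


def atoms_after_slash(delegated: Decimal, slash_fraction: Decimal) -> Decimal:
    """Return the balance left after a proportional slash.

    slash_fraction is a value between 0 and 1 (e.g. Decimal("0.05")).
    """
    if not (Decimal(0) <= slash_fraction <= Decimal(1)):
        raise ValueError("slash_fraction must be between 0 and 1")
    return delegated * (Decimal(1) - slash_fraction)
```

With an illustrative slash fraction of 5%, `atoms_after_slash(Decimal("1000"), Decimal("0.05"))` leaves 950 Atoms out of 1,000 delegated.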
Given the necessity to sign blocks (just once!) while also securing access to the private key file, it’s important to understand a Validator’s strategy for private key management.
Here are a couple of options currently available:
One larger network-wide consideration is having a diverse set of signing solutions. If all Validators were running the same signing software and it was compromised, all Validators would be at risk. If there are different approaches, theoretically a disastrous bug wouldn’t decimate the entire network.
Given these options, what is the best approach for a Validator? It depends on server setup and budget.
If a Validator is running on cloud virtual servers, like many did in Game of Stakes, they will not be able to plug in an HSM like the YubiHSM2 or Ledger.
If a Validator is running on physical servers in a restricted access data center, they could use the advanced application/logic of the Ledger Nano S, but would be required to visit the data center in case of emergency or server restarts.
The YubiHSM2 is the most “enterprise ready” device for hands-off management, but lacks the advanced application/logic of the Ledger. That logic would need to be recreated somewhere else, such as in the KMS on the remote signer server, but this is not yet implemented in the open source solutions. Therefore you could argue that the consumer-focused Ledger is currently “safer” at defending against double signs and slashing than the enterprise-focused YubiHSM2.
Double sign prevention is planned for the open source KMS, and in the meantime validator teams may choose a hybrid approach of using the KMS and implementing their own double sign prevention.
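One common way to think about double sign prevention is a "high-water mark": the signer remembers the last height, round and step it signed and refuses anything at or below that point, even after a restart. The sketch below is our own illustration of the idea, not the KMS implementation; the names `SignPoint`, `DoubleSignGuard` and `RefusedToSign` are hypothetical, and a production signer would need more care (for example, safely re-signing an identical message, or sharing state between servers).

```python
import json
import os
from dataclasses import asdict, dataclass


@dataclass(order=True, frozen=True)
class SignPoint:
    """Position in consensus; compared in field order (height, round, step)."""
    height: int
    round: int
    step: int


class RefusedToSign(Exception):
    """Raised when signing could produce a double sign."""


class DoubleSignGuard:
    def __init__(self, state_path):
        self.state_path = state_path
        self.last = self._load()

    def _load(self):
        # With no state file yet, start below every real sign point.
        if not os.path.exists(self.state_path):
            return SignPoint(0, 0, 0)
        with open(self.state_path) as f:
            return SignPoint(**json.load(f))

    def _persist(self, point):
        # Write to a temp file, then rename atomically over the old state.
        tmp = self.state_path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(asdict(point), f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.state_path)

    def try_sign(self, point, sign_fn):
        """Call sign_fn() only if point is strictly beyond the last one."""
        if point <= self.last:
            raise RefusedToSign(f"{point} is not beyond {self.last}")
        # Record the point BEFORE signing: a crash in between costs a
        # missed block instead of risking a second signature.
        self._persist(point)
        self.last = point
        return sign_fn()
```

The order of operations is the design choice worth noticing: persisting first trades a possible missed block for safety, which is exactly the availability versus safety tradeoff a Validator has to weigh.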
Licensing a proprietary signing solution may provide advanced features, but since it is closed source it’s hard to reason about code quality and bugs. It could be better or worse than open source implementations in different ways. Licensees may get access to source code, but it may be difficult to reason about and test, depending on experience with software development.
Finally, there is a tradeoff between high availability and safety. Would you rather have a Validator miss signing a few blocks or risk getting slashed? In general, the design goal of high availability is in conflict with the design goal of safety against an accidental double sign. Extremely thorough engineering and testing are required to achieve a highly available system with a low risk of double sign.
As we’ve discussed in this post, there are various approaches to securing a Validator’s private key. As a Delegator it is crucial to understand your Validator’s strategy as it directly impacts your Atoms via slashing risk.
If you’d like to chat about our approach at Figment, please reach out: contact@figment.network
Stay tuned for further posts discussing the different dimensions to evaluate a Cosmos Validator.
Thanks to Matt Harrop and Will Little for help with this post.