Blockchain applications (especially when running validator nodes) are atypical compared to "traditional" web server applications because their performance characteristics tend to differ in the ways described below:
Tend to be more disk I/O heavy: Traditional web apps typically offload data storage to persistent stores such as a database. In the case of a blockchain/validator node, the database is on the machine itself, rather than offloaded to a separate machine with a standalone engine. Many blockchains use LevelDB for their local data copies. (In Cosmos SDK apps, such as cheqd, this is the Golang implementation of LevelDB, but it can also be the C implementation of LevelDB, RocksDB, etc.) The net result is the same as if you were trying to run a database engine on a machine: the system needs to have fast read/write performance characteristics.
Validator nodes cannot easily be auto-scaled: Many traditional applications can be scaled horizontally (i.e., add more machines) or vertically (i.e., make the current machine beefier). While this is possible for validator nodes, it must be done with extreme caution to ensure there aren't two instances of the same validator active simultaneously. This can be perceived by network consensus as a sign of compromised validator keys and lead to the node being jailed for double-signing blocks. These concerns are less relevant for non-validating nodes, since they have a greater tolerance for missed blocks and can be scaled horizontally/vertically.
Docker/Kubernetes setups are not recommended for validators (unless you really know what you're doing): Primarily due to the double-signing risk, it's [not recommended to run validators using Docker](../setup-and-configure/docker.md) unless you have a strong DevOps practice. The other reason is related to the first point: a Docker setup adds an abstraction layer between the actual underlying file storage and the Docker volume engine. Depending on the Docker (or similar abstraction) storage drivers used, you may need to tune the storage/volume engine options for optimal performance.
⚠️ Please ensure you are running the latest stable release of cheqd-node, since newer releases may contain fixes/patches that improve node performance.
If you've got monitoring built in for your machine, a memory (RAM) leak would look like a graph where memory usage grows to 100%, falls off a cliff, grows to 100% again (the process repeats itself).
Normal memory usage may grow over time, but will not max out the available memory up to 100%. The graph below is taken from a server run by the cheqd team, over a 14-day period:
Figure 1: Graph showing normal memory usage on a cheqd-node server
A "CPU leak", i.e., where one or more process(es) consume increasing amounts of CPU is rarer, but could also happen if your machine has too few vCPUs and/or underpowered CPUs.
Figure 2: Graph showing normal CPU usage on a cheqd-node server
There's a catch here: depending on your monitoring tool, "100% CPU" could be measured differently! The graph above is from DigitalOcean's monitoring tools, which count the sum of all CPU capacity as "100%".
Other monitoring tools, such as Hetzner Cloud's, count each CPU as "100%", thus making the overall figure displayed in the graph (shown below) add up to number of CPUs x 100%.
Figure 4: Graph showing CPU usage on Hetzner cloud, adding up to more than 100%
Check what accounting metric your monitoring tool uses to get a realistic idea of whether your CPU is overloaded or not.
Load average is another useful measure of the responsiveness of a machine, regardless of the CPU usage.
If you don't have a monitoring application installed, you could use the built-in `top` or `htop` command.
Figure 2: Output of `htop` showing CPU and memory usage
`htop` is visually easier to understand than `top`, since it breaks down usage per CPU, as well as memory usage.
Unfortunately, this only provides real-time usage, rather than historical usage over time. Historical usage typically requires an external application, which many cloud providers offer, or a third-party monitoring tool such as Datadog.
Tendermint / Cosmos SDK also provides a Prometheus metrics interface, in case you already have a Prometheus instance or are comfortable using the software. This allows alerting based on actual metrics emitted by the node, rather than just top-level system metrics, which are a blunt instrument and don't go into detail.
If your system clock is out of synchronisation, this could cause Tendermint peer-to-peer connections to be rejected. This is similar to how SSL/TLS connections can get rejected with a "handshake error" in a normal browser when accessing secure (HTTPS) sites.
The net result of your system clock being out of sync is that your node:
Constantly tries to dial peers to try and fetch new blocks
Connection gets rejected by some/all of them
Keeps retrying the above until CPU/memory get exhausted, or the node process crashes
To check if your system clock is synchronised, use the following command (note: only copy the command, not the sample output):
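On systemd-based distributions such as Ubuntu, the `timedatectl` command reports this information:

```bash
timedatectl
```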
The timezone your machine is based in doesn't matter. You should check whether it reports `System clock synchronized: yes` and `NTP service: active`.
If either of these is not true, chances are that your system clock has fallen out of sync, and this may be the root cause of CPU/memory leaks. Follow this guide on setting up time synchronisation in Ubuntu to resolve the issue, and then monitor whether it fixes the high utilisation.
You may also need to allow outbound UDP traffic on port 123 explicitly, depending on your firewall settings. This port is used by the Network Time Protocol (NTP) service.
Properly-configured nodes should have bidirectional connectivity for network traffic. To check whether this is the case, open `<node-ip-address-or-dns-name>:<rpc-port>/net_info` in your browser, for example, rpc.cheqd.net/net_info.
Accessing this endpoint via your browser would only work if traffic to your RPC port is allowed through your firewall and/or you're accessing from an allowed origin. If this is not the case, you can also view the results for this endpoint from the same machine where your node service is running through the command line:
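For example, assuming the default Tendermint RPC port of 26657 (adjust if you've customised it), you could query the endpoint locally with curl:

```bash
curl -s http://localhost:26657/net_info
```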
The JSON output should be similar to below:
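An abridged, illustrative sketch of the structure (actual values and fields will differ):

```json
{
  "result": {
    "n_peers": "23",
    "peers": [
      {
        "node_info": { "moniker": "some-peer" },
        "is_outbound": true
      }
    ]
  }
}
```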
Look for the `n_peers` value at the beginning: this shows the number of peers your node is connected to. A healthy node would typically be connected to anywhere between 5-50 peers.
Next, search the results for the term `is_outbound`. The number of matches for this term should be exactly the same as the value of `n_peers`, since this is printed once per peer. The value of `is_outbound` may either be `true` or `false`.
A healthy node should have a mix of `is_outbound: true` as well as `is_outbound: false`. If your node reports only one of these values, it's a strong indication that your node is unidirectionally connected/reachable, rather than bidirectionally reachable.
Unidirectional connectivity may cause your node to work overtime to stay synchronised with the latest blocks on the network. You may fly by just fine, until there's a loss of connectivity to a critical mass of peers and your node goes offline.
Furthermore, your node might fetch the address book from seed nodes, and then try to resolve/contact them (and fail) due to connectivity issues.
Ideally, the IP address or DNS name set in the `external_address` property in your `config.toml` file should be externally reachable.
To determine whether this is true, install `tcptraceroute` on a machine other than your node. Unlike `ping`, which uses ICMP packets, `tcptraceroute` uses TCP, i.e., the actual protocol used for Tendermint P2P, to see if the destination is reachable. Success or failure in connectivity using `ping` doesn't prove whether your node is reachable, since firewalls along the path may have different rules for ICMP vs TCP.
Once you have `tcptraceroute` installed, from this external machine you can execute the following command in `tcptraceroute <hostname> <port>` format (note: only copy the actual command, not sample output):
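For example (hypothetical hostname; substitute your node's DNS name or IP address and P2P port):

```bash
tcptraceroute your-node.example.com 26656
```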
A successful run would result in `tcptraceroute` reaching the destination server on the required port (e.g., 26656) and then hanging up. If the connection times out consistently at any of the hops, this could indicate there's a firewall or router in the path dropping or blocking connections.
Your firewall rules on the machine and/or infrastructure (cloud) provider could cause connectivity issues. Ideally, your firewall rules should allow:
Inbound TCP traffic on at least port 26656 (or custom P2P port)
Optionally, inbound TCP traffic on other ports (RPC, gRPC, gRPC Web)
Outbound TCP traffic on all ports
Besides firewalls, depending on your network infrastructure, your connectivity issue instead might lie in a router or Network Address Translation (NAT) gateway.
Outbound TCP traffic is the default mode on many systems, since the port through which traffic gets routed out is dynamically determined during TCP connection establishment. In some cases, e.g., when using a NAT gateway in AWS, you may require more complex configuration (outside the scope of this document).
In addition to infrastructure-level firewalls, Ubuntu machines also come with a firewall on the machine itself. Typically, this is either disabled or set to allow all traffic by default.
Configuring OS-level firewalls is outside the scope of this document, but they can generally be checked/configured using the `ufw` utility:
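For example, to check whether `ufw` is active:

```bash
sudo ufw status
```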
If `ufw status` reports active, follow this guide on configuring firewall rules using `ufw` to allow traffic on the required ports (customise to the ports you need).
Another common reason for unidirectional node connectivity occurs when the correct P2P inbound/outbound traffic is allowed in firewalls, but DNS traffic is blocked by a firewall.
Your node needs the ability to perform DNS lookups to resolve peers that use a DNS name in their `external_address` property to IP addresses, since other peers may advertise their addresses as a DNS name. Seed nodes set in `config.toml` are a common example of this, since these are advertised as DNS names.
Your node may still scrape by if DNS resolution is blocked, for example, by obtaining an address book from a peer that has already done the DNS -> IP resolution. However, this approach is liable to break down if the resolution is incorrect or the entries are outdated.
To enable DNS lookups, your infrastructure/OS-level firewalls should allow:
Outbound UDP traffic on port 53: This is the most commonly-used port/protocol.
Outbound TCP traffic on port 853 (explicit rule not needed if you already allow TCP outbound on all ports): Modern DNS servers also allow DNS-over-TLS, which secures the connection using TLS to the DNS server. This can prevent malicious DNS servers from intercepting queries and giving spurious responses.
Outbound TCP traffic on port 443 (explicit rule not needed if you already allow TCP outbound on all ports): Similar to above, this enables DNS-over-HTTPS, if supported by your DNS resolver.
To check that DNS resolution works, try to run a DNS query and see if it returns a response. The following command will use the `dig` utility to look up and report your node's externally resolvable IP address via Cloudflare's 1.1.1.1 DNS resolver (note: only copy the command, not the sample output):
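One common approach (an assumption, since your exact command may differ) is to query Cloudflare's `whoami.cloudflare` TXT record, which returns the public IP address the query originated from:

```bash
dig +short txt ch whoami.cloudflare @1.1.1.1
```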
If the lookup fails, that could indicate DNS queries are blocked, or that there is no externally-resolvable IP where the node can be reached.
If your machine is provisioned with the bare minimum of CPU and RAM, you might find that the node struggles during times of high load, or slowly degrades over time. The minimum figures are recommended for a developer setup, rather than a production-grade node.
Typically, this problem is seen if you (non-exhaustive list):
Have only one CPU (bump to at least two CPUs)
Have only 1-2 GB of RAM (bump to at least 4 GB)
Most cloud providers should allow dynamically scaling these two factors without downtime. Monitor - especially over a period of days/weeks - whether this improves the situation or not. If the CPU/memory load behaviour remains similar, that likely indicates the issue is different.
Scaling CPU/memory without downtime may be different if you're running a physical machine, or if your cloud provider doesn't support it. Please follow the guidance of those hosting platforms.
Cosmos SDK and Tendermint have a concept of pruning, which allows reducing the disk utilisation and storage required by a node over time.
There are two kinds of pruning controls available on a node:
Tendermint pruning: This impacts the `~/.cheqdnode/data/blockstore.db/` folder by only retaining the last n specified blocks. Controlled by the `min-retain-blocks` parameter in `~/.cheqdnode/config/app.toml`.
Cosmos SDK pruning: This impacts the `~/.cheqdnode/data/application.db/` folder and prunes Cosmos SDK app-level state (a logical layer higher than Tendermint, which is just peer-to-peer). These are set by the `pruning` parameters in the `~/.cheqdnode/config/app.toml` file.
This can be done by modifying the pruning parameters inside the `/home/cheqd/.cheqdnode/config/app.toml` file.
⚠️ In order for either type of pruning to work, your node should be running the latest stable release of cheqd-node (at least v1.3.0+).
You can check which version of `cheqd-noded` you're running using:
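```bash
cheqd-noded version
```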
The output should be a version higher than v1.3.0. If you're on a lower version, you will need to upgrade the node binary (manually or via the installer) while retaining your settings.
The instructions below assume that the home directory for the `cheqd` user is set to the default value of `/home/cheqd`. If this is not the case for your node, please modify the commands below to the correct path.
Check the systemd service status:
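For example, if running with Cosmovisor (the default service name; substitute as noted below):

```bash
systemctl status cheqd-cosmovisor.service
```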
(Substitute with `cheqd-noded.service` if you're running a standalone node rather than with Cosmovisor.)
Switch to cheqd user and configuration directory

Switch to the `cheqd` user and then to the `.cheqdnode/config/` directory:
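A sketch of these two steps, assuming the default home directory:

```bash
sudo su cheqd
cd ~/.cheqdnode/config/
```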
Before you make changes to the pruning configuration, you might want to capture the existing disk usage first (note: only copy the command, not the sample output):
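For example, assuming the default data directory path:

```bash
du -h -d 1 /home/cheqd/.cheqdnode/data/
```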
The `du -h -d 1 ...` command above prints the disk usage for the specified folder down to one folder level of depth (`-d 1` parameter) and prints the output in GB/MB (`-h` parameter, which prints human-readable values).
Open app.toml file for editing

Open the `app.toml` file once you've switched to the `~/.cheqdnode/config/` folder, using your preferred text editor, such as `nano`:
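```bash
nano app.toml
```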
⚠️ If your node was configured to work with release version v1.2.2 or earlier, you may have been advised to run in `pruning="nothing"` mode due to a bug in Cosmos SDK.
The file should already be populated with values. Edit the `pruning` parameter value to one of the following:
pruning="nothing"
(highest disk usage): This will disable Cosmos SDK pruning and set your node to behave like an "archive" node. This mode consumes the highest disk usage.
pruning="default"
(recommended, moderate disk usage): This keeps the last 100 states in addition to every 500th state, and prunes on 10-block intervals. This configuration is safe to use on all types of nodes, especially validator nodes.
pruning="everything"
(lowest disk usage): This mode is not recommended when running validator nodes. This will keep the current state and also prune on 10 blocks intervals. This settings is useful for nodes such as seed/sentry nodes, as long as they are not used to query RPC/REST API requests.
pruning="custom"
(custom disk usage): If you set the pruning
parameter to custom
, you will have to modify two additional parameters:
pruning-keep-recent
: This will define how many recent states are kept, e.g., 250
(contrast this against default
).
pruning-interval
: This will define how often state pruning happens, e.g., 50
(contrast against default
, which does it every 10 blocks)
pruning-keep-every
: This parameter is deprecated in newer versions of Cosmos SDK. You can delete this line if it's present in your app.toml
file.
Although the parameters named `pruning-*` are only supposed to take effect if the pruning strategy is `custom`, in practice it seems that in Cosmos SDK v0.46.10 these settings still impact pruning. Therefore, you're advised to comment out these lines when using `default` pruning.
Example configuration file with recommended settings:
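A sketch of the relevant pruning section, based on the recommendations above (your `app.toml` will contain many other settings; only the pruning-related lines are shown):

```toml
pruning = "default"

# Only used when pruning = "custom"; commented out as advised above
# pruning-keep-recent = "0"
# pruning-interval = "0"
```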
Configuring the `min-retain-blocks` parameter to a non-zero value activates Tendermint pruning, which specifies the minimum block height to retain. By default, this parameter is set to `0`, which disables this feature.
Enabling this feature can reduce disk usage significantly. Be careful in setting a value, as it must be at least higher than approximately 250,000 blocks, calculated as follows:
Unbonding period (14 days) converted to seconds = approx. 1,210,000 seconds
...divided by an average block time of approx. 6 seconds/block = approx. 210,000 blocks
Adding a safety margin (in case average block time goes down) = approx. 250,000 blocks
Therefore, this setting must always be carefully updated to a valid value if the unbonding time on the network you're running on is different. (E.g., this value is different on mainnet vs testnet due to the different unbonding periods.)
Using the recommended values, on the current cheqd mainnet this section would look like the following:
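A sketch of the relevant line, using the safety-margin value calculated above:

```toml
min-retain-blocks = 250000
```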
Save and exit from the `app.toml` file. Working with text editors is outside the scope of this document, but in general under `nano` this would be Ctrl+X, "yes" to `Save modified buffer`, then Enter.
ℹ️ NOTE: You need root, or at least a user with super-user privileges (using the `sudo` prefix), for the commands below when interacting with systemd.
If you switched to the `cheqd` user, exit back out to a root/super-user:
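```bash
exit
```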
Usually, this will switch you back to `root` or another super-user (e.g., `ubuntu`).
Restart systemd service:
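For example, if running with Cosmovisor:

```bash
sudo systemctl restart cheqd-cosmovisor.service
```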
(Substitute with `cheqd-noded.service` above if you're running without Cosmovisor.)
Check the systemd service status and confirm that it's running:
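For example:

```bash
sudo systemctl status cheqd-cosmovisor.service
```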
If you activate/modify any pruning configuration above, the changes to disk usage are NOT immediate. Typically, it may take 1-2 days for the disk usage reduction to be progressively applied.
If you've gone from a higher disk usage setting to a lower disk usage setting, re-run the disk usage command to compare the breakdown of disk usage in the node data directory:
The output should show a difference in disk usage from the previous run (before the settings were changed) for the `application.db` folder (if the `pruning` parameters were changed) and/or the `blockstore.db` folder (if `min-retain-blocks` was changed).
This document provides guidance on how to configure and promote a cheqd node to validator status. Having a validator node is necessary to participate in staking rewards, block creation, and governance.
You must already have a running `cheqd-node` instance installed using one of the supported methods. Please also ensure the node is fully caught up with the latest ledger updates.
Create a new account key (recommended method)
Follow the guidance on creating a new account key.
When you create a new key, a new account address and mnemonic backup phrase will be printed. Keep the mnemonic phrase safe, as this is the only way to restore access to the account if the keyring cannot be recovered.
P.S. If you're using a Ledger Nano device, it may be helpful to follow the Ledger Nano guidance further below.
Get your node ID
Follow the guidance on fetching your node ID.
Get your validator account address
The validator account address is generated in Step 1 above when a new key is added. To show the validator account address, follow the guidance on listing account keys.
(The assumption above is that only one account/key has been added on the node. If you have multiple addresses, please jot down the preferred account address.)
Ensure your account has a positive balance
Get your node's validator public key
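A sketch, assuming the standard Cosmos SDK CLI command for displaying the validator public key:

```bash
cheqd-noded tendermint show-validator
```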
Promote your node to validator status by staking your token balance
You can decide how many tokens you would like to stake from your account balance. For instance, you may want to leave a portion of the balance for paying transaction fees (now and in the future).
To promote your node to validator status, submit a `create-validator` transaction to the network:
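A hedged, illustrative sketch of such a transaction (placeholder values shown in angle brackets; exact flags and the pubkey format may vary by CLI version):

```bash
cheqd-noded tx staking create-validator \
  --amount 1000000000ncheq \
  --from <your-key-alias> \
  --moniker <your-moniker> \
  --min-self-delegation 1 \
  --pubkey "$(cheqd-noded tendermint show-validator)" \
  --commission-rate 0.05 \
  --commission-max-rate 0.20 \
  --commission-max-change-rate 0.01 \
  --chain-id cheqd-mainnet-1 \
  --gas auto \
  --gas-adjustment 1.4 \
  --gas-prices 50ncheq
```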
Parameters required in the transaction above are:
`amount`: Amount of tokens to stake. You should stake at least 1 CHEQ (= 1,000,000,000 ncheq) to successfully complete a staking transaction.
`from`: Key alias of the node operator account that makes the initial stake.
`min-self-delegation`: Minimum amount of tokens that the node operator promises to keep bonded.
`pubkey`: Node's `bech32`-encoded validator public key from the previous step.
`commission-rate`: Validator's commission rate. The minimum is set to `0.05`.
`commission-max-rate`: Validator's maximum commission rate, expressed as a number with up to two decimal points. The value for this cannot be changed later.
`commission-max-change-rate`: Maximum rate of change of a validator's commission rate per day, expressed as a number with up to two decimal points. The value for this cannot be changed later.
`chain-id`: Unique identifier for the chain.
For cheqd's current mainnet, this is `cheqd-mainnet-1`.
For cheqd's current testnet, this is `cheqd-testnet-6`.
`gas`: Maximum gas to use for this specific transaction. Using `auto` uses Cosmos's auto-calculation mechanism, but it can also be specified manually as an integer value.
`gas-adjustment` (optional): If you're using `auto` gas calculation, this parameter multiplies the auto-calculated amount by the specified factor, e.g., `1.4`. This is recommended so that it leaves enough margin of error to add a bit more gas to the transaction and ensure it successfully goes through.
`gas-prices`: Maximum gas price set by the validator. Default value is `50ncheq`.
Please note the parameters below are just an example. You will see the commission they set, the maximum rate they set, and the rate of change. Please use this as a guide when thinking of your own commission configuration. This is important to get right, because the `commission-max-rate` and `commission-max-change-rate` cannot be changed after they are initially set.
Check that your validator node is bonded
You can check that the validator is correctly bonded via any node:
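For example, using the public RPC endpoint mentioned earlier (an illustrative choice; any reachable node works):

```bash
cheqd-noded query staking validators --node https://rpc.cheqd.net:443 --output json
```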
Find your node by `moniker` and make sure that `status` is `BOND_STATUS_BONDED`.
Check that your validator node is signing blocks and taking part in consensus
Query the latest block. Open `<node-address>:<rpc-port>/block` in a web browser. Make sure that there is a signature with your validator address in the signature list.
To use your Ledger Nano you will need to complete the following steps:
Set up your wallet by creating a PIN and passphrase, which must be stored securely to enable recovery if the device is lost or damaged.
Connect your device to your PC and update the firmware to the latest version using the Ledger Live application.
Install the Cosmos application using the software manager (Manager > Cosmos > Install).
Adding a new key

In order to use the hardware wallet address with the CLI, the user must first add it via `cheqd-noded`. This process only records the public information about the key.
To import the key, first plug in the device and enter the device PIN. Once you have unlocked the device, navigate to the Cosmos app on the device and open it.
To add the key, use the following command:
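A sketch, using the flags described in the note below (the key alias and index are placeholders):

```bash
cheqd-noded keys add <key-alias> --ledger --index 0
```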
Note: The `--ledger` flag tells the command line tool to talk to the Ledger device, and the `--index` flag selects which HD index should be used.
When running this command, the Ledger device will prompt you to verify the generated address. Once you have done this, you will get output in the following form:
On completion of the steps above, you will have successfully bonded a node as a validator to the cheqd testnet and be participating in staking/consensus.
Validator nodes can get "jailed" along with a penalty imposed (through its stake getting slashed). Unlike a proof-of-work (PoW) network (such as Ethereum or Bitcoin), proof-of-stake (PoS) networks (such as the cheqd network, built using ) use from validators.
There are two scenarios in which a validator could be jailed, one of which has more serious consequences than the other.
When a validator "misses" blocks or doesn't participate in consensus, it can get temporarily jailed. By enforcing this check, PoS networks like ours ensure that validators are actively participating in the operation of the network, ensuring that their nodes remain secure and up-to-date with the latest software releases, etc.
How this duration is calculated is defined in the network's genesis parameters. Jailing occurs based on a sliding time window (called the signed blocks window), calculated as follows.
The `signed_blocks_window` (set to 25,920 blocks on mainnet) defines the time window that is used to calculate downtime.
Within this window of 25,920 blocks, at least 50% of the blocks must be signed by a validator. This is defined in the genesis parameter `min_signed_per_window` (set to `0.5` for mainnet).
Therefore, if a validator misses 12,960 blocks within the last 25,920 blocks it meets the criteria for getting jailed.
To convert this block window to a time period, consider the block time of the network, i.e., at what frequency a new block is created. The current block time can be checked on the cheqd block explorer or any other explorer configured for the cheqd network.
Let's assume the block time was 6 seconds. This equates to 12,960 * 6 = 77,760 seconds = ~21.6 hours. This means if the validator is not participating in consensus for more than ~21.6 hours (in this example), it will get temporarily jailed.
Since the block time of the network varies with the number of nodes participating, network congestion, etc., it's always important to calculate the time period based on the latest block time figures.
1% of all of the stake delegated to the node is slashed, i.e., burned and disappears forever. This includes any stake delegated to the node by external parties. (If a validator gets jailed, delegators may decide to switch whom they delegate to.) The percentage of stake to be slashed is defined in the `slash_fraction_downtime` genesis parameter.
During the downtime of a validator node, it is common for the node to miss important software upgrades, since it is no longer in the active set of nodes on the main ledger.
Therefore, the first step is checking that your node is up to date. You can execute the following command:
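```bash
cheqd-noded version
```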
The expected response will be the latest cheqd-noded software release. At the time of writing, the expected response would be
Once again, check if your node is up to date, following Step 1.
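A sketch of the status query (run on the node itself):

```bash
cheqd-noded status
```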
Expected response: In the output, look for the text `latest_block_height` and note the value. Execute the status command above a few times and make sure the value of `latest_block_height` has increased each time.
The node is fully caught up when the parameter `catching_up` returns the output `false`.
Additionally, you can check this has worked:
It should show you a page with the field `"version": "0.6.0"`.
If everything is up to date and the node has fully caught up, you can now unjail your node using this command in the cheqd CLI:
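A hedged sketch (placeholder key alias; adjust the chain ID and gas settings to your network):

```bash
cheqd-noded tx slashing unjail \
  --from <your-key-alias> \
  --chain-id cheqd-mainnet-1 \
  --gas auto --gas-adjustment 1.4 --gas-prices 50ncheq
```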
This document offers guidance for validators looking to move their node to another instance, for example when changing VPS/hosting provider.
The main tool required for this is cheqd's interactive installer.
Before completing the move, ensure the following checks are completed:
Copy config directory and data/priv_validator_state.json to a safe place

Check that your `config` directory and `data/priv_validator_state.json` are copied to a safe place where they cannot be affected by the migration.
If you are using Cosmovisor, use `systemctl stop cheqd-cosmovisor`. For all other cases, use `systemctl stop cheqd-noded`.
This step is of the utmost importance. If your node is not stopped correctly and two nodes are running with the same private keys, this will lead to a double-signing infraction, which results in your node being permanently jailed (tombstoned) and a 5% slash of staked tokens.
You will also be required to complete a fresh setup of your node.
Only after you have completed the preparation steps to shut down the previous node should the installation begin.
Once this has been completed, you will be able to move your existing keys and settings back.
The answers to the installer questions could be:
Here you can pick the version that you want.
`Set path for cheqd user's home directory [default: /home/cheqd]:`
This is essentially a question about where the home directory, `.cheqdnode`, is located or will be. It is up to the operator where they want to store the `data`, `config` and `log` directories.
`Do you want to setup a new cheqd-node? (yes/no) [default: yes]:`
Here the expected answer is `No`. The main idea is that our old `config` directory will be used and `data` will be restored from the snapshot, so we don't need to set up a new one.
`Select cheqd network to join (testnet/mainnet) [default: mainnet]:`
For now, we have two networks, `testnet` and `mainnet`. Type whichever chain you want to use, or just keep the default by pressing Enter.
`Install cheqd-noded using Cosmovisor? (yes/no) [default: yes]:`
This is also up to the operator.
`CAUTION: Downloading a snapshot replaces your existing copy of chain data. Usually safe to use this option when doing a fresh installation. Do you want to download a snapshot of the existing chain to speed up node synchronisation? (yes/no) [default: yes]`
On this question we recommend answering `Yes`, because it will help you catch up with the other nodes in the network. That is the main feature of this installer.
Copy the `config` directory to `CHEQD_HOME_DIRECTORY/.cheqdnode/`
Copy `data/priv_validator_state.json` to `CHEQD_HOME_DIRECTORY/.cheqdnode/data`
Make sure that permissions are set to `cheqd:cheqd` for the `CHEQD_HOME_DIRECTORY/.cheqdnode` directory. The following command can set this: `sudo chown -R cheqd:cheqd CHEQD_HOME_DIRECTORY/.cheqdnode`
Here, `CHEQD_HOME_DIRECTORY` is the home directory for the `cheqd` user. By default it's `/home/cheqd`, or whatever you answered during installation for the second question.
You need to specify the new external address here by running the next command as the `cheqd` user:
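As a sketch, the property can also be set directly in `config.toml`, assuming the default home directory (substitute your public IP or DNS name):

```bash
sed -i 's|^external_address = .*|external_address = "<public-ip-or-dns-name>:26656"|' /home/cheqd/.cheqdnode/config/config.toml
```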
The last thing in this doc is to run the service and check that everything works fine:
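For example (see the note on `<service-name>` below):

```bash
sudo systemctl start <service-name>
```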
where `<service-name>` is the name of the service, depending on whether `Install Cosmovisor` was selected or not:
`cheqd-cosmovisor` if Cosmovisor was installed.
`cheqd-noded` if you kept `cheqd-noded` as-is with the debian package approach.
To check that the service works, please run the next command:
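```bash
sudo systemctl status <service-name>
```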
where `<service-name>` has the same meaning as above. The status should be `Active (running)`.
If you're running a validator node, it's important to back up your validator's keys and state, especially before attempting any updates or moving nodes.
Each validator node has three files/secrets that must be backed up in case you want to restore or move a node. Anything not listed in scope below can easily be restored from a snapshot or otherwise replaced with fresh copies; as such, this list is the bare minimum that needs to be backed up.
`$CHEQD_HOME` is the data directory for your node, which defaults to `/home/cheqd/.cheqdnode`.
The validator private key is one of the most important secrets that uniquely identifies your validator, and it is what this node uses to sign blocks, participate in consensus, etc. This file is stored under `$CHEQD_HOME/config/priv_validator_key.json`.
In the same folder as your validator private key, there's another key called `$CHEQD_HOME/config/node_key.json`. This key is used to derive the node ID for your validator.
Backing up this key means if you move or restore your node, you don't have to change the node ID in the configuration files any peers have. This is only relevant (usually) if you're running multiple nodes, e.g., a sentry or seed node.
For most node operators who run a single validator node, this node key is NOT important and can be refreshed/created as new. It is only used for Tendermint peer-to-peer communication. Hypothetically, if you created a new node key (say, when moving a node from one machine to another) and then restored the `priv_validator_key.json`, this is absolutely fine.
The validator private state is stored in the `data` folder, not the `config` folder where most other configuration files are kept, and therefore often gets missed by validator operators during backup. This file is stored at `$CHEQD_HOME/data/priv_validator_state.json`.
This file stores the last block height signed by the validator and is updated every time a new block is created. Therefore, it should only be backed up after stopping the node service; otherwise, the data stored within this file will be in an inconsistent state. An example validator state file is shown below:
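An illustrative sketch of the file's structure (values shown are placeholders):

```json
{
  "height": "9123456",
  "round": 0,
  "step": 3,
  "signature": "<base64-signature>",
  "signbytes": "<hex-signbytes>"
}
```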
If you forget to restore to the validator state file when restoring a node, or when restoring a node from snapshot, your validator will double-sign blocks it has already signed, and get jailed permanently ("tombstoned") with no way to re-establish the validator.
The software upgrades, and the block heights they were applied at, are stored in `$CHEQD_HOME/data/upgrade-info.json`. This file is used by Cosmovisor to track automated updates, and informs it whether it should attempt an upgrade/migration or not.
The simplest way to back up the validator secrets listed above is to display them in your terminal:
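A sketch, assuming the default paths listed above:

```bash
cat /home/cheqd/.cheqdnode/config/priv_validator_key.json
cat /home/cheqd/.cheqdnode/config/node_key.json
cat /home/cheqd/.cheqdnode/data/priv_validator_state.json
```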
You can copy the contents of the file displayed in terminal off the server and store it in a secure location.
To restore the files, open the equivalent file on the machine where you want to restore them, using a text editor (e.g., `nano`), and paste in the contents:
You also need a running HashiCorp Vault server cluster you can use to proceed with this guide.
Once you have Vault CLI set up on the validator, you need to set up environment variables in your terminal to configure which Vault server the secrets need to be backed up to.
Add the following variables to your terminal environment. Depending on which terminal you use (e.g., bash, shell, zsh, fish etc), you may need to modify the statements accordingly. You'll also need to modify the values according to your validator and Vault server configuration.
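A minimal sketch using the standard Vault CLI environment variables (the server URL and token are placeholders; the backup script may expect additional variables):

```bash
export VAULT_ADDR="https://vault.example.com:8200"  # your Vault server URL (placeholder)
export VAULT_TOKEN="<your-vault-token>"             # token with access to the 'cheqd' secrets path
```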
Make the script executable:
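For example (the script filename is a placeholder; use the name of the file you downloaded):

```bash
chmod +x <vault-backup-script>.sh
```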
We recommend that you open the script using an editor such as `nano` and confirm that you're happy with the environment variables and settings in it.
Before backing up your secrets, it's important to stop the cheqd node service or Cosmovisor service; otherwise, the validator private state will be left in an inconsistent state and result in an incorrect backup.
If you're running via Cosmovisor (the default option), this can be stopped using:
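```bash
sudo systemctl stop cheqd-cosmovisor.service
```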
Or, if running as a standalone service:
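```bash
sudo systemctl stop cheqd-noded.service
```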
Once you've confirmed the cheqd service is stopped, execute the Vault backup script:
We use the HashiCorp Vault KV v2 secrets engine. Please make sure that it's enabled and mounted under the `cheqd` path.
To restore backed-up secrets from a Vault server, you can use the same script with the `-r` ("restore") flag:
If you're restoring to a different machine than the original machine the backup was done from, you'll need to go through the pre-requisites, CLI setup step, and download the Vault backup script to the new machine as well.
In this scenario, you're also recommended to disable the service (e.g., `cheqd-cosmovisor`) on the original machine. This ensures that if the (original) machine gets restarted, `systemd` does not try to start the node service, as this could potentially result in two validators running with the same validator keys (which will result in tombstoning).
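For example, on the original machine:

```bash
sudo systemctl disable cheqd-cosmovisor.service
```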
Once you've successfully restored, you can enable the service (e.g., `cheqd-cosmovisor`) on the new machine:
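For example, on the new machine:

```bash
sudo systemctl enable cheqd-cosmovisor.service
sudo systemctl start cheqd-cosmovisor.service
```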
When you set up your validator node, it is recommended that you only stake a very small amount from the actual validator node. This is to minimise the tokens that could be locked in an unbonding period, were your node to experience significant downtime.
You should delegate the rest of your tokens to your Validator node from a different key alias.
How do I do this?
You can add as many additional keys as you want using the following function:
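A sketch (the key alias is a placeholder):

```bash
cheqd-noded keys add <new-key-alias>
```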
When you create a new key, a mnemonic phrase and account address will be printed. Keep the mnemonic phrase safe, as this is the only way to restore access to the account if the keyring cannot be recovered.
You can view all created keys using the following function:
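```bash
cheqd-noded keys list
```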
You are able to transfer tokens between key accounts using the following function:
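A hedged sketch of a token transfer (all values are placeholders; amounts are in ncheq):

```bash
cheqd-noded tx bank send <from-key-alias> <destination-account-address> 1000000000ncheq \
  --chain-id cheqd-mainnet-1 --gas auto --gas-adjustment 1.4 --gas-prices 50ncheq
```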
You can then delegate to your validator node using the following function:
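A hedged sketch of a delegation (the validator operator address and amounts are placeholders):

```bash
cheqd-noded tx staking delegate <validator-operator-address> 1000000000ncheq \
  --from <delegator-key-alias> \
  --chain-id cheqd-mainnet-1 --gas auto --gas-adjustment 1.4 --gas-prices 50ncheq
```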
We use a second/different virtual machine to create these new accounts/wallets. In this instance, you only need to install cheqd-noded as a binary; you don't need to run it as a full node.
And then, since this VM is not running a node, you can append the `--node` parameter to any request and target the RPC port of the VM running the actual node (see the example after the list below).
That way:
The second node doesn't need to sync the full blockchain; and
You can separate out the keys/wallets, since the IP address of your actual node will be public by definition and people can attack it or try to break in
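For example, a query run from the second VM targeting a node's RPC endpoint (the account address is a placeholder; the endpoint is illustrative):

```bash
cheqd-noded query bank balances <account-address> --node https://rpc.cheqd.net:443
```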
I’d recommend at least 250 GB at the current chain size. You can choose to go higher, so that you don’t need to revisit this. Within our team, we set alerts on our cloud providers/Datadog to raise alerts when nodes reach 85-90% storage used which allows us to grow the disk storage as and when needed, as opposed to over-provisioning.
Here’s the relevant section in the file:
Green: 90-100% blocks signed
Amber: 70-90% blocks signed
Red: 1-70% blocks signed
Please join the channel 'mainnet-alerts' on the cheqd community slack.
Yes! Here are a few other suggestions:
You can check the current status of disk storage used on all mount points manually through the output of df -hT
The default storage path for cheqd-node is under `/home/cheqd`. By default, most hosting/cloud providers will set this up on a single disk volume under the `/` (root) path. If you move and mount `/home` on a separate disk volume, this will allow you to expand the storage independently of the main volume. This can sometimes make a difference, because if you leave the `/home` tree mounted on the `/` mount path, many cloud providers will force you to bump the whole virtual machine category - including the CPU and RAM - to a more expensive tier in order to get additional disk storage on `/`. This can also result in over-provisioning since the additional CPU/RAM is likely not required.
You can also optimise the amount of logs stored, in case the logs are taking up too much space. There’s a few techniques here:
In `config.toml` you can set the logging level to `error` for less logging than the default, which is `info`. (The other possible value for this is `debug`.)
Set the log rotation configuration to use different/custom parameters, such as what file size to rotate at, the number of days to retain, etc.
As a Validator Node, you should be familiar with the concept of commission. This is the percentage of tokens that you take as a fee for running the infrastructure on the network. Token holders are able to delegate tokens to you, with an understanding that they can earn staking rewards, but as consideration, you are also able to earn a flat percentage fee of the rewards on the delegated stake they supply.
There are three commission values you should be familiar with:
The first is the maximum rate of commission that you will be able to move upwards to.
Please note that this value cannot be changed once your Validator Node is set up, so be careful and do your research.
The second parameter is the maximum amount of commission you will be able to increase by within a 24 hour period. For example if you set this as 0.01, you will be able to increase your commission by 1% a day.
The third value is your current commission rate.
Points to note: a lower commission rate = higher likelihood of more token holders delegating tokens to you, because they will earn more rewards. However, with a very low commission rate, you might find in future that the gas fees on the network outweigh the rewards made through commission.
A higher commission rate = you earn more tokens from the existing stake + delegated tokens. The tradeoff is that it may appear less desirable for new delegators when compared to other validators.
When setting up the Validator, the Gas parameter is the amount of tokens you are willing to spend on gas.
For simplicity, we suggest setting:
AND setting:
These parameters, together, make it highly likely that the transaction will go through and not fail. Having the gas set at `auto` without the gas adjustment puts the transaction in danger of failing if gas prices increase.
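Based on the guidance above, those two settings might look like this on a transaction (illustrative):

```bash
--gas auto --gas-adjustment 1.4
```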
Gas prices also come into play here too: the lower your gas price, the more likely that your node will be considered in the active set for rewards.
We suggest that the gas price you set should fall within this recommended range:
Low: 25ncheq
Medium: 50ncheq
High: 100ncheq
Your public name is also known as your moniker. You are able to change this, as well as the description of your node, using the following function:
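A hedged sketch (flag names can vary between Cosmos SDK versions; the values are placeholders):

```bash
cheqd-noded tx staking edit-validator \
  --new-moniker "<new-public-name>" \
  --details "<description of your node>" \
  --from <your-key-alias> \
  --chain-id cheqd-mainnet-1 --gas auto --gas-adjustment 1.4 --gas-prices 50ncheq
```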
Yes, this is how you should do it. Since it's a public permissionless network, there's no way of pre-determining what the set of IP addresses will be, as entities may leave and join the network. We suggest using a TCP/network load balancer and keeping your VM/node in a private subnet for security reasons. The LB then becomes your network edge, which, if you're hosting on a cloud provider, they manage/patch/run for you.
Instructions on how to use text editors such as `nano` are out of the scope of this document. If you're unsure how to use one, consider following an online tutorial.
Ensure you've upgraded to the latest stable release. When running a validator node, you're recommended to change this value to `pruning="default"`.
Our documentation has a section on how to check service status.
Follow the guidance on querying account balances to check that your account is correctly showing the CHEQ testnet tokens provided to you.
The node validator public key is required as a parameter for the next step. More details on the validator public key are mentioned in the documentation.
When setting parameters such as the commission rate, a good benchmark is to consider the values set by other validators on the network.
Find out your validator address and look for `"ValidatorInfo":{"Address":"..."}`:
Learn more about what you can do with your new validator node in the rest of this documentation.
If your node is not up to date, please upgrade it first.
In general, the installer allows you to install the binary and download/extract the latest chain snapshot.
If the installation process was successful, the next step is to restore the configurations from the safe place you copied them to:
HashiCorp Vault is an open-source project that allows server admins to run a secure, access-controlled off-site backup for secrets. You can either self-host it or use a hosted cloud offering.
Before you get started with this guide, make sure you've set up the Vault CLI on the validator you want to run backups from.
Setting up a HashiCorp Vault cluster is outside the scope of this documentation, since it can vary a lot depending on your setup. If you don't already have this set up, HashiCorp's own tutorials and documentation are the best place to get started.
Download the Vault backup script from GitHub:
Yes, you can. You can do this by changing the pruning settings to more aggressive parameters in the `app.toml` file.
Please also see this thread on the trade-offs involved. This will help to some extent, but please note that it is a general property of all blockchains that the chain size will grow. We recommend using alerting policies to grow the disk storage as needed, which is less likely to require higher spend due to over-provisioning.
One of the simplest ways to do this is to use a block explorer, with a more detailed view available on the per-validator page. The condition is scored based on the percentage of blocks signed:
We have also built a tool internally that takes the condition score output from the block explorer GraphQL API and makes it available as a simple REST API, which can be used to send alerts on Slack, Discord, etc.; we have this set up on our own Slack/Discord.
In addition to that, there is a Prometheus metrics interface (for those who already use Prometheus for monitoring or want to set one up) that has metrics for monitoring node status (and a lot more).
You can have a look at other projects on Cosmos to get an idea of the percentages that nodes set as commission.