
Guide for validators

This document provides guidance on how to configure and promote a cheqd node to validator status. Running a validator node is necessary to participate in staking rewards, block creation, and governance.

Preparation steps

Step 1: Ensure you have a cheqd node installed as a service

You must already have a running cheqd-node instance, installed using one of the supported methods:

  1. Interactive installer (recommended method)

  2. Docker install

Please also ensure the node is fully caught up with the latest ledger updates.

Step 2: Generate a new account key

Follow the guidance on using the cheqd CLI to manage keys to create a new account key (`cheqd-noded keys add <alias>`).

When you create a new key, a new account address and mnemonic backup phrase will be printed. Keep the mnemonic phrase safe, as this is the only way to restore access to the account if the keyring cannot be recovered.

P.S. If you are using a Ledger Nano device, see the Using Ledger Nano device section below.

  1. Get your node ID

Follow the guidance on using the cheqd CLI to manage nodes to fetch your node ID (`cheqd-noded tendermint show-node-id`).

  2. Get your validator account address

The validator account address is generated in Step 1 above when a new key is added. To show the validator account address, follow the cheqd CLI guide on managing accounts (`cheqd-noded keys list`).

Promote a node to validator after acquiring CHEQ tokens for staking

  1. Ensure your account has a positive balance

Follow the guidance on using the cheqd CLI to manage accounts to check that your account is correctly showing the CHEQ testnet tokens provided to you (`cheqd-noded query bank balances <address>`).

  2. Get your node's validator public key

The node validator public key is required as a parameter for the next step. More details on the validator public key are mentioned in the cheqd CLI guide on managing nodes (`cheqd-noded tendermint show-validator`).

Parameters required in the validator.json file are:

  • amount: Amount of tokens to stake. You should stake at least 1 CHEQ (= 1,000,000,000ncheq) to successfully complete a staking transaction.

  • from: Key alias of the node operator account that makes the initial stake
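As a quick sanity check on the denomination arithmetic above, ncheq is the base denomination and 1 CHEQ = 10^9 ncheq. A minimal sketch of the conversion:

```shell
# Convert a CHEQ stake amount into the base denomination ncheq
# (1 CHEQ = 1,000,000,000 ncheq)
CHEQ=1
NCHEQ=$((CHEQ * 1000000000))
echo "${NCHEQ}ncheq"   # prints 1000000000ncheq, the format used in the amount field
```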

Please note that the parameters mentioned above are just examples.

When setting parameters such as the commission rate, a good benchmark is to consider the commission rates set by validators on existing networks such as the Cosmos ATOM chain.

You will see the commission rate they set, their maximum rate, and the maximum rate of change. Use these as a guide when thinking about your own commission configuration. This is important to get right, because commission-max-rate and commission-max-change-rate cannot be changed after they are initially set.

Submit the create-validator transaction to the chain: `cheqd-noded tx staking create-validator path/to/validator.json --from key-alias-name --gas auto --gas-adjustment 1.4 --gas-prices="5000ncheq" --node https://rpc.cheqd.net:443`

  1. Check that your validator node is bonded

Whether the validator is correctly bonded can be checked via any node, by running `cheqd-noded query staking validators --node <any-rpc-url>`:

    Find your node by moniker and make sure that status is BOND_STATUS_BONDED.

Using Ledger Nano device

To use your Ledger Nano you will need to complete the following steps:

  • Set up your wallet by creating a PIN and passphrase, which must be stored securely to enable recovery if the device is lost or damaged.

  • Connect your device to your PC and update the firmware to the latest version using the Ledger Live application.

  • Install the Cosmos application using the software manager (Manager > Cosmos > Install).

To import the key, first plug in the device and enter the device PIN. Once you have unlocked the device, navigate to the Cosmos app on the device and open it.

To add the key, use the following command: `cheqd-noded keys add <name for the key> --ledger`

Note

The --ledger flag tells the command-line tool to talk to the Ledger device, and the --index flag selects which HD index should be used.

When running this command, the Ledger device will prompt you to verify the generated address. Once you have done this, you will get an output in the following form:

Next steps

On completion of the steps above, you will have successfully bonded a node as a validator to the cheqd testnet and be participating in staking/consensus.

Learn more about what you can do with your new validator node in the cheqd CLI guide.

(The assumption above is that there is only one account / key that has been added on the node. In case you have multiple addresses, please note down the preferred account address.)

Promote your node to validator status by staking your token balance

You can decide how many tokens you would like to stake from your account balance. For instance, you may want to leave a portion of the balance for paying transaction fees (now and in the future).

To promote your node to validator, first prepare a JSON file named validator.json with all the validator information:

  • min-self-delegation: Minimum amount of tokens that the node operator promises to keep bonded

  • pubkey: Node's bech32-encoded validator public key from the previous step

  • commission-rate: Validator's commission rate. The minimum is set to 0.05.

  • commission-max-rate: Validator's maximum commission rate, expressed as a number with up to two decimal points. The value for this cannot be changed later.

  • commission-max-change-rate: Maximum rate of change of a validator's commission rate per day, expressed as a number with up to two decimal points. The value for this cannot be changed later.

  • Check that your validator node is signing blocks and taking part in consensus

    Find out your validator node's hex-encoded address and look for "ValidatorInfo":{"Address":"..."}:

    Query the latest block: open `<node-address>:<rpc-port>/block` in a web browser and make sure that there is a signature with your validator address in the signature list.

    Adding a new key: in order to use the hardware wallet address with the CLI, the user must first add it via cheqd-noded. This process only records the public information about the key.

Commands and example files referenced in this guide:

```bash
# Create a new account key
cheqd-noded keys add <alias>

# Fetch the node ID
cheqd-noded tendermint show-node-id

# List account addresses
cheqd-noded keys list

# Check account balances
cheqd-noded query bank balances <address>

# Show the node's validator public key
cheqd-noded tendermint show-validator
```

Example validator.json:

```json
{
	"pubkey": {"@type":"/cosmos.crypto.ed25519.PubKey","key":"4anVUO8WhmRMqG1t4z6VxqmqZL3V7q6HqucjwZePiUw="},
	"amount": "1000000000ncheq",
	"moniker": "mainnet-validator-name",
	"identity": "optional identity signature (ex. UPort or Keybase)",
	"website": "validator's (optional) website",
	"security": "validator's (optional) security contact email",
	"details": "validator's (optional) details",
	"commission-rate": "0.1",
	"commission-max-rate": "0.2",
	"commission-max-change-rate": "0.01",
	"min-self-delegation": "1"
}
```

Submit the create-validator transaction:

```bash
cheqd-noded tx staking create-validator path/to/validator.json --from key-alias-name --gas auto --gas-adjustment 1.4 --gas-prices="5000ncheq" --node https://rpc.cheqd.net:443
```

Check validator status via any node:

```bash
cheqd-noded query staking validators --node <any-rpc-url>
```

Add a key from a Ledger device, with example output:

```bash
$ cheqd-noded keys add test --ledger
- name: test
  type: ledger
  address: cheqd1zx9a7rsrmy5a2hakas0vnfwpadqwp3m327f2yt
  pubkey: '{"@type":"/cosmos.crypto.secp256k1.PubKey","key":"Akm0MdDZpTVltoCpRmmWd/wxiosA9edjPlbNcirs4YO1"}'
  mnemonic: ""
```

Show the validator consensus address:

```bash
cheqd-noded tendermint show-address
```

    Optimising disk storage with pruning

    Context

Cosmos SDK and Tendermint have a concept of pruning, which allows reducing the disk utilisation and storage required on a node.

    There are two kinds of pruning controls available on a node:

    1. Tendermint pruning: This impacts the ~/.cheqdnode/data/blockstore.db/ folder by only retaining the last n specified blocks. Controlled by the min-retain-blocks parameter in ~/.cheqdnode/config/app.toml.

    2. Cosmos SDK pruning: This impacts the ~/.cheqdnode/data/application.db/ folder and prunes Cosmos SDK app-level state (a logical layer higher than Tendermint, which is just peer-to-peer). These are set by the pruning parameters in the ~/.cheqdnode/config/app.toml file.

Both can be configured by modifying the pruning parameters inside the /home/cheqd/.cheqdnode/config/app.toml file.

    Instructions

⚠️ In order for either type of pruning to work, your node should be running the latest stable release of cheqd-node (at least v1.3.0+).

You can check which version of cheqd-noded you're running using `cheqd-noded version`.

The output should be a version higher than v1.3.0. If you're on a lower version, you can either manually upgrade the node binary or use the interactive installer to execute an upgrade (following similar instructions as mentioned in the installer guide) while retaining settings.

    The instructions below assume that the home directory for the cheqd user is set to the default value of /home/cheqd. If this is not the case for your node, please modify the commands below to the correct path.

    Check systemd service status

Check the systemd service status with `systemctl status cheqd-cosmovisor.service`.

(Substitute with cheqd-noded.service if you're running a standalone node rather than with Cosmovisor.)

    Switch to the cheqd user and configuration directory

Switch to the cheqd user (`sudo su cheqd`) and then change to the ~/.cheqdnode/config/ directory.

    Display current directory usage/size for the node data folder

Before you make changes to pruning configuration, you might want to capture the existing usage first by running `du -h -d 1 /home/cheqd/.cheqdnode/data/ 2>/dev/null` (in the example output later in this document, only copy the command bit, not the full line with the shell prompt).

    The du -h -d 1 ... command above prints the disk usage for the specified folder down to one folder level depth (-d 1 parameter) and prints the output in GB/MB (-h parameter, which prints in human-readable values).
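As a self-contained illustration of those du flags (run against a throwaway temporary directory, not your node's data folder):

```shell
# Create a throwaway directory tree that mimics the node data layout
DEMO=$(mktemp -d)
mkdir -p "$DEMO/blockstore.db" "$DEMO/application.db"
# Write 64 KiB of dummy data so the folder sizes differ
dd if=/dev/zero of="$DEMO/blockstore.db/blob" bs=1024 count=64 2>/dev/null

# -d 1: one folder level deep; -h: human-readable sizes
du -h -d 1 "$DEMO"
```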

    Open the app.toml file for editing

Once you've switched to the ~/.cheqdnode/config/ folder, open the app.toml file using your preferred text editor, such as nano (`nano app.toml`).

Instructions on how to use text editors such as nano are out of the scope of this document. If you're unsure how to use it, consider following guides on how to use nano.

    Choose a Cosmos SDK pruning strategy

    ⚠️ If your node was configured to work with release version v1.2.2 or earlier, you may have been advised to run in pruning="nothing" mode due to a bug in Cosmos SDK.

Ensure you've upgraded to the latest stable release (using the installer or otherwise). When running a validator node, you're recommended to change this value to pruning="default".

The file should already be populated with values. Edit the pruning parameter value to one of the following:

    1. pruning="nothing" (highest disk usage): This will disable Cosmos SDK pruning and set your node to behave like an "archive" node. This mode consumes the highest disk usage.

    2. pruning="default" (recommended, moderate disk usage): This keeps the last 100 states in addition to every 500th state, and prunes on 10-block intervals. This configuration is safe to use on all types of nodes, especially validator nodes.

Although the parameters named pruning-* are only supposed to take effect if the pruning strategy is custom, in practice it seems that in Cosmos SDK v0.46.10 these settings still impact pruning. Therefore, you're advised to comment out these lines when using default pruning.

    Example configuration file with recommended settings:
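A minimal sketch of the relevant app.toml section, assuming the recommended default strategy described above:

```toml
# Cosmos SDK state pruning strategy
pruning = "default"

# These are applied if and only if the pruning strategy is custom;
# commented out as recommended above.
#pruning-keep-recent = "0"
#pruning-interval = "0"
```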

    Choose a Tendermint pruning strategy

Configuring the min-retain-blocks parameter to a non-zero value activates Tendermint pruning, with the value specifying the minimum block height to retain. By default, this parameter is set to 0, which disables this feature.

Enabling this feature can reduce disk usage significantly. Be careful in setting a value, as it must be at least higher than 250,000, as calculated below:

    • Unbonding time (14 days) converted to seconds = approx. 1,210,000 seconds

    • ...divided by an average block time of approx. 6s / block

    • = approx. 210,000 blocks

    • Adding a safety margin (in case average block time goes down) = approx. 250,000 blocks

    Therefore, this setting must always be updated to carefully match a valid value in case the unbonding time on the network you're running on is different. (E.g., this value is different on mainnet vs testnet due to different unbonding period.)
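The block-count calculation above can be sketched as simple shell arithmetic (the 14-day unbonding period and 6-second block time are the example values used above; substitute the live values for your network):

```shell
UNBONDING_SECONDS=$((14 * 24 * 60 * 60))   # 14 days ≈ 1,210,000 seconds
AVG_BLOCK_TIME=6                           # approx. seconds per block
MIN_BLOCKS=$((UNBONDING_SECONDS / AVG_BLOCK_TIME))
echo "$MIN_BLOCKS"   # ≈ 210,000 blocks, before adding a safety margin
```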

    Using the recommended values, on the current cheqd mainnet this section would look like the following:

    Save changes in the file

    Save and exit from the app.toml file. Working with text editors is outside the scope of this document, but in general under nano this would be Ctrl+X, "yes" to Save modified buffer, then Enter.

    Switch user and restart service

ℹ️ NOTE: You need root, or at least a user with super-user privileges (using the sudo prefix), for the commands below when interacting with systemd.

    If you switched to the cheqd user, exit out to a root/super-user:

    Usually, this will switch you back to root or other super-user (e.g., ubuntu).

    Restart systemd service:

    (Substitute with cheqd-noded.service above if you're running without Cosmovisor)

    Check the systemd service status and confirm that it's running:

Our installer guide has a section on how to check service status.

    Next steps

If you activate/modify any pruning configuration above, the changes to disk usage are NOT immediate. Typically, it may take 1-2 days for the disk usage reduction to be progressively applied.

If you've gone from a higher disk usage setting to a lower one, re-run the disk usage command to compare the breakdown of disk usage in the node data directory:

The output should show a difference in disk usage from the previous run for the application.db folder (if the pruning parameters were changed) and/or the blockstore.db folder (if min-retain-blocks was changed).

    Third-Party Snapshots for Reference

Instead of syncing a full node from scratch, validators can leverage pruned snapshots provided by trusted third-party services. These snapshots are substantially smaller (usually 3-10 GB, compared to hundreds of GB for a full node), allowing faster setup and reduced storage requirements.

    Available Third-Party Snapshots

    1. NodeStake

      • Website: https://nodestake.org/cheqd

      • Instructions: Navigate to the Snapshot tab for detailed steps on downloading and using their snapshots.

  3. pruning="everything" (lowest disk usage): This mode is not recommended when running validator nodes. It will keep only the current state and also prune on 10-block intervals. This setting is useful for nodes such as seed/sentry nodes, as long as they are not used to serve RPC/REST API requests.

  4. pruning="custom" (custom disk usage): If you set the pruning parameter to custom, you will have to modify two additional parameters:

    • pruning-keep-recent: This will define how many recent states are kept, e.g., 250 (contrast this against default).

    • pruning-interval: This will define how often state pruning happens, e.g., 50 (contrast against default, which does it every 10 blocks).

    • pruning-keep-every: This parameter is deprecated in newer versions of Cosmos SDK. You can delete this line if it's present in your app.toml file.

  • Adding a safety margin (in case average block time goes down) = approx. 250,000 blocks

  2. Lavender.Five Nodes

    • Website: https://services.lavenderfive.com/mainnet/cheqd/snapshot

    • Instructions: Visit the page for links and commands to restore a node from their pruned snapshots.

  3. STAVR

    • Website: https://stavr-team.gitbook.io/nodes-guides/mainnets/Cheqd/statesync-snapshot

    • Instructions: Visit the page for links and commands to restore a node from their pruned snapshots.

Commands and example output referenced in this guide:

```bash
# Check the node version
cheqd-noded version

# Check the systemd service status
systemctl status cheqd-cosmovisor.service

# Switch to the cheqd user and home directory
sudo su cheqd
cd ~
```

Display disk usage for the node data folder (only copy the command after the prompt):

```bash
root@hostname ~# du -h -d 1 /home/cheqd/.cheqdnode/data/ 2>/dev/null
60G     /home/cheqd/.cheqdnode/data/blockstore.db
80K     /home/cheqd/.cheqdnode/data/evidence.db
1021M   /home/cheqd/.cheqdnode/data/cs.wal
274G    /home/cheqd/.cheqdnode/data/application.db
50G     /home/cheqd/.cheqdnode/data/tx_index.db
72K     /home/cheqd/.cheqdnode/data/snapshots
45G     /home/cheqd/.cheqdnode/data/state.db
428G    /home/cheqd/.cheqdnode/data/
```

Open app.toml for editing:

```bash
cd ~/.cheqdnode/config/
nano app.toml
```

Example app.toml pruning settings:

```toml
pruning = "default"

# These are applied if and only if the pruning strategy is custom.
#pruning-keep-recent = "0"
#pruning-interval = "0"

# Note: Tendermint block pruning is dependant on this parameter in conjunction
# with the unbonding (safety threshold) period, state pruning and state sync
# snapshot parameters to determine the correct minimum value of
# ResponseCommit.RetainHeight.
min-retain-blocks = 250000
```

Restart the service and re-check disk usage:

```bash
# Exit back to a super-user and restart the service
exit
sudo systemctl restart cheqd-cosmovisor.service
systemctl status cheqd-cosmovisor.service

# Re-check disk usage after pruning has been applied
du -h -d 1 /home/cheqd/.cheqdnode/data/ 2>/dev/null
```

    Unjailing a jailed validator

Validator nodes can get "jailed" along with a penalty imposed (through its stake getting slashed). Unlike a proof-of-work (PoW) network (such as Ethereum or Bitcoin), proof-of-stake (PoS) networks (such as the cheqd network, built using Cosmos SDK) use stake slashing as a mechanism of enforcing good on-chain behaviour from validators.

    Conditions that cause a validator to be jailed

    There are two scenarios in which a validator could be jailed, one of which has more serious consequences than the other.

    Temporary: Jailed due to downtime

    When a validator "misses" blocks or doesn't participate in consensus, it can get temporarily jailed. By enforcing this check, PoS networks like ours ensure that validators are actively participating in the operation of the network, ensuring that their nodes remain secure and up-to-date with the latest software releases, etc.

How this duration is calculated is defined in the genesis parameters of the cheqd network. Jailing occurs based on a sliding time window (called the infraction window), calculated as follows.

    1. The signed_blocks_window (set to 25,920 blocks on mainnet) defines the time window that is used to calculate downtime.

    2. Within this window of 25,920 blocks, at least 50% of the blocks must be signed by a validator. This is defined in the genesis parameter min_signed_per_window (set to 0.5 for mainnet).

    What happens when a validator is temporarily jailed for downtime

    1. 1% of all of the stake delegated to the node is slashed, i.e., burned and disappears forever. This includes any stake delegated to the node by external parties. (If a validator gets jailed, delegators may decide to switch whom they delegate to.) The percentage of stake to be slashed is defined in the slash_fraction_downtime genesis parameter.

    Step 1: Check your Node is up to date

During the downtime of a validator node, it is common for the node to miss important software upgrades, since it is no longer in the active set of nodes on the main ledger.

Therefore, the first step is checking that your node is up to date. You can execute the command `cheqd-noded version`.

The expected response will be the latest cheqd-noded software release. At the time of writing, this would be 0.6.0.

    Step 2: Upgrading to latest software

If your node is not up to date, please follow the instructions here to upgrade to the latest software.

    Step 3: Confirming the Node is up to date

    Once again, check if your node is up to date, following Step 1.

    Expected response: In the output, look for the text latest_block_height and note the value. Execute the status command above a few times and make sure the value of latest_block_height has increased each time.

    The node is fully caught up when the parameter catching_up returns the output false.

Additionally, you can check this has worked by opening http://<your node ip or domain name>:26657/abci_info in a web browser.

It shows a page including the field "version": "0.6.0".

    Step 4: Unjailing command

If everything is up to date, and the node has fully caught up, you can now unjail your node using this command in the cheqd CLI: `cheqd-noded tx slashing unjail --from <address_alias> --gas auto --gas-adjustment 1.4 --gas-prices 5000ncheq --chain-id cheqd-mainnet-1`

    Re-enable pruning and recovering node db

This guide is specifically made for the validators/node operators affected by the pruning issue encountered following our v4.x upgrade. This issue required some validators/node operators to disable pruning entirely.

Re-enabling pruning (to default/custom from nothing) on the affected nodes will cause the nodes to halt again as database pruning resumes. To avoid this, it is recommended to reset your node db and recover it using state sync or a db snapshot. Nodes that weren't affected by the pruning issue, or that had recovered by the time the pruning fix was rolled out, do not have to undergo this procedure.

    Following this procedure will significantly reduce the disk space required for your node’s regular operations, thereby lowering operational costs (on the nodes we manage, we observed storage usage drop from 700+ GB to under 10 GB). Additionally, running a node with less disk usage will likely improve performance.

    Therefore, if a validator misses 12,960 blocks within the last 25,920 blocks it meets the criteria for getting jailed.
1. To convert this block window to a time period, consider the block time of the network, i.e., at what frequency a new block is created. The latest block time can be found on our mainnet explorer or any other explorer configured for the cheqd network (such as Ping Wallet).

    2. Let's assume the block time was 6 seconds. This equates to 12,960 * 6 = 77,760 seconds = ~21.6 hours. This means if the validator is not participating in consensus for more than ~21.6 hours (in this example), it will get temporarily jailed.

3. Since the block time of the network varies with the number of nodes participating, network congestion, etc., it's always important to calculate the time period based on the latest block time figures.
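The jailing thresholds above can be reproduced with shell arithmetic (the window and percentage are the mainnet genesis values quoted earlier; the 6-second block time is illustrative):

```shell
SIGNED_BLOCKS_WINDOW=25920   # mainnet signed_blocks_window
MIN_SIGNED_PERCENT=50        # min_signed_per_window = 0.5
MISSED_LIMIT=$((SIGNED_BLOCKS_WINDOW * MIN_SIGNED_PERCENT / 100))
echo "$MISSED_LIMIT"         # 12960 blocks may be missed before jailing

DOWN_SECONDS=$((MISSED_LIMIT * 6))   # assuming a ~6s block time
echo "$((DOWN_SECONDS / 3600))h $((DOWN_SECONDS % 3600 / 60))m"   # 21h 36m
```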

    You have two options here:
    1. Reset via State Sync

    2. Reset by using DB snapshot

    State Sync

Stop the systemd service of the node: `sudo systemctl stop cheqd-cosmovisor.service`

Take a backup of the priv_validator_state.json. This step is very important for validator nodes:

    Turn the pruning strategy to default or custom based on your preference in the ~/.cheqdnode/config/app.toml file.

    Reset the database:

Restore the priv_validator_state.json. This step is very important for validator nodes:

    Enable statesync on the node and provide the required variables:

    Start the node:

The node should start looking for state sync chunks from its peers and begin the restoration process within a few minutes. After some time, it should catch up with the network and continue signing blocks.

    Snapshot

    Stop the systemd service of the node:

Take a backup of the priv_validator_state.json. This step is very important for validator nodes:

    Turn the pruning strategy to default or custom based on your preference in the ~/.cheqdnode/config/app.toml file.

    Reset the database:

Restore the priv_validator_state.json. This step is very important for validator nodes:

Download the latest lz4 tar archive for mainnet from our snapshots page.

    Unpack the tar archive and restore the db:

    Start the node

Commands referenced in the sections above:

```bash
# Check the node version (expected output at the time of writing: 0.6.0)
cheqd-noded version

# Check node info in a web browser:
# http://<your node ip or domain name>:26657/abci_info

# Unjail the validator
cheqd-noded tx slashing unjail --from <address_alias> --gas auto --gas-adjustment 1.4 --gas-prices 5000ncheq --chain-id cheqd-mainnet-1
```

State Sync method:

```bash
# Stop the service
sudo systemctl stop cheqd-cosmovisor.service

# Back up the validator state
cp ~/.cheqdnode/data/priv_validator_state.json ~/priv_validator_state.json

# Reset the database
cheqd-noded tendermint unsafe-reset-all --home ~/.cheqdnode/ --keep-addr-book

# Restore the validator state
cp ~/priv_validator_state.json ~/.cheqdnode/data/priv_validator_state.json

# Enable state sync and set the required variables
STATESYNC_RPC="https://rpc.cheqd.net:443"
LATEST_HEIGHT=$(curl -s $STATESYNC_RPC/block | jq -r .result.block.header.height)
BLOCK_HEIGHT=$((LATEST_HEIGHT - 2000))
TRUST_HASH=$(curl -s "$STATESYNC_RPC/block?height=$BLOCK_HEIGHT" | jq -r .result.block_id.hash)
sed -i.bak -E "s|^(enable[[:space:]]+=[[:space:]]+).*$|\1true| ; \
s|^(rpc_servers[[:space:]]+=[[:space:]]+).*$|\1\"$STATESYNC_RPC,$STATESYNC_RPC\"| ; \
s|^(trust_height[[:space:]]+=[[:space:]]+).*$|\1$BLOCK_HEIGHT| ; \
s|^(trust_hash[[:space:]]+=[[:space:]]+).*$|\1\"$TRUST_HASH\"|" ~/.cheqdnode/config/config.toml

# Start the node
sudo systemctl restart cheqd-cosmovisor.service
```

Snapshot method:

```bash
# Stop the service, back up the validator state, reset the database, and
# restore the validator state exactly as in the State Sync method above, then:

# Download the latest lz4 tar archive for mainnet from the snapshots page
wget https://cheqd-node-backups.ams3.digitaloceanspaces.com/mainnet/<timestamp>/cheqd-mainnet-1_<timestamp>.tar.lz4

# Unpack the tar archive and restore the db
lz4 -c -d cheqd-mainnet-1_<timestamp>.tar.lz4 | tar -x -C ~/.cheqdnode

# Start the node
sudo systemctl restart cheqd-cosmovisor.service
```

    Move validator to a different machine

This document offers guidance for validators looking to move their node instance to another machine, for example when changing VPS provider.

The main tool required for this is cheqd's interactive installer.

    Preparations

    Before completing the move, ensure the following checks are completed:

    1. Stop the service on your current node

If you are using Cosmovisor, use systemctl stop cheqd-cosmovisor.

    For all other cases, use systemctl stop cheqd-noded.

    2. Confirm that your previous node / service was stopped

This step is of the utmost importance.

If your node is not stopped correctly and two nodes are running with the same private keys, this will lead to a double-signing infraction, which results in your node being permanently jailed (tombstoned) and a 5% slash of staked tokens.

    You will also be required to complete a fresh setup of your node.

    3. Copy config directory and data/priv_validator_state.json to safe place

Check that your config directory and data/priv_validator_state.json are copied to a safe place where they cannot be affected by the migration.

    Installation

The installation should begin only after you have completed the preparation steps to shut down the previous node.

In general, the installer allows you to install the binary and download/extract the latest snapshot from https://snapshots.cheqd.net/.

Once this has been completed, you will be able to move your existing keys and settings back.

    Installation with the latest snapshot

The answers to the installer questions could be:

    1. Select Version

Here you can pick the version that you want.

    2. Select Home directory

    Set path for cheqd user's home directory [default: /home/cheqd]:.

This is essentially a question about where the home directory, .cheqdnode, is located or will be.

It is up to the operator where they want to store the data, config and log directories.

    3. Setup node

    Do you want to setup a new cheqd-node? (yes/no) [default: yes]:

    Here the expected answer is No.

    The main idea is that our old config directory will be used and data will be restored from the snapshot.

    We don't need to setup the new one.

    4. Select Network

    Select cheqd network to join (testnet/mainnet) [default: mainnet]:

    For now, we have 2 networks, testnet and mainnet.

Type whichever chain you want to use, or just keep the default by pressing Enter.

    5. Specify Cosmovisor option

    Install cheqd-noded using Cosmovisor? (yes/no) [default: yes]:.

    This is also up to the operator.

    6. Specify if you are using a snapshot

    CAUTION: Downloading a snapshot replaces your existing copy of chain data. Usually safe to use this option when doing a fresh installation. Do you want to download a snapshot of the existing chain to speed up node synchronisation? (yes/no) [default: yes].

On this question we recommend answering Yes, as it will help you catch up with the other nodes in the network. That is the main feature of this installer.

    Example

    Post-install steps

    1. Copy your settings

If the installation process was successful, the next step is to restore the configurations from the preparation steps:

    • Copy config directory to the CHEQD_HOME_DIRECTORY/.cheqdnode/

    • Copy data/priv_validator_state.json to the CHEQD_HOME_DIRECTORY/.cheqdnode/data

    • Make sure that permissions are cheqd:cheqd for the CHEQD_HOME_DIRECTORY/.cheqdnode directory. To set this, the following command can help: `sudo chown -R cheqd:cheqd CHEQD_HOME_DIRECTORY/.cheqdnode`

Here, CHEQD_HOME_DIRECTORY is the home directory for the cheqd user. By default it's /home/cheqd, or whatever you answered during installation for the second question.
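The copy steps above can be sketched as follows (the helper function name and backup location are illustrative, not an official tool; fix ownership with chown afterwards as described above):

```shell
# Restore the saved config directory and validator state into the node home.
# backup: where config/ and priv_validator_state.json were saved during preparation.
restore_validator_settings() {
  local backup="$1" cheqd_node_home="$2"
  cp -r "$backup/config" "$cheqd_node_home/"
  mkdir -p "$cheqd_node_home/data"
  cp "$backup/priv_validator_state.json" "$cheqd_node_home/data/"
}

# Example with the defaults assumed in this guide (then fix ownership):
# restore_validator_settings ~/backup /home/cheqd/.cheqdnode
# sudo chown -R cheqd:cheqd /home/cheqd/.cheqdnode
```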

    2. Setup external address

You need to specify the new external address here by running the following command as the cheqd user: `cheqd-noded configure p2p external-address <your-new-external-address>`

    3. Check that service works

The last step is to start the service (`sudo systemctl start <service-name>`) and check that everything works, where <service-name> is the name of the service depending on whether Cosmovisor was selected:

    • cheqd-cosmovisor if Cosmovisor was installed.

  • cheqd-noded if you kept cheqd-noded as it was, with the Debian package approach.

To check that the service works, please run `systemctl status <service-name>`,

    where <service-name> has the same meaning as above.

The status should be Active (running).

Example installer session:

```
*********  Latest stable cheqd-noded release version is Name: v0.6.0
*********  List of cheqd-noded releases:
1) v0.6.0
2) v0.6.0-rc3
3) v0.6.0-rc2
4) v0.6.0-rc1
5) v0.5.0
Choose list option number above to select version of cheqd-node to install [default: 1]:
1
Set path for cheqd user's home directory [default: /home/cheqd]:

Do you want to setup a new cheqd-node? (yes/no) [default: yes]:
no
Select cheqd network to join (testnet/mainnet) [default: mainnet]:
mainnet
*********  INFO: Installing cheqd-node with Cosmovisor allows for automatic unattended upgrades for valid software upgrade proposals.
Install cheqd-noded using Cosmovisor? (yes/no) [default: yes]:
yes
CAUTION: Downloading a snapshot replaces your existing copy of chain data. Usually safe to use this option when doing a fresh installation. Do you want to download a snapshot of the existing chain to speed up node synchronisation? (yes/no) [default: yes]:
yes
```

Post-install commands:

```bash
# Set the new external address as the cheqd user
sudo su cheqd
cheqd-noded configure p2p external-address <your-new-external-address>

# Start and check the service
sudo systemctl start <service-name>
systemctl status <service-name>
```

    Backup and restore node keys with Hashicorp Vault

If you're running a validator node, it's important to back up your validator's keys and state, especially before attempting any updates or moving nodes.

    What to backup from a validator node

Each validator node has three files/secrets that must be backed up in case you want to restore or move a node. Anything not listed below can be easily restored from a snapshot or otherwise replaced with fresh copies; as such, this list is the bare minimum that needs to be backed up.

    $CHEQD_HOME is the data directory for your node, which defaults to /home/cheqd/.cheqdnode

    hashtag
    Validator private key

    The validator private key is one of the most important secrets that uniquely identifies your validator, and what this node uses to sign blocks, participate in consensus etc. This file is stored under $CHEQD_HOME/config/priv_validator_key.json.

    hashtag
    Node key

    In the same folder as your validator private key, there's another key called $CHEQD_HOME/config/node_key.json. This key is used to derive the node ID for your validator.

Backing up this key means that if you move or restore your node, you don't have to change the node ID in your peers' configuration files. This is usually only relevant if you're running multiple nodes, e.g., a sentry or seed node.

For most node operators who run a single validator node, this node key is NOT important and can be created afresh, as it is only used for Tendermint peer-to-peer communication. Hypothetically, if you created a new node key (say, when moving a node from one machine to another) and then restored priv_validator_key.json, this would be absolutely fine.

    hashtag
    Validator private state

    The validator private state is stored in the data folder, not the config folder where most other configuration files are kept - and therefore often gets missed by validator operators during backup. This file is stored at $CHEQD_HOME/data/priv_validator_state.json.

This file stores the last block height signed by the validator and is updated every time a new block is created. Therefore, it should only be backed up after stopping the node service; otherwise, the data stored within this file will be in an inconsistent state. An example validator state file is shown below:

If you forget to restore the validator state file when restoring a node, or when restoring a node from a snapshot, your validator will double-sign blocks it has already signed and get jailed permanently ("tombstoned"), with no way to re-establish the validator.

    hashtag
    Upgrade height information

    The software upgrades and block height they were applied at is stored in $CHEQD_HOME/data/upgrade-info.json. This file is used by Cosmovisor to track automated updates, and informs it whether it should attempt an upgrade/migration or not.

    hashtag
    Manual backup and restore

    The simplest way to backup your validator secrets listed above is to display them in your terminal:

    You can copy the contents of the file displayed in terminal off the server and store it in a secure location.
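If you prefer a single artifact over copying three files by hand, the secrets can also be bundled into one archive. The sketch below simulates a node home directory with placeholder files so it is self-contained; on a real node, point CHEQD_HOME at your actual data directory (default: /home/cheqd/.cheqdnode) and remove the simulation lines.

```shell
#!/bin/sh
# Sketch: bundle the validator secrets into a single archive.
# The mktemp/echo lines only simulate a node home directory so this example
# is self-contained; on a real node, set CHEQD_HOME to your data directory
# (default: /home/cheqd/.cheqdnode) and remove them.
CHEQD_HOME="$(mktemp -d)"
mkdir -p "$CHEQD_HOME/config" "$CHEQD_HOME/data"
echo '{}' > "$CHEQD_HOME/config/priv_validator_key.json"
echo '{}' > "$CHEQD_HOME/config/node_key.json"
echo '{}' > "$CHEQD_HOME/data/priv_validator_state.json"

# Archive only the secrets, with paths relative to the node home directory
BACKUP="validator-backup-$(date +%Y%m%d-%H%M%S).tar.gz"
tar -czf "$BACKUP" -C "$CHEQD_HOME" \
  config/priv_validator_key.json \
  config/node_key.json \
  data/priv_validator_state.json
tar -tzf "$BACKUP"   # list the archive contents to verify
```

Store the resulting archive off the server in a secure location, exactly as you would the individual files.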

    To restore the files, open the equivalent file on the machine where you want to restore the files to using a text editor (e.g., nano) and paste in the contents:

    hashtag
    Backing up and restoring via HashiCorp Vault

    is an open-source project that allows server admins to run a secure, access-controlled off-site backup for secrets. You can either or .

    hashtag
    Pre-requisites

    Before you get started with this guide, make sure you've on the validator you want to run backups from.

    You also need a running HashiCorp Vault server cluster you can use to proceed with this guide.

    Setting up a HashiCorp Vault cluster is outside the scope of this documentation, since it can vary a lot depending on your setup. If you don't already have this set up, and is the best place to get started.

    hashtag
    Setting up Vault CLI environment

    Once you have Vault CLI set up on the validator, you need to set up environment variables in your terminal to configure which Vault server the secrets need to be backed up to.

    Add the following variables to your terminal environment. Depending on which terminal you use (e.g., bash, shell, zsh, fish etc), you may need to modify the statements accordingly. You'll also need to modify the values according to your validator and Vault server configuration.

    hashtag
    Backup procedure for HashiCorp Vault

    hashtag
    Configure Vault backup script

    Download the from Github:

    Make the script executable:

    We recommend that you open the script using an editor such as nano and confirm that you're happy with the environment variables and settings in it.

    hashtag
    Stop the cheqd service

    Before backing up your secrets, it's important to stop the cheqd node service or Cosmovisor service; otherwise, the validator private state will be left in an inconsistent state and result in an incorrect backup.

    If you're running via Cosmovisor (the default option), this can be stopped using:

    Or, if running as a standalone service:

    hashtag
    Run the Vault backup script

    Once you've confirmed the cheqd service is stopped, execute the Vault backup script:

    We use HashiCorp Vault KV v2 secrets engine. Please make sure that it's enabled and mounted under cheqd path.
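If the KV v2 secrets engine isn't mounted yet on your Vault server, it can be enabled with a command along these lines (a sketch; the cheqd mount path matches what the backup script expects):

```shell
# Enable the KV v2 secrets engine, mounted under the "cheqd" path
vault secrets enable -path=cheqd kv-v2
```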

    hashtag
    Restoring from HashiCorp Vault

    To restore backed-up secrets from a Vault server, you can use the same script using the -r ("restore") flag:

    hashtag
    Restoring secrets to a different machine

    If you're restoring to a different machine than the original machine the backup was done from, you'll need to go through the pre-requisites, CLI setup step, and download the Vault backup script to the new machine as well.

In this scenario, it's also recommended to disable the service (e.g., cheqd-cosmovisor) on the original machine. This ensures that if the (original) machine gets restarted, systemd does not try to start the node service, as this could result in two validators running with the same validator keys (which will result in tombstoning).

    Once you've successfully restored, you can enable the service (e.g., cheqd-cosmovisor) on the new machine:

    HashiCorp Vaultarrow-up-right
    run a self-managed HashiCorp Vault serverarrow-up-right
    use a hosted/managed HashiCorp Vault servicearrow-up-right
    installed HashiCorp Vault CLIarrow-up-right
    HashiCorp Vault documentationarrow-up-right
    Vault tutorialsarrow-up-right
    vault-backup.sh scriptarrow-up-right
    {
      "height": "3968024",
      "round": 0,
      "step": 3,
      "signature": "<signature>",
      "signbytes": "<sign-bytes>"
    }
    {"name":"v0.6","height":2478827,"info":"{\"binaries\":{\"linux/amd64\":\"https://github.com/cheqd/cheqd-node/releases/download/v0.6.0/cheqd-noded\"}}"}
    cat <filename>
    nano <filename>
    export VAULT_ADDR="https://vault.example.com"
    export VAULT_NAMESPACE="admin/testnet"
    export VAULT_TOKEN="[VAULT_TOKEN]"
    export CHEQD_HOME=/home/cheqd
    export MONIKER=[NODE_MONIKER]
    wget -c https://raw.githubusercontent.com/cheqd/infra/main/scripts/tools/vault-backup.sh
    chmod +x vault-backup.sh
    sudo systemctl stop cheqd-cosmovisor
    sudo systemctl stop cheqd-node
./vault-backup.sh -b true
./vault-backup.sh -r true
    sudo systemctl disable cheqd-cosmovisor
    sudo systemctl enable cheqd-cosmovisor

    FAQs for validators

    hashtag
    How do I stake more tokens after setting up a validator node?

When you set up your Validator node, it is recommended that you only stake a very small amount from the actual Validator node. This is to minimise the tokens that could be locked in an unbonding period, were your node to experience significant downtime.

    You should delegate the rest of your tokens to your Validator node from a different key alias.

    How do I do this?

You can add as many additional keys as you want using the function:

When you create a new key, a mnemonic phrase and account address will be printed. Keep the mnemonic phrase safe, as this is the only way to restore access to the account if the keyring cannot be recovered.

    You can view all created keys using the function:

    You are able to transfer tokens between key accounts using the function.

    You can then delegate to your Validator Node, using the function

We use a second/different Virtual Machine to create these new accounts/wallets. In this instance, you only need to install cheqd-noded as a binary; you don't need to run it as a full node.

And then, since this VM is not running a node, you can append the --node parameter to any request and target the RPC port of the VM running the actual node.

    That way:

    1. The second node doesn't need to sync the full blockchain; and

    2. You can separate out the keys/wallets, since the IP address of your actual node will be public by definition and people can attack it or try to break in
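As a sketch of this pattern, a balance query from the wallet-only VM might look like the following; the address is a placeholder, and rpc.cheqd.net is used purely as an example RPC endpoint (substitute your own node's RPC address):

```shell
# Query an account balance from a machine that is not running a node,
# by pointing --node at a remote RPC endpoint
cheqd-noded query bank balances <your-account-address> \
  --node https://rpc.cheqd.net:443
```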

    hashtag
    How much storage should I provision?

I’d recommend at least 250 GB at the current chain size. You can choose to go higher so that you don’t need to revisit this. Within our team, we set alerts on our cloud providers/Datadog to fire when nodes reach 85-90% storage used, which allows us to grow disk storage as and when needed, as opposed to over-provisioning.
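If you don't have a monitoring stack, a minimal sketch of that alerting policy is a cron-able script that flags any mount point above a usage threshold (the 85% default here simply mirrors the policy described above):

```shell
#!/bin/sh
# Sketch: print any mount point at or above a usage threshold.
# The threshold is the first argument, defaulting to 85 (%).
THRESHOLD="${1:-85}"
df -P | awk -v t="$THRESHOLD" 'NR > 1 {
  gsub(/%/, "", $5)                     # strip % from the capacity column
  if ($5 + 0 >= t) print $6 " is at " $5 "% capacity"
}'
```

Wiring the output into your alerting channel (email, Slack webhook, etc.) is left to your own tooling.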

    hashtag
    Is there any way to use less storage?

Yes, you can, by switching to more aggressive pruning parameters in the app.toml file.

    Here’s the relevant section in the file:

Please also see this thread on the trade-offs involved. This will help to some extent, but please note that it is a general property of all blockchains that the chain size will grow. E.g., out of the gate. We recommend using alerting policies to grow disk storage as needed, which is less likely to require higher spend due to over-provisioning.

    hashtag
    How do I withdraw Validator Rewards including Commission?

    Validators can withdraw their rewards, including commission, directly via the command-line interface (CLI). This feature is essential for managing earned rewards efficiently.

    hashtag
    Command for Withdrawing Rewards with Commission

    hashtag
    Explanation of Command Parameters

    • cheqdvaloper...: Insert your validator operator address.

    • --commission: Ensures that commission rewards are included in the withdrawal.

    • --from <wallet-name>: Specifies the wallet from which the transaction will be initiated.

    hashtag
    How do I monitor the status of my node?

    One of the simplest ways to do this is to , and with a more detailed view on the per-validator page (, for example). The condition is scored based on :

    • Green: 90-100% blocks signed

    • Amber: 70-90% blocks signed

    • Red: 1-70% blocks signed

We have also internally that takes the condition score output from the block explorer GraphQL API and makes it available as a simple REST API that can be used to send alerts on Slack, Discord etc., which we have and set up on our Slack/Discord.

    Please join the channel 'mainnet-alerts' on the cheqd community slack.

    In addition to that, (for those who already use it for monitoring/want to set one up) that has metrics for monitoring node status (and a lot more).

    hashtag
    Are there any other ways to optimise?

    Yes! Here are a few other suggestions:

    • You can check the current status of disk storage used on all mount points manually through the output of df -hT

• The default storage path for cheqd-node is on /home/cheqd. By default, most hosting/cloud providers will set this up on a single disk volume under the / (root) path. If you move and mount /home on a separate disk volume, this will allow you to expand the storage independently of the main volume. This can sometimes make a difference.

    hashtag
    What is Commission rate and is it important?

    As a Validator Node, you should be familiar with the concept of commission. This is the percentage of tokens that you take as a fee for running the infrastructure on the network. Token holders are able to delegate tokens to you, with an understanding that they can earn staking rewards, but as consideration, you are also able to earn a flat percentage fee of the rewards on the delegated stake they supply.

    There are three commission values you should be familiar with:

    The first is the maximum rate of commission that you will be able to move upwards to.

    Please note that this value cannot be changed once your Validator Node is set up, so be careful and do your research.

The second parameter is the maximum amount you will be able to increase your commission by within a 24 hour period. For example, if you set this as 0.01, you will be able to increase your commission by at most 1% a day.

    The third value is your current commission rate.
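For reference, these three values map onto flags of the create-validator transaction. The numbers below are illustrative assumptions, not recommendations, and the other required flags (pubkey, amount, moniker, etc.) are elided:

```shell
# Illustrative commission flags for setting up a validator
cheqd-noded tx staking create-validator \
  --commission-rate 0.05 \
  --commission-max-rate 0.20 \
  --commission-max-change-rate 0.01 \
  <other-required-flags>
```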

Points to note: a lower commission rate = higher likelihood of more token holders delegating tokens to you, because they will earn more rewards. However, with a very low commission rate, you might find in future that the gas fees on the Network outweigh the rewards made through commission.

    higher commission rate = you earn more tokens from the existing stake + delegated tokens. But the tradeoff being that it may appear less desirable for new delegators when compared to other Validators.

    You can have a look at other projects on Cosmos to get an idea of the percentages that nodes set as commission.

    hashtag
    What is Gas and Gas Prices?

When setting up the Validator, the Gas parameter is the maximum amount of gas you are willing to consume for the transaction.

    For simplicity, we suggest setting:

    AND setting:

These parameters, together, will make it highly likely that the transaction will go through and not fail. Setting the gas to auto without the gas adjustment risks the transaction failing if gas prices increase.

Gas prices also come into play here: the lower your gas price, the more likely that your node will be considered in the active set for rewards.

    We suggest the set:

    should fall within this recommended range:

    1. Low: 25ncheq

    2. Medium: 5000ncheq

    3. High: 100ncheq
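To see how these settings combine, the final fee is roughly (simulated gas × gas adjustment) × gas price. A small sketch with illustrative numbers (the 180,000 gas estimate and 1.5 adjustment are assumptions, not network values):

```shell
# Sketch: how --gas auto, --gas-adjustment and --gas-prices combine into a fee.
# All numbers are illustrative.
awk 'BEGIN {
  gas_estimate   = 180000   # what a --gas auto simulation might return
  gas_adjustment = 1.5      # --gas-adjustment
  gas_price      = 5000     # --gas-prices, in ncheq
  gas_limit = gas_estimate * gas_adjustment
  fee = gas_limit * gas_price      # fee in ncheq (1 CHEQ = 1e9 ncheq)
  printf "gas limit: %d, fee: %d ncheq (= %.2f CHEQ)\n", gas_limit, fee, fee / 1e9
}'
```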

    hashtag
    How do I change my public name and description

Your public name is also known as your moniker.

    You are able to change this, as well as the description of your node using the function:

    hashtag
    Should I set my firewall port 26656 open to the world?

Yes, this is how you should do it. Since it's a public permissionless network, there's no way of pre-determining the set of IP addresses, as entities may leave and join the network. For security reasons, we suggest using a TCP/network load balancer and keeping your VM/node in a private subnet. The load balancer then becomes your network edge, which your cloud provider manages/patches/runs if you're hosting with one.

    --gas auto: Automatically calculates the gas required for the transaction.

  • --gas-adjustment 1.7: Adjusts the gas limit to account for network fluctuations.

  • --gas-prices 5000ncheq: Sets the gas price in ncheq.

  • --chain-id cheqd-mainnet-1: Identifies the chain ID for the transaction.

• If you leave the /home tree mounted on the / mount path, many cloud providers will force you to bump the whole virtual machine category - including the CPU and RAM - to a more expensive tier in order to get additional disk storage on /. This can also result in over-provisioning since the additional CPU/RAM is likely not required.
• You can also optimise the amount of logs stored, in case the logs are taking up too much space. There are a few techniques here:

  • In config.toml you can set the logging level to error for less logging than the default which is info. (The other possible value for this is debug)

• Set the log rotation configuration to use different/custom parameters, such as what file size to rotate at, the number of days to retain, etc.

  • setting the pruning settingsarrow-up-right
    Sovrin’s technical docs require 1 TB minimumarrow-up-right
    look at the validator “Condition” in the block explorerarrow-up-right
    see cheqd’s validatorarrow-up-right
    number of missed blocks within the signed blocks windowarrow-up-right
    built a toolarrow-up-right
    open sourcedarrow-up-right
    Cosmos/Tendermint also provide a Prometheus metrics interfacearrow-up-right
    herearrow-up-right
    cheqd-noded keys add <alias>
    cheqd-noded keys list
    cheqd-noded tx bank send <from> <to-address> <amount> --node <url> --chain-id <chain> --gas auto --gas-adjustment 1.4
    cheqd-noded tx staking delegate <validator address> <amount to stake> --from <key alias> --gas auto --gas-adjustment 1.4 --gas-prices 5000ncheq 
    default: the last 100 states are kept in addition to every 500th state; pruning at 10 block intervals
    
    nothing: all historic states will be saved, nothing will be deleted (i.e. archiving node)
    
    everything: all saved states will be deleted, storing only the current state; pruning at 10 block intervals
    
    custom: allow pruning options to be manually specified through 'pruning-keep-recent', 'pruning-keep-every', and 'pruning-interval'
    
    pruning = "default"
    
    These are applied if and only if the pruning strategy is custom.
    
    pruning-keep-recent = "0"
    pruning-keep-every = "0"
    pruning-interval = "0"
    cheqd-noded tx distribution withdraw-rewards cheqdvaloper... --commission --from <wallet-name> --gas auto --gas-adjustment 1.7 --gas-prices 5000ncheq --chain-id cheqd-mainnet-1
    max_commission_rate
    
    max_commission_rate_change
    
    commission_rate
    --gas: auto
    --gas-adjustment: 1.2
    --gas-price
cheqd-noded tx staking edit-validator --from validator1-eu --moniker "cheqd" --details "cheqd is building a private and secure decentralised digital identity network on the Cosmos ecosystem" --website "https://www.cheqd.io" --identity "F0669B9ACEE06ADC" --security-contact [email protected] --gas auto --gas-adjustment 1.4 --gas-prices 5000ncheq --chain-id cheqd-mainnet-1

    Troubleshooting consistently high CPU/memory loads

    hashtag
    Context

Blockchain applications (especially when running validator nodes) are atypical compared to "traditional" web server applications, because their performance characteristics tend to differ in the ways specified below:

1. Tend to be more disk I/O heavy: Traditional web apps typically offload data storage to persistent stores such as a database. In the case of a blockchain/validator node, the database is on the machine itself, rather than offloaded to a separate machine with a standalone engine. Many blockchains use for their local data copies. (In Cosmos SDK apps, such as cheqd, this is the , but can also be , , etc.) The net result is the same as if you were running a database engine on the machine: the system needs fast read/write performance characteristics.

    2. Validator nodes cannot easily be auto-scaled: Many traditional applications can be horizontally (i.e., add more machines) or vertically (i.e., make current machine beefier) scaled. While this is possible for validator nodes, it must be done with extreme caution to ensure there aren't two instances of the same validator active simultaneously. This can be perceived by network consensus as a sign of compromised validator keys and lead to the . These concerns are less relevant for non-validating nodes, since they have a greater tolerance for missed blocks and can be scaled horizontally/vertically.

3. Docker/Kubernetes setups are not recommended for validators (unless you really know what you're doing): Primarily due to the double-signing risk, it's not recommended to run validator nodes using Docker (../setup-and-configure/docker.md) unless you have a strong DevOps practice. The other reason relates to the first point: a Docker setup adds an abstraction layer between the actual underlying file storage and the Docker volume engine. Depending on the Docker (or similar abstraction) storage drivers used, you may need to for optimal performance.

    hashtag
    Diagnosing a CPU/memory leak

    ⚠️ Please ensure you are running the since they may contain fixes/patches that improve node performance.

    hashtag
    What does a CPU/memory leak look like?

If you've got monitoring built in for your machine, a memory (RAM) leak looks like a graph where memory usage grows to 100%, falls off a cliff, then grows to 100% again, with the process repeating itself.

    Normal memory usage may grow over time, but will not max out the available memory up to 100%. The graph below is taken from a server run by the cheqd team, over a 14-day period:

    Figure 1: Graph showing normal memory usage on a cheqd-node server

    hashtag
    What does a CPU leak look like?

    A "CPU leak", i.e., where one or more process(es) consume increasing amounts of CPU is rarer, but could also happen if your machine has too few vCPUs and/or underpowered CPUs.

    Figure 2: Graph showing normal CPU usage on a cheqd-node server

    There's a catch here: depending on your monitoring tool, "100% CPU" could be measured differently! The graph above is from .

    Other monitoring tools, such as , count each CPU as "100%", thus making the overall figure displayed in the graph (shown below) add up to number of CPUs x 100%.

Figure 3: Graph showing CPU usage on Hetzner cloud, adding up to more than 100%

    Check what accounting metric your monitoring tool uses to get a realistic idea of whether your CPU is overloaded or not.

    , regardless of the CPU usage.

    hashtag
    Determining CPU/memory usage with command-line tools

    If you don't have a monitoring application installed, you could use the built-in top or htop command.

Figure 4: Output of htop showing CPU and memory usage

    htop is visually easier to understand than top since it breaks down usage per-CPU, as well as memory usage.

Unfortunately, these only provide real-time usage rather than historical usage over time. Historical usage typically requires an external application, which many cloud providers offer, or a 3rd-party monitoring tool such as , etc.

, in case you already have a Prometheus instance you can use or are comfortable using the software. This allows alerting based on actual metrics emitted by the node, rather than just top-level system metrics, which are a blunt instrument and don't go into detail.

    hashtag
    Troubleshooting system clock synchronisation issues

    If your , this could cause Tendermint peer-to-peer connections to be rejected. This is similar to in a normal browser when accessing secure (HTTPS) sites.

    The net result of your system clock being out of sync is that your node:

    • Constantly tries to dial peers to try and fetch new blocks

    • Connection gets rejected by some/all of them

    • Keeps retrying the above until CPU/memory get exhausted, or the node process crashes

    To check if your system clock is synchronised, use the following command (note: only copy the command, not the sample output):

    The timezone your machine is based in doesn't matter. You should check whether it reports System clock synchronized: yes and NTP service: active.
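A small sketch that wraps this check in a script-friendly form; the grep pattern assumes English-locale timedatectl output:

```shell
# Sketch: check the sync flag in `timedatectl status` output (English locale).
check_clock() {
  grep -q "System clock synchronized: yes"
}

if timedatectl status | check_clock; then
  echo "clock is synchronised"
else
  echo "WARNING: system clock is NOT synchronised" >&2
fi
```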

    hashtag
    Resolving system clock issues

If either of these is not true, chances are that your system clock has fallen out of sync, which may be the root cause of CPU/memory leaks. Follow to resolve the issue, and then monitor whether it fixes the high utilisation.

    hashtag
    NTP firewall rules

    You may also need to allow outbound UDP traffic on port 123 explicitly, depending on your firewall settings. This port is used by the Network Time Protocol (NTP) service.
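With ufw, for example, such a rule could look like this (a sketch; only needed if your outbound traffic is restricted by default):

```shell
# Allow outbound NTP traffic explicitly
sudo ufw allow out 123/udp comment 'NTP time synchronisation'
```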

    hashtag
    Troubleshooting node connectivity issues

    Properly-configured nodes should have bidirectional connectivity for network traffic. To check whether this is the case, open <node-ip-address-or-dns-name:rpc-port>/net_info in your browser, for example, .

    Accessing this endpoint via your browser would only work and/or you're accessing from an allowed origin. If this is not the case, you can also view the results for this endpoint from the same machine where your node service is running through the command line:

    The JSON output should be similar to below:

Look for the n_peers value at the beginning: this shows the number of peers your node is connected to. A healthy node would typically be connected to anywhere between 5 and 50 peers.

Next, search the results for the term is_outbound. The number of matches for this term should be exactly the same as the value of n_peers, since it is printed once per peer. The value of is_outbound may be either true or false.

    A healthy node should have a mix of is_outbound: true as well as is_outbound: false. If your node reports only one of these values, it's a strong indication that your node is unidirectionally connected/reachable, rather than bidirectionally reachable.

Unidirectional connectivity may cause your node to work overtime to stay synchronised with the latest blocks on the network. You may fly by just fine - until there's a loss of connectivity to a critical mass of peers, and then your node goes offline.

    Furthermore, your node might fetch the address book from seed nodes, and then try to resolve/contact them (and fail) due to connectivity issues.
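One way to quantify the inbound/outbound mix is to run a saved net_info response through jq. The sketch below assumes jq is installed and, for illustration, writes a two-peer sample response to net_info.json; on a real node, fetch the file with curl -s http://localhost:26657/net_info -o net_info.json instead:

```shell
# Sketch: count inbound vs outbound peers from a net_info response.
# The sample JSON is illustrative; replace it with your node's real output.
cat > net_info.json <<'EOF'
{"result": {"n_peers": "2", "peers": [{"is_outbound": true}, {"is_outbound": false}]}}
EOF

jq -r '.result.peers | map(.is_outbound)
       | "outbound: \(map(select(.)) | length), inbound: \(map(select(. | not)) | length)"' \
   net_info.json
```

If either count is zero across a healthy-sized peer set, that points to the unidirectional connectivity problem described above.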

    hashtag
    Is your node's external address reachable?

Ideally, the IP address or DNS name set in the external_address property in your config.toml file should be externally reachable.

    To determine whether this is true, from a machine other than your node, . Unlike ping which uses ICMP packets, tcptraceroute uses TCP, i.e., the actual protocol used for Tendermint P2P to see if the destination is reachable. Success or failure in connectivity using ping doesn't prove whether your node is reachable, since firewalls along the path may have different rules for ICMP vs TCP.

    Once you have tcptraceroute installed, from this external machine you can execute the following command in tcptraceroute <hostname> <port> format (note: only copy the actual command, not sample output):

    A successful run would result in tcptraceroute reaching the destination server on the required port (e.g., 26656) and then hanging up. If the connection times out consistently at any of the hops, this could indicate there's a firewall / router in the path dropping or blocking connections.

    hashtag
    Resolving connectivity issues due to blocked firewall ports

    Your firewall rules on the machine and/or infrastructure (cloud) provider could cause connectivity issues. Ideally, :

    1. Inbound TCP traffic on at least port 26656 (or custom P2P port)

    2. Optionally, inbound TCP traffic on other ports (RPC, gRPC, gRPC Web)

    3. Outbound TCP traffic on all ports

    hashtag
    Router vs firewall issues

    Besides firewalls, depending on your network infrastructure, your connectivity issue instead might lie in a router or Network Address Translation (NAT) gateway.

    Outbound TCP traffic is the default mode on many systems, since the port through which traffic gets routed out is dynamically determined during TCP connection establishment. In some cases, e.g., when , you may require more complex configuration (outside the scope of this document).

    hashtag
    Operating system firewalls

In addition to infrastructure-level firewalls, Ubuntu machines also come with a firewall on the machine itself. Typically, this is either disabled or set to allow all traffic by default.

    Configuring OS-level firewalls is outside the scope of this document, but can generally be :

If ufw status reports active, follow to allow traffic on the required ports (customise to the ports you actually use).
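As a sketch, the ufw rules for a typical node might look like this; the ports are the defaults from this guide, so customise them to your configuration:

```shell
sudo ufw allow 26656/tcp    # Tendermint P2P
sudo ufw allow 26657/tcp    # RPC (optional; restrict the source if possible)
```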

    hashtag
    Connectivity issues due to blocked DNS traffic

    Another common reason for unidirectional node connectivity occurs when the correct P2P inbound/outbound traffic is allowed in firewalls, but DNS traffic is blocked by a firewall.

Your node needs the ability to perform DNS lookups to resolve nodes that have DNS names as their external_address property to IP addresses, since other peers may advertise their addresses as a DNS name. Seed nodes set in config.toml are a common example of this, since they are advertised as DNS names.

Your node may still scrape by if DNS resolution is blocked, for example, by obtaining an address book from a peer that has already done the DNS -> IP resolution. However, this approach is liable to break down if the resolution is incorrect or the entries are outdated.

    hashtag
    Firewall rules to allow DNS traffic

    To enable DNS lookups, your infrastructure/OS-level firewalls should allow:

    1. Outbound UDP traffic on port 53: This is the most commonly-used port/protocol.

    2. Outbound TCP traffic on port 853 (explicit rule not needed if you already allow TCP outbound on all ports): Modern DNS servers also allow , which secures the connection using TLS to the DNS server. This can prevent malicious DNS servers from intercepting queries and giving spurious responses.

    3. Outbound TCP traffic on port 443 (explicit rule not needed if you already allow TCP outbound on all ports): Similar to above, this enables

    hashtag
    Checking whether DNS resolution works

To check whether DNS resolution works, try running a DNS query and see if it returns a response. The following command will use the dig utility to look up and report your node's externally resolvable IP address via (note: only copy the command, not the sample output):
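A sketch of such a lookup, using Cloudflare's special whoami.cloudflare TXT record (assumes dig is installed; the record name and CH class are Cloudflare-specific conventions):

```shell
# Ask Cloudflare's resolver which IP address your queries egress from
dig +short CH TXT whoami.cloudflare @1.1.1.1
```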

If the lookup fails, that could indicate DNS queries are blocked, or that there are no externally-resolvable IPs where the node can be reached.

    hashtag
    Other troubleshooting steps

    hashtag
    Is your machine underpowered?

    If your machine is provisioned with , you might find that the node struggles during times of high load, or slowly degrades over time. The minimum figures are recommended for a developer setup, rather than a production-grade node.

    Typically, this problem is seen if you (non-exhaustive list):

    • Have only one CPU (bump to at least two CPU)

• Have only 1-2 GB of RAM (bump to at least 4 GB)

    Most cloud providers should allow dynamically scaling these two factors without downtime. Monitor - especially over a period of days/weeks - whether this improves the situation or not. If the CPU/memory load behaviour remains similar, that likely indicates the issue is different.

Scaling CPU/memory without downtime may be different if you're running a physical machine, or if your cloud provider doesn't support it. Please follow the guidance of those hosting platforms.

    , if supported by your DNS resolver.
    LevelDBarrow-up-right
    Golang implementation of LevelDBarrow-up-right
    C-implementation of LevelDBarrow-up-right
    RocksDBarrow-up-right
    node being jailed for double-signing blocks
    tune the storage/volume engine optionsarrow-up-right
    latest stable release of cheqd-nodearrow-up-right
    DigitalOcean's monitoring tools, which counts the sum of all CPU capacity as "100%"arrow-up-right
    Hetzner Cloud'sarrow-up-right
    Load average is another useful measure of the responsiveness of a machinearrow-up-right
    Datadogarrow-up-right
    Tendermint / Cosmos SDK also provides a Prometheus metrics interfacearrow-up-right
    system clock is out of synchronisationarrow-up-right
    how SSL/TLS connections can get rejected with a "handshake error"arrow-up-right
    this guide on setting time synchronisation in Ubuntuarrow-up-right
    rpc.cheqd.net/net_infoarrow-up-right
    if traffic to your RPC port is allowed through your firewall
    install tcptraceroutearrow-up-right
    your firewall rules should allow
    using a NAT gateway in AWSarrow-up-right
    checked/configured using the ufw utilityarrow-up-right
    this guide on configuring firewall rules using ufwarrow-up-right
    DNS-over-TLSarrow-up-right
    Cloudflare's 1.1.1.1 DNS resolverarrow-up-right
    the bare minimum of CPU and RAM
    Example of checking time synchronisation status with `timedatectl` (look for `System clock synchronized: yes` and `NTP service: active`):

```bash
root@hostname ~# timedatectl status
   Local time: Wed 2023-03-29 20:31:56 CEST
   Universal time: Wed 2023-03-29 18:31:56 UTC
   RTC time: Wed 2023-03-29 18:31:57
   Time zone: Europe/Berlin (CEST, +0200)
   System clock synchronized: yes
   NTP service: active
   RTC in local TZ: no
```
    Example of querying the Tendermint RPC `net_info` endpoint on the node itself to list connected peers (output shortened to a single peer entry):

```bash
curl -v http://localhost:26657/net_info
{"jsonrpc":"2.0","id":-1,"result":{"listening":true,"listeners":["Listener(@sentry1.ap.cheqd.net:26656)"],"n_peers":"47","peers":[{"node_info":{"protocol_version":{"p2p":"8","block":"11","app":"0"},"id":"c7b1c178adaf364917caaac67687051d1ed5bf53","listen_addr":"78.46.83.78:26656","network":"cheqd-mainnet-1","version":"0.34.24","channels":"40202122233038606100","moniker":"cstp-cheqd","other":{"tx_index":"on","rpc_address":"tcp://0.0.0.0:26657"}},"is_outbound":true}]}}
```
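    If you only need the peer count, the `n_peers` field can be extracted from the response with `grep` alone. This sketch assumes the node's RPC endpoint is listening on `localhost:26657`; substitute your own host/port if different:

```shell
# Extract the peer count ("n_peers") from the net_info JSON response.
# Assumes the RPC endpoint is on localhost:26657 (adjust RPC if not).
RPC=http://localhost:26657
curl -s "$RPC/net_info" | grep -o '"n_peers":"[0-9]*"'
```

    A persistently low or zero peer count usually points to firewall or seed-node configuration issues rather than the node software itself.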
    Example of using `tcptraceroute` to verify that TCP traffic can reach a seed node's P2P port (26656). The final hop showing `[open]` confirms the port is reachable:

```bash
user@hostname ~> sudo tcptraceroute seed1.eu.cheqd.net 26656
Selected device en0, address 192.168.4.42, port 53088 for outgoing packets
Tracing the path to seed1.eu.cheqd.net (116.202.176.48) on TCP port 26656, 30 hops max
1  192.168.4.1  3.049 ms  2.186 ms  5.693 ms
2  * * *
3  hari-core-2a-xe-806-0.network.virginmedia.net (94.173.50.205)  27.455 ms  16.619 ms  23.925 ms
4  * hari-core-2b-ae1-0.network.virginmedia.net (81.96.16.210) 33.225 ms  25.725 ms
5  * * *
6  * * *
7  tele-ic-7-ae2-0.network.virginmedia.net (62.253.175.34)  34.680 ms  19.670 ms  17.274 ms
8  ae15-0.lon10.core-backbone.com (80.255.14.105)  19.708 ms  26.629 ms  21.323 ms
9  ae6-2011.nbg40.core-backbone.com (80.255.14.246)  33.451 ms  30.159 ms  31.193 ms
10  core-backbone.hetzner.com (81.95.15.6)  33.430 ms  33.701 ms  31.949 ms
11  core11.nbg1.hetzner.com (213.239.229.161)  33.887 ms  34.907 ms  34.535 ms
12  spine11.cloud1.nbg1.hetzner.com (213.133.112.66)  66.511 ms  36.853 ms  32.539 ms
13  spine4.cloud1.nbg1.hetzner.com (213.133.108.150)  37.238 ms  43.259 ms  28.669 ms
14  * * *
15  15629.your-cloud.host (49.12.139.7)  27.337 ms  46.956 ms  33.213 ms
16  static.48.176.202.116.clients.your-server.de (116.202.176.48) [open]  39.811 ms  34.168 ms  1019.051 ms
```
    Example of checking firewall status with the `ufw` utility (here, no firewall rules are active):

```bash
root@hostname ~# ufw status
Status: inactive
```
    Example of using `dig` to query Cloudflare's 1.1.1.1 resolver, which confirms DNS resolution works and returns the public IP address your queries originate from:

```bash
root@hostname ~# dig +short txt ch whoami.cloudflare @1.1.1.1
"157.90.124.113"
```