Free TON

Validator Contest: Devops tools

I think you need to add the staked amount and stake weight.

Dear All,

I’ve published some scripts too:

  • a script to install a systemd service for an existing node, plus a couple of improvements to the existing scripts so they transparently support running in service mode;
  • a script to update the node with minimal downtime. It automatically pulls fresh git updates, builds them, stops the node, archives node.log to a zip file while keeping the tail lines so the transition can be examined manually, and then starts the new version. It also supports automatic execution from crontab, say once a day;
  • a couple of utility scripts for external monitoring tools:
    – one to get the average duration metric from node.log (the entries that carry the famous “SLOW” tag). It is a very basic way of measuring TON node performance;
    – a second one to get the current wallet balance; monitoring tools need the clean amount without anything else.
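
To illustrate the log handling step of the update script described above, here is a minimal Python sketch. It is not taken from the published scripts: the function name `archive_log_keep_tail`, the `keep_lines` default and the archive path are hypothetical, and it assumes the node is already stopped so nothing is writing to the log.

```python
import zipfile
from collections import deque
from pathlib import Path


def archive_log_keep_tail(log_path, archive_path, keep_lines=100):
    """Zip the full log, then shrink the log to its last `keep_lines` lines.

    Assumes the node is stopped, so the log is not being written to.
    Returns the number of lines left in the log.
    """
    log_path = Path(log_path)
    # Remember the tail before touching the file.
    with log_path.open("r", errors="replace") as fh:
        tail = deque(fh, maxlen=keep_lines)
    # Store the complete log in a compressed archive.
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.write(log_path, arcname=log_path.name)
    # Rewrite the log so that only the tail lines remain.
    with log_path.open("w") as fh:
        fh.writelines(tail)
    return len(tail)
```

A call such as `archive_log_keep_tail("node.log", "node.log.zip", keep_lines=200)` would sit between stopping the old binary and starting the new one.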

Available at: https://github.com/samorodkin/net.ton.dev/tree/toolscripts/scripts
under the Apache 2.0 license (cheers go to M :wink:)

I have also made a neat dashboard on the TIG stack (Telegraf + InfluxDB + Grafana); please have a look:

The idea was to separate three layers: server, network and business (later), and to keep the dashboard clean rather than overloading it with excessive indicators.

  • Network interface utilization %;
  • CPU utilization %, iowait, Load average;
  • Memory and disk utilization %;
  • FreeTON network sync status (TIME_DIFF);
  • Node.log duration (famous SLOW tags in log);
  • Wallet(s) balance history.

The filters support several hosts, disks, wallets and network interfaces. The dashboard indicators also contain threshold values, so you can easily tune Grafana alerts.

To set up the dashboard, besides the standard TIG stack you need to update your telegraf.conf according to the dashboard variables: https://samorodkin.grafana.net/d/IeFxBvzMk/freeton-example-dashboard

Updated! Thanks a lot for the idea!

I’ve also added a “SLOW-meter”, which shows the ratio of SLOW entries to all log entries.
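
As a rough illustration of such a ratio, here is a minimal Python sketch. It is an assumption about how the value could be computed, not the dashboard's actual query: the function name `slow_ratio` is hypothetical, and it simply treats any non-empty line containing the substring "SLOW" as a slow entry.

```python
def slow_ratio(lines, tag="SLOW"):
    """Return the share of non-empty log lines that contain `tag`."""
    total = 0
    slow = 0
    for line in lines:
        if not line.strip():
            continue  # skip blank lines so they do not dilute the ratio
        total += 1
        if tag in line:
            slow += 1
    # An empty log has no slow entries by definition.
    return slow / total if total else 0.0
```

It accepts any iterable of lines, so an open file handle on node.log can be passed directly.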

This post already has many screenshots; if I added screenshots for every feature, the page would take too long to load.

Hello!
Zabbix, Grafana, ELK… We need to know how to install all this software and how to configure it.
But what if we are not near a computer? What if we are on a trip, relaxing on a beach or somewhere else where we have no access to our computers?
We would not know that our validator node is down, or that our CPU/RAM is overloaded. Sounds sad =(
Or we get an SMS from the monitoring tool saying that something is wrong, and again we need a computer.
So? Validators need a fast and satisfactory solution. I have created “TON Telegram Bot” with alerts, statistics, and many useful tools for validators.
Are you ready?
And yes

Goedenmiddag, God eftermiddag, Guten Tag, Buenas tardes, Bonne après-midi, नमस्कार, Buon pomeriggio, Boa tarde, Hyvää iltapäivää, God eftermiddag, Tünaydın, Καλό απόγευμα, Добрый день, Доброго дня, こんにちは

My Telegram bot supports all the languages above!

Let’s go!

What this bot can do for now (this is only the start):

Monitoring

  1. Validator node
  2. CPU load
  3. RAM load
  4. Network
  5. Time diff
  6. Wallet balance
  7. Stake monitoring
  8. Error log monitoring
  9. Slow log monitoring

Historical data

1. CPU Utilization (Dynamic)

2. RAM Load (Dynamic)

3. Time Diff (Dynamic)

4. Slow log events

5. Disk I/O (Dynamic)

6. Network performance (Dynamic)

7. Ping test (Dynamic)

Alert

1. Validator node down

2. High CPU Utilization

3. High RAM load
/No screenshot, but it will look like the other alerts/

4. Network degradation

5. Stake < Wallet balance

Features

Validator

  1. Restart validator node
  2. Check current stake
  3. Update stake
  4. Check wallet balance
  5. Check current time diff + Historical data
  6. Know your adnl key
  7. Get your error log
  8. Get your slow log + Historical data
  9. Validators count (New)
  10. Election status & validators count

Server

  1. Check CPU load + Historical data
  2. Check RAM load + Historical data
  3. Check disk usage
  4. Check disk i/o + Historical data
  5. Check validator ports
  6. Check server ping + Historical data
  7. Analyze server traceroute
  8. Get top processes
  9. Check uptime
  10. Check network load + Historical data
  11. Check server network speed to different countries (some countries may not work because the speedtest servers may have problems; on Hetzner, many countries didn’t work. In the future, I will add many more servers for tests)

Some screenshots
Start screen

Español example

Alerts (Node not running, diff time, high ping, high CPU load, High RAM load, Validator node is down!, Stake lower than your wallet balance)

Validator tools (You can just restart your node in a second)




Linux tools


Check server network speed test

Future: history graphs (diff time, CPU, RAM, network, etc.) and many other interesting things
Looks good?
And installation takes a minute!
Download https://github.com/anvme/TONTgBot

very nice! thank you!

Nice and beautiful, I’m working on a similar one, but with a distinct set of features.
There won’t be any possibility to change the state of the node (i.e. change the stake or restart the node), though, for security’s sake.

Hello!

This is a script for TON validator nodes that automatically registers in elections and automatically confirms, on behalf of custodians, transactions to the elector smart contract.

Features:

  • More reliable than validator_msig.sh
  • Checks wallet balance before transactions
  • Confirms registration with “participant_list” method
  • Telegram and email notifications
  • Fully supports multisig wallets with reqConfirm > 1
  • Auto confirmation of multisig transactions
  • Requests blockchain global configuration parameters (minimal hardcode)
  • Uses tonos-cli

Criticism and suggestions are welcome!

Thank you! Handy Telegram bot!!!
It works perfectly on Ubuntu 18.04. My greetings. I hope you will get first place!

Upgrades in my telegram bot.

Added Historical data (New)

1. CPU Utilization (Dynamic)

2. RAM Load (Dynamic)

3. Time Diff (Dynamic)

4. Disk I/O

5. Network performance

6. Ping test (Dynamic)

@Stanislav
Can you please allow the bot to reboot the server? A command like /rebootmyserver
Thank you! All commands work well!

Maybe, but I don’t think we need this function right now.
Maybe I will add it in the next few months.

I also want to join the validators; let me know.

Great solution! This bot is like a Swiss Army knife: it has tons of functions in one place, it’s easy to install, and you don’t have to open any additional ports. Thank you!

Can you please improve the stake update functionality? I’d prefer to update the stake in a dialogue style instead of typing the /updstake command. It is not a big deal, but it is rather inconvenient to type commands on a smartphone.

Introducing ftvmon - Free TON Validator’s Node Monitoring and Alerting, written in Go.

It uses Telegram as the endpoint for status messages and alerts, and supports multiple users. It sends alerts, and reports the status of every metric when a user issues the /status command.

It has a powerful log inspection engine that can monitor multiple logs simultaneously in real time, with multiple event-matching criteria per log. Events can be matched against a simple substring or a regex; regular expressions are compiled and guaranteed to run in time linear in the size of the input, so thousands of log records per second can be inspected. Log files are seeked to the end at launch, and all new log records are checked against the match criteria in real time. An alert for each log event class can be triggered either by a single event or by the number of events exceeding a predefined threshold within a predefined time window; in the latter case the system sends an “off” message when the condition clears (i.e. when the number of events during the last n minutes drops below the threshold set in the config). All log inspection parameters are set in the config file, and some validator-specific log-matching entries have been added to the config template.
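
The threshold-within-a-window behaviour with on/off messages can be sketched roughly as follows. This is a Python illustration, not ftvmon's Go code: the class name `WindowAlert` and its methods are hypothetical, and it assumes the alert turns on once the count reaches the threshold and turns off once it drops below it.

```python
from collections import deque


class WindowAlert:
    """Raise an alert when enough events fall inside a sliding time window."""

    def __init__(self, threshold, window_seconds):
        self.threshold = threshold
        self.window = window_seconds
        self.events = deque()  # timestamps of matching log events
        self.active = False

    def record(self, now):
        """Register one matching event; return "on", "off" or None."""
        self.events.append(now)
        return self._update(now)

    def tick(self, now):
        """Call periodically so the alert can clear without new events."""
        return self._update(now)

    def _update(self, now):
        # Drop events that have left the window.
        while self.events and now - self.events[0] >= self.window:
            self.events.popleft()
        over = len(self.events) >= self.threshold
        if over and not self.active:
            self.active = True
            return "on"
        if not over and self.active:
            self.active = False
            return "off"
        return None
```

Keeping the on/off state inside the object means a message is produced only on transitions, not for every event while the condition persists.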

It constantly monitors a number of system performance metrics using native code, without invoking any external processes, and sends a message when a condition arises and when it clears. Metrics include CPU, memory, free disk space, disk device IOPS, disk Mb/s, network Mb/s, the existence of a process with a given name, and disk I/O % utilization. Disk I/O % utilization (derived from the weighted time spent doing I/Os) is the most meaningful disk counter: device saturation occurs when this value is close to 100% for a single disk (for RAIDs capable of multiple simultaneous I/O operations it can be higher).
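
As a rough illustration of how a utilization percentage can be derived from a cumulative time counter, here is a minimal Python sketch. It assumes two samples of a millisecond counter such as those Linux exposes in /proc/diskstats; the function name `io_utilization_percent` is hypothetical, and exactly how ftvmon samples and scales the value is not stated in the post.

```python
def io_utilization_percent(busy_ms_before, busy_ms_after, elapsed_seconds):
    """Convert the growth of a cumulative I/O time counter into a percentage.

    `busy_ms_*` are two readings of the same millisecond counter,
    taken `elapsed_seconds` apart.
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    delta_ms = busy_ms_after - busy_ms_before
    # Share of the sampling interval accounted for by I/O time.
    return 100.0 * delta_ms / (elapsed_seconds * 1000.0)
```

With a weighted counter, overlapping requests each add their own time, which is why the result can exceed 100% on devices that serve several requests at once.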

Validator’s node specific metrics are:

  1. Sync status (TIME_DIFF);
  2. Is the validator’s node in the active set? The status is checked by ADNL address. Since the default scripts overwrite the ADNL key file after submitting a stake for the elections, the software saves the previous ADNL address. An alert is sent if neither of the ADNL keys can be found in the active set;
  3. Is the validator’s node in the elections? During elections, if the validator tried to submit a stake but its public key can’t be found in the list of election participants, an alert is sent. If the validator is found, the stake amount is added to the status message;
  4. Is the validator’s node in the next set? If the next set is active, the status is checked using the current ADNL key, and an alert is sent if the validator is not found.

Thus, monitoring covers the whole validation cycle.
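
The three membership checks above can be sketched as one function. This is a Python illustration with hypothetical names and plain string collections as inputs; how the sets are actually fetched from the network is out of scope, and "the next set is active" is modelled here simply as a non-empty `next_set`.

```python
def validation_cycle_alerts(active_set, next_set, participants,
                            current_adnl, previous_adnl, pubkey,
                            elections_open, stake_submitted):
    """Return a list of alert strings for the current point in the cycle."""
    alerts = []
    # Active set: either the current or the saved previous ADNL key may match.
    if not ({current_adnl, previous_adnl} & set(active_set)):
        alerts.append("not in active set")
    # Elections: only alert if a stake was submitted but the key is missing.
    if elections_open and stake_submitted and pubkey not in participants:
        alerts.append("not in election participants")
    # Next set: checked with the current ADNL key only.
    if next_set and current_adnl not in next_set:
        alerts.append("not in next set")
    return alerts
```

An empty list means no alert is due for any of the three checks.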

It is easily extendable. It uses run-time reflection: a metric can be added by writing a function (which returns the status and sets the corresponding messages) and creating a config entry with the name of that function.
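
The idea of looking up metric functions by the names listed in the config can be sketched as follows. ftvmon itself is written in Go and uses Go's reflection; this is only a Python analogue using `getattr`, and the class, method and config names are hypothetical.

```python
class Metrics:
    """Each method is one metric; it returns (status, message)."""

    def cpu(self):
        return ("ok", "CPU load is normal")

    def time_diff(self):
        return ("alert", "node is out of sync")


def run_configured(metrics, names):
    """Run every metric named in the config, looked up by name at run time."""
    results = {}
    for name in names:
        func = getattr(metrics, name, None)
        if not callable(func):
            # A config entry without a matching function is reported, not fatal.
            results[name] = ("error", f"unknown metric: {name}")
            continue
        results[name] = func()
    return results
```

Adding a metric then means adding one method and one name to the configured list, without touching the dispatch code.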

Cloud bot for monitoring validator nodes: @TON_Validators_Bot

Hello everybody!

Monitoring is good. You can configure many different monitoring systems on the server where the validator’s node is running. But what do you do if the server crashes or freezes, and the monitoring system fails with it? You cannot rely on monitoring that is unavailable or broken.

I propose a solution. My Telegram bot @TON_Validators_Bot does not require installation on your server. It runs in my cloud and does not make any requests to your validator node. However, it can check the time when your node signed its last block in the blockchain. If your node does not sign new blocks for a long time, you will receive a notification in Telegram. You will immediately see that your server requires attention.
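
The staleness check described above can be sketched like this. It is a Python illustration, not the bot's code: the class name `SigningWatch`, the 600-second default and the message texts are all assumptions, and fetching the timestamp of the last signed block is left out.

```python
class SigningWatch:
    """Notify once when signing stops and once when it resumes."""

    def __init__(self, max_silence_seconds=600):
        self.max_silence = max_silence_seconds
        self.alerting = False

    def check(self, last_signed_ts, now_ts):
        """Return a message on a state change, otherwise None."""
        silence = now_ts - last_signed_ts
        stale = silence > self.max_silence
        if stale and not self.alerting:
            self.alerting = True
            return f"Validator has not signed a block for {silence} seconds"
        if not stale and self.alerting:
            self.alerting = False
            return "Validator is signing blocks again"
        return None
```

Because only the blockchain-visible timestamp is needed, the check works even when the validator's server is completely unreachable.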

What can this bot do?

Monitoring

  1. The bot periodically checks whether your validator node signs blocks.
  2. The bot checks whether your validator is participating in the upcoming validator elections.
  3. The bot can automatically determine your public keys and ADNL addresses, which is convenient.
  4. You can look up information about a validator without knowing its Account Address; just enter the public key or ADNL address.

Alerts*

  1. If there are no new signed blocks for a long time, you will receive a notification.
  2. If your node does not participate in the upcoming validator elections, you will see it.

*Alerts are under construction.

Functions

  1. Easy to use! The bot does not require any installation or configuration steps. Just send “/start” and enter your Account Address in hex.
  2. The bot does not interact with your server. It is completely autonomous.
  3. The bot checks the result of the validator’s work, not the process. All that matters to it is that the validator correctly signs blocks and that they are accepted by the network.
  4. The entire message history is saved in the Telegram chat history.

Screenshot of the bot when the validator is broken

The bot @TON_Validators_Bot is already running in test mode on the net.ton.dev network
t.me/TON_Validators_Bot - just type “/start”

Future: alerts are under construction.
Sources: https://github.com/FreeTONi/Ton_Validators_Bot

Fixed a bug: Status set incorrectly on elections close.

docker-compose for a validator node.
The general ideas are:

  1. Not to use the bash scripts from the Tonlabs repo for build and startup.
  2. Fast and easy node upgrades with almost zero downtime.

I started a couple of hours ago, so I don’t have a lot yet.

Hey,

I joined the contest only yesterday and it seems I won’t be able to deliver a working solution by the 1st of June, but I wanted to share the architecture that I came up with and am going to implement next week.

Notes:

  • The validator node doesn’t have any extra ports exposed.
  • Every deployment can be scaled independently whenever required.
  • Costs are very flexible to control: Validator, Controller and Logstash are deployed via Docker (backed by docker-compose) to either a bare-metal machine or a VM. (Ansible can help with some maintenance later on; I’ve excluded Terraform as it’s not that good for bare-metal cases.)
    At the same time, monitoring can be either a custom solution or one of the SaaS solutions with pay-as-you-go subscriptions. The same applies to the message queue (either a custom deployment or SaaS).
    With the current specs for a validator node, I believe bare-metal machines will be the most cost-effective compared to any VM from any cloud provider.
  • The Pub/Sub layer provides a good abstraction and makes it possible to plug in many types of notifications and ways to control the validator(s), including web interfaces that are safe for the validator.
    It will be easy to integrate any kind of alerting and automatic response to those alerts.
  • I plan to implement the Controller as a set of standalone libraries for tonos-cli, lite-client and validator-engine (reusable by any other Python apps as well), plus the controller logic itself with an interface to the message queue (so that extending it to any MQ will be possible).
  • The Controller will be responsible for automating participation in elections, querying configuration and blockchain data, and helping with smart contract interaction.
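
The Pub/Sub abstraction between the Controller and everything that talks to it could look roughly like the sketch below. This is a Python illustration under stated assumptions, not the planned implementation: `InMemoryBroker` stands in for a real message queue, and the topic names `validator.commands` and `validator.events` as well as the message fields are hypothetical.

```python
from collections import defaultdict


class InMemoryBroker:
    """Stand-in for a real message queue; topics map to handler lists."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._handlers[topic].append(handler)

    def publish(self, topic, message):
        # Copy the list so handlers may subscribe while we iterate.
        handlers = list(self._handlers.get(topic, []))
        for handler in handlers:
            handler(message)
        return len(handlers)


class Controller:
    """Receives commands over the broker and reports results as events."""

    def __init__(self, broker, actions):
        self.broker = broker
        self.actions = actions  # command name -> callable without arguments
        broker.subscribe("validator.commands", self.on_command)

    def on_command(self, message):
        command = message.get("command")
        action = self.actions.get(command)
        if action is None:
            # Unknown commands are rejected instead of being executed.
            event = {"status": "rejected", "command": command}
        else:
            event = {"status": "done", "command": command, "result": action()}
        self.broker.publish("validator.events", event)
```

Since the Controller only depends on `subscribe` and `publish`, the in-memory broker could be swapped for an adapter around any message queue, and notification channels only need to subscribe to the events topic.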

Sad that I tackled this TON contest so late :\ But anyway, I will be striving to join the validators group :slight_smile:

The implementation is going to land here, and once it is in working order, some parts will likely be moved to separate repositories (e.g. py-tonos-cli, py-ton-lite-client, py-validator-engine).

Hello! I want to support a validator node for at least one year with this infrastructure. This is just the initial configuration; a lot of tests and additional functionality will be available soon.

For now I have:

  • a Dockerfile for the C++ node (which can also be run in OpenShift);
  • a Helm chart for the validator node;
  • high-availability infrastructure based on AWS;
  • a logging system based on CloudWatch;
  • EC2 monitoring.

Hello
Where should we submit our solution for this contest?