Metrics
Prometheus endpoint configuration
diode-send and diode-receive implements a metric system compatible with Prometheus. To enable this system, is it required to provide the url which will be scrapped by prometheus.
There are two urls to set, one for the sender and one for the receiver. They both are under the “metric” option.
[sender]
metrics = "127.0.0.1:9001"
[receiver]
metrics = "127.0.0.1:9001"
Note
When running on the same host for tests, it is necessary to put different ports or it will fail with the following error: Cannot init metrics: cannot start http listener: failed to create HTTP listener: Address already in use
Relevant metrics
diode-send
tx_sessions : total number of TCP connections accepted by diode-send
tx_tcp_blocks : total number of blocks received on TCP sessions
tx_tcp_bytes : total number of bytes received on TCP sessions
tx_encoding_blocks : total number of blocks successfully encoded
tx_encoding_blocks_err : total number of blocks lost due to encoding error
tx_udp_pkts : total number of UDP packets successfully sent to diode-receive
tx_udp_bytes : total number of bytes successfully sent on UDP packets to diode-receive. This only is the udp payload without lidi header, this does not contain network transport headers of packets (Eth/IP/UDP). Since it contains repair packets and one raptorq header per block, the value is bigger than tx_tcp_bytes.
tx_udp_pkts_err : total number of UDP packets not sent (socket error)
tx_udp_bytes_err : total number of bytes not sent (socket error)
diode-receive
All stats of diode-receive starts with rx.
rx_sessions : total number of completed TCP sessions
rx_decoding_blocks : total number of blocks successfully decoded
rx_decoding_blocks_err : total number of blocks lost due to decoding error: too many packets missing or corrupted at the time of decoding.
rx_udp_pkts : total number of UDP packets successfully received
rx_udp_bytes : total number of bytes successfully received from UDP packets
rx_udp_deserialize_header_err : total number of lost UDP packets due to corrupted header
rx_udp_recv_pkts_err : total number of read socket failure
rx_udp_send_reorder_err : total number of lost UDP packets because it was impossible to push it to the reorder/decode queue. Try to increase “udp_packets_queue_size” receiver config value or reduce throughput with rate limiter or try to optimize RX performance receiver Multithreading.
rx_udp_pkts_missing : total number of missing UDP packets when trying to decode blocks (packet drops, header error or queue full…).
rx_tcp_blocks : total number of blocks sent on TCP session
rx_tcp_blocks_err : total number of lost blocks, not sent on TCP session (socket error)
rx_tcp_bytes : total number of bytes sent on TCP session
rx_tcp_bytes_err : total number of lost bytes, not sent on TCP session (socket error)
rx_pop_ok_packets : total number of packets sent to reordering module and which completed blocks. Reordering module used this packet to complete a block and returns it. This value should be equal or inferior to rx_decoding_blocks. (Inferior because we can sometimes successfully decode a block even if we do not have all packets (see rx_pop_timeout_with_packets).
rx_pop_ok_none : total number of packets sent to reordering module, without finishing a block. Reordering module kept this packet and returned nothing, waiting for other packets to finish a block
rx_pop_timeout_with_packets : the current block did not receive the needed packets to complete it before a timeout occurs. We will try to decode the block and maybe succeed if we received enough data.
rx_pop_timeout_none : a timeout happens when there was no waiting packet for the current block.
rx_send_block_err : total number of lost blocks because it was impossible to push it to the TCP sender queue (most probably because it is full). Try to increase “tcp_blocks_queue_size” receiver config value or adjust sender/receiver TCP throughput.
rx_skip_block : number of completed blocks dropped because the session is broken (we lost a previous block).
snmp_ip_in_discards : From kernel: the number of input IP datagrams for which no problems were encountered to prevent their continued processing, but which were discarded (e.g., for lack of buffer space). See RFC 1213.
snmp_udp_in_errors : From kernel: the number of received UDP datagrams that could not be delivered for reasons other than the lack of an application at the destination port. See RFC 1213.
cpu_usage : Same as top. Label is thread name.
Summary of data loss metrics (diode-receive side)
Packet loss metrics
If too many packets are lost, we will see block decoding error.
rx_udp_deserialize_header_err
rx_udp_send_reorder_err
rx_udp_pkts_missing
rx_udp_recv_pkts_err (maybe ? not sure of possible error case)
Block loss metrics
If a block is lost, the whole session is lost.
rx_decoding_blocks_err
rx_send_block_err
rx_tcp_blocks_err