[OpenSIPS-Users] dialog-dlg-repl capability State=not synced in a 2-node cluster, but dialog-prof-repl State=Ok
Hari Radhakrishnan
harirkrishnan at gmail.com
Mon Jun 3 16:09:01 EDT 2019
Hi OpenSIPS Team,
We are facing an issue with a 2-node OpenSIPS cluster running opensips-2.4
on Ubuntu.
We would be really grateful if you could help with this. We have spent a
few days adding DBG statements to the code, but didn't get anywhere significant.
The setup uses the db_text module, with a separate file (containing the same
data) on each node.
The OpenSIPS instances run on AWS EC2 in the same subnet (security
groups are enabled).
With this setup (please see the clusterer_list_cap output below), I get the
states shown here. Ideally, we only require dialog_replication_cluster
to be enabled, but if profile_replication_cluster is not enabled,
the dialogs do not get replicated at all.
Cluster:: 1
Capability:: dialog-dlg-repl State=not synced
Capability:: dialog-prof-repl State=Ok
This is an Active-Passive setup, with keepalived used to switch over the
Virtual IP.
When I enable INFO logs, I see these messages on startup:
2019-06-04T05:14:53.909986+10:00 edge-opensips /usr/local/sbin/opensips[6519]: ERROR:dialog:dlg_replicate_profiles: Failed to replicate dialog profile
2019-06-04T05:14:54.011064+10:00 edge-opensips /usr/local/sbin/opensips[6519]: ERROR:dialog:dlg_replicate_profiles: All destinations in cluster: 1 are down or probing
2019-06-04T05:14:54.011096+10:00 edge-opensips /usr/local/sbin/opensips[6519]: ERROR:dialog:dlg_replicate_profiles: Failed to replicate dialog profile
2019-06-04T05:14:54.013319+10:00 edge-opensips /usr/local/sbin/opensips[6414]: INFO:core:probe_max_sock_buff: using snd buffer of 416 kb
2019-06-04T05:14:54.013336+10:00 edge-opensips /usr/local/sbin/opensips[6414]: INFO:core:init_sock_keepalive: TCP keepalive enabled on socket 8
2019-06-04T05:14:54.014216+10:00 edge-opensips /usr/local/sbin/opensips[6461]: INFO:clusterer:handle_internal_msg: Node [2] is UP
When I enable DBG logs, the following messages appear.
We saw that the dialog-dlg-repl capability remains in the CAP_SYNC_PENDING
state.
2019-06-04T05:39:52.583634+10:00 edge-opensips /usr/local/sbin/opensips[8867]: DBG:clusterer:cl_register_cap: Registered capability: dialog-prof-repl
2019-06-04T05:39:52.583651+10:00 edge-opensips /usr/local/sbin/opensips[8867]: DBG:clusterer:cl_register_cap: Registered capability: dialog-dlg-repl
2019-06-04T05:39:52.583659+10:00 edge-opensips /usr/local/sbin/opensips[8867]: DBG:clusterer:cl_request_sync: requesting dialog-dlg-repl sync in cluster 1
2019-06-04T05:39:52.583668+10:00 edge-opensips /usr/local/sbin/opensips[8867]: DBG:clusterer:cl_request_sync: failed to find sync source
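For reference, we could also try forcing a sync by hand via the dialog module's MI interface (assuming the dlg_cluster_sync MI command is available in this 2.4 build; it asks another node in dialog_replication_cluster for the dialog data), using the same salt convention as below:

```
# salt -G hosttype:edge-opensips cmd.run 'opensipsctl fifo dlg_cluster_sync'
```

This only makes sense once clusterer_list_cap shows a potential donor node, so it may fail with the same "failed to find sync source" symptom.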
The following logs keep repeating all the time (right after startup, without
any calls).
It appears that OpenSIPS keeps retrying to send some message, stuck in a
loop because of a state that never changes. Similar logs are seen on the
other node as well.
2019-06-04T05:39:53.599460+10:00 edge-opensips /usr/local/sbin/opensips[8919]: INFO:clusterer:handle_internal_msg: Node [2] is UP
2019-06-04T05:39:53.599467+10:00 edge-opensips /usr/local/sbin/opensips[8919]: DBG:clusterer:match_node: matched node 2
2019-06-04T05:39:53.599483+10:00 edge-opensips /usr/local/sbin/opensips[8919]: DBG:core:tcp_conn_get: con found in state 0
2019-06-04T05:39:53.599485+10:00 edge-opensips /usr/local/sbin/opensips[8919]: DBG:core:tcp_conn_get: tcp connection found (0x7feee3585a98), acquiring fd
2019-06-04T05:39:53.599492+10:00 edge-opensips /usr/local/sbin/opensips[8919]: DBG:core:tcp_conn_get: c= 0x7feee3585a98, n=16, Usock=303
2019-06-04T05:39:53.599545+10:00 edge-opensips /usr/local/sbin/opensips[8919]: DBG:core:tcp_conn_get: after receive_fd: c= 0x7feee3585a98 n=8 fd=304
2019-06-04T05:39:53.599552+10:00 edge-opensips /usr/local/sbin/opensips[8919]: DBG:proto_bin:proto_bin_send: sending via fd 304...
2019-06-04T05:39:53.599560+10:00 edge-opensips /usr/local/sbin/opensips[8919]: DBG:proto_bin:async_tsend_stream: Async successful write from first try on 0x7feee3585a98
2019-06-04T05:39:53.599562+10:00 edge-opensips /usr/local/sbin/opensips[8919]: DBG:proto_bin:proto_bin_send: after write: c= 0x7feee3585a98 n/len=48/48 fd=304
2019-06-04T05:39:53.599565+10:00 edge-opensips /usr/local/sbin/opensips[8919]: DBG:clusterer:msg_send_retry: sent bin packet to node [2]
2019-06-04T05:39:53.599574+10:00 edge-opensips /usr/local/sbin/opensips[8919]: DBG:proto_bin:bin_read_req: tcp_read_req end
2019-06-04T05:39:53.599689+10:00 edge-opensips /usr/local/sbin/opensips[8919]: DBG:proto_bin:bin_read_req: Using the global ( per process ) buff
2019-06-04T05:39:53.599698+10:00 edge-opensips /usr/local/sbin/opensips[8919]: DBG:proto_bin:bin_parse_headers: received a COMPLETE message
2019-06-04T05:39:53.599700+10:00 edge-opensips /usr/local/sbin/opensips[8919]: DBG:proto_bin:bin_handle_req: completely received a message
2019-06-04T05:39:53.599702+10:00 edge-opensips /usr/local/sbin/opensips[8919]: DBG:proto_bin:bin_handle_req: Nothing more to read on TCP conn 0x7feee35857e0, currently in state 0
2019-06-04T05:39:53.599706+10:00 edge-opensips /usr/local/sbin/opensips[8919]: DBG:core:call_callbacks: binary Packet CMD: 8. Capability: clusterer-internal
2019-06-04T05:39:53.599709+10:00 edge-opensips /usr/local/sbin/opensips[8919]: DBG:clusterer:bin_rcv_cl_packets: received clusterer message from: 10.100.0.142:41737 with source id: 2 and cluster id: 1
2019-06-04T05:39:53.599711+10:00 edge-opensips /usr/local/sbin/opensips[8919]: DBG:clusterer:handle_internal_msg: Received capability update with source [2]
2019-06-04T05:39:53.599713+10:00 edge-opensips /usr/local/sbin/opensips[8919]: DBG:core:bin_pop_str: Popped: 'dialog-dlg-repl' [15]
2019-06-04T05:39:53.599715+10:00 edge-opensips /usr/local/sbin/opensips[8919]: DBG:core:bin_pop_str: Popped: 'dialog-prof-repl' [16]
2019-06-04T05:39:53.599718+10:00 edge-opensips /usr/local/sbin/opensips[8919]: DBG:proto_bin:bin_read_req: tcp_read_req end
2019-06-04T05:39:53.624271+10:00 edge-opensips /usr/local/sbin/opensips[8919]: DBG:proto_bin:bin_read_req: Using the global ( per process ) buff
2019-06-04T05:39:53.624280+10:00 edge-opensips /usr/local/sbin/opensips[8919]: DBG:proto_bin:bin_parse_headers: received a COMPLETE message
2019-06-04T05:39:53.624282+10:00 edge-opensips /usr/local/sbin/opensips[8919]: DBG:proto_bin:bin_handle_req: completely received a message
2019-06-04T05:39:53.624284+10:00 edge-opensips /usr/local/sbin/opensips[8919]: DBG:proto_bin:bin_handle_req: Nothing more to read on TCP conn 0x7feee35857e0, currently in state 0
2019-06-04T05:39:53.624288+10:00 edge-opensips /usr/local/sbin/opensips[8919]: DBG:core:call_callbacks: binary Packet CMD: 4. Capability: dialog-prof-repl
2019-06-04T05:39:53.624291+10:00 edge-opensips /usr/local/sbin/opensips[8919]: DBG:clusterer:bin_rcv_mod_packets: received bin packet from: 10.100.0.142:41737 with source id: 2 and cluster id: 1
2019-06-04T05:39:53.624295+10:00 edge-opensips /usr/local/sbin/opensips[8919]: DBG:core:bin_pop_str: Popped: 'inbound' [7]
2019-06-04T05:39:53.624298+10:00 edge-opensips /usr/local/sbin/opensips[8919]: DBG:proto_bin:bin_read_req: tcp_read_req end
2019-06-04T05:39:53.699828+10:00 edge-opensips /usr/local/sbin/opensips[8983]: DBG:clusterer:match_node: matched node 2
2019-06-04T05:39:53.699852+10:00 edge-opensips /usr/local/sbin/opensips[8983]: DBG:core:tcp_conn_get: con found in state 0
2019-06-04T05:39:53.699855+10:00 edge-opensips /usr/local/sbin/opensips[8983]: DBG:core:tcp_conn_get: tcp connection found (0x7feee3585a98), acquiring fd
2019-06-04T05:39:53.699863+10:00 edge-opensips /usr/local/sbin/opensips[8983]: DBG:core:tcp_conn_get: c= 0x7feee3585a98, n=16, Usock=429
2019-06-04T05:39:53.599515+10:00 edge-opensips /usr/local/sbin/opensips[8984]: message repeated 2 times: [ DBG:core:handle_worker: read response= 7feee3585a98, 1, fd -1 from 52 (8919)]
2019-06-04T05:39:53.699885+10:00 edge-opensips /usr/local/sbin/opensips[8984]: DBG:core:handle_worker: read response= 7feee3585a98, 1, fd -1 from 116 (8983)
2019-06-04T05:39:53.699899+10:00 edge-opensips /usr/local/sbin/opensips[8983]: DBG:core:tcp_conn_get: after receive_fd: c= 0x7feee3585a98 n=8 fd=8
2019-06-04T05:39:53.699910+10:00 edge-opensips /usr/local/sbin/opensips[8983]: DBG:proto_bin:proto_bin_send: sending via fd 8...
2019-06-04T05:39:53.699952+10:00 edge-opensips /usr/local/sbin/opensips[8983]: DBG:proto_bin:async_tsend_stream: Async successful write from first try on 0x7feee3585a98
2019-06-04T05:39:53.699980+10:00 edge-opensips /usr/local/sbin/opensips[8983]: DBG:proto_bin:proto_bin_send: after write: c= 0x7feee3585a98 n/len=61/61 fd=8
2019-06-04T05:39:53.699994+10:00 edge-opensips /usr/local/sbin/opensips[8983]: DBG:clusterer:msg_send_retry: sent bin packet to node [2]
===================== SETUP ============================
# salt -G hosttype:edge-opensips cmd.run 'opensipsctl fifo clusterer_list'
edge-opensips.10-100-0-142:
Cluster:: 1
Node:: 1 DB_ID=-1 URL=bin:10.100.0.141:5555 Enabled=1 Link_state=Up
Next_hop=1 Description=none
edge-opensips.10-100-0-141:
Cluster:: 1
Node:: 2 DB_ID=-1 URL=bin:10.100.0.142:5555 Enabled=1 Link_state=Up
Next_hop=2 Description=none
# salt -G hosttype:edge-opensips cmd.run 'opensipsctl fifo
clusterer_list_topology'
edge-opensips.10-100-0-141:
Cluster:: 1
Node:: 1 Neighbours=2
Node:: 2 Neighbours=1
edge-opensips.10-100-0-142:
Cluster:: 1
Node:: 2 Neighbours=1
Node:: 1 Neighbours=2
# salt -G hosttype:edge-opensips cmd.run 'opensipsctl fifo
clusterer_list_cap'
edge-opensips.10-100-0-142:
Cluster:: 1
Capability:: dialog-dlg-repl State=not synced
Capability:: dialog-prof-repl State=Ok
edge-opensips.10-100-0-141:
Cluster:: 1
Capability:: dialog-dlg-repl State=not synced
Capability:: dialog-prof-repl State=Ok
# salt -G hosttype:edge-opensips cmd.run 'netstat -anp | grep 5555'
edge-opensips.10-100-0-141:
tcp 0 0 10.100.0.141:5555 0.0.0.0:*
LISTEN 6409/opensips
tcp 0 0 10.100.0.141:38272 10.100.0.142:5555
ESTABLISHED
6526/opensips
tcp 0 0 10.100.0.141:5555 10.100.0.142:38355
ESTABLISHED
6461/opensips
edge-opensips.10-100-0-142:
tcp 0 0 10.100.0.142:5555 0.0.0.0:*
LISTEN 29658/opensips
tcp 0 0 10.100.0.142:5555 10.100.0.141:38272
ESTABLISHED
29710/opensips
tcp 0 0 10.100.0.142:38355 10.100.0.141:5555
ESTABLISHED
29775/opensips
===========================================================
******************* CONFIG ************************
loadmodule "proto_bin.so"
modparam("proto_bin", "bin_port", 5555)
modparam("proto_bin", "bin_send_timeout", 200)
# CLUSTERER
loadmodule "clusterer.so"
modparam("clusterer", "db_mode", 0)
modparam("clusterer", "db_url", "text:///etc/opensips/db/")
modparam("clusterer", "db_table", "clusterer")
modparam("clusterer", "ping_interval", 4)
modparam("clusterer", "ping_timeout", 1000) # tried setting higher value
modparam("clusterer", "node_timeout", 60)
modparam("clusterer", "current_id", 1)
modparam("clusterer", "current_info", "cluster_id=1,node_id=1,url=bin:10.100.0.141:5555")
modparam("clusterer", "neighbor_info", "cluster_id=1,node_id=2,url=bin:10.100.0.142:5555")
## Dialog Module
loadmodule "dialog.so"
modparam("dialog", "dlg_match_mode", 1) # DID_FALLBACK best method for now
modparam("dialog", "db_mode", 0)
modparam("dialog", "db_url", "text:///etc/opensips/db/")
modparam("dialog", "profiles_no_value", "inbound/b")
# For dialog replication
modparam("dialog", "profile_replication_cluster", 1)
modparam("dialog", "dialog_replication_cluster", 1)
modparam("dialog", "dlg_sharing_tag", "vip=active") # set to "vip=backup"
in the other node
---------------------------------
# DB_TEXT
# cat /etc/opensips/db/clusterer
id(int,auto) cluster_id(int) node_id(int) url(string) state(int)
no_ping_retries(int) priority(int) sip_addr(string,null) flags(string,null)
description(string,null)
10:1:1:bin\:10.100.0.141\:5555:1:3:50:NULL:NULL:primary
20:1:2:bin\:10.100.0.142\:5555:1:3:50:NULL:NULL:secondary
**************************************************
best regards,
Hari