[OpenSIPS-Users] opensips-1.11.4: exec_msg_threshold warning logged showing delayes in unexpected places

Michael Ulitskiy mulitskiy at acedsl.com
Tue Dec 15 16:50:46 CET 2015


Hello,

So I wrote a script that watches opensips log for log_expiry warning and collects per-process
cpu utilization (using 'top' command) when such entry appears in the log.
Here's what it found:

LOG:
Dec 14 19:50:41 csp-p1n1 csp1[19109]: WARNING:core:log_expiry: threshold exceeded : msg processing took too long - 148363 us.Source : REGISTER sip:csp1.example.com:5060 SIP/2.0
Dec 14 19:50:41 csp-p1n1 csp1[19109]: WARNING:core:log_expiry: #1 is a module action : subst - 147916us - line 1308
Dec 14 19:50:41 csp-p1n1 csp1[19109]: WARNING:core:log_expiry: #2 is a module action : t_replicate - 78us - line 1310
Dec 14 19:50:41 csp-p1n1 csp1[19109]: WARNING:core:log_expiry: #3 is a module action : save - 61us - line 1291
Dec 14 19:50:41 csp-p1n1 csp1[19109]: WARNING:core:log_expiry: #4 is a module action : avp_subst - 34us - line 907
Dec 14 19:50:41 csp-p1n1 csp1[19109]: WARNING:core:log_expiry: #5 is a module action : pv_www_authorize - 15us - line 1254

TOP:
top - 19:50:41 up 350 days,  2:31,  0 users,  load average: 0.53, 0.33, 0.27
Tasks: 196 total,   3 running, 193 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.5%us,  0.5%sy,  0.0%ni, 98.7%id,  0.3%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   4005248k total,  3776444k used,   228804k free,   286420k buffers
Swap:  2097144k total,    40960k used,  2056184k free,  1850068k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND            
19109 opensips  20   0  696m  19m  16m S   30  0.5  26:28.04 opensips           
19104 opensips  20   0  696m  19m  16m S    2  0.5  26:33.08 opensips           
19105 opensips  20   0  696m  19m  16m S    2  0.5  26:28.10 opensips           
    1 root      20   0  4356  572  492 S    0  0.0  13366:15 init               
    2 root      20   0     0    0    0 S    0  0.0   0:06.93 kthreadd           
    3 root      20   0     0    0    0 S    0  0.0  24:34.15 ksoftirqd/0        
    5 root       0 -20     0    0    0 S    0  0.0   0:00.00 kworker/0:0H       
    7 root      RT   0     0    0    0 S    0  0.0   1:56.92 migration/0       

opensips itself utilized 30% of cpu time.

here's another example:

LOG:
Dec 14 21:07:29 csp-p1n1 csp1[19110]: WARNING:core:log_expiry: threshold exceeded : msg processing took too long - 148509 us.Source : NOTIFY sip:nswasser002 at 192.168.36.132:5061 SIP/2.0
Dec 14 21:07:29 csp-p1n1 csp1[19110]: WARNING:core:log_expiry: #1 is a module action : avp_subst - 63us - line 907
Dec 14 21:07:29 csp-p1n1 csp1[19110]: WARNING:core:log_expiry: #2 is a module action : t_relay - 57us - line 1116
Dec 14 21:07:29 csp-p1n1 csp1[19110]: WARNING:core:log_expiry: #3 is a core action : 69 - 15us - line 941
Dec 14 21:07:29 csp-p1n1 csp1[19110]: WARNING:core:log_expiry: #4 is a module action : sipmsg_validate - 14us - line 1002
Dec 14 21:07:29 csp-p1n1 csp1[19110]: WARNING:core:log_expiry: #5 is a module action : lookup - 13us - line 1631

TOP:
top - 21:07:29 up 350 days,  3:48,  0 users,  load average: 0.15, 0.29, 0.31
Tasks: 196 total,   1 running, 195 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.5%us,  0.5%sy,  0.0%ni, 98.7%id,  0.3%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   4005248k total,  3773700k used,   231548k free,   285912k buffers
Swap:  2097144k total,    40960k used,  2056184k free,  1854896k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND            
19110 opensips  20   0  696m  19m  16m S   30  0.5  26:31.94 opensips           
    9 root      20   0     0    0    0 S    2  0.0 184:25.29 rcu_sched          
19104 opensips  20   0  696m  19m  16m S    2  0.5  26:39.33 opensips           
19112 opensips  20   0  696m  19m  16m S    2  0.5  26:27.47 opensips           
19115 opensips  20   0  696m  19m  16m S    2  0.5  26:33.44 opensips           
19118 opensips  20   0  696m  19m  16m S    2  0.5  26:33.73 opensips           
25899 root      19  -1  193m  24m  11m S    2  0.6   6487:27 corosync           
    1 root      20   0  4356  572  492 S    0  0.0  13366:15 init               
    2 root      20   0     0    0    0 S    0  0.0   0:06.93 kthreadd           
    3 root      20   0     0    0    0 S    0  0.0  24:35.13 ksoftirqd/0        
    5 root       0 -20     0    0    0 S    0  0.0   0:00.00 kworker/0:0H      

again opensips itself utilized 30% of cpu time. in addition log_expiry message doesn't show a single action that would take long, just the whole message processing takes ~150ms.

it looks like opensips for no obvious reasons randomly spikes its cpu utilization. it's not periodic, doesn't correspond to any periodic process in the system and looks completely random.

i checked with MI 'ps' command and those cpu-spiking processes are all "SIP receiver" processes.

I'm at loss. What could be causing opensips process to spike its cpu usage like that?

I'd appreciate any ideas.

Thanks,

Michael


On Wednesday, December 02, 2015 07:19:06 PM Liviu Chircu wrote:
> Hi Michael,
> 
> The subst() warning is very interesting, since it's not doing I/O at all 
> (bonus: it's not even allocating shared memory!).
> 
> So first thing that comes to mind is some excessive CPU usage happening 
> on your machine. Some cron job or daemon which may be CPU starving the 
> OpenSIPS workers from time to time?
> 
> Best regards,
> 
> Liviu Chircu
> OpenSIPS Developer
> http://www.opensips-solutions.com
> 
> On 02.12.2015 17:46, Michael Ulitskiy wrote:
> >
> > Hello
> >
> > I have opensips-1.11.4 running as frontend proxy/registrar to a 
> > cluster of asterisk servers.
> >
> > It currently works under low load - around 50 messages per second.
> >
> > Nonetheless several times a day I'm receiving exec_msg_threshold 
> > warnings showing
> >
> > delays in the very unexpected places. exec_msg_threshold is set to 
> > 100ms (100000 microseconds).
> >
> > Here's a few examples:
> >
> > Dec 2 00:56:31 csp-p1n1 csp1[19102]: WARNING:core:log_expiry: 
> > threshold exceeded : msg processing took too long - 148931 us.Source : 
> > NOTIFY sip:srv.csp1.example.com SIP/2.0
> >
> > Dec 2 00:56:31 csp-p1n1 csp1[19102]: WARNING:core:log_expiry: #1 is a 
> > module action : pv_proxy_authorize - 148529us - line 1433
> >
> > Dec 2 00:56:31 csp-p1n1 csp1[19102]: WARNING:core:log_expiry: #2 is a 
> > module action : t_relay - 78us - line 1116
> >
> > Dec 2 00:56:31 csp-p1n1 csp1[19102]: WARNING:core:log_expiry: #3 is a 
> > module action : avp_subst - 32us - line 907
> >
> > Dec 2 00:56:31 csp-p1n1 csp1[19102]: WARNING:core:log_expiry: #4 is a 
> > core action : 69 - 11us - line 943
> >
> > Dec 2 00:56:31 csp-p1n1 csp1[19102]: WARNING:core:log_expiry: #5 is a 
> > module action : record_route - 8us - line 455
> >
> > pv_proxy_authorize() took 148ms. the code in question:
> >
> > if (!pv_proxy_authorize("example.com")) {
> >
> > ...
> >
> > }
> >
> > Another example:
> >
> > Dec 2 01:10:24 csp-p1n1 csp1[19104]: WARNING:core:log_expiry: 
> > threshold exceeded : msg processing took too long - 148508 us.Source : 
> > REGISTER sip:csp1.example.com:5060 SIP/2.0
> >
> > Dec 2 01:10:24 csp-p1n1 csp1[19104]: WARNING:core:log_expiry: #1 is a 
> > module action : subst - 147958us - line 1308
> >
> > Dec 2 01:10:24 csp-p1n1 csp1[19104]: WARNING:core:log_expiry: #2 is a 
> > module action : t_replicate - 85us - line 1310
> >
> > Dec 2 01:10:24 csp-p1n1 csp1[19104]: WARNING:core:log_expiry: #3 is a 
> > module action : save - 75us - line 1291
> >
> > Dec 2 01:10:24 csp-p1n1 csp1[19104]: WARNING:core:log_expiry: #4 is a 
> > module action : avp_subst - 45us - line 907
> >
> > Dec 2 01:10:24 csp-p1n1 csp1[19104]: WARNING:core:log_expiry: #5 is a 
> > module action : pv_www_authorize - 20us - line 1254
> >
> > Here's subst() took 148ms. the code in question:
> >
> > subst('/^To: .*sip:(.+)@.+/To: sip:\1@$rd/i');
> >
> > Yet another example:
> >
> > Dec 2 04:37:47 csp-p1n1 csp1[19119]: WARNING:core:log_expiry: 
> > threshold exceeded : msg processing took too long - 146908 us.Source : 
> > REGISTER sip:srv.csp1.example.com SIP/2.0
> >
> > Dec 2 04:37:47 csp-p1n1 csp1[19119]: WARNING:core:log_expiry: #1 is a 
> > module action : www_challenge - 146535us - line 1272
> >
> > Dec 2 04:37:47 csp-p1n1 csp1[19119]: WARNING:core:log_expiry: #2 is a 
> > module action : avp_subst - 57us - line 907
> >
> > Dec 2 04:37:47 csp-p1n1 csp1[19119]: WARNING:core:log_expiry: #3 is a 
> > module action : sipmsg_validate - 22us - line 1002
> >
> > Dec 2 04:37:47 csp-p1n1 csp1[19119]: WARNING:core:log_expiry: #4 is a 
> > module action : pv_www_authorize - 15us - line 1254
> >
> > Dec 2 04:37:47 csp-p1n1 csp1[19119]: WARNING:core:log_expiry: #5 is a 
> > module action : nat_uac_test - 11us - line 1202
> >
> > www_challenge() took 146ms. the code is:
> >
> > www_challenge("example.com","1");
> >
> > I've seen it in all kind of places including pure core operations, 
> > like pseudo-variables assignments.
> >
> > Does anybody have any idea what might be causing this?
> >
> > Thanks,
> >
> > Michael
> >
> >
> >
> > _______________________________________________
> > Users mailing list
> > Users at lists.opensips.org
> > http://lists.opensips.org/cgi-bin/mailman/listinfo/users
> 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.opensips.org/pipermail/users/attachments/20151215/9af81fe4/attachment-0001.htm>


More information about the Users mailing list