[OpenSIPS-Users] opensips-1.11.4: exec_msg_threshold warning logged showing delayes in unexpected places
Michael Ulitskiy
mulitskiy at acedsl.com
Tue Dec 15 16:50:46 CET 2015
Hello,
So I wrote a script that watches opensips log for log_expiry warning and collects per-process
cpu utilization (using 'top' command) when such entry appears in the log.
Here's what it found:
LOG:
Dec 14 19:50:41 csp-p1n1 csp1[19109]: WARNING:core:log_expiry: threshold exceeded : msg processing took too long - 148363 us.Source : REGISTER sip:csp1.example.com:5060 SIP/2.0
Dec 14 19:50:41 csp-p1n1 csp1[19109]: WARNING:core:log_expiry: #1 is a module action : subst - 147916us - line 1308
Dec 14 19:50:41 csp-p1n1 csp1[19109]: WARNING:core:log_expiry: #2 is a module action : t_replicate - 78us - line 1310
Dec 14 19:50:41 csp-p1n1 csp1[19109]: WARNING:core:log_expiry: #3 is a module action : save - 61us - line 1291
Dec 14 19:50:41 csp-p1n1 csp1[19109]: WARNING:core:log_expiry: #4 is a module action : avp_subst - 34us - line 907
Dec 14 19:50:41 csp-p1n1 csp1[19109]: WARNING:core:log_expiry: #5 is a module action : pv_www_authorize - 15us - line 1254
TOP:
top - 19:50:41 up 350 days, 2:31, 0 users, load average: 0.53, 0.33, 0.27
Tasks: 196 total, 3 running, 193 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.5%us, 0.5%sy, 0.0%ni, 98.7%id, 0.3%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 4005248k total, 3776444k used, 228804k free, 286420k buffers
Swap: 2097144k total, 40960k used, 2056184k free, 1850068k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
19109 opensips 20 0 696m 19m 16m S 30 0.5 26:28.04 opensips
19104 opensips 20 0 696m 19m 16m S 2 0.5 26:33.08 opensips
19105 opensips 20 0 696m 19m 16m S 2 0.5 26:28.10 opensips
1 root 20 0 4356 572 492 S 0 0.0 13366:15 init
2 root 20 0 0 0 0 S 0 0.0 0:06.93 kthreadd
3 root 20 0 0 0 0 S 0 0.0 24:34.15 ksoftirqd/0
5 root 0 -20 0 0 0 S 0 0.0 0:00.00 kworker/0:0H
7 root RT 0 0 0 0 S 0 0.0 1:56.92 migration/0
opensips itself utilized 30% of cpu time.
here's another example:
LOG:
Dec 14 21:07:29 csp-p1n1 csp1[19110]: WARNING:core:log_expiry: threshold exceeded : msg processing took too long - 148509 us.Source : NOTIFY sip:nswasser002 at 192.168.36.132:5061 SIP/2.0
Dec 14 21:07:29 csp-p1n1 csp1[19110]: WARNING:core:log_expiry: #1 is a module action : avp_subst - 63us - line 907
Dec 14 21:07:29 csp-p1n1 csp1[19110]: WARNING:core:log_expiry: #2 is a module action : t_relay - 57us - line 1116
Dec 14 21:07:29 csp-p1n1 csp1[19110]: WARNING:core:log_expiry: #3 is a core action : 69 - 15us - line 941
Dec 14 21:07:29 csp-p1n1 csp1[19110]: WARNING:core:log_expiry: #4 is a module action : sipmsg_validate - 14us - line 1002
Dec 14 21:07:29 csp-p1n1 csp1[19110]: WARNING:core:log_expiry: #5 is a module action : lookup - 13us - line 1631
TOP:
top - 21:07:29 up 350 days, 3:48, 0 users, load average: 0.15, 0.29, 0.31
Tasks: 196 total, 1 running, 195 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.5%us, 0.5%sy, 0.0%ni, 98.7%id, 0.3%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 4005248k total, 3773700k used, 231548k free, 285912k buffers
Swap: 2097144k total, 40960k used, 2056184k free, 1854896k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
19110 opensips 20 0 696m 19m 16m S 30 0.5 26:31.94 opensips
9 root 20 0 0 0 0 S 2 0.0 184:25.29 rcu_sched
19104 opensips 20 0 696m 19m 16m S 2 0.5 26:39.33 opensips
19112 opensips 20 0 696m 19m 16m S 2 0.5 26:27.47 opensips
19115 opensips 20 0 696m 19m 16m S 2 0.5 26:33.44 opensips
19118 opensips 20 0 696m 19m 16m S 2 0.5 26:33.73 opensips
25899 root 19 -1 193m 24m 11m S 2 0.6 6487:27 corosync
1 root 20 0 4356 572 492 S 0 0.0 13366:15 init
2 root 20 0 0 0 0 S 0 0.0 0:06.93 kthreadd
3 root 20 0 0 0 0 S 0 0.0 24:35.13 ksoftirqd/0
5 root 0 -20 0 0 0 S 0 0.0 0:00.00 kworker/0:0H
again opensips itself utilized 30% of cpu time. in addition log_expiry message doesn't show a single action that would take long, just the whole message processing takes ~150ms.
it looks like opensips for no obvious reasons randomly spikes its cpu utilization. it's not periodic, doesn't correspond to any periodic process in the system and looks completely random.
i checked with MI 'ps' command and those cpu-spiking processes are all "SIP receiver" processes.
I'm at loss. What could be causing opensips process to spike its cpu usage like that?
I'd appreciate any ideas.
Thanks,
Michael
On Wednesday, December 02, 2015 07:19:06 PM Liviu Chircu wrote:
> Hi Michael,
>
> The subst() warning is very interesting, since it's not doing I/O at all
> (bonus: it's not even allocating shared memory!).
>
> So first thing that comes to mind is some excessive CPU usage happening
> on your machine. Some cron job or daemon which may be CPU starving the
> OpenSIPS workers from time to time?
>
> Best regards,
>
> Liviu Chircu
> OpenSIPS Developer
> http://www.opensips-solutions.com
>
> On 02.12.2015 17:46, Michael Ulitskiy wrote:
> >
> > Hello
> >
> > I have opensips-1.11.4 running as frontend proxy/registrar to a
> > cluster of asterisk servers.
> >
> > It currently works under low load - around 50 messages per second.
> >
> > Nonetheless several times a day I'm receiving exec_msg_threshold
> > warnings showing
> >
> > delays in the very unexpected places. exec_msg_threshold is set to
> > 100ms (100000 microseconds).
> >
> > Here's a few examples:
> >
> > Dec 2 00:56:31 csp-p1n1 csp1[19102]: WARNING:core:log_expiry:
> > threshold exceeded : msg processing took too long - 148931 us.Source :
> > NOTIFY sip:srv.csp1.example.com SIP/2.0
> >
> > Dec 2 00:56:31 csp-p1n1 csp1[19102]: WARNING:core:log_expiry: #1 is a
> > module action : pv_proxy_authorize - 148529us - line 1433
> >
> > Dec 2 00:56:31 csp-p1n1 csp1[19102]: WARNING:core:log_expiry: #2 is a
> > module action : t_relay - 78us - line 1116
> >
> > Dec 2 00:56:31 csp-p1n1 csp1[19102]: WARNING:core:log_expiry: #3 is a
> > module action : avp_subst - 32us - line 907
> >
> > Dec 2 00:56:31 csp-p1n1 csp1[19102]: WARNING:core:log_expiry: #4 is a
> > core action : 69 - 11us - line 943
> >
> > Dec 2 00:56:31 csp-p1n1 csp1[19102]: WARNING:core:log_expiry: #5 is a
> > module action : record_route - 8us - line 455
> >
> > pv_proxy_authorize() took 148ms. the code in question:
> >
> > if (!pv_proxy_authorize("example.com")) {
> >
> > ...
> >
> > }
> >
> > Another example:
> >
> > Dec 2 01:10:24 csp-p1n1 csp1[19104]: WARNING:core:log_expiry:
> > threshold exceeded : msg processing took too long - 148508 us.Source :
> > REGISTER sip:csp1.example.com:5060 SIP/2.0
> >
> > Dec 2 01:10:24 csp-p1n1 csp1[19104]: WARNING:core:log_expiry: #1 is a
> > module action : subst - 147958us - line 1308
> >
> > Dec 2 01:10:24 csp-p1n1 csp1[19104]: WARNING:core:log_expiry: #2 is a
> > module action : t_replicate - 85us - line 1310
> >
> > Dec 2 01:10:24 csp-p1n1 csp1[19104]: WARNING:core:log_expiry: #3 is a
> > module action : save - 75us - line 1291
> >
> > Dec 2 01:10:24 csp-p1n1 csp1[19104]: WARNING:core:log_expiry: #4 is a
> > module action : avp_subst - 45us - line 907
> >
> > Dec 2 01:10:24 csp-p1n1 csp1[19104]: WARNING:core:log_expiry: #5 is a
> > module action : pv_www_authorize - 20us - line 1254
> >
> > Here's subst() took 148ms. the code in question:
> >
> > subst('/^To: .*sip:(.+)@.+/To: sip:\1@$rd/i');
> >
> > Yet another example:
> >
> > Dec 2 04:37:47 csp-p1n1 csp1[19119]: WARNING:core:log_expiry:
> > threshold exceeded : msg processing took too long - 146908 us.Source :
> > REGISTER sip:srv.csp1.example.com SIP/2.0
> >
> > Dec 2 04:37:47 csp-p1n1 csp1[19119]: WARNING:core:log_expiry: #1 is a
> > module action : www_challenge - 146535us - line 1272
> >
> > Dec 2 04:37:47 csp-p1n1 csp1[19119]: WARNING:core:log_expiry: #2 is a
> > module action : avp_subst - 57us - line 907
> >
> > Dec 2 04:37:47 csp-p1n1 csp1[19119]: WARNING:core:log_expiry: #3 is a
> > module action : sipmsg_validate - 22us - line 1002
> >
> > Dec 2 04:37:47 csp-p1n1 csp1[19119]: WARNING:core:log_expiry: #4 is a
> > module action : pv_www_authorize - 15us - line 1254
> >
> > Dec 2 04:37:47 csp-p1n1 csp1[19119]: WARNING:core:log_expiry: #5 is a
> > module action : nat_uac_test - 11us - line 1202
> >
> > www_challenge() took 146ms. the code is:
> >
> > www_challenge("example.com","1");
> >
> > I've seen it in all kind of places including pure core operations,
> > like pseudo-variables assignments.
> >
> > Does anybody have any idea what might be causing this?
> >
> > Thanks,
> >
> > Michael
> >
> >
> >
> > _______________________________________________
> > Users mailing list
> > Users at lists.opensips.org
> > http://lists.opensips.org/cgi-bin/mailman/listinfo/users
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.opensips.org/pipermail/users/attachments/20151215/9af81fe4/attachment-0001.htm>
More information about the Users
mailing list