[OpenSIPS-Users] sched_yield()
Bogdan-Andrei Iancu
bogdan at voice-system.ro
Fri Jan 22 18:39:40 CET 2010
Hi Alex,
Our tests with 1.6 (before release) did not indicated any degradation
with performance. Of course this strongly depends on what kind of
scenario you have on the proxy.
So:
1) do you use dialog support ?
2) if 1) is yes, do you use profiling ?
3) if 2) is yes, do you put all calls in a single profile ?
if you indeed use dialog + profiling, maybe you should try to A) remove
profiling and B) remove dialog support, to see how the performance
changes (if it does).
Regards,
Bogdan
Alex Massover wrote:
> Hi,
>
> Now shared memory is 1G (-m 1024), and all memory is dedicated to the virtual machine (it was shared till now).
> But it still happens, just not so often.
>
> I originate the calls for this stress test in Asterisk with the same resources and looks like Asterisk performs much better than OpenSIPS. How can it be?
> In my stress OpenSIPS does no blocking/slow requests. And it's just 4K concurrent calls, each one is 2-3 min.
>
> Maybe OpenSIPS does too much low level memory management and virtual machine is not suitable for it (despite that Asterisk runs well over VMware)?
>
> I'm not sure but I have a feeling that 1.4 performed better. What can cause performance degradation in 1.6? Storing vars on dialog, new malloc()?
>
> gdb) bt
> #0 0xb78ad424 in __kernel_vsyscall ()
> #1 0xb77e841c in sched_yield () from /lib/i686/cmov/libc.so.6
> #2 0x080bf23d in new_avp ()
> #3 0x080bf53f in add_avp ()
> #4 0x08080e6e in pv_set_avp ()
> #5 0x0808229c in pv_set_value ()
> #6 0x08053c9d in do_assign ()
> #7 0x0805447a in do_action ()
> #8 0x08053ebf in run_action_list ()
> #9 0x08056e7a in do_action ()
> #10 0x08053ebf in run_action_list ()
> #11 0x08056e7a in do_action ()
> #12 0x08053ebf in run_action_list ()
> #13 0x080569d8 in do_action ()
> #14 0x08053ebf in run_action_list ()
> #15 0x08056e7a in do_action ()
> #16 0x08053ebf in run_action_list ()
> #17 0x08057d99 in run_top_route ()
> #18 0x0808ad6c in receive_msg ()
> #19 0x080bd2f2 in udp_rcv_loop ()
> #20 0x08069339 in main ()
> (gdb) quit
>
>
> (gdb) bt
> #0 0xb78ad424 in __kernel_vsyscall ()
> #1 0xb77e841c in sched_yield () from /lib/i686/cmov/libc.so.6
> #2 0xb76f52cd in build_cell () from /usr/lib/opensips/modules/tm.so
> #3 0xb770ac4a in t_newtran () from /usr/lib/opensips/modules/tm.so
> #4 0xb76ff7b8 in t_relay_to () from /usr/lib/opensips/modules/tm.so
> #5 0xb770c501 in ?? () from /usr/lib/opensips/modules/tm.so
> #6 0x08055030 in do_action ()
> #7 0x08053ebf in run_action_list ()
> #8 0x08095cf2 in eval_expr ()
> #9 0x080958d9 in eval_expr ()
> #10 0x08095919 in eval_expr ()
> #11 0x080554e2 in do_action ()
> #12 0x08053ebf in run_action_list ()
> #13 0x080569d8 in do_action ()
> #14 0x08053ebf in run_action_list ()
> #15 0x08056e7a in do_action ()
> #16 0x08053ebf in run_action_list ()
> #17 0x08057d99 in run_top_route ()
> #18 0x0808ad6c in receive_msg ()
> #19 0x080bd2f2 in udp_rcv_loop ()
> #20 0x08069339 in main ()
>
>
> --
> Best Regards,
> Alex Massover
> VoIP R&D TL
> Jajah Inc.
>
>
>> -----Original Message-----
>> From: users-bounces at lists.opensips.org [mailto:users-
>> bounces at lists.opensips.org] On Behalf Of Andrei Dragus
>> Sent: Thursday, January 21, 2010 3:43 PM
>> To: OpenSIPS users mailling list
>> Subject: Re: [OpenSIPS-Users] sched_yield()
>>
>> My guess is that there is not enough shared memory. When an allocation
>> failes OpenSIPS tries to defragment memory to make room which takes a
>> lot of time and must be done under lock.
>>
>> Please try to increase the shared memory size and tell me if it
>> persists.
>>
>>
>> Alex Massover wrote:
>>
>>> Hi!
>>>
>>> Yes, with -DF_MALLOC.
>>>
>>> 1.6.1 from sources, I build deb package.
>>> I use 128M of shared and 10*1024*1024 private memory (can increase -
>>>
>> no problem).
>>
>>> Hmmmm, "opensipsctl fifo get_statistics all" crashes/stops the
>>>
>> opensips.
>>
>>> 'fifo uptime' or 'fifo debug' are OK.
>>>
>>> strace while 'fifo get_statistics all':
>>> Process 9509 attached - interrupt to quit
>>> pause() = ? ERESTARTNOHAND (To be
>>>
>> restarted)
>>
>>> --- SIGUSR2 (User defined signal 2) @ 0 (0) ---
>>> sigreturn() = ? (mask now [])
>>> pause() = ? ERESTARTNOHAND (To be
>>>
>> restarted)
>>
>>> --- SIGCHLD (Child exited) @ 0 (0) ---
>>> sigreturn() = ? (mask now [])
>>> waitpid(-1, [{WIFSIGNALED(s) && WTERMSIG(s) == SIGUSR2}], WNOHANG) =
>>>
>> 9520
>>
>>> waitpid(-1, 0xbf84b4c8, WNOHANG) = 0
>>> kill(0, SIGTERM) = 0
>>> --- SIGTERM (Terminated) @ 0 (0) ---
>>> --- SIGCHLD (Child exited) @ 0 (0) ---
>>> sigreturn() = ? (mask now [TERM])
>>> sigreturn() = ? (mask now [])
>>> rt_sigaction(SIGALRM, {0x8065920, [ALRM], SA_RESTART}, {SIG_DFL}, 8)
>>>
>> = 0
>>
>>> alarm(60) = 0
>>> wait4(-1, NULL, 0, NULL) = 9514
>>> wait4(-1, NULL, 0, NULL) = 9519
>>> wait4(-1, NULL, 0, NULL) = 9521
>>> wait4(-1, NULL, 0, NULL) = 9522
>>> wait4(-1, NULL, 0, NULL) = 9512
>>> --- SIGCHLD (Child exited) @ 0 (0) ---
>>> sigreturn() = ? (mask now [])
>>> --- SIGCHLD (Child exited) @ 0 (0) ---
>>> sigreturn() = ? (mask now [])
>>> wait4(-1, NULL, 0, NULL) = 9510
>>> wait4(-1, NULL, 0, NULL) = 9516
>>> --- SIGCHLD (Child exited) @ 0 (0) ---
>>> sigreturn() = ? (mask now [])
>>> --- SIGCHLD (Child exited) @ 0 (0) ---
>>> sigreturn() = ? (mask now [])
>>> wait4(-1, NULL, 0, NULL) = 9515
>>> wait4(-1, NULL, 0, NULL) = 9517
>>> wait4(-1, NULL, 0, NULL) = 9524
>>> wait4(-1, NULL, 0, NULL) = 9525
>>> --- SIGCHLD (Child exited) @ 0 (0) ---
>>> sigreturn() = ? (mask now [])
>>> --- SIGCHLD (Child exited) @ 0 (0) ---
>>> sigreturn() = ? (mask now [])
>>> --- SIGCHLD (Child exited) @ 0 (0) ---
>>> sigreturn() = ? (mask now [])
>>> --- SIGCHLD (Child exited) @ 0 (0) ---
>>> sigreturn() = ? (mask now [])
>>> wait4(-1, NULL, 0, NULL) = 9511
>>> wait4(-1, NULL, 0, NULL) = 9513
>>> wait4(-1, NULL, 0, NULL) = 9518
>>> wait4(-1, NULL, 0, NULL) = 9523
>>> wait4(-1, NULL, 0, NULL) = -1 ECHILD (No child
>>>
>> processes)
>>
>>> rt_sigaction(SIGALRM, {0x8066080, [ALRM], SA_RESTART}, {0x8065920,
>>>
>> [ALRM], SA_RESTART}, 8) = 0
>>
>>> stat64("/tmp/opensips_fifo", {st_mode=S_IFIFO|0660, st_size=0, ...})
>>>
>> = 0
>>
>>> unlink("/tmp/opensips_fifo") = 0
>>> munmap(0xaed25000, 134217728) = 0
>>> unlink("/var/run/opensips/opensips.pid") = 0
>>> alarm(0) = 60
>>> rt_sigaction(SIGALRM, {SIG_IGN}, {0x8066080, [ALRM], SA_RESTART}, 8)
>>>
>> = 0
>>
>>> exit_group(0) = ?
>>> Process 9509 detached
>>>
>>> --
>>> Best Regards,
>>> Alex Massover
>>> VoIP R&D TL
>>> Jajah Inc.
>>>
>>>
>>>
>>>> -----Original Message-----
>>>> From: users-bounces at lists.opensips.org [mailto:users-
>>>> bounces at lists.opensips.org] On Behalf Of Andrei Dragus
>>>> Sent: Thursday, January 21, 2010 3:09 PM
>>>> To: OpenSIPS users mailling list
>>>> Subject: Re: [OpenSIPS-Users] sched_yield()
>>>>
>>>>
>>>> Hi,
>>>>
>>>> Since all the backtraces are in allocation routines my guess is that
>>>> the
>>>> shared memory lock might be causing a problem.
>>>>
>>>> Are you compiling with -DF_MALLOC?
>>>> What version of OpenSIPS are you using?
>>>> What is the total shared memory pool you are allocating?
>>>> What amount of memory are you using? ( Use : opensipsctl fifo
>>>> get_statistics all )
>>>>
>>>> Alex Massover wrote:
>>>>
>>>>
>>>>> Some more,
>>>>>
>>>>> (gdb) bt
>>>>> #0 0xb78dc424 in __kernel_vsyscall ()
>>>>> #1 0xb781741c in sched_yield () from /lib/i686/cmov/libc.so.6
>>>>> #2 0xb73d77fd in build_new_dlg () from
>>>>>
>>>>>
>>>> /usr/lib/opensips/modules/dialog.so
>>>>
>>>>
>>>>> #3 0xb73d4b81 in dlg_create_dialog () from
>>>>>
>>>>>
>>>> /usr/lib/opensips/modules/dialog.so
>>>>
>>>>
>>>>> #4 0xb73c9c9e in ?? () from /usr/lib/opensips/modules/dialog.so
>>>>> #5 0x08055030 in do_action ()
>>>>> #6 0x08053ebf in run_action_list ()
>>>>> #7 0x08056e7a in do_action ()
>>>>> #8 0x08053ebf in run_action_list ()
>>>>> #9 0x08057d99 in run_top_route ()
>>>>> #10 0x0808ad6c in receive_msg ()
>>>>> #11 0x080bd2f2 in udp_rcv_loop ()
>>>>> #12 0x08069339 in main ()
>>>>>
>>>>>
>>>>> (gdb) bt
>>>>> #0 0xb78dc424 in __kernel_vsyscall ()
>>>>> #1 0xb781741c in sched_yield () from /lib/i686/cmov/libc.so.6
>>>>> #2 0xb77242cd in build_cell () from
>>>>>
>> /usr/lib/opensips/modules/tm.so
>>
>>>>> #3 0xb7739c4a in t_newtran () from /usr/lib/opensips/modules/tm.so
>>>>> #4 0xb772e7b8 in t_relay_to () from
>>>>>
>> /usr/lib/opensips/modules/tm.so
>>
>>>>> #5 0xb773b501 in ?? () from /usr/lib/opensips/modules/tm.so
>>>>> #6 0x08055030 in do_action ()
>>>>> #7 0x08053ebf in run_action_list ()
>>>>> #8 0x08095cf2 in eval_expr ()
>>>>> #9 0x080958d9 in eval_expr ()
>>>>> #10 0x08095919 in eval_expr ()
>>>>> #11 0x080554e2 in do_action ()
>>>>> #12 0x08053ebf in run_action_list ()
>>>>> #13 0x080569d8 in do_action ()
>>>>> #14 0x08053ebf in run_action_list ()
>>>>> #15 0x08056e7a in do_action ()
>>>>> #16 0x08053ebf in run_action_list ()
>>>>> #17 0x08057d99 in run_top_route ()
>>>>> #18 0x0808ad6c in receive_msg ()
>>>>> #19 0x080bd2f2 in udp_rcv_loop ()
>>>>> #20 0x08069339 in main ()
>>>>>
>>>>> --
>>>>> Best Regards,
>>>>> Alex Massover
>>>>> VoIP R&D TL
>>>>> Jajah Inc.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: users-bounces at lists.opensips.org [mailto:users-
>>>>>> bounces at lists.opensips.org] On Behalf Of Alex Massover
>>>>>> Sent: Thursday, January 21, 2010 2:24 PM
>>>>>> To: OpenSIPS users mailling list
>>>>>> Subject: Re: [OpenSIPS-Users] sched_yield()
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Another one.. It hangs for a number of seconds (but it's enough to
>>>>>> cause to SIP timeouts - MSG queue jumps to 260K), it's hard to
>>>>>>
>> make
>>
>>>> a
>>>>
>>>>
>>>>>> bt at the right moment.
>>>>>> This one looks better because there's sched_yield() there :)
>>>>>>
>>>>>> (gdb) bt
>>>>>> #0 0xb77d5424 in __kernel_vsyscall ()
>>>>>> #1 0xb771041c in sched_yield () from /lib/i686/cmov/libc.so.6
>>>>>> #2 0x080bf23d in new_avp ()
>>>>>> #3 0x080bf53f in add_avp ()
>>>>>> #4 0xb72c1c9c in ?? () from /usr/lib/opensips/modules/dialog.so
>>>>>> #5 0x08055030 in do_action ()
>>>>>> #6 0x08053ebf in run_action_list ()
>>>>>> #7 0x08056e7a in do_action ()
>>>>>> #8 0x08053ebf in run_action_list ()
>>>>>> #9 0x08056e7a in do_action ()
>>>>>> #10 0x08053ebf in run_action_list ()
>>>>>> #11 0x08056e7a in do_action ()
>>>>>> #12 0x08053ebf in run_action_list ()
>>>>>> #13 0x08057d99 in run_top_route ()
>>>>>> #14 0x0808ad6c in receive_msg ()
>>>>>> #15 0x080bd2f2 in udp_rcv_loop ()
>>>>>> #16 0x08069339 in main ()
>>>>>>
>>>>>> --
>>>>>> Best Regards,
>>>>>> Alex Massover
>>>>>> VoIP R&D TL
>>>>>> Jajah Inc.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: users-bounces at lists.opensips.org [mailto:users-
>>>>>>> bounces at lists.opensips.org] On Behalf Of Alex Massover
>>>>>>> Sent: Thursday, January 21, 2010 2:05 PM
>>>>>>> To: OpenSIPS users mailling list
>>>>>>> Subject: Re: [OpenSIPS-Users] sched_yield()
>>>>>>>
>>>>>>> Hi Andrei,
>>>>>>> Hopefully this is it (with FASTLOCK)
>>>>>>>
>>>>>>> #0 0xb77d5424 in __kernel_vsyscall ()
>>>>>>> #1 0xb772babb in poll () from /lib/i686/cmov/libc.so.6
>>>>>>> #2 0xb77ba83a in ?? () from /lib/i686/cmov/libresolv.so.2
>>>>>>> #3 0xb77b8946 in __libc_res_nquery () from
>>>>>>> /lib/i686/cmov/libresolv.so.2
>>>>>>> #4 0xb77b8fdb in ?? () from /lib/i686/cmov/libresolv.so.2
>>>>>>> #5 0xb77b92ae in __libc_res_nsearch () from
>>>>>>> /lib/i686/cmov/libresolv.so.2
>>>>>>> #6 0xb77b96d4 in __res_nsearch () from
>>>>>>>
>>>>>>>
>>>> /lib/i686/cmov/libresolv.so.2
>>>>
>>>>
>>>>>>> #7 0xb77b808a in res_search () from
>>>>>>>
>> /lib/i686/cmov/libresolv.so.2
>>
>>>>>>> #8 0x0808c613 in get_record ()
>>>>>>> #9 0x0808cf05 in ?? ()
>>>>>>> #10 0x0808e385 in sip_resolvehost ()
>>>>>>> #11 0x0807a26c in mk_proxy ()
>>>>>>> #12 0xb7627d39 in t_relay_to () from
>>>>>>>
>>>>>>>
>>>> /usr/lib/opensips/modules/tm.so
>>>>
>>>>
>>>>>>> #13 0xb7634501 in ?? () from /usr/lib/opensips/modules/tm.so
>>>>>>> #14 0x08055030 in do_action ()
>>>>>>> #15 0x08053ebf in run_action_list ()
>>>>>>> #16 0x08095cf2 in eval_expr ()
>>>>>>> #17 0x080958d9 in eval_expr ()
>>>>>>> #18 0x08095919 in eval_expr ()
>>>>>>> #19 0x080554e2 in do_action ()
>>>>>>> #20 0x08053ebf in run_action_list ()
>>>>>>> #21 0x08056e7a in do_action ()
>>>>>>> #22 0x08053ebf in run_action_list ()
>>>>>>> ---Type <return> to continue, or q <return> to quit---
>>>>>>> #23 0x080569d8 in do_action ()
>>>>>>> #24 0x08053ebf in run_action_list ()
>>>>>>> #25 0x08056e7a in do_action ()
>>>>>>> #26 0x08053ebf in run_action_list ()
>>>>>>> #27 0x08057d99 in run_top_route ()
>>>>>>> #28 0x0808ad6c in receive_msg ()
>>>>>>> #29 0x080bd2f2 in udp_rcv_loop ()
>>>>>>> #30 0x08069339 in main ()
>>>>>>> (gdb)
>>>>>>>
>>>>>>> --
>>>>>>> Best Regards,
>>>>>>> Alex Massover
>>>>>>> VoIP R&D TL
>>>>>>> Jajah Inc.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: users-bounces at lists.opensips.org [mailto:users-
>>>>>>>> bounces at lists.opensips.org] On Behalf Of Andrei Dragus
>>>>>>>> Sent: Wednesday, January 20, 2010 2:58 PM
>>>>>>>> To: OpenSIPS users mailling list
>>>>>>>> Subject: Re: [OpenSIPS-Users] sched_yield()
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I think that there is a lock that is being held more than it
>>>>>>>>
>>>>>>>>
>>>> should
>>>>
>>>>
>>>>>>> be
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> and that's what causes starvation. It would help us if you could
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>> attach
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> to a process using gdb and give us a full backtrace.
>>>>>>>>
>>>>>>>> Temporary solutions which should work would be to reduce the
>>>>>>>>
>>>>>>>>
>>>> number
>>>>
>>>>
>>>>>>> of
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> processes to 4-6 or to recompile replacing -DFAST_LOCK with one
>>>>>>>>
>> of
>>
>>>>>>>>
>>>>>>> the
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> other options (-DUSE_POSIX_SEM or -DUSE_PTHREAD_MUTEX) but we
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>> should
>>>>>>
>>>>>>
>>>>>>
>>>>>>>> see
>>>>>>>> where this is from to fix it.
>>>>>>>>
>>>>>>>> Alex Massover wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> Hi!
>>>>>>>>>
>>>>>>>>> Yes, from the source on debian, I build deb package. (I did
>>>>>>>>>
>> some
>>
>>>>>>>>>
>>>>>>>> minor changes to the source, but the problem happens also
>>>>>>>>
>> without
>>
>>>>>>>>
>>>>>> my
>>>>>>
>>>>>>
>>>>>>
>>>>>>>> changes)
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> 16 children on 4 cores.
>>>>>>>>>
>>>>>>>>> What do you suggest to reduce it to 4? It runs on 2.6.32 on
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>> VMware
>>>>>>
>>>>>>
>>>>>>
>>>>>>>> ESX.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> I'm also trying now sleep(0) instead of sched_yield().
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Best Regards,
>>>>>>>>> Alex Massover
>>>>>>>>> VoIP R&D TL
>>>>>>>>> Jajah Inc.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: users-bounces at lists.opensips.org [mailto:users-
>>>>>>>>>> bounces at lists.opensips.org] On Behalf Of Andrei Dragus
>>>>>>>>>> Sent: Wednesday, January 20, 2010 1:05 PM
>>>>>>>>>> To: OpenSIPS users mailling list
>>>>>>>>>> Subject: Re: [OpenSIPS-Users] sched_yield()
>>>>>>>>>>
>>>>>>>>>> Hi Alex,
>>>>>>>>>>
>>>>>>>>>> Are you building OpenSIPS from source?
>>>>>>>>>> How many processes do you have and on how many cores?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Alex Massover wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Hello!
>>>>>>>>>>>
>>>>>>>>>>> I'm facing a strange problem, sometimes under a stress
>>>>>>>>>>>
>> OpenSIPS
>>
>>>>>>>>>>> "locks" - load average jumps, SIP processing delays, opensips
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>> msg
>>>>>>
>>>>>>
>>>>>>
>>>>>>>>>>> queue fills with a lot of sip messages, opensips processes
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>> start
>>>>>>
>>>>>>
>>>>>>
>>>>>>> to
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>>>> comsume a lot of CPU.
>>>>>>>>>>>
>>>>>>>>>>> And strace shows:
>>>>>>>>>>>
>>>>>>>>>>> sched_yield()
>>>>>>>>>>>
>>>>>>>>>>> sched_yield()
>>>>>>>>>>>
>>>>>>>>>>> sched_yield()
>>>>>>>>>>>
>>>>>>>>>>> sched_yield()
>>>>>>>>>>>
>>>>>>>>>>> ....
>>>>>>>>>>>
>>>>>>>>>>> for all processes.
>>>>>>>>>>>
>>>>>>>>>>> If I stop the stress - after a while (not immediately) - it
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>> unlocks,
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>>> also suddenly, I can see in top that all opensips processes
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>> stop
>>>>>>
>>>>>>
>>>>>>
>>>>>>> to
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>>>> consume CPU.
>>>>>>>>>>>
>>>>>>>>>>> What can it be? Some kind of starvation?
>>>>>>>>>>>
--
Bogdan-Andrei Iancu
www.voice-system.ro
More information about the Users
mailing list