[OpenSIPS-Users] sched_yield()
Andrei Dragus
adragus at opensips.org
Thu Jan 21 14:43:07 CET 2010
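My guess is that there is not enough shared memory. When an allocation
fails, OpenSIPS tries to defragment memory to make room, which takes a
lot of time and must be done under lock. While that runs, the other
processes wait on the same lock, which would explain the sched_yield()
calls in your backtraces.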
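
To illustrate why waiters show up as a stream of sched_yield() calls, here is a minimal Python sketch of a yield-based spin lock. It is not the OpenSIPS code (the real FAST_LOCK path is C with an atomic test-and-set); `YieldingSpinLock` and `run_demo` are hypothetical names, and the non-blocking `threading.Lock` acquire only stands in for the atomic instruction.

```python
import os
import threading
import time

# os.sched_yield() exists on Linux/Unix; fall back to a no-op elsewhere.
_yield = getattr(os, "sched_yield", lambda: None)


class YieldingSpinLock:
    """Toy spin lock: try to take the lock, yield the CPU on failure."""

    def __init__(self):
        # Stand-in for the atomic test-and-set word (assumption).
        self._flag = threading.Lock()
        # Approximate count of failed attempts (not updated atomically).
        self.yields = 0

    def acquire(self):
        while not self._flag.acquire(blocking=False):
            self.yields += 1
            _yield()  # this is what strace would show, over and over

    def release(self):
        self._flag.release()


def run_demo(waiters=4, hold_seconds=0.05):
    """Hold the lock for a while and count how often waiters had to yield."""
    lock = YieldingSpinLock()
    lock.acquire()  # the slow holder, e.g. a long defragmentation pass

    def worker():
        lock.acquire()
        lock.release()

    threads = [threading.Thread(target=worker) for _ in range(waiters)]
    for t in threads:
        t.start()
    time.sleep(hold_seconds)  # the critical section takes this long
    lock.release()
    for t in threads:
        t.join()
    return lock.yields
```

Calling `run_demo()` returns the number of failed attempts. The longer the lock is held and the more waiters there are, the larger that number gets, with every waiter burning CPU in the meantime.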
Please try increasing the shared memory size and tell me whether the problem persists.
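
The pool size is set at startup (the -m option, in megabytes). To judge whether the pool is close to exhaustion, the shmem statistics can be turned into a usage ratio. The Python sketch below assumes a "name = value" line format and uses made-up sample numbers (only the 128M total matches this thread); `parse_stats` and `shm_usage` are hypothetical helper names.

```python
def parse_stats(text):
    """Parse 'name = value' lines into a dict of integer statistics."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if not sep:
            continue  # skip lines that are not 'name = value'
        value = value.strip()
        if value.isdigit():
            stats[name.strip()] = int(value)
    return stats


def shm_usage(stats):
    """Fraction of the shared memory pool in use, overhead included."""
    return stats["shmem:real_used_size"] / stats["shmem:total_size"]


# Illustrative values only, not taken from a real run.
SAMPLE = """\
shmem:total_size = 134217728
shmem:real_used_size = 120795955
shmem:fragments = 51234
"""

stats = parse_stats(SAMPLE)
usage = shm_usage(stats)
```

With these sample numbers the pool is about 90% used. A ratio near 1.0 together with a high fragment count would be consistent with allocations failing and triggering defragmentation.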
Alex Massover wrote:
> Hi!
>
> Yes, with -DF_MALLOC.
>
> 1.6.1 from sources, I build deb package.
> I use 128M of shared and 10*1024*1024 private memory (can increase - no problem).
>
> Hmmmm, "opensipsctl fifo get_statistics all" crashes/stops the opensips.
>
> 'fifo uptime' or 'fifo debug' are OK.
>
> strace while 'fifo get_statistics all':
> Process 9509 attached - interrupt to quit
> pause() = ? ERESTARTNOHAND (To be restarted)
> --- SIGUSR2 (User defined signal 2) @ 0 (0) ---
> sigreturn() = ? (mask now [])
> pause() = ? ERESTARTNOHAND (To be restarted)
> --- SIGCHLD (Child exited) @ 0 (0) ---
> sigreturn() = ? (mask now [])
> waitpid(-1, [{WIFSIGNALED(s) && WTERMSIG(s) == SIGUSR2}], WNOHANG) = 9520
> waitpid(-1, 0xbf84b4c8, WNOHANG) = 0
> kill(0, SIGTERM) = 0
> --- SIGTERM (Terminated) @ 0 (0) ---
> --- SIGCHLD (Child exited) @ 0 (0) ---
> sigreturn() = ? (mask now [TERM])
> sigreturn() = ? (mask now [])
> rt_sigaction(SIGALRM, {0x8065920, [ALRM], SA_RESTART}, {SIG_DFL}, 8) = 0
> alarm(60) = 0
> wait4(-1, NULL, 0, NULL) = 9514
> wait4(-1, NULL, 0, NULL) = 9519
> wait4(-1, NULL, 0, NULL) = 9521
> wait4(-1, NULL, 0, NULL) = 9522
> wait4(-1, NULL, 0, NULL) = 9512
> --- SIGCHLD (Child exited) @ 0 (0) ---
> sigreturn() = ? (mask now [])
> --- SIGCHLD (Child exited) @ 0 (0) ---
> sigreturn() = ? (mask now [])
> wait4(-1, NULL, 0, NULL) = 9510
> wait4(-1, NULL, 0, NULL) = 9516
> --- SIGCHLD (Child exited) @ 0 (0) ---
> sigreturn() = ? (mask now [])
> --- SIGCHLD (Child exited) @ 0 (0) ---
> sigreturn() = ? (mask now [])
> wait4(-1, NULL, 0, NULL) = 9515
> wait4(-1, NULL, 0, NULL) = 9517
> wait4(-1, NULL, 0, NULL) = 9524
> wait4(-1, NULL, 0, NULL) = 9525
> --- SIGCHLD (Child exited) @ 0 (0) ---
> sigreturn() = ? (mask now [])
> --- SIGCHLD (Child exited) @ 0 (0) ---
> sigreturn() = ? (mask now [])
> --- SIGCHLD (Child exited) @ 0 (0) ---
> sigreturn() = ? (mask now [])
> --- SIGCHLD (Child exited) @ 0 (0) ---
> sigreturn() = ? (mask now [])
> wait4(-1, NULL, 0, NULL) = 9511
> wait4(-1, NULL, 0, NULL) = 9513
> wait4(-1, NULL, 0, NULL) = 9518
> wait4(-1, NULL, 0, NULL) = 9523
> wait4(-1, NULL, 0, NULL) = -1 ECHILD (No child processes)
> rt_sigaction(SIGALRM, {0x8066080, [ALRM], SA_RESTART}, {0x8065920, [ALRM], SA_RESTART}, 8) = 0
> stat64("/tmp/opensips_fifo", {st_mode=S_IFIFO|0660, st_size=0, ...}) = 0
> unlink("/tmp/opensips_fifo") = 0
> munmap(0xaed25000, 134217728) = 0
> unlink("/var/run/opensips/opensips.pid") = 0
> alarm(0) = 60
> rt_sigaction(SIGALRM, {SIG_IGN}, {0x8066080, [ALRM], SA_RESTART}, 8) = 0
> exit_group(0) = ?
> Process 9509 detached
>
> --
> Best Regards,
> Alex Massover
> VoIP R&D TL
> Jajah Inc.
>
>
>> -----Original Message-----
>> From: users-bounces at lists.opensips.org [mailto:users-
>> bounces at lists.opensips.org] On Behalf Of Andrei Dragus
>> Sent: Thursday, January 21, 2010 3:09 PM
>> To: OpenSIPS users mailling list
>> Subject: Re: [OpenSIPS-Users] sched_yield()
>>
>>
>> Hi,
>>
>> Since all the backtraces are in allocation routines, my guess is that the
>> shared memory lock might be causing a problem.
>>
>> Are you compiling with -DF_MALLOC?
>> What version of OpenSIPS are you using?
>> What is the total shared memory pool you are allocating?
>> What amount of memory are you using? (Use: opensipsctl fifo get_statistics all)
>>
>> Alex Massover wrote:
>>
>>> Some more,
>>>
>>> (gdb) bt
>>> #0 0xb78dc424 in __kernel_vsyscall ()
>>> #1 0xb781741c in sched_yield () from /lib/i686/cmov/libc.so.6
>>> #2 0xb73d77fd in build_new_dlg () from /usr/lib/opensips/modules/dialog.so
>>> #3 0xb73d4b81 in dlg_create_dialog () from /usr/lib/opensips/modules/dialog.so
>>> #4 0xb73c9c9e in ?? () from /usr/lib/opensips/modules/dialog.so
>>> #5 0x08055030 in do_action ()
>>> #6 0x08053ebf in run_action_list ()
>>> #7 0x08056e7a in do_action ()
>>> #8 0x08053ebf in run_action_list ()
>>> #9 0x08057d99 in run_top_route ()
>>> #10 0x0808ad6c in receive_msg ()
>>> #11 0x080bd2f2 in udp_rcv_loop ()
>>> #12 0x08069339 in main ()
>>>
>>>
>>> (gdb) bt
>>> #0 0xb78dc424 in __kernel_vsyscall ()
>>> #1 0xb781741c in sched_yield () from /lib/i686/cmov/libc.so.6
>>> #2 0xb77242cd in build_cell () from /usr/lib/opensips/modules/tm.so
>>> #3 0xb7739c4a in t_newtran () from /usr/lib/opensips/modules/tm.so
>>> #4 0xb772e7b8 in t_relay_to () from /usr/lib/opensips/modules/tm.so
>>> #5 0xb773b501 in ?? () from /usr/lib/opensips/modules/tm.so
>>> #6 0x08055030 in do_action ()
>>> #7 0x08053ebf in run_action_list ()
>>> #8 0x08095cf2 in eval_expr ()
>>> #9 0x080958d9 in eval_expr ()
>>> #10 0x08095919 in eval_expr ()
>>> #11 0x080554e2 in do_action ()
>>> #12 0x08053ebf in run_action_list ()
>>> #13 0x080569d8 in do_action ()
>>> #14 0x08053ebf in run_action_list ()
>>> #15 0x08056e7a in do_action ()
>>> #16 0x08053ebf in run_action_list ()
>>> #17 0x08057d99 in run_top_route ()
>>> #18 0x0808ad6c in receive_msg ()
>>> #19 0x080bd2f2 in udp_rcv_loop ()
>>> #20 0x08069339 in main ()
>>>
>>> --
>>> Best Regards,
>>> Alex Massover
>>> VoIP R&D TL
>>> Jajah Inc.
>>>
>>>
>>>
>>>> -----Original Message-----
>>>> From: users-bounces at lists.opensips.org [mailto:users-
>>>> bounces at lists.opensips.org] On Behalf Of Alex Massover
>>>> Sent: Thursday, January 21, 2010 2:24 PM
>>>> To: OpenSIPS users mailling list
>>>> Subject: Re: [OpenSIPS-Users] sched_yield()
>>>>
>>>> Hi,
>>>>
>>>> Another one. It hangs for a number of seconds (but that's enough to
>>>> cause SIP timeouts - MSG queue jumps to 260K), so it's hard to take a
>>>> bt at the right moment.
>>>> This one looks better because there's sched_yield() there :)
>>>>
>>>> (gdb) bt
>>>> #0 0xb77d5424 in __kernel_vsyscall ()
>>>> #1 0xb771041c in sched_yield () from /lib/i686/cmov/libc.so.6
>>>> #2 0x080bf23d in new_avp ()
>>>> #3 0x080bf53f in add_avp ()
>>>> #4 0xb72c1c9c in ?? () from /usr/lib/opensips/modules/dialog.so
>>>> #5 0x08055030 in do_action ()
>>>> #6 0x08053ebf in run_action_list ()
>>>> #7 0x08056e7a in do_action ()
>>>> #8 0x08053ebf in run_action_list ()
>>>> #9 0x08056e7a in do_action ()
>>>> #10 0x08053ebf in run_action_list ()
>>>> #11 0x08056e7a in do_action ()
>>>> #12 0x08053ebf in run_action_list ()
>>>> #13 0x08057d99 in run_top_route ()
>>>> #14 0x0808ad6c in receive_msg ()
>>>> #15 0x080bd2f2 in udp_rcv_loop ()
>>>> #16 0x08069339 in main ()
>>>>
>>>> --
>>>> Best Regards,
>>>> Alex Massover
>>>> VoIP R&D TL
>>>> Jajah Inc.
>>>>
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: users-bounces at lists.opensips.org [mailto:users-
>>>>> bounces at lists.opensips.org] On Behalf Of Alex Massover
>>>>> Sent: Thursday, January 21, 2010 2:05 PM
>>>>> To: OpenSIPS users mailling list
>>>>> Subject: Re: [OpenSIPS-Users] sched_yield()
>>>>>
>>>>> Hi Andrei,
>>>>> Hopefully this is it (with FASTLOCK)
>>>>>
>>>>> #0 0xb77d5424 in __kernel_vsyscall ()
>>>>> #1 0xb772babb in poll () from /lib/i686/cmov/libc.so.6
>>>>> #2 0xb77ba83a in ?? () from /lib/i686/cmov/libresolv.so.2
>>>>> #3 0xb77b8946 in __libc_res_nquery () from /lib/i686/cmov/libresolv.so.2
>>>>> #4 0xb77b8fdb in ?? () from /lib/i686/cmov/libresolv.so.2
>>>>> #5 0xb77b92ae in __libc_res_nsearch () from /lib/i686/cmov/libresolv.so.2
>>>>> #6 0xb77b96d4 in __res_nsearch () from /lib/i686/cmov/libresolv.so.2
>>>>> #7 0xb77b808a in res_search () from /lib/i686/cmov/libresolv.so.2
>>>>> #8 0x0808c613 in get_record ()
>>>>> #9 0x0808cf05 in ?? ()
>>>>> #10 0x0808e385 in sip_resolvehost ()
>>>>> #11 0x0807a26c in mk_proxy ()
>>>>> #12 0xb7627d39 in t_relay_to () from /usr/lib/opensips/modules/tm.so
>>>>> #13 0xb7634501 in ?? () from /usr/lib/opensips/modules/tm.so
>>>>> #14 0x08055030 in do_action ()
>>>>> #15 0x08053ebf in run_action_list ()
>>>>> #16 0x08095cf2 in eval_expr ()
>>>>> #17 0x080958d9 in eval_expr ()
>>>>> #18 0x08095919 in eval_expr ()
>>>>> #19 0x080554e2 in do_action ()
>>>>> #20 0x08053ebf in run_action_list ()
>>>>> #21 0x08056e7a in do_action ()
>>>>> #22 0x08053ebf in run_action_list ()
>>>>> #23 0x080569d8 in do_action ()
>>>>> #24 0x08053ebf in run_action_list ()
>>>>> #25 0x08056e7a in do_action ()
>>>>> #26 0x08053ebf in run_action_list ()
>>>>> #27 0x08057d99 in run_top_route ()
>>>>> #28 0x0808ad6c in receive_msg ()
>>>>> #29 0x080bd2f2 in udp_rcv_loop ()
>>>>> #30 0x08069339 in main ()
>>>>> (gdb)
>>>>>
>>>>> --
>>>>> Best Regards,
>>>>> Alex Massover
>>>>> VoIP R&D TL
>>>>> Jajah Inc.
>>>>>
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: users-bounces at lists.opensips.org [mailto:users-
>>>>>> bounces at lists.opensips.org] On Behalf Of Andrei Dragus
>>>>>> Sent: Wednesday, January 20, 2010 2:58 PM
>>>>>> To: OpenSIPS users mailling list
>>>>>> Subject: Re: [OpenSIPS-Users] sched_yield()
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I think that there is a lock that is being held longer than it should be
>>>>>> and that's what causes starvation. It would help us if you could attach
>>>>>> to a process using gdb and give us a full backtrace.
>>>>>>
>>>>>> Temporary solutions which should work would be to reduce the number of
>>>>>> processes to 4-6 or to recompile replacing -DFAST_LOCK with one of the
>>>>>> other options (-DUSE_POSIX_SEM or -DUSE_PTHREAD_MUTEX), but we should
>>>>>> find out where this comes from in order to fix it.
>>>>>>
>>>>>> Alex Massover wrote:
>>>>>>
>>>>>>
>>>>>>> Hi!
>>>>>>>
>>>>>>> Yes, from the source on Debian, I build a deb package. (I made some
>>>>>>> minor changes to the source, but the problem also happens without my
>>>>>>> changes.)
>>>>>>>
>>>>>>> 16 children on 4 cores.
>>>>>>>
>>>>>>> Do you suggest reducing it to 4? It runs on 2.6.32 on VMware ESX.
>>>>>>>
>>>>>>> I'm also now trying sleep(0) instead of sched_yield().
>>>>>>>
>>>>>>> --
>>>>>>> Best Regards,
>>>>>>> Alex Massover
>>>>>>> VoIP R&D TL
>>>>>>> Jajah Inc.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: users-bounces at lists.opensips.org [mailto:users-
>>>>>>>> bounces at lists.opensips.org] On Behalf Of Andrei Dragus
>>>>>>>> Sent: Wednesday, January 20, 2010 1:05 PM
>>>>>>>> To: OpenSIPS users mailling list
>>>>>>>> Subject: Re: [OpenSIPS-Users] sched_yield()
>>>>>>>>
>>>>>>>> Hi Alex,
>>>>>>>>
>>>>>>>> Are you building OpenSIPS from source?
>>>>>>>> How many processes do you have and on how many cores?
>>>>>>>>
>>>>>>>>
>>>>>>>> Alex Massover wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> Hello!
>>>>>>>>>
>>>>>>>>> I'm facing a strange problem: sometimes under stress OpenSIPS
>>>>>>>>> "locks" - load average jumps, SIP processing is delayed, the opensips
>>>>>>>>> msg queue fills with a lot of SIP messages, and the opensips processes
>>>>>>>>> start to consume a lot of CPU.
>>>>>>>>>
>>>>>>>>> And strace shows:
>>>>>>>>>
>>>>>>>>> sched_yield()
>>>>>>>>> sched_yield()
>>>>>>>>> sched_yield()
>>>>>>>>> sched_yield()
>>>>>>>>> ....
>>>>>>>>>
>>>>>>>>> for all processes.
>>>>>>>>>
>>>>>>>>> If I stop the stress, after a while (not immediately) it unlocks,
>>>>>>>>> also suddenly: I can see in top that all opensips processes stop
>>>>>>>>> consuming CPU.
>>>>>>>>>
>>>>>>>>> What can it be? Some kind of starvation?
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>>
>>>>>>>>> Best Regards,
>>>>>>>>>
>>>>>>>>> Alex Massover
>>>>>>>>>
>>>>>>>>> VoIP R&D TL
>>>>>>>>>
>>>>>>>>> Jajah Inc.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> This mail was sent via Mail-SeCure System.
>>>>>>>>> _______________________________________________
>>>>>>>>> Users mailing list
>>>>>>>>> Users at lists.opensips.org
>>>>>>>>> http://lists.opensips.org/cgi-bin/mailman/listinfo/users
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>> --
>>>>>>>> Andrei Dragus
>>>>>>>> www.voice-system.ro
>>>>>>>>
>>>>>>>>
>>>>>>>> This mail was received via Mail-SeCure System.
>>>>>> --
>>>>>> Andrei Dragus
>>>>>> www.voice-system.ro
>> --
>> Andrei Dragus
>> www.voice-system.ro
--
Andrei Dragus
www.voice-system.ro