[OpenSIPS-Users] opensips udp workers lock up with sched_yield

Bogdan-Andrei Iancu bogdan at opensips.org
Tue Mar 10 08:52:39 EST 2020


Hi William,

I suspect you may be hitting a deadlock (mostly a wild guess, as there is 
not much data to check). The only advice I can give you is to upgrade to 
2.4 (the simple option) or 3.0 (a bit more complex).
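
To illustrate why a blocked process shows up in sched_yield (frame #0 of the quoted backtrace) and why CPU usage climbs to 100% once the workers are stuck, here is a minimal sketch of a test-and-set lock that yields the CPU while it waits. This is a general illustration under assumptions, not the OpenSIPS locking code: the names yield_lock_t, yield_lock_acquire, yield_lock_acquire_bounded and yield_lock_release are hypothetical, and the bounded variant exists only so the "holder never releases" case can be shown without hanging.

```c
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical illustration only; not taken from the OpenSIPS sources. */
typedef struct {
    atomic_flag held;   /* set while some process/thread owns the lock */
} yield_lock_t;

static void yield_lock_init(yield_lock_t *l)
{
    atomic_flag_clear(&l->held);
}

/* Classic pattern: try to grab the flag; if it is already taken, give up
 * the CPU with sched_yield() and try again. If the owner never releases,
 * every waiter loops here forever, which is what a backtrace ending in
 * sched_yield() looks like, and the busy retry loop burns CPU. */
static void yield_lock_acquire(yield_lock_t *l)
{
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        sched_yield();
}

/* Same loop, but gives up after max_yields attempts.
 * Returns the number of yields performed, or -1 if the lock was never
 * obtained. Added only to make the stuck case observable. */
static long yield_lock_acquire_bounded(yield_lock_t *l, long max_yields)
{
    long yields = 0;

    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire)) {
        if (yields >= max_yields)
            return -1;
        sched_yield();
        yields++;
    }
    return yields;
}

static void yield_lock_release(yield_lock_t *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

With such a lock, a holder that blocks indefinitely (for example on a network write that never completes) while still owning the lock would leave every other waiter spinning in the acquire loop. Whether that is what happens here cannot be confirmed from the available data.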

Regards,

Bogdan-Andrei Iancu

OpenSIPS Founder and Developer
   https://www.opensips-solutions.com
OpenSIPS Summit, Amsterdam, May 2020
   https://www.opensips.org/events/Summit-2020Amsterdam/

On 3/9/20 8:22 PM, William Simon wrote:
>
> This is opensips 2.2.7 (which I understand is no longer supported). The 
> last time this happened, netstat showed a very large TCP Send-Q from 
> opensips to a remote SIP TCP endpoint. It seems TCP gets blocked and 
> does not recover. We do not need to reboot the server to restore the 
> service, only restart opensips.
>
> *From: *Bogdan-Andrei Iancu <bogdan at opensips.org>
> *Date: *Monday, March 9, 2020 at 5:02 AM
> *To: *William Simon <wsimon at stratusvideo.com>, OpenSIPS users mailling 
> list <users at lists.opensips.org>
> *Subject: *Re: [OpenSIPS-Users] opensips udp workers lock up with 
> sched_yield
>
> Hi William,
>
> That is interesting. Most of the processes are idle (waiting in the 
> I/O reactor) and a bunch of them are blocked on a lock (same 
> pattern). Nevertheless, the weird thing is that there is no active 
> process (one actually doing something) that may hold the lock. All 
> procs are either blocked or idle.
>
> What opensips version do you have?
>
> Also, does opensips recover from this state, or do you need to do a reboot?
>
> Regards,
>
> Bogdan-Andrei Iancu
>
> On 2/28/20 4:20 PM, William Simon wrote:
>
>     Bogdan-Andrei, thank you for your insight. Yes, we also use SIP
>     TCP & TLS. I do not see any locks in the rest of the “opensipsctl
>     trap.” Perhaps you will be able to understand it better. The trap
>     is posted at https://pastebin.com/1rs8fVEB
>
>     Thank you
>
>     William Simon
>
>     *From: *Bogdan-Andrei Iancu <bogdan at opensips.org>
>     *Date: *Friday, February 28, 2020 at 4:23 AM
>     *To: *OpenSIPS users mailling list <users at lists.opensips.org>,
>     William Simon <wsimon at stratusvideo.com>
>     *Subject: *Re: [OpenSIPS-Users] opensips udp workers lock up with
>     sched_yield
>
>     Hi William,
>
>     That sched_yield translates into waiting for a lock. As the
>     backtrace (a bit crippled) shows it coming from "send_pr_buffer"
>     (which is responsible for sending the buffer of a SIP msg out on
>     the network), I suspect the transport is TCP or TLS (missing frame
>     #1), as they use locking. Do you have the backtraces from all the
>     procs? This will help to identify the proc holding the lock and
>     blocking all the other procs.
>
>     Best regards,
>
>     Bogdan-Andrei Iancu
>
>     OpenSIPS Founder and Developer
>        https://www.opensips-solutions.com
>     OpenSIPS Summit, Amsterdam, May 2020
>        https://www.opensips.org/events/Summit-2020Amsterdam/
>     OpenSIPS Bootcamp, Miami, March 2020
>        https://opensips.org/training/OpenSIPS_Bootcamp_2020/
>
>     On 2/28/20 3:58 AM, William Simon wrote:
>
>         In a SIP video environment we have a pair of opensips servers
>         load balancing traffic to freeswitch. The call volume is
>         modest among the two proxies, about 400 concurrent calls at
>         peak times.
>
>         We are occasionally seeing opensips lock up and stop
>         responding to SIP traffic. There is no error in the syslog and
>         no indication of resource exhaustion on the VM (it is a 4-core
>         VMware instance with 4GB of RAM). Once opensips locks up, CPU
>         soon reaches 100%, but before that, it was not using even 50%
>         of the CPU.
>
>         get_statistics shows that neither the shared memory nor the
>         pkg memory is heavily used. They are set at 64M / 4M.
>
>         opensipsctl trap shows this on the udp worker processes
>         (children=8 in config; it was previously set to children=4
>         and showed the same behavior):
>
>         [Thread debugging using libthread_db enabled]
>         Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
>         0x00007f5b14028bb7 in sched_yield () at ../sysdeps/unix/syscall-template.S:81
>         #0 0x00007f5b14028bb7 in sched_yield () at ../sysdeps/unix/syscall-template.S:81
>         No locals.
>         #1 0x00000000005323a5 in ?? ()
>         No symbol table info available.
>         #2 0x00007f5b0ec6c48f in send_pr_buffer () from /usr/lib/x86_64-linux-gnu/opensips/modules/tm.so
>         No symbol table info available.
>         #3 0x00007f5b0ec9eb9b in t_forward_nonack () from /usr/lib/x86_64-linux-gnu/opensips/modules/tm.so
>         No symbol table info available.
>         #4 0x00007f5b0ec6defe in t_relay_to () from /usr/lib/x86_64-linux-gnu/opensips/modules/tm.so
>         No symbol table info available.
>         #5 0x00007f5b0ec815ee in ?? () from /usr/lib/x86_64-linux-gnu/opensips/modules/tm.so
>         No symbol table info available.
>         #6 0x000000000042b20a in do_action ()
>         No symbol table info available.
>         #7 0x0000000000430590 in run_action_list ()
>         No symbol table info available.
>         #8 0x000000000046d3bc in ?? ()
>         No symbol table info available.
>         #9 0x000000000046cc1d in eval_expr ()
>         No symbol table info available.
>         #10 0x000000000046cc39 in eval_expr ()
>         No symbol table info available.
>         #11 0x000000000046cc09 in eval_expr ()
>         No symbol table info available.
>         #12 0x000000000042b19a in do_action ()
>         No symbol table info available.
>         #13 0x0000000000430590 in run_action_list ()
>         No symbol table info available.
>         #14 0x00000000004306ba in ?? ()
>         No symbol table info available.
>         #15 0x000000000042da9a in do_action ()
>         No symbol table info available.
>         #16 0x0000000000430590 in run_action_list ()
>         No symbol table info available.
>         #17 0x000000000042e62e in do_action ()
>         No symbol table info available.
>         #18 0x0000000000430590 in run_action_list ()
>         No symbol table info available.
>         #19 0x000000000042e62e in do_action ()
>         No symbol table info available.
>         #20 0x0000000000430590 in run_action_list ()
>         No symbol table info available.
>         #21 0x00000000004308d0 in run_top_route ()
>         No symbol table info available.
>         #22 0x0000000000436ef3 in receive_msg ()
>         No symbol table info available.
>         #23 0x000000000052d5c5 in ?? ()
>         No symbol table info available.
>         #24 0x000000000051536d in ?? ()
>         No symbol table info available.
>         #25 0x000000000051837a in udp_rcv_loop ()
>         No symbol table info available.
>         #26 0x0000000000519c38 in udp_start_processes ()
>         No symbol table info available.
>         #27 0x000000000041c38a in main ()
>         No symbol table info available.
>         ---end 82753 -------------------------------------------------------
>
