[OpenSIPS-Users] opensips udp workers lock up with sched_yield
Bogdan-Andrei Iancu
bogdan at opensips.org
Tue Mar 10 08:52:39 EST 2020
Hi William,
I suspect you may be hitting a deadlock (mostly a wild guess, as there is
not much data to check). The only advice I can give you is to
upgrade to 2.4 (the simple option) or 3.0 (a bit more complex).
Regards,
Bogdan-Andrei Iancu
OpenSIPS Founder and Developer
https://www.opensips-solutions.com
OpenSIPS Summit, Amsterdam, May 2020
https://www.opensips.org/events/Summit-2020Amsterdam/
On 3/9/20 8:22 PM, William Simon wrote:
>
> This is opensips 2.2.7 (no longer supported, I understand). The last
> time this happened, netstat showed a very large TCP Send-Q from
> opensips to a remote SIP TCP endpoint. It seems TCP gets blocked and
> does not recover. We do not need to reboot the server to restore
> service; restarting opensips is enough.
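
For readers who want to check for the same symptom, here is a minimal sketch that scans `netstat -tn`-style output for TCP connections with a large Send-Q (on an established socket, the bytes not yet acknowledged by the remote side). The function name `large_send_queues`, the 100000-byte threshold, and the sample addresses are assumptions made for illustration; they do not come from this thread.

```python
def large_send_queues(netstat_text, threshold=100_000):
    """Return (send_q, local, remote, state) for TCP rows whose Send-Q
    exceeds `threshold` bytes, largest first.

    `threshold` is a hypothetical cut-off; pick one that suits your traffic.
    """
    hits = []
    for line in netstat_text.splitlines():
        fields = line.split()
        # Data rows look like:
        #   tcp  0  123  192.0.2.10:5060  198.51.100.7:51234  ESTABLISHED
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        if not (fields[1].isdigit() and fields[2].isdigit()):
            continue  # skip anything that is not a Recv-Q/Send-Q pair
        send_q = int(fields[2])
        if send_q > threshold:
            hits.append((send_q, fields[3], fields[4], fields[5]))
    return sorted(hits, reverse=True)


# Made-up sample in the usual `netstat -tn` column layout.
SAMPLE = """\
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 192.0.2.10:5060         198.51.100.7:51234      ESTABLISHED
tcp        0 2538112 192.0.2.10:5060        203.0.113.9:40222       ESTABLISHED
"""

if __name__ == "__main__":
    for send_q, local, remote, state in large_send_queues(SAMPLE):
        print(f"{send_q:>10} bytes queued  {local} -> {remote}  {state}")
```

A Send-Q that stays large over repeated runs suggests the remote endpoint has stopped reading, which would be consistent with the blocked TCP send described above.
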
>
> From: Bogdan-Andrei Iancu <bogdan at opensips.org>
> Date: Monday, March 9, 2020 at 5:02 AM
> To: William Simon <wsimon at stratusvideo.com>, OpenSIPS users mailling
> list <users at lists.opensips.org>
> Subject: Re: [OpenSIPS-Users] opensips udp workers lock up with
> sched_yield
>
> Hi William,
>
> That is interesting. Most of the processes are idle (waiting in the
> I/O reactor) and a bunch of them are blocked on a lock (same
> pattern). Nevertheless, the weird thing is that there is no active
> process (one actually doing something) that may hold the lock. All
> procs are either blocked or idle.
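
A minimal sketch of how such a trap dump could be summarized: it groups PIDs by the function in frame #0 of each backtrace, so blocked processes (top frame `sched_yield`) stand out from idle ones. It assumes each process section ends with a `---end <pid>` line, as seen at the end of the trace in this thread; the names `top_frames` and `group_by_top_frame`, the second PID, and the `epoll_wait` frame in the sample are made up for illustration.

```python
import re
from collections import defaultdict

# Assumed layout: gdb frames start with "#<n>", and each process section
# is closed by a line starting with "---end <pid>".
END_MARKER = re.compile(r"^---end\s+(\d+)")
FRAME = re.compile(r"^#(\d+)\s+(?:0x[0-9a-fA-F]+\s+in\s+)?(\S+)\s*\(")


def top_frames(trap_text):
    """Map each PID to the function name found in frame #0 of its backtrace."""
    result = {}
    current_top = None
    for raw in trap_text.splitlines():
        line = raw.lstrip("> \t")  # tolerate mail-quoted dumps
        frame = FRAME.match(line)
        if frame and frame.group(1) == "0" and current_top is None:
            current_top = frame.group(2)
            continue
        end = END_MARKER.match(line)
        if end:
            result[int(end.group(1))] = current_top
            current_top = None
    return result


def group_by_top_frame(trap_text):
    """Group PIDs by their top-frame function, e.g. {'sched_yield': [...]}."""
    groups = defaultdict(list)
    for pid, func in top_frames(trap_text).items():
        groups[func].append(pid)
    return dict(groups)


# Made-up two-process sample; only the first frame line is from this thread.
SAMPLE = """\
#0 0x00007f5b14028bb7 in sched_yield () at ../sysdeps/unix/syscall-template.S:81
#1 0x00000000005323a5 in ?? ()
---end 82753 -------------------------------------------------------
#0 0x00007f5b14036d13 in epoll_wait () from /lib/x86_64-linux-gnu/libc.so.6
---end 82754 -------------------------------------------------------
"""

if __name__ == "__main__":
    for func, pids in group_by_top_frame(SAMPLE).items():
        print(func, pids)
```

If every process lands in either the lock-waiting group or the idle group, no running process is left that could release the lock, which is the pattern described above.
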
>
> What opensips version do you have?
>
> Also, does opensips recover from this state, or do you need to reboot?
>
> Regards,
>
> Bogdan-Andrei Iancu
>
> On 2/28/20 4:20 PM, William Simon wrote:
>
> Bogdan-Andrei, thank you for your insight. Yes, we also use SIP
> TCP & TLS. I do not see any locks in the rest of the “opensipsctl
> trap.” Perhaps you will be able to understand it better. The trap
> is posted at https://pastebin.com/1rs8fVEB
>
> Thank you
>
> William Simon
>
> From: Bogdan-Andrei Iancu <bogdan at opensips.org>
> Date: Friday, February 28, 2020 at 4:23 AM
> To: OpenSIPS users mailling list <users at lists.opensips.org>,
> William Simon <wsimon at stratusvideo.com>
> Subject: Re: [OpenSIPS-Users] opensips udp workers lock up with
> sched_yield
>
> Hi William,
>
> That sched_yield translates into waiting for a lock. As the
> backtrace (a bit crippled) shows it coming from "send_pr_buffer"
> (which is responsible for sending the buffer of a SIP msg out on
> the network), I suspect the transport is TCP or TLS (missing frame
> #1), as they use locking. Do you have the backtraces from all the
> procs? This will help to identify the proc holding the lock and
> blocking all the other procs.
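
To illustrate why a process waiting on a lock shows up in `sched_yield`: a busy-waiting lock typically tries to take the lock and, on failure, yields the CPU and retries. The Python sketch below only mimics that pattern; it is not OpenSIPS code (OpenSIPS is written in C), and the names `yielding_acquire` and `max_spins` are hypothetical. It assumes a platform where `os.sched_yield()` is available, such as Linux.

```python
import os
import threading


def yielding_acquire(lock, max_spins=None):
    """Try to take `lock`; on failure, yield the CPU and try again.

    Returns the number of failed attempts before success, or -1 once
    `max_spins` attempts have failed. The bound is a hypothetical addition
    so the sketch cannot hang the way a real deadlock would.
    """
    spins = 0
    while not lock.acquire(blocking=False):
        if max_spins is not None and spins >= max_spins:
            return -1
        os.sched_yield()  # the call a backtrace would show in frame #0
        spins += 1
    return spins


if __name__ == "__main__":
    lock = threading.Lock()
    lock.acquire()                       # a holder that never releases
    print(yielding_acquire(lock, 1000))  # gives up and prints -1
```

Without the bound, a waiter whose lock holder never releases spins forever, and with several waiters that keeps the CPU busy even though no useful work is done.
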
>
> Best regards,
>
>
> Bogdan-Andrei Iancu
>
> OpenSIPS Bootcamp, Miami, March 2020
>
> https://opensips.org/training/OpenSIPS_Bootcamp_2020/
>
>
>
> On 2/28/20 3:58 AM, William Simon wrote:
>
> In a SIP video environment we have a pair of opensips servers
> load balancing traffic to freeswitch. The call volume is
> modest across the two proxies, about 400 concurrent calls at
> peak times.
>
> We are occasionally seeing opensips lock up and stop
> responding to SIP traffic. There is no error in the syslog and
> no indication of resource exhaustion on the VM (it is a 4-core
> VMware instance with 4GB of RAM). Once opensips locks up, CPU
> soon reaches 100%, but before that, it was not using even 50%
> of the CPU.
>
> Get_statistics shows that neither shared memory nor pkg memory
> is heavily used. They are set to 64M / 4M.
>
> opensipsctl trap shows this on the udp worker processes
> (children=8 in config; it was previously set to children=4
> and showed the same behavior):
>
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
> 0x00007f5b14028bb7 in sched_yield () at ../sysdeps/unix/syscall-template.S:81
> #0 0x00007f5b14028bb7 in sched_yield () at ../sysdeps/unix/syscall-template.S:81
> No locals.
> #1 0x00000000005323a5 in ?? ()
> No symbol table info available.
> #2 0x00007f5b0ec6c48f in send_pr_buffer () from /usr/lib/x86_64-linux-gnu/opensips/modules/tm.so
> No symbol table info available.
> #3 0x00007f5b0ec9eb9b in t_forward_nonack () from /usr/lib/x86_64-linux-gnu/opensips/modules/tm.so
> No symbol table info available.
> #4 0x00007f5b0ec6defe in t_relay_to () from /usr/lib/x86_64-linux-gnu/opensips/modules/tm.so
> No symbol table info available.
> #5 0x00007f5b0ec815ee in ?? () from /usr/lib/x86_64-linux-gnu/opensips/modules/tm.so
> No symbol table info available.
> #6 0x000000000042b20a in do_action ()
> No symbol table info available.
> #7 0x0000000000430590 in run_action_list ()
> No symbol table info available.
> #8 0x000000000046d3bc in ?? ()
> No symbol table info available.
> #9 0x000000000046cc1d in eval_expr ()
> No symbol table info available.
> #10 0x000000000046cc39 in eval_expr ()
> No symbol table info available.
> #11 0x000000000046cc09 in eval_expr ()
> No symbol table info available.
> #12 0x000000000042b19a in do_action ()
> No symbol table info available.
> #13 0x0000000000430590 in run_action_list ()
> No symbol table info available.
> #14 0x00000000004306ba in ?? ()
> No symbol table info available.
> #15 0x000000000042da9a in do_action ()
> No symbol table info available.
> #16 0x0000000000430590 in run_action_list ()
> No symbol table info available.
> #17 0x000000000042e62e in do_action ()
> No symbol table info available.
> #18 0x0000000000430590 in run_action_list ()
> No symbol table info available.
> #19 0x000000000042e62e in do_action ()
> No symbol table info available.
> #20 0x0000000000430590 in run_action_list ()
> No symbol table info available.
> #21 0x00000000004308d0 in run_top_route ()
> No symbol table info available.
> #22 0x0000000000436ef3 in receive_msg ()
> No symbol table info available.
> #23 0x000000000052d5c5 in ?? ()
> No symbol table info available.
> #24 0x000000000051536d in ?? ()
> No symbol table info available.
> #25 0x000000000051837a in udp_rcv_loop ()
> No symbol table info available.
> #26 0x0000000000519c38 in udp_start_processes ()
> No symbol table info available.
> #27 0x000000000041c38a in main ()
> No symbol table info available.
> ---end 82753 -------------------------------------------------------
>