[OpenSIPS-Users] Fine tuning high CPS and mysql queries

Jon Abrams ffshoh at gmail.com
Thu Jun 4 19:42:06 EST 2020

A) Is the LRN database located locally on the OpenSIPS box or is it remote?
B) Have you tried only doing sync database queries? Async introduces some
overhead, and I'm not sure if it causes extra database connections to be
created. When using sync there is a connection per child process that stays
C) Does the database have enough memory to contain the LRN and DNC datasets
fully in memory? The extra latency for the non-cache hits sent to the
database may stack up if the database has to hit disk.
D) How many child processes are you using now? If you are hitting 100% you
may need to increase them.
E) Are your memcached processes using heavy CPU? If you are caching
multiple lists, I've found it helps to use a unique memcached instance per
F) Look for memory related log messages. If the memory starts getting
exhausted you will see defrag messages. This will chew up available
computation cycles.
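
To put rough numbers on C and D, here is a back-of-envelope sketch in Python. It takes the 10,000 CPS target, the two servers, and the 3:1 hit ratio from the original post (read here as three hits per miss, which is an interpretation), and applies Little's law to estimate how many database queries are in flight at once. The latency values and the db_load name are hypothetical, chosen only for illustration; measure your real query latency before drawing conclusions.

```python
def db_load(total_cps, servers, hits_per_miss, db_latency_s):
    """Estimate per-server DB load from call rate and cache hit ratio.

    Returns (per-server CPS, DB queries per second, average queries in flight).
    """
    per_server_cps = total_cps / servers
    # 3:1 hits-to-misses means 1 request in 4 falls through to the database.
    miss_ratio = 1 / (hits_per_miss + 1)
    db_qps = per_server_cps * miss_ratio
    # Little's law: average in-flight work = arrival rate * time in system.
    in_flight = db_qps * db_latency_s
    return per_server_cps, db_qps, in_flight


# Hypothetical latencies: in-memory hit, moderate load, database hitting disk.
for latency_ms in (1, 5, 20):
    cps, qps, in_flight = db_load(10_000, 2, 3, latency_ms / 1000)
    print(f"{latency_ms:>3} ms: {qps:.0f} DB queries/s per server, "
          f"~{in_flight:.2f} queries in flight")
```

In sync mode each in-flight query blocks one child, so the in-flight figure is a lower bound on the children kept busy by the database alone. If latency climbs when the database has to hit disk, that number grows in proportion, which may be one way children end up pinned.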

- Jon Abrams

On Thu, Jun 4, 2020 at 2:17 PM Calvin Ellison <calvin.ellison at voxox.com> wrote:

> The scenario is INVITE -> MySQL query -> non-200 final response. No
> calls are connected here, only dipping things like LRN, Do Not Call,
> and Wireless/Landline. A similar service runs on a second port,
> specific to a different kind of traffic and dip. We're using async
> avp_db_query and memcached, with about 3:1 cache hits.
> Our target is up to 10,000 CPS across two opensips servers, which are
> dual-CPU Xeon E5620 with 48G RAM. Both run memcached, and both
> servers use both memcached instances to share a distributed cache thanks
> to this:
> 'modparam("cachedb_memcached","cachedb_url","memcached:lrn://lrn-d,lrn-e/")'.
> At a glance there are over 200mil total cached items, distributed
> nearly equally.
> The issue is that individual child processes start getting stuck at
> 100% CPU. Logs indicate connection failures to the MySQL database
> causing children to run in sync mode, and there are warnings about
> delayed timer jobs tm-timer and blcore-expire. Eventually, the service
> becomes unresponsive. Restarting opensips restores service and the
> children return to single-digit CPU utilization, but eventually,
> children get stuck again.
> I'm not certain if the issue is on the database server, or if the
> opensips servers are overloaded, or if the config is just not right
> yet.
> Is there an established method for fine-tuning these things?
> shared memory
> process memory
> children
> db_max_async_connections
> listen=... use_children
> modparam("tm", "timer_partitions", ?)
> What else is worth considering?
> Does a child ever return to async mode after running in sync mode?
> How do I know when my servers have reached their limit?
> opensips.cfg is available on request.
> version: opensips 2.4.7 (x86_64/linux)
> MAX_URI_SIZE 1024, BUF_SIZE 65535
> poll method support: poll, epoll, sigio_rt, select.
> git revision: 9e1fcc915
> main.c compiled on  with gcc 7
> *re-built using dpkg-buildpackage including the patch to support DB
> floating point types:
> https://opensips.org/pipermail/users/2020-March/042528.html
> $ lsb_release -d
> Description:    Ubuntu 18.04.4 LTS
> $ uname -a
> Linux TC-521 4.15.0-91-generic #92-Ubuntu SMP Fri Feb 28 11:09:48 UTC
> 2020 x86_64 x86_64 x86_64 GNU/Linux
> $ free -mw
>               total        used        free      shared     buffers       cache   available
> Mem:          48281        1085         337          87        1729       45128       46551
> $ lscpu
> Architecture:        x86_64
> CPU op-mode(s):      32-bit, 64-bit
> Byte Order:          Little Endian
> CPU(s):              16
> On-line CPU(s) list: 0-15
> Thread(s) per core:  2
> Core(s) per socket:  4
> Socket(s):           2
> NUMA node(s):        2
> Vendor ID:           GenuineIntel
> CPU family:          6
> Model:               44
> Model name:          Intel(R) Xeon(R) CPU           E5620  @ 2.40GHz
> Stepping:            2
> CPU MHz:             2527.029
> BogoMIPS:            4788.05
> Virtualization:      VT-x
> L1d cache:           32K
> L1i cache:           32K
> L2 cache:            256K
> L3 cache:            12288K
> NUMA node0 CPU(s):   0,2,4,6,8,10,12,14
> NUMA node1 CPU(s):   1,3,5,7,9,11,13,15
> Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
> pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe
> syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts
> rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq
> dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca
> sse4_1 sse4_2 popcnt aes lahf_lm pti ssbd ibrs ibpb stibp tpr_shadow
> vnmi flexpriority ept vpid dtherm ida arat flush_l1d
> Regards,
> Calvin Ellison
> Senior Voice Operations Engineer
> calvin.ellison at voxox.com