[OpenSIPS-Users] question on core statistics.

johan johan at democon.be
Fri Apr 19 08:13:25 UTC 2024


Bogdan,

Regarding the increasing drop_requests:

drop() is not used in that script, and neither is forward(). Everything 
goes through t_relay(). There is no B2B.

The thing is that we observed drops at the UDP level.

We followed the recommendations from Ovidiu Sas's presentation last 
year in Houston:

- increase PKG mem

- increase SHM mem

- increase workers to 24 so that the queue empties faster.

- we checked the UDP queues at the Linux level and saw drops there,

     => so we increased them to 50 MB (sysctl -w 
net.core.rmem_max=52428800 and sysctl -w net.core.rmem_default=52428800; 
see the sketch right after this list for making that persistent) and 
the drops at the OS level were gone.
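Since sysctl -w only changes the running kernel, a minimal sketch of 
making those buffer sizes survive a reboot (assuming a distribution 
that reads /etc/sysctl.d; the file name is just an example):

    # /etc/sysctl.d/99-sip-udp-buffers.conf
    # 50 MB receive buffers, matching the values set at runtime above
    net.core.rmem_max = 52428800
    net.core.rmem_default = 52428800

    # apply without a reboot, then verify
    sysctl --system
    sysctl net.core.rmem_max net.core.rmem_default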

Also, worker and memory load are at most 30%.


Hence we thought that we were okay, but there were still drops at the 
OpenSIPS level. The net result was that this node lost all connectivity 
to the load balancer destination, even though it kept receiving 
keep-alive OPTIONS responses from that destination on its NIC (we could 
see them in a continuously running tcpdump).
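For reference, the kind of rolling capture we keep running looks 
roughly like this (interface, port and path are placeholders, not the 
exact command from our setup):

    # rotate the capture file every 5 minutes (-G) and keep
    # at most 12 files (-W), so disk usage stays bounded
    tcpdump -i eth0 -n udp port 5060 -G 300 -W 12 -w /var/tmp/sip-%s.pcap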

=> Hence it seems that it is OpenSIPS's own receive buffer that is too 
small. Reading the statistic's description, "Returns the number of 
requests dropped even before entering the script routing logic.", I 
took this to point at OpenSIPS's receive buffer. All of this is 
happening on a physical machine on which two other OpenSIPS instances 
are also running. Interestingly enough, the problem is only observed in 
the instance that handles registrations and invites (1600 REG/s and 
300 INV/s).


Therefore we dug a bit deeper and came across MAX_RECV_BUFFER_SIZE 
262144 (which happens to be the default UDP receive queue size on 
Linux). Could this be related somehow?
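If that constant is indeed the ceiling, then presumably the core 
maxbuffer parameter (whose default is this same 262144, and which caps 
the receive buffer size OpenSIPS probes for at startup) would have to 
be raised as well; a sketch, assuming maxbuffer is the right knob here:

    # opensips.cfg fragment (global parameters section): raise the
    # cap on UDP receive buffer auto-probing to match net.core.rmem_max
    maxbuffer=52428800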


Secondly, what would the recommendation be for scaling a system like this?

On 18/04/2024 16:29, Bogdan-Andrei Iancu wrote:
> The `drop_requests` statistic is incremented when:
> * the request is dropped by a pre-script callback (like B2B when there 
> is no script execution for certain messages)
> * the stateless `forward()` core function failed to send out something.
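> As an illustration (a sketch, not a fragment from anyone's actual
> config), the second case can be hit with a stateless relay whose send
> fails:
>
>     # opensips.cfg fragment; if forward() cannot send the request
>     # out, core:drop_requests is incremented
>     if (!forward())
>         xlog("stateless forward() failed for $rm from $si\n");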
>
> Regards,
> Bogdan-Andrei Iancu
>
> OpenSIPS Founder and Developer
>    https://www.opensips-solutions.com
>    https://www.siphub.com
> On 18.04.2024 17:19, Johan De Clercq wrote:
>> No I don't.
>> What I find strange is that MAX_RECV_BUFFER_SIZE 262144 is the 
>> default value of net.core.rmem_max and net.core.rmem_default.
>>
>>     On Thu, 18 Apr 2024 at 16:02, Ben Newlin <Ben.Newlin at genesys.com> wrote:
>>
>>     Are you calling drop() anywhere in your script?
>>
>>     https://www.opensips.org/Documentation/Script-CoreFunctions-3-4#toc13
>>
>>     Ben Newlin
>>
>>     *From: *Users <users-bounces at lists.opensips.org> on behalf of
>>     Johan De Clercq <Johan at democon.be>
>>     *Date: *Thursday, April 18, 2024 at 5:27 AM
>>     *To: *OpenSIPS users mailling list <users at lists.opensips.org>
>>     *Subject: *Re: [OpenSIPS-Users] question on core statistics.
>>
>>     Would it make sense to recompile with other flags? And how do I
>>     set them (I don't find them among menuconfig's compile options)?
>>
>>     Currently it has MAX_RECV_BUFFER_SIZE 262144 and BUF_SIZE 65535.
>>
>>     Can somebody also explain what both flags mean?
>>
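>>     Since these constants are preprocessor defines, the generic way
>>     to change them is to inject a -D flag into the build; a sketch
>>     only, and the exact place to hook it in should be verified
>>     against the source tree rather than taken from here:
>>
>>         # hypothetical: extend the define list in Makefile.defs
>>         # (location/line vary by version), then rebuild
>>         #   DEFS+= -DBUF_SIZE=131072
>>         make all && make install
>>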
>>     On Thu, 18 Apr 2024 at 11:07, Johan De Clercq
>>     <Johan at democon.be> wrote:
>>
>>         Would it make sense to recompile with other flags?
>>
>>         Currently it has MAX_RECV_BUFFER_SIZE 262144 and BUF_SIZE 65535.
>>
>>         Can somebody also explain what both flags mean?
>>
>>         flags: STATS: On, DISABLE_NAGLE, USE_MCAST, SHM_MMAP,
>>         PKG_MALLOC, F_MALLOC, FAST_LOCK-ADAPTIVE_WAIT
>>
>>         ADAPTIVE_WAIT_LOOPS=1024, MAX_RECV_BUFFER_SIZE 262144,
>>         MAX_LISTEN 16, MAX_URI_SIZE 1024, BUF_SIZE 65535
>>
>>         poll method support: poll, epoll, sigio_rt, select.
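>>
>>         For reference, a listing like the one above can be reproduced
>>         straight from the binary (which is presumably where it came from):
>>
>>             opensips -V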
>>
>>         On Thu, 18 Apr 2024 at 10:32, Johan De Clercq
>>         <Johan at democon.be> wrote:
>>
>>             Guys,
>>
>>             I have an opensips instance running with 24 worker children.
>>
>>             The worker load is very low.
>>
>>             UDP queues are at 50 MB.
>>
>>             When I query via the OS:
>>
>>             cat /proc/net/udp
>>
>>             sl  local_address rem_address   st tx_queue rx_queue tr
>>             tm->when retrnsmt  uid  timeout inode ref pointer drops
>>
>>             590: 03231D0A:13C4 00000000:0000 07 00000000:00000000
>>             00:00000000 00000000    0        0 413684019 2
>>             ffff880074820bc0 0
>>
>>             591: 03231D0A:13C5 00000000:0000 07 00000000:00000000
>>             00:00000000 00000000    0        0 413766438 2
>>             ffff880465e4a440 0
>>
>>             592: 03231D0A:13C6 00000000:0000 07 00000000:00000000
>>             00:00000000 00000000    0        0 412035865 2
>>             ffff8803e5a56b80 0
>>
>>             934: 01231D0A:151C 00000000:0000 07 00000000:00000000
>>             00:00000000 00000000    0        0 26790 2
>>             ffff88046c054840 0
>>
>>             935: 0201FFEF:151D 00000000:0000 07 00000000:00000000
>>             00:00000000 00000000    0        0 26787 2
>>             ffff88046c054bc0 0
>>
>>             935: 01231D0A:151D 00000000:0000 07 00000000:00000000
>>             00:00000000 00000000    0        0 26791 2
>>             ffff88046c0544c0 0
>>
>>              1972: 00000000:D92A 00000000:0000 07 00000000:00000000
>>             00:00000000 00000000    0        0 15506 2
>>             ffff88046dce5040 0
>>
>>              5479: 00000000:E6DD 00000000:0000 07 00000000:00000000
>>             00:00000000 00000000    0        0 22811 2
>>             ffff880465e4ab40 0
>>
>>             12075: AA0914AC:00A1 00000000:0000 07 00000000:00000000
>>             00:00000000 00000000    0        0 20572 2
>>             ffff88086d020800 0
>>
>>             12075: 0100007F:00A1 00000000:0000 07 00000000:00000000
>>             00:00000000 00000000    0        0 20571 2
>>             ffff88086d020b80 0
>>
>>             13320: 00000000:857E 00000000:0000 07 00000000:00000000
>>             00:00000000 00000000  100        0 17515 2
>>             ffff8800368ac780 0
>>
>>             15661: 00000000:CEA3 00000000:0000 07 00000000:00000000
>>             00:00000000 00000000    0        0 15505 2
>>             ffff8800368acb00 0
>>
>>             => no drops
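>>
>>             As a cross-check (not among the original commands), the
>>             kernel-wide UDP error counters also show buffer-related
>>             drops:
>>
>>             # look for "packet receive errors" and
>>             # "receive buffer errors" in the Udp section
>>             netstat -su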
>>
>>             What worries me is that there are dropped requests, and 
>>             the counter keeps going up when I query it via the MI 
>>             interface:
>>
>>             opensipsctl fifo get_statistics drop_requests
>>
>>             core:drop_requests:: 198107
>>
>>             opensipsctl fifo get_statistics drop_requests
>>
>>             core:drop_requests:: 199157
>>
>>             opensipsctl_reg fifo get_statistics drop_requests
>>
>>             core:drop_requests:: 204116
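>>
>>             To watch how fast the counter grows, plain shell is enough:
>>
>>             # sample the statistic once per second
>>             watch -n1 "opensipsctl fifo get_statistics drop_requests"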
>>
>>             I don't see any memory issue, and the process load is low.
>>
>>             So, three questions:
>>
>>             - what exactly is drop_requests?
>>
>>             - do I need to worry about this?
>>
>>             - how can I make the count go lower?
>>
>