[OpenSIPS-Users] Chasing down unreleased memory
Jock McKechnie
jock.mckechnie at gmail.com
Tue Oct 2 09:19:44 EDT 2018
Good morning Răzvan,
I completely understand being busy, I was confident you weren't just
ignoring me :)
I will build the latest from git with the debugging bits and pieces
and report back after testing.
I had been running the latest 2.4.2 off yum.opensips.org when I
originally identified the problem, but have since hand compiled it to
enable all the debugging widgets:
version: opensips 2.4.2 (x86_64/linux)
flags: STATS: On, SHM_EXTRA_STATS, DISABLE_NAGLE, USE_MCAST, SHM_MMAP,
PKG_MALLOC, QM_MALLOC, DBG_MALLOC, FAST_LOCK-ADAPTIVE_WAIT
ADAPTIVE_WAIT_LOOPS=1024, MAX_RECV_BUFFER_SIZE 262144, MAX_LISTEN 16,
MAX_URI_SIZE 1024, BUF_SIZE 65535
poll method support: poll, epoll, sigio_rt, select.
git revision: 66114a66c
main.c compiled on 07:56:12 Sep 20 2018 with gcc 4.8.5
I'll get back to you soon;
- Jock
On Tue, Oct 2, 2018 at 4:00 AM, Răzvan Crainea <razvan at opensips.org> wrote:
> Hi, Jock!
>
> Can you confirm exactly the opensips version you are running? (opensips -V).
>
> Best regards,
> Razvan
>
>
> On 10/2/18 11:41 AM, Răzvan Crainea wrote:
>>
>> Hi, Jock!
>>
>> Good news :) I've just figured out the pkg memleak in mc_compact, and I've
>> already pushed a fix upstream. Please pull the latest version and run a few
>> tests, and let us know how they go.
>> In the meantime, I will investigate the shm leak.
>>
>> Best regards,
>> Razvan
>>
>> On 10/2/18 10:13 AM, Răzvan Crainea wrote:
>>>
>>> Hi, Jock!
>>>
>>> I am pretty busy these days, so I couldn't pay too much attention to this
>>> thread. Nevertheless, I will try to prioritize it this week and get back to
>>> you.
>>> I have also tracked down the pkg mem leak to the mc_compact() module, but
>>> I can't pinpoint for now exactly the issue. Although I have some
>>> assumptions, I can't be sure until I run some tests.
>>> If I need extra info during my debug, I will definitely let you know, so
>>> keep an eye on this thread.
>>>
>>> Best regards,
>>> Răzvan
>>>
>>> On 10/1/18 9:14 PM, Jock McKechnie wrote:
>>>>
>>>> Greetings;
>>>>
>>>> I should have done this last week but, to be honest, it just didn't
>>>> occur to me. I pulled the compression module out (and commented out
>>>> mc_compact()) and re-ran my tests and the memory usage was massively
>>>> lower (losing 100k of memory per minute vs 3-5MB/minute) and within a
>>>> few minutes of calling ceasing almost all of the memory was released
>>>> (went from 42MB down to 3.8MB within 6 minutes).
>>>>
>>>> So it appears that there's something leaking in the compression module.
>>>>
>>>> I'm going to poke around a few more tests (include compression.so
>>>> without actually calling it, etc) but I wanted to pass this on as
>>>> there appears to be a bug that needs addressing.
>>>>
>>>> Is there anything I can provide that might be useful in debugging this?
>>>>
>>>> As always, thank you for your help.
>>>>
>>>> - Jock
>>>>
>>>> On Fri, Sep 21, 2018 at 8:12 AM, Jock McKechnie
>>>> <jock.mckechnie at gmail.com> wrote:
>>>>>
>>>>> Thank you very much, Răzvan;
>>>>>
>>>>> We are using the compression module for mc_compact(). We use SIP
>>>>> headers for passing around data (don't ask) and the compact headers
>>>>> are mandatory or else we hit MTU/fragmentation problems. We're on the
>>>>> cusp of moving to SIP/TCP but haven't gotten there quite yet.
>>>>> The mc_compact() lives in the main route[1] block and is simply this,
>>>>> in whole:
>>>>> mc_compact("P-Asserted-Identity|Remote-Party-ID|X-UCID");
>>>>>
>>>>> And my apologies for not including the shm dump segment - please find
>>>>> it below.
>>>>> https://pastebin.com/itSTER7Y
>>>>>
>>>>> Let me know what else I can provide, this is fairly easy to replicate
>>>>> (although it takes about an hour of runtime pummeling it to get nice
>>>>> easy to read dumps). Thank you, again, very much.
>>>>>
>>>>> - Jock
>>>>>
>>>>> On Fri, Sep 21, 2018 at 3:01 AM, Răzvan Crainea <razvan at opensips.org>
>>>>> wrote:
>>>>>>
>>>>>> Hi, Jock!
>>>>>>
>>>>>> I think you are actually heating two memory leaks: one in private
>>>>>> memory and
>>>>>> one in shared memory.
>>>>>> In the memory dump you sent I can spot a pkg mem related to (I
>>>>>> believe) the
>>>>>> compression module. Can you tell me what functions of the compression
>>>>>> module
>>>>>> you are using and what flags are you using for them?
>>>>>>
>>>>>> Besides that, you claim there is a shared memory leak as well. Can you
>>>>>> send
>>>>>> us the logs for the shared memory? They are dumped by the first
>>>>>> process
>>>>>> (attendant) and should start with the string "Memory status (shm):".
>>>>>> Can you
>>>>>> post those logs as well?
>>>>>>
>>>>>> Best regards,
>>>>>> Razvan
>>>>>>
>>>>>>
>>>>>> On 9/20/18 9:43 PM, Jock McKechnie wrote:
>>>>>>>
>>>>>>>
>>>>>>> Greetings OpenSIPS gurus;
>>>>>>>
>>>>>>> I've run into an issue that I'm not making any real headway with on
>>>>>>> my
>>>>>>> own. I have an OpenSIPS 2.4.2 running on CentOS7 that does not appear
>>>>>>> to be releasing the bulk of its shared memory when calls die off.
>>>>>>> Eventually OpenSIPS runs itself out of shared memory and goes beserk
>>>>>>> when it can't reclaim memory for new transactions, or I've also seen
>>>>>>> the kernel oomkiller take it down.
>>>>>>>
>>>>>>> The setup is, as mentioned, a 2.4.2 which has a chunk of perl living
>>>>>>> on its back that handles some API directed call decisions. Because of
>>>>>>> the inevitable delay in the API lookups I am running a large number
>>>>>>> of
>>>>>>> threads (100) to ensure that I don't start dropping packets under
>>>>>>> high
>>>>>>> levels of load (50 CPS+ - until I bumped children counts up I was
>>>>>>> seeing packets being ignored which got the End User very excitable).
>>>>>>> I
>>>>>>> appreciate having lots of children amplifies memory usage but my
>>>>>>> belief is during regular load on our side (~6000 calls at ~30 -
>>>>>>> 40CPS)
>>>>>>> I should have a full day's runtime easily w/o running into problems.
>>>>>>> The issue is that because the memory is not released overnight it
>>>>>>> starts the next day with half its shared memory gone, and then things
>>>>>>> go to pot the next day when it runs through all its remaining shared
>>>>>>> memory.
>>>>>>>
>>>>>>> I have it configured for a very large (I think) amount of memory to
>>>>>>> attempt to compensate short-term - 2048MB shared memory, 128MB
>>>>>>> package
>>>>>>> memory. This is not a well thought-out number, just me throwing RAM
>>>>>>> at
>>>>>>> the problem. (For reference this is on a VM running with 4 cores and
>>>>>>> 4096MB of RAM)
>>>>>>>
>>>>>>> I have graphed the memory stats out of the FIFO, after enabling
>>>>>>> memstats and DBG_MALLOC/etc. The graphs can be found here:
>>>>>>> https://imgur.com/a/oapKYJW
>>>>>>>
>>>>>>> You'll see that the first graph, the main shmem:, shows the usage
>>>>>>> gets
>>>>>>> tied up and remains after calls die off (these examples run for 20
>>>>>>> minutes after the calls stopped - but I can leave it overnight and
>>>>>>> the
>>>>>>> shmem: usage won't drop). This being said, the other graphed modules
>>>>>>> show that it releases memory correctly after calls quit like they
>>>>>>> should. None of the other modules graphed (eg, _all_ of the modules I
>>>>>>> was using in OpenSIPS) showed no significant usage and I've omitted
>>>>>>> them.
>>>>>>>
>>>>>>> Interesting that the 'perl' module shows no stats ever which makes me
>>>>>>> wonder if the stats on this module works. I am not storing anything
>>>>>>> inside perl, besides lots of local session variables, and all channel
>>>>>>> variables are pushed back into OpenSIPS AVPs, but I assume there must
>>>>>>> be some memory overhead, although it may not significantly change.
>>>>>>>
>>>>>>> When I shut OpenSIPS down I get child dumps like the following: (I've
>>>>>>> included only one for brevity, but they're all much of a muchness)
>>>>>>> https://pastebin.com/KG2qxxBa
>>>>>>>
>>>>>>> If I'm reading this right (and there's a strong possibility I'm
>>>>>>> misunderstanding what I'm looking at) the memory is tied up in
>>>>>>> io_wait
>>>>>>> and the parser. I'm not sure what this signifies, however.
>>>>>>>
>>>>>>> So I guess what I'm asking is:
>>>>>>> Am I understanding what I'm looking at correctly and that OpenSIPS is
>>>>>>> not releasing its shared memory which causes it to run out of memory
>>>>>>> quicker the following day? (Which lines up with the symptoms of
>>>>>>> shorter runtimes before failure)
>>>>>>>
>>>>>>> Based on the dumps where should I look for the source of this issue?
>>>>>>>
>>>>>>> Finally, and unrelated to the above, I attempted to mail the official
>>>>>>> OpenSIPS peeps to talk about paid support but none of the 'contact
>>>>>>> us'
>>>>>>> submits appear to work - when I click on the submit button it changes
>>>>>>> shade but does not appear to act. Certainly I never received any
>>>>>>> response. If there was a direct eMail address I could send to that
>>>>>>> would be preferable.
>>>>>>>
>>>>>>> As always, my thanks for your time.
>>>>>>>
>>>>>>> - Jock
>>>>>>>
>>>>>>> PS -
>>>>>>> List of modules I'm using in this config:loadmodule "signaling.so"
>>>>>>> loadmodule "sl.so"
>>>>>>> loadmodule "tm.so"
>>>>>>> loadmodule "rr.so"
>>>>>>> loadmodule "maxfwd.so"
>>>>>>> loadmodule "textops.so"
>>>>>>> loadmodule "uri.so"
>>>>>>> loadmodule "mi_fifo.so"
>>>>>>> loadmodule "siptrace.so"
>>>>>>> loadmodule "sipmsgops.so"
>>>>>>> loadmodule "dialog.so"
>>>>>>> loadmodule "uac_auth.so"
>>>>>>> loadmodule "uac.so"
>>>>>>> loadmodule "perl.so"
>>>>>>> loadmodule "acc.so"
>>>>>>> loadmodule "proto_hep.so"
>>>>>>> loadmodule "proto_udp.so"
>>>>>>> loadmodule "compression.so"
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Users mailing list
>>>>>>> Users at lists.opensips.org
>>>>>>> http://lists.opensips.org/cgi-bin/mailman/listinfo/users
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> Răzvan Crainea
>>>>>> OpenSIPS Core Developer
>>>>>> http://www.opensips-solutions.com
>>>>>>
>>>>>> _______________________________________________
>>>>>> Users mailing list
>>>>>> Users at lists.opensips.org
>>>>>> http://lists.opensips.org/cgi-bin/mailman/listinfo/users
>>>>
>>>>
>>>> _______________________________________________
>>>> Users mailing list
>>>> Users at lists.opensips.org
>>>> http://lists.opensips.org/cgi-bin/mailman/listinfo/users
>>>>
>>>
>>
>
> --
> Răzvan Crainea
> OpenSIPS Core Developer
> http://www.opensips-solutions.com
>
> _______________________________________________
> Users mailing list
> Users at lists.opensips.org
> http://lists.opensips.org/cgi-bin/mailman/listinfo/users
More information about the Users
mailing list