[OpenSIPS-Users] OpenSIPS timers
Ovidiu Sas
osas at voipembedded.com
Tue Apr 12 03:43:04 UTC 2022
Just to conclude the thread. The issue here was high load combined
with the fact that tm has two timers (a second-based timer *tm-timer*
that runs every second and a millisecond-based timer *tm-utimer* that
runs every 200ms). Both timers are protected by the same lock, so the
timers cannot run in parallel. The second-based timer *tm-timer*
sometimes takes more than 200ms to complete, which prevents the
millisecond-based timer *tm-utimer* from being executed in its 200ms
window.
-ovidiu
On Fri, Apr 1, 2022 at 10:10 AM Ovidiu Sas <osas at voipembedded.com> wrote:
>
> Hello Bogdan,
>
> During my test, it was tm-utimer only. It was a typo on my side.
>
> I also see in the logs from time to time the other timers too,
> including tm-timer.
>
> What I noticed in my tests is that as soon as I increase the
> timer_partitions, the system is able to handle fewer cps until the
> workers become 100% loaded and calls start failing (due to
> retransmissions and the udp queue being full - the udp queue is quite
> big to accommodate spikes).
>
> Is there a way to make the timer lists more efficient (in terms of ops
> in shared memory)?
>
> Please take a look at the mentioned ticket as it makes the ratelimit
> module unusable (and maybe with side effects for other modules that
> require accurate timeslots).
> Basically, for a timer that is supposed to fire every second, the
> observed behaviour is that the timer fires at approx 1s (or a few ms
> less), then from time to time it fires at 1.8s, and the cycle
> repeats.
>
> Thanks,
> Ovidiu
>
> On Fri, Apr 1, 2022 at 9:48 AM Bogdan-Andrei Iancu <bogdan at opensips.org> wrote:
> >
> > Hi Ovidiu,
> >
> > Originally you mentioned tm-utimer, now tm-timer... which one is it?
> > It is very important.
> >
> > When increasing the timer_partitions, what do you mean by
> > "instability" of the system?
> >
> > Yes, in the reactor, the UDP workers may also handle timer jobs
> > besides the UDP traffic, while the timer procs are dedicated 100% to
> > timer jobs only. So yes, if the workers are idle, they can act as
> > additional timer procs.
> >
> > Increasing the TM_TABLE_ENTRIES should not have much impact, as the
> > performance over the timer lists (in TM) has nothing to do with the
> > size of the hash table.
> >
> > I will check the mentioned ticket, but if what you are saying about
> > HP malloc is true, it means the bottleneck is actually in the ops on
> > the shared memory.
> >
> > Best regards,
> >
> > Bogdan-Andrei Iancu
> >
> > OpenSIPS Founder and Developer
> > https://www.opensips-solutions.com
> > OpenSIPS eBootcamp 23rd May - 3rd June 2022
> > https://opensips.org/training/OpenSIPS_eBootcamp_2022/
> >
> > On 4/1/22 12:31 AM, Ovidiu Sas wrote:
> > > Hello Bogdan,
> > >
> > > Thank you for looking into this!
> > >
> > > I get warnings mostly from tm-timer. I've seen warnings from
> > > blcore-expire, dlg-options-pinger, dlg-reinvite-pinger, dlg-timer (in
> > > the logs, but not during my testing).
> > > While testing, I saw only the tm-timer warnings.
> > >
> > > I took a superficial look at the "timer_partitions" and your
> > > explanation matches my findings. However, increasing the
> > > "timer_partitions" makes the system unstable (doesn't matter how many
> > > timer procs we have).
> > > I found that I can get the most out of the system if one
> > > "timer_partiton" is used along with one timer_proc.
> > >
> > > With the reactor scheme, a UDP receiver can handle timer jobs, is
> > > that right? If so, when the UDP workers are idle, there are enough
> > > resources to handle timer jobs, correct?
> > >
> > > I also increased the TM_TABLE_ENTRIES to (1<<18) and there was a
> > > little bit of a performance increase, but I will need to test more
> > > to come up with a valid conclusion.
> > >
> > > On the other hand, I noticed a strange behavior on timer handling.
> > > Take a look at:
> > > https://github.com/OpenSIPS/opensips/issues/2797
> > > Not sure if this is related to the warnings that I'm seeing.
> > >
> > > The biggest performance improvement was switching to HP_MALLOC for
> > > both pkg and shm memory.
> > >
> > > I will keep you posted with my findings,
> > > Ovidiu
> > >
> > > On Thu, Mar 31, 2022 at 10:28 AM Bogdan-Andrei Iancu
> > > <bogdan at opensips.org> wrote:
> > >> Hi Ovidiu,
> > >>
> > >> Regarding the warnings from the timer_ticker: do you get them only
> > >> for the tm-utimer task? I'm asking as the key question here is
> > >> where the bottleneck is: in the whole "timer" subsystem, or in the
> > >> tm-utimer task only?
> > >>
> > >> The TM "timer_partitions" parameter creates multiple parallel
> > >> timer lists, to avoid having a large amount of transactions handled
> > >> at one moment in a single tm-utimer task (it rather
> > >> splits/partitions the whole amount of handled transactions into
> > >> smaller chunks, to be handled one at a time in the timer task).
> > >>
> > >> The "timer_workers" creates more than one dedicated processes for
> > >> handling the timer tasks (so scales up the timer sub-system).
> > >>
> > >> If you get warnings only on tm-utimer, I suspect the bottleneck is
> > >> TM related, mainly in performing re-transmissions (that's what that
> > >> task is doing). So increasing the timer_partitions should be the
> > >> way to help.
> > >>
> > >> Best regards,
> > >>
> > >> Bogdan-Andrei Iancu
> > >>
> > >>
> > >> On 3/24/22 12:54 AM, Ovidiu Sas wrote:
> > >>> Hello all,
> > >>>
> > >>> I'm working on tuning an opensips server. I get this pesky:
> > >>> WARNING:core:utimer_ticker: utimer task <tm-utimer> already scheduled
> > >>> I was trying to get rid of them by playing with the tm
> > >>> timer_partitions parameter and the timer_workers core param.
> > >>> Increasing either of them doesn't increase performance, and
> > >>> increasing both of them actually decreases performance.
> > >>> The server is not at its limit; the load on the UDP workers is
> > >>> around 50-60 with some spikes.
> > >>> I have around 3500+ cps sipp traffic.
> > >>>
> > >>> My understanding is that by increasing the number of
> > >>> timer_partitions, we will have more procs walking in parallel over
> > >>> the timer structures.
> > >>> If we have one timer structure, we have one proc walking over it.
> > >>> How does this work for two timer structures? What is the
> > >>> difference between the first and the second timer structure?
> > >>> Should we expect less work for each proc?
> > >>>
> > >>> For now, to reduce the occurrence of the warning log, I increased the
> > >>> timer interval for tm-utimer from 100ms to 200ms. This should be ok as
> > >>> the timer has the TIMER_FLAG_DELAY_ON_DELAY flag set.
> > >>>
> > >>> Thanks,
> > >>> Ovidiu
> > >>>
> > >
> >
>
>
> --
> VoIP Embedded, Inc.
> http://www.voipembedded.com