[OpenSIPS-Devel] Question on tm timers handling

Ovidiu Sas osas at voipembedded.com
Tue Apr 5 13:26:40 UTC 2022


Hello Bogdan,

No typos here :)

Let me rephrase with some examples:
 - at time n, both tm-timer and tm-utimer are scheduled
 - tm-timer is scheduled first, followed by tm-utimer
 - tm-timer starts running (grabbing the .ex_lock) while tm-utimer is
waiting for the .ex_lock to be released by tm-timer
 - tm-timer takes 150ms to complete its task
 - at time n + 100ms, tm-utimer is scheduled again, but because its
initial run has not completed yet (it is still waiting for the
.ex_lock), the warning is issued
 - after 50ms (time is now n + 150ms) the tm-timer job is completed,
then the tm-utimer (that was scheduled at time n) takes its turn and
finishes in 10ms.
 - now the time is n + 160ms and both tm-timer and tm-utimer jobs are done
 - at time n + 200ms, tm-utimer is scheduled again and all is back to normal
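The timeline above can be written down as a small deterministic
simulation (illustrative only, not OpenSIPS code; the function and
parameter names are made up):

```python
# Sketch of the contention above: tm-timer and tm-utimer share one lock,
# so a 150 ms tm-timer run delays the 10 ms tm-utimer past its next
# 100 ms tick, which is exactly when the "already scheduled" warning fires.

def simulate(tm_timer_ms=150, tm_utimer_ms=10, utimer_period_ms=100):
    """Return (utimer_finish_ms, warnings) for one contended cycle."""
    warnings = []
    # At time n (= 0 here) both jobs fire; tm-timer grabs the shared
    # ex_lock first and holds it for its whole run.
    lock_free_at = tm_timer_ms              # tm-timer holds the lock 0..150 ms
    utimer_start = lock_free_at             # tm-utimer waits on the lock
    utimer_finish = utimer_start + tm_utimer_ms
    # The scheduler re-fires tm-utimer every 100 ms; any tick that lands
    # while the previous run is still pending triggers the warning.
    t = utimer_period_ms
    while t < utimer_finish:
        warnings.append(f"tm-utimer already scheduled at t={t}ms")
        t += utimer_period_ms
    return utimer_finish, warnings

finish, warns = simulate()
print(finish)   # 160 -> matches "n + 160ms" above
print(warns)    # one warning, at the n + 100ms tick
```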

Hope this clarifies how the tm-timer interferes with the tm-utimer.
The timer that takes a long time to process is the WT_TIMER inside
tm-timer, and that is what triggers the warning on the tm-utimer.
The tm-utimer itself works fine, but it gets this misleading warning
because of the tm-timer.
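As the quoted message below points out, check_and_split_time_list takes
its own locks and each handler then works on a detached list, so the
shared ex_lock mostly serializes the two routines against each other. A
rough model of that effect (a hypothetical threading sketch, not OpenSIPS
code) shows how a per-routine lock removes the stall:

```python
import threading, time

def run_pair(shared):
    """Run a slow tm-timer-like job and a fast tm-utimer-like job; with
    shared=True both take the same lock (modeling the ex_lock), otherwise
    each routine has its own lock."""
    lock_a = threading.Lock()
    lock_b = lock_a if shared else threading.Lock()
    elapsed = {}

    def routine(name, lock, work_s):
        start = time.monotonic()
        with lock:
            time.sleep(work_s)      # stands in for the handler's work
        elapsed[name] = time.monotonic() - start

    ta = threading.Thread(target=routine, args=("tm-timer", lock_a, 0.15))
    tb = threading.Thread(target=routine, args=("tm-utimer", lock_b, 0.01))
    ta.start()
    time.sleep(0.005)               # let tm-timer grab the lock first
    tb.start()
    ta.join(); tb.join()
    return elapsed

print(run_pair(shared=True))   # tm-utimer stalls for roughly 150 ms
print(run_pair(shared=False))  # tm-utimer finishes in roughly 10 ms
```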

-ovidiu

On Tue, Apr 5, 2022 at 3:59 AM Bogdan-Andrei Iancu <bogdan at opensips.org> wrote:
>
> Hi Ovidiu,
>
> A first quick note :). You mentioned the tm_utimer as the problematic
> one - this is the 100ms based timer, used ONLY for the outbound
> retransmissions. This conflicts with your later finding on WT_TIMER,
> which is actually on tm_timer, not tm_utimer.
>
> So, just some typos here, or ? :)
>
> Best regards,
>
> Bogdan-Andrei Iancu
>
> OpenSIPS Founder and Developer
>    https://www.opensips-solutions.com
> OpenSIPS eBootcamp 23rd May - 3rd June 2022
>    https://opensips.org/training/OpenSIPS_eBootcamp_2022/
>
> On 4/4/22 8:16 AM, Ovidiu Sas wrote:
> > Hello all,
> >
> > The tm module handles all its internal timers via two handlers:
> >   - timer_routine (second based timers)
> >   - utimer_routine (100ms based timers)
> > Each of these routines handles 4 different timers.
> > Both routines are very similar in functionality and there is no timer
> > that is handled by both routines.
> > Because both routines are protected by the same lock
> > (timertable[(long)set].ex_lock), these two routines cannot run in
> > parallel (assuming that we have only one set, i.e. a single
> > timer_partition).
> >
> > In my testing, I noticed that the tm_utimer routine has difficulties
> > running smoothly.
> > After doing more testing and some profiling, it looks like the culprit
> > is the WT_TIMER.
> > For around 10-15K records in the WT_TIMER detached timer list, we
> > spend around 3ms building the list and 200-300ms in
> > run_handler_for_each. Because of this, the tm_utimer (which is
> > scheduled to run every 100ms) is blocked by the lock on the first run
> > and on the second run the scheduler detects that the previous run is
> > still running (waiting for the lock) and therefore issues the famous
> > "already scheduled" warning.
> >
> > The check_and_split_time_list function has its own locks, and then
> > each handler operates on its own list (with locks for dealing with cells),
> > so why do we have the timertable[(long)set].ex_lock?
> >
> > I removed the lock, tested with one single timer_partition, then with
> > two timer_partitions and the performance increased dramatically. Is
> > there a reason for keeping this lock, or is it something that was
> > inherited and nobody bothered to question and remove?
> >
> > Thanks,
> > Ovidiu
> >
>


-- 
VoIP Embedded, Inc.
http://www.voipembedded.com


