[OpenSIPS-Devel] Question on tm timers handling

Bogdan-Andrei Iancu bogdan at opensips.org
Tue Apr 5 07:59:23 UTC 2022


Hi Ovidiu,

A first quick note :). You mentioned tm_utimer as the problematic 
one - this is the 100 ms based timer, used ONLY for outbound 
retransmissions. This conflicts with your later finding about WT_TIMER, 
which is actually handled by tm_timer, not tm_utimer.

So, are these just some typos here, or something else? :)

Best regards,

Bogdan-Andrei Iancu

OpenSIPS Founder and Developer
   https://www.opensips-solutions.com
OpenSIPS eBootcamp 23rd May - 3rd June 2022
   https://opensips.org/training/OpenSIPS_eBootcamp_2022/

On 4/4/22 8:16 AM, Ovidiu Sas wrote:
> Hello all,
>
> The tm module handles all its internal timers via two handlers:
>   - timer_routine (second based timers)
>   - utimer_routine (100ms based timers)
> Each of these routines handles 4 different timers.
> Both routines are very similar in functionality, and no timer is
> handled by both.
> Because both routines are protected by the same lock
> (timertable[(long)set].ex_lock), these two routines cannot run in
> parallel (assuming that we have only one set, i.e. a single
> timer_partition).
>
> In my testing, I noticed that the tm_utimer routine has difficulties
> running smoothly.
> After doing more testing and some profiling, it looks like the culprit
> is the WT_TIMER.
> For around 10-15K records in the WT_TIMER detached timer list, we
> spend around 3 ms to build the list and 200-300 ms in
> run_handler_for_each. Because of this, the tm_utimer (which is
> scheduled to run every 100 ms) is blocked by the lock on the first
> run; on the second run the scheduler detects that the previous run is
> still in progress (waiting for the lock) and therefore issues the
> famous "already scheduled" warning.
>
> The check_and_split_time_list function has its own locks, and each
> handler then operates on its own list (with locks for dealing with
> cells), so why do we have the timertable[(long)set].ex_lock?
>
> I removed the lock, tested with a single timer_partition, then with
> two timer_partitions, and the performance increased dramatically. Is
> there a reason for keeping this lock, or is it something that was
> inherited and never revisited?
>
> Thanks,
> Ovidiu
>