[OpenSIPS-Users] mysql problem on 1.5

Brett Nemeroff brett at nemeroff.com
Wed Apr 22 17:06:28 CEST 2009


Hey Bogdan,

I've been working on this issue and just wanted to update the group with my
results. Keep in mind, this is a torture test.

There were a number of issues, some in the DB and some in my OpenSIPS script.

First, the OpenSIPS script:
I'm doing call limiting using the dialog module. The call limits are by
source IP address. I store a key/value pair (IP -> call limit) in memcache
so that I don't have to look up the calling limits every time I get a call
(yay memcache!).

Here is what I found, though I'm not sure whether I'm doing something wrong
to cause it. The first time OpenSIPS hits the DB to discover the call limits
for an IP (using avp_db_query), the DB returns an INTEGER, which I then store
in memcache. Limits are checked and the call is allowed or rejected. That
works fine, with no errors. Any later calls (within the memcache timeout)
produce a "type mismatch" error in the avp_check operation that checks the
call counts. This is because when cache_fetch retrieves the account limits,
it returns them as a STRING, not an INT, so I had to add a type conversion.
BTW, I couldn't find anything when I searched for "type casting" with
OpenSIPS, so for those who are curious, look up the {s.int} transformation.
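The failure mode is easy to reproduce outside OpenSIPS. The sketch below is a Python analogy, not OpenSIPS script: FakeCache, get_limit and call_allowed are hypothetical names, and the cache is assumed to hand values back as text, which is what the cache_fetch behaviour described above amounts to. The int() call plays the role of {s.int}.

```python
# Python analogy of the behaviour described above; this is not OpenSIPS code.
# FakeCache, get_limit and call_allowed are hypothetical names.


class FakeCache:
    """Stand-in for memcache: values are stored as text, so an integer
    put in comes back out as a string (assumed to match cache_fetch)."""

    def __init__(self):
        self._data = {}

    def store(self, key, value):
        self._data[key] = str(value)

    def fetch(self, key):
        return self._data.get(key)


def get_limit(cache, ip, db_lookup):
    """Return the call limit for ip as an int, hitting the DB only on a miss."""
    cached = cache.fetch(ip)
    if cached is None:
        limit = db_lookup(ip)  # first call: the DB hands back an integer
        cache.store(ip, limit)
        return limit
    return int(cached)  # later calls: convert, like {s.int} in the script


def call_allowed(active_calls, limit):
    # Like avp_check, this comparison only makes sense between two integers;
    # in Python, int < str raises TypeError (our "type mismatch").
    return active_calls < limit


if __name__ == "__main__":
    cache = FakeCache()
    print(call_allowed(5, get_limit(cache, "192.0.2.10", lambda ip: 20)))  # DB path
    print(call_allowed(5, get_limit(cache, "192.0.2.10", lambda ip: 20)))  # cache path
    try:
        call_allowed(5, cache.fetch("192.0.2.10"))  # cached value, no conversion
    except TypeError as exc:
        print("type mismatch:", exc)
```

The first lookup works only because the value never passed through the cache; every cache hit needs the explicit conversion.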

Calls and the call-limit logic still worked with that type error in there,
but it spat a lot of error messages into syslog. Since I added the type cast,
there are no more errors, and amazingly, I've about quadrupled my
performance!

DB Issues:
In addition, I think there are some definite lock contention issues in
MySQL/InnoDB between the acc writes and my rating script's reads. I have to
run my rating script every minute, or else I get horribly backlogged with
calls to rate. My rating script can only process something like 800 calls
per second, so if OpenSIPS is sustaining 200 CPS of 10-second calls and I run
my rating script once every 60 seconds, you can see there is a finite limit
to the CPS I'll be able to achieve based solely on how quickly I can rate
the calls (which, honestly, I wasn't expecting).
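To make that limit concrete, here is the back-of-the-envelope arithmetic as a small Python sketch. It assumes one rated call per SIP call and a fixed rating throughput of 800 calls/s, and it ignores the lock contention with the acc writes, which in practice lowers the effective rate. rating_seconds and max_cps are illustrative names.

```python
def rating_seconds(cps: float, interval_s: float, rated_per_s: float) -> float:
    """Time one rating run needs for the calls that arrived since the last run."""
    calls_waiting = cps * interval_s
    return calls_waiting / rated_per_s


def max_cps(rated_per_s: float, busy_fraction: float = 1.0) -> float:
    """Highest sustained CPS the rater can keep up with if it may spend
    busy_fraction of wall-clock time rating. The interval cancels out:
    running more often shrinks each batch but does not raise this ceiling."""
    return rated_per_s * busy_fraction


if __name__ == "__main__":
    t = rating_seconds(200, 60, 800)
    print(f"{t:.0f} s of every 60 s spent rating ({t / 60:.0%})")
    print(f"ceiling: {max_cps(800):.0f} CPS")
```

At 200 CPS a one-minute batch takes about 15 seconds to rate. Because the interval cancels out of the ceiling, only a faster rater (or more time spent rating) raises the maximum sustainable CPS; a shorter interval just keeps each batch smaller.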

To get around some of my MySQL locking issues, when my rating script wakes
up it first copies the acc table into a MySQL temporary table (with an index,
of course; very important) and then processes the copy. This solved a ton of
issues, although I'm not sure how "sane" it is.
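A sketch of the snapshot idea, using Python's built-in sqlite3 module as a stand-in for MySQL so it runs anywhere. The acc columns are a simplified subset, the table is assumed to have an auto-increment id, and snapshot_acc, count_calls and trim_acc are hypothetical helper names. On MySQL the equivalent would presumably be CREATE TEMPORARY TABLE ... SELECT plus an index on callid; note that on InnoDB an INSERT ... SELECT may itself take shared locks on the rows it reads unless the isolation level is READ COMMITTED, so the copy step can still contend briefly with the acc writes.

```python
import sqlite3

# SQLite stands in for MySQL here; acc is reduced to a few columns.
# snapshot_acc / count_calls / trim_acc are hypothetical helper names.


def snapshot_acc(conn):
    """Copy current acc rows into an indexed temp table; return the max id copied."""
    (max_id,) = conn.execute("SELECT COALESCE(MAX(id), 0) FROM acc").fetchone()
    conn.execute("DROP TABLE IF EXISTS temp.acc_copy")
    # Create an empty temp table with acc's columns, then fill it.
    conn.execute("CREATE TEMP TABLE acc_copy AS SELECT * FROM acc WHERE 0")
    conn.execute("INSERT INTO acc_copy SELECT * FROM acc WHERE id <= ?", (max_id,))
    # The index matters: the rating pass looks records up by callid.
    conn.execute("CREATE INDEX temp.acc_copy_callid ON acc_copy (callid)")
    return max_id


def count_calls(conn):
    """Example rating pass: reads only the copy, never the live acc table."""
    return conn.execute("SELECT COUNT(DISTINCT callid) FROM acc_copy").fetchone()[0]


def trim_acc(conn, max_id):
    """Delete only the rows that were copied; newer rows stay for the next run."""
    conn.execute("DELETE FROM acc WHERE id <= ?", (max_id,))
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE acc (id INTEGER PRIMARY KEY, method TEXT, callid TEXT, time INTEGER)"
    )
    conn.executemany(
        "INSERT INTO acc (method, callid, time) VALUES (?, ?, ?)",
        [("INVITE", "a", 0), ("BYE", "a", 10), ("INVITE", "b", 3)],
    )
    max_id = snapshot_acc(conn)
    # A record written while the rating pass runs lands only in the live table.
    conn.execute("INSERT INTO acc (method, callid, time) VALUES ('BYE', 'b', 13)")
    print(count_calls(conn), "calls in this batch")
    trim_acc(conn, max_id)
    print(conn.execute("SELECT COUNT(*) FROM acc").fetchone()[0], "row left for next run")
```

Trimming by the highest copied id, rather than emptying acc, keeps records that arrive during the rating pass for the next run.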

Some questions for the group:
I know there are a lot of issues with pretending OpenSIPS can act like a
B2BUA and do proper billing. I use it for billing, and I feel that it's
reasonably accurate. The acc module, as it is, is hard to work with (well,
for me at least). I have to use some fancy logic to group acc records into a
single unified call record that lists the call, duration, PDD, etc. The way
I'm doing it works really well for me, but:
1. It's an external process to OpenSIPS.
2. It has to work in the same tables as OpenSIPS, causing locking issues.
3. Because acc logs actions based on received signaling, there are *many*
DB hits.
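The grouping step can be sketched roughly as follows. This is a guess at the general shape, not the script described above: AccRow, Cdr and build_cdrs are hypothetical, the rows are a simplified view of acc containing only INVITE and BYE records, and it assumes the INVITE record is written when the final reply arrives, so BYE time minus INVITE time approximates the answered duration. PDD is left out because it needs provisional-reply timestamps that plain INVITE/BYE records don't carry.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional

# Hypothetical, simplified acc rows: one per accounted transaction.


@dataclass
class AccRow:
    callid: str
    method: str    # "INVITE" or "BYE"
    sip_code: str  # final reply code of that transaction
    time: float    # epoch seconds when the record was written


@dataclass
class Cdr:
    callid: str
    start: float
    sip_code: str
    duration: Optional[float]  # None until an answered call has seen a BYE


def build_cdrs(rows: Iterable[AccRow]) -> List[Cdr]:
    """Collapse the acc rows of each Call-ID into a single CDR."""
    by_call: Dict[str, List[AccRow]] = defaultdict(list)
    for row in rows:
        by_call[row.callid].append(row)

    cdrs = []
    for callid, recs in by_call.items():
        invites = [r for r in recs if r.method == "INVITE"]
        if not invites:
            continue  # orphan BYE: its INVITE was handled in an earlier batch
        first = min(invites, key=lambda r: r.time)  # ignore later re-INVITEs
        byes = [r for r in recs if r.method == "BYE"]
        duration = None
        if byes and first.sip_code.startswith("2"):
            duration = min(b.time for b in byes) - first.time
        cdrs.append(Cdr(callid, first.time, first.sip_code, duration))
    return cdrs
```

Even in this reduced form, every call costs at least two acc writes plus a grouping pass, which is the overhead the list above complains about.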

What I imagine is perhaps a "cdr" module based on dialog callbacks. It would
be similar to the acc module, but by default it would only write records on
dialog completion, and could optionally write records on dialog establishment
and intra-dialog events (180, 183, 3XX, etc.). The end result would be a
single record per dialog. Yes, I understand that this requires more
processing inside OpenSIPS and probably limits the overall capacity, but I
think it's a well-justified tradeoff.
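To make the proposal concrete, here is a sketch of the bookkeeping such a module might do, written in Python for readability. The on_created/on_reply/on_confirmed/on_terminated hooks are hypothetical stand-ins for dialog callbacks, not the OpenSIPS dialog module API. State lives in memory per dialog and exactly one row is written, on termination; PDD is measured here from dialog creation to the first 180/183.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# Hypothetical hooks modelled loosely on dialog callbacks; this is NOT the
# OpenSIPS dialog module API, just the idea of "one row per dialog".


@dataclass
class _DialogState:
    created: float
    first_provisional: Optional[float] = None  # time of first 180/183
    answered: Optional[float] = None
    final_code: Optional[int] = None


class CdrWriter:
    def __init__(self, write_row: Callable[[dict], None]):
        self.write_row = write_row
        self.active: Dict[str, _DialogState] = {}

    def on_created(self, dlg_id: str, t: float) -> None:
        self.active[dlg_id] = _DialogState(created=t)

    def on_reply(self, dlg_id: str, code: int, t: float) -> None:
        st = self.active.get(dlg_id)
        if st is None:
            return
        if code in (180, 183) and st.first_provisional is None:
            st.first_provisional = t
        elif code >= 200:
            st.final_code = code

    def on_confirmed(self, dlg_id: str, t: float) -> None:
        st = self.active.get(dlg_id)
        if st is not None:
            st.answered = t

    def on_terminated(self, dlg_id: str, t: float) -> None:
        # The only place a row is written: one record per dialog.
        st = self.active.pop(dlg_id, None)
        if st is None:
            return
        self.write_row({
            "dialog": dlg_id,
            "final_code": st.final_code,
            "pdd": None if st.first_provisional is None
            else st.first_provisional - st.created,
            "duration": 0.0 if st.answered is None else t - st.answered,
        })
```

The tradeoff shows up directly: every active dialog holds a little state in memory until it ends, in exchange for one DB write per call instead of one per accounted transaction.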

Thoughts?
-Brett





On Wed, Apr 22, 2009 at 2:38 AM, Bogdan-Andrei Iancu <bogdan at voice-system.ro> wrote:

> Hi Brett,
>
> I think the problem is at DB level. What you can try to do (just to spot
> the problem) is to stop whatever other apps/procs are using the opensips
> DB and let only opensips run on it. Redo the test and see if you get the
> same behaviour.
>
> Regards,
> Bogdan
>
> Brett Nemeroff wrote:
>
>> Bogdan,
>> I no longer get crashes. However, the opensips process hangs pretty badly
>> while the DB operations are going on. I've tried to rewrite my queries to do
>> more small queries rather than a few long, slow ones.
>> Here's what I'm doing: I'm using sipp to generate calls at 30 CPS lasting 10
>> seconds each (to generate a lot of call records).
>>
>> While this is running, I run my rating script, which gathers the unique
>> callids and smashes their records together into a CDR record.
>>
>> My database engine is InnoDB and I'm using transactions. I'm not actually
>> getting to a commit in any of this.
>>
>> While my script is running, I see that the UAS side of sipp stops
>> receiving calls and starts performing retransmissions. I've verified with
>> tshark that packets are hitting opensips, but it isn't replying.
>>
>> I have 20 children running. Am I doing something wrong?
>>
>> Thanks for your help,
>> Brett
>>
>>
>> On Wed, Apr 8, 2009 at 7:54 AM, Bogdan-Andrei Iancu
>> <bogdan at voice-system.ro> wrote:
>>
>>    Both.
>>
>>    Brett Nemeroff wrote:
>>
>>        Is that on the 1_5 branch or trunk?
>>
>>
>>        On Wed, Apr 8, 2009 at 7:45 AM, Bogdan-Andrei Iancu
>>        <bogdan at voice-system.ro> wrote:
>>
>>           Hi Brett,
>>
>>           thanks to your logs, I spotted the problem. The fix is
>>           available on SVN.
>>
>>
>>           Thanks and regards,
>>           Bogdan
>>
>>           Brett Nemeroff wrote:
>>
>>               Bogdan,
>>               For what it's worth, I updated to the latest 1_5 tonight
>>               (about 20 minutes ago) and I'm still having problems,
>>               including outright crashes.
>>
>>               I rewrote my queries so I'd have a bunch of little
>>               (select * from acc where callid=X) kinds of queries. Of
>>               course, there is a lot of DB activity while this happens.
>>               Crashes start to happen within seconds of the DB activity
>>               ramping up.
>>
>>               For grins, I slowed my queries down to ensure I only did
>>               one query per second (in my database, not opensips). After
>>               about 15-20 queries (a different number each time, really),
>>               opensips would just crash.
>>
>>               I have acc and sip_trace loaded; sip_trace isn't active
>>               for these calls. Also potentially relevant: my acc table
>>               is an InnoDB table.
>>
>>               If I slow my call volume to 1 CPS and keep the queries at
>>               1 QPS, it seems happier, but it still crashes eventually.
>>
>>               -Brett
>>
>>
>>
>>               On Mon, Apr 6, 2009 at 11:27 AM, Bogdan-Andrei Iancu
>>               <bogdan at voice-system.ro> wrote:
>>
>>                  Hi Brett,
>>
>>                  it looks like the DB connections are dropped and a
>>                  reconnect is taking place (that's what these errors are
>>                  about). But to find out the real cause, I can enable some
>>                  more logs to spot the reason for the reconnect...
>>
>>                  I will do it later, as right now I'm in the middle of
>>                  some DB debugging and I'm afraid of mixing different
>>                  patches and what goes on SVN :)
>>
>>                  Regards,
>>                  Bogdan
>>
>>                  Brett Nemeroff wrote:
>>
>>                      Hi All,
>>                      So I'm doing some load testing with sipp on my
>>                      opensips 1.5 system. I just checked out the 1.5
>>                      branch from SVN (about 2 hours ago). Everything works
>>                      just fine until I run some rating scripts on my
>>                      database (perl scripts accessing the mysql db
>>                      directly). While my scripts are running, I see the
>>                      UAS in sipp retransmitting the 200 OKs, and the
>>                      following gets printed to the syslog:
>>                      http://www.pastebin.ca/1381169
>>
>>                      As soon as my perl script is done, the 200 OKs stop
>>                      retransmitting...
>>                      My perl script isn't doing anything terribly unusual;
>>                      however, it performs its queries inside a
>>                      transaction, including a "SELECT/DELETE * FROM acc
>>                      WHERE " kind of clause.
>>
>>                      Any ideas as to what is causing this? I'm afraid I
>>                      may be losing call records...
>>
>>                      -Brett
>>
>>
>>  ------------------------------------------------------------------------
>>
>>                      _______________________________________________
>>                      Users mailing list
>>                      Users at lists.opensips.org
>>                      http://lists.opensips.org/cgi-bin/mailman/listinfo/users
>