[OpenSIPS-Users] Command "opensipsctl dialplan reload" randomly hangs
Bogdan-Andrei Iancu
bogdan at voice-system.ro
Mon May 3 10:00:26 CEST 2010
Hi Dan,
The vital information is the backtrace (with gdb) of the FIFO process
while blocked = as there is not CPU usage, I say the procs stuck in a
IO op. BTW, try to run the test with debug=6 (if possible).
If you need help with the gdb syntax, let me know.
Regards,
Bogdan
DanB wrote:
> Hey Bogdan,
>
> Hereby some more tests (I should mention that I cannot block on demand
> the fifo, it simply randomly blocks, so I must be lucky when I am
> trying to reproduce it).
>
> Based on ps the cpu and memory consuption do not increase during the
> hang (CPU:0.0 Memory:1.0).
>
> Ta,
> DanB
>
> Log of the actions:
> -------------------------------------------------------------------
>
> Before reload:
>
> sip1:~# opensipsctl ps|grep FIFO
>
> Process:: ID=13 PID=26368 Type=MI FIFO
>
> sip1:~# ps uww -p 26368
>
> USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
>
> opensips 26368 0.0 1.0 1147236 10788 ? S 07:30 0:00
> /usr/sbin/opensips -P /var/run/opensips/opensips.pid -m 1024 -u opensips -g
> opensips
>
> sip1:~#
>
>
>
> During hanging reload:
>
> sip1:~# ps uww -p 26368
>
> USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
>
> opensips 26368 0.0 1.0 1147644 11040 ? R 07:30 0:00
> /usr/sbin/opensips -P /var/run/opensips/opensips.pid -m 1024 -u opensips -g
> opensips
>
>
> After reload interrupted with crtl+c and /etc/init.d/opensips restart
>
>
> sip1:~# ps uww -p 26368
>
> USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
>
> sip1:~# opensipsctl ps|grep FIFO
>
> Process:: ID=13 PID=26716 Type=MI FIFO
>
> sip1:~# ps uww -p 26716
>
> USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
>
> opensips 26716 0.0 0.1 1147164 1988 ? S 14:59 0:00
> /usr/sbin/opensips -P /var/run/opensips/opensips.pid -m 1024 -u opensips -g
> opensips
>
> On Mon, Apr 26, 2010 at 12:25 PM, Bogdan-Andrei Iancu
> <bogdan at voice-system.ro> wrote:
>
>> Hi Dan,
>>
>> Your descriptions point to a blocked fifo process. Blocking maybe
>> because of some internal locking (you see 99% cpu usage) or some I/O
>> (normal cpu usage).
>>
>> So, do the followings:
>>
>> 1) do "opensipsctl fifo ps" to see the PID of the fifo process
>> 2) make fifo to block
>> 3) check if the fifo process (by pid) is there - if yes, see how much
>> cpu it uses and try to attache with gdb to it to get a backtrace.
>>
>> Regards,
>> Bogdan
>>
>> DanB wrote:
>>
>>> Hey Bogdan,
>>>
>>> Thanks for coming back so fast.
>>>
>>> There was no error reported neither on console nor in the syslog
>>> (debug 7). I will need to check for dead process since all I could
>>> spot was no reply back and console hanging, and be able to stop it
>>> only with Ctrl+C. After Ctrl+C could not get the any other opensipsctl
>>> commands to work.
>>> Will need to wait few days more to know about dead process.
>>>
>>> DanB
>>>
>>> On Thu, Apr 22, 2010 at 6:55 PM, Bogdan-Andrei Iancu
>>> <bogdan at voice-system.ro> wrote:
>>>
>>>
>>>> Hi Dan,
>>>>
>>>> Did you notice any error from the fifo process during the reload ? it
>>>> may be something related to locking (during reload) of the table -> this
>>>> may affect all the other processes.
>>>>
>>>> So, any errors? any dead processes (like fifo one) ?
>>>>
>>>> Regards,
>>>> Bogdan
>>>>
>>>> DanB wrote:
>>>>
>>>>
>>>>> Hey Guys,
>>>>>
>>>>> I have noticed the "opensipsctl dialplan reload" command randomly
>>>>> hanging, sometimes even the server itself becoming non responsible,
>>>>> other times reloading the dialplan into memory but not reporting
>>>>> anything on console, the last one becoming unusable until server
>>>>> restart. This happened in the past as well but with the traffic
>>>>> increase, it becomes more and more annoying. I suspect the same bug
>>>>> which was present in the past with fifo hanging.
>>>>> I should mention that I got about 2000 records in the dialplan table,
>>>>> so I would say not that much loaded.
>>>>>
>>>>> The version I am running:
>>>>> sip1:/home/employee/dan# opensips -V
>>>>> version: opensips 1.6.1-notls (x86_64/linux)
>>>>> flags: STATS: Off, USE_IPV6, USE_TCP, DISABLE_NAGLE, USE_MCAST,
>>>>> SHM_MEM, SHM_MMAP, PKG_MALLOC, F_MALLOC, FAST_LOCK-ADAPTIVE_WAIT
>>>>> ADAPTIVE_WAIT_LOOPS=1024, MAX_RECV_BUFFER_SIZE 262144, MAX_LISTEN 16,
>>>>> MAX_URI_SIZE 1024, BUF_SIZE 65535
>>>>> poll method support: poll, epoll_lt, epoll_et, sigio_rt, select.
>>>>> svnrevision: 2:6509M
>>>>> @(#) $Id: main.c 6169 2009-09-22 12:48:37Z bogdan_iancu $
>>>>> main.c compiled on 14:43:30 Jan 11 2010 with gcc 4.3.2
>>>>>
>>>>>
>>>>> All I could find in the logs was database reconnection in the case of
>>>>> hang but no error or something else reported (running debug 7).
>>>>> Bellow some of the log:
>>>>> Apr 22 14:27:19 sip1 /usr/sbin/opensips[16299]:
>>>>> DBG:mi_fifo:mi_fifo_server: entered consume
>>>>> Apr 22 14:27:19 sip1 /usr/sbin/opensips[16299]:
>>>>> DBG:mi_fifo:mi_fifo_server: **** done consume
>>>>> Apr 22 14:27:19 sip1 /usr/sbin/opensips[16299]:
>>>>> DBG:mi_fifo:mi_fifo_server: done parsing the mi tree
>>>>> Apr 22 14:27:19 sip1 /usr/sbin/opensips[16299]: DBG:dialplan:dp_load_db: init
>>>>> Apr 22 14:27:19 sip1 /usr/sbin/opensips[16299]:
>>>>> INFO:db_mysql:db_mysql_submit_query: disconect event for 0x77f060
>>>>> Apr 22 14:27:19 sip1 /usr/sbin/opensips[16299]:
>>>>> INFO:db_mysql:reset_all_statements: reseting all statements on
>>>>> connection: (0x77fd18) 0x77f060
>>>>> Apr 22 14:27:19 sip1 /usr/sbin/opensips[16299]:
>>>>> DBG:db_mysql:db_mysql_connect: opening connection:
>>>>> mysql://xxxx:xxxx@192.168.11.253/sipeandb
>>>>> Apr 22 14:27:19 sip1 /usr/sbin/opensips[16299]:
>>>>> DBG:db_mysql:db_mysql_connect: connection type is 192.168.11.253 via
>>>>> TCP/IP
>>>>> Apr 22 14:27:19 sip1 /usr/sbin/opensips[16299]:
>>>>> DBG:db_mysql:db_mysql_connect: protocol version is 10
>>>>> Apr 22 14:27:19 sip1 /usr/sbin/opensips[16299]:
>>>>> DBG:db_mysql:db_mysql_connect: server version is 5.0.51a-24+lenny1-log
>>>>> Apr 22 14:27:19 sip1 /usr/sbin/opensips[16299]:
>>>>> INFO:db_mysql:db_mysql_submit_query: re-connected successful for
>>>>> 0x77f060
>>>>> Apr 22 14:27:19 sip1 /usr/sbin/opensips[16299]:
>>>>> DBG:core:db_new_result: allocate 48 bytes for result set at 0x784258
>>>>> Apr 22 14:27:19 sip1 /usr/sbin/opensips[16299]:
>>>>> DBG:db_mysql:db_mysql_get_columns: 8 columns returned from the query
>>>>> Apr 22 14:27:19 sip1 /usr/sbin/opensips[16299]:
>>>>> DBG:core:db_allocate_columns: allocate 224 bytes for result columns at
>>>>> 0x7842a0
>>>>> Apr 22 14:27:19 sip1 /usr/sbin/opensips[16299]:
>>>>> DBG:db_mysql:db_mysql_get_columns: RES_NAMES(0x7842e0)[0]=[dpid]
>>>>> Apr 22 14:27:19 sip1 /usr/sbin/opensips[16299]:
>>>>> DBG:db_mysql:db_mysql_get_columns: use DB_INT result type
>>>>> Apr 22 14:27:19 sip1 /usr/sbin/opensips[16299]:
>>>>> DBG:db_mysql:db_mysql_get_columns: RES_NAMES(0x7842f0)[1]=[pr]
>>>>> Apr 22 14:27:19 sip1 /usr/sbin/opensips[16299]:
>>>>> DBG:db_mysql:db_mysql_get_columns: use DB_INT result type
>>>>> Apr 22 14:27:19 sip1 /usr/sbin/opensips[16299]:
>>>>> DBG:db_mysql:db_mysql_get_columns: RES_NAMES(0x784300)[2]=[match_op]
>>>>> Apr 22 14:27:19 sip1 /usr/sbin/opensips[16299]:
>>>>> DBG:db_mysql:db_mysql_get_columns: use DB_INT result type
>>>>> Apr 22 14:27:19 sip1 /usr/sbin/opensips[16299]:
>>>>> DBG:db_mysql:db_mysql_get_columns: RES_NAMES(0x784310)[3]=[match_exp]
>>>>> Apr 22 14:27:19 sip1 /usr/sbin/opensips[16299]:
>>>>> DBG:db_mysql:db_mysql_get_columns: use DB_STRING result type
>>>>> Apr 22 14:27:19 sip1 /usr/sbin/opensips[16299]:
>>>>> DBG:db_mysql:db_mysql_get_columns: RES_NAMES(0x784320)[4]=[match_len]
>>>>> Apr 22 14:27:19 sip1 /usr/sbin/opensips[16299]:
>>>>> DBG:db_mysql:db_mysql_get_columns: use DB_INT result type
>>>>> Apr 22 14:27:19 sip1 /usr/sbin/opensips[16299]:
>>>>> DBG:db_mysql:db_mysql_get_columns: RES_NAMES(0x784330)[5]=[subst_exp]
>>>>> Apr 22 14:27:19 sip1 /usr/sbin/opensips[16299]:
>>>>> DBG:db_mysql:db_mysql_get_columns: use DB_STRING result type
>>>>> Apr 22 14:27:19 sip1 /usr/sbin/opensips[16299]:
>>>>> DBG:db_mysql:db_mysql_get_columns: RES_NAMES(0x784340)[6]=[repl_exp]
>>>>> Apr 22 14:27:19 sip1 /usr/sbin/opensips[16299]:
>>>>> DBG:db_mysql:db_mysql_get_columns: use DB_STRING result type
>>>>> Apr 22 14:27:19 sip1 /usr/sbin/opensips[16299]:
>>>>> DBG:db_mysql:db_mysql_get_columns: RES_NAMES(0x784350)[7]=[attrs]
>>>>> Apr 22 14:27:19 sip1 /usr/sbin/opensips[16299]:
>>>>> DBG:db_mysql:db_mysql_get_columns: use DB_STRING result type
>>>>> Apr 22 14:27:19 sip1 /usr/sbin/opensips[16299]:
>>>>> DBG:db_mysql:db_mysql_fetch_result: converting row 0 of 2042 count 500
>>>>> Apr 22 14:27:19 sip1 /usr/sbin/opensips[16299]:
>>>>> DBG:core:db_allocate_rows: allocate 136000 bytes for result rows and
>>>>> values at 0x79c5d8
>>>>>
>>>>> Ta,
>>>>> DanB
>>>>>
>>>>> _______________________________________________
>>>>> Users mailing list
>>>>> Users at lists.opensips.org
>>>>> http://lists.opensips.org/cgi-bin/mailman/listinfo/users
>>>>>
>>>>>
>>>>>
>>>>>
>>>> --
>>>> Bogdan-Andrei Iancu
>>>> www.voice-system.ro
>>>>
>>>>
>>>> _______________________________________________
>>>> Users mailing list
>>>> Users at lists.opensips.org
>>>> http://lists.opensips.org/cgi-bin/mailman/listinfo/users
>>>>
>>>>
>>>>
>>> _______________________________________________
>>> Users mailing list
>>> Users at lists.opensips.org
>>> http://lists.opensips.org/cgi-bin/mailman/listinfo/users
>>>
>>>
>>>
>> --
>> Bogdan-Andrei Iancu
>> www.voice-system.ro
>>
>>
>> _______________________________________________
>> Users mailing list
>> Users at lists.opensips.org
>> http://lists.opensips.org/cgi-bin/mailman/listinfo/users
>>
>>
>
> _______________________________________________
> Users mailing list
> Users at lists.opensips.org
> http://lists.opensips.org/cgi-bin/mailman/listinfo/users
>
>
--
Bogdan-Andrei Iancu
www.voice-system.ro
More information about the Users
mailing list