[OpenSIPS-Devel] [opensips] error in infinite loop "io_watch_del: BUG - trying to del fd 34 with flags 2 1" (#591)

Bogdan Andrei IANCU notifications at github.com
Mon Aug 10 16:24:04 CEST 2015


@rgupta0110 , I did a lot of testing and research on this. Still I was not able to reproduce the the behavior you describe (at least not the last step).  While I have the conn in SYN_SENT, if (A) conn is expired by opensips, it is simply removed or if (b) the kernel is timing it out (due low tcp_syn_retries), I get the "connection %p fd %d is now writable" message and it is closed.

What is really strange in your case is the fact that the FD never gets added to poll with the READ indication (if you grep in your log for FD 34 in proc 13168) for io_watch_xxx ops, you will see that the FD is added only for WRITE:
 
  DBG:core:io_watch_add: io_watch_add op on 34 (0x7e5f40, 34, 2, 0x7f18fc1809e8,2), fd_no=27

Still, the triggering on FD 34:

  DBG:core:handle_tcpconn_ev: data available on 0x7f18fc1809e8 34

shows a READ triggering (handle_tcpconn_ev is triggered by epoll only with IO_WATCH_READ..

So, I my only logical assumption is that we deal with a mis-mapping of the epoll event to the IO_WATCH event. What I suspect is the EPOLLHUP epoll event, which according to the man : 

             Hang up happened on the associated file descriptor.
              epoll_wait(2) will always wait for this event; it is not
              necessary to set it in events.  Note that when reading from a
              channel such as a pipe or a stream socket, this event merely
              indicates that the peer closed its end of the channel.
              Subsequent reads from the channel will return 0 (end of file)
              only after all outstanding data in the channel has been
              consumed.

So, even if added by epoll_ctl() for EPOLLOUT event, our FD reports the EPOLLHUP which is automatically translated by the io_wait epoll implementation (see io_wait_loop_epoll()) to a IO_WATCH_READ event. Which leads to the described bug.

Now, I need to confirm this . For this, in io_wait_loop_epoll(), line 833, change the "#if 0" into an "#if 1" to print the actual epoll events.
Also, in io_watch_del(), after the LM_ERR printing the bug log "BUG - trying to del fd %d with flags %d %d" (line 606), add an "abort();" to force opensips to core dump when getting there - this will help us to investigate the whole function's stack.
Please do the above changes, recompile and run the test again. Be sure with the "ulimit -c unlimited" in order to get a core file.
Send me the opensips logs again, plus the backtrace from the core file.

Thanks and regards,
Bogdan

---
Reply to this email directly or view it on GitHub:
https://github.com/OpenSIPS/opensips/issues/591#issuecomment-129471918
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.opensips.org/pipermail/devel/attachments/20150810/4bacc2fd/attachment.htm>


More information about the Devel mailing list