<div dir="ltr">Thanks for that, Johan - I hadn't thought about that aspect. It's all theoretical at the moment, but IBM Voice Gateway, at least, does claim to be able to handle it using SIPREC - so maybe they are confident in their ability to differentiate between caller and callee in a single stream?<div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">"The voice gateway provides the ability to transcribe caller and callee (e.g. contact-center agent) audio from an active phone call in real time using the SIPREC protocol." - <a href="https://www.ibm.com/docs/en/voice-gateway?topic=gateway-about-voice">https://www.ibm.com/docs/en/voice-gateway?topic=gateway-about-voice</a><br></blockquote><div> </div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, 17 Sept 2021 at 10:33, johan &lt;<a href="mailto:johan@democon.be">johan@democon.be</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

  <div>

    <p>The issue with SIPREC (based on rtpproxy) is that you have only one
      stream containing the voice from caller to callee and from callee to
      caller, so the ASR will have a hard time :-). I do know
      that rtpengine has something similar to SIPREC, but I don't know
      the details. <br>

    </p>

    <p><br>

    </p>

    <p>Bottom line, in my opinion, you need to have 2 separate streams

      before you can start STT. <br>

    </p>
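    <p>As a rough sketch of what "two separate streams" could look like at
      the RTP level: assuming the forked media arrives as raw RTP packets
      (RFC 3550) and that each direction carries its own SSRC, the payloads
      can be grouped per SSRC before being handed to an STT engine. The
      names <code>split_by_ssrc</code> and <code>make_packet</code> are made
      up for illustration and are not part of OpenSIPS, rtpproxy or
      rtpengine; header extensions, padding and sequence-number wraparound
      are ignored here.</p>

```python
import struct
from collections import defaultdict

RTP_HEADER_LEN = 12  # fixed RTP header size (RFC 3550), before any CSRC entries


def split_by_ssrc(packets):
    """Group RTP payloads by SSRC, ordered by sequence number.

    Returns a dict mapping each SSRC to the concatenated payload bytes of
    that stream. Hypothetical helper: it ignores header extensions,
    padding and sequence-number wraparound.
    """
    streams = defaultdict(list)
    for pkt in packets:
        if len(pkt) < RTP_HEADER_LEN:
            continue  # too short to hold an RTP header
        first, _pt, seq, _ts, ssrc = struct.unpack("!BBHII", pkt[:RTP_HEADER_LEN])
        if first >> 6 != 2:
            continue  # not RTP version 2
        csrc_count = first & 0x0F
        offset = RTP_HEADER_LEN + 4 * csrc_count  # skip the CSRC list, if any
        streams[ssrc].append((seq, pkt[offset:]))
    return {
        ssrc: b"".join(payload for _, payload in sorted(items, key=lambda i: i[0]))
        for ssrc, items in streams.items()
    }


def make_packet(ssrc, seq, payload, payload_type=0):
    """Build a minimal RTP v2 packet for testing (no CSRC, no extension)."""
    header = struct.pack("!BBHII", 0x80, payload_type, seq, seq * 160, ssrc)
    return header + payload


if __name__ == "__main__":
    # Two interleaved directions, with one packet arriving out of order.
    caller, callee = 0x1111, 0x2222
    mixed = [
        make_packet(caller, 2, b"bb"),
        make_packet(callee, 1, b"XX"),
        make_packet(caller, 1, b"aa"),
    ]
    for ssrc, audio in split_by_ssrc(mixed).items():
        print(hex(ssrc), audio)
```

    <p>Each resulting byte string could then be decoded and sent to its own
      transcription session, so the engine never has to guess who is
      speaking. If the fork really delivers only one mixed stream, this
      does not help and speaker separation would have to happen in the STT
      engine itself.</p>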

    <p><br>

    </p>

    <p>wkr, <br>

    </p>

    <p><br>

    </p>

    <div>On 17/09/2021 11:04, Mark Allen wrote:<br>

    </div>

    <blockquote type="cite">

      <div dir="ltr">I'm just starting to look at Speech-to-Text (STT)

        processing for calls - initially recordings but moving on to

        real-time. I would see this working along the lines of either: 

        <div><br>

        </div>

        <div>- a call is recorded, and when the call ends an event is

          triggered to initiate transcription of the recording</div>

        <div>- a call starts and the RTP is forked to the STT engine, which
          sends back a real-time transcription<br>

          <div><br>

          </div>

          <div>I can see that with OpenSIPS, the SIPREC and Media

            Exchange modules allow for forking of the RTP, providing a

            means of sending the data for processing, but is anybody

            actually doing this? If so, what has been your experience?

            Is there a toolset that works well with this (e.g. IBM Voice

            Gateway, Google, Amazon etc)? </div>

        </div>

      </div>

      <br>


      <pre>_______________________________________________

Users mailing list

<a href="mailto:Users@lists.opensips.org" target="_blank">Users@lists.opensips.org</a>

<a href="http://lists.opensips.org/cgi-bin/mailman/listinfo/users" target="_blank">http://lists.opensips.org/cgi-bin/mailman/listinfo/users</a>

</pre>

    </blockquote>

  </div>


</blockquote></div>