<div dir="ltr">Thanks for that, Johan - I hadn't thought about that aspect. It's all theoretical at the moment, but IBM Voice Gateway, at least, does claim to be able to handle it using SIPREC - so maybe they are confident in their ability to differentiate between caller and callee in a single stream?<div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">"The voice gateway provides the ability to transcribe caller and callee (e.g. contact-center agent) audio from an active phone call in real time using the SIPREC protocol." - <a href="https://www.ibm.com/docs/en/voice-gateway?topic=gateway-about-voice">https://www.ibm.com/docs/en/voice-gateway?topic=gateway-about-voice</a><br></blockquote><div> </div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, 17 Sept 2021 at 10:33, johan &lt;<a href="mailto:johan@democon.be">johan@democon.be</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p>The issue with SIPREC (based on rtpproxy) is that you have only 1
stream, containing the voice from caller to callee and from callee to
caller. That will give the ASR a hard time :-). I do know
that rtpengine has something similar to SIPREC, but I don't know
the details. <br>
</p>
<p><br>
</p>
<p>Bottom line, in my opinion, you need to have 2 separate streams
before you can start STT. <br>
</p>
<p><br>
</p>
<p>wkr, <br>
</p>
<p><br>
</p>
<div>On 17/09/2021 11:04, Mark Allen wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">I'm just starting to look at Speech-to-Text (STT)
processing for calls - initially recordings but moving on to
real-time. I would see this working along the lines of either:
<div><br>
</div>
<div>- a call is recorded, and when the call ends an event is
triggered to initiate transcription of the recording</div>
<div>- a call starts, and the RTP is forked to the STT engine, which
returns a real-time transcription<br>
<div><br>
</div>
<div>I can see that with OpenSIPS, the SIPREC and Media
Exchange modules allow for forking of the RTP, providing a
means of sending the data for processing, but is anybody
actually doing this? If so, what has been your experience?
Is there a toolset that works well with this (e.g. IBM Voice
Gateway, Google, Amazon etc)? </div>
</div>
</div>
<br>
<fieldset></fieldset>
<pre>_______________________________________________
Users mailing list
<a href="mailto:Users@lists.opensips.org" target="_blank">Users@lists.opensips.org</a>
<a href="http://lists.opensips.org/cgi-bin/mailman/listinfo/users" target="_blank">http://lists.opensips.org/cgi-bin/mailman/listinfo/users</a>
</pre>
</blockquote>
</div>
</blockquote></div>