<div dir="ltr">I'm just starting to look at Speech-to-Text (STT) processing for calls, initially for recordings but later moving on to real-time. I see this working in one of two ways:<div><br></div><div>- a call is recorded, and when the call ends an event is triggered to initiate transcription of the recording</div><div>- a call starts, and the RTP is forked to the STT engine, which returns the transcription in real time<br><div><br></div><div>I can see that with OpenSIPS, the SIPREC and Media Exchange modules allow forking of the RTP, providing a means of sending the data for processing, but is anybody actually doing this? If so, what has been your experience? Is there a toolset that works well with this (e.g. IBM Voice Gateway, Google, Amazon, etc.)?</div></div></div>