To achieve our solution, we are going to do the following:
- On the view-thread page, we are going to provide a button next to Reply named Reply with Audio to keep things simple
- The user is going to record audio using this interface and upload it to our server
- The audio will be saved on our servers temporarily
- The uploaded audio will then be sent to the Cloud Speech API, which transcribes the speech to text
- Once we get back the response, we upload the audio file to Cloudinary, a file hosting service
- Once the upload to Cloudinary is successful, we will get back the public URL
- Using all the data we have gathered so far, we are going to create a new message and then respond with it
- The Angular app will process the response and update the thread
Uploading audio to Cloudinary is optional. I implemented it to show an end-to-end flow.
Before we start the implementation, please make sure you have an API key for Cloudinary.
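The server-side part of this flow can be sketched as follows. This is only an illustration of the order of the steps, not the implementation we are going to build: the `Transcriber`, `AudioHost`, and `TempStore` interfaces, the `Message` shape, and the `createAudioReply` function are hypothetical names standing in for the Cloud Speech API client, the Cloudinary SDK, and our temporary file handling. The host is optional in the sketch because uploading to Cloudinary is optional.

```typescript
// Hypothetical stand-in for the Cloud Speech API client (not the real SDK).
interface Transcriber {
  transcribe(audioPath: string): Promise<string>;
}

// Hypothetical stand-in for the Cloudinary SDK; resolves to the public URL.
interface AudioHost {
  upload(audioPath: string): Promise<string>;
}

// Hypothetical helper that deletes the temporarily saved audio file.
interface TempStore {
  remove(audioPath: string): Promise<void>;
}

// Assumed shape of the message returned to the Angular app.
interface Message {
  threadId: string;
  text: string;
  audioUrl?: string;
}

async function createAudioReply(
  threadId: string,
  tempAudioPath: string,
  deps: { transcriber: Transcriber; tempStore: TempStore; host?: AudioHost }
): Promise<Message> {
  try {
    // Step 1: send the temporarily saved audio for speech-to-text.
    const text = await deps.transcriber.transcribe(tempAudioPath);
    const message: Message = { threadId, text };

    // Step 2 (optional): upload the audio and keep the public URL.
    if (deps.host) {
      message.audioUrl = await deps.host.upload(tempAudioPath);
    }

    // Step 3: respond with the new message built from the gathered data.
    return message;
  } finally {
    // The audio is only stored temporarily, so clean up on success or failure.
    await deps.tempStore.remove(tempAudioPath);
  }
}
```

Passing the services in as dependencies keeps the sketch independent of any particular SDK version and makes the flow easy to exercise with stubs.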