Natural Language Processing for call reason identification
Industrial Work
UnitedHealth Group (Optum), India, August 2022 - December 2022
- Recordings of customer-agent calls were converted to transcripts using a state-of-the-art speech-to-text library (OpenAI Whisper)
- Transcripts were summarized using a BART-based model
- Similar transcripts were grouped together using clustering algorithms based on cosine similarity
- Latent Dirichlet Allocation (LDA) was used to identify the topics discussed in the transcripts
- LDA was used to curate a labelled data set
- The labelled data set was used to retrain an existing model to predict the topic of discussion from a transcript summary
- The speech-to-text pipeline was deployed on Kubernetes in GCP and exposed via FastAPI
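
The cosine-similarity grouping step mentioned above can be illustrated with a minimal standard-library sketch. This is not the project's actual code: the bag-of-words vectors, the greedy threshold rule, the function names (`bag_of_words`, `cosine_similarity`, `group_by_similarity`), the 0.5 threshold, and the sample summaries are all assumptions made up for illustration. A production pipeline would more likely use embeddings and a library clustering algorithm.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercase, split on whitespace, and count tokens."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def group_by_similarity(texts, threshold=0.5):
    """Greedy grouping (illustrative, not a full clustering algorithm):
    each text joins the first group whose first member is at least
    `threshold` similar; otherwise it starts a new group."""
    vectors = [bag_of_words(t) for t in texts]
    groups = []  # each group is a list of indices into `texts`
    for i, vec in enumerate(vectors):
        for group in groups:
            if cosine_similarity(vec, vectors[group[0]]) >= threshold:
                group.append(i)
                break
        else:
            groups.append([i])
    return groups


# Hypothetical summaries, invented for this example only.
summaries = [
    "customer asked about claim status",
    "customer asked about claim denial",
    "agent reset the portal password",
    "agent reset the account password",
]
groups = group_by_similarity(summaries, threshold=0.5)
print(groups)
```

With these invented inputs, the two claim-related summaries fall into one group and the two password-related summaries into another, since each pair shares four of five tokens.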