Artificial intelligence (AI) has been exploding in popularity as it becomes more publicly available and finds more applications in business, art, and personal life. While AI has improved significantly since its inception, it still has limits. Its most successful use cases tend to be with processes and activities that are structured and repeatable, where it can learn and follow a pattern.
After our leadership team completed an executive course on AI Business Strategy with MIT Sloan, they were inspired to find new ways to incorporate AI into our existing workflows. A natural choice was our popular Data and Technology Roadmap service offering.
Everest’s Data and Technology Roadmap process is designed to assess the strategic alignment of data and technology with your key business processes. Combining our core skills with the strategic expertise of our partners at Parallax Life Sciences Consulting, we apply a structured, proven framework to understand your organization’s current state, long-term vision, and drivers of success. For a recent project, we enlisted AI to assist in capturing, synthesizing, and extracting insights from the interview stage of our process.
Capture
AI first came into play in the recording and transcribing of interviews. Historically, we just recorded the interview video and audio for our reference, using tools like OBS and the native Zoom recording function. For this project, we took it a step further and utilized real-time transcription tools.
The most important performance indicators we looked for in these AI-powered transcriptions were attribution – who said what – and diarization – who spoke when. The first tool we reviewed was Microsoft Teams, which had the advantage of being native to the meeting software. Four years into remote work and virtual meetings, we’re all familiar with how the little squares light up or become outlined to indicate which participant is speaking. Microsoft Teams’ AI transcript had this data readily available when attributing speech to each participant and indicating when they started and stopped talking, making it the clear winner on both attribution and diarization. The live transcript function was also helpful for manual notetaking, allowing our human notetaker to follow the conversation while capturing interview answers.
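To compare tools on these two criteria, it helps to score each transcript against a hand-labeled reference. Below is a minimal sketch of that kind of scoring, assuming each transcript has been reduced to (speaker, start, end) segments; the segment format and overlap rule are our own simplification for illustration, not any vendor’s output.

```python
# Score speaker attribution/diarization against a hand-labeled reference.
# Segments are (speaker, start_sec, end_sec) tuples; this overlap-based
# scoring is a simplification for illustration, not any tool's format.

def overlap(a, b):
    """Seconds of overlap between two (speaker, start, end) segments."""
    return max(0.0, min(a[2], b[2]) - max(a[1], b[1]))

def attribution_accuracy(reference, hypothesis):
    """Fraction of reference speech time credited to the right speaker."""
    total = sum(end - start for _, start, end in reference)
    correct = sum(
        overlap(ref, hyp)
        for ref in reference
        for hyp in hypothesis
        if ref[0] == hyp[0]  # only count time matched to the same speaker
    )
    return correct / total if total else 0.0

# Illustrative segments for a short three-turn exchange.
reference = [("Alice", 0, 30), ("Bob", 30, 55), ("Alice", 55, 90)]
hypothesis = [("Alice", 0, 28), ("Bob", 28, 60), ("Alice", 60, 90)]
print(f"Attribution accuracy: {attribution_accuracy(reference, hypothesis):.0%}")
```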
However, Microsoft Teams’ shortfalls (more on those later) led us to experiment with other tools. There were two methods we could use: uploading a pre-recorded mp4 video file to the tool to produce a transcript, or having the tool’s bot join the meeting and record the transcript in real time.
Our first foray was with Otter.ai, which used the upload method. Across every tool we tried, this method was consistently terrible for attribution: not only were sections of speech misattributed, but speakers were labeled generically (“Speaker 1”) rather than by name, as the real-time transcriptions did. We also tried Supernormal and Fireflies.ai, whose bots joined the meeting and had moderate success. Fireflies.ai was marginally better at attribution and diarization, but neither matched Microsoft Teams.
Synthesis
After completing eleven interviews, each between thirty and ninety minutes long, we had a lot of content to dig through. In previous projects, our team would set up a matrix of functional areas and interview topics, then spend days reviewing our notes ad nauseam while painstakingly filling in the matrix.
In this project, we experimented with letting AI do the heavy lifting. We primarily used ChatGPT, both the Team and individual versions. The individual version served mostly as a proof of concept, as well as a tool for checking on hallucinations (more on hallucinations later). We uploaded a transcript file to ChatGPT and asked questions like “What were the drivers of success identified for the company?”
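We did this through the ChatGPT interface, but the same question-against-a-transcript pattern is easy to reproduce programmatically. Here is a minimal sketch using the OpenAI Python client as an API-based analog of our workflow; the model name and file path are placeholder assumptions, not the exact setup we ran.

```python
# Ask a question against a single interview transcript.
# API-based analog of what we did in the ChatGPT UI; the model name
# and file path below are placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

transcript = open("interviews/interview_01.txt").read()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system",
         "content": "Answer questions strictly from the interview transcript "
                    "provided. If the transcript does not contain the answer, "
                    "say so instead of guessing."},
        {"role": "user",
         "content": f"Transcript:\n{transcript}\n\n"
                    "What were the drivers of success identified for the company?"},
    ],
)
print(response.choices[0].message.content)
```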
We got a little more advanced with the Team edition of ChatGPT, uploading all the transcript files together. This allowed us to synthesize the data across all interviews and identify where interviewees agreed or differed in their opinions.
Analysis
Beyond synthesizing the data, we wanted to use AI to extract meaningful insights. Before, we would use the matrix mentioned earlier to help us identify whether the stakeholders had a unified vision for the organization or had different interpretations of the mission. We also looked for places to promote synergy across functional areas and optimize technology and data use, like closing informational gaps between the marketing and sales teams with a CRM or introducing patient claims data to assist with market validation, targeting exercises, and forecasting.
To do this with AI, we spent a lot of time querying the ChatGPT bot about what different interviewees said. We learned a great deal about prompt engineering in the process: ChatGPT’s answers are only as good as your question, and even then they may leave something to be desired. Asking questions more than once, or in various ways, was necessary to get the answers we needed.
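One habit that helped was asking the same question in several phrasings and comparing the answers side by side before trusting any one of them. A sketch of that loop follows, with illustrative phrasings and the same placeholder setup as above.

```python
# Ask the same question in several phrasings and compare the answers
# side by side; the phrasings and paths here are illustrative.
from openai import OpenAI

client = OpenAI()
transcript = open("interviews/interview_01.txt").read()  # placeholder path

phrasings = [
    "What were the drivers of success identified for the company?",
    "List every success factor the interviewee mentioned.",
    "According to this interview, what must go right for the company to succeed?",
]

for question in phrasings:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user",
                   "content": f"Transcript:\n{transcript}\n\n{question}"}],
    )
    # Print each variant's answer for manual comparison.
    print(f"Q: {question}\nA: {response.choices[0].message.content}\n")
```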
Successes
So, did using AI for the Data and Technology Roadmap work? Ultimately, yes.
Using a hybrid set of tools produced the most successful result. Combining the transcript and recording capabilities of Microsoft Teams with the synthesis and analysis capabilities of ChatGPT (Team edition) produced meeting summaries that were over 90% accurate and over 90% complete. In some cases, we loaded multiple transcripts into a GPT with RAG (retrieval-augmented generation) to synthesize data and sentiment across interviews and interviewees.
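ChatGPT’s built-in file handling hides the mechanics, but the underlying RAG pattern is straightforward: chunk the transcripts, embed the chunks, retrieve the most relevant ones for a question, and answer from those alone. Here is a stripped-down sketch; the model names, chunk size, and file layout are all assumptions for illustration, not our production setup.

```python
# Minimal RAG over multiple interview transcripts: chunk, embed,
# retrieve by cosine similarity, answer from retrieved context only.
# Model names, chunk size, and paths are illustrative assumptions.
import glob
import numpy as np
from openai import OpenAI

client = OpenAI()

def chunk(text, size=1500):
    """Split a transcript into fixed-size character chunks."""
    return [text[i:i + size] for i in range(0, len(text), size)]

# Load and chunk every transcript, remembering each chunk's source file.
chunks = []
for path in glob.glob("interviews/*.txt"):
    for piece in chunk(open(path).read()):
        chunks.append((path, piece))

# Embed all chunks in one call and L2-normalize for cosine similarity.
embeddings = client.embeddings.create(
    model="text-embedding-3-small",
    input=[piece for _, piece in chunks],
)
matrix = np.array([item.embedding for item in embeddings.data])
matrix /= np.linalg.norm(matrix, axis=1, keepdims=True)

def answer(question, k=8):
    """Retrieve the k most similar chunks and answer from them alone."""
    q = client.embeddings.create(model="text-embedding-3-small",
                                 input=[question]).data[0].embedding
    q = np.array(q) / np.linalg.norm(q)
    top = np.argsort(matrix @ q)[-k:][::-1]
    context = "\n---\n".join(f"[{chunks[i][0]}]\n{chunks[i][1]}" for i in top)
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user",
                   "content": f"Using only these interview excerpts:\n{context}\n\n"
                              f"Question: {question}"}],
    )
    return response.choices[0].message.content

print(answer("Where did interviewees agree or disagree about the company vision?"))
```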
This capability still needs some improvement, but it enabled us to create our deliverable: a comprehensive and detailed roadmap of our customer’s data and technology needs and recommendations to fulfill them.
The potential to synthesize twelve hours of interview content into a single structured output with mere minutes of work allows us to scale the Data and Technology Roadmap service. For an offering that is so valuable, but that typically requires several consultants and weeks of work conducting interviews, reviewing the data, and workshopping the deliverable, this is promising.
A side effect of this experiment was that our comfort level with Microsoft Teams’ recording, transcription, and meeting AI grew to the point that we now use them routinely for all important internal and external meetings.
One of our favorite outcomes of using AI on this project is that it lets our team focus on the conversation during interviews. We can stay engaged with our interviewees, free of the pauses and repeat requests that note-taking causes. Who would have thought AI could make us feel more human?
Shortfalls
Despite the successes, there is still plenty of room for improvement. Hallucinations – instances where artificial intelligence generates outputs that are inaccurate, nonsensical, or unrelated to the input data or task at hand – were a recurring problem. A few we saw in this project:
- A tactic our interviewers commonly use is giving the interviewee an example or a prompt to help illustrate our question and draw out their insight. The AI often captured these statements as conclusive, failing to recognize that they were meant to elicit a response.
- We got better at engineering our prompts, but learned that we often needed to ask questions multiple times anyway. For example, we asked for every interviewee’s response about the company vision, and the AI reported the responses of ten of the eleven people. Only after following up with “And what did Bob say?” did we get an answer about Bob’s response.
- If we asked the same question more than once, formulated exactly the same way, we were guaranteed to get multiple different answers from the GPT (we show one way to reduce this variability in the sketch after this list).
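When driving the model through the API rather than the ChatGPT interface, this run-to-run variability can be reduced, though not eliminated. A sketch follows, assuming the OpenAI Python client; even temperature 0 with a fixed seed yields only best-effort reproducibility, and the transcript context is omitted for brevity.

```python
# Reduce run-to-run variability on repeated, identically phrased questions.
# temperature=0 makes sampling greedy; seed requests best-effort determinism.
# Even together, these do not guarantee identical answers across runs.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    temperature=0,
    seed=42,  # best-effort reproducibility, per OpenAI's documentation
    messages=[{"role": "user",
               "content": "Summarize what interviewees said about the company vision."}],
)
print(response.choices[0].message.content)
```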
Several shortfalls came earlier in the process, during the transcription stage. No tool was perfect, and speaker attribution and diarization were often full of errors. People who hadn’t yet joined the meeting had quotes attributed to them. Natural speech patterns demonstrating active listening – brief interjections like “mm-hmm” and “right” – led to diarization mistakes, with sections of speech misattributed.
Additionally, certain key terms were often missing or misconstrued in the transcription. For example, the customer has a role that is usually called by its abbreviation, “OAM,” yet the transcripts almost exclusively spelled it “OEM.” The company and product names were consistently misspelled in various creative ways, as were key industry- and company-specific terms. We found ourselves wishing we could pre-seed the transcription tool with domain jargon and abbreviations to help it understand key terms. If we could have told it the word “uveal” would show up a lot, maybe it would have caught it instead of transcribing it as “evil.”
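Some tools do support exactly this kind of seeding. OpenAI’s open-source Whisper model, for example, accepts an initial_prompt that biases transcription toward the vocabulary it contains. A sketch follows, with a made-up glossary standing in for the real project terms.

```python
# Bias Whisper toward domain vocabulary via initial_prompt.
# Requires: pip install openai-whisper (and ffmpeg on the PATH).
# The glossary below is illustrative, standing in for real project terms.
import whisper

model = whisper.load_model("base")
result = model.transcribe(
    "interview_01.mp4",
    # Terms listed here are more likely to be transcribed correctly,
    # e.g. "OAM" instead of "OEM", "uveal" instead of "evil".
    initial_prompt="Glossary: OAM, uveal, <company name>, <product name>.",
)
print(result["text"])
```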
Another pain point was finding the meeting recaps after the fact. Microsoft Teams appears to save each recap inside the meeting item itself, so to locate a recap you must find the meeting again, which requires either knowing its date and time and hunting through the calendar view, or scrolling through an endless list of chat records. Once you find the recap, it’s quite useful, and it became even more useful throughout the project.
A Sea Change
Over the couple of months this project spanned, we experienced numerous changes in the ecosystem and marketplace. At the beginning of the project in January 2024, AI-powered transcription tools were not available within the Zoom app, so we switched to Microsoft Teams for this capability. In the time it took to complete the interviews, Microsoft Teams improved its meeting recap interface, and Zoom introduced its own version. We also saw the commercial availability of other new tools, like the Whisper speech-to-text library and many offerings built on it.
As noted, Microsoft Teams got significantly better by the end of the project, with meeting recap improvements including follow-up tasks and transcript timestamps linked to the video recording.
We also saw major improvements from ChatGPT, which advanced from version 3.5 to 4 to 4o during our project. When we posed the same prompt to the different models, we got increasingly sophisticated and accurate answers from the later versions. OpenAI also released an update to the ChatGPT Team edition, which, as mentioned, we put to use to synthesize all the interview data into one library.
Late in the process, we were also introduced to Fathom.ai through a recommendation from an associate. While this didn’t make it into the project, we have seen some surprisingly good meeting summarizations from their platform.