From review to roadmap: How mining reviews can inform your subscription app’s features
Proven techniques for turning scattered reviews into future improvements

Summary
App store reviews provide unstructured, high-signal customer feedback for improving features, copy, onboarding, and retention. Effective review mining separates ratings from sentiment, segments by platform and version, and accounts for recency. Approaches include direct LLM prompting, LLM-assisted NLP workflows, off-the-shelf tools, custom models, and in-house data science. Apply insights via prioritization and experiments, then follow up with additional voice-of-customer research for deeper takeaways.
Sometimes, I steal from customers.
Wait — before you judge me, I’ll caveat: it’s not customer data or money. I steal their words (and it’s for their own good).
Why? Because buried in app store reviews are the raw, emotional truths your team needs to build better features, write better copy, and stop guessing what users actually want.
When I first confessed this, I felt pretty proud of my rebellious revelation. Especially when others fed back that they’d become fellow customer-centric thieves, eagerly using customers’ words for good.
That was until I met the real Robin Hood, Juliana Jackson. As Cloud Director Data Science at Jellyfish and writer of the newsletter Beyond the Mean, Juliana has taken those same insights from app reviews and turned them into strategic gold — helping major apps make sense of customer feedback.
After learning this at a recent conference, I immediately badgered her into chatting with me about all things review mining. She graciously let me ask some (very nerdy) questions, which turned into this guide.
So here it is, your roadmap to ‘stealing’ words from your customers and putting them to powerful use. Huge thanks to Juliana Jackson for generously sharing her approach and making it all feel so doable.
Why steal from app reviews?
We begin with the most important question: Why? Why invest time in analyzing your app reviews?
Juliana broke it down like this: Any kind of unstructured data is valuable because it offers a different lens through which to view your product.

These are things that happen outside your control; raw, emotional signals that help you understand how your app subscribers feel. And those are the kinds of insights that everyone from developers to designers can connect with — and no one can argue against their importance.
It shifts you from pushing your own agenda to reflecting theirs.
Another reason I’ve become such a big fan of reviews (and a self-declared professional thief) is this: even if you don’t uncover a single insight about features or bugs, you’ll absolutely gain something else — real, honest customer language. That alone can unlock huge improvements across the entire journey, from acquisition through to retention.
Case study: The impact of review mining
In case the idea of going through all your reviews is off-putting (understandable), let’s walk through an example of the impact first.
Juliana worked with a hugely popular food and beverage app in Europe that had over 500k reviews. We can’t share who they are, but we guarantee most of you have enjoyed their food at one point or another.
The goal was better understanding of customer behavior to drive app improvements. Simple enough, right?
Not quite. Early analysis revealed a surprising pattern: ratings and reviews often didn’t align. For example, a user might leave a five-star rating, followed by a rant about how difficult the app was to use. This made it difficult to use ratings as a guide for feedback.
So as part of their data processing, Juliana and her team separated ratings from reviews to focus on the actual sentiment behind the words, rather than blindly trusting the star count. They then went deeper, segmenting the data by platform (Android vs. iOS) and by app version.
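Once each review has a sentiment score (the workflow options later in this guide can generate one), that kind of mismatch check is easy to script. Here’s a minimal sketch; the field names, sample values, and thresholds are all hypothetical:

```javascript
// A minimal sketch of the ratings-vs-sentiment check, segmented by platform
// and app version. Field names, sample values, and thresholds are hypothetical.
const reviews = [
  { rating: 5, sentiment: -0.6, platform: 'android', version: '3.2.0' },
  { rating: 4, sentiment: 0.7, platform: 'ios', version: '3.2.1' },
  // ...thousands more
];

// Group reviews by platform + version
const segments = {};
for (const r of reviews) {
  const key = `${r.platform} ${r.version}`;
  (segments[key] ??= []).push(r);
}

// Flag segments where star ratings and text sentiment disagree
const avg = (xs) => xs.reduce((a, b) => a + b, 0) / xs.length;
for (const [key, group] of Object.entries(segments)) {
  const avgRating = avg(group.map((r) => r.rating));
  const avgSentiment = avg(group.map((r) => r.sentiment));
  if (avgRating >= 4 && avgSentiment < 0) {
    console.log(`${key}: high stars (${avgRating.toFixed(1)}) but negative text; read these manually`);
  }
}
```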
Juliana also shared that app stores show an app’s aggregate rating, but the underlying algorithms heavily weight recent reviews and ratings because they provide the most accurate and relevant picture of an app’s current quality, stability, and user experience. This helps users make informed decisions based on the app’s most up-to-date state.
However, it’s a nuanced approach: while prioritizing recency provides essential, up-to-the-minute feedback, and encourages developers to constantly improve, it can also inadvertently obscure or diminish the visibility of long-standing or recurring issues that might have been highlighted in older reviews.
This analysis surfaced a range of hypotheses to improve the user experience. It also uncovered distinct user behaviors and unmet needs. For example, Android users were more focused on practicality and convenience, while iOS users cared more about premium design. In other words, Android and iOS users had completely different user journeys, different issues, and different needs. These insights led to personalized onboarding flows and targeted app optimizations for each version.
The team also discovered widespread issues with login and registration; for example, certain screen dimensions weren’t performing as well as others. Once addressed, the improvements weren’t just reflected in usage metrics; they showed up in the tone of the reviews themselves.
The research proved invaluable and was one of Juliana’s favorite projects to work on. Much of the product backlog and many of the experience improvements came directly from this review analysis work.
When is it time to implement an app store review process?
In the early days, when reviews were few and far between, it was usually customer care or even the founder scanning through them manually, says Juliana. It’s not the most efficient process, but with low volume, it works.
Then at a certain point, doing it manually just doesn’t cut it. You start missing patterns, trends, and actionable insights. Maybe you get lazy as a result, not reading every review as diligently as you used to. That’s when AI and a more structured process become essential.
I’ve always struggled to pinpoint that tipping point — aka when is it actually worth getting systematic? I braced for the classic ‘it depends’ answer when I asked Juliana, but instead she gave two clear signals that it’s time to level up your review analysis:
- Review quantity: When your total number of reviews hits 5k+
- Review velocity: When you’re receiving 100+ new reviews per month
Of course, review frequency varies wildly between apps. Food and beverage apps, for example, may see spikes from specific promotions or seasonal deals. Running or training apps might experience bursts around New Year’s or summertime. So instead of looking at your peak volume, focus on your average to assess whether you’re ready to introduce a model. If you average 40 reviews a month but spiked to 300 during a January promotion, it’s the 40 that matters.
If you’re seeing high volume and consistent velocity, that’s your cue — it’s time to take a more structured approach.
Before you start review mining
It’s tempting to dive straight into scouring your reviews. I’ll admit even as I was writing, I had to double back and remind myself of this step.
Review mining without a clear purpose is a waste of time. As Juliana puts it: “You just end up with a really expensive dashboard”. Before you dig in, you need to define your purpose:
- What are you trying to learn?
- What decisions do you hope to inform?
Juliana emphasized starting with the product roadmap and understanding what stakeholders value — and why. She interviews each stakeholder individually to learn what they’re hoping to uncover, before doing any analysis.
For example:
- Are you trying to improve retention with the analysis?
- Are you trying to reduce bugs and functional issues?
- Are you trying to understand which features people value vs. which they don’t?
You need to know the answers to these questions to set up the right prompts and pull out the most relevant data. Think of it as doing user research, except your users are your stakeholders.
Juliana also uses this knowledge later on in the process to anchor insights to specific goals and clearly communicate why something should feed into the next release — meaning she’s able to drive far more impact.
This approach helps avoid getting lost in the inevitable noise that comes with analyzing large volumes of unstructured data. Instead, focus on answering the right questions and identifying the most valuable opportunities.
So, before you even open your review dashboard or start pulling insights, take a step back. Decide the core questions you want to answer, and make sure everyone is aligned on why you’re doing this.
5 Techniques for review mining on app stores
I’m an old-school thief (think handmade lock picks) and I love manually combing through reviews and immersing myself in the words. I’ve often manually tagged reviews to get me thinking about what’s being said and the why behind it. Like I said, old-school. Whilst I do believe there’s a lot of value in getting into the head of your customer, it becomes very exhausting and time-consuming after a few hundred reviews.
Lucky for us, Juliana is a pro: she doesn’t just have a more efficient and scalable approach to review mining — she has five structured options to choose from, depending on your needs and resources.
We’ll dive into all five in more detail, but here’s a quick overview to help you choose the approach that’s best for you.
| Approach | Best for | Pros | Cons | Who does the work | When to use |
| --- | --- | --- | --- | --- | --- |
| 1. Direct LLM prompting | Very small datasets and early-stage teams | Fast, cheap, zero setup; good for directional insights | Prone to hallucination; limited to small data; privacy risks if PII is included; requires repeat analysis | You (manual prompts) | You’re just starting and have <5,000 reviews |
| 2. LLM-assisted, structured analysis | Lean teams with mid-size data and light technical skills | More structure supports larger datasets; allows time-based comparison; still affordable | Still not statistically validated; requires more technical knowledge; still limited scalability | You (potentially with some technical support) | You want more control than direct LLM prompting, but have limited resources to invest in NLP tools or custom models/data scientists |
| 3. NLP tools | Mid-stage apps with growing complexity | Professionally built models, often with dashboards; easier to scale | Tool selection matters; may be costly; requires careful evaluation; out of the box, so no customization included | The tool (off-the-shelf) | You have thousands of reviews and need regular analysis and reporting, but lack the expertise in-house |
| 4. Custom model (expert-built) | Complex apps or those needing custom logic or predictive power | Tailored to your app, user journeys, and segments; enables deeper insights | Requires budget and a trusted partner; slower to launch | Freelance data scientist or agency | You want segmentation, predictive modelling, or advanced signal filtering |
| 5. In-house data scientist | Large-scale apps or data-driven orgs | Full ownership; integrates more easily into sprint cycles; drives deep insights across teams | Higher cost; only worth it if ROI is clear | Full-time hire | You have high DAU/WAU, complex data, and need continuous insight generation |
You might start with the simplest option and scale up as you see the value of analyzing your reviews, but generally, the larger your app and the more complex your customer journey, the more likely you’ll need one of the latter options.
1. Direct LLM prompting
If you’re working with a smaller dataset, this is likely the easiest and most accessible option. It’s inexpensive and, with a bit of prompting knowledge, totally manageable. You simply feed the app reviews into an LLM and ask questions like this:
- What are the main themes we’re seeing?
- What features are customers talking about most?
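For instance, a first-pass prompt might look something like this — a sketch only, with a hypothetical app type; swap in the goals you defined earlier:

```
You are analyzing app store reviews for a subscription meditation app.
Below are 300 reviews, one per line, with star rating and app version.

1. List the five most common themes, each with a representative quote.
2. Which features are mentioned most, and is the sentiment around each
   positive, negative, or mixed?
3. Flag recurring bugs or usability complaints, noting the app version.

Reviews:
[paste reviews here]
```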
But don’t get too carried away with the simplicity. Data protection still matters — yes, even Robin Hood has a moral code. Juliana stresses the importance of stripping out any personal information, like names or addresses, before feeding anything into AI tools. Be conscious of legislation around AI, like the EU AI Act.
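A basic first pass can be scripted. The sketch below catches obvious patterns like emails and phone numbers; names and addresses usually need a proper PII-detection tool on top, so treat this as a starting point, not a complete solution:

```javascript
// A minimal redaction pass before sending reviews to any LLM.
// This is a sketch only: regexes catch obvious patterns, not all PII.
function redactPii(text) {
  return text
    .replace(/[\w.+-]+@[\w-]+\.[\w.]+/g, '[EMAIL]')   // email addresses
    .replace(/\+?\d[\d\s().-]{7,}\d/g, '[PHONE]')     // phone-like number runs
    .replace(/@[A-Za-z0-9_]+/g, '[HANDLE]');          // social media handles
}

console.log(redactPii('Great app! Email me at jane.doe@example.com or +44 20 7946 0958'));
// => "Great app! Email me at [EMAIL] or [PHONE]"
```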
That part’s relatively easy: remove the personal data. What’s trickier with this first option is what many of us have encountered: the dreaded LLM hallucination. Especially with large volumes of text, models can get lazy, trail off, or stop before fully analyzing the dataset.
Which brings us to the next approach — ideal if you’re working with a larger dataset and want a bit more structure.
2. LLM-assisted, structured analysis
For leaner teams or those with larger datasets, Juliana recommends another scrappy but effective method:
High-level workflow:
- Data acquisition (Google Sheets): Customer reviews or other textual data are systematically organized within a Google Sheet, with each distinct piece of text to be analyzed in a dedicated cell or row.
- Workflow automation (Google Apps Script): Google Apps Script, a serverless JavaScript platform, is deployed to manage the interaction. This script is engineered to:
  - Read the relevant text data from the specified columns within your Google Sheet
  - Construct and dispatch requests to the Google Cloud Natural Language API

(Here is where you can use Claude or any coding copilot to help you write the code that runs the script.)

- Intelligent processing (Google Cloud Natural Language API): The Google Cloud Natural Language API receives the text data. This pre-trained machine learning service then applies its algorithms to:
  - Determine sentiment: It assigns a score (typically between -1.0 for highly negative and +1.0 for highly positive) and a magnitude (indicating the strength of the emotion, irrespective of its polarity) for the overall document sentiment
  - (The API can also perform other analyses, like entity extraction or content classification, if configured)
- Insight integration (Google Sheets): Upon receiving the structured analysis results, the Apps Script dynamically writes these sentiment scores and magnitudes back into corresponding columns within your original Google Sheet, providing immediate visibility
Prerequisites and setup notes:
- Google Cloud project and billing: Accessing the Google Cloud Natural Language API requires an active Google Cloud project with a linked billing account. While the Natural Language API offers a significant free tier for initial usage, a billing account is mandatory for API access.
- API key/service account: Authentication for the Natural Language API typically uses either an API key (simpler for quick setups) or, for production environments, a service account. A service account provides secure, programmatic access for your Apps Script project to interact with Google Cloud services, following the principle of least privilege. Permissions for the Natural Language API must be explicitly granted to this account.
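To make that concrete, here’s a minimal Apps Script sketch of the read–analyze–write loop, assuming reviews sit in column A under a header row and that you’ve set up an API key as described above. The key placeholder, ranges, and output columns are illustrative, not prescriptive:

```javascript
// Minimal sketch: read reviews from column A, score each with the
// Natural Language API, write score and magnitude to columns B and C.
// Replace API_KEY with your own key; avoid hard-coding keys in shared sheets.
const API_KEY = 'YOUR_API_KEY'; // illustrative placeholder

function analyzeReviewSentiment() {
  const sheet = SpreadsheetApp.getActiveSheet();
  const lastRow = sheet.getLastRow();
  const reviews = sheet.getRange(2, 1, lastRow - 1, 1).getValues(); // skip header row

  reviews.forEach((row, i) => {
    const text = row[0];
    if (!text) return; // skip empty cells

    // One request per review keeps the sketch simple; batch for large sheets
    const response = UrlFetchApp.fetch(
      'https://language.googleapis.com/v1/documents:analyzeSentiment?key=' + API_KEY,
      {
        method: 'post',
        contentType: 'application/json',
        payload: JSON.stringify({
          document: { type: 'PLAIN_TEXT', content: text },
        }),
      }
    );
    const sentiment = JSON.parse(response.getContentText()).documentSentiment;

    // Score runs -1.0 (negative) to +1.0 (positive); magnitude is emotional strength
    sheet.getRange(i + 2, 2).setValue(sentiment.score);
    sheet.getRange(i + 2, 3).setValue(sentiment.magnitude);
  });
}
```

Run it from the Apps Script editor attached to your sheet; for thousands of rows you’d want batching and error handling on top.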
For more on how to do this (as for us non-techies it can be a bit technical), Juliana has a whole free walkthrough on YouTube: How to Use Google NLP with GSheets and GPT-3 to Analyze Sentiment.
This hybrid approach gives you more structure than raw prompting and allows you to handle a larger dataset. It also helps you compare reviews over time and generate more consistent outputs.
That said, it’s still not perfect. You can’t fully trust or depend on the insights to be statistically robust. But it’s a solid starting point if you want to move beyond manual scanning without diving headfirst into full-scale tooling.
📌 Note
Whilst Juliana doesn’t recommend LLMs as a long-term approach, they’re a good way to build a proof of concept and test how it works before moving on to a permanent solution.
3. Natural language processing (NLP) tools
NLP tools take care of the heavy lifting when it comes to processing reviews and building structured models. Unlike out-of-the-box LLMs, the models behind these tools are typically built by pros and designed to be statistically sound.
There are plenty of NLP tools out there, and more are emerging every week. Juliana has used a few solid ones herself, but don’t feel limited to her list:
“Brandwatch, Sprinklr, Chattermill, or Qualtrics XM are out-of-the-box tools that work here. I’ll add the caveat that they are out of the box, so they won’t be customized to the business — generally that’s okay until the business gets to the point of needing a lot of customization.”
To make the right decision about investing in an NLP tool, you can use Juliana’s free NLP Tool Evaluation Checklist. It includes the 10 key criteria Juliana looks for, plus red flags to watch out for:
- Type of NLP capability
- Model transparency & methodology
- Validation & performance metrics
- Taxonomy fit & customizability
- Context & language nuance handling
- Multilingual support
- Bot/spam/noise filtering
- Fine-tuning & feedback loop
- Data privacy & regulatory compliance
- Interpretability & decision support
Juliana’s biggest thing to watch for? Don’t blindly trust raw outputs. Look for tools that validate their accuracy using metrics like F1 score, precision, and recall.
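If those metrics are unfamiliar, here’s how they relate, using made-up counts from a hypothetical topic-tagging model:

```javascript
// How precision, recall, and F1 relate, with made-up counts:
// tp = reviews the model correctly tagged, fp = reviews wrongly tagged,
// fn = reviews it should have tagged but missed.
const tp = 80, fp = 10, fn = 20;

const precision = tp / (tp + fp); // 0.89: of everything flagged, how much was right
const recall = tp / (tp + fn);    // 0.80: of everything relevant, how much was caught
const f1 = (2 * precision * recall) / (precision + recall); // 0.84: harmonic mean of the two
```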
For simple apps, options one or two (LLM-based approaches) may be perfectly fine. Take something like Waterllama, a hydration tracker. Even if it has 100k+ reviews, the app’s use case is straightforward; there aren’t too many user journeys or layers of complexity.
Now contrast that with IKEA’s app, which, fittingly for Juliana (who lives in Sweden), was top of mind. The complexity is on another level:
- There’s an offline/online hybrid experience
- Hundreds of distinct products, each with their own reviews
- Global reach with geographic nuances
- The app acts more like a discovery tool than a standalone product
In that case, a one-size-fits-all tool won’t cut it. You need a solution designed specifically around your business model and data landscape.
4. Custom model (expert-built)
This is where you bring in a freelancer or agency to build a tailored model based on your customer journeys, segments, and product setup. If you’re looking for more predictive power, or want to segment sentiment by cohort, this is the route to go.
It also ensures you’re working with someone who can assess statistical significance, so you’re not mistaking noise for a real signal.
5. In-house data scientist
Thanks to the rise of AI, Juliana says data scientists are officially back. You’ll likely hit a point where you want to do more advanced things: segmenting feedback, predicting churn risk, and identifying drivers of high-value users.
That’s when hiring a dedicated data scientist makes sense, especially if you have a large base of daily or weekly active users. It’s really a question of ROI: will the insights justify the investment?
Your team setup matters, too. If someone on your team has foundational skills but needs support to scale, bringing in a data scientist can level up the whole operation.
Applying insights from app reviews and using them to improve your app
If you started with clear questions and goals, you should now be uncovering rich insights tied to your area of focus — whether that’s onboarding, retention, or something else. From here, use your usual prioritization framework to decide what to tackle first. That likely means evaluating:
- The number of users impacted
- The effort required to solve the issue
- The potential ROI
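One lightweight way to combine those three factors is a reach-times-impact-over-effort score, in the spirit of RICE. Everything below (names, scales, and numbers) is purely illustrative:

```javascript
// A purely illustrative prioritization score: reach x impact / effort.
// Impact is a rough 1-3 scale; effort is estimated weeks. All values made up.
const candidates = [
  { name: 'Fix login on small screens', usersImpacted: 12000, impact: 3, effortWeeks: 2 },
  { name: 'Redesign paywall copy', usersImpacted: 40000, impact: 1, effortWeeks: 1 },
  { name: 'Rebuild onboarding flow', usersImpacted: 25000, impact: 2, effortWeeks: 6 },
];

const scored = candidates
  .map((c) => ({ ...c, score: (c.usersImpacted * c.impact) / c.effortWeeks }))
  .sort((a, b) => b.score - a.score);

// Highest score first: a rough proxy for ROI, not a substitute for judgment
scored.forEach((c) => console.log(`${c.name}: ${Math.round(c.score)}`));
```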
Don’t just treat reviews as a source for bug fixes — they can spark smart A/B test hypotheses. If multiple users mention a confusing button or a missing feature, that’s a signal worth validating through experimentation.
Feeling overwhelmed by the flood of insights? Zoom in on core functionality: what’s going to help users better accomplish their job to be done? A great way to cut through the noise is to focus on emotion. Strong emotional reactions — frustration, confusion, relief — often point to moments that matter. Emotions teach you so much about what drives long-term loyalty.
If you’re seeing conflicting insights, go deeper. How many users are saying this? Are there differences in who is saying it, how long they’ve been a customer, where they came from, or what app version they’re using?
One of the standout things Juliana shared was the importance of combining qualitative and quantitative insights.
“We can gain a much deeper understanding of our customers by enriching our first-party behavioral data with high-quality qualitative data from customer voice and search intent analysis. This process of data enrichment fills the contextual gaps, providing a fuller, more accurate story of how people are consuming our product.”
Finally, Juliana recommends baking review insights into your product sprint rituals. Make it a habit to revisit this feedback every sprint, not as a one-off project, but as part of how your team continuously improves.
Other data sources to consider for customer feedback
While we’ve focused on app reviews, they’re really just the tip of the iceberg. The same principles of review mining can, and should, be applied to all forms of unstructured data, including:
- User interviews
- Search queries
- Customer care conversations
- Social media comments
- Reddit threads
You’ll notice most of these don’t happen inside your app. That’s because only 5% of the customer journey occurs there. If you’re trying to better understand your users or improve acquisition, you need to zoom out. Your app data gives you one piece of the puzzle, but the rest is happening across the web.
Search queries are a goldmine
Juliana emphasized the value of search queries, both inside and outside your app. Whether it’s what users type in your app’s search bar, what they’re Googling before downloading, or what they ask in your help center, these short phrases are pure intent. You’re seeing what people expect your app to deliver before they even interact with it.
A great tool for understanding this, and one Juliana recommends, is AlsoAsked. Imagine you have a Pilates workout app. If you type “Pilates workout app” into AlsoAsked, you can surface a wealth of concerns and questions you could consider addressing in your onboarding:

Results in the UK from AlsoAsked for the term “Pilates workout app”
Pair that with review analysis, and suddenly you have a clear picture of the ‘jobs’ your product is being hired to do.
Consider competitor reviews (with caution)
For smaller apps with less data, you can look at competitor reviews, alongside benchmark data from tools like SensorTower, to see how you’re performing. Just be mindful of data privacy here (as Juliana diligently reminded me): it isn’t your data to use.
If you use a tool like Appfigures for competitive analysis, you’re on safer ground since they act as the data controller, not you.
App reviews are just the beginning. The same approach applies to call transcripts, Reddit threads, support tickets, and even in-app search data. Wherever your customers are talking, they’re giving you clues — if you’re ready to listen. Having a robust and scalable approach to analysing unstructured data is a competitive advantage for subscription apps.
7 Steps to review mining
To recap, here’s how to begin:
- Define your goals: What do you want to learn?
- Assess your volume: Got >5k total reviews and 100+ new ones a month? It’s time to automate
- Choose your approach: Manual deep-dive? LLMs? NLP tools? Expert support?
- Scrub your data: Always remove personal info before analysis
- Analyze and prioritize: Use your existing frameworks to filter signals from noise
- Connect and act: Integrate the insights into your product roadmap
- Track change over time: Don’t just analyze once — make it a habit
Juliana was crystal clear on one thing: don’t treat this like a side quest. Review analysis only drives real impact when it’s embedded into your product cycle — visible in sprint planning, roadmap decisions, and team conversations.
One of her favorite ways to drive action? Cross-functional workshops. Bring product, marketing, and CX into the same room to look at key insights together. It creates alignment and momentum that a lonely dashboard never could.
Review mining isn’t just user research. It’s a way to ‘steal’ insights directly from your users and use them to build better, smarter experiences.
Whatever approach you take, the priority is the same:
Objectively listen. Learn what your users need, then make something better because of it — don’t just use reviews for social proof, use them for good.
Disclaimer: This article shares general guidance on analysing reviews and other sources of unstructured data based on personal experience and an expert interview. Always consult your legal and data privacy team before processing customer data or using third-party tools.
About Juliana Jackson: Juliana is an enterprise strategist specializing in AI, product, digital experience, and organizational clarity. Cloud Director Data Science at Jellyfish (previously the Associate Director of Data & Digital Experience at Monks), she’s known for saying what others won’t. Her straight talk and relentless focus on the end customer are what drive real impact, and she shares weekly advice on AI and data strategy in her newsletter Beyond the Mean.