The “Reply Already Submitted” Crash
A look inside how the SDK team at RevenueCat works
Two of my favorite things about working at RevenueCat are the intelligence of my team members and the collaborative environment that allows me to soak some of that intelligence up. A recent customer crash report created the perfect stage for those two aspects to shine. Read on for a look inside how the SDK team at RevenueCat operates and some creative debugging tips I learned from my teammates…
Information gathering
The journey begins with a GitHub issue and, of course, the subsequent dive into a Stack Overflow rabbit hole. You can check out the issue here (spoiler alert on the resolution!), but the gist is that a Flutter user reported an IllegalStateException crash with the message “Reply already submitted”. The error message appeared to be coming from Flutter, with a stack trace revealing the source as one of our most heavily-used SDK methods. Uh-oh.
I first went to our good old friend, Stack Overflow. It seemed that “Reply already submitted” was a pretty pervasive error message with a rather elusive cause. One thing that was clear is that the crash can occur when a Result, a callback object we use to communicate between native and Dart code, is invoked twice for any given function. Diving into both our native Android and our Flutter SDKs, I couldn’t find any double Result call. I also tried reproducing the crash by calling the same methods as our customer, but no dice.
I needed more information, so I turned to a new tool from Google for answers. We had recently joined a beta for the Play SDK Console: a dashboard for SDK developers to view adoption rates and crash reports and to mark SDK versions as unsupported/buggy.
The SDK console provided some really useful information. Firstly, I noticed a similar crash from another method, which linked a couple of related customer reports (and raised some questions about the culprit). Next, I saw via crash rankings that these combined to be the second-most frequent crash. With a new sense of urgency, I sought more details. Filtering the crash reports by SDK version, I saw that the version they surfaced in corresponded with our upgrade to Google’s in-app purchasing library, BillingClient.
Nothing in BillingClient’s documentation indicated an issue with the underlying functions, and we hadn’t changed our handling of the result much in that release. At this point, I felt a little burnt out and wasn’t sure where to turn next.
Asking for help
Our team’s culture is one where I’ve learned to ask for help early. The four of us come from a variety of backgrounds, which translates into unique problem-solving approaches. Even if someone can’t help me solve the issue, I’ll often get tips for what to try next.
This time, my teammate Cesar hopped into the ring with me, suggesting something that has become a core debugging step for me: manually replicating the stacktrace via one of our sample apps.
Instead of trying an existing flow in the app to cause the crash, this approach introduces new code to produce the same stacktrace. This strategy can offer a more concrete way of pinpointing the location of the issue. Determining the cause of a crash can be especially tricky for SDK developers, considering the issue could be in a number of places: the app developer’s code, our SDK code (which, for hybrids, involves a native SDK, a hybrid “bridge” and the hybrid SDK itself), the BillingClient/AppStore, or, even more confoundingly, an issue with the customer’s store configuration.
Replicating the stacktrace
The offending code was our getOfferings method, which calls into the BillingClient’s querySkuDetailsAsync. We handle the BillingClient’s response in our native Android SDK and pass it back to Flutter where the developer handles either a Result.onError or Result.onSuccess.
We confirmed that we could duplicate the IllegalStateException by directly calling into the Result twice from the Flutter code. We also saw that it could be two successes, two errors, or one of each, ruling out a specific pattern of Results. But since we were pretty positive our code wasn’t calling Result twice and had a hunch that the BillingClient was involved, we experimented with other potential recreations of the stacktrace.
Cesar’s creativity came to the rescue when he pondered whether a FlutterPlatformException would bubble up to the Result as an error, triggering the first “reply”. If we then somehow received another completion, that would set off our infamous “Reply Already Submitted” IllegalStateException. We tested this behavior and it generated the same stacktrace. Eureka!
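To make the failure mode concrete, here is a minimal Kotlin sketch of a one-shot reply object. `SingleUseReply` and `secondReplyFailure` are hypothetical names invented for illustration; this is a stand-in that mimics the behavior we observed, not Flutter's actual class or our SDK code.

```kotlin
import java.util.concurrent.atomic.AtomicBoolean

// Hypothetical stand-in for a one-shot reply object; not Flutter's real implementation.
class SingleUseReply(private val deliver: (String) -> Unit) {
    private val done = AtomicBoolean(false)

    fun success(value: String) = submit("success: $value")
    fun error(message: String) = submit("error: $message")

    private fun submit(payload: String) {
        // getAndSet returns the previous value, so only the first caller sees false.
        if (done.getAndSet(true)) {
            throw IllegalStateException("Reply already submitted")
        }
        deliver(payload)
    }
}

// Replays the suspected sequence: an error reply first, then a second completion.
// Returns the message of the exception raised by the second reply, or null if none was raised.
fun secondReplyFailure(): String? {
    val delivered = mutableListOf<String>()
    val reply = SingleUseReply { delivered.add(it) }
    reply.error("timeout")          // first completion: accepted
    return try {
        reply.success("offerings")  // second completion: rejected
        null
    } catch (e: IllegalStateException) {
        e.message
    }
}
```

In this sketch it does not matter whether the two replies are successes, errors, or one of each: the second call always fails, which matches what we saw when calling the Result twice by hand.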
Reproducing the crash
Now that we understood the probable pattern of events causing the crash, we could flip back to trying to reproduce it. Our educated guess was that the BillingClient threw an exception, so we now faced another obstacle in reproducing the crash: how could we force an exception from a third-party library? Cesar remembered seeing a TimeoutException during one of our original attempts to reproduce, which typically would’ve been the result of a poor network connection. The log unfortunately was quickly lost, and we weren’t sure it was relevant…but now we had an idea of where to start.
The next few ingenious steps earned Cesar the debugging crown, in my opinion. His process went something like this:
- Try using airplane mode to force a timeout…no luck. However, products still returned from BillingClient, indicating some form of caching by Google Play. Interesting…let’s keep that in mind.
- Try slowing down the internet connection on an emulator to force a timeout…aaaaand nope 🙁
- Try the above steps in a new emulator with a new Play account to avoid any cached responses, and in case this scenario only happens for certain account states. Again, no timeout.
Asking for help, part 2
At this point we had spent a day and a half on the issue, and it seemed it was time again to ask for advice from the team. Our more Apple-savvy teammates Josh and Andy suggested the Network Link Conditioner, a handy Apple tool for macOS that allows more fine-tuned control of a network connection.
…Actually reproducing the crash
With this new tool in our arsenal, we got back to forcing a timeout from the BillingClient. All we had to do was set the NLC to 100% packet loss, right? Sound simple? Not quite…
Each time we tested, we needed to:
- Wipe all data on an emulator to clear all Play Store data (remember how Play Store caches products?)
- Set NLC back to full speed from the last round (I felt like I was absolutely losing it when I missed this step)
- Log back into the Play Store, since wiping the emulator’s data in the first step logged us out
- Set a breakpoint after the BillingClient has successfully established a connection but before we call the method to fetch products
- Run the app in debugging mode
- Set NLC to drop 100% packets when the breakpoint was hit
- Observe the timeout
You can imagine the number of exasperated sighs and commiserating Slack messages when we forgot one of these steps and had to start fresh. Again, I was grateful for the camaraderie of my team.
To track which thread the calls were happening on and what error codes we received from the BillingClient, we added some logging:
val random = Random.nextInt()
withConnectedClient {
    Log.d("[Purchases] - WEIRD", "querySkuDetailsAsync $random")
    querySkuDetailsAsync(params) { billingResult, skuDetailsList ->
        Log.d("[Purchases] - WEIRD", "querySkuDetailsAsync back $random ${billingResult.responseCode}")
    }
}
And, lo and behold, the logs revealed that a timeout from the BillingClient would invoke the callback twice: first with response code 6 (a generic error) and then, a few seconds later and on a different thread, with -3 (a service timeout).
2021-09-30 11:23:06.994 8277-8277/com.revenuecat.purchases_sample D/[Purchases] - WEIRD: querySkuDetailsAsync -946844681
2021-09-30 11:23:27.013 8277-8366/com.revenuecat.purchases_sample D/[Purchases] - WEIRD: querySkuDetailsAsync back -946844681 6
2021-09-30 11:23:35.512 8277-8277/com.revenuecat.purchases_sample D/[Purchases] - WEIRD: querySkuDetailsAsync back -946844681 -3
We had done it. We had reconstructed the cagey crash, exposed the elusive exception. Alliteration aside, it was a pretty triumphant feeling. Our whole team had come together to address a major customer issue, combining our individual expertise into a powerful debugging force.
Fixing the issue
After celebrating (with some Slack high fives, a company callout, and for me, a Starbucks latte), Cesar and I pair-programmed on a “solution”. We couldn’t stop the BillingClient from calling the completion block twice, but we could avoid the crash to provide a better end-user experience. Our solution set a flag once the first BillingClient completion came through, ensuring that our completion block (the one consumed by our Flutter end users) would never be called more than once. Writing the unit tests for this tricky edge case would require a whole second blog post…
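The flag idea can be sketched roughly as follows. This is an illustration under assumptions, not the code we shipped: `onlyOnce`, `simulateDoubleCompletion`, and `forwardedResponseCodes` are hypothetical names, and an AtomicBoolean is used here because the two completions in our logs arrived on different threads.

```kotlin
import java.util.concurrent.atomic.AtomicBoolean

// Hypothetical helper: wraps a callback so that only the first invocation is forwarded.
fun <T> onlyOnce(callback: (T) -> Unit): (T) -> Unit {
    val alreadyCalled = AtomicBoolean(false)
    return { value ->
        // compareAndSet succeeds for exactly one caller, even across threads.
        if (alreadyCalled.compareAndSet(false, true)) {
            callback(value)
        }
    }
}

// Simulates a library that completes twice for a single request,
// using the two response codes from the log output above (6, then -3).
fun simulateDoubleCompletion(listener: (Int) -> Unit) {
    listener(6)
    listener(-3)
}

// Collects the response codes that actually reach the wrapped callback.
fun forwardedResponseCodes(): List<Int> {
    val forwarded = mutableListOf<Int>()
    simulateDoubleCompletion(onlyOnce<Int> { code -> forwarded.add(code) })
    return forwarded
}
```

Here the second completion is silently dropped, so only the first response code is forwarded. That trades visibility of the duplicate for crash safety, which is one reason a longer-term fix is still worth pursuing.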
We opted for this quicker, on-location fix to stop crashes and created a ticket with a longer-term vision for preventing and identifying similar crashes in the future. We also opened a bug report with Google. And finally, the part I find most rewarding, we let the customer know we had a fix. Once that fix was incorporated, the SDK console provided reassurance that it worked.
Applying all the learnings
The “Reply Already Submitted” journey provided me with some valuable new tools for debugging:
- Google’s SDK Console
- Asking for help early
- Replicating the stack trace
- Network Link Conditioner
While I have begun utilizing some of these daily, it was the less tangible learnings that stuck with me in a larger way. I was reminded that we all end up moving faster and learning more by asking questions, and that everyone gets stuck somewhere. I was also reminded of the importance of diversity in backgrounds and thought, as each of us brought something different to the problem-solving process. But none of that matters without a collaborative and kind team.
Before joining RevenueCat, I was concerned that I wouldn’t be able to learn through pair programming because of the remote setup of the company, but what I found was quite the opposite. Our team is incredibly collaborative and casual, and there have been a lot of opportunities to look over someone’s shoulder (albeit digitally) to learn from one another.
I’m impressed every day by my coworkers’ intelligence, but I’m absolutely floored by their willingness to help each other grow, learn, or simply feel less alone in a frustrating problem. The fateful “Reply Already Submitted” crash was just one example of many that made me feel extremely grateful to be a ‘Cat.