How RevenueCat handles errors in Google Play’s Billing Library
Lessons on Billing Library error handling from RevenueCat's engineering team
The Google Play Billing Library is provided by Google to facilitate communication between Android apps and the Google Play Store. This communication channel is essential for querying product information, purchasing in-app products, and managing subscriptions directly within the app.
RevenueCat offers a solution that encapsulates the Billing Library within its broader suite of tools. By wrapping the BillingClient, RevenueCat simplifies the integration process, allowing developers to manage in-app purchases in an easier and more efficient way.
RevenueCat enhances the functionality of the BillingClient by interfacing it with its backend systems. This integration provides developers with advanced features such as detailed analytics, subscription management, and cross-platform support. As a result, apps can offer a more personalized and user-friendly purchasing experience, while developers gain deeper insights into user behavior and revenue patterns.
In this blog post, we share our journey in addressing some of the nuanced challenges we’ve encountered with the Google Play Billing Library: our initial approach to error handling, the issues we’ve resolved over the years, and our strategies for handling specific error codes and connection disruptions. By sharing our experiences and the solutions we’ve implemented, we aim to provide guidance to developers facing similar challenges.
How the BillingClient works
Once an app has successfully connected to the BillingClient, it can start interacting with it through a series of functions, like launchBillingFlow to start purchases, queryProductDetailsAsync to get product details, or queryPurchasesAsync to get the current active purchases for a user.
When a connection attempt finishes, a callback (onBillingSetupFinished) indicates whether the connection was successful or whether there was an issue. The connection can also drop at any moment, in which case the BillingClient calls another callback (onBillingServiceDisconnected) to indicate it has disconnected.
Why Google’s suggested approach to handling errors wasn’t working
Before we landed on our current implementation, our handling of the errors we received from the BillingClient was very basic. We reconnected using an exponential backoff, and we checked whether the client was connected before interacting with it (reconnecting if it wasn’t). We did not reconnect when the service disconnected; we only did so when we tried to interact with the service and found it disconnected.
At the time we implemented the BillingClient wrapper, the documentation was scarce and there was no clear guidance on how to handle errors and disconnections. Google provides a sample implementation that we’ve used in the past as a guide on how to correctly implement the Billing Library. But as you can see, it’s not a complete example.
We knew we needed to tackle better error handling, but what we didn’t know was that our backoff retries were completely broken. We got a report that indicated the BillingClient was retrying indefinitely.
We had added this retry-with-backoff mechanism because we had received some reports of ANRs, and Google’s suggested solution was to retry the connection with a backoff. That’s what they do in their example, but it clearly doesn’t work.
We got more reports of the same issue, and after tons of debugging and help from customers, we realized that one way to reproduce it was to change the device language while the service was connected. After that, the service disconnects and won’t reconnect until the device is restarted or the Google Play caches are cleared.
Another issue we had at the same time was one that only occurred on Samsung devices, which were failing with a +999 intent connections issue. We have never been able to reproduce this issue, but we think it’s probably related to the infinite reconnections.
We knew we needed to do something about this ASAP, since we kept receiving reports. It’s pretty easy for a developer to run into this while testing localization for their app. We had been thinking for a while of adding a retry mechanism for specific errors, since some of the errors are recoverable.
How RevenueCat now handles errors (and details of our implementation)
We realized Google had updated their documentation on errors and released an extensive guide on how to handle them.
Based on the guide, we defined different types of error handling mechanisms:
- Simple: Try the request again without any sophisticated handling.
- Exponential backoff: Delay the next attempt, with increasing intervals.
- Reconnect and retry: Attempt to re-establish a connection and then retry the request.
- Propagate error: Send the error up to be handled by the developer or inform the customer.
- Call and return error: Make a different function call and handle the error based on its response.
Here’s a summary of the guide:
Retryable errors
| Error | Retry mechanism |
| --- | --- |
| NETWORK_ERROR | Exponential backoff or simple retry (when users are in session). |
| SERVICE_TIMEOUT (not returned in Billing Library 6) | Exponential backoff or simple retry (when users are in session). In Billing Library 6, SERVICE_UNAVAILABLE is returned instead. |
| SERVICE_DISCONNECTED | Exponential backoff or simple retry (when users are in session). Re-establish the connection using BillingClient.startConnection. |
| SERVICE_UNAVAILABLE | Exponential backoff (when users are not in session). For users in session, error out. |
| BILLING_UNAVAILABLE | Manual retry after the user addresses the issue. |
| ERROR | Exponential backoff or simple retry (when users are in session). |
| ITEM_ALREADY_OWNED | Simple retry logic after calling BillingClient.queryPurchasesAsync(). |
| ITEM_NOT_OWNED | Simple retry logic after calling BillingClient.queryPurchasesAsync(). |
Non-retryable errors
| Error | Retry mechanism |
| --- | --- |
| FEATURE_NOT_SUPPORTED | Check feature support using BillingClient.isFeatureSupported(). |
| USER_CANCELED | |
| ITEM_UNAVAILABLE | Refresh product details using BillingClient.queryProductDetailsAsync() and check the product eligibility configuration. |
| DEVELOPER_ERROR | Ensure correct use of Play Billing Library calls and check the debug message. |
These are some details on the final implementation of our error handling:
BILLING_UNAVAILABLE issue
We ended up reconnecting only if the BillingClient is not connected when we try to interact with it. We don’t reconnect when we receive a disconnection callback. The reason is that, under certain circumstances, the BillingClient would keep calling the disconnected callback constantly, even if we didn’t try to start the connection again. While trying to fix this issue, we realized the results returned by Google were super inconsistent.
If you change the device language, the BillingClient can’t connect again, most likely due to an issue with the caches. While the device is in this weird state, a call to startConnection causes onBillingSetupFinished to be called multiple times, with either a BILLING_UNAVAILABLE error or an OK. I’ve seen it being called once per thread in the BillingClient thread pool (nine times, IIRC). It will also call onBillingServiceDisconnected. All of these listener calls result from one single call to startConnection.
According to the errors documentation, we should try to reconnect when onBillingServiceDisconnected gets called, with a maximum number of attempts. The main problem is that when there’s a retry mechanism in your code, this strange behavior makes it very easy to end up in a reconnection retry loop. BillingClient will call onBillingSetupFinished with an OK, which makes the code think it has a valid result, then instantly call onBillingServiceDisconnected. Sometimes it would also send BILLING_UNAVAILABLE as an error code, several times.
We ended up removing the retries from onBillingServiceDisconnected because they were problematic in this weird scenario. We also removed the retries when we get a BILLING_UNAVAILABLE in onBillingSetupFinished.
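The “connect only when needed” approach can be sketched in plain Kotlin as follows. This is a simplified illustration, not our SDK code: `BillingConnection` and `LazyReconnector` are hypothetical names that stand in for the relevant parts of the BillingClient, and the sketch assumes all calls happen on a single thread.

```kotlin
// Hypothetical stand-in for the parts of BillingClient this sketch needs.
interface BillingConnection {
    val isReady: Boolean
    fun startConnection(onSetupFinished: (ok: Boolean) -> Unit)
}

// Not thread-safe: assumes every call arrives on the same thread.
class LazyReconnector(private val connection: BillingConnection) {
    private val pending = ArrayDeque<(connected: Boolean) -> Unit>()
    private var connecting = false

    // Runs the request right away if connected; otherwise queues it and
    // starts a single connection attempt.
    fun executeWhenConnected(request: (connected: Boolean) -> Unit) {
        if (connection.isReady) {
            request(true)
            return
        }
        pending.addLast(request)
        if (connecting) return
        connecting = true
        connection.startConnection { ok ->
            // Setup callbacks may arrive more than once per attempt;
            // only the first one is acted upon.
            if (!connecting) return@startConnection
            connecting = false
            while (pending.isNotEmpty()) pending.removeFirst().invoke(ok)
        }
    }

    // Deliberately a no-op: a disconnect callback never triggers a reconnect.
    fun onBillingServiceDisconnected() = Unit
}
```

With this shape, a burst of duplicate setup or disconnect callbacks cannot start a reconnection loop, because a new connection attempt only ever begins when a request finds the client disconnected.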
If Google fixed the issue and called onBillingSetupFinished with BILLING_UNAVAILABLE just once, it would be easier to decide not to retry. Besides, the service is not getting disconnected in this scenario: it’s already disconnected, so it shouldn’t be calling onBillingServiceDisconnected at all.
To improve the error message in that case, we literally compare the message we get from Google to a static string. The message says “Google Play In-app Billing API version is less than 3”, which is super useful if you take into account that Version 3 is from 2012, the year they added in-app purchases.
Also, calling endConnection doesn’t stop further calls to onBillingServiceDisconnected. The only way to stop the service from calling that callback is to quit the app.
Retry with backoff
We added a retry with backoff only for NETWORK_ERROR and ERROR. For acknowledging and consuming purchases, we use it only when the acknowledgement or consumption is not triggered by a restore or a purchase, since both of those involve user interaction. For getting the current active purchases, we use it only when the request happens at app start.
We also retry with backoff if the app is in the background and we get a SERVICE_UNAVAILABLE error.
We cap the backoff at 15 minutes and won’t go over that. Most app sessions don’t last that long anyway.
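As a rough illustration, a capped exponential backoff can be computed like this. Only the 15-minute cap comes from the description above; the one-second initial delay, the doubling factor, and the name `backoffDelayMs` are assumptions made for the sketch.

```kotlin
// Assumed starting delay; not a value taken from the article.
val INITIAL_DELAY_MS = 1_000L

// The 15-minute cap mentioned above.
val MAX_DELAY_MS = 15 * 60 * 1_000L

// Delay before retry number `attempt` (0-based): doubles each time, then stays at the cap.
fun backoffDelayMs(attempt: Int): Long {
    require(attempt >= 0) { "attempt must be non-negative" }
    // Limit the shift so the multiplication below cannot overflow a Long.
    val factor = 1L shl attempt.coerceAtMost(20)
    return (INITIAL_DELAY_MS * factor).coerceAtMost(MAX_DELAY_MS)
}
```

Under these assumptions the delays grow as 1 s, 2 s, 4 s, and so on, and reach the 15-minute cap at the eleventh retry (`attempt = 10`).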
Simple retries
We do simple retries (a maximum of three) for NETWORK_ERROR and ERROR when getting product details, launching the billing flow, and restoring. We do the same for consumption and acknowledgement when they are triggered from a purchase or a restore.
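A simple retry of this kind might look like the sketch below. The `Outcome` type and `simpleRetry` function are hypothetical, and the sketch assumes that “a maximum of three” means up to three retries after the initial attempt.

```kotlin
// Hypothetical result type; RETRYABLE stands for NETWORK_ERROR or ERROR.
enum class Outcome { OK, RETRYABLE, FATAL }

// Assumption: three retries on top of the first attempt.
val MAX_SIMPLE_RETRIES = 3

// Runs `request` once, then repeats it immediately (no delay) while it keeps
// failing with a retryable outcome, up to MAX_SIMPLE_RETRIES extra times.
fun simpleRetry(request: () -> Outcome): Outcome {
    var outcome = request()
    var retries = 0
    while (outcome == Outcome.RETRYABLE && retries < MAX_SIMPLE_RETRIES) {
        retries++
        outcome = request()
    }
    return outcome
}
```

Because there is no delay between attempts, this style only suits flows where the user is actively waiting for a result; anything running without user interaction is better served by the backoff approach.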
Non-retryable use cases
For BILLING_UNAVAILABLE, FEATURE_NOT_SUPPORTED, USER_CANCELED, DEVELOPER_ERROR, and ITEM_UNAVAILABLE, we simply don’t retry and just send an error to the client.
Caching issues
It’s possible to get ITEM_ALREADY_OWNED when launching a billing flow, or ITEM_NOT_OWNED when consuming or acknowledging a purchase. In these cases, the recommendation is to call queryPurchasesAsync again to refetch the purchases and confirm whether the user owns the item.
We have not implemented this yet.
Ensuring one response
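The BillingClient sometimes responds more than once to a single call. For example, you might call queryProductDetailsAsync and get two OK responses. To make sure the listener only receives the first one, we guard the callback with an AtomicBoolean: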
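    import java.util.concurrent.atomic.AtomicBoolean

    // Tracks whether this request has already delivered a response.
    val hasResponded = AtomicBoolean(false)
    billingClient.queryProductDetailsAsync(params) { billingResult, productDetailsList ->
        // getAndSet returns the previous value, so only the first callback gets past this check.
        if (hasResponded.getAndSet(true)) {
            logError()
            return@queryProductDetailsAsync
        }
        listener.onProductDetailsResponse(billingResult, productDetailsList)
    }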
+999 intent connections issue
We haven’t been able to reproduce this issue, and we hope the retry mechanisms we implemented fix it. Just in case, we catch the IllegalStateException when calling startConnection and return a nicer error message to the client. The error message states: “This has been reported to occur on Samsung devices on unknown circumstances”.
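The defensive catch can be sketched as follows. The `startConnection` lambda stands in for the real BillingClient.startConnection call, and `ConnectionStart`, `startConnectionSafely`, and the first half of the error message are hypothetical; only the sentence about Samsung devices is quoted from our actual message.

```kotlin
// Hypothetical result type for a connection attempt.
sealed class ConnectionStart {
    object Started : ConnectionStart()
    data class Failed(val message: String) : ConnectionStart()
}

// `startConnection` is a stand-in for the real BillingClient.startConnection call.
fun startConnectionSafely(startConnection: () -> Unit): ConnectionStart =
    try {
        startConnection()
        ConnectionStart.Started
    } catch (e: IllegalStateException) {
        // Turn the crash into an error the caller can report.
        ConnectionStart.Failed(
            "Error starting the billing connection: ${e.message}. " +
                "This has been reported to occur on Samsung devices on unknown circumstances"
        )
    }
```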
Conclusion (and error-response reference)
Finally, we determined what to do for each of the functions we use. The following list shows the different functions and possible errors the BillingClient can return, and how you should react to each of them. We hope these recommendations prove useful should you encounter the same errors!
Alternatively, you can just use RevenueCat and we’ll handle all of the errors for you 😄
- For NETWORK_ERROR or ERROR, do a simple retry if the user is in session, or a retry with exponential backoff if the user is not in session. In more detail: calls to queryProductDetailsAsync, launchBillingFlow, and queryPurchaseHistoryAsync should use a simple retry. Calls to queryPurchasesAsync and showInAppMessages should retry with exponential backoff because they are initiated without user interaction. Calls to consumeAsync and acknowledgePurchase should use a simple retry if they are performed after a purchase or a restore; otherwise they should retry with backoff.
- For issues connecting to the billing service (SERVICE_DISCONNECTED), the general advice is to reconnect and retry the request.
- For billing issues (BILLING_UNAVAILABLE, ITEM_UNAVAILABLE), unsupported features (FEATURE_NOT_SUPPORTED), user cancellation (USER_CANCELED), and developer errors (DEVELOPER_ERROR), propagate the error.
- When the item is already owned (ITEM_ALREADY_OWNED), you should query the purchase again and handle it accordingly. This can only happen when starting a product purchase.
- If an item is not owned (ITEM_NOT_OWNED) but a function is called as if it were, a check for the purchase is recommended before returning an error. This error can occur when starting, consuming, or acknowledging a purchase.
- If the service is unavailable (SERVICE_UNAVAILABLE), back off if the user is not in an active session, or return an error if they are. We determine whether the user is in an active session by checking whether the app is in the background.
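The rules in the list above can be condensed into a single decision function. This is a sketch under stated assumptions rather than our SDK code: the enums only mirror the names of the Billing Library response codes and functions, and `Call`, `Reaction`, and `reactionFor` are hypothetical names.

```kotlin
// Mirrors the names of the response codes discussed above.
enum class BillingError {
    NETWORK_ERROR, ERROR, SERVICE_DISCONNECTED, SERVICE_UNAVAILABLE,
    BILLING_UNAVAILABLE, ITEM_UNAVAILABLE, FEATURE_NOT_SUPPORTED,
    USER_CANCELED, DEVELOPER_ERROR, ITEM_ALREADY_OWNED, ITEM_NOT_OWNED
}

// Hypothetical labels for the BillingClient functions in the list.
enum class Call {
    QUERY_PRODUCT_DETAILS, LAUNCH_BILLING_FLOW, QUERY_PURCHASE_HISTORY,
    QUERY_PURCHASES, SHOW_IN_APP_MESSAGES, CONSUME, ACKNOWLEDGE
}

enum class Reaction {
    SIMPLE_RETRY, BACKOFF_RETRY, RECONNECT_AND_RETRY, QUERY_PURCHASES_FIRST, PROPAGATE
}

fun reactionFor(
    error: BillingError,
    call: Call,
    afterPurchaseOrRestore: Boolean,
    appInBackground: Boolean
): Reaction = when (error) {
    BillingError.NETWORK_ERROR, BillingError.ERROR -> when (call) {
        // User-initiated calls: retry immediately.
        Call.QUERY_PRODUCT_DETAILS, Call.LAUNCH_BILLING_FLOW,
        Call.QUERY_PURCHASE_HISTORY -> Reaction.SIMPLE_RETRY
        // Calls made without user interaction: back off.
        Call.QUERY_PURCHASES, Call.SHOW_IN_APP_MESSAGES -> Reaction.BACKOFF_RETRY
        Call.CONSUME, Call.ACKNOWLEDGE ->
            if (afterPurchaseOrRestore) Reaction.SIMPLE_RETRY else Reaction.BACKOFF_RETRY
    }
    BillingError.SERVICE_DISCONNECTED -> Reaction.RECONNECT_AND_RETRY
    // "In session" is approximated by the app being in the foreground.
    BillingError.SERVICE_UNAVAILABLE ->
        if (appInBackground) Reaction.BACKOFF_RETRY else Reaction.PROPAGATE
    BillingError.ITEM_ALREADY_OWNED, BillingError.ITEM_NOT_OWNED ->
        Reaction.QUERY_PURCHASES_FIRST
    BillingError.BILLING_UNAVAILABLE, BillingError.ITEM_UNAVAILABLE,
    BillingError.FEATURE_NOT_SUPPORTED, BillingError.USER_CANCELED,
    BillingError.DEVELOPER_ERROR -> Reaction.PROPAGATE
}
```

Keeping the decision in one exhaustive `when` means the compiler flags any error code or call that is added to the enums without a matching rule.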