By R. Paul Singh
There are various reasons to adopt REST as the underlying layer and build RESTful APIs for data access. REST is enjoying widespread and rapid adoption for SOA database connectivity, including business-to-business, mobile access and systems integration.
While connectivity is the obvious focus for building a RESTful API, it’s not enough. It’s critical to address factors like API usability, app server functions and performance.
This article dives into each of these factors and offers practical advice on what developers should watch out for while building these APIs.
The most apparent element of the RESTful API is the shape of the data it returns, how much data is returned, and how to discover relationships. So the three key areas to watch for are:
- Document-oriented API
If you’re a database programmer, it’s easy to see how to map database CRUD functions onto a RESTful API: Just create an endpoint for each base/view table, map CRUD functions to REST verbs, provide arguments for filter and sort, and so forth.
The trouble is, that’s not likely to satisfy mobile-app developers, who are looking for a document-oriented API, that is, lists of nested/related data (e.g. customer rows include a list of Orders), because:
- It reduces latency, since it requires only one trip to the server (versus three)
Here is an example:
Moreover, such document-oriented results are business objects, and so not only address mobile, but also SOA Integration functions. Exchanging data between systems is rarely a set of homogeneous rows—these are transactions, consisting of a set of related rows.
Large result sets can have a devastating effect on the client, network, server and database. Pagination is required to truncate large results, with provisions to retrieve remaining results (e.g. when/if the end user scrolls).
This can be a complex problem. Consider a Customer that has Orders and Items. If there are many Orders, pagination must occur at this level, with provisions for including the Line Items on subsequent pagination requests.
A key principle of REST is that software should be able to discover relationships between endpoints, directly from the RESTful response. A typical approach is to provide an array of Link objects, like this:
App Server Functions: Logic, Security and Performance
The usability issues above are the most apparent, since they involve the basic shape of the API. More subtle—and perhaps more critical—are the API semantics. Make no mistake about it; a RESTful API is essentially an app server and bears responsibility for ensuring:
- Integrity logic of the data
- Security of endpoint and data access
This has always been important.
Integrity Logic of the Data
A significant part of most database applications is the logic required to process update transactions. This logic enforces our business policy (e.g. credit limit checks, structural constraints) and so forth.
It has been a long-accepted practice that such logic be factored out of client applications so that the logic can be centralized and shared, even for non-interactive applications. The well-known anti-pattern of “fat client” refers to inappropriate business logic buried in client code. This quickly leads to inconsistencies between applications and excessive network communication.
As important as it is in client server and Web apps, factoring out logic is critical for mobile apps and for integration: It is simply unthinkable for a mobile phone to enforce business policy, or for a partner’s computer to enforce your logic. It is therefore critical that your API enforce logic; an SQL pass-through API is simply not a viable option.
Security of Endpoint and Data Access
The same considerations apply to securing the data. The most basic security is authentication: Is John allowed access to this application? In most cases, this will require interfacing with corporate security systems (e.g. Active Directory, LDAP, OAUTH, etc.).
But that’s not sufficient. The system must also address what data John is allowed to see and alter: Can John see Mary’s salary? Change Mary’s salary?
The real requirement is row/column security, with emphasis on specific rows and columns for a given role. It has been common practice for client applications to enforce such security. While dubious for Web apps, this is simply inappropriate for mobile apps; it must be enforced in the API.
Performance is a key requirement for APIs that form the backbone of a large system supporting many users. Key considerations include:
- Optimistic locking
- Server-enforced integrity
- Batched updates
- Single-message update
- SQL Handling
Most accesses are reads, so these should acquire no database locks. For all other access, users should be able to update unlocked data, which your API should verify has not been altered since retrieval. This can be done with time stamps, or better, by checksums (which don’t intrude into database design).
An infamous anti-pattern is to place business logic in the client. This does not ensure integrity (particularly when the clients are partners) and causes multiple client/server trips. For example, inserting a new Line Item may require business logic that updates the Order, the Customer and the Product. If these are issued from the client, the result is four client/server trips when only one should be required.
Network considerations apply to updates as well as retrieval. Consider many rows retrieved into a client, followed by an update.
APIs should be designed to enable clients to send only the changes, instead of the entire set of objects. They are further designed to enable clients to send multiple row types (e.g. an Order and Items) in a single message. This results in a single, small update message.
Single Message Update
Business logic consists not only of validations, but derivations. These derivations can often involve rows visible to, but not directly updated by the client. For example, saving an Order might update the customer’s balance. It’s critical that the updated balance be reflected on the screen.
Clients typically solve this problem by re-retrieving the data. This is unfortunate in a number of ways. First, it’s an extra client/server trip over a high latency network. And sometimes, it’s difficult to program, such as when the Order’s key is system assigned: The client may not know the computed key. In such cases, the client may need to re-retrieve the entire rich result set.
In the ideal case, the RESTful server returns the refresh information in the update response. So with a single message, the client can communicate a set of updates and use the response to show the computations on related data.
All of the traditional approaches for eliminating expensive database operations apply to building a RESTful API. You can use caching for data that is low in volume and not frequently updated.
An important but often overlooked element is caching inside a transaction to ensure that the same row is read/updated only once. For example, changing five Line Items should result in one update to the Purchase Order, not five.
Major advantages can be realized with proper SQL handling. Imagine we might want to check a customer’s credit limit. This requires that the balance be computed. There are important design choices:
- Aggregate queries: The most obvious approach is to add the Orders’ amounts, either by a select sum query, or by reading the data into memory. But it’s not that simple: the Orders’ amount is itself an aggregation.
- Adjustment updates: Such considerations often drive designers to maintain stored aggregates so they can be adjusted with one-row updates.
This simple example illustrates the significant performance implications of SQL strategy. Imagine changing two Line Items in an Order and maintaining the Customer balance. The system should be able to process this transaction without any aggregate queries (e.g. adding all the Orders for the Customer or the Line Items for each Order). Instead the system should be able to process it by issuing a single update to the Customer and one to Order.
Building a RESTful API from your data is complex and needs careful attention to usability, functions and performance of the API. We have presented some of the points every developer should consider. These can be useful criteria for choosing a framework or product. Many of these functions could be performed by a basket of open source technologies and lots of developer time, or you can use a commercial framework that takes care of these issues and makes this process far simpler.
R. Paul Singh is CEO of Espresso Logic. He has served as a founder, CEO, and advisor to many software startups in the mobile, Internet, and telecom arenas.
- The Key Skills Needed by Big Data Engineers
- Interview Questions for Data Warehouse Developers
- Why Reactive Programming for Databases Is Awesome
Images: R. Paul Singh