API Best Practices

Introduction

In this article, we will outline the best practices we believe constitute effective and efficient interaction with our Next-Gen API, including recommendations for handling errors and exceptions and optimizing performance and scalability. Additionally, we will provide practical examples and code snippets to demonstrate these best practices in action.

REST

A RESTful API (Representational State Transfer Application Programming Interface) is a type of web service that allows two systems to communicate with each other over the internet. REST is a set of architectural constraints and principles used for creating web services that are scalable, reliable, and easy to maintain.

In a RESTful API, resources are represented as unique URIs (Uniform Resource Identifiers), and the HTTP methods (GET, POST, PUT, DELETE) are used to interact with them. The HTTP method used determines the operation to be performed on the resource. For example, a GET request retrieves a representation of a resource, while a POST request creates a new resource.
A RESTful API uses HTTP status codes to indicate the success or failure of an operation and typically returns data in a standard data format such as JSON (JavaScript Object Notation) or XML (Extensible Markup Language). This makes it easy for client applications to consume the API's data.
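
As a simple illustration, a client might retrieve a resource and parse its JSON representation as in the following Python sketch (the requests library is used here, and the resource URI is a made-up placeholder):

    import requests

    # Hypothetical resource URI; substitute the endpoint and identifier for your integration.
    url = "https://api.example.com/v1/properties/123"

    response = requests.get(url, headers={"Accept": "application/json"})

    if response.status_code == 200:
        # A 2xx status indicates success; the body is a JSON representation of the resource.
        resource = response.json()
        print(resource)
    else:
        # 4xx codes indicate a client-side problem, 5xx codes a server-side problem.
        print(f"Request failed with status {response.status_code}")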

The key benefits of a RESTful API include simplicity, scalability, and reliability. By adhering to a set of well-defined principles, developers can create web services that are easy to understand, extend, and maintain. Additionally, because RESTful APIs are based on HTTP, they can be easily cached, which can improve performance and reduce server load.

Fault-Tolerant API Integration

Transient Errors

Transient errors are temporary errors that occur in a system, typically due to network latency, system overload, or other transient conditions. These errors are not indicative of a persistent problem with the system and often resolve themselves over time.

Exponential Backoff

Exponential backoff is a technique used to handle transient errors in communication between two systems. When an error occurs, the system waits for a certain amount of time before retrying the operation. If the operation fails again, the system waits for a longer period of time before retrying. This process continues, with each successive wait time increasing exponentially, until the operation succeeds or a maximum retry limit is reached.

Exponential backoff avoids overwhelming a system with repeated failed attempts to perform an operation, which can exacerbate the transient error condition. By gradually increasing the time between retries, the client reduces the load on the struggling system and increases the likelihood of a successful operation.
For example, consider a web application that makes a request to an external API. If the API returns an error due to a transient condition, such as a network timeout, the application can use exponential backoff to retry the request after waiting for an increasing amount of time. This can help ensure that the request eventually succeeds, even if there are temporary issues with the external API.
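
A minimal sketch of such a retry loop, assuming Python's requests library (the timeout and default delay values are placeholders rather than recommendations):

    import time
    import requests

    def get_with_backoff(url, max_retries=5, base_delay=1.0, max_delay=60.0):
        """Issue a GET request, retrying with exponentially increasing waits on transient failures."""
        for attempt in range(max_retries + 1):
            try:
                response = requests.get(url, timeout=10)
                if response.status_code < 500:
                    # Success or a client (4xx) error; in either case a retry will not help.
                    return response
            except requests.exceptions.RequestException:
                # Timeouts and other transient network failures fall through to the retry below.
                pass
            if attempt == max_retries:
                break
            # Double the wait before each successive retry, capped at max_delay.
            time.sleep(min(base_delay * (2 ** attempt), max_delay))
        raise RuntimeError(f"Request to {url} still failing after {max_retries} retries")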

Short Circuit Pattern

The short circuit pattern is a technique used to handle transient errors in APIs by quickly detecting and bypassing failed components in the system.

When an API request encounters a transient error, the short circuit pattern can be used to quickly detect the error and bypass the failing component. This can be accomplished by using a circuit breaker, which monitors the health of the downstream component and can quickly open the circuit if it detects a failure.
When the circuit is open, subsequent requests are routed to a fallback or alternate component instead of the failing component. This can improve system availability and prevent cascading failures, where a failing component causes other components to fail as well.

The short circuit pattern can be combined with other techniques, such as exponential backoff, to provide a robust and fault-tolerant system for handling transient errors in APIs. By quickly detecting and bypassing failed components, the short circuit pattern can help ensure that API requests are processed successfully, even in the face of transient errors.
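
A minimal circuit breaker sketch, assuming a simple consecutive-failure threshold and a fixed cool-down period (production implementations often add a half-open state and richer health checks):

    import time

    class CircuitBreaker:
        """Opens after consecutive failures and rejects calls until a cool-down period passes."""

        def __init__(self, failure_threshold=5, reset_timeout=30.0):
            self.failure_threshold = failure_threshold
            self.reset_timeout = reset_timeout
            self.failure_count = 0
            self.opened_at = None

        def call(self, func, *args, **kwargs):
            if self.opened_at is not None:
                if time.monotonic() - self.opened_at < self.reset_timeout:
                    # Circuit is open: fail fast instead of calling the failing component.
                    raise RuntimeError("Circuit open; use a fallback")
                # Cool-down has elapsed: close the circuit and allow a trial call through.
                self.opened_at = None
                self.failure_count = 0
            try:
                result = func(*args, **kwargs)
            except Exception:
                self.failure_count += 1
                if self.failure_count >= self.failure_threshold:
                    self.opened_at = time.monotonic()
                raise
            self.failure_count = 0
            return result

A caller can catch the "circuit open" error and route the request to a fallback or alternate component.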

Best Practices

Parameterize Page Size

Parameterizing the page size allows your system to adjust it dynamically rather than relying on a hard-coded value, which keeps your integration flexible as data volumes and response times change.
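
For instance, a paginated fetch might accept the page size from the caller rather than hard-coding it (the page and page_size query parameters below are assumptions; consult the endpoint's documentation for its actual pagination parameters):

    import requests

    def fetch_page(url, page, page_size):
        """Fetch one page of results, with the page size supplied by the caller."""
        params = {"page": page, "page_size": page_size}
        response = requests.get(url, params=params, timeout=10)
        response.raise_for_status()
        return response.json()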

Monitor Response Times

The latency of a request depends on a number of factors, but it is directly proportional to the page size used. If response times approach 5 seconds, consider reducing the page size.
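
A simple sketch of that check, assuming the hypothetical fetch_page helper above and a 5-second threshold:

    import time

    def fetch_page_adaptive(url, page, page_size, threshold_seconds=5.0, min_page_size=50):
        """Fetch a page and return a possibly reduced page size for the next request."""
        start = time.monotonic()
        data = fetch_page(url, page, page_size)
        elapsed = time.monotonic() - start
        if elapsed >= threshold_seconds:
            # Response times are approaching the threshold: halve the page size for later requests.
            page_size = max(page_size // 2, min_page_size)
        return data, page_size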

Monitor HTTP Response Codes

The HTTP protocol has a way of indicating whether an error was caused by the client or by the server. When the error is the client's, the response code falls in the 4xx range; when the server has an error, the response code falls in the 5xx range. The client should be able to process all response codes and act accordingly. A 4xx-level response means that the request itself is incorrect and should not be repeated as-is.
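
In practice, the client might branch on the status class along these lines (a sketch; how each case is handled will depend on your pipeline):

    def handle_response(response):
        """Decide how to treat a response based on its status class."""
        if 200 <= response.status_code < 300:
            return response.json()
        if 400 <= response.status_code < 500:
            # 4xx: the request itself is incorrect, so repeating it unchanged will not help.
            raise ValueError(f"Client error {response.status_code}: fix the request before retrying")
        if response.status_code >= 500:
            # 5xx: likely a transient server-side problem and a candidate for retry with backoff.
            raise RuntimeError(f"Server error {response.status_code}: retry with backoff")
        return response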

Monitor 500 Response Codes and Transient Network Errors

Exponential backoff combined with a short circuit pattern is strongly encouraged. A recommended pattern to achieve this (sketched in code after the list) is:

  1. Wait for 5 seconds before your first retry.
  2. For each subsequent retry, increase the wait exponentially, up to 120 seconds.
  3. Set a maximum number of retries, such as 10, after which your application will open the short circuit and move forward with handling the error state.
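
Wiring the earlier backoff and circuit breaker sketches together with these values might look like the following (the cool-down period and the exact wiring are placeholders to adapt to your own error handling):

    # Hypothetical wiring of the earlier sketches with the recommended values.
    # Once a full retry cycle fails, the circuit opens and stays open for the cool-down period.
    breaker = CircuitBreaker(failure_threshold=1, reset_timeout=300.0)

    def fetch_with_recommended_policy(url):
        # First retry after 5 seconds, doubling each time up to a 120-second cap,
        # with at most 10 retries before the error is surfaced and the circuit opens.
        return breaker.call(
            get_with_backoff, url, max_retries=10, base_delay=5.0, max_delay=120.0
        )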

Store Results and Process Asynchronously

Storing results and processing the data asynchronously decouples your pipelines from external data providers, like VTS.
Requests that are executed inline within a pipeline will limit the options around fault tolerance. When rate-limiting thresholds are reached or transient errors occur, a pipeline must halt and wait for these issues to resolve.
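
One way to sketch this separation, assuming raw payloads are written to local files (a message queue or object store would serve the same purpose, and fetch_page is the hypothetical helper from earlier):

    import json
    from pathlib import Path

    RAW_DIR = Path("raw_responses")  # Hypothetical landing area for unprocessed payloads.

    def ingest(url, page, page_size):
        """Pull one page from the external provider and persist the raw payload untouched."""
        payload = fetch_page(url, page, page_size)
        RAW_DIR.mkdir(exist_ok=True)
        (RAW_DIR / f"page_{page}.json").write_text(json.dumps(payload))

    def process_stored_pages():
        """Run downstream processing against the stored payloads, with no calls to the provider."""
        for path in sorted(RAW_DIR.glob("page_*.json")):
            payload = json.loads(path.read_text())
            # ... transform and load the payload into your own systems here ...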

Decoupling allows production issues to be tested, debugged, and resolved without impacting external systems or generating load on external data providers. In many integrations, a single client's daily data pull may be many gigabytes in size. Re-running a set of failing executions would have a non-trivial impact on external data providers.

Tightly coupled integration with external data providers does not allow for the processing of previously pulled data. A pipeline's only concept of the ingested data is the data’s current state as pulled from the external data provider. This greatly impacts reproducibility. Given a failure in the pipeline, developers require the exact data to be able to reproduce the problem. Subsequent executions of the pipeline may ingest different data, which may not exercise the problem areas of the pipeline and therefore make debugging difficult.


Interested in using our Next-Generation API Suite? Request Access by emailing [email protected]