Quick and Practical Introduction to RESTful APIs
Intro
This is a quick and rough introduction to RESTful APIs for new developers.
It should cover just enough to get you started while focusing only on the practical aspects of REST APIs.
Why?
Why should you build a RESTful API? The reason is, basically, that it’s easy to use and support such interfaces.
As Wikipedia defines it:
Representational state transfer (REST) is a software architectural style that was created to guide the design and development of the architecture for the World Wide Web.
Roy Fielding noticed that the concepts behind the early WWW can be applied not only to websites but to APIs too. Web-based systems share a few traits:
- They use the HTTP protocol, which is simple to implement
- Operations are limited to a few main methods (GET, DELETE, POST, PATCH/PUT)
- The web is stateless (at least most of the early web)
- The responses are cacheable
Using these insights he derived a few main guidelines to follow.
I would like to believe that REST-like interfaces are here to stay for a long time and will be a fundamental technique to build distributed APIs on the web.
The web itself is REST-like, and we’ve seen over the last 30 years or so that it works quite well, so it’s reasonable to believe that APIs will follow the same path.
Architectural Constraints
These constraints (guidelines) are (were):
- Client-server architecture
- Statelessness
- Cacheability
- Layered system
- Code on demand
- Uniform interface:
  - Resource identification in requests
  - Resource manipulation through representations
  - Self-descriptive messages
  - Hypermedia as the engine of application state (HATEOAS)
There is no need to explain their original definitions here; for modern APIs, they roughly translate to the guidelines outlined below.
Modern guidelines
So what does all of that mean? Let’s rework these rules into something easy to understand.
Use HTTP protocol
The HTTP protocol is simple and well-understood. There is a lot of infrastructure to facilitate HTTP-based APIs (client and server libraries), so it’s easy to use.
Almost all programming languages support the HTTP protocol out of the box, which isn’t the case with RPC-based frameworks like RMI.
Resources should have well-defined locations (URIs)
First of all, what is a resource? The easiest way to describe it is to say that a resource is an entity. It could be a User, Order, LogEntry, etc.
A resource should be identifiable, which means it should have an ID.
Next, each resource should have an easy-to-use URI (a URL, since we are using HTTP(S)). Here are some examples:
/users/{user-id}
/users/{user-id}/orders/{order-id}
The nesting of resources should reflect the logical structure:
- /users/ - a place to find all of the users
- /users/{user-id} - a place to find one specific user
- /users/{user-id}/orders - a place to retrieve all orders for one specific user
- /users/{user-id}/orders/{order-id} - retrieve one specific order for a specific user (and return a 404 error if the order with order-id doesn’t belong to that specific user)
- /orders - return all orders on the system
- /orders?status=completed - return all completed orders
This makes resources easy to find and retrieve. Naturally, in practice these paths are part of a full URL like https://mydomain.com/users/{user-id}.
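The nesting rules above can be sketched with a small in-memory lookup. This is only an illustration under made-up assumptions: the USERS and ORDERS dictionaries, the helper names, and the sample IDs are hypothetical and not part of any framework.

```python
# Hypothetical in-memory data, for illustration only.
USERS = {"123": {"id": "123", "name": "Alice"}}
ORDERS = {
    "AR203": {"id": "AR203", "user_id": "123", "status": "completed"},
    "AR204": {"id": "AR204", "user_id": "999", "status": "pending"},
}


def find_order_for_user(user_id, order_id):
    """Resolve /users/{user-id}/orders/{order-id}.

    Returns (status_code, body). The order must exist AND belong to the
    given user; otherwise the nested URL identifies nothing, so 404.
    """
    order = ORDERS.get(order_id)
    if user_id not in USERS or order is None or order["user_id"] != user_id:
        return 404, {"error": "not found"}
    return 200, order


def list_orders(status=None):
    """Resolve /orders and /orders?status=<status>."""
    orders = list(ORDERS.values())
    if status is not None:
        orders = [o for o in orders if o["status"] == status]
    return 200, orders
```

Note that an order that exists but belongs to another user gets the same 404 as an order that doesn’t exist at all.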
Use HTTP methods to facilitate different operations
Now that we know where our stuff is stored, we can start doing something with it.
For that, we use the following HTTP verbs:
GET
GET is used to retrieve content
- GET /users/ - get all of the users
- GET /users/123 - get one specific user
Please note that GET should be safe and idempotent: requesting the resource multiple times shouldn’t change its state, and as long as nothing else modifies the resource, it should always return the same resource.
Basically, besides logging, you should not make any changes on the system because of a GET request.
POST
POST - to create new content:
POST /users/ @CONTENT - create a new user with the details specified in the @CONTENT payload.
When handling POST requests, there are a few ways to handle the responses:
- Return the object immediately as a part of the response with its current state
- Return the “Location” header and redirect the user to the newly created resource
Both ways will work fine.
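Here is a minimal sketch that does both at once: it answers with the created object and a Location header. The create_user function, the in-memory USERS store, and the ID counter are hypothetical; the 201 status is the "Resource created" code that comes up later in this article.

```python
import itertools

_next_id = itertools.count(1)   # hypothetical ID generator
USERS = {}                      # hypothetical in-memory store


def create_user(payload):
    """Handle POST /users/ and return (status_code, headers, body)."""
    user_id = str(next(_next_id))
    # The server picks the ID; an "id" sent by the client is overwritten.
    user = {**payload, "id": user_id}
    USERS[user_id] = user
    # Location tells the client where the new resource lives.
    headers = {"Location": f"/users/{user_id}"}
    return 201, headers, user
```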
DELETE
DELETE - to remove entities/content:
- DELETE /users/123/ - delete a specific user from the system
- DELETE /users/1234/orders/AR203 - cancel an order after it was created
PUT/PATCH
PUT/PATCH - update/change existing contents.
PUT /users/123 [email protected] - update some specific value in the resource
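To see the verbs side by side, here is a rough sketch of a dispatcher for /users/{user-id} over an in-memory dictionary. Everything in it (the handle function, the USERS store, the sample user) is hypothetical. The split between PUT (replace the whole resource) and PATCH (change only the given fields) follows common HTTP usage and goes a bit beyond what the text above says.

```python
# Hypothetical in-memory store with one sample user.
USERS = {"123": {"id": "123", "name": "Alice", "email": "alice@example.com"}}


def handle(method, user_id, payload=None):
    """Dispatch a request for /users/{user-id} by HTTP method.

    Returns (status_code, body).
    """
    if user_id not in USERS:
        return 404, None
    if method == "GET":
        # Read-only: returns a copy and never modifies USERS.
        return 200, dict(USERS[user_id])
    if method == "PUT":
        # Replace the whole representation; the ID always comes from the URL.
        USERS[user_id] = {**payload, "id": user_id}
        return 200, dict(USERS[user_id])
    if method == "PATCH":
        # Change only the fields present in the payload.
        USERS[user_id] = {**USERS[user_id], **payload, "id": user_id}
        return 200, dict(USERS[user_id])
    if method == "DELETE":
        del USERS[user_id]
        return 204, None
    return 405, None  # method not supported for this resource
```

Calling handle("GET", "123") twice in a row gives the same answer and leaves the store untouched, which is exactly the property GET is supposed to have.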
HATEOAS
HATEOAS (Hypermedia as the engine of application state), in my opinion, is a bit too fancy for a regular developer to care about.
I’ll borrow an example from here:
GET http://api.domain.com/management/departments/10
{
"departmentId": 10,
"departmentName": "Administration",
"links": [
{
"href": "10/employees",
"rel": "employees",
"type" : "GET"
}
]
}
The links part shows us related resources that we can follow up on. This is useful if you are exploring the API, and it can provide additional context about what could be done with the entity.
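As a rough sketch of what building and consuming such links could look like, assuming the response shape from the example above (the two function names are made up for this illustration):

```python
def department_representation(department):
    """Server side: build a response body that links to related resources."""
    dept_id = department["departmentId"]
    return {
        **department,
        "links": [
            {"href": f"{dept_id}/employees", "rel": "employees", "type": "GET"},
        ],
    }


def follow(representation, rel):
    """Client side: look up the href for a given relation, or None."""
    for link in representation.get("links", []):
        if link["rel"] == rel:
            return link["href"]
    return None
```

The client only needs to know the relation name ("employees"); the server decides where that relation actually lives.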
There is a so-called Maturity Model for APIs (the Richardson Maturity Model):
- Level 0 - no RESTful aspects on the API
- Level 1 - API has a concept of resources
- Level 2 - HTTP Verbs are used to facilitate domain logic
- Level 3 - API is using HATEOAS
In practice, Level 2 is plenty. HATEOAS is usually overkill for an API that’s going to be used by 2-10 people. And that’s the most common use case, because most APIs are internal and are consumed by internal teams.
Burdening your developers with fancy stuff like links to sub-resources (somebody needs to build that stuff) is often too much, and a much simpler approach like “Hey Jack, where can I get a list of orders for the user?” is often a good enough solution.
Practical tips
There are some practical aspects of building RESTful APIs.
Use JSON
It might be an obvious one, but just use JSON to transfer the data.
JSON is as widely supported as HTTP itself and you will have fewer problems with your clients. Also, it’s easy for humans to read and understand (definitely better than ProtoBuf).
XML used to be an option 10-15 years ago, but these days it’s too verbose, and XML parsing libraries are quite heavy (dependency-wise) and sluggish.
Version API
There are two ways to version APIs:
- Use Content-Type with custom vendor extensions and include the version there, like vnd.mycompany.OrderV2+json
- Include the version as part of the URL: /api/v1/orders/
I find URL-based versioning better and easier because dealing with headers is often much more tedious.
Also, please note that if we introduce a new version of orders via the URL, it doesn’t mean that the other resources need to be moved/bumped too. There could be multiple versions active at the same time:
- /api/v1/orders/
- /api/v2/orders/
- /api/v1/users/
- etc.
The new version of the API can then be rolled out gradually on the client side while the old clients are still supported.
Furthermore, it’s a good idea to prefix the API service with /api to allow you to add non-API endpoints in the future and route/redirect them differently if there is a need.
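A minimal sketch of URL-based versioning, where each resource carries its own version and anything outside /api is left for other routes. The HANDLERS table, the route function, and the returned strings are hypothetical placeholders:

```python
import re

# Hypothetical handlers keyed by (version, resource). Orders already has
# a v2, while users is still on v1 only.
HANDLERS = {
    ("v1", "orders"): lambda: "orders, version 1",
    ("v2", "orders"): lambda: "orders, version 2",
    ("v1", "users"): lambda: "users, version 1",
}

_API_PATH = re.compile(r"^/api/(v\d+)/([a-z]+)/?$")


def route(path):
    """Return (status_code, body) for a versioned API path."""
    match = _API_PATH.match(path)
    if match is None:
        return 404, None  # not an API path (or malformed)
    handler = HANDLERS.get(match.groups())
    if handler is None:
        return 404, None  # this resource has no such version
    return 200, handler()
```

Old clients keep calling /api/v1/orders/ while new ones move to /api/v2/orders/, and /api/v1/users/ is not affected at all.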
Return the full resource
Quite often, you can hear people worrying about sending too much data over the wire. They start introducing all sorts of Data Transfer Objects and stop working with the original entities.
An argument that comes up quite often is
I only need a list of order IDs - why do I need to fetch all the other information from the database and then send it over the HTTP?
If you are following Domain-Driven Design (or at least if you are using Aggregates), you will have to retrieve the entire document from the database anyway. After that, there is very little extra cost in sending a few more kB over the wire.
Besides, you only have one URL for a given resource, and multiple clients with different needs will be accessing it. So while you might need only the ID of the Order, other users might need many more fields.
Normally, your HTTP client should have no problems handling 1-4MB responses. 4MB fits 1000 entries of 4kB each, and 4kB is a lot of text. Compressing that with gzip (and you should enable that on your web server) can reduce the size by up to around 10x for repetitive JSON. Also, here you would normally be using pagination :).
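If you want to check the effect of gzip on your own payloads, a quick experiment like the one below is enough. The 1000 entries here are synthetic and highly repetitive (the field names and values are made up), so they will compress better than real data would; treat the numbers it prints as an upper bound, not a promise.

```python
import gzip
import json

# 1000 hypothetical order entries; field names are made up for this sketch.
orders = [
    {"id": i, "status": "pending", "total": 12, "note": "x" * 100}
    for i in range(1000)
]

raw = json.dumps(orders).encode("utf-8")
compressed = gzip.compress(raw)

print(f"raw: {len(raw)} bytes, gzipped: {len(compressed)} bytes")
```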
Finally, if you really want to limit the data to just the few fields you need, depending on your framework (or your skills), you can add an option like fields or something similar to return only the fields you need:
GET /orders?fields=id,status
[
  {
    "id": 1,
    "status": "pending"
  },
  {
    "id": 2,
    "status": "pending"
  }
]
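If your framework doesn’t offer this, a hand-rolled version is only a few lines. This is a sketch with a hypothetical select_fields helper and sample data; silently ignoring unknown field names is a choice made here for simplicity, and rejecting them with an error would be just as valid.

```python
# Hypothetical sample data.
ORDERS = [
    {"id": 1, "status": "pending", "total": 12},
    {"id": 2, "status": "pending", "total": 30},
]


def select_fields(resources, fields_param=None):
    """Apply a ?fields=a,b query parameter to a list of resources.

    Without the parameter, the full resources are returned. Unknown field
    names are silently ignored.
    """
    if not fields_param:
        return resources
    wanted = [name.strip() for name in fields_param.split(",") if name.strip()]
    return [{k: r[k] for k in wanted if k in r} for r in resources]
```

The full resource stays the default, so clients that never heard of the fields option keep working.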
In the end, premature optimization is the root of all evil. Optimize your payloads when it’s actually needed.
Do not make API breaking changes
It is seldom a good idea to break an API for existing users. Let’s take a look at which changes are breaking and which are not.
What’s OK:
- Adding a new field
- Adding a new endpoint/resource
- Adding additional query parameters
What’s BAD:
- Renaming/removing an existing field
- Renaming/removing an existing URL/resource
- Changing the default Content-Type
- Renaming/removing existing query parameters
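For response fields, the rule boils down to a simple check: everything the old response had must still be there. A tiny sketch (the breaking_changes helper is hypothetical and looks only at field names, not at types or meaning):

```python
def breaking_changes(old_fields, new_fields):
    """Return the fields existing clients may rely on that the new response lacks."""
    return sorted(set(old_fields) - set(new_fields))
```

Adding "total" to a response with "id" and "status" yields an empty list, while renaming "status" to "state" reports "status" as missing, because from the client’s point of view a rename is a removal.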
Pitfalls to avoid
Some pitfalls are easy to notice and should be avoided.
Do not use verbs in the URL
If you find “RESTful” APIs with URLs like these:
POST /orders/create
GET /users/{user-id}/delete
It means they are failing to use URIs as a way to communicate the structure of the resources and HTTP verbs are crying in the corner because you are not using them.
Even HTTP clients will be confused, as the same GET request will start returning errors:
- GET /users/1/delete - OK
- GET /users/1/delete - Error. Not found.
HTTP and REST is not your domain layer
This is a tricky one. Just because you are getting some content from the client doesn’t mean that you have to accept it.
For example, let’s say there is an order:
{
"id": 1,
"total": 12,
"status": "pending"
}
and let’s say there is a business process that requires the order to be approved first before it can be shipped. The whole state machine looks like this:
pending -> approved -> handling -> shipped
So if your order is in “pending” status, and the client sends a request like this:
{
"status": "handling"
}
you will have to reject it.
And it would be the wrong approach to just add a bunch of ifs to the REST layer. You should facilitate a process like this:
order = findOrderById(id)
order = order.readyForHandling()
// OR
order = order.status("handling") // "handling" is retrieved from the request
and .readyForHandling() or .status() should throw an exception because we are trying to move the order to an illegal state.
Business process logic should be the domain logic and it shouldn’t be part of the RESTful interface.
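Here is one way this split could look, as a sketch rather than a reference design. The Order class, the IllegalTransition exception, and update_order_status are all hypothetical names. Answering with 409 for a rejected transition is a choice made for this sketch; the important part is that the REST layer only translates the domain error into an HTTP response.

```python
class IllegalTransition(Exception):
    """Raised when an order is asked to move to a status it cannot reach."""


class Order:
    # Allowed next status for each status, taken from the state machine above.
    _NEXT = {"pending": "approved", "approved": "handling", "handling": "shipped"}

    def __init__(self, order_id, status="pending"):
        self.id = order_id
        self.status = status

    def change_status(self, new_status):
        """Domain layer: the only place that knows the business rules."""
        if self._NEXT.get(self.status) != new_status:
            raise IllegalTransition(f"{self.status} -> {new_status} is not allowed")
        self.status = new_status
        return self


def update_order_status(order, payload):
    """REST layer: no business rules here, only translation to HTTP."""
    try:
        order.change_status(payload["status"])
    except IllegalTransition as exc:
        return 409, {"error": str(exc)}
    return 200, {"id": order.id, "status": order.status}
```

If the process changes tomorrow (say, an extra "packed" step), only the Order class needs to change.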
Supporting multiple content types
While it is often easy to add support for XML, JSON, and whatever else using frameworks (like Spring Boot), there is usually no need to do that.
Just having JSON is enough on 99% of occasions, and if by pure chance somebody starts using the XML version of the API, you will have to maintain it.
Just stick to one content type and do not change it.
Reinventing too complex security
In 95% of cases, it should be enough to use:
- HTTP Basic Authentication.
- HTTPS
HTTP Basic Auth should be in the format user_id:api_key. This way you will support all HTTP clients out of the box, and it will be super simple to use.
There is no need to do strange stuff like digests, hashing, encryption, and so on because HTTPS will handle that for you.
These days there is almost no excuse for not using HTTPS.
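For the curious, this is all there is to Basic Auth on the wire: the user_id:api_key pair is Base64-encoded and sent in the Authorization header. The sketch below uses only the Python standard library; the two function names are made up. Keep in mind that Base64 is an encoding, not encryption, which is exactly why HTTPS is part of the deal.

```python
import base64


def basic_auth_header(user_id, api_key):
    """Client side: build the Authorization header value."""
    credentials = f"{user_id}:{api_key}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


def parse_basic_auth(header_value):
    """Server side: return (user_id, api_key), or None if the header is malformed."""
    scheme, _, encoded = header_value.partition(" ")
    if scheme.lower() != "basic":
        return None
    try:
        decoded = base64.b64decode(encoded, validate=True).decode("utf-8")
    except ValueError:  # covers invalid Base64 and invalid UTF-8
        return None
    user_id, sep, api_key = decoded.partition(":")
    return (user_id, api_key) if sep else None
```

Most HTTP clients will build this header for you once you give them a username and a password, so you rarely have to write the client half yourself.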
Not using existing HTTP status codes
If every request responds with HTTP status code 200 or 500, you are missing opportunities to use the HTTP stack better.
Use the following codes:
- 201 - Resource created
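- 401 - Unauthenticated (officially named “Unauthorized”)
- 403 - Forbidden (authenticated, but not authorized to access the resource)
- 404 - No such resource
- 409 - Conflict (for example, the update was based on an out-of-date version of the resource)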
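One simple way to get there is to map application errors to status codes in a single place. The exception classes and the status_for helper below are hypothetical; HTTPStatus comes from the Python standard library.

```python
from http import HTTPStatus


# Hypothetical application errors.
class NotAuthenticated(Exception): ...
class NotAllowed(Exception): ...
class NotFound(Exception): ...
class Conflict(Exception): ...


_STATUS_FOR = {
    NotAuthenticated: HTTPStatus.UNAUTHORIZED,  # 401
    NotAllowed: HTTPStatus.FORBIDDEN,           # 403
    NotFound: HTTPStatus.NOT_FOUND,             # 404
    Conflict: HTTPStatus.CONFLICT,              # 409
}


def status_for(exc):
    """Map an application error to a status code; anything unknown is a 500."""
    return _STATUS_FOR.get(type(exc), HTTPStatus.INTERNAL_SERVER_ERROR)
```

Successful creation doesn’t go through this mapping; the POST handler simply answers with HTTPStatus.CREATED (201).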
Resources
- The initial dissertation can be found here
- A great resource by InfoQ
- RESTful Web APIs: Services for a Changing World
- RESTful Web Services