Foreword

What is the purpose of counting requests to your web application?

As I wrote in a previous post, knowing the number of requests helps you answer these important business questions:

  • Is anyone using my API? (if the request count is zero, then probably nobody is)
  • Is my API working? (if the request count is zero, then it’s probably broken)
  • Is my API under a DDoS attack? (if the request count during the last hour is much higher than average, then it probably is)

In my case it was a business need: every request with status code “200” to a specific method of my REST API added a few cents to our company’s bank account. That’s why we decided to dig into the matter.

First of all, we explored the existing paid and free monitoring tools. To make a long story short, none of them was a perfect fit.

Secondly, I googled for npm libraries that count requests. I found that in 90% of cases developers count requests for rate-limiting purposes. Rate limiting is a separate subject, unrelated to my task.

Roughly speaking, my task was to count all the requests grouped by methods and status codes.

Writing a middleware

My web app is a REST API written in Node.js + Express. To simplify things, here is the boilerplate:

const app = require('express')()

app.get('/api/', (req, res) => {
    res.sendStatus(200)
})

app.listen(3000, () => {
    console.log('Server started')
})

The only legitimate way to capture all the requests in the Express framework is to implement a middleware function and load it before any other handlers.

Quote from the official Express.js docs:

Middleware functions are functions that have access to the request object (req), the response object (res), and the next function in the application’s request-response cycle. The next function is a function in the Express router which, when invoked, executes the middleware succeeding the current middleware. Middleware functions can perform the following tasks:

  • Execute any code.
  • Make changes to the request and the response objects.
  • End the request-response cycle.
  • Call the next middleware in the stack.

If the current middleware function does not end the request-response cycle, it must call next() to pass control to the next middleware function. Otherwise, the request will be left hanging.

Just to understand what was happening in my app, I wrote the middleware function below and made several requests.

app.use((req, res, next) => {
   console.log(`${req.method} ${req.originalUrl}`) 
   next()
})

The results are

> curl http://localhost:3000/api
GET /api

> curl http://localhost:3000/api/
GET /api/

> curl http://localhost:3000/api?q=test
GET /api?q=test

Ok, it’s working. Let’s add the ability to capture the response status code. Node.js has a built-in event that is fired when the response has been sent. More specifically, this event is emitted when the last segment of the response headers and body has been handed off to the operating system for transmission over the network. This hook is res.on("finish").

I should note that not every request reaches the “finish” state; in real life a client can close the connection before the response is sent. In this case Node.js emits only the res.on("close") event. To keep this post as simple as possible, I decided to ignore these types of requests.

I modified my middleware to add the info about the response status code:

app.use((req, res, next) => {
   res.on("finish", () => {
       console.log(`${req.method} ${req.originalUrl} ${res.statusCode}`) 
   })
   next()
})

The results are

> curl http://localhost:3000/api
GET /api 200

> curl http://localhost:3000/api/
GET /api/ 200

> curl http://localhost:3000/api?q=test
GET /api?q=test 200

We captured the HTTP verb, the status code, and the original URL. As you can see, the originalUrl is different for each request, but the handler path is always the same: app.get('/api/'). Let’s capture the handler path instead of the originalUrl. It’s a bit tricky.

Express stores the data about the handler path in the req.route object. The object is filled with data only after the handler has processed the request. As mentioned above, the res.on("finish") hook is called after all the handlers have been executed and the response has been sent, so we should put the capturing code right inside res.on("finish"). We should also keep in mind that there may be requests without any matching handler, and we need to process them somehow.

I wrote a small helper function to get the correct handler path:

const getRoute = (req) => {
   const route = req.route ? req.route.path : '' // check if the handler exists
   const baseUrl = req.baseUrl ? req.baseUrl : '' // prepend the base url if the handler is a child of another handler

   return route ? `${baseUrl === '/' ? '' : baseUrl}${route}` : 'unknown route'
}
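To see how baseUrl comes into play, here is the helper exercised with hand-built mock request objects (the helper is reproduced so the snippet is self-contained; the mocks imitate what Express sets on req for a direct route, a route mounted on a router, and an unmatched request):

```javascript
const getRoute = (req) => {
    const route = req.route ? req.route.path : ''
    const baseUrl = req.baseUrl ? req.baseUrl : ''
    return route ? `${baseUrl === '/' ? '' : baseUrl}${route}` : 'unknown route'
}

// handler registered directly on the app: app.get('/api/')
console.log(getRoute({ route: { path: '/api/' }, baseUrl: '' }))          // /api/

// handler registered on a router mounted at /v1: router.get('/users/:id')
console.log(getRoute({ route: { path: '/users/:id' }, baseUrl: '/v1' }))  // /v1/users/:id

// no handler matched the request at all
console.log(getRoute({}))                                                 // unknown route
```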

And modified the middleware

app.use((req, res, next) => {
   res.on('finish', () => {
       console.log(`${req.method} ${getRoute(req)} ${res.statusCode}`) 
   })
   next()
})

Now the results are consistent

> curl http://localhost:3000/api
GET /api 200

> curl http://localhost:3000/api/
GET /api 200

> curl http://localhost:3000/api?q=test
GET /api 200

> curl http://localhost:3000/
GET unknown route 404

> curl -X POST http://localhost:3000/
POST unknown route 404

Data persistence

The last but not least step is storing the captured data. I decided to store it in the following format:

{
    "GET /stats/ 200": 11, // "route name": "number of requests"
    "GET /api/ 200": 7,
    "GET unknown route 404": 2,
    "POST unknown route 404": 1
}
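The counting itself boils down to building a "METHOD route status" key and incrementing its counter in a plain object. A minimal sketch of that logic (countEvent is a hypothetical helper name, not from the original app):

```javascript
// build the "METHOD route status" key and bump its counter
const countEvent = (stats, method, route, statusCode) => {
    const event = `${method} ${route} ${statusCode}`
    stats[event] = stats[event] ? stats[event] + 1 : 1
}

const stats = {}
countEvent(stats, 'GET', '/api/', 200)
countEvent(stats, 'GET', '/api/', 200)
countEvent(stats, 'POST', 'unknown route', 404)

console.log(stats) // { 'GET /api/ 200': 2, 'POST unknown route 404': 1 }
```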

For demo purposes we will store the statistics in a JSON file. Let’s add two helper methods to read and dump the data.

const fs = require('fs')
const FILE_PATH = 'stats.json'

// read json object from file
const readStats = () => {
    let result = {}
    try {
        result = JSON.parse(fs.readFileSync(FILE_PATH))
    } catch (err) {
        console.error(err)
    }
    return result
}

// dump json object to file
const dumpStats = (stats) => {
    try {
        fs.writeFileSync(FILE_PATH, JSON.stringify(stats), { flag: 'w+' })
    } catch (err) {
        console.error(err)
    }
}

Also, I modified the middleware to persist the statistics:

app.use((req, res, next) => {
    res.on('finish', () => {
        const stats = readStats()
        const event = `${req.method} ${getRoute(req)} ${res.statusCode}`
        stats[event] = stats[event] ? stats[event] + 1 : 1
        dumpStats(stats)
    })
    next()
})

And created a /stats endpoint that returns the statistics.

app.get('/stats/', (req, res) => {
    res.json(readStats())
})

We’re done, let’s make a few requests and check the stats.

> curl -X GET  http://localhost:3000/api/
> curl -X POST http://localhost:3000/api/
> curl -X PUT http://localhost:3000/api/
> curl http://localhost:3000/stats/
{
    "GET /api/ 200": 1,
    "POST unknown route 404": 1,
    "PUT unknown route 404": 1
}

As you can see, we have the number of requests for every route in our app. The whole code of this sample app can be found on GitHub.

Conclusion and next steps

In this post I described the basics of request counting. Keeping all your data in a file won’t work well in production; you should persist it somewhere more robust, such as a database: Redis, InfluxDB, ElasticSearch, MongoDB, etc. Personally, our Node.js + Express monitoring service SLAO uses an InfluxDB cluster.

Also, in the real world you’d like to have more detailed stats, like requests per day/hour/minute, along with the ability to view your data in a more convenient way than JSON returned by your API. What’s more, a plain dashboard with statistics is useless unless you have alerting attached to it. We’ll cover all these topics later on. For now, you can check out 📊 SLAO, which implements all of these features.
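For instance, per-hour stats can be obtained by prefixing each event key with a time bucket. A minimal sketch (bucketKey is a hypothetical helper of mine, not something SLAO exposes):

```javascript
// prefix the event key with the UTC hour it happened in,
// e.g. "2020-01-01T12 GET /api/ 200"
const bucketKey = (event, timestampMs) =>
    `${new Date(timestampMs).toISOString().slice(0, 13)} ${event}`

console.log(bucketKey('GET /api/ 200', Date.UTC(2020, 0, 1, 12, 34)))
// 2020-01-01T12 GET /api/ 200
```

Counting against these keys with the same increment logic as before yields one counter per route per hour, which a dashboard can then plot over time.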