**Disclaimer:** I'm not a security hardening expert. I am aware of basic security principles, but
this advice may or may not be best practice. This post is just about how I stopped a handful of
bots and other unwanted clients from accessing my server.
## I found so many bots in my nginx logs
So, I'm sitting in my favorite cafe on a chilly September evening. I decide to check my email, as
one does before starting any _real_ work. There are new messages! ... But it's all spam. _Sigh._
I decided to go check my web server logs to see if there was any suspicious activity there, too:
connections with weird user agent strings, bots or crawlers I didn't want, exploit attempts, stuff
like that. Of course, I found all three, in decently large quantities:
```
...
194.38.20.13 - - [22/Sep/2024:01:58:43 +0000] "GET /admin/php-ofc-library/ofc_upload_image.php HTTP/1.1" 404 162 "-" "ALittle Client" @afterlight.dev
...
5.188.86.25 - - [22/Sep/2024:02:32:37 +0000] "GET /.git/config HTTP/1.1" 500 42 "-" "Go-http-client/1.1" @164.90.154.173
...
51.222.253.13 - - [22/Sep/2024:03:04:57 +0000] "GET /repository/crucible/commit/f19ac218c5cca52179a7db77cf3e8b38d9b7b036/blob/executables/identify/CMakeLists.txt HTTP/1.1" 200 957 "-" "Mozilla/5.0 (compatible; AhrefsBot/7.0; +http://ahrefs.com/robot/)" @git.echowritescode.dev
...
```
Fortunately, these clients had to tell me their IP addresses if they expected my machine to respond
to them. So I had enough information to cut them off at the source.
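If you want to collect those addresses in bulk rather than eyeballing the log, a quick grep/awk pipeline does the trick. This is just a sketch: the exploit paths in the pattern are the ones from my own logs, and the sample file below stands in for `/var/log/nginx/access.log`:

```
# Two of the log lines from above, standing in for the real access.log.
cat > /tmp/access.sample.log <<'EOF'
194.38.20.13 - - [22/Sep/2024:01:58:43 +0000] "GET /admin/php-ofc-library/ofc_upload_image.php HTTP/1.1" 404 162 "-" "ALittle Client"
5.188.86.25 - - [22/Sep/2024:02:32:37 +0000] "GET /.git/config HTTP/1.1" 500 42 "-" "Go-http-client/1.1"
EOF

# Match known exploit-probe paths, take the client IP (first field),
# and deduplicate the result.
grep -E '\.git/config|ofc_upload_image\.php' /tmp/access.sample.log \
  | awk '{print $1}' | sort -u
```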
## Blocking IP addresses with ufw
Basic security says to block Known Bad Stuff™ as close to the network boundary as possible.
nginx can deny individual IP addresses, and _some_ bots might respect a
`robots.txt`, but a much better solution is to tell unwelcome guests that not only is nobody home,
there isn't even a home _here_.
The best way to accomplish this is with a firewall. Firewalls are pretty simple in concept: based
on a list of rules you specify, connections to your machine will be allowed or disallowed at the
kernel level. In practice, configuring a firewall can be very complicated, but blocking IP
addresses is usually not too hard.
Since I'm using a DigitalOcean droplet with Ubuntu 22.04 installed, the firewall I have is ufw
(which is really a wrapper around iptables, the Linux kernel's firewall). Here's how to block an IP
with ufw:
```
$ sudo ufw insert 1 deny from <the ip you want to block> comment '<the reason you want to block it>'
```
There are a few parts to this command:
- `sudo ufw`: hopefully it's obvious that only root can modify the firewall.
- `insert 1`: ufw rules work on a first-match basis. It's important that the `DENY` rules all come
before any blanket `ALLOW` rules, because the first rule that matches a request will be the one
applied; any later matching rules will be ignored. `insert 1` here means "apply this rule before
any other rules."
- `deny from <the ip you want to block>`: all requests from `<the ip you want to block>` will be
dropped immediately, as though there were no server here. This is in contrast to `reject from`,
which informs the other side that their request was acknowledged but denied.
- `comment '<the reason you want to block it>'`: I don't know about you, but once I configure
something on a server, I tend to not touch it again unless something breaks. That is very likely
to be months or years from the last time I touched it. Since I'll have no other context for why I
blocked those IPs, it's important to leave a little clue for myself.
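Putting the pieces together, the day-to-day workflow looks something like this (the IP and comment below are just examples based on my logs, and these commands need a machine with ufw installed):

```
# Block an offender, with a note to future me about why.
sudo ufw insert 1 deny from 194.38.20.13 comment 'probing for php exploits'

# List all rules with their positions; comments show up here too.
sudo ufw status numbered

# Remove a rule by its position if you blocked someone by mistake
# (ufw asks for confirmation before deleting).
sudo ufw delete 1
```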
## Bad and good things about this solution
Blocking IPs this way is a very brute-force tool. Arguably, if you're working on a service that
needs to be highly visible, you can't really afford to block every IP that annoys you, because some
of them might still access your site legitimately later. It's not unlikely at all for the same host
that keeps trying to `GET /eval.php?command=/bin/bash` to be a real user that just has a virus on
their computer, or is behind a router or VPN with another user that does, or is a legitimate server
with a floating IP.
It also scales very poorly if you're dealing with a very high ratio of unwanted traffic to
legitimate users. If I'm coffee'd up and in the zone, I can block maybe 10 IPs by hand in about 2
minutes. There are over 4 billion IPv4 addresses; even 1% of that is a lot of trips to the coffee
shop.
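One small mitigation for the scaling problem: ufw accepts CIDR notation, so if a whole subnet keeps misbehaving, a single rule can cover all of it. The range below is purely illustrative, not a recommendation to block anyone in particular:

```
# One rule covering all 256 addresses in a /24 subnet.
sudo ufw insert 1 deny from 5.188.86.0/24 comment 'persistent scanner subnet'
```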
On the other hand, it's good to have a hand on the root shutoff switch for any public-facing
service. Knowing how to say No, Stop It to requests that you don't want means that ultimately, you
decide who and what is allowed to access you and your work.
It's also a dead simple solution that doesn't require setting up any other services or keeping any
data besides the comments in your firewall rules. If you're only dealing with a small number of
persistent delinquents, Just Block Them is a neat, tidy, uncomplicated answer.
For my server, which at the moment is just a small machine hosting a handful of services for me
and my friends, I decided this solution was exactly the right level of effectiveness for how lazy
it allowed me to be. <small>(That may change in the future, in which case, expect an article about how to
set up `fail2ban` or something similar.)</small>
## Addendum: configuring nginx to log hostnames
A minor annoyance about nginx is that its default `combined` log format, used for `access.log`,
doesn't show any information about the virtual host that handled the request. Here's a simple
configuration change you can apply to `/etc/nginx/nginx.conf` to add the hostname to the end of
the log line:
```
http {
    ...
    log_format combined_with_host
        '$remote_addr - $remote_user [$time_local] '
        '"$request" $status $body_bytes_sent '
        '"$http_referer" "$http_user_agent" '
        '@$host';

    access_log /var/log/nginx/access.log combined_with_host;
    ...
}
```
Note that you can't apply this to `error.log` too; the `error_log` directive doesn't accept custom
formats.
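However you edit `nginx.conf`, it's worth validating the syntax before reloading, so a typo in the `log_format` block doesn't take the server down with it:

```
# -t tests the configuration without applying it; only reload if it passes.
sudo nginx -t && sudo systemctl reload nginx
```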