# Investigate problems due to User-Agent using Bash

**Date:** 2019-08-18  
**Author:** Kees C. Bakker  
**Categories:** bash  
**Original:** https://keestalkstech.com/investigate-problems-due-to-user-agent-using-bash/

![Investigate problems due to User-Agent using Bash](https://keestalkstech.com/wp-content/uploads/2019/08/photo-1477519242566-6ae87c31d212.jpg)

---

Last week we had some problems with the Google Ads bot. It could not crawl a bunch of URLs, while the browser had no problem getting through. The only difference was the User-Agent. This sent us on a debugging journey through Cloudflare, gateways and micro-sites.

To assist us, we've created a small bash script to visit a URL and show the HTTP status code, the [Location header](https://developer.mozilla.org/nl/docs/Web/HTTP/Headers/Location) and some size information to find out which part of our setup was causing problems.

## Enter the bash

Let's use [cURL](https://curl.haxx.se/docs/manpage.html) and [grep](https://ss64.com/bash/grep-regex.html) to visit a URL and display the result:

```sh
#!/bin/bash

function visit {

  local url="$1";
  local userAgent="$2"

  echo -e "Visiting $url"
  echo -e "Using User-Agent: $userAgent, results in:";
  echo "--------------------------------";
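  # --verbose writes the response headers (prefixed with "< ") to stderr, so
  # stderr is merged into stdout and filtered down to the status line, the
  # Location header and the size summary. HTTP/2 responses use lowercase
  # header names, hence --ignore-case.
  curl \
    --user-agent "$userAgent" \
    --verbose \
    --write-out "\nHeader size: %{size_header}, Download size: %{size_download}\n" \
    --silent "$url" 2>&1 | \
  grep --extended-regexp --ignore-case "^(< HTTP|< Location|Header size)";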
  echo "--------------------------------";
  echo;
}
```

This function makes it easier to test and tinker.

## Calling it

Our bug has to do with the User-Agent (UA). When the Google Ads bot UA is used, an [HTTP 502 Bad Gateway](https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/502) is returned. But which gateway is it and why?

Let's call all the "stations" between the outside world and the micro-site with the two UAs and see where the responses start to differ.

```sh
UA_Chrome="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36";
UA_AdsBot="AdsBot-Google (+http://www.google.com/adsbot.html)";

URL="sea/C21/1AI/D01/?v=3.1&NavState=/_/N-1xv8Zmpk&kind=Regenlaars+(Dames)";

# Cloudflare
visit "https://www.wehkamp.nl/$URL" "$UA_Chrome";
visit "https://www.wehkamp.nl/$URL" "$UA_AdsBot";

# main gateway
visit "https://gateway.wehkamp.internal/$URL" "$UA_Chrome";
visit "https://gateway.wehkamp.internal/$URL" "$UA_AdsBot";

# local gateway
visit "https://pop-gateway.wehkamp.internal/$URL" "$UA_Chrome";
visit "https://pop-gateway.wehkamp.internal/$URL" "$UA_AdsBot";

# micro-site
visit "https://pop-site.wehkamp.internal/$URL" "$UA_Chrome";
visit "https://pop-site.wehkamp.internal/$URL" "$UA_AdsBot";
```

## What's the verdict?

Well, it turned out that our micro-site returned some rather big [CSP headers](https://developer.mozilla.org/en-US/docs/Web/HTTP/CSP) when a *non-standard* user-agent visited, increasing the size of the headers by 240%. This caused the local NGINX gateway to log `upstream sent too big header while reading response header from upstream` and to return a 502.
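A difference like this can be quantified with the same `%{size_header}` variable the `visit` function prints. The sketch below is an illustration under assumptions: `header_size` and `growth_percent` are hypothetical helpers, not part of the original script.

```sh
#!/bin/bash

# header_size (hypothetical helper): prints the size in bytes of the
# response headers for URL $1 when requested with User-Agent $2.
header_size() {
  curl --silent --output /dev/null \
    --user-agent "$2" \
    --write-out "%{size_header}" \
    "$1"
}

# growth_percent (hypothetical helper): prints how much larger $2 is than
# $1 as a whole percentage (integer arithmetic, so the result is truncated).
growth_percent() {
  if [ "$1" -eq 0 ]; then
    echo "baseline size is 0, cannot compute growth" >&2
    return 1
  fi
  echo $(( ($2 - $1) * 100 / $1 ))
}

# Possible usage, with the variables from the calls above:
#   chrome=$(header_size "https://pop-site.wehkamp.internal/$URL" "$UA_Chrome")
#   adsbot=$(header_size "https://pop-site.wehkamp.internal/$URL" "$UA_AdsBot")
#   echo "Headers grew by $(growth_percent "$chrome" "$adsbot")%"
```

Keeping the arithmetic separate from the request makes it possible to check the calculation without touching the network.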

Reconfiguring the local gateway to [use larger proxy buffers](https://stackoverflow.com/a/27551259/201482) fixed the problem immediately:

```nginx
proxy_buffer_size          128k;
proxy_buffers              4 256k;
proxy_busy_buffers_size    256k;
```

The next step is making sure our micro-site does not send different CSP headers to the Google Ads bot, but that will take more effort.
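To see how the CSP headers differ per User-Agent, the header itself can be pulled out of the response and compared. This is a rough sketch only: `extract_csp` and `csp_for` are hypothetical helpers, and it assumes the micro-site sends the policy in a `Content-Security-Policy` response header.

```sh
#!/bin/bash

# extract_csp (hypothetical helper): reads raw response headers on stdin
# and prints the lines starting with Content-Security-Policy. Header names
# are matched case-insensitively, and the Report-Only variant matches too.
extract_csp() {
  grep --ignore-case "^content-security-policy" | tr -d '\r'
}

# csp_for (hypothetical helper): requests URL $1 with User-Agent $2,
# discards the body and prints only the CSP header line(s).
csp_for() {
  curl --silent --output /dev/null --dump-header - \
    --user-agent "$2" "$1" | extract_csp
}

# Possible usage in bash, with the variables from the calls above:
#   site="https://pop-site.wehkamp.internal/$URL"
#   diff <(csp_for "$site" "$UA_Chrome") <(csp_for "$site" "$UA_AdsBot")
```

An empty `diff` would suggest both User-Agents receive the same policy.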
