Files
nginx-unit/test/test_access_log.py
Alejandro Colomar fcff55acb6 HTTP: optimizing $request_line.
Don't reconstruct a new string for the $request_line from the parsed
method, target, and HTTP version, but rather keep a pointer to the
original memory where the request line was received.

This will be necessary for implementing URI rewrites, since we want to
log the original request line, and not one constructed from the
rewritten target.

This implementation changes behavior (only for invalid requests) in the
following way:

Previous behavior was to log as many tokens from the request line as
were parsed validly, thus:

Request              -> access log              ; error log

"GET / HTTP/1.1"     -> "GET / HTTP/1.1"     OK ; =
"GET   / HTTP/1.1"   -> "GET / HTTP/1.1"    [1] ; =
"GET / HTTP/2.1"     -> "GET / HTTP/2.1"     OK ; =
"GET / HTTP/1."      -> "GET / HTTP/1."     [2] ; "GET / HTTP/1. [null]"
"GET / food"         -> "GET / food"        [2] ; "GET / food [null]"
"GET / / HTTP/1.1"   -> "GET / / HTTP/1.1"  [2] ; =
"GET /  / HTTP/1.1"  -> "GET /  / HTTP/1.1" [2] ; =
"GET food HTTP/1.1"  -> "GET"                   ; "GET [null] [null]"
"OPTIONS * HTTP/1.1" -> "OPTIONS"           [3] ; "OPTIONS [null] [null]"
"FOOBAR baz HTTP/1.1"-> "FOOBAR"                ; "FOOBAR [null] [null]"
"FOOBAR / HTTP/1.1"  -> "FOOBAR / HTTP/1.1"     ; =
"get / HTTP/1.1"     -> "-"                     ; " [null] [null]"
""                   -> "-"                     ; " [null] [null]"

This behavior was rather inconsistent.  We have several options to go
forward with this patch:

-  NGINX behavior.

   Log the entire request line, up to '\r' | '\n', even if it was
   invalid.

   This is the most informative alternative.  However, RFC-complying
   requests will probably not send invalid requests.

   This information would be interesting to users where debugging
   requests constructed manually via netcat(1) or a similar tool, or
   maybe for debugging a client, are important.  It might be interesting
   to support this in the future if our users are interested; for now,
   since this approach requires looping over invalid requests twice,
   that's an overhead that we better avoid.

-  Previous Unit behavior

   This is relatively fast (almost as fast as the next alternative, the
   one we chose), but the implementation is ugly, in that we need to
   perform the same operation in many places around the code.

   If we want performance, probably the next alternative is better; if
   we want to be informative, then the first one is better (maybe in
   combination with the third one too).

-  Chosen behavior

   Only logging request lines when the request is valid.  For any
   invalid request, or even unsupported ones, the request line will be
   logged as "-".  Thus:

   Request              -> access log [4]

   "GET / HTTP/1.1"     -> "GET / HTTP/1.1"     OK
   "GET   / HTTP/1.1"   -> "GET   / HTTP/1.1"  [1]
   "GET / HTTP/2.1"     -> "-"                 [3]
   "GET / HTTP/1."      -> "-"
   "GET / food"         -> "-"
   "GET / / HTTP/1.1"   -> "GET / / HTTP/1.1"  [2]
   "GET /  / HTTP/1.1"  -> "GET /  / HTTP/1.1" [2]
   "GET food HTTP/1.1"  -> "-"
   "OPTIONS * HTTP/1.1" -> "-"
   "FOOBAR baz HTTP/1.1"-> "-"
   "FOOBAR / HTTP/1.1"  -> "FOOBAR / HTTP/1.1"
   "get / HTTP/1.1"     -> "-"
   ""                   -> "-"

   This is less informative than previous behavior, but considering how
   inconsistent it was, and that RFC-complying agents will probably not
   send us such requests, we're ready to lose that information in the
   log.  This is of course the fastest and simplest implementation we
   can get.

   We've chosen to implement this alternative in this patch.  Since we
   modified the behavior, this patch also changes the affected tests.

[1]:  Multiple successive spaces as a token delimiter is allowed by the
      RFC, but it is discouraged, and considered a security risk.  It is
      currently supported by Unit, but we will probably drop support for
      it in the future.

[2]:  Unit currently supports spaces in the request-target.  This is
      a violation of the relevant RFC (linked below), and will be fixed
      in the future, and consider those targets as invalid, returning
      a 400 (Bad Request), and thus the log lines with the previous
      inconsistent behavior would be changed.

[3]:  Not yet supported.

[4]:  In the error log, regarding the "log_routes" conditional logging
      of the request line, we only need to log the request line if it
      was valid.  It doesn't make sense to log "" or "-" in case that
      the request was invalid, since this is only useful for
      understanding decisions of the router.  In this case, the access
      log is more appropriate, which shows that the request was invalid,
      and a 400 was returned.  When the request line is valid, it is
      printed in the error log exactly as in the access log.

Link: <https://datatracker.ietf.org/doc/html/rfc9112#section-3>
Suggested-by: Liam Crilly <liam@nginx.com>
Reviewed-by: Zhidao Hong <z.hong@f5.com>
Cc: Timo Stark <t.stark@nginx.com>
Cc: Andrei Zeliankou <zelenkov@nginx.com>
Cc: Andrew Clayton <a.clayton@nginx.com>
Cc: Artem Konev <a.konev@f5.com>
Signed-off-by: Alejandro Colomar <alx@nginx.com>
2023-04-12 11:50:56 +02:00

305 lines
7.6 KiB
Python

import time
import pytest
from unit.applications.lang.python import TestApplicationPython
from unit.option import option
class TestAccessLog(TestApplicationPython):
prerequisites = {'modules': {'python': 'any'}}
def load(self, script):
super().load(script)
assert 'success' in self.conf(
f'"{option.temp_dir}/access.log"', 'access_log'
), 'access_log configure'
def set_format(self, format):
assert 'success' in self.conf(
{
'path': f'{option.temp_dir}/access.log',
'format': format,
},
'access_log',
), 'access_log format'
def wait_for_record(self, pattern, name='access.log'):
return super().wait_for_record(pattern, name)
def test_access_log_keepalive(self):
self.load('mirror')
assert self.get()['status'] == 200, 'init'
(resp, sock) = self.post(
headers={
'Host': 'localhost',
'Connection': 'keep-alive',
},
start=True,
body='01234',
read_timeout=1,
)
assert (
self.wait_for_record(r'"POST / HTTP/1.1" 200 5') is not None
), 'keepalive 1'
resp = self.post(sock=sock, body='0123456789')
assert (
self.wait_for_record(r'"POST / HTTP/1.1" 200 10') is not None
), 'keepalive 2'
def test_access_log_pipeline(self):
self.load('empty')
self.http(
b"""GET / HTTP/1.1
Host: localhost
Referer: Referer-1
GET / HTTP/1.1
Host: localhost
Referer: Referer-2
GET / HTTP/1.1
Host: localhost
Referer: Referer-3
Connection: close
""",
raw_resp=True,
raw=True,
)
assert (
self.wait_for_record(r'"GET / HTTP/1.1" 200 0 "Referer-1" "-"')
is not None
), 'pipeline 1'
assert (
self.wait_for_record(r'"GET / HTTP/1.1" 200 0 "Referer-2" "-"')
is not None
), 'pipeline 2'
assert (
self.wait_for_record(r'"GET / HTTP/1.1" 200 0 "Referer-3" "-"')
is not None
), 'pipeline 3'
def test_access_log_ipv6(self):
self.load('empty')
assert 'success' in self.conf(
{"[::1]:7080": {"pass": "applications/empty"}}, 'listeners'
)
self.get(sock_type='ipv6')
assert (
self.wait_for_record(
r'::1 - - \[.+\] "GET / HTTP/1.1" 200 0 "-" "-"'
)
is not None
), 'ipv6'
def test_access_log_unix(self, temp_dir):
self.load('empty')
addr = f'{temp_dir}/sock'
assert 'success' in self.conf(
{f'unix:{addr}': {"pass": "applications/empty"}}, 'listeners'
)
self.get(sock_type='unix', addr=addr)
assert (
self.wait_for_record(
r'unix: - - \[.+\] "GET / HTTP/1.1" 200 0 "-" "-"'
)
is not None
), 'unix'
def test_access_log_referer(self):
self.load('empty')
self.get(
headers={
'Host': 'localhost',
'Referer': 'referer-value',
'Connection': 'close',
}
)
assert (
self.wait_for_record(r'"GET / HTTP/1.1" 200 0 "referer-value" "-"')
is not None
), 'referer'
def test_access_log_user_agent(self):
self.load('empty')
self.get(
headers={
'Host': 'localhost',
'User-Agent': 'user-agent-value',
'Connection': 'close',
}
)
assert (
self.wait_for_record(
r'"GET / HTTP/1.1" 200 0 "-" "user-agent-value"'
)
is not None
), 'user agent'
def test_access_log_http10(self):
self.load('empty')
self.get(http_10=True)
assert (
self.wait_for_record(r'"GET / HTTP/1.0" 200 0 "-" "-"') is not None
), 'http 1.0'
def test_access_log_partial(self):
self.load('empty')
assert self.post()['status'] == 200, 'init'
resp = self.http(b"""GE""", raw=True, read_timeout=1)
time.sleep(1)
assert (
self.wait_for_record(r'"-" 400 0 "-" "-"') is not None
), 'partial'
def test_access_log_partial_2(self):
self.load('empty')
assert self.post()['status'] == 200, 'init'
self.http(b"""GET /\n""", raw=True)
assert (
self.wait_for_record(r'"-" 400 \d+ "-" "-"') is not None
), 'partial 2'
def test_access_log_partial_3(self):
self.load('empty')
assert self.post()['status'] == 200, 'init'
resp = self.http(b"""GET / HTTP/1.1""", raw=True, read_timeout=1)
time.sleep(1)
assert (
self.wait_for_record(r'"-" 400 0 "-" "-"') is not None
), 'partial 3'
def test_access_log_partial_4(self):
self.load('empty')
assert self.post()['status'] == 200, 'init'
resp = self.http(b"""GET / HTTP/1.1\n""", raw=True, read_timeout=1)
time.sleep(1)
assert (
self.wait_for_record(r'"-" 400 0 "-" "-"') is not None
), 'partial 4'
@pytest.mark.skip('not yet')
def test_access_log_partial_5(self):
self.load('empty')
assert self.post()['status'] == 200, 'init'
self.get(headers={'Connection': 'close'})
assert (
self.wait_for_record(r'"GET / HTTP/1.1" 400 \d+ "-" "-"')
is not None
), 'partial 5'
def test_access_log_get_parameters(self):
self.load('empty')
self.get(url='/?blah&var=val')
assert (
self.wait_for_record(
r'"GET /\?blah&var=val HTTP/1.1" 200 0 "-" "-"'
)
is not None
), 'get parameters'
def test_access_log_delete(self):
self.load('empty')
assert 'success' in self.conf_delete('access_log')
self.get(url='/delete')
assert self.search_in_log(r'/delete', 'access.log') is None, 'delete'
def test_access_log_change(self, temp_dir):
self.load('empty')
self.get()
assert 'success' in self.conf(f'"{temp_dir}/new.log"', 'access_log')
self.get()
assert (
self.wait_for_record(r'"GET / HTTP/1.1" 200 0 "-" "-"', 'new.log')
is not None
), 'change'
def test_access_log_format(self):
self.load('empty')
def check_format(format, expect, url='/'):
self.set_format(format)
assert self.get(url=url)['status'] == 200
assert self.wait_for_record(expect) is not None, 'found'
format = 'BLAH\t0123456789'
check_format(format, format)
check_format('$uri $status $uri $status', '/ 200 / 200')
def test_access_log_variables(self):
self.load('mirror')
# $body_bytes_sent
self.set_format('$uri $body_bytes_sent')
body = '0123456789' * 50
self.post(url='/bbs', body=body, read_timeout=1)
assert (
self.wait_for_record(fr'^\/bbs {len(body)}$') is not None
), '$body_bytes_sent'
def test_access_log_incorrect(self, temp_dir, skip_alert):
skip_alert(r'failed to apply new conf')
assert 'error' in self.conf(
f'{option.temp_dir}/blah/access.log',
'access_log/path',
), 'access_log path incorrect'
assert 'error' in self.conf(
{
'path': f'{temp_dir}/access.log',
'format': '$remote_add',
},
'access_log',
), 'access_log format incorrect'