nginx-unit

Author	SHA1	Message	Date
Zhidao HONG	14d6d97bac	HTTP: added basic URI rewrite. This commit introduced the basic URI rewrite. It allows users to change the request URI. Note that the "rewrite" option ignores any query it contains, and the query from the request is preserved. An example: "routes": [ { "match": { "uri": "/v1/test" }, "action": { "return": 200 } }, { "action": { "rewrite": "/v1$uri", "pass": "routes" } } ] Reviewed-by: Alejandro Colomar <alx@nginx.com>	2023-04-20 23:20:41 +08:00
Alejandro Colomar	fcff55acb6	HTTP: optimizing $request_line. Don't reconstruct a new string for the $request_line from the parsed method, target, and HTTP version, but rather keep a pointer to the original memory where the request line was received. This will be necessary for implementing URI rewrites, since we want to log the original request line, and not one constructed from the rewritten target. This implementation changes behavior (only for invalid requests) in the following way: Previous behavior was to log as many tokens from the request line as were parsed validly, thus: Request -> access log ; error log "GET / HTTP/1.1" -> "GET / HTTP/1.1" OK ; = "GET / HTTP/1.1" -> "GET / HTTP/1.1" [1] ; = "GET / HTTP/2.1" -> "GET / HTTP/2.1" OK ; = "GET / HTTP/1." -> "GET / HTTP/1." [2] ; "GET / HTTP/1. [null]" "GET / food" -> "GET / food" [2] ; "GET / food [null]" "GET / / HTTP/1.1" -> "GET / / HTTP/1.1" [2] ; = "GET / / HTTP/1.1" -> "GET / / HTTP/1.1" [2] ; = "GET food HTTP/1.1" -> "GET" ; "GET [null] [null]" "OPTIONS * HTTP/1.1" -> "OPTIONS" [3] ; "OPTIONS [null] [null]" "FOOBAR baz HTTP/1.1"-> "FOOBAR" ; "FOOBAR [null] [null]" "FOOBAR / HTTP/1.1" -> "FOOBAR / HTTP/1.1" ; = "get / HTTP/1.1" -> "-" ; " [null] [null]" "" -> "-" ; " [null] [null]" This behavior was rather inconsistent. We have several options to go forward with this patch: - NGINX behavior. Log the entire request line, up to '\r' \| '\n', even if it was invalid. This is the most informative alternative. However, RFC-complying requests will probably not send invalid requests. This information would be interesting to users where debugging requests constructed manually via netcat(1) or a similar tool, or maybe for debugging a client, are important. It might be interesting to support this in the future if our users are interested; for now, since this approach requires looping over invalid requests twice, that's an overhead that we better avoid. 
- Previous Unit behavior This is relatively fast (almost as fast as the next alternative, the one we chose), but the implementation is ugly, in that we need to perform the same operation in many places around the code. If we want performance, probably the next alternative is better; if we want to be informative, then the first one is better (maybe in combination with the third one too). - Chosen behavior Only logging request lines when the request is valid. For any invalid request, or even unsupported ones, the request line will be logged as "-". Thus: Request -> access log [4] "GET / HTTP/1.1" -> "GET / HTTP/1.1" OK "GET / HTTP/1.1" -> "GET / HTTP/1.1" [1] "GET / HTTP/2.1" -> "-" [3] "GET / HTTP/1." -> "-" "GET / food" -> "-" "GET / / HTTP/1.1" -> "GET / / HTTP/1.1" [2] "GET / / HTTP/1.1" -> "GET / / HTTP/1.1" [2] "GET food HTTP/1.1" -> "-" "OPTIONS * HTTP/1.1" -> "-" "FOOBAR baz HTTP/1.1"-> "-" "FOOBAR / HTTP/1.1" -> "FOOBAR / HTTP/1.1" "get / HTTP/1.1" -> "-" "" -> "-" This is less informative than previous behavior, but considering how inconsistent it was, and that RFC-complying agents will probably not send us such requests, we're ready to lose that information in the log. This is of course the fastest and simplest implementation we can get. We've chosen to implement this alternative in this patch. Since we modified the behavior, this patch also changes the affected tests. [1]: Multiple successive spaces as a token delimiter is allowed by the RFC, but it is discouraged, and considered a security risk. It is currently supported by Unit, but we will probably drop support for it in the future. [2]: Unit currently supports spaces in the request-target. This is a violation of the relevant RFC (linked below), and will be fixed in the future, and consider those targets as invalid, returning a 400 (Bad Request), and thus the log lines with the previous inconsistent behavior would be changed. [3]: Not yet supported. 
[4]: In the error log, regarding the "log_routes" conditional logging of the request line, we only need to log the request line if it was valid. It doesn't make sense to log "" or "-" in case that the request was invalid, since this is only useful for understanding decisions of the router. In this case, the access log is more appropriate, which shows that the request was invalid, and a 400 was returned. When the request line is valid, it is printed in the error log exactly as in the access log. Link: <https://datatracker.ietf.org/doc/html/rfc9112#section-3> Suggested-by: Liam Crilly <liam@nginx.com> Reviewed-by: Zhidao Hong <z.hong@f5.com> Cc: Timo Stark <t.stark@nginx.com> Cc: Andrei Zeliankou <zelenkov@nginx.com> Cc: Andrew Clayton <a.clayton@nginx.com> Cc: Artem Konev <a.konev@f5.com> Signed-off-by: Alejandro Colomar <alx@nginx.com>	2023-04-12 11:50:56 +02:00
Alejandro Colomar	1b05161107	Removed the unsafe nxt_memcmp() wrapper for memcmp(3). The casts are unnecessary, since memcmp(3)'s arguments are 'void *'. It might have been necessary in the times of K&R, where 'void *' didn't exist. Nowadays, it's unnecessary, and _very_ unsafe, since casts can hide all classes of bugs by silencing most compiler warnings. The changes from nxt_memcmp() to memcmp(3) were scripted: $ find src/ -type f \ | grep '\.[ch]$' \ | xargs sed -i 's/nxt_memcmp/memcmp/' Reviewed-by: Andrew Clayton <a.clayton@nginx.com> Signed-off-by: Alejandro Colomar <alx@nginx.com>	2022-11-04 00:30:27 +01:00
Andrew Clayton	4418f99cd4	Constified numerous function parameters. As was pointed out by the cppcheck[0] static code analysis utility we can mark numerous function parameters as 'const'. This acts as a hint to the compiler about our intentions and the compiler will tell us when we deviate from them. [0]: https://cppcheck.sourceforge.io/	2022-06-22 00:30:44 +02:00
Alejandro Colomar	952bcc50bf	Fixed #define style. We had a mix of styles for declaring function-like macros: Style A: #define \ foo() \ do { \ ... \ } while (0) Style B: #define foo() \ do { \ ... \ } while (0) We had a similar number of occurrences of each style: $ grep -rnI '^\w*(.*\\' | wc -l 244 $ grep -rn 'define.*(.*)' | wc -l 239 (Those regexes aren't perfect, but a very decent approximation.) Real examples: $ find src -type f | xargs sed -n '/^nxt_double_is_zero/,/^$/p' nxt_double_is_zero(f) \ (fabs(f) <= FLT_EPSILON) $ find src -type f | xargs sed -n '/define nxt_http_field_set/,/^$/p' #define nxt_http_field_set(_field, _name, _value) \ do { \ (_field)->name_length = nxt_length(_name); \ (_field)->value_length = nxt_length(_value); \ (_field)->name = (u_char *) _name; \ (_field)->value = (u_char *) _value; \ } while (0) I'd like to standardize on a single style for them, and IMO, having the identifier in the same line as #define is a better option for the following reasons: - Programmers are used to `#define foo() ...` (readability). - One less line of code. - The program for finding them is really simple (see below). function grep_ngx_func() { if (($# != 1)); then >&2 echo "Usage: ${FUNCNAME[0]} <func>"; return 1; fi; find src -type f \ | grep '\.[ch]$' \ | xargs grep -l "$1" \ | sort \ | xargs pcregrep -Mn "(?s)^\$[\w\s]+?^$1\(.*?^}"; find src -type f \ | grep '\.[ch]$' \ | xargs grep -l "$1" \ | sort \ | xargs pcregrep -Mn "(?s)define $1\(.*?^$" \ | sed -E '1s/^[^:]+:[0-9]+:/&\n\n/'; } $ grep_ngx_func Usage: grep_ngx_func <func> $ grep_ngx_func nxt_http_field_set src/nxt_http.h:98: #define nxt_http_field_set(_field, _name, _value) \ do { \ (_field)->name_length = nxt_length(_name); \ (_field)->value_length = nxt_length(_value); \ (_field)->name = (u_char *) _name; \ (_field)->value = (u_char *) _value; \ } while (0) $ grep_ngx_func nxt_sprintf src/nxt_sprintf.c:56: u_char * nxt_cdecl nxt_sprintf(u_char *buf, u_char *end, const char *fmt, ...) 
{ u_char *p; va_list args; va_start(args, fmt); p = nxt_vsprintf(buf, end, fmt, args); va_end(args); return p; } ................ Scripted change: ................ $ find src -type f \ | grep '\.[ch]$' \ | xargs sed -i '/define \\$/{N;s/ \\\n/ /;s/ //}'	2022-05-03 12:11:14 +02:00
Valentin Bartenev	fb80502513	HTTP parser: allowed more characters in header field names. Previously, all requests whose header field names contained characters other than alphanumerics, "-", or "_" were rejected with a 400 "Bad Request" error response. Now, the parser allows the same set of characters as specified in RFC 7230, including: "!", "#", "$", "%", "&", "'", "*", "+", ".", "^", "`", "|", and "~". Header field names that contain only these characters are considered valid. Also, there's a new option introduced: "discard_unsafe_fields". It accepts a boolean value and is set to "true" by default. When this option is "true", all header field names that contain characters in the valid range, but other than alphanumerics or "-", are skipped during parsing. When the option is "false", these header fields aren't skipped. Requests with characters in header field names that are invalid according to RFC 7230 are rejected regardless of the "discard_unsafe_fields" setting. This closes #422 issue on GitHub.	2020-11-17 16:50:06 +03:00
Max Romanov	6bda9b5eeb	Using malloc/free for the http fields hash. This is required due to lack of a graceful shutdown: there is a small gap between the runtime's memory pool release and router process's exit. Thus, a worker thread may start processing a request between these two operations, which may result in an http fields hash access and subsequent crash. To simplify issue reproduction, it makes sense to add a 2 sec sleep before exit() in nxt_runtime_exit().	2020-04-16 17:09:23 +03:00
Igor Sysoev	ddde9c23cf	Initial proxy support.	2019-11-14 16:39:54 +03:00
Valentin Bartenev	f7d3db314d	HTTP parser: removed unused "exten" field. This field was intended for MIME type lookup by file extension when serving static files, but this use case is too narrow; only a fraction of requests targets static content, and the URI presumably isn't rewritten. Moreover, current implementation uses the entire filename for MIME type lookup if the file has no extension. Instead of extracting filenames and extensions when parsing requests, it's easier to obtain them right before serving static content; this behavior is already implemented. Thus, we can drop excessive logic from parser.	2019-09-30 19:11:17 +03:00
Valentin Bartenev	2dbda125db	HTTP parser: normalization of paths ending with "." or "..". Earlier, the paths were normalized only if there was a "/" at the end, which is wrong according to section 5.2.4 of RFC 3986 and may hypothetically allow access to the directory above the document root.	2019-09-30 19:11:17 +03:00
Valentin Bartenev	6352c21a58	HTTP parser: fixed parsing of target after literal space character. In theory, all space characters in request target must be encoded; however, some clients may violate the specification. For the sake of interoperability, Unit supports unencoded space characters. Previously, if there was a space character before the extension or arguments parts, those parts weren't recognized. Also, quoted symbols and complex target weren't detected after a space character.	2019-09-17 18:40:21 +03:00
Valentin Bartenev	3b77e402a9	HTTP parser: removed unused "plus_in_target" flag.	2019-09-16 20:17:42 +03:00
Valentin Bartenev	2fb7a1bfb9	HTTP parser: removed unused "exten_start" and "args_start" fields.	2019-09-16 20:17:42 +03:00
Valentin Bartenev	64be8717bd	Configuration: added ability to access object members with slashes. Now URI encoding can be used to escape "/" in the request path: GET /config/listeners/unix:%2Fpath%2Fto%2Fsocket/	2019-09-16 20:17:42 +03:00
Max Romanov	29911538ea	Improving response header fields processing. Fields are filtered one by one before being added to fields list. This avoids adding and then skipping connection-specific fields.	2019-08-16 00:56:38 +03:00
Igor Sysoev	0ba7cfce75	Added routing based on header fields.	2019-05-30 15:33:51 +03:00
Andrey Zelenkov	22de5fcddf	Style.	2019-03-11 17:31:59 +03:00
Valentin Bartenev	11cecce114	HTTP parser: relaxed checking of fields values. Allowing characters up to 0xFF doesn't conflict with RFC 7230. Particularly, this makes it possible to pass unencoded UTF-8 data through HTTP headers, which can be useful.	2018-07-03 15:18:16 +03:00
Igor Sysoev	606eda045b	Removed '\r' and '\n' artifact macros.	2018-06-25 16:56:45 +03:00
Valentin Bartenev	41317e37da	HTTP parser: saving partial method. This is useful for log purposes.	2018-04-10 16:51:22 +03:00
Valentin Bartenev	8d697e8004	HTTP parser: saving unsupported version. This is useful for log purposes.	2018-04-10 16:51:22 +03:00
Valentin Bartenev	b1b9c78362	HTTP parser: correct "target" for partial or invalid request line.	2018-04-10 16:51:22 +03:00
Valentin Bartenev	d15b4ca906	Style.	2018-04-05 15:49:41 +03:00
Valentin Bartenev	0665896a55	Style: capitalized letters in hexadecimal literals.	2018-04-04 18:13:05 +03:00
Valentin Bartenev	701a54c177	HTTP parser: excluding leading and trailing tabs from field values. As required by RFC 7230.	2018-03-15 21:08:29 +03:00
Valentin Bartenev	0b628bfe48	HTTP parser: allowing tabs in field values as per RFC 7230.	2018-03-15 21:07:57 +03:00
Valentin Bartenev	3d2f85d9ca	HTTP parser: restricting allowed characters in fields values. According to RFC 7230 only printable 7-bit ASCII characters are allowed in field values.	2018-03-15 21:07:56 +03:00
Valentin Bartenev	5a003df1fe	HTTP parser: fixed parsing of field values ending with space. This closes #82 issue on GitHub.	2018-03-15 20:52:39 +03:00
Valentin Bartenev	7fe8f72364	HTTP parser: simplified nxt_http_parse_field_value(). There's no need for a loop after 4ac474b68658. Found by Coverity (CID 259713).	2018-01-25 10:31:22 +03:00
Valentin Bartenev	477e8177b7	HTTP parser: restricting control chars in header fields values. This also fixes an infinite loop here (found with honggfuzz).	2018-01-24 15:02:56 +03:00
Valentin Bartenev	0c38ff0e66	Checking for major HTTP version.	2018-01-15 20:50:20 +03:00
Valentin Bartenev	a073616fc3	Improved HTTP version representation.	2018-01-15 20:50:14 +03:00
Valentin Bartenev	3fb140d6d2	HTTP parser: improved error reporting.	2018-01-15 20:49:59 +03:00
Valentin Bartenev	e8aada94de	HTTP parser: allowing underscore in header field names.	2018-01-09 16:50:47 +03:00
Valentin Bartenev	45d08d5145	HTTP parser: introduced nxt_http_parse_fields().	2017-12-27 15:45:23 +03:00
Valentin Bartenev	95a9cb94d5	HTTP parser: fixed memory overflow in the collisions test. The level hash uses the NULL value as the indicator of a free entry in a bucket. So, inserting a NULL value breaks the hash and can lead to a bucket overflow. In case of the collision counter, the value wasn't initialized, since it's not needed for the purpose of checking collisions. As a result, it might contain any garbage from the stack and in some rare cases the value was NULL. Now the value is initialized.	2017-12-26 17:18:57 +03:00
Valentin Bartenev	8830d73261	HTTP parser: reworked header fields handling.	2017-12-25 17:04:22 +03:00
Valentin Bartenev	67d72d46f7	HTTP parser: improved detection of corrupted request line.	2017-12-08 19:18:00 +03:00
Valentin Bartenev	20d720dfc5	HTTP parser: slightly improved readability of code. As suggested by Igor Sysoev.	2017-12-08 19:18:00 +03:00
Max Romanov	f3107f3896	Complex target parser copied from NGINX. nxt_app_request_header_t fields renamed: - 'path' renamed to 'target'. - 'path_no_query' renamed to 'path' and contains parsed value.	2017-07-05 13:31:45 +03:00
Valentin Bartenev	dfd3cc8c0e	Applied nxt_pointer_to() and nxt_value_at() where possible.	2017-06-27 17:27:18 +03:00
Valentin Bartenev	accb489492	HTTP parser: reduced memory consumption of header fields list.	2017-06-20 22:32:13 +03:00
Igor Sysoev	f888a5310c	Using new memory pool implementation.	2017-06-20 19:49:17 +03:00
Valentin Bartenev	db6642f374	HTTP parser: decoupled header fields processing.	2017-06-13 20:11:29 +03:00
Valentin Bartenev	f6e7c2b6a6	HTTP parser: fixed handling header fields with missing colon.	2017-06-09 21:49:51 +03:00
Valentin Bartenev	dee819daab	HTTP parser: changed style of a comment. As requested by Igor.	2017-05-31 14:35:33 +03:00
Valentin Bartenev	ed38d86abb	Added missing "fall through" comments to make GCC 7 happy.	2017-05-10 19:19:14 +03:00
Valentin Bartenev	558d1f8687	HTTP parser: fixed minimum length optimization in headers hash.	2017-04-25 16:57:14 +03:00
Valentin Bartenev	5745e48264	More optimizations of HTTP parser. SSE 4.2 code removed, since loop unrolling gives better results.	2017-03-08 00:38:52 +03:00
Valentin Bartenev	4df646a258	HTTP parser.	2017-03-01 15:29:18 +03:00
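
Note on fb80502513 (header field names): the rules described in that commit message can be sketched roughly as follows. This is only an illustration under stated assumptions, not Unit's parser code. The names field_name_class_t, classify_field_name() and should_skip_field() are hypothetical and do not appear in the source tree. The character set is the RFC 7230 "token" set, and "unsafe" follows the commit's wording: valid, but containing something other than alphanumerics or "-".

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical classification of a header field name (illustrative only). */
typedef enum {
    FIELD_NAME_INVALID = 0,  /* has a character outside the RFC 7230 token set */
    FIELD_NAME_UNSAFE,       /* valid token, but not only alphanumerics and "-" */
    FIELD_NAME_SAFE          /* only alphanumerics and "-" */
} field_name_class_t;

static bool
is_alnum_ascii(unsigned char c)
{
    return (c >= '0' && c <= '9')
           || (c >= 'A' && c <= 'Z')
           || (c >= 'a' && c <= 'z');
}

/* RFC 7230 "tchar": alphanumerics plus a fixed set of punctuation. */
static bool
is_tchar(unsigned char c)
{
    if (is_alnum_ascii(c)) {
        return true;
    }

    switch (c) {
    case '!': case '#': case '$': case '%': case '&': case '\'':
    case '*': case '+': case '-': case '.': case '^': case '_':
    case '`': case '|': case '~':
        return true;
    default:
        return false;
    }
}

/* Classify a field name of the given length (no NUL terminator needed). */
field_name_class_t
classify_field_name(const char *name, size_t length)
{
    size_t              i;
    unsigned char       c;
    field_name_class_t  result = FIELD_NAME_SAFE;

    if (length == 0) {
        return FIELD_NAME_INVALID;  /* a token needs at least one character */
    }

    for (i = 0; i < length; i++) {
        c = (unsigned char) name[i];

        if (!is_tchar(c)) {
            return FIELD_NAME_INVALID;  /* request rejected regardless of option */
        }

        if (!is_alnum_ascii(c) && c != '-') {
            result = FIELD_NAME_UNSAFE;  /* e.g. "_" or "." in the name */
        }
    }

    return result;
}

/* With the option enabled, unsafe (but valid) fields are skipped. */
bool
should_skip_field(const char *name, size_t length, bool discard_unsafe_fields)
{
    return discard_unsafe_fields
           && classify_field_name(name, length) == FIELD_NAME_UNSAFE;
}
```

In this sketch, a name such as "X_Custom" is valid but unsafe, so it is skipped only while "discard_unsafe_fields" is true. A name containing a space is invalid in either mode, which corresponds to the 400 response mentioned in the commit message.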


52 Commits