Commit Graph

59 Commits

Author SHA1 Message Date
Andrew Clayton
3ecdd2c69c Isolation: Rename NXT_HAVE_CLONE -> NXT_HAVE_LINUX_NS.
Due to the need to replace our use of clone/__NR_clone on Linux with
fork(2)/unshare(2) for enabling Linux namespaces(7) to keep the
pthreads(7) API working.  Let's rename NXT_HAVE_CLONE to
NXT_HAVE_LINUX_NS, i.e name it after the feature, not how it's
implemented, then in future if we change how we do namespaces again we
don't have to rename this.

Reviewed-by: Alejandro Colomar <alx@nginx.com>
Signed-off-by: Andrew Clayton <a.clayton@nginx.com>
2023-02-17 21:24:18 +00:00
Andrew Clayton
867a839f10 Isolation: wired up per-application cgroup support internally.
This commit hooks into the cgroup infrastructure added in the previous
commit to create per-application cgroups.

It does this by adding each "prototype process" into its own cgroup,
then each child process inherits its parents cgroup.

If we fail to create a cgroup we simply fail the process. This behaviour
may get enhanced in the future.

This won't actually do anything yet. Subsequent commits will hook this
up to the build and config systems.

Reviewed-by: Alejandro Colomar <alx@nginx.com>
Signed-off-by: Andrew Clayton <a.clayton@nginx.com>
2022-12-10 14:00:20 +00:00
Max Romanov
900828cc4b Fixing isolated process PID manipulation.
Registering an isolated PID in the global PID hash is wrong
because it can be duplicated.  Isolated processes are stored only
in the children list until the response for the WHOAMI message is
processed and the global PID is discovered.

To remove isolated siblings, a pointer to the children list is
introduced in the nxt_process_init_t struct.

This closes #633 issue on GitHub.
2022-08-11 13:33:46 +01:00
Alejandro Colomar
c8d9106a0d Removed code used when NXT_HAVE_POSIX_SPAWN is false.
posix_spawn(3POSIX) was introduced by POSIX.1d
(IEEE Std 1003.1d-1999), and was later consolidated in
POSIX.1-2001, requiring it in all POSIX-compliant systems.
It's safe to assume it's always available, more than 20 years
after its standardization.

Link: <https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/spawn.h.html>
2022-07-18 19:09:30 +02:00
Max Romanov
9e2e69dd58 Fixing alerts on router restart.
Splitting the process type connectivity matrix to 'keep ports' and 'send
ports'; the 'keep ports' matrix is used to clean up unnecessary ports after
forking a new process, and the 'send ports' matrix determines which process
types expect to get created process ports.

Unfortunately, the original single connectivity matrix no longer works because
of an application stop delay caused by prototypes.  Existing applications
should not get the new router port at the moment.
2021-11-24 13:11:48 +03:00
Tiago Natel de Moura
e207415a78 Introducing application prototype processes. 2021-11-09 15:48:44 +03:00
Tiago Natel de Moura
1de660b6df Changed nxt_process_* for reuse.
This enables the reuse of process creation functions.
2021-11-09 15:48:44 +03:00
Max Romanov
80a8cb835b Preserving the app port write socket.
The socket is required for intercontextual communication in multithreaded apps.
2020-10-28 00:01:46 +03:00
Tiago Natel de Moura
a8a7eeb1fc Moved isolation related code to "nxt_isolation.c". 2020-08-20 15:22:58 +01:00
Max Romanov
2f3d27fa22 Process structures refactoring in runtime and libunit.
Generic process-to-process shared memory exchange is no more required.  Here,
it is transformed into a router-to-application pattern.  The outgoing shared
memory segments collection is now the property of the application structure.
The applications connect to the router only, and the process only needs to group
the ports.
2020-08-11 19:20:17 +03:00
Max Romanov
3cbc22a6dc Changing router to application port exchange protocol.
The application process needs to request the port from the router instead of the
latter pushing the port before sending a request to the application.  This is
required to simplify the communication between the router and the application
and to prepare the router to use the application shared port and then the queue.
2020-08-11 19:20:10 +03:00
Max Romanov
ec3389b63b Libunit refactoring: port management.
- Changed the port management callbacks to notifications, which e. g. avoids
the need to call the libunit function
- Added context and library instance reference counts for a safer resource
release
- Added the router main port initialization
2020-08-11 19:19:55 +03:00
Tiago Natel de Moura
f8ba5d6c00 Isolation: fixed build when features aren't detected. 2020-06-23 12:11:27 +01:00
Tiago Natel de Moura
e2b53e16c6 Added "rootfs" feature. 2020-05-28 14:57:41 +01:00
Tiago Natel de Moura
e9e5ddd5a5 Refactor of process management.
The process abstraction has changed to:

  setup(task, process)
  start(task, process_data)
  prefork(task, process, mp)

The prefork() occurs in the main process right before fork.

The file src/nxt_main_process.c is completely free of process
specific logic.

The creation of a process now supports a PROCESS_CREATED state.  The
The setup() function of each process can set its state to either
created or ready.  If created, a MSG_PROCESS_CREATED is sent to main
process, where external setup can be done (required for rootfs under
container).

The core processes (discovery, controller and router) doesn't need
external setup, then they all proceeds to their start() function
straight away.

In the case of applications, the load of the module happens at the
process setup() time and The module's init() function has changed
to be the start() of the process.

The module API has changed to:

  setup(task, process, conf)
  start(task, data)

As a direct benefit of the PROCESS_CREATED message, the clone(2) of
processes using pid namespaces now doesn't need to create a pipe
to make the child block until parent setup uid/gid mappings nor it
needs to receive the child pid.
2020-03-09 16:28:25 +00:00
Max Romanov
58cc13ab29 Resolving a racing condition while adding ports on the app's side.
An earlier attempt (ad6265786871) to resolve this condition on the
router's side added a new issue: the app could get a request before
acquiring a port.
2020-04-10 16:21:58 +03:00
Max Romanov
792ef9d3c7 Fixing 'find & add' racing condition in connected ports hash.
Missing error log messages added.
2020-04-06 16:52:11 +03:00
Tiago Natel
411daeaa53 Isolation: allowed the use of credentials with unpriv userns.
The setuid/setgid syscalls requires root capabilities but if the kernel
supports unprivileged user namespace then the child process has the full
set of capabilities in the new namespace, then we can allow setting "user"
and "group" in such cases (this is a common security use case).

Tests were added to ensure user gets meaningful error messages for
uid/gid mapping misconfigurations.
2019-12-06 16:52:50 +00:00
Tiago Natel
ed2492a66a Moved credential-related code to nxt_credential.c.
This is required to avoid include cycles, as some nxt_clone_* functions
depend on the credential structures, but nxt_process depends on clone
structures.
2019-12-06 13:28:05 +00:00
Tiago Natel
417cc7be7c Refactor of process init.
Introduces the functions nxt_process_init_create() and
nxt_process_init_creds_set().
2019-11-26 16:26:24 +00:00
Tiago Natel
2f23923e44 Changed the group listing to run unprivileged when possible.
Now the nxt_user_groups_get() function uses getgrouplist(3) when available
(except MacOS, see below).  For some platforms, getgrouplist() supports
a method of probing how much groups the user has but the behavior is not
consistent.  The method used here consists of optimistically trying to get up
to min(256, NGROUPS_MAX) groups; only if ngroups returned exceeds the original
value, we do a second call.  This method can block main's process if LDAP/NDIS+
is in use.

MacOS has getgrouplist(3) but it's buggy.  It doesn't update ngroups if the
value passed is smaller than the number of groups the user has.  Some
projects (like Go stdlib) call getgrouplist() in a loop, increasing ngroups
until it exceeds the number of groups user belongs to or fail when a limit
is reached.  For performance reasons, this is to be avoided and MacOS is
handled in the fallback implementation.

The fallback implementation is the old Unit approach.  It saves main's
user groups (getgroups(2)) and then calls initgroups(3) to load application's
groups in main, then does a second getgroups(2) to store the gids and restore
main's groups in the end.  Because of initgroups(3)' call to setgroups(2),
this method requires root capabilities.  In the case of OSX, which has
small NGROUPS_MAX by default (16), it's not possible to restore main's groups
if it's large; if so, this method fallbacks again: user_cred gids aren't
stored, and the worker process calls initgroups() itself and may block for
some time if LDAP/NDIS+ is in use.
2019-11-26 16:15:23 +00:00
Hong Zhi Dao
5d42599e33 Process port refactoring.
- Introduced nxt_runtime_process_port_create().
- Moved nxt_process_use() into nxt_process.c from nxt_runtime.c.
- Renamed nxt_runtime_process_remove_pid() as nxt_runtime_process_remove().
- Some public functions transformed to static.

This closes #327 issue on GitHub.
2019-10-29 16:07:21 +03:00
Tiago Natel
4a79e9631b Added clone syscall check for uid/gid mapping.
Now it's possible to pass -DNXT_HAVE_CLONE=0 for debugging.
2019-10-28 16:02:40 +00:00
Tiago Natel
23b94fde83 Improved error logging when uid/gid map is not properly set.
When using "credential: true", the new namespace starts with a completely
empty uid and gid ranges.  Then, any setuid/setgid/setgroups calls using ids
not properly mapped with uidmap and gidmap fields return EINVAL, meaning
the id is not valid inside the new namespace.
2019-10-22 14:46:15 +00:00
Valentin Bartenev
f2c0f2899a Refactored nxt_process_create() for more explicit pipe closing. 2019-09-26 16:03:02 +03:00
Valentin Bartenev
9c06bfdf2c Fixed descriptors leak on process creation.
The leak has been introduced in 325b315e48c4.
This closes #322 issue in GitHub.
2019-09-26 16:03:01 +03:00
Tiago de Bem Natel de Moura
c554941b4f Initial applications isolation support using Linux namespaces. 2019-09-19 15:25:23 +03:00
Max Romanov
6c694d4b47 Ignoring EPERM error when changing application process uid/gid.
This closes #228 issue on GitHub.
2019-03-22 15:32:58 +03:00
Max Romanov
5ef1352fae Initializing user_cred gids and ngroups for MacOS. 2018-09-19 18:53:16 +03:00
Max Romanov
903ee2de64 Misspelled variable names fixed. 2018-09-07 18:45:14 +03:00
Igor Sysoev
cb36b07686 Removing Unix control socket on start failure.
The bug had appeared in 5cc5002a788e when process type has been
converted to bitmask. This commit reverts the type back to a number.

This commit is related to #131 issue on GitHub.
2018-06-18 17:14:30 +03:00
Igor Sysoev
6273819080 Removed unused single process type. 2018-06-18 17:14:25 +03:00
Valentin Bartenev
912a49c609 Reduced number of critical log levels. 2018-03-05 17:32:50 +03:00
Sergey Kandaurov
771e9d3cc3 Fixed formatting in nxt_sprintf() and logging. 2018-01-24 15:16:33 +03:00
Max Romanov
6bbed85899 Fixing Coverity warnings.
CID 200496
CID 200494
CID 200490
CID 200489
CID 200483
CID 200482
CID 200472
CID 200465
2017-11-20 17:08:29 +03:00
Max Romanov
b3aab8c66f Filtering process to keep connection.
- Main process should be connected to all other processes.
- Controller should be connected to Router.
- Router should be connected to Controller and all Workers.
- Workers should be connected to Router worker thread ports only.

This filtering helps to avoid unnecessary communication and various errors
during massive application workers stop / restart.
2017-10-19 17:37:19 +03:00
Max Romanov
6532e46465 Supporting concurrent shared memory fd receive in router.
Two different router threads may send different requests to single
application worker.  In this case shared memory fds from worker
to router will be send over 2 different router ports.  These fds
will be received and processed by different threads in any order.

This patch made possible to add incoming shared memory segments in
arbitrary order.  Additionally, array and memory pool are no longer
used to store segments because of pool's single threaded nature.

Custom array-like structure nxt_port_mmaps_t introduced.
2017-10-19 17:36:56 +03:00
Max Romanov
e44401a0bb Introducing process use counter.
This helps to decouple process removal from port memory pool cleanups.
2017-10-04 15:02:11 +03:00
Max Romanov
6a64533fa3 Introducing use counters for port and app. Thread safe port write.
Use counter helps to simplify logic around port and application free.

Port 'post' function introduced to simplify post execution of particular
function to original port engine's thread.

Write message queue is protected by mutex which makes port write operation
thread safe.
2017-10-04 14:58:47 +03:00
Max Romanov
ba31199786 Removing mem_pool from port_hash interface.
Memory pool is not used by port_hash and it was a mistake to pass it into
'add' and 'remove' functions.  port_hash enrties are allocated from heap.
2017-10-04 14:57:56 +03:00
Max Romanov
838d9946ac Introducing named port message handlers to avoid misprints. 2017-09-15 20:30:34 +03:00
Igor Sysoev
58907888e5 Style fixes. 2017-09-06 02:30:55 +03:00
Igor Sysoev
f0e9e3ace9 nginext has been renamed to unit. 2017-08-31 00:42:16 +03:00
Igor Sysoev
9d487df10d The master process has been renamed to the main process. 2017-08-29 02:59:35 +03:00
Igor Sysoev
9aaa7d8c20 Added configure option --user=USER and --group=GROUP. 2017-08-26 13:37:44 +03:00
Max Romanov
f23f985899 Runtime processes protected with mutex. 2017-08-02 13:22:07 +03:00
Max Romanov
8ad2c3fd3a Work queue thread assertions. Reset thread after fork. 2017-07-18 00:21:17 +03:00
Max Romanov
803855138c Mem pool cleanup introduced.
Used for connection mem pool cleanup, which can be used by buffers.
Used for port mem pool to safely destroy linked process.
2017-07-18 00:21:16 +03:00
Max Romanov
eb675f2d78 Port allocation and destroy changed. Worker process stop introduced. 2017-07-18 00:21:14 +03:00
Max Romanov
b0c1e740cf New process port exchange changed. READY message type introduced.
Application process start request DATA message from router to master.
Master notifies router via NEW_PORT message after worker process become ready.
2017-07-12 20:32:16 +03:00