Commit Graph

64 Commits

Author SHA1 Message Date
Andrew Clayton
45c45eaeb4 Add per-application logging.
Currently when running in the foreground, unit application processes
will send stdout to the current TTY and stderr to the unit log file.

That behaviour won't change.

When running as a daemon, unit application processes will send stdout to
/dev/null and stderr to the unit log file.

This commit allows to alter the latter case of unit running as a daemon,
by allowing applications to redirect stdout and/or stderr to specific
log files. This is done via two new application options, 'stdout' &
'stderr', e.g

  "applications": {
      "myapp": {
          ...
          "stdout": "/path/to/log/unit/app/stdout.log",
          "stderr": "/path/to/log/unit/app/stderr.log"
      }
  }

These log files are created by the application processes themselves and
thus the log directories need to be writable by the user (and or group)
of the application processes.

E.g

  $ sudo mkdir -p /path/to/log/unit/app
  $ sudo chown APP_USER /path/to/log/unit/app

These need to be setup before starting unit with the above config.

Currently these log files do not participate in log-file rotation
(SIGUSR1), that may change in a future commit. In the meantime these
logs can be rotated using the traditional copy/truncate method.

NOTE:

You may or may not see stuff printed to stdout as stdout was
traditionally used by CGI applications to communicate with the
webserver.

Closes: <https://github.com/nginx/unit/issues/197>
Closes: <https://github.com/nginx/unit/issues/846>
Reviewed-by: Alejandro Colomar <alx@nginx.com>
Signed-off-by: Andrew Clayton <a.clayton@nginx.com>
2023-04-11 19:08:32 +01:00
Andrew Clayton
8f0192bb4c Remove some dormant code from nxt_process_quit().
In nxt_process_quit() there is a loop that iterates over the
task->thread->runtime->listen_sockets array and closes the connections.

This code has been there from the beginning

  $ git log --pretty=oneline -S'if (rt->listen_sockets != NULL)'
  e9e5ddd5a5 Refactor of process management.
  6f2c9acd18 Processes refactoring. The cycle has been renamed to the runtime.
  $ git log --pretty=oneline -S'if (cycle->listen_sockets != NULL) {'
  6f2c9acd18 Processes refactoring. The cycle has been renamed to the runtime.
  16cbf3c076 Initial version.

but never seems to have been used (AFAICT and certainly not recently,
confirmed by code inspection and running pytests with a bunch of
language modules enabled and the code in question was never executed) as
the listen_sockets array has never been populated... until now.

The previous commit now adds Unix domain sockets to this array so that
they can be unlink(2)'d upon exit and reconfiguration.

This has now caused this dormant code to become active as it now tries
to close these sockets (from at least the prototype processes), this
array is inherited via fork by other processes.

The file descriptor for these sockets is set to -1 when they are put
into this array. This then results in close(-1) calls which caused
multiple failures in the pytests such as

  >       assert not alerts, 'alert(s)'
  E       AssertionError: alert(s)
  E       assert not ['2023/03/09 23:26:14 [alert] 137673#137673 socket close(-1) failed (9: Bad file descriptor)']

I think the simplest thing is to just remove this code.

Reviewed-by: Alejandro Colomar <alx@nginx.com>
Signed-off-by: Andrew Clayton <a.clayton@nginx.com>
2023-03-17 04:03:56 +00:00
Andrew Clayton
5ed6eae718 Set a safer umask(2) when running as a daemon.
When running as a daemon. unit currently sets umask(0), i.e no umask.
This is resulting in various directories being created with a mode of
0777, e.g

  rwxrwxrwx

this is currently affecting cgroup and rootfs directories, which are
being created with a mode of 0777, and when running as a daemon as there
is no umask to restrict the permissions.

This also affects the language modules (the umask is inherited over
fork(2)) whereby unless something explicitly sets a umask, files and
directories will be created with full permissions, 0666 (rw-rw-rw-)/
0777 (rwxrwxrwx) respectively.

This could be an unwitting security issue.

My original idea was to just remove the umask(0) call and thus inherit
the umask from the executing shell/program.

However there was some concern about just inheriting whatever umask was
in effect.

Alex suggested that rather than simply removing the umask(0) call we
change it to a value of 022 (which is a common default), which will
result in directories and files with permissions at most of 0755
(rwxr-xr-x) & 0644 (rw-r--r--).

If applications need some other umask set, they can (as they always have
been able to) set their own umask(2).

Suggested-by: Alejandro Colomar <alx.manpages@gmail.com>
Reviewed-by: Liam Crilly <liam@nginx.com>
Signed-off-by: Andrew Clayton <a.clayton@nginx.com>
2023-02-24 15:48:15 +00:00
Andrew Clayton
96f33c0395 Remove the nxt_getpid() alias.
Since the previous commit, nxt_getpid() is only ever aliased to
getpid(2).

nxt_getpid() was only used once in the code, while there are multiple
direct uses of getpid(2)

  $ grep -r "getpid()" src/
  src/nxt_unit.c:    nxt_unit_pid = getpid();
  src/nxt_process.c:    nxt_pid = nxt_getpid();
  src/nxt_process.c:    nxt_pid = getpid();
  src/nxt_lib.c:    nxt_pid = getpid();
  src/nxt_process.h:#define nxt_getpid()                                                          \
  src/nxt_process.h:#define nxt_getpid()                                                          \
  src/nxt_process.h:    getpid()

Just remove it and convert the _single_ instance of nxt_getpid() to
getpid(2).

Reviewed-by: Alejandro Colomar <alx@nginx.com>
Signed-off-by: Andrew Clayton <a.clayton@nginx.com>
2023-02-17 21:24:18 +00:00
Andrew Clayton
b0e2d9d0a1 Isolation: Switch to fork(2) & unshare(2) on Linux.
On GitHub, @razvanphp & @hbernaciak both reported issues running the
APCu PHP module under Unit.

When using this module they were seeing errors like

  'apcu_fetch(): Failed to acquire read lock'

However when running APCu under php-fpm, everything was fine.

The issue turned out to be due to our use of SYS_clone breaking the
pthreads(7) API used by APCu.  Even if we had been using glibc's
clone(2) wrapper we would still have run into problems due to a known
issue there.

Essentially the problem is when using clone, glibc doesn't update the
TID cache, so the child ends up having the same TID as the parent and
that is used in various parts of pthreads(7) such as in the various
locking primitives, so when APCu was grabbing a lock it ended up using
the TID of the main unit process (rather than that of the php
application processes that was grabbing the lock).

So due to the above what was happening was when one of the application
processes went to grab either a read or write lock, the lock was
actually being attributed to the main unit process.  If a process had
acquired the write lock, then if a process tried to acquire a read or
write lock then glibc would return EDEADLK due to detecting a deadlock
situation due to thinking the process already held the write lock when
in fact it didn't.

It seems the right way to do this is via fork(2) and unshare(2).  We
already use fork(2) on other platforms.

This requires a few tricks to keep the essence of the processes the same
as before when using clone

  1) We use the prctl(2) PR_SET_CHILD_SUBREAPER option (if its
     available, since Linux 3.4) to make the main unit process inherit
     prototype processes after a double fork(2), rather than them being
     reparented to 'init'.

     This avoids needing to ^C twice to fully exit unit when running in
     the foreground.  It's probably also better if they maintain their
     parent child relationship where possible.

  2) We use a double fork(2) technique on the prototype processes to
     ensure they themselves end up in a new PID namespace as PID 1 (when
     CLONE_NEWPID is being used).

     When using unshare(CLONE_NEWPID), the calling process is _not_
     placed in the namespace (as discussed in pid_namespaces(7)).  It
     only sets things up so that subsequent children are placed in a PID
     namespace.

     Having the prototype processes as PID 1 in the new PID namespace is
     probably a good thing and matches the behaviour of clone(2).  Also,
     some isolation tests break if the prototype process is not PID 1.

  3) Due to the above double fork(2) the main unit process looses track
     of the prototype process ID, which it needs to know.

     To solve this, we employ a simple pipe(2) between the main unit and
     prototype processes and pass the prototype grandchild PID from the
     parent of the second fork(2) before exiting.  This needs to be done
     from the parent and not the grandchild, as the grandchild will see
     itself having a PID of 1 while the main process needs its
     externally visible PID.

Link: <https://www.php.net/manual/en/book.apcu.php>
Link: <https://sourceware.org/bugzilla/show_bug.cgi?id=21793>
Closes: <https://github.com/nginx/unit/issues/694>
Reviewed-by: Alejandro Colomar <alx@nginx.com>
Signed-off-by: Andrew Clayton <a.clayton@nginx.com>
2023-02-17 21:24:18 +00:00
Andrew Clayton
3ecdd2c69c Isolation: Rename NXT_HAVE_CLONE -> NXT_HAVE_LINUX_NS.
Due to the need to replace our use of clone/__NR_clone on Linux with
fork(2)/unshare(2) for enabling Linux namespaces(7) to keep the
pthreads(7) API working.  Let's rename NXT_HAVE_CLONE to
NXT_HAVE_LINUX_NS, i.e name it after the feature, not how it's
implemented, then in future if we change how we do namespaces again we
don't have to rename this.

Reviewed-by: Alejandro Colomar <alx@nginx.com>
Signed-off-by: Andrew Clayton <a.clayton@nginx.com>
2023-02-17 21:24:18 +00:00
Andrew Clayton
867a839f10 Isolation: wired up per-application cgroup support internally.
This commit hooks into the cgroup infrastructure added in the previous
commit to create per-application cgroups.

It does this by adding each "prototype process" into its own cgroup,
then each child process inherits its parents cgroup.

If we fail to create a cgroup we simply fail the process. This behaviour
may get enhanced in the future.

This won't actually do anything yet. Subsequent commits will hook this
up to the build and config systems.

Reviewed-by: Alejandro Colomar <alx@nginx.com>
Signed-off-by: Andrew Clayton <a.clayton@nginx.com>
2022-12-10 14:00:20 +00:00
Max Romanov
900828cc4b Fixing isolated process PID manipulation.
Registering an isolated PID in the global PID hash is wrong
because it can be duplicated.  Isolated processes are stored only
in the children list until the response for the WHOAMI message is
processed and the global PID is discovered.

To remove isolated siblings, a pointer to the children list is
introduced in the nxt_process_init_t struct.

This closes #633 issue on GitHub.
2022-08-11 13:33:46 +01:00
Alejandro Colomar
c8d9106a0d Removed code used when NXT_HAVE_POSIX_SPAWN is false.
posix_spawn(3POSIX) was introduced by POSIX.1d
(IEEE Std 1003.1d-1999), and was later consolidated in
POSIX.1-2001, requiring it in all POSIX-compliant systems.
It's safe to assume it's always available, more than 20 years
after its standardization.

Link: <https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/spawn.h.html>
2022-07-18 19:09:30 +02:00
Max Romanov
9e2e69dd58 Fixing alerts on router restart.
Splitting the process type connectivity matrix to 'keep ports' and 'send
ports'; the 'keep ports' matrix is used to clean up unnecessary ports after
forking a new process, and the 'send ports' matrix determines which process
types expect to get created process ports.

Unfortunately, the original single connectivity matrix no longer works because
of an application stop delay caused by prototypes.  Existing applications
should not get the new router port at the moment.
2021-11-24 13:11:48 +03:00
Tiago Natel de Moura
e207415a78 Introducing application prototype processes. 2021-11-09 15:48:44 +03:00
Tiago Natel de Moura
1de660b6df Changed nxt_process_* for reuse.
This enables the reuse of process creation functions.
2021-11-09 15:48:44 +03:00
Max Romanov
80a8cb835b Preserving the app port write socket.
The socket is required for intercontextual communication in multithreaded apps.
2020-10-28 00:01:46 +03:00
Tiago Natel de Moura
a8a7eeb1fc Moved isolation related code to "nxt_isolation.c". 2020-08-20 15:22:58 +01:00
Max Romanov
2f3d27fa22 Process structures refactoring in runtime and libunit.
Generic process-to-process shared memory exchange is no more required.  Here,
it is transformed into a router-to-application pattern.  The outgoing shared
memory segments collection is now the property of the application structure.
The applications connect to the router only, and the process only needs to group
the ports.
2020-08-11 19:20:17 +03:00
Max Romanov
3cbc22a6dc Changing router to application port exchange protocol.
The application process needs to request the port from the router instead of the
latter pushing the port before sending a request to the application.  This is
required to simplify the communication between the router and the application
and to prepare the router to use the application shared port and then the queue.
2020-08-11 19:20:10 +03:00
Max Romanov
ec3389b63b Libunit refactoring: port management.
- Changed the port management callbacks to notifications, which e. g. avoids
the need to call the libunit function
- Added context and library instance reference counts for a safer resource
release
- Added the router main port initialization
2020-08-11 19:19:55 +03:00
Tiago Natel de Moura
f8ba5d6c00 Isolation: fixed build when features aren't detected. 2020-06-23 12:11:27 +01:00
Tiago Natel de Moura
e2b53e16c6 Added "rootfs" feature. 2020-05-28 14:57:41 +01:00
Tiago Natel de Moura
e9e5ddd5a5 Refactor of process management.
The process abstraction has changed to:

  setup(task, process)
  start(task, process_data)
  prefork(task, process, mp)

The prefork() occurs in the main process right before fork.

The file src/nxt_main_process.c is completely free of process
specific logic.

The creation of a process now supports a PROCESS_CREATED state.  The
The setup() function of each process can set its state to either
created or ready.  If created, a MSG_PROCESS_CREATED is sent to main
process, where external setup can be done (required for rootfs under
container).

The core processes (discovery, controller and router) doesn't need
external setup, then they all proceeds to their start() function
straight away.

In the case of applications, the load of the module happens at the
process setup() time and The module's init() function has changed
to be the start() of the process.

The module API has changed to:

  setup(task, process, conf)
  start(task, data)

As a direct benefit of the PROCESS_CREATED message, the clone(2) of
processes using pid namespaces now doesn't need to create a pipe
to make the child block until parent setup uid/gid mappings nor it
needs to receive the child pid.
2020-03-09 16:28:25 +00:00
Max Romanov
58cc13ab29 Resolving a racing condition while adding ports on the app's side.
An earlier attempt (ad6265786871) to resolve this condition on the
router's side added a new issue: the app could get a request before
acquiring a port.
2020-04-10 16:21:58 +03:00
Max Romanov
792ef9d3c7 Fixing 'find & add' racing condition in connected ports hash.
Missing error log messages added.
2020-04-06 16:52:11 +03:00
Tiago Natel
411daeaa53 Isolation: allowed the use of credentials with unpriv userns.
The setuid/setgid syscalls requires root capabilities but if the kernel
supports unprivileged user namespace then the child process has the full
set of capabilities in the new namespace, then we can allow setting "user"
and "group" in such cases (this is a common security use case).

Tests were added to ensure user gets meaningful error messages for
uid/gid mapping misconfigurations.
2019-12-06 16:52:50 +00:00
Tiago Natel
ed2492a66a Moved credential-related code to nxt_credential.c.
This is required to avoid include cycles, as some nxt_clone_* functions
depend on the credential structures, but nxt_process depends on clone
structures.
2019-12-06 13:28:05 +00:00
Tiago Natel
417cc7be7c Refactor of process init.
Introduces the functions nxt_process_init_create() and
nxt_process_init_creds_set().
2019-11-26 16:26:24 +00:00
Tiago Natel
2f23923e44 Changed the group listing to run unprivileged when possible.
Now the nxt_user_groups_get() function uses getgrouplist(3) when available
(except MacOS, see below).  For some platforms, getgrouplist() supports
a method of probing how much groups the user has but the behavior is not
consistent.  The method used here consists of optimistically trying to get up
to min(256, NGROUPS_MAX) groups; only if ngroups returned exceeds the original
value, we do a second call.  This method can block main's process if LDAP/NDIS+
is in use.

MacOS has getgrouplist(3) but it's buggy.  It doesn't update ngroups if the
value passed is smaller than the number of groups the user has.  Some
projects (like Go stdlib) call getgrouplist() in a loop, increasing ngroups
until it exceeds the number of groups user belongs to or fail when a limit
is reached.  For performance reasons, this is to be avoided and MacOS is
handled in the fallback implementation.

The fallback implementation is the old Unit approach.  It saves main's
user groups (getgroups(2)) and then calls initgroups(3) to load application's
groups in main, then does a second getgroups(2) to store the gids and restore
main's groups in the end.  Because of initgroups(3)' call to setgroups(2),
this method requires root capabilities.  In the case of OSX, which has
small NGROUPS_MAX by default (16), it's not possible to restore main's groups
if it's large; if so, this method fallbacks again: user_cred gids aren't
stored, and the worker process calls initgroups() itself and may block for
some time if LDAP/NDIS+ is in use.
2019-11-26 16:15:23 +00:00
Hong Zhi Dao
5d42599e33 Process port refactoring.
- Introduced nxt_runtime_process_port_create().
- Moved nxt_process_use() into nxt_process.c from nxt_runtime.c.
- Renamed nxt_runtime_process_remove_pid() as nxt_runtime_process_remove().
- Some public functions transformed to static.

This closes #327 issue on GitHub.
2019-10-29 16:07:21 +03:00
Tiago Natel
4a79e9631b Added clone syscall check for uid/gid mapping.
Now it's possible to pass -DNXT_HAVE_CLONE=0 for debugging.
2019-10-28 16:02:40 +00:00
Tiago Natel
23b94fde83 Improved error logging when uid/gid map is not properly set.
When using "credential: true", the new namespace starts with a completely
empty uid and gid ranges.  Then, any setuid/setgid/setgroups calls using ids
not properly mapped with uidmap and gidmap fields return EINVAL, meaning
the id is not valid inside the new namespace.
2019-10-22 14:46:15 +00:00
Valentin Bartenev
f2c0f2899a Refactored nxt_process_create() for more explicit pipe closing. 2019-09-26 16:03:02 +03:00
Valentin Bartenev
9c06bfdf2c Fixed descriptors leak on process creation.
The leak has been introduced in 325b315e48c4.
This closes #322 issue in GitHub.
2019-09-26 16:03:01 +03:00
Tiago de Bem Natel de Moura
c554941b4f Initial applications isolation support using Linux namespaces. 2019-09-19 15:25:23 +03:00
Max Romanov
6c694d4b47 Ignoring EPERM error when changing application process uid/gid.
This closes #228 issue on GitHub.
2019-03-22 15:32:58 +03:00
Max Romanov
5ef1352fae Initializing user_cred gids and ngroups for MacOS. 2018-09-19 18:53:16 +03:00
Max Romanov
903ee2de64 Misspelled variable names fixed. 2018-09-07 18:45:14 +03:00
Igor Sysoev
cb36b07686 Removing Unix control socket on start failure.
The bug had appeared in 5cc5002a788e when process type has been
converted to bitmask. This commit reverts the type back to a number.

This commit is related to #131 issue on GitHub.
2018-06-18 17:14:30 +03:00
Igor Sysoev
6273819080 Removed unused single process type. 2018-06-18 17:14:25 +03:00
Valentin Bartenev
912a49c609 Reduced number of critical log levels. 2018-03-05 17:32:50 +03:00
Sergey Kandaurov
771e9d3cc3 Fixed formatting in nxt_sprintf() and logging. 2018-01-24 15:16:33 +03:00
Max Romanov
6bbed85899 Fixing Coverity warnings.
CID 200496
CID 200494
CID 200490
CID 200489
CID 200483
CID 200482
CID 200472
CID 200465
2017-11-20 17:08:29 +03:00
Max Romanov
b3aab8c66f Filtering process to keep connection.
- Main process should be connected to all other processes.
- Controller should be connected to Router.
- Router should be connected to Controller and all Workers.
- Workers should be connected to Router worker thread ports only.

This filtering helps to avoid unnecessary communication and various errors
during massive application workers stop / restart.
2017-10-19 17:37:19 +03:00
Max Romanov
6532e46465 Supporting concurrent shared memory fd receive in router.
Two different router threads may send different requests to single
application worker.  In this case shared memory fds from worker
to router will be send over 2 different router ports.  These fds
will be received and processed by different threads in any order.

This patch made possible to add incoming shared memory segments in
arbitrary order.  Additionally, array and memory pool are no longer
used to store segments because of pool's single threaded nature.

Custom array-like structure nxt_port_mmaps_t introduced.
2017-10-19 17:36:56 +03:00
Max Romanov
e44401a0bb Introducing process use counter.
This helps to decouple process removal from port memory pool cleanups.
2017-10-04 15:02:11 +03:00
Max Romanov
6a64533fa3 Introducing use counters for port and app. Thread safe port write.
Use counter helps to simplify logic around port and application free.

Port 'post' function introduced to simplify post execution of particular
function to original port engine's thread.

Write message queue is protected by mutex which makes port write operation
thread safe.
2017-10-04 14:58:47 +03:00
Max Romanov
ba31199786 Removing mem_pool from port_hash interface.
Memory pool is not used by port_hash and it was a mistake to pass it into
'add' and 'remove' functions.  port_hash enrties are allocated from heap.
2017-10-04 14:57:56 +03:00
Max Romanov
838d9946ac Introducing named port message handlers to avoid misprints. 2017-09-15 20:30:34 +03:00
Igor Sysoev
58907888e5 Style fixes. 2017-09-06 02:30:55 +03:00
Igor Sysoev
f0e9e3ace9 nginext has been renamed to unit. 2017-08-31 00:42:16 +03:00
Igor Sysoev
9d487df10d The master process has been renamed to the main process. 2017-08-29 02:59:35 +03:00
Igor Sysoev
9aaa7d8c20 Added configure option --user=USER and --group=GROUP. 2017-08-26 13:37:44 +03:00