Background jobs with PHP and Resque: part 4, managing workers

This guide is intended for Linux and OS X users. Windows users will have to adapt some of the code to make it work.

Understanding the internals

Technically, a worker is a PHP process that will run indefinitely, always monitoring for new jobs to execute.

Pseudo-code of a worker’s internals:

while (true) {
    $jobs = pullData(); # Pull jobs from the queues

    foreach ($jobs as $class => $args) { # For each job found
        $job = new $class();
        $job->perform($args); # Execute it
    }
    sleep(300); # Then sleep for 5 minutes (300 seconds), and retry
}

All of this is already taken care of by php-resque. To create a worker, php-resque needs:

  • QUEUE : The names of the queues to poll
  • INTERVAL : The polling interval (waiting time in seconds between each poll). Defaults to 5 seconds.
  • APP_INCLUDE : Path to your application autoloader. Workers need to know where to find your job classes.
  • COUNT : Number of workers to create. All workers will have the same properties. Defaults to one worker.
  • REDIS_BACKEND : Address of your Redis server, formatted as hostname:port, e.g. 127.0.0.1:6379 or localhost:6379. Defaults to localhost:6379.
  • VERBOSE : Set to 1 to enable verbose mode, which prints basic debugging information
  • VVERBOSE : Set to 1 to enable very verbose mode, which prints detailed debugging information

The only mandatory parameter is QUEUE. To make a worker monitor multiple queues, separate the names with a comma, e.g. “achievement,notification”. The order matters: achievement will always be polled before notification, and all the jobs in achievement will be executed before the notification queue is checked.

Setting QUEUE to * will poll all the queues, in alphabetical order.
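To make the priority rule concrete, here is a small simulation. It is only a sketch: the queues are plain files in a temporary directory (an assumption made for illustration, php-resque keeps them in Redis), and next_queue is a hypothetical helper, not part of php-resque. It returns the first queue, in QUEUE order, that still has pending jobs.

```shell
# next_queue QUEUE_LIST DIR
# Prints the first queue of the comma-separated QUEUE_LIST that has pending jobs.
# Each queue is simulated by a file in DIR with one job per line.
# Only explicit queue names are handled here, not the * wildcard.
next_queue() {
    old_ifs=$IFS
    IFS=,
    for queue_name in $1; do
        if [ -s "$2/$queue_name" ]; then
            IFS=$old_ifs
            echo "$queue_name"
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}

queues_dir=$(mktemp -d)

# achievement is empty, so the worker falls through to notification.
: > "$queues_dir/achievement"
printf 'send-email\n' > "$queues_dir/notification"
next_queue "achievement,notification" "$queues_dir"

# As soon as achievement has a job, it wins again.
printf 'unlock-badge\n' > "$queues_dir/achievement"
next_queue "achievement,notification" "$queues_dir"

rm -rf "$queues_dir"
```

The first call prints notification and the second prints achievement, which is why a busy first queue can starve the ones listed after it.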

Workers have to be started from the CLI. You can’t create a worker through a browser, because:

  1. You can’t do background jobs through a browser
  2. PCNTL extension works only in CLI mode

Starting a worker

Workers can be created by running the file resque.php, located inside the php-resque folder.

In your terminal, navigate to your php-resque folder

cd /path/to/php-resque/

then run:

php resque.php

Obviously, this will not work, as the mandatory QUEUE parameter is missing. The code above will return the following error:

Set QUEUE env var containing the list of queues to work.

Instead of passing the parameters to PHP as arguments, we set them in the environment, and PHP reads them back with getenv.

The correct command to start a worker will then be:

QUEUE=notification php resque.php
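If the VAR=value prefix is new to you, the following sketch shows what it does. To keep it runnable anywhere, sh -c is used as a stand-in for php resque.php (that substitution is an assumption of the example). The variable exists only in the environment of the command it prefixes, and the calling shell is left untouched.

```shell
# Make sure QUEUE is not already set in this shell.
unset QUEUE

# Prefixing a command with VAR=value puts VAR in that command's environment only.
# `sh -c` stands in for `php resque.php`, which would read the value with getenv.
child_value=$(QUEUE=notification sh -c 'printf "%s" "$QUEUE"')
echo "child saw:  ${child_value}"

# The parent shell still has no QUEUE variable.
echo "parent has: ${QUEUE:-<unset>}"
```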

If VVERBOSE is enabled:

QUEUE=notification VVERBOSE=1 php resque.php

The terminal will output:

*** Starting worker KAMISAMA-MAC.local:84499:notification
** [23:48:18 2012-10-11] Registered signals
** [23:48:18 2012-10-11] Checking achievement
** [23:48:18 2012-10-11] Checking notification
** [23:48:18 2012-10-11] Sleeping for 5
** [23:48:23 2012-10-11] Checking achievement
** [23:48:23 2012-10-11] Checking notification
** [23:48:23 2012-10-11] Sleeping for 5
... etc ...

The worker will automatically be named KAMISAMA-MAC.local:84499:notification. It follows the convention hostname:process-id:queue-names.

Why not pass the parameters directly to PHP?

Answer #1

Because Resque is built that way. And php-resque is a port of Resque.

Answer #2

Because it lets us define only the parameters we want, in the order we want.

If PHP read the parameters as arguments, it would parse them in a fixed order, like regular function parameters (function($QUEUE, $INTERVAL, $COUNT)). You would then be unable to set $COUNT without setting $INTERVAL first. There are ways around that, but they require some extra code. When processing hundreds of jobs per second, there’s no room for extra code.
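A quick sketch of that flexibility, again with a stand-in for the worker: worker_stub is a hypothetical one-liner that prints the settings it sees, falling back to the defaults listed earlier (INTERVAL 5, COUNT 1). Named variables can be given in any order, and COUNT can be set without touching INTERVAL.

```shell
# Start from a clean slate so only the prefixes below matter.
unset QUEUE INTERVAL COUNT

# Hypothetical stand-in for the worker: prints its settings, with defaults.
worker_stub='printf "queue=%s interval=%s count=%s\n" "$QUEUE" "${INTERVAL:-5}" "${COUNT:-1}"'

# COUNT is set, INTERVAL is skipped, and the order of the variables is free.
COUNT=3 QUEUE=default sh -c "$worker_stub"
QUEUE=default COUNT=3 sh -c "$worker_stub"

# INTERVAL alone, COUNT left to its default.
INTERVAL=10 QUEUE=default sh -c "$worker_stub"
```

The first two commands print the same line, queue=default interval=5 count=3, and the third prints queue=default interval=10 count=1.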

Daemon-ify the workers

It’s great, your workers are running … as long as you keep your terminal window open. As you can see, the php command hogs the whole terminal window. When you stop the script, either by closing the window or with Ctrl+C, the worker stops too. We’ll need to bring out the as-old-as-the-world trick: append a & at the end.

QUEUE=notification php resque.php &

The command above runs php in the background and frees your terminal. But if you were using verbose mode, the output will be lost as soon as the terminal is closed. We’ll need to save that output somewhere before sending the worker to the background.

It’s also nice to throw in a nohup, so the command keeps running even after the user logs out. The QUEUE assignment has to come before nohup, otherwise nohup would try to run QUEUE=notification as a command:

QUEUE=notification nohup php resque.php &

Logging worker output

Let’s redirect the output to a file before sending the worker to the background:

QUEUE=notification nohup php resque.php >> /path/to/your/logfile.log 2>&1 &

With this, all the standard and error output will be written to logfile.log. To monitor the file content:

tail -F /path/to/your/logfile.log
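The background, nohup and logging steps can be wrapped in a small helper. This is only a sketch: start_bg is a hypothetical function name, and it accepts any command so you can try it without a worker. It prints the PID of the background process, which comes in handy later when you want to stop it.

```shell
# start_bg LOGFILE COMMAND [ARGS...]
# Runs COMMAND detached from the terminal, appends its standard and error
# output to LOGFILE, and prints the PID of the background process.
start_bg() {
    log_file=$1
    shift
    nohup "$@" >> "$log_file" 2>&1 &
    echo "$!"
}

# Usage with a worker (env sets QUEUE for the php process):
#   pid=$(start_bg /path/to/your/logfile.log env QUEUE=notification php resque.php)
```

Passing the queue through env keeps the helper generic: everything after the log file is executed as-is by nohup.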

Worker permissions

Whatever you run in the terminal runs as the currently logged-in user. If you’re logged in as mathieu, php resque.php will run with mathieu’s permissions. The same applies if you’re logged in as root.

To avoid permission issues, always create your workers under your web server’s user. Apache usually runs as www-data. To start your workers as www-data:

nohup sudo -u www-data QUEUE=notification php resque.php >> /path/to/your/logfile.log 2>&1 &

Possible permissions issues:

  • Files created by your workers (running as user A) can’t be read by the rest of your PHP code (running as user B).
  • Workers don’t have permission to create files, or to edit files created by the rest of your application.

Let’s play

Let’s finish this section with various examples.

For simplicity, the permissions, file logging and background parts (nohup sudo -u USER and >> /path/to/your/logfile.log 2>&1 &) are dropped from the following examples.

To create a worker polling the default queue every 10 seconds:

INTERVAL=10 QUEUE=default php resque.php

To create 5 workers polling the default queue every 5 seconds:

QUEUE=default COUNT=5 php resque.php

The INTERVAL parameter is not needed, as 5 seconds is already the default value.

To create a worker polling the queues achievement and notification:

QUEUE=achievement,notification php resque.php

Remember, queue name order defines their priority. achievement will always be checked before notification.

To create a worker polling all the existing queues, in alphabetical order:

QUEUE=* php resque.php

If your Redis is located at a different address:

QUEUE=default REDIS_BACKEND=192.168.1.56:6380 php resque.php

To pass your application autoloader:

QUEUE=default APP_INCLUDE=/path/to/autoloader.php php resque.php

The importance of the autoloader will be covered later.

Tip: You can log the output of each php resque.php command to a different log file.

Ensure your workers were created successfully

Since you’re redirecting the output, you won’t know immediately if your worker creation failed. One way to verify it is to monitor your log file: a successful worker creation should output *** Starting worker YOURHOSTNAME:PID:queuename.

You can also check your system processes. This lists all your active workers:

ps u | grep resque.php

Output looks like:

your-user 86681   0.0  0.1  2470332   4712 s002  S     2:03AM   0:00.01 php ./resque.php
your-user 86680   0.0  0.1  2470332   4712 s002  S     2:03AM   0:00.01 php ./resque.php

The second column is the worker PID.

It’s a good way to check whether some of your workers died at some point.

Sometimes you may see more processes than you have workers. This happens only when you have the pcntl extension: it means that a worker has forked another process to execute a job.
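If you only want the PIDs, the ps output can be filtered a little further. The helper below is a sketch (worker_pids is a hypothetical name): it reads ps u output on its standard input and prints the second column of every line mentioning resque.php, skipping the grep or awk process that may show up in the listing.

```shell
# Read `ps u` output on stdin and print the PID (second column) of every
# line that mentions resque.php, ignoring grep/awk helper processes.
worker_pids() {
    awk '/resque\.php/ && !/awk|grep/ { print $2 }'
}

# Usage:
#   ps u | worker_pids
```

Keep in mind that forked children may also appear in the listing, so the number of PIDs can be higher than the number of workers you started.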

Forking

On certain platforms, when a Resque worker reserves a job it immediately forks a child process. The child processes the job then exits. When the child has exited successfully, the worker reserves another job and repeats the process.

Why?

Because Resque assumes chaos.

Resque assumes your background workers will lock up, run too long, or have unwanted memory growth.

If Resque workers processed jobs themselves, it’d be hard to whip them into shape. Let’s say one is using too much memory: you send it a signal that says “shutdown after you finish processing the current job,” and it does so. It then starts up again – loading your entire application environment. This adds useless CPU cycles and causes a delay in queue processing.

Plus, what if it’s using too much memory and has stopped responding to signals?

Thanks to Resque’s parent / child architecture, jobs that use too much memory release that memory upon completion. No unwanted growth.

And what if a job is running too long? You’d need to kill -9 it then start the worker again. With Resque’s parent / child architecture you can tell the parent to forcefully kill the child then immediately start processing more jobs. No startup delay or wasted cycles.

The parent / child architecture helps us keep tabs on what workers are doing, too. By eliminating the need to kill -9 workers we can have parents remove themselves from the global listing of workers. If we just ruthlessly killed workers, we’d need a separate watchdog process to add and remove them to the global listing – which becomes complicated.

Workers instead handle their own state.

https://github.com/defunkt/resque#forking

Pausing and stopping a worker

To stop a worker, simply kill its process. You can find the process ID (PID) in the worker name (YOURHOSTNAME:PID:queuename). You can also get your workers’ PIDs by running ps u | grep resque.php. This method will not tell you which process corresponds to which worker, though.

To kill the worker with PID 86681, we’ll use

kill 86681

This command will immediately kill the child process, then exit. If the worker was in the middle of processing a job, it will not wait for the job to finish, and the job will be marked as failed.

There’s a smoother way to stop a worker: sending it a specific signal with the kill command. All the following commands require the pcntl extension.

With the pcntl extension, a worker understands the following signals:

  • QUIT – Wait for child to finish processing then exit
  • TERM / INT – Immediately kill child then exit
  • USR1 – Immediately kill child but don’t exit
  • USR2 – Pause worker, no new jobs will be processed
  • CONT – Resume worker.

When no signal is specified, kill sends TERM by default.
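Signal names are easy to mix up, so here is a sketch that maps readable actions to the signals listed above. Both worker_signal and control_worker are hypothetical helper names, and the action words are my own choice; only the signal names come from the list.

```shell
# Translate a readable action into the matching signal name.
worker_signal() {
    case "$1" in
        stop)       echo QUIT ;;  # wait for the child to finish, then exit
        force-stop) echo TERM ;;  # kill the child immediately, then exit
        kill-child) echo USR1 ;;  # kill the child but keep the worker
        pause)      echo USR2 ;;  # stop picking up new jobs
        resume)     echo CONT ;;  # start picking up jobs again
        *) echo "unknown action: $1" >&2; return 1 ;;
    esac
}

# control_worker ACTION PID
control_worker() {
    signal_name=$(worker_signal "$1") || return 1
    kill -s "$signal_name" "$2"
}

# Usage:
#   control_worker pause 86681
#   control_worker resume 86681
```

Using kill -s with a name rather than a number keeps the helper portable across systems.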

We can get the list of signals the system understands by running kill -l. On OS X, that returns:

 1) SIGHUP     2) SIGINT     3) SIGQUIT    4) SIGILL
 5) SIGTRAP    6) SIGABRT    7) SIGEMT     8) SIGFPE
 9) SIGKILL   10) SIGBUS    11) SIGSEGV   12) SIGSYS
13) SIGPIPE   14) SIGALRM   15) SIGTERM   16) SIGURG
17) SIGSTOP   18) SIGTSTP   19) SIGCONT   20) SIGCHLD
21) SIGTTIN   22) SIGTTOU   23) SIGIO     24) SIGXCPU
25) SIGXFSZ   26) SIGVTALRM 27) SIGPROF   28) SIGWINCH
29) SIGINFO   30) SIGUSR1   31) SIGUSR2

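We can use either the signal number or its name. The numbers above are the OS X ones; on most Linux systems some of them differ (USR1 is 10, USR2 is 12 and CONT is 18), so the names are the safer choice.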

Soft stopping workers

To let the job currently being processed finish before stopping the worker, use the QUIT signal.

kill -QUIT YOUR-WORKER-PID

We can also use the corresponding signal number:

kill -3 YOUR-WORKER-PID
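Since a soft stop can take a while, it is often useful to wait until the worker is really gone. The following is a sketch under my own assumptions: stop_and_wait is a hypothetical helper, and the one-second polling and the 30-second default timeout are arbitrary choices, not php-resque behaviour.

```shell
# stop_and_wait PID [SIGNAL] [TIMEOUT_SECONDS]
# Sends SIGNAL (QUIT by default) and polls once per second until the process
# is gone. Returns 0 if it exited within the timeout, 1 otherwise.
stop_and_wait() {
    target_pid=$1
    signal_name=${2:-QUIT}
    seconds_left=${3:-30}
    kill -s "$signal_name" "$target_pid" || return 1
    # kill -0 sends no signal, it only checks that the process still exists.
    while kill -0 "$target_pid" 2>/dev/null; do
        [ "$seconds_left" -gt 0 ] || return 1
        sleep 1
        seconds_left=$((seconds_left - 1))
    done
    return 0
}

# Usage:
#   stop_and_wait 86681 && echo "worker stopped cleanly"
```

If the function returns 1, the job is probably still running; you can wait longer or decide to send a harsher signal yourself.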

Kill only the child, but keep the worker

kill -USR1 YOUR-WORKER-PID # or kill -30 YOUR-WORKER-PID on OS X, kill -10 on Linux

Pause worker

kill -USR2 YOUR-WORKER-PID # or kill -31 YOUR-WORKER-PID on OS X, kill -12 on Linux

Resume a paused worker

kill -CONT YOUR-WORKER-PID # or kill -19 YOUR-WORKER-PID on OS X, kill -18 on Linux

Load distribution

You don’t necessarily have to queue and execute jobs on the same server. You can perfectly well queue jobs from server A and have workers on server B execute them. You’ll need to install php-resque and all your job classes on server B.

Next Time …

Now that you know how to start and stop a worker, in part 5, we’ll talk about jobs, and how to write job classes your workers can understand.