Background jobs with PHP and Resque: part 2, queue system

As explained in part 1, a queue is needed to store the jobs. Workers then poll this queue at a defined interval and execute the jobs they find.

Figure: queue system diagram

This system consists of three parts:

  • The pushers: push jobs onto the queues. A pusher can be anything, even a worker.
  • The queues: store the jobs in a strict order.
  • The workers: pull jobs from the queues and execute them.

Note that I used the terms push and pull, instead of add and get.

Push: operation in which an item is placed at the end of the queue.

Pull (or pop): operation in which the item at the start of the queue is returned, then removed.

Pushing always adds the item at the end, and pulling always takes it from the start. This guarantees that the first element added is the first element read and removed. Such a structure is called a FIFO (first-in-first-out) list, or queue. A stack, by contrast, removes the most recently added element first.

If jobs 1, 2 and 3 are added in that order, they will be executed in that same order. Obvious, but fundamental.
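To make the push and pull behaviour concrete, here is a minimal in-memory sketch using PHP's built-in SplQueue class. It only illustrates the FIFO ordering; it is not how php-resque stores jobs, and the job names are placeholders.

```php
<?php
// SplQueue is PHP's built-in FIFO queue (part of the SPL extension).
$queue = new SplQueue();

// Push: each job is placed at the end of the queue.
$queue->enqueue('Job 1');
$queue->enqueue('Job 2');
$queue->enqueue('Job 3');

// Pull: the job at the start of the queue is returned, then removed.
$executed = [];
while (!$queue->isEmpty()) {
    $executed[] = $queue->dequeue();
}

echo implode(', ', $executed), PHP_EOL; // Job 1, Job 2, Job 3
```

The jobs come out in exactly the order they went in, whatever happens between the pushes and the pulls.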

What’s a job?

A job is an order to execute a certain task. It tells the worker what to do, and gives it the context it needs. The worker has no idea what happened before it executes the job. If the job is to send an email, that is the only thing the worker knows. It doesn’t know why, in which context, and so on. Unless the job says so, it doesn’t even know who the email will be sent to.

If you called the send mail function from your main workflow, the sender would be the logged-in user. You would already have a lot of information at hand when reaching the send mail function call.

The worker starts from a clean slate: all it has is an order to call send mail. You should therefore pass the necessary information along with the job: send a mail to admin, from user, with the content “email content”.

It’s like having a new phone cable installed in your house. The phone company hires an independent contractor to do the work. He doesn’t know why you need a new phone cable; he just does the job with the information he’s been given.
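The send mail job described above could be sketched as follows. This is only an illustration of the idea that a job carries its own context: the SendMailJob class, its perform() method and the payload keys are hypothetical names chosen for this example, not php-resque's actual API (that comes in the next parts).

```php
<?php
// Hypothetical job class: everything it needs arrives through $args.
class SendMailJob
{
    public function perform(array $args): string
    {
        // A real job would send the email here; this sketch only builds a summary line.
        return sprintf('Mail to %s from %s: %s', $args['to'], $args['from'], $args['content']);
    }
}

// Pusher side: describe the job as plain data and serialize it for the queue.
$payload = json_encode([
    'class' => 'SendMailJob',
    'args'  => ['to' => 'admin', 'from' => 'user', 'content' => 'email content'],
]);

// Worker side: it starts from a clean slate and knows only what the payload says.
$job      = json_decode($payload, true);
$class    = $job['class'];
$instance = new $class();
$result   = $instance->perform($job['args']);

echo $result, PHP_EOL; // Mail to admin from user: email content
```

Note that the worker never touches the session or the logged-in user: if a piece of information is not in the payload, the worker simply does not have it.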

Storing jobs

Jobs must be stored in a queue. That storage must:

  • have a native queue data structure (to push and pull jobs)
  • be very fast
  • be shareable among multiple servers
  • be persistent

You can either use a storage backend that fulfills all these conditions, or use third-party tools like RabbitMQ or ØMQ to manage the queues. The first solution is less powerful, but it is lighter and already covers all our needs.

I’ll use Redis as storage for the queues. For the queue system, I will use php-resque, a PHP port of Resque, which was originally written in Ruby by GitHub.

Resque (pronounced like “rescue”) is a Redis-backed library for creating background jobs, placing those jobs on multiple queues, and processing them later.

Next time …

Part 3 will cover installing php-resque.