Let's Write a Genetic Algorithm


I’ll be assuming that you know what these terms mean:

  • Cost/objective function
  • Local/global optima

If you don’t, that’s what Wikipedia and weekends are for.

What Is It?

Genetic algorithms are an optimization technique loosely based on the ideas of evolution and mutation. They’re a metaheuristic, and can used as a generic framework for solving all sorts of problems.

They’re also stupid simple, and I’ll try to prove that to you in this post.

The goal is to find the optimal solution to a problem, or one that’s close to it.

I’ll be using a lot of terminology, but it’ll all be defined properly later. This section is just to get the basic idea.

To start, we generate a bunch of candidate solutions to the problem. Each candidate is called an individual in the spirit of biology. The set of all individuals is the population.

Every iteration of our algorithm (a.k.a each generation), we evaluate the fitness of each individual and keep some amount of the fittest individuals. The rest are culled, except for a few that we randomly select to introduce some diversity that will help us find the global optima.

We create the next generation by breeding the survivors to create children.

Our next big inspiration from biology is mutation. Some fraction of the children are mutated at birth to introduce more diversity in a population.

We keep on doing this until the average fitness of the population has converged to something or we’re bored.

Why Should You Care?

This technique is worth learning if you care about optimizing things, because:

  • Can be run in parallel
  • Easily distributed
  • Doesn’t require fancy tuning for the problem at hand
  • Derivative-free, i.e. can be used for problems with lots of discontinuities
  • Generic, easily adapted to new problems with minimal tweaking

How To Implement?

Terminology

We start here, since a lot of the barrier for learning these (for me) was the unfamiliar terminology.

Individual
A candidate solution for the problem.
Population
A set of individuals.
Fitness Function

A function that takes in an individual and returns a real number representing how good a solution the individual is.

This is is usually maximized, but we’ll be minimizing it. So “fitness” is a misnomer, and it’s really a cost function as we’ll be using it.

The Process

Lots of the following steps are independent of the specific problem you’re trying to solve, and I’ll mark the ones that are generic to illustrate that.

To make this more concrete, we’ll be trying to optimize few benchmark functions.

I picked the Rastrigin function because it was first on the list and was easy to type in Python.

We optimize in the interval between \([-5.12,5.12]\) because Wikipedia told me to.

def rastrigin(x, A=10):
    return 10 * len(x) + sum((i ** 2 - A * np.cos(2 * np.pi * i)) for i in x)


def sphere(x):
    return sum(i ** 2 for i in x)


def booth_fn(x):
    return (x[0] + 2 * x[1] - 7) ** 2 + (2 * x[0] + x[1] - 5) ** 2


f = rastrigin

Define the Individual

This is problem-specific. It should be a candidate solution to your problem.

In our case, we just want points in \(N\)-dimensional space, so we just use an array of length \(N\). We’re just going to work in 1-dimensional space to make the computation really fast, but feel free to try it in higher dimensions.

def individual(bound: float=BOUND) -> Individual:
    return np.random.uniform(low=-bound, high=bound, size=N)

Define Fitness Function

We’ll be calling this fitness instead of loss for the sake of consistency, but we’re going to be minimizing it.

In our case this is simple. It’s just the objective function we’re trying to minimize, the Rastrigin function.

def rastrigin(x, A=10):
    return 10 * len(x) + sum((i ** 2 - A * np.cos(2 * np.pi * i)) for i in x)


f = rastrigin
def fitness(individual: Individual) -> Real:
    return f(individual)

Create Population

This is barely problem-specific. You just create some number of individuals. I feel like you could write a more abstract function that makes this generic, but it’s easy to do yourself.

def create_population() -> Population:

    return [individual() for _ in range(POPULATION_SIZE)]

Find Average Fitness Of Population (generic)

This is literally just the mean fitness over the population.

def avg_fitness(population):
    return sum(fitness(individual) for individual in population) / len(population)

Cull (generic)

Now we get to the law of the jungle. Survival of the fittest. Or rather, death of the un-fittest.

We sort our population by fitness and keep SURVIVAL_RATE percent of them. A few lucky individuals also get spared for the sake of genetic diversity.

def cull(population) -> Population:

    population.sort(key=fitness)
    split_idx = ceil(SURVIVAL_RATE * len(population))

    # they survive
    fittest = population[:split_idx]
    reprieved = random.sample(
        population[split_idx:], floor(REPRIEVE_RATE * len(population[split_idx:])))

    return fittest + reprieved

Breed and Mutate

Breed

The fun (and tricky part): how to combine parents to get a child.

This is extremely problem-specific, and clever breeding (and mutation) will make the difference between success and failure. Figuring out good strategies comes with practice and understanding the problem domain.

I tried 4 different strategies and the one that worked best was not the first one I thought of.

But for us, we’re going to us a really simple strategy. We just pick the fittest individual out of a random sample and copy it.

Mutate

Mutation is also problem-specific, and as important as breeding. In our case, mutation will be the main method of solving the problem. We’ll perturb each array by some number EPSILON, and set the mutation rate high so that this happens often.

We use np.clip to avoid escaping the area we’re optimizing over.

def breed(parents: Population, mutation_rate: float=MUTATION_RATE) -> Individual:

    child = min(
        random.sample(parents, NUM_PARENTS),
        key=fitness, )

    if random.random() < MUTATION_RATE:
        child += np.clip(
            np.random.uniform(low=-EPSILON, high=EPSILON, size=N),
            a_min=-BOUND,
            a_max=BOUND, )

    return child

Repeat (generic)

Now it all comes together. We repeatedly cull and breed until our average fitness is close enough to optimal or we decide to stop after some number of iterations.

The key line is population = create_new_generation(cull(population)), where we cull and then create a new generation from the survivors.

def create_new_generation(population) -> Population:

    num_children = POPULATION_SIZE - len(population)
    num_parents = min(len(population), NUM_PARENTS)

    population += [breed(random.sample(population, k=num_parents)) for _ in range(num_parents)]
    return population


def evolve():
    population = create_population()
    FITNESS = avg_fitness(population)

    iter = 0

    while FITNESS > 1e-7 and iter < MAX_ITERS:
        population = create_new_generation(cull(population))
        FITNESS = avg_fitness(population)

        iter += 1

        if iter % 1000 == 0:
            best = min(population, key=fitness)
            print('--------------------------------------------------------------')
            print(f'FITNESS = {FITNESS}, iter = {iter}, BEST = {fitness(best)}')


    print(f'iter = {iter}')
    best = min(population, key=fitness)
    return best

How Well Does It Do?

In the case of our problem setup, it tends to do fairly well. Often it gets stuck at a local minima of about .995, but equally often it finds the global minima in anywhere from 2 (!!) to a few thousand iterations.

Tweaking hyperparameters

You sort of have to play around with parameters like population size, survival rate, mutation rate, and reprieve rate.

You could probably write another genetic algorithm to tweak the hyperparameters of your genetic algorithm. I’ll leave that as an exercise.

Takeaway

Hopefully genetic algorithms will stick in your head long enough for you to find a use for them. When that day comes, read this post again. Soon enough, you’ll be able to make genetic algorithms a part of your toolbox.

The Code

Here’s the entire source as a Gist.

Related Posts

Just because 2 things are dual, doesn't mean they're just opposites

Boolean Algebra, Arithmetic POV

discontinuous linear functions

Continuous vs Bounded

Minimal Surfaces

November 2, 2023

NTK reparametrization

Kate from Vancouver, please email me

ChatGPT Session: Emotions, Etymology, Hyperfiniteness

Some ChatGPT Sessions