Genetic algorithm with VBA

For more than a couple of years I’ve been dealing with a nightmare-like problem that I could not get solved with Excel.

The solution I was working with implies choosing between nearly infinite combinatorial possibilities. The problem in question has a lot of dependencies in a variable number of variables you can set, is an open problem in its definition, and I was looking for the optimal one. You bet I will never achieve it. I was also stuck wandering if I could face situations where several options were possible; and I couldn’t discard that as not pausible.

I considered GA as an alternative, but never put the enough time on studying the possibilities, considering I have not -or do not know, that is the same- a tool for GA in VBA. Times I spend  a bit more learning GA in YouTube or in blogs did not scratch the surface, and the only XLS file I played with was very basic, and came from this old post from Paras Chopra blog, and another one from Dermont Balson’s InsaneExcel death blog (under Artificial Intelligence).

But none of them seemed enough to get in the thing… fortunately, I found two posts in spanish that put a lot more light into it (1 and 2).

Genetic Algorithms is a technique that simulates the logic of Darwinian selection (in ‘The Origin of Species’, Darwin stated that from a group of individuals the best will survive) process. Kinda evolution.

Namely, a “simulated” population of chromosomes is randomly created and allowed to reproduce according to the laws of evolution with the hope that a very fit individual chromosome will eventually result. These chromosomes are actually (encrypted) candidate solutions to a problem, so when a good chromosome evolves, a good solution to a problem has evolved.

It tries to emulate how populations accumulate differences over time due to the environmental conditions acting as a selective breeding mechanism. In other words, GA is a algorithm which makes it easy to search a large search space. By implementing the Darwinian selection to the problem, only the best solutions will remain, narrowing the search space as generations evolve.

GAs can be used where optimization is needed, that is where there are large solutions to the problem but we have to find the best one(s). But they also have some disadvantages:

  • GAs are very slow. The efficiency of a gradient-based solver over a GA solver in some problems is notorious.
  • They cannot always find the exact solution (but they iterate to the best solution).

Some important terms to consider:

  • Chromosome: A set of genes. Chromosome contains the solution in form of genes.
  • Gene: A part of chromosome. As the basic item of information, they contains a part of the solution.
  • Individual/Genotype: Same as chromosome.
  • Population: Nº of individuals present with “same length of chromosome”.
  • Fitness: Fitness is the value assigned to an individual. It is based on how far or close a individual is from the solution. Greater the fitness value better the solution expressed.
  • Fitness function: Fitness function is a function which assigns fitness value to the individual. It is problem specific.
  • Breeding: Taking two fit individuals and intermingling their chromosome to create new individuals.
  • Mutation: Changing a random gene in an individual.
  • Selection: Selecting individuals for creating the next generation.
How Do GAs Work?

First a random population of chromosomes are created, and tested against the fitness of each one with a special fitness function.

Although all the initial fitness values are probably very low (since the algorithm randomly created them), some are still higher than others. These chromosomes with higher fitness values are given a correspondingly higher probability to reproduce in the next generation. Thus for each generation in the evolution, a fitness function simulates natural selection by pushing the population in the direction of improved fitness (much like how natural selection nudges a biological species toward improved fitness with regards to its environment).

As a general convention, reproduction in the GA is diploid (sexual), meaning that when two chromosomes are mated, pieces (genes) of each are combined in the offspring. Thus chromosomes recombine their parts at random (two point crossover) gene locations, occasionally producing a fitter chromosomes. This improvement to the gene pool is amplified by a corresponding increase in that fitter chromosome’s probability to reproduce.

While there is no guarantee that genetic algorithms will find the optimal or “best” solution to a complex problem, they have been successfully applied to a wide variety of theoretical problems such as optimizing information networks because they iterate to better solutions. The trick, if one exists, is to write a fitness function that can successfully guide the evolutionary process.

General Algorithm of GA

The algorithm is almost same in most applications, and only fitness functions are different to different problems. The general algorithm is as follows :

START

Generate initial population.
Assign fitness function to all individuals.

DO UNTIL best solution is found

Select individuals from current generation
Create new offsprings with mutation and/or breeding
Compute new fitness for all individuals
Kill all the unfit individuals to give space to new offsprings
Check if best solution is found

LOOP

END

Here comes very handy the code development made by Paras, that I post here with light modifications:

    1. Define an individualPublic Type Individual
      Genome() As Integer ‘Genome which holds information
      Fitness As Double ‘Fitness of Individual
      MadeBy As String ‘ Made By
      End Type
      Here Genome is an array of integers. The algorithm uses integers instead of binary numbers. Integers are easy to handle and probably more efficient in GAs. It’s fitness contains the fitness value. And MadeBy holds the information about the process from which it is made.
    2. Define populationPublic Type Population
      NumOfIndivid As Integer ‘Number of individuals
      Individuals() As Individual ‘Individuals
      Parents() As Individual ‘Parents
      MaxFitness As Double ‘Maximum fitness
      NotifyWhenFitExceeds As Double ‘Notify the user if fitness exceeds this value
      GenomeLen As Integer ‘Length of Genome
      DiedIndivid() As Integer ‘Individuals who have died coz they had low fitness value
      NoOfDied As Integer ‘Present number of died individuals
      ProbMut As Double ‘Mutation probability per cent
      ProbCross As Double ‘CrossOver probability per cent
      StopEvolution As Boolean ‘ Check if we have to stop evolution
      Generation As Integer ‘ Generation number
      FitLim As Long ‘Fitness limit below which all will be killed
      BestSoFar As Individual ‘ Best Individual so far
      WorstSoFar As Individual ‘ Worst Individual so far
      End Type
    3. Build the population

Sub BuildPopulation(Population As Integer, LenghtOfGenome As Integer, MaxFit As Double, NotifyExceed As Double, Mutation As Double, CrossOver As Double)

‘Popu’ is the number of individuals. ‘LenghtOfGenome’ is the length of chromosome ‘MaxFit’ is the maximum fitness which ca ‘ n be acquired by an individual. It is ge ‘ nerally 100. ‘NotifyExceed’ is the range which tells the algorithm to notify the user when fitness of any individual goes beyond this level. It is generally 99. ‘Mutation’ and ‘CrossOver’ are probabilities.

    1. Evolve the population

Evolution of the population is defined by following algorithm

DO UNTIL StopEvolution = True

Assign Fitness to each individual
Notify the user if a solution is found
Kill all the worst individuals

IF less then 30% of population is dead Or All the population is dead then

Mutate all 33% of the population
Kill all the worst individuals

END IF

Kill all the worst individuals
Select the parents
Start breeding
Mutate a random individual if probability allows.

LOOP

In the above algorithm the algorithm mutates 33% of population if less or all individuals are dying because in both the situations the necessary evolution does not take place. And crossing is done every time because without crossing new generation cannot be made.

    1. Selection of parents

33% of the parents are selected on the basis of their fitness i.e. fitter the parent more the children he will have and another 33% of parents will be selected randomly.

    1. CrossOver

Take two individuals from the parents list. And then take a random crossover point. Interchange the genes to produce two new individuals. There are some ways to do this:

One-point, two-point, and uniform crossover.
crossover

    1. Mutation

Take any random individual and take a random point. Change the gene on that point with another random value.

The general aspect of a settings panel could be like this:

 

VBA implementation…

Private Type tGA_Settings
    ''Population parameters
    NºChromosomesPopulation As Long '(mod 2 = 0)
    CrossOverProbability As Double '(0.6-0.8)
    CrossOverType As Long '(OnePoint/TwoPoints/Uniform/Random)
    ChromosomesMutationProbability As Double '(0.01-0.02)
    RandomSelectionProbability As Double '(0.1-0.2)
    
    'Constraint parameters
    ConstraintPenalty As Long '(Fixed/Linear/Exponential)
    AbsoluteConstraintTolerance As Double
    
    'Run parameters
    MaxNumberGenerations As Long
    ConvergenceTolerance As Double
    NumericPrecision As Long
    
    'Preliminary run (optional) parameters
    NumberPreliminaryRuns As Long
    MaxNumberGenerationsPerRun As Long
End Type

Private Type tGA_Variable
    Min As Double
    Max As Double
    Evolution As Long 'eGA_Evolution: Fixed/Linear/Exponential
End Type

Private Type tGA_Constraint
    Logical As Long 'eGA_Constraint: Less_Equal_Greater
    Value As Double
End Type

Private Type tGA_Solver
    FunctionOptimization As Long 'eFunctionOptimization: Minimize/Maximize
    EvalFunction As String
    Variable() As tGA_Variable
    Constraint() As tGA_Constraint
    Settings As tGA_Settings
End Type

Some more code to come as it gets done…

Other References

[1] Goldberg, D. (1989). Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley

[2] Anonymous (2005). “GAlib, A C++ Library of Genetic Algorithm Components”. At: http://lancet.mit.edu/ga/

[3] Soft Tech (2002). “GA Delphi Class Library”. Acompaning article

[4] Charbonneau, P., Knapp, B. (1995). “Users guide to PIKAIA 1.0”. NCAR Technical note TN-418-IA

Leave a Reply

Your email address will not be published. Required fields are marked *