
Shrinking a 10^90 search space: NSGA-II for factory layout with a SimPy twin

Notes from my BSc thesis. What happens when the naive encoding makes the problem impossible, why I pivoted from DEAP to pymoo, and how a two-tier evaluator turned an 800-run budget into 160.

NSGA-II · Genetic Algorithms · Discrete Event Simulation · pymoo · SimPy

My BSc Creative Technology graduation project at the University of Twente was a facility layout optimiser for Van den Bos CM, a Dutch manufacturer of corrugated-cardboard transport systems. Their sales engineers currently draft plant layouts by hand in CAD, iterate with the customer over weeks, and ship a configuration with no guarantee it is near-optimal on energy, throughput, or conveyor cost. The brief was to see how far a genetic algorithm and a simulation-in-the-loop could push that process. The repo is private to VDB but the methodology and the numbers below are from the thesis, supervised by Dr. F. A. Bukhsh with MSc N. Bouali as critical observer, in collaboration with the Fraunhofer Innovation Platform for Advanced Manufacturing at UT.

The search-space problem

The first instinct with layout optimisation is to encode each machine as an (x, y, orientation) tuple on a grid. For a 100 m hall on a 1 m grid that gives you a search space of roughly 10^90 configurations. It is not only astronomical — it is full of garbage. Random placement of components produces colliding conveyors, orphaned machines, and infeasible routing in something like 99% of candidates. Early prototype runs confirmed this: the GA spent its entire budget feasibility-checking nonsense layouts.
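The arithmetic behind that figure is easy to reproduce. A back-of-envelope sketch, assuming a hypothetical plant of around 20 components (the real component count is VDB-specific):

```python
import math

# Naive coordinate encoding: every component independently picks a cell
# and an orientation. Component count of 20 is an assumption for
# illustration, not the VDB figure.
positions = 100 * 100      # 100 m hall on a 1 m grid
orientations = 4           # four cardinal orientations
components = 20            # assumed

per_component = positions * orientations       # 40,000 choices each
naive_space = per_component ** components

print(f"~10^{round(math.log10(naive_space))} configurations")
```

Even with conservative assumptions the naive encoding lands in the 10^90 range, before a single feasibility check has run.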

The pivot that unlocked the rest of the project was to stop encoding coordinates and start encoding decisions. Instead of (x, y, orientation) per component, the chromosome became a 12-integer process-level array: machine_order, wip_type, corridor_slice, turntable_policy, and three speed_tier genes. A deterministic builder then materialises a concrete layout from those decisions using A* with an octile heuristic on a 5 m grid, calibrated to the smallest obstacle in a real VDB hall (a 3 m radius pillar with a 0.5 m safety margin). The search space collapsed to roughly 5^12, and feasibility went to 100% by generation 5.
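The two ingredients can be sketched in a few lines. The gene value ranges and the sample chromosome below are illustrative, and `octile` is the textbook octile-distance heuristic of the kind the builder's A* uses on the 5 m grid:

```python
import math

# Octile distance: admissible heuristic for A* on an 8-connected grid,
# scaled to 5 m cells as in the deterministic builder.
def octile(ax, ay, bx, by, cell=5.0):
    dx, dy = abs(ax - bx), abs(ay - by)
    return cell * (max(dx, dy) + (math.sqrt(2) - 1) * min(dx, dy))

# Decision-level chromosome: 12 integers. The thesis genes are
# process-level decisions (machine_order, wip_type, corridor_slice,
# turntable_policy, speed tiers); the 0..4 range and the values here
# are made up for illustration.
chromosome = [3, 1, 0, 4, 2, 2, 1, 0, 3, 4, 1, 2]
assert len(chromosome) == 12 and all(0 <= g < 5 for g in chromosome)

print(octile(0, 0, 3, 4))  # a diagonal-heavy route on the 5 m grid
```

The point is that every chromosome is a valid set of decisions by construction; infeasibility can only enter in the builder, where it is cheap to detect and repair.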

Why NSGA-II, and why pymoo

The KPIs the business cared about were total conveyor length, path ratio (actual path over Manhattan distance), machine utilisation, throughput in stacks per hour, feasibility rate per generation, and hypervolume of the Pareto front. These are not commensurable. You cannot collapse them into a single scalar without hiding a weighting choice from the customer. NSGA-II's non-dominated sort + crowding-distance selection gives you the Pareto front directly, which is the artefact the sales engineer actually wants to show the customer anyway: here are your options, here is what each costs on each axis, pick.
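To make "non-dominated" concrete, here is a minimal pure-Python Pareto-front extraction over invented (conveyor length, path ratio, negated throughput) triples, all minimised. NSGA-II's non-dominated sort is a faster, rank-layered version of the same pairwise test:

```python
def dominates(a, b):
    # a dominates b if it is no worse on every axis and strictly better on one
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points):
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]

# Invented candidates as (conveyor_mm, path_ratio, -stacks_per_hour):
candidates = [
    (812_000, 3.01, -102),  # baseline-like: beaten on every axis below
    (645_000, 2.38, -117),  # a Layout-B-like candidate
    (600_000, 2.60, -95),   # shorter conveyor, lower throughput: a trade-off
]
print(pareto_front(candidates))
```

The last two candidates each win on a different axis, so both survive; that pair of survivors, scaled up, is exactly the front the sales engineer puts in front of the customer.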

I started in DEAP because the documentation is friendlier, but I pivoted to pymoo 0.6.1 in the middle of the realisation phase. The reason was mundane: pymoo's Problem abstraction made the two-tier evaluator (below) straightforward to wire in, whereas in DEAP I was writing bespoke glue for every generation. Mid-project pivots feel costly but the CRISP-DM rhythm I was following — business understanding, data understanding, modelling, evaluation, loop — made it a survivable change rather than a reset. If I were starting over I would still start in DEAP for the tutorial phase and plan for the pymoo switch.

The two-tier evaluator

A SimPy run of the full plant (corrugator → conveyor → WIP buffer → transfer car → processing machine) with PLC-derived component logic takes on the order of seconds per candidate. With pop=40 and gen=20 that is 800 SimPy runs per experiment, roughly an hour, and you need several experiments per design question. Outside the thesis budget.

The trick was a tier-1 surrogate evaluator that scores every candidate in milliseconds from cheap geometric proxies (conveyor length estimate, routing feasibility, aisle violations). Only the top 20% of each generation — 8 individuals — get sent into the tier-2 SimPy DES for a full run. Over 20 generations that is 160 SimPy runs, not 800. The tier-1 surrogate is not accurate in absolute terms; it is accurate in rank-order for the leading edge of the population, which is the only thing the selection operator actually needs.
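The policy is small enough to sketch directly. The surrogate and DES callables here are stand-ins; only the budget split is the real mechanism:

```python
def two_tier_evaluate(population, surrogate, des, frac=0.20):
    # Tier 1: cheap geometric surrogate ranks the whole generation (ms each)
    ranked = sorted(population, key=surrogate)
    # Tier 2: only the leading fraction earns a full DES run (seconds each)
    cutoff = max(1, int(len(ranked) * frac))
    return {c: (des(c) if i < cutoff else surrogate(c))
            for i, c in enumerate(ranked)}

calls = {"des": 0}

def fake_des(cand):
    calls["des"] += 1
    return float(cand)

pop = list(range(40))
scores = two_tier_evaluate(pop, surrogate=lambda c: c, des=fake_des)
print(calls["des"])  # 8 DES runs for a 40-candidate generation
```

Eight expensive runs per generation times twenty generations is the 160-run budget; the rest of the population only ever sees the millisecond tier.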

What came out

On orders_mix120.xlsx (34 batches of synthesised production orders generated by a companion tool so the evaluation was reproducible), NSGA-II with pop=40, gen=20, uniform crossover 50%, bit-flip mutation 8%, and early-stop on HV improvement below 10% across 5 generations produced a Pareto front of 38 feasible layouts on an Apple M3 / 24 GB machine. I pulled three representatives for the thesis: Layout A (Lean) at one corner, Layout B (Balanced) in the middle, Layout C (Flow) at the other corner. Against the VDB-supplied golden baseline (812,000 mm conveyor, path ratio 3.01, 102 stacks/hour, 54.3% utilisation), Layout B delivered:

  • 20.6% less conveyor length (645,000 mm vs 812,000 mm)
  • 21% lower path ratio (2.38 vs 3.01)
  • 15% higher throughput (117 vs 102 stacks/hour)
  • 12.6 percentage points higher machine utilisation
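The early-stop rule from that run configuration can be sketched as follows. I am reading "HV improvement below 10% across 5 generations" as relative improvement over a sliding 5-generation window, which is one interpretation, and the HV trace below is invented:

```python
def should_stop(hv_history, window=5, threshold=0.10):
    # Stop once relative hypervolume gain over the last `window`
    # generations drops below `threshold`.
    if len(hv_history) < window + 1:
        return False
    old, new = hv_history[-(window + 1)], hv_history[-1]
    return (new - old) / old < threshold

# Invented per-generation hypervolume trace, flattening out late:
hv = [1.0, 1.4, 1.9, 2.3, 2.5, 2.6, 2.65, 2.68, 2.70, 2.71]
print(next(g for g in range(len(hv)) if should_stop(hv[:g + 1])))
```

A rule like this is what keeps the 160-run budget honest: once the front stops growing, the remaining generations buy almost nothing.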

The validation suite comprised 77 unit, 12 smoke, 5 regression (against the VDB baseline), and 4 performance tests via pytest-benchmark: 98 in total, 71 passing at submission. The builder, validator, and evaluator modules cleared 90% coverage.

Honest scoping

The thesis is explicit that this is a proof of concept for AI-enhanced layout optimisation, not a deployable industrial product. The 'golden layout' I was benchmarking against is itself a simplified test layout VDB designed to validate the simulation environment — it is not a real production floor. A 20.6% improvement against that baseline is a statement about the method, not a promised saving for the next customer. Writing this down in the thesis's scope section mattered more to me than I expected: the pipeline is powerful enough to produce misleading numbers if you let it, and being the one to flag that felt like the adult version of the work.

What I would do differently

  • Start from the decision-level encoding. The (x, y, orientation) prototype cost me two weeks I will not get back.
  • Build the tier-1 surrogate earlier. For the first half of the project I ran full SimPy on every candidate and the feedback loop was unusable.
  • Treat the golden layout as a single data point, not a truth. The final evaluation should have swept across several realistic order books and hall geometries instead of leaning on orders_mix120.
  • Commit to pymoo from day one if the objective count is >1. The DEAP phase was useful tutorial, not useful code.

What this project taught me

The real lesson was about problem framing rather than optimisation. A GA is a hammer; the interesting work is in how you set up the nail. The encoding change was not a micro-optimisation, it was the project. The surrogate-and-DES two-tier pipeline was not a performance trick, it was what made the whole thing tractable. And the Pareto front was not a result to be post-processed into a single recommended layout — it was the artefact that let the sales engineer and the customer have a conversation instead of a guess.

This thesis is the work I point to when someone asks why I am applying for a Masters in Data Science and AI. It is a full CRISP-DM cycle on a real industrial problem, with mid-flight pivots, a calibrated evaluation protocol, and an honest scope section. The full PDF is below if you want to follow the receipts. If any of it is useful or off-base, I would love to be told.