Algorithm for Vectorization of Line-drawing Bitmaps

This project dealt with vectorization of line-drawing bitmaps. Given a bitmap that consists of approximately straight lines, we wish to extract their coordinates and widths.

Abstract
This project dealt with vectorization of line-drawing bitmaps. Given a bitmap that consists of approximately straight lines, we wish to extract their coordinates and widths. An original algorithm for doing this by gradual simplification of a graph was developed and implemented. The algorithm is tuned by defining an “energy function” over the graph. A large part of the project was finding a good function; with it the algorithm gives quite good results.

The problem
Given a bitmap representing a line drawing, derive a vectorized representation of it.

This is a simplified form of the problem known as tracing (tracers usually support bent lines as well; here we assume all lines were intended to be straight).

The input
A black/white bitmap, presumably consisting of a set of lines (possibly intersecting), of unknown (possibly varying) widths. See figure 1 for an example.

The output
The output is a simple graph: a set of nodes and a set of edges connecting some of them. Each node and edge is characterized by a radius value (defined for edges as half the line’s width); see figure 2 for an example, shown in graphical form. The actual output is a plain-text description of the graph.
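For illustration only (the program’s actual text format is not reproduced here), the output graph could be modeled like this; all names are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class Node:
    x: float       # center coordinates, in pixels
    y: float
    radius: float  # node radius

@dataclass
class Edge:
    a: int         # index of one end-node in the node list
    b: int         # index of the other end-node
    radius: float  # half-width of the line

# A vectorized drawing consisting of a single line of width 4.
nodes = [Node(10.0, 10.0, 2.0), Node(90.0, 40.0, 2.0)]
edges = [Edge(0, 1, 2.0)]
```

Higher-level programs would then work with `nodes` and `edges` instead of raw pixels.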

This output is intended as input to higher-level programs for line-drawing understanding, which are easier to write given a vectorized representation of an image. The algorithm can also (to a tunable degree) tolerate and hide imperfections in a drawing, such as lines that are not quite straight, relieving the higher-level programs of irrelevant details.

figure 1 – Example of input

figure 2 – Visualization of example output

The algorithm’s approach
The drawing might be imperfect on a large scale: the lines need not be exactly straight, their width can vary somewhat along the line, lines don’t have to meet perfectly at intersections, etc. On the other hand, the image is imperfect on a small scale because it is rasterized, resulting in jagged boundaries.

These observations suggested a process of gradual idealization of the image. We start with a graph corresponding 1:1 to the image’s black pixels; neighboring black pixels are connected by edges. From then on, at each step we apply one of several operations on the graph. The choice of operations is driven by an “energy function” that is a parameter of the algorithm.

The graph
Each node of the graph is a set of black pixels. Each edge is a pair of nodes and another set of pixels belonging to the edge itself — the edge’s “dark matter”. At each step, each black pixel belongs to precisely one node or dark matter.

The graph is initialized in a simple way: every black pixel becomes a node; 4-adjacent pixels are connected with edges; diagonally adjacent pixels are connected only if they are not already 4-connected through either of the two pixels flanking the diagonal.
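A minimal sketch of this initialization, assuming the bitmap is given as a set of (x, y) coordinates of black pixels (the representation is a simplification for illustration, not the project’s actual code):

```python
def init_graph(black):
    """black: set of (x, y) coordinates of black pixels.
    Returns (nodes, edges): one node per pixel, edges as pixel pairs."""
    black = set(black)
    edges = set()
    for (x, y) in black:
        # 4-adjacency: connect to the right and lower neighbors
        # (each undirected edge is thus added exactly once).
        for nb in ((x + 1, y), (x, y + 1)):
            if nb in black:
                edges.add(((x, y), nb))
        # Diagonal adjacency: connect only when neither of the two
        # pixels flanking the diagonal is black, i.e. the pixels are
        # not already 4-connected on either side of the diagonal.
        for dx in (-1, 1):
            nb = (x + dx, y + 1)
            if nb in black and (x + dx, y) not in black \
                           and (x, y + 1) not in black:
                edges.add(tuple(sorted(((x, y), nb))))
    return black, edges
```

For example, two pixels touching only at a corner get a diagonal edge, but if a third pixel fills one flank of that corner, the diagonal edge is omitted because the pixels are already 4-connected.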

The possible operations are:

1. Edge merging: if precisely 2 edges connect to the same node, they can be merged, combining their dark matters and the common node into the dark matter of the resulting edge.

2. Node merging: 2 nodes connected by an edge can be merged (the edge’s dark matter, if any, also becomes part of the resulting node).

3. “Pixel stealing”: from any node that consists of more than one pixel, any of its pixels can be moved into the dark matter of any incident edge.
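The three operations can be sketched on a small dictionary-based representation (the names and data structures are hypothetical simplifications, not the project’s actual code; nodes and dark matter are pixel sets):

```python
# `nodes` maps a node id to its set of pixels; `edges` maps
# frozenset({id1, id2}) to that edge's dark matter (a set of pixels).

def merge_edges(nodes, edges, nid):
    """1. Edge merging: the two edges meeting at node `nid` (which must
    have degree exactly 2) become one; both dark matters and the node's
    own pixels go into the new edge's dark matter."""
    inc = [e for e in edges if nid in e]
    assert len(inc) == 2, "applicable only to a node of degree 2"
    (u,), (v,) = inc[0] - {nid}, inc[1] - {nid}
    dark = edges.pop(inc[0]) | edges.pop(inc[1]) | nodes.pop(nid)
    edges[frozenset((u, v))] = dark

def merge_nodes(nodes, edges, e):
    """2. Node merging: the two end-nodes of edge `e` become one,
    absorbing the edge's dark matter; the removed node's other edges
    are re-attached (parallel edges merged for simplicity)."""
    a, b = e
    nodes[a] = nodes[a] | nodes.pop(b) | edges.pop(e)
    for other in [x for x in edges if b in x]:
        new = frozenset(a if x == b else x for x in other)
        edges[new] = edges.get(new, set()) | edges.pop(other)

def steal_pixel(nodes, edges, nid, e, pixel):
    """3. Pixel stealing: move one pixel of a multi-pixel node into
    the dark matter of an incident edge."""
    assert len(nodes[nid]) > 1 and nid in e and pixel in nodes[nid]
    nodes[nid].remove(pixel)
    edges[e].add(pixel)
```

Note that all three operations preserve the invariant that each black pixel belongs to precisely one node or dark matter.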

Energy functions
At every step many operations are possible; we choose the operation that minimizes an “energy function” defined over the graph. We stop when the energy starts to increase (we look ahead a few steps for a lower energy value, to skip shallow local minima). Thus the whole algorithm can be tuned by changing the compact definition of the energy function (indeed, finding a good one was a major part of the project’s effort).
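The loop can be sketched as follows; `candidate_ops` and `energy` are hypothetical stand-ins (each candidate operation here is a callable returning the resulting graph), and the lookahead is simplified to counting steps without improvement:

```python
def minimize(graph, energy, candidate_ops, lookahead=3):
    """Greedily apply the operation giving the lowest energy; stop
    after `lookahead` consecutive steps without improvement, so that
    shallow local minima are skipped."""
    best_graph, best_energy = graph, energy(graph)
    steps_since_best = 0
    while steps_since_best < lookahead:
        ops = candidate_ops(graph)
        if not ops:
            break  # no operation applicable
        # Pick the operation whose result has minimal energy.
        graph = min((op() for op in ops), key=energy)
        e = energy(graph)
        if e < best_energy:
            best_graph, best_energy = graph, e
            steps_since_best = 0
        else:
            steps_since_best += 1
    return best_graph
```

As a toy example, with `energy = lambda g: (g - 2) ** 2` over integers and operations that decrement, the loop walks down from any starting value and returns 2.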

First, we assume that distant parts of the graph should be independent, so we define the energy of the graph by summing a node energy function over all nodes and an edge energy function over all edges.

So we need to define these two functions. One choice that behaves quite well is presented next. The guiding logic was that after graph initialization, the energy must be approximately proportional to the area covered by black pixels. We construct functions that are somewhat lower than the area for “good” complex nodes and edges and higher than it for “bad” ones.

Node Energy
For a node (a set of black pixels), it is easy to find its center of mass. We measure the number of pixels n and the distance d of every pixel from the center, and compute the k’th moment of these distances (for the same n, the result is higher for unround nodes). From this moment we estimate the radius, and compute a value that is close to the area for round nodes and higher for unround ones. With a couple of final corrections to encourage merging, a well-behaved node energy function was obtained.
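As a plausible reconstruction only (not the project’s exact function), the node energy might look as follows; the k’th-moment definition, the disk-based radius estimate, and the `merge_bonus` correction factor are all assumptions made for this sketch:

```python
from math import pi

def node_energy(pixels, k=4, merge_bonus=0.9):
    """Energy of a node (a set of (x, y) black pixels): an area-like
    value, close to n for round nodes and larger for unround ones."""
    n = len(pixels)
    cx = sum(x for x, _ in pixels) / n  # center of mass
    cy = sum(y for _, y in pixels) / n
    # k'th moment of distances from the center: k'th root of the
    # mean k'th power; for the same n, higher for unround nodes.
    mk = (sum(((x - cx) ** 2 + (y - cy) ** 2) ** (k / 2)
              for x, y in pixels) / n) ** (1 / k)
    # For a uniform disk of radius r, mk = r * (2 / (k + 2))**(1/k);
    # invert that to estimate the radius.
    r = mk * ((k + 2) / 2) ** (1 / k)
    # A factor < 1 makes a merged node slightly cheaper than its
    # parts, encouraging merging.
    return merge_bonus * pi * r ** 2
```

With this choice, a 3x3 square of pixels gets a much lower energy than a 1x9 line of the same nine pixels, as intended.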

Edge Energy
The edge energy function is quite similar. We measure the number of pixels n of the dark matter and its length L (defined as the distance between the centers of the two end-nodes, minus their radii). We measure the distance d of every pixel of the dark matter from the straight segment between the two nodes and compute the k’th moment of these distances, as for nodes (for the same n, the result is higher for unstraight edges).

From this moment we estimate the “radius” (half-width) of the edge, bounding it into [1, L] to avoid some anomalous behavior observed otherwise. From the radius and length an area-like value is computed, and after adding corrections to encourage merging, as with nodes, the edge energy function is obtained.
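Again as a plausible reconstruction only (not the project’s exact function): the sketch below assumes the same k’th-moment definition as for nodes, a uniform-band model for the radius estimate, and a hypothetical `merge_bonus` factor standing in for the merging corrections:

```python
from math import dist, hypot

def edge_energy(dark, a_center, a_radius, b_center, b_radius,
                k=4, merge_bonus=0.9):
    """Energy of an edge: dark-matter pixels `dark` (a set of (x, y))
    plus the two end-nodes given by their centers and radii."""
    ax, ay = a_center
    bx, by = b_center
    # Length: distance between end-node centers minus their radii.
    L = max(dist(a_center, b_center) - a_radius - b_radius, 1.0)
    n = len(dark)
    if n == 0:
        return 0.0
    seg = hypot(bx - ax, by - ay)
    # k'th moment of perpendicular distances from the segment a-b;
    # for the same n, this is higher for unstraight edges.
    mk = (sum((abs((bx - ax) * (y - ay) - (by - ay) * (x - ax)) / seg) ** k
              for x, y in dark) / n) ** (1 / k)
    # For a uniform band of half-width r, mk = r / (k + 1)**(1/k);
    # invert to estimate r, bounding it into [1, L] against anomalies.
    r = min(max(mk * (k + 1) ** (1 / k), 1.0), L)
    return merge_bonus * 2 * r * L  # area-like value
```

As with nodes, dark matter spread farther from the ideal segment raises the moment and hence the energy.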

Conclusions
The algorithm with the above energy functions performs quite well. It copes with low-resolution images containing lines of different widths and lengths. The tuning is still imperfect; in particular, it doesn’t cope well with strong variation of width along a line.

The algorithm has an inherent weakness: the operations on the graph preserve connectivity, so it can’t recognize a discontinuous line as a single line.

The algorithm’s behavior is controlled by the energy functions whose code fits into one screen; this can be seen as a benefit but in retrospect, it is also a disadvantage. Finding a function that behaves well across different scales and forms proved surprisingly hard. Perhaps a less “concentrated” means of tuning would be better…

The performance, as implemented, is unacceptable for big images: the constant factors are too high, although the algorithm’s asymptotic complexity in the number of black pixels is not bad.

Thanks
I’d like to thank the supervisor Idan Shatz for his valuable feedback; he shares a lot of the credit for the design of the algorithm. I also want to thank the VISL lab for providing this project and being ready to provide computing resources (although I scarcely used them) and the Ollendorf Minerva Center Fund which supported this project.

Tools
The program was developed fully with free software. It’s written in the Python high-level interpreted language and is portable. I developed on GNU/Linux, mainly with Emacs’ python-mode. Documentation was prepared mainly with docutils, with LaTeX as a back-end; the presentation was prepared with OpenOffice.org. Thanks to all the hackers who made this possible.