## 13. Wave Front Fetch Graph Cut Algorithm

### What is graph cut?

A graph is a figure with nodes (points) connected by edges (lines). Ignore details of node positions and edge lengths, and only consider relative ties. A graph with an edge orientation is called a directed graph. Edges can be assigned numerical values called weights. Consider the case where a non-negative integer is assigned to a directed edge, and this number means the maximum flow rate in that direction. Consider placing two special nodes S (source point) and T (suction point) in the node, and letting water flow from S to T in the graph. The problem of finding "how much water can flow at maximum" is called the maximum flow problem. When it is said that water can no longer flow from S to T any more, some edge has reached the maximum flow rate, and a group of edges that have reached the maximum flow rate creates a "one wall" from S to T It should block the route. In other words, this wall separates the graph into S side and T side.

On the other hand, we consider a group of edges with the direction from the S side to the T side that can separate the graph into S side and T side by cutting the edge, and this is called cut. The total value of the edge weights included in the cut is called the cut value. There are many cuts, but the one with the smallest cut value is called the minimum cut. Dividing a graph with a minimum cut or finding a minimum cut is called a graph cut.

In a situation where water can no longer flow, the minimum cut should be the previous "single wall", so the maximum flow and the minimum cut value will match. This is called the maximum flow minimum cut theorem. If you try to flow as much water as possible, the minimum cut will be found. As will be explained in detail later, the graph cut algorithms find the minimum cut by flowing the maximum flow.

If the evaluation value of the combinatorial optimization problem can be expressed as a cut value of the graph, the optimal solution can be obtained by graph cut. Other methods can only find optimized solutions (solutions that have worked hard to achieve the best), while finding true optimal solutions.

### Residual flow

First, we explain the concept of residual flow that is common to all algorithms. It is not known in advance how much water will flow on an edge to get the maximum flow as a whole. There is no choice but to try it out for the time being. For the time being, of course, the value of how much water can flow on the edge is naturally smaller than the maximum flow rate. This value of "how much else can flow" is called the residual flow. As you can see, it is important to add the same amount to the residual flow of the opposite edge. Flowing in the opposite direction means reducing (canceling) the amount of flow for the time being at first, so there is no longer a failure due to overflow for the time being. Thanks to this residual flow with backtracking (cancellation) function, the maximum solution can be found with a seemingly greedy algorithm.

### Graph cut algorithm

There are two basic concepts in the graph cut algorithm. The first is the augmenting path method and the second is the push relabel method.

The augmented path method is a process of finding and flowing the route from S to T with remaining residual flow until such a route is not found. There are various variations in how to find and flow routes. BK algorithm (Boykov-Kolmogorov algorithm) introduced in Article 1 is also classified as an incremental path method.

The push relabel method uses the idea of over flow inspired by the actual water flow. When erosion is not considered, how the actual flow of water is determined is that water flows from higher to lower, but the height (water level) is automatically controlled by the mechanism described below. It will be decided. If the amount flowing out downstream is less than the amount flowing in from upstream, water will accumulate there and the water level at that location will rise. And this rise in water level reduces the inflow from the upstream and increases the outflow downstream. On the other hand, if the amount flowing out downstream is larger than the amount flowing from upstream, the accumulated water will decrease and the water level at that location will drop. And this drop in water level increases the inflow from the upstream and decreases the outflow to the downstream. This change in water level regulates the flow and finally reaches a steady flow. The flow rate increases as the water level difference increases, and the flow rate decreases as the water level difference decreases, so the flow is adjusted, so if the water flow level does not increase even if the water level difference increases, The difference widens and becomes a waterfall in the natural stream. Just like the pinch-off point of EFT (Field Effect Transistor). And this falls corresponds to the minimum cut. And not only is the amount of water that flows everywhere downstream, but there is room to increase the flow.

Excess flow is the value of how much inflow is greater than outflow. Nodes with excess flow correspond to places where the water level is rising. Passing excess flow to the next node means increasing the outflow to that node and balancing yourself with the inflow. Since water flows only from high to low, excess flow can only be passed to low heights. Of course, you can only pass in the direction where the remaining flow remains. You can also pass part of the excess flow. If you can't pass the excess flow, increase your height. Passing the excess flow from S to T will ensure that there is no excess flow downstream of the waterfall. A waterfall is an edge group with zero residual flow, over which excess flow can no longer pass. The operation that passes the excess flow is called push, and the operation that increases your height when you cannot pass the excess flow is called relabel. If this push and relabel are used to create a situation where there is no excess flow downstream of the waterfall, the position of the waterfall is determined and the graph cut is completed.

It does not say that the greater the difference in height, the more it flows, but the difference in flow is different from the actual flow of water. This is because the purpose is to find a waterfall. Therefore, it is enough if it is possible to judge which height is higher, so it is expressed as an integer. If this is repeated as a unit operation, push or relabel (details will be explained later) is applied once to all nodes with excess flow, the difference in height where the flow can be increased is 0 or 1 (Omitted because proof is difficult). Therefore, if the height of T is 0 and all nodes are connected with a height difference of 1, the highest node height is -1 for all nodes. Since there are places where the height difference is 0, this is the highest possible height that can increase the flow. Therefore, assign the total number of nodes to the height above the waterfall where the flow cannot be increased. The initial height before starting the unit operation is 0 except S, and S is the total number of nodes. That is, S is above the waterfall. There are two references to relabel: "Examine the height of surrounding nodes and increase it by 1 from the lowest" and "Increase your height by 1". It seems easier to prove that "examine the height of surrounding nodes and increase it by 1 than the lowest one", but "just increase your height by 1" can prove that it is correct is. In an actual program, "simply increase your height by 1" is faster.

The graph cut is completed only by pushing and relabeling, but in order to improve the speed, the operations of global relabeling and gap heuristic are used in combination. Global relabeling is an operation that performs a breadth-first search from T and determines the height of all nodes at that time. When you reach T from an edge that has residual flow remaining from a node, the shortest number of edges is the height of that node. An algorithm that performs this at high speed is breadth-first search. The node that cannot reach T is located above the waterfall, so it can be removed from subsequent operations. Perform global relabeling every 50 unit operations.

When the unit operation is repeated, the height difference where the flow has not reached is 0, and the height difference where the flow is present and can be increased is 1. Therefore, when creating a histogram of height and number of nodes in both ranges, it is natural that the number of nodes will be zero above a certain height, and there will be at least one node at a lower height. Therefore, if a height (gap) with a node number of 0 appears in the histogram of the height and the number of nodes for the whole and is sandwiched between non-zero heights, the height is larger than the gap. You can see that the nodes are separated from the flow that can increase the flow, that is, above the waterfall. The node above the waterfall can be removed from subsequent operations. This is called gap heuristic.

### Graph cut algorithm for parallel processing

In the push relabel method, the node that has passed the excess flow and the node that has not passed the excess flow are the main players in the next unit operation. Therefore, the CPU can improve efficiency by managing the main players using queues. You However, when using devices suitable for parallel processing such as GPU and FPGA, parallelism cannot be used in queues. It will be judged whether or not it is a leading role for all nodes. If so, the Wave Front Fetch algorithm came up with the idea that this global participation can be effectively used by performing global relabeling and push in a pipeline. According to the research note, it came up on 2014/10/16 and it was named 2014/11/04.

The basic idea is as follows. The flow arrival point T sends the new time, and the other nodes send their time to the nodes connected by the remaining flow toward them. As a result, nodes other than T will receive the time from some nodes, but update their time with the latest time and hand overflow to the node that sent this time. This process is called a unit operation when all nodes are executed, and this unit operation is performed until there is no excess flow under the waterfall. Waves of various times are transmitted from the T side to the S side at the same time for each unit operation, waves of different times collide with each other, the waves of the old time disappear, and the beginning of a new wave captures excess flow I think you can imagine the situation and the situation where you can no longer transmit the wave by receiving the excess flow and there is no residual flow. The name Wave Front Fetch Algorithm represents this image. Since this algorithm uses the concept of excess flow, it is classified as a push relabel method, but it is as simple as not using relabel, which is difficult to prove.

### Details of Wave Front Fetch algorithm

When the Wave Front Fetch algorithm is written in NVIDIA CUDA, it is as follows, assuming

• tim_write(S, H, W, V) set V to the time of node(S, H, W)
• ovf_add(S, H, W, V) add V to the overflow of node(S, H, W)
• edg_read(S, H, W, D) read the flow of edge(S, H, W, D)
• edg_add(S, H, W, D, V) add V to the flow of edge(S, H, W, D)
.
``````__global__ void wave_init(void) {
///////////////////////////////
int total_id = blockDim.x * blockIdx.x + threadIdx.x;
int S  = total_id / size_hw;
int sa = total_id % size_hw;
int H  = sa       / size_w;
int W  = sa       % size_w;
///////////////////////////////
tim_write(S, H, W, 0);
}

__device__ void push(int S, int H, int W, int nS, int nH, int nW, int D, int R) {
int tim = tim_read(S, H, W);
int Tim = tim_read(nS, nH, nW);
if (Tim <= tim) return;
int edg = edg_read(S, H, W, D);
if (edg == 0) return;
int ovf = ovf_read(S, H, W);
if (ovf > 0) {
bool q = edg > ovf;
int pp = (q)? ovf: edg;
if (q) tim_write(S, H, W, Tim);
}
else tim_write(S, H, W, Tim);
}

__global__ void wave(int tim) {
///////////////////////////////
int total_id = blockDim.x * blockIdx.x + threadIdx.x;
int S  = total_id / size_hw;
int sa = total_id % size_hw;
int H  = sa       / size_w;
int W  = sa       % size_w;
///////////////////////////////
if (ovf_read(S, H, W) < 0) {
tim_write(S, H, W, tim);
return;
}
if  (S!=size_s-1)                 push(S, H, W, S+1, H,   W,   0, 9);
if ((S!=size_s-1)&&(W!=size_w-1)) push(S, H, W, S+1, H,   W+1, 5, 7);
if ((S!=size_s-1)&&(W!=0))        push(S, H, W, S+1, H,   W-1, 6, 8);
if                 (W!=size_w-1)  push(S, H, W, S,   H,   W+1, 4, 3);
if                 (W!=0)         push(S, H, W, S,   H,   W-1, 3, 4);
if                 (H!=size_h-1)  push(S, H, W, S,   H+1, W,   2, 1);
if                 (H!=0)         push(S, H, W, S,   H-1, W,   1, 2);
if ((S!=0)       &&(W!=size_w-1)) push(S, H, W, S-1, H,   W+1, 8, 6);
if ((S!=0)       &&(W!=0))        push(S, H, W, S-1, H,   W-1, 7, 5);
if  (S!=0)                        push(S, H, W, S-1, H,   W,   9, 0);
}

void wave_cut(int loop) {
wave_init<<< SIZE_SHW/256, 256 >>>();
for (int time = 1; time <= loop; time++) {
wave<<< SIZE_SHW/256, 256 >>>(time);
}
}``````
As many threads as the number of nodes in the graph are activated all at once to execute processing asynchronously. This code is for a node with 10 edges. 10 edges is the number of edges to handle the Gaze_line-Depth model. The same structure can be used for grid graphs other than 10 edges. Just increase or decrease the number of lines of push () in the function wave (). As you can see from the code, edge number 0 is the edge toward S. The residual flow to T is represented by a negative excess flow, and the residual flow from S is represented by a positive excess flow. The graph cut is complete when there is no positive excess flow on the T side of the waterfall. Also, ovf_add (S, H, W, V) uses the CUDA atomicAdd () function because contention occurs between nodes.

Looking at the function push (), it compares the time of your own with the time of the neighbor, and if the neighbor is not new, it is meaningless. The larger the value, the newer the time. Next, no matter how new the neighbor is, if there is no residual flow toward the neighbor, it is meaningless, so we are returning. If there is a residual flow and you also have an excess flow, you can pass all or part of the excess flow to the neighbor. However, you cannot pass beyond the remaining flow, nor can you pass beyond the excess flow. Therefore, it is checked whether the residual flow or excess flow is small, and the excess flow of that amount is passed to the neighbor, the residual flow at the edge toward the neighbor is reduced, and the residual flow in the opposite direction is increased. Passing excess flow to the neighbor means increasing the excess flow next to it and reducing your excess flow. If there is no remaining flow as a result of passing the excess flow, it means that the route has been cut off, so it will not update your time so that incorrect information is not transmitted. If there is a residual flow or if there is no excess flow from the beginning, update your time.

If you look at the function wave (), the first part is to determine which of the threads you are running at once. The part that determines whether the subsequent excess flow is negative determines whether there is a residual flow to T. If there is a residual flow to T, it means that there is a route to T. I am updating my time to a new time. In the subsequent push (), if the excess flow is passed to the node with the residual flow to T, the negative degree of excess flow is reduced. As a result, if the excess flow becomes positive, there is no remaining flow to T, and it becomes a normal node with excess flow to pass to others. In this way, if a negative value is used for excess flow, a data structure can be created with only the nodes except S and T.

### Distribution of Wave Front Fetch program

The distribution data and program may be annoying, but there are many with

. The program just described is graph_cut.cu_wave_push_only. This program does not have a termination check. The unit operation is executed the number of times specified by LOOP, and the operation is completed. In rectify.cpp, "> Key command" is added to the program distributed in article 12. This command executes a graph cut once. This rectify.cpp is the latest mainstream. The following one-character key commands can be used.
List of one key commands
characterfunction
qquit
?help
>do graph cut
Mdo calibration
mleft picture
,stereo picture
.right picture
<both picture
sdepth(disparity) to front
tdepth(disparity) to back
heye move left angle
jeye move down angle
keye move up angle
leye move right angle
feye move near
neye move far
Heye move left
Jeye move down
Keye move up
Leye move right
Feye move forward
Neye move back
Cclear angle and position
3_rec_L.png and 3_rec_R.png are stereo parallel images of the car in Nagasaki University used in article 11. graph_cut.cu_push_relabel displays the execution time and the maximum flow value in graph_cut.cu in article 12. This program adds global relabeling to the push relabeling method. Termination judgment is performed in global relabeling. If you do global relabeling, you can see the range connected to T in the residual flow path, but if there is no excess flow in that range, you can judge that the graph cut has ended. graph_cut.cu_wave_push is a combination of Wave Front Fetch and end judgment by global relabeling. Unfortunately, the speed is not much different from graph_cut.cu_push_relabel. graph_cut.cu_with_shfl aims at the maximum speed with a data size of 32 bits per node. Therefore, the number of edges is 6 edges without the W direction inhibit. A total of 32 nodes of S-H-W size 2-4-4 is set as 1 warp, and the area is divided into 4 so that adjacent warps are not activated simultaneously. As a result, conflicts are avoided by warp synchronization within the warp and by region splitting between warps. Negative values are not used to make effective use of the number of bits. For this reason, a data structure that represents the residual flow to T instead of representing the residual flow to T with a negative excess flow is added. The shfl_sync () function is used for data movement in warp. graph_cut.cu_without_shfl is a code that expects a cache hit without using the shfl_sync () function. This is a little faster because the code is simpler. The 6-edge 32-bit version (the last two) is about 8 times faster than the previous version, but it does not have a termination check.

You can check the operation of these graph_cut.cu ... by making a copy named graph_cut.cu and typing

``````make do
>,fffjjjq``````
. > It will take some time after key input. The, key is a command to display the graph cut result in stereo. The f key and j key are commands that change the appearance. The q key is the exit command.

The startup format is

``./rectify right_image left_image max_disparity min_disparity penalty inhibit divisor option LOOP``
. The last LOOP value has been newly added, but it doesn't have to be. In that case, a value that seems appropriate for each program is used. In graph_cut.cu_push_relabel and graph_cut.cu_wave_push, this LOOP value is the number of unit operations to be executed per global relabeling. In other programs, there is no end judgment, so the unit operation is executed for this LOOP value and the process ends. The maximum flow value is 6033198 for the 10-edge version and 889438 for the 6-edge 32-bit version.

### Conclusion

As you can see by running the distribution program, push relabel + global relabeling seems to be the fastest when the end judgment is done with the 10-edge version with inhibitor properly inserted. So, what is the advantage of the Wave Front Fetch method, the code is simple. In the push relabel method, the push relabel part is simple, but without global relabeling, the speed is not achieved at all. With the Wave Front Fetch method, you can get almost the same speed. How to determine the end of the Wave Front Fetch method is an annoying problem, but one way of thinking is to abort it a certain number of times even if the graph cut has not been completed. Most of the flow is flowing at the very beginning, and most of the time is spent on the last few flows. In most cases, the stereovision display will not deteriorate even if it is interrupted. I think there may be a direction of research to clarify how much to choose.

The 6-edge version with 32 bits per node is the fastest so far, but even with the highest performance GPUs such as GTX1080 and GTX1080Ti, it is one more step to use in real time. In order to perform real-time processing with a margin, it is necessary to devise measures such as reducing the number of pixels or hierarchically cutting graphs. If it is to be applied to automatic driving of automobiles in the future, how big the reduction of power consumption of GPU will be is a big problem. After all FPGA and ASIC may be necessary. At that time, the simplicity of the Wave Front Fetch method seems to be useful.

### The meaning of eliminating inhibit

Inhibit has been lost in the 32bit6 edge version, but the meaning is arranged. The pixel-disparity model uses a truncated linear penalty instead of a linear penalty proportional to the parallax change in order to handle the troublesome disparity that varies greatly even with neighboring pixels. Yes. This is to "reduce the effect of penalty where parallax changes are large". In the gaze_line-depth model, the change in depth is 1 at most, unlike parallax, so the penalty proportional to the change gives an inhibit that does not allow more than 2 changes. This is "increase the effect of penalty where the change in depth is large". In this way, the idea that the change is large between the pixel-disparity model and the gaze_line-depth model is reversed. If inhibit is not given, only a penalty proportional to the change is given, but there is no difference in "increasing the effect of the penalty when the depth change is large".