Any flow control instruction (if, switch, do, for, while) can significantly affect the instruction throughput by causing threads of the same warp to diverge; that is, to follow different execution paths. If this happens, the different execution paths must be serialized, increasing the total number of instructions executed for this warp. When all the different execution paths have completed, the threads converge back to the same execution path.
To obtain best performance in cases where the control flow depends on the thread ID, the controlling condition should be written so as to minimize the number of divergent warps.
This is possible because the distribution of the warps across the block is deterministic as mentioned in Section 4.1 of the CUDA C Programming Guide. A trivial example is when the controlling condition depends only on (threadIdx / WSIZE) where WSIZE is the warp size.
In this case, no warp diverges because the controlling condition is perfectly aligned with the warps.