Not Just for Graphics
A quick foreword: this article’s gonna be a bit heavy, because it was originally published as part of a university assignment on parallel computing. You can view the full report here.
Ever since their inception as independent display processors in the late 1970s, Video Chips — or, more formally now, GPUs — have played an important role in the advancement of modern computing. Originally developed as a means to improve the quality and performance of onscreen computer graphics, GPUs have since become an integral part of parallel computing 1.
The need for a specialised graphics processor arose from the limitations of the CPU. CPUs were made to execute application logic and handle user interaction, but as graphical user interfaces surged in popularity (coupled with the rise of video games), more and more graphics processing had to be done on top of the CPU’s core workload. Memory and processing power were expensive; thus, the GPU was invented.
GPUs were designed to compute graphical data very efficiently. They are not replacements for the CPU; rather, they act as an extension of it. The CPU runs the main application code, passing any graphical computation to the GPU. This decoupling of processing responsibility resulted in visually pleasing and highly interactive user experiences. As such, advancements in GPU technology have since focused on improving the performance and quality of computer graphics 2.
It wasn’t until the early 2000s that another use for GPUs was discovered. It was found that if raw mathematical data (in particular, matrices, vectors or large arrays) was translated into graphical data, the GPU could perform the computation much faster than any traditional CPU. Furthermore, with the introduction of APIs like OpenCL and NVIDIA’s Compute Unified Device Architecture (CUDA), raw data no longer has to be converted at all: programs can be passed directly to the GPU, making it far easier to use for this purpose.
The speed and efficiency of this non-graphical, General-Purpose Computing on GPUs (GPGPU) has paved the way for advancements in personal computing and many other modern, practical uses.
So how does it all work? Unlike CPUs, GPUs were built to process massive amounts of individual data in parallel. In the graphics-processing context, if a CPU were tasked with modifying a grid of pixels, it would have to pass over each pixel sequentially (if done trivially, through a for loop). A GPU, on the other hand, would compute all (or at least most) of the pixels at the exact same time. In the context of general computing, this parallelization can be used to batch-compute data 3.
The process by which data is passed to the GPU for computation is known as the Graphics Pipeline. From a high-level perspective: graphical data is sent from an Application running on the CPU, communicated through the Host (an interface between the CPU and the GPU), and then computed, sometimes combined with data stored in the Frame Buffer (video memory). The resulting data is a set of vertices in 3D space, converted into triangles in the Geometry stage to essentially form a 3D model. This scene is then mapped onto a 2D projection of pixels through Rasterization, rendered with more detail in the Fragment stage and, finally, prepared for display via the Raster Operation (ROP).
In the context of general computing, the most useful aspect of the above Graphics Pipeline is a process that happens on the GPU called Shading. Shaders (programs that add aesthetic value to a 3D image through lighting, color and other image effects) can be programmed to produce a variety of different rasterizations of a 3D image. For example, Shaders can make a 3D sphere look matte, shiny or even made of gold. Beyond visual utility, however, Shaders can also be programmed to process non-graphical data; that is, to perform general computation.
Take, for example, the task of summing an incredibly large array of data. A trivial sequential algorithm would take linear time. However, if the array is converted into an image format (a grid of pixels where each pixel holds the value of the corresponding item in the array, essentially a large matrix), it can then be processed in parallel by the GPU. In this case, a Shader that specialises in image resizing can be re-programmed to sum the values of any 4 adjacent “pixels” (cells in the matrix) into one “pixel”, all at the same time. The GPU then runs over the entire, now smaller, matrix in parallel again, summing every group of 4 adjacent “pixels” into one, and so on, until it eventually ends up with a single “pixel” that holds the final sum.
Traditionally, the code for working over a grid of pixels (in pseudocode) would be written like so:
for pixel in grid {
    modify(pixel)
}
When the GPU is utilised, the above for loop is re-written as a kernel (a small, reusable program that is computed in parallel and asynchronously):
function(pixel, grid) {
    modify(pixel)
}
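To make the kernel idea concrete, here is a minimal sketch of the pairwise-summing example from earlier, written against NVIDIA’s CUDA API rather than as a repurposed image-resizing Shader. This is an illustrative sketch, not a definitive implementation: the kernel name sum4, the 4-to-1 grouping and the buffer sizes are assumptions made purely for this example. Each pass collapses four adjacent values into one, and the CPU keeps re-launching the kernel until a single value remains.

// A minimal CUDA sketch of the pairwise-summing pass described above.
// Each thread collapses four adjacent "pixels" (array cells) into one.
#include <cstdio>

__global__ void sum4(const float *in, float *out, int n_out, int n_in) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n_out) return;
    float s = 0.0f;
    for (int k = 0; k < 4; ++k) {
        int j = 4 * i + k;
        if (j < n_in) s += in[j];       // guard the final, possibly ragged group
    }
    out[i] = s;
}

int main() {
    const int n_total = 1 << 20;        // about a million values; each is 1, so the sum should equal n_total
    float *a, *b;
    cudaMallocManaged(&a, n_total * sizeof(float));
    cudaMallocManaged(&b, n_total * sizeof(float));
    for (int i = 0; i < n_total; ++i) a[i] = 1.0f;

    int n = n_total;
    while (n > 1) {                     // every pass shrinks the array by a factor of four
        int m = (n + 3) / 4;
        sum4<<<(m + 255) / 256, 256>>>(a, b, m, n);
        cudaDeviceSynchronize();
        float *tmp = a; a = b; b = tmp; // the output of this pass becomes the input of the next
        n = m;
    }
    printf("sum = %.0f\n", a[0]);
    cudaFree(a);
    cudaFree(b);
    return 0;
}

In practice, production-grade reduction kernels also use shared memory and warp-level primitives for speed; the version above trades performance for clarity.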
Although the above example is simple, the process of converting raw data into a format the GPU can compute becomes more cumbersome as complexity increases. Because of this, the traditional method of Shader programming has largely been superseded by the more modern technique of running code directly on the GPU via APIs. Take the task of finding an element in an unsorted array: code can simply be written to compare every element of the one-dimensional array against the query, all in parallel. If an element does not match the query it is ignored; otherwise, its index is stored and passed back as part of the result.
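Below is an equally minimal sketch of that parallel search, again assuming CUDA. The kernel name find_matches and the use of an atomic counter to collect matching indices are illustrative choices for this example, not the only way to do it.

// A minimal CUDA sketch of a parallel search over an unsorted array.
// Every thread inspects one element; matching indices are appended to a shared result list.
#include <cstdio>

__global__ void find_matches(const int *data, int n, int query,
                             int *match_indices, int *match_count) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    if (data[i] == query) {
        int slot = atomicAdd(match_count, 1);   // reserve a unique slot in the result list
        match_indices[slot] = i;
    }
}

int main() {
    const int n = 1 << 20;
    int *data, *match_indices, *match_count;
    cudaMallocManaged(&data, n * sizeof(int));
    cudaMallocManaged(&match_indices, n * sizeof(int));
    cudaMallocManaged(&match_count, sizeof(int));

    for (int i = 0; i < n; ++i) data[i] = (i * 37) % 1000;  // arbitrary fill, so the query 42 appears many times
    *match_count = 0;

    find_matches<<<(n + 255) / 256, 256>>>(data, n, 42, match_indices, match_count);
    cudaDeviceSynchronize();

    printf("found %d matches; first recorded index: %d\n",
           *match_count, *match_count > 0 ? match_indices[0] : -1);

    cudaFree(data);
    cudaFree(match_indices);
    cudaFree(match_count);
    return 0;
}

Because the threads race on the counter, the order in which matching indices are recorded is non-deterministic; if a stable ordering matters, the collected indices can be sorted afterwards.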
The bulk, parallel processing capabilities introduced by GPGPU have paved the way for more high-performance computing on each individual computer. This idea of using the graphics card for more than just graphics has many practical uses today, including advancements in consumer software, data processing in fields like Astrophysics, Chemistry and Medicine, and even business uses such as load balancing in data centers.
In consumer software, thanks to the mass production of more advanced GPUs and the introduction of easy-to-use, high-level APIs like Apple’s Metal that enable GPGPU on iPhones and iPads, users now have access to features and programs once deemed exclusive to supercomputers, such as image noise reduction, live 3D filters and audio processing.
In science and medicine, clusters of GPUs layered on top of CPUs are used by supercomputers to batch-process terabytes of raw mathematical data: from physical calculations (in the form of vectors or matrices) that simulate the movement of distant stars, to analysing protein folding (generating many possible forms of a molecule in parallel).
So, what’s next? Recent advancements include easier-to-use APIs such as WebGL (GPGPU via the web browser) and NVIDIA’s research into Echelon, a new GPU architecture geared specifically towards general computing that carries more memory than a traditional GPU. In terms of future applications, GPGPU will see more usage in artificial intelligence (processing more input through parallelism) and in scientific research, in particular using asynchronous computation to produce a 1:1 mapping of every molecule in the human brain.
If Moore’s Law continues to hold, GPUs (and CPUs in turn) will keep becoming more efficient and performant, and will continue to find new practical uses with every advancement of the technology.
General-Purpose Computing on GPUs works by passing large amounts of individual data from the CPU to the GPU via the Graphics Pipeline and parallelizing the computation. This insight of handing non-graphical, general computation to the GPU has elevated it from a simple extension for processing onscreen graphics into the versatile, high-performance processor used in much of today’s modern computing.
- Means exactly what you think it means: running multiple programs (or solving pieces of a larger problem) at the same time instead of sequentially. [return]
- Here’s a quick video demonstration of how GPUs work, courtesy of the Mythbusters team. [return]
- Obviously, because this is merely an introduction, this explanation is massively oversimplified. For a more in-depth approach, I recommend perusing these lecture slides. [return]