Monday, February 18, 2013

The Wonderful World of OpenCL

One of my favorite parts of programming is that there's constantly new tools coming out that make programming more interesting. One recent development I've stumbled upon is OpenCL, a specification that lets one design code for execution on a GPU or other parallel processor. I've been in desperate need for some additional speed in one of my recent projects, so I decided to give OpenCL a shot. If you're curious about the tools that I used, check out the last paragraph of the article.

The problem: to generate vast swaths of terrain, and to optimize said terrain to reduce the number of vertexes without heavily impacting the quality of the terrain mesh. The generation aspect of the problem has been tackled by many programmers and math nerds who have developed methods such as perlin noise and ridged fractals. Already having some ridged fractal code lying around, I chose it for the sake of simplicity. In OpenCL, you launch a kernel(aka the entry function) with a specified number of worker threads. These worker threads identify themselves using methods like get_global_id(n), where n is the dimension of the id. Generating a mesh of fractal noise is quite trivial, one can just launch a kernel with a two dimensional array of workers. Simply set each dimension of the array to how many vertexes wide and long you want the generated mesh to be, and stick the fractal code in the kernel which interprets (x_id,y_id) as the position of the vertex to be generated.

As simple as this sounded, I ran into a tiny problem. The kernel was set to stick generated vertexes in an array of float3, but it turns out that internally, float3 is wrapped by a float4. This means that when I read the buffer from the kernel, the vertexes were enumerated as [x,y,z,0,x,y,z,0,x,y,z,0]. Not exactly a huge deal, but it would have saved me a lot of debugging time if the tutorials mentioned it somewhere.

The usual solution to simplifying terrain is to use Quadtrees. Most of solutions that I see get very fancy with loads of objects and maintain a large relational structure to define the quadtree. These solutions also tend to be hopelessly slow because of all the cache thrashing they tend to cause, not to mention that the code is a bitch to maintain. Since none of those fancy object oriented features are available in OpenCL (thank god), I took an array-based approach where the state of the quadtree is broken down into an array of boolean nodes that specify whether or not a vertex is "activated".

One of the painful issues I ran into was that barrier(), the method used to synchronize worker threads, only works across the work-group. As far as I can gather, there is no way to synchronize threads across an entire kernel execution. This means that every place where I would have needed to add a synchronization across all the worker threads, I have to end that kernel's execution there, then queue another kernel afterwards that contains the code that comes after the barrier. I was kinda upset on discovering this considering that the size of work-groups is heavily restricted, but it wasn't all that bad because the overhead associated with launching a kernel is relatively small.

Now at this point I had an array of vertexes, and an array of boolean values that specifies which of those vertexes are actually important. The final challenge was to create an array of indices that specify the order in which the vertexes are to be drawn. Unfortunately this was a pretty boring problem to solve, and I ended up with a solution full of hard-coding. At this point I had a good set of debugging/compiling tools, so programming the vertex-winder was almost trivial.

In conclusion, I have to say that working with opencl has certainly been an enlightening experience. Massively parallel programming involves a very different approach to solving problems than normal imperative programming. As devices start to come packed with more and more cores, opencl will likely become a more important specification. Being in its infancy, there's still some undocumented driver/compiler bugs you may run into, including the lack of a unified documentation source that doesn't suck, but if you hang in there you'll get the hang of it. I think it would be interesting to see a programming language that could convert sections of one's code to opencl behind the scenes, making the parallel aspect of programming invisible to the end user. I guess that's on the table for a possible next project.

OKAY TIME FOR THE NUMBERS


edit: since publishing this post I've been able to get gpu elapsed execution time down to 5ms (!)


Tools

Dropping into opencl head-first is a blast. Virtually none of the tutorials out there talk about resources and tools to help you write and debug opencl code, which makes opencl look like a black box from which you can't debug or get worthwhile compiler feedback. Your GPU's vendor writes the opencl compiler for their specific platform, which means a failed compilation gets you useful errors like COMPILATION_FAILED. To fix this issue, I found several tools including CMSoft's OpenCL Code Checker, and the much more useful Intel SDK Kernel Builder.
In terms of debugging, you will encounter the same black-box issue. AMD released a tool called ocl-emu, a visual studio project that has macros and bindings that lets you -kinda- write opencl code in a C++ environment. Unfortunately opencl's syntax doesn't translate directly into ocl-emu's idea of opencl syntax, and I found the program to be lacking in documentation. In addition, many opencl functions were not implemented verbatim  which made using the tool a complete hassle. On the other hand we have Intel's SDK which provided some really neat toys. There's a Visual Studio plugin that lets you open your CL source file in VS, and place breakpoints and add watches to variables as if the code was native, kinda like AMD's setup but it works without having to "convert" code or write scripts to generate the parameters for your kernels. I was also surprised to find out that the SDK's Kernel Builder program also included a fairly comprehensive set of profiling tools. Honestly, if I hadn't discovered Intel's SDK I probably wouldn't have been able to finish this project. If you plan on doing any kind of heavy opencl programming in the future, it is a must have.

Thursday, March 29, 2012

Project Forge: Town Roads and Plot Zoning - Part 1

Last update I mentioned that the next post would likely be about NPC generation and simulation. Well, I started doing that and finished the generation algorithms; however it turns out simulation will require much more environmental input than I anticipated, requiring a development shift to town generation algorithms. This will give simulated NPCs much more of a "context" to work with, and will save me from much code refactoring when it comes time to tie all these modules together.

Town generation started out with the basics; each town would be defined from the seeded towns discussed in the last update, and would grow outward from a weighted center point of the town plot.
The initial boundaries of each town are defined by the local resources. Each pixel from the global map on the left corresponds with  16 pixels on the local map on the right. Each pixel in the map on the right represents one square meter.
The first step to generating towns is road generation. I made a critical mistake during my initial attempt at solving the problem, and locked myself into a solution that used templates to generate roads. This means that each town would have had a road system that inherited from a pre-defined template with several random variations. By the time I realized how awful of an idea this was, I had already blown most of my spring break programming it.
Example of one of the templated road systems. Purple lines represent roads, and green square represent  potential zones for residential development. 
I have a feeling that my initial idea was something based on L-systems like what this guy does, but the idea never actually came to fruition before I started actually programming. Looking back I have no idea what I was thinking; but on the bright side there were a few useful algorithms that had to be programmed to make templates work that can probably be reused elsewhere.


Okay - with that garbage out the window, I came up with a much more natural approach to the problem that likely emulates how roads are created in real life. The idea is to break the local map down into grids that contain several square meters of land. An entity that we'll call a road builder is spawned at a random location somewhere near the center of the local map. If the grid that the road builder lands on is not connected to a road, then it will construct a road connecting that grid to the main road system. By repeating the process and moving the road builder to another random spot on the grid, a kind of "crack" of roads is formed. 
Shit, it already looks better than that template crap.
In order to resolve the utter lack of interconnectivity on the road grid, an algorithm is executed which connects the endpoints of each path to other endpoints to increase connectivity and to make everything look a bit more natural. 

At least it doesn't look like a spider web anymore. Notice how the earliest built roads form a backbone for the network.
The nice thing about this road builder algorithm is that it works for any size town. It can be used to generate a town of only a few homes or even a metropolis of 500+ buildings. 
Road network from above seeded with building lots(green). There are about 200 buildings lots served by this road network.

Much larger network, unlike the ones above this one's "road builders" were bound by proximity to the road grid, rather than proximity to the origin point of the town. I think there's ~500 building lots on this one.
That's it for this update. Tune in next time for even more images of lines and boxes. 

-zalzane

Saturday, March 3, 2012

Project Forge: World Generation and Town Seeding

Over the past few weeks, I have shifted design focus away from engine components and more onto the procedural generation components of the Forge. I have yet to make the decision on whether or not to use a graphics engine for Forge, or to use something lower-level like openGL.

Anyways the first procedural component I pumped out was a terrain generator. Right now it's nothing special and is just a placeholder until I can iron out all of the details. For the purpose of debugging and working on other features I'm using a 4km by 4km map, however in the future that will probably scale up to something much bigger like 16km by 16km, as long as memory restrictions allow. The placeholder terrain generator actually doesn't look all that bad, and because of how it works, some natural artifacts show up that may pass off as rivers at a later stage in development.
The brighter the blue color, the higher the pixel is from sea-level.

The next item on the itinerary was resource and town seeding. Each town would be seeded on areas of very high natural resource concentration, something I expected would normally happen for towns in real life. Several different resource maps were generated, each of which with their own unique density and spread. This is supposed to mirror how in real life, certain resources like fertile ground may be pervasive for many miles, however resources such as precious metals may only have deposits a few meters wide. Since the NPCs founding towns aren't supposed to be omniscient, they will only recognize areas as having a specific resource if the concentration of that resource passes a specific threshold. For example, an NPC would only recognize an area as having iron deposits because the concentration of iron in the local area is so dense that rocks of hematite or magnetite are found on the surface.
Example of a resource map for Iron. Areas of solid blue are where  traces of the metal are visible on the surface.
The average of these resource maps is taken, and a threshold is applied to the new cumulative map, symbolizing areas where potential settlers would recognize that there are plenty of local resources.
Each blue pockets represents an area with an unusually high resource concentration
For each of these high-density resource pockets, a town is founded. The next step was to connect these towns with roads, a process which turned to be much more painful than I had expected and added over 1000 lines of code to the project. 3 weeks and several cans of coffee later, I found the best solution for generating roads to be the use of Dijkstra's algorithm with a set of cost constraints based on how hard it is to travel through a certain area.
Red dots represent towns, green lines are roads.
Blue background is the cost map, with ligther blue having a higher cost.
For the pathfinding cost map, I did a threshold where if a pixel was deemed to be neigh-impossible to travel through (mountains, swamps, rivers,  etc), then the road would avoid traveling through it at all costs. However in some situations these areas must be crossed because there is no way around, in which case there will have to be bridges, massive elevators (I can dream), or stonework roads required to pass through. In the future it may be possible for NPCs to create their own roads after the initial batch in order to supplement the current system, but speculating that far down the development like probably isn't that good of an idea. 

That's it for this update. Next up in the queue is NPC generation and simulation, which is starting to look like a very interesting component to develop.

-zalzane

Sunday, January 29, 2012

Project Forge Development

I suppose I'll start the blog with a post about a video game I've been working on: Project Forge. I want to see Forge become a relatively-believable, procedurally generated adventure game. It will likely be 1st or 3rd person point of view in a statically sized 3D world.

In terms of procedurally generated content, I want to have the game generate unique towns, cities, and entire civilizations for the player to interact with. Ideally these civilizations would function just like real civilizations, NPCs would have relationships with one another, they would go to work, go home, eat, sleep, wake up late, get fired, get knocked up, have kids, and all sorts of other fun stuff. I want to place a lot of emphasis on the realism factor of these civilizations so that they will grow, expand, and become unique entities all on their own.

In terms of gameplay, I'm not entirely sure of a direction to take. The idea that I play around with the most is having ruins scattered across each world for the player to explore and plunder, however I fear that such gameplay may become repetitive or boring after the first few exploration expeditions, an issue that crops up in a lot of other open-world adventure games like those in the Elder Scrolls series. The solution to this issue may take a lot of time to figure out, which is why I'm going to be working on the procedural civilizations/content before I start in on gameplay.

I'll try to work on Forge as much as I can in my spare time, but between college and work I cannot guarantee progress or blog updates on regular intervals. However on the bright side, I have already been working on the game on and off for the past few weeks, and have made a bit of progress in getting all of the 3D knick knacks working.

Here's a recent image I took of my inability to debug whatever the hell opengl uses to shade primitives. Kinda looks cool combined with the heightmap mesh forming some kind of pseudo-valley of darkness or something.