Game of the Year Enhanced

Yes, you read correctly. Borderlands – Game of the Year Enhanced on PC. Not to be confused with the earlier release titled Borderlands – Game of the Year. The console version of this release is also titled Borderlands – Game of the Year…so I guess only Steam restricted them from having two titles with the same name. But rest assured, this one is more gamier, more yearlier, and more enhanced by far. I’m going to avoid going into the importance of not naming your release the same thing as a previous release and move on.

I was thinking about what to say for this post, but the question quickly changed to what not to say. This project was as intricate and filled with interesting, unusual, and wacky events as the game itself. It was a huge open world of tasks for me as well, since I worked on nearly every area. I had my pick from a large list of issues ranging across the board: networking, gameplay, physics, engine, and UI tasks! Memory stomps, framerate issues, falling through elevators, random disconnections, you name it! Flying around a level editor to find missing turrets that ended up being hidden within the walls of buildings, now blocked from appearing by new collision fixes? You betcha!

I’ve been told that I am one of the most versatile engineers at Rockstar. That is likely in no small part due to my earlier work on this project. So what was this project? We were contracted by 2K to port the original Borderlands to XB1/PS4, add 4-player splitscreen, do a graphical upgrade, bring in some features of later games in the series like the minimap, and top it off with a release of this new version on PC as well.

One defining feature of this game was that 90% of it was written in UnrealScript. For those who have the luxury of not knowing what that is, it is the scripting language built into versions of Unreal prior to UE4. The syntax is similar to C++/C#, but as a scripting language it’s…easier? Well, it’s certainly much slower. The player update tick alone took up a significant portion of the frame time. After adding three more of those for splitscreen, the game was unplayable.

No matter. It turns out optimizing such a game was pretty easy. We added some custom labels in the engine code that ran the scripts so that it would report which functions were being run in the PIX / Razor profilers. We took the functions that were taking the longest, copy/pasted them into C++, and fixed up the syntax. Easy. The game’s framerate was now fantastic, even with four players! This was great! It was great, right? Right?
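For the curious, here’s a minimal sketch of the labeling idea, assuming one choke point in the script VM’s call dispatch; every name below is an invented stand-in rather than UE3’s actual internals:

```cpp
// Hypothetical hooks the labels fed into; in practice they wrapped the
// platform profiler markers so script function names showed up in captures.
void BeginProfilerEvent(const char* Name);
void EndProfilerEvent();

// RAII scope so the label is closed even if the script call exits early.
struct ScopedScriptLabel
{
    explicit ScopedScriptLabel(const char* FunctionName)
    {
        BeginProfilerEvent(FunctionName);
    }
    ~ScopedScriptLabel() { EndProfilerEvent(); }
};

// Somewhere in the VM's "execute this UnrealScript function" path:
void CallScriptFunction(const char* FunctionName /*, object, args, ... */)
{
    ScopedScriptLabel Label(FunctionName);
    // ... existing bytecode interpreter loop runs here ...
}
```

From a capture, the longest-running labels told us exactly which functions were worth porting to native code.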

Well, that’s what the goal was, but it came with a few issues. It is common knowledge that a game update loop shouldn’t depend on the framerate, but apparently someone didn’t get the memo back in 2009 when this game was originally made. This led to some hilarious results. If any character in the game, the player included, fired a rocket…it would immediately explode in their face, killing them. The first update after firing a weapon now came sooner due to the increased framerate, so the rocket had not yet moved as far and the character who fired it was still within the hitbox range. As much as I thought this was a great feature, it still had to be fixed.
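To make the failure concrete, here’s a hedged sketch of the kind of proximity check that breaks this way; the real code was UnrealScript, and every name here is invented:

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

static float Distance(const Vec3& a, const Vec3& b)
{
    const float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return std::sqrt(dx * dx + dy * dy + dz * dz);
}

struct Rocket
{
    Vec3  Position;
    Vec3  Velocity;      // units per second
    float ExplodeRadius; // proximity fuse distance
};

// One projectile update. The per-second speed is framerate-independent, but
// the *first* collision check after firing happens one frame later -- and at
// 60 fps that frame is half as long as at 30 fps, so the rocket has covered
// half the distance and the shooter can still be inside ExplodeRadius.
bool UpdateRocket(Rocket& R, const Vec3& ShooterPos, float DeltaSeconds)
{
    R.Position.x += R.Velocity.x * DeltaSeconds;
    R.Position.y += R.Velocity.y * DeltaSeconds;
    R.Position.z += R.Velocity.z * DeltaSeconds;

    // BUG: no arming delay and no owner exclusion, so a short first frame
    // detonates the rocket in the shooter's face.
    return Distance(R.Position, ShooterPos) < R.ExplodeRadius;
}
```

The eventual fix is covered a bit further down.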

Unfortunately, we managed to get the best of both worlds. Certain areas in the game were still slow due to uncommon functions being run that we had not yet optimized. Per the contract, we were at the mercy of 2K QA to find all the performance issues and other bugs, and they were not very thorough at finding them. I eventually ended up on the case of “the elevators of random death”. Whenever you took an elevator down, you would inexplicably find yourself bleeding to death as soon as it reached its destination.

It turns out that the elevator squash kill was being triggered when an elevator lands on top of someone. But the player was on top! Well, visually, but for a few unnoticeable frames at the bottom…that was not the case. The framerate was slower there, so gravity had carried the player further down between updates, he was no longer considered on top of the elevator, and the logic that pushed him up to stay on it was not triggering. Conveniently, taking the elevator up worked because the gravity logic didn’t apply with the player’s feet on the surface.
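As a hedged sketch of the per-frame ordering problem (the helper names are invented, not the engine’s; the eventual fix, covered below, hoists the snap-to-surface step above gravity):

```cpp
struct Player;
struct Elevator;

// Invented helpers standing in for the real movement code.
void ApplyGravity(Player& P, float Dt);
bool IsStandingOn(const Player& P, const Elevator& E);
void SnapToSurface(Player& P, const Elevator& E);
void CheckSquashKill(Player& P, Elevator& E); // kills anyone under the lift

void UpdateFrame_Broken(Player& P, Elevator& E, float Dt)
{
    ApplyGravity(P, Dt);     // long frame: player sinks below the platform
    if (IsStandingOn(P, E))  // now fails -- the player is "under" the lift
        SnapToSurface(P, E);
    CheckSquashKill(P, E);   // sees the elevator landing on the player
}
```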

Another problem arose where sniper bullets would go through people. The framerate was so slow in these circumstances that by the time the next update came around, the bullet had completely passed through the hitbox and was on the other side. The framerate was too risky to mess with at this stage of the project, so we fixed the sniper bullets with a raycast collision check between the bullet’s current position and where it was on the last update. The rockets were fixed by adding the character that fired them to an exclusion list. The elevators were fixed by moving the logic that pushed the player up to before the gravity update.
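A minimal sketch of that swept check, assuming a hypothetical line-trace helper in place of UE3’s actual trace machinery:

```cpp
struct Vec3 { float x, y, z; };

// Stand-in for the engine's line trace; this signature is invented for
// illustration, the real project used UE3's trace machinery.
bool LineTraceHitsCharacter(const Vec3& Start, const Vec3& End);

struct Bullet
{
    Vec3 Position;         // where it is this update
    Vec3 PreviousPosition; // where it was on the last update
};

// Testing only the bullet's current point tunnels straight through a
// character whenever one frame covers more distance than the hitbox is
// deep. Sweeping the segment it traveled this frame cannot miss.
bool BulletHitSomeone(const Bullet& B)
{
    return LineTraceHitsCharacter(B.PreviousPosition, B.Position);
}
```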

Comical framerate issues aside, UnrealScript posed another major problem. There was a frequent memory stomp plaguing the project for weeks. Every programmer would crash on it at least a few times a day at unexpected moments, making development a pain. The lead kept telling me not to worry about it, but I’d had enough. I asked him who was working on it and he told me no one was. 2K QA hadn’t filed bugs for the crashes, and we were only obligated to fix what they found. Obligated or not, I told him not fixing it was wasting a lot more time. He said we didn’t have time to do anything about it. I told him I would look at it for two days max and move on if I couldn’t figure it out. He hesitantly agreed.

Well, I synced back to a changelist that I recalled hitting the issue most often on and managed to track down an ~80% repro by starting a multiplayer session. The crash was always in some external audio library code. Symbols for the library were not set up for our project, so I set them up to debug what was going on. It was attempting to call a function on an element of a massive linked list of audio effects that it was looping through. I had a look at the vfptr on the problematic object. It was pointing to a vftable for one of our networking classes!

I looked for references to that class in our code, but all of the instances of it were being allocated directly from the Unreal allocator. It was an UnrealScript class that was marked native so that C++ code could utilize it, which basically exported it to a header file. I didn’t see how that could be wrong, so I figured something must be stomping the library’s pointer to point at the wrong location. This seemed to make sense. After all, aside from the bogus vfptr…all the following data on the object seemed valid and was similar to expected values on other items in the list. I realized, though, that these values always seemed to be the same on the one we crashed on.

With some hefty conditional breakpoints that slowed the game to a crawl, specific enough to check for exactly that object, I managed to break on it before the crash occurred. Then, going back earlier, I was able to catch it at the point of allocation. But it was just being allocated by the audio library’s dlmalloc! The vfptr was valid at the point of allocation, so I stuck a data breakpoint on it, and sure enough it was being overwritten by the Unreal allocator allocating our networking class instance! Two allocators both thought the same location was available for use.
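In spirit, those conditionals amounted to something like the code-side trap below; all the names are invented, and in practice it was debugger conditional and data breakpoints doing the work rather than a hook like this:

```cpp
#include <cstddef>
#include <cstdint>
#include <intrin.h> // MSVC: __debugbreak()

// Invented stand-in for the engine allocator entry point.
extern void* EngineMalloc(std::size_t Size);

// Address of the audio object's vfptr, captured from the previous run.
static const std::uintptr_t kSuspectAddress = 0x0; // placeholder

void* EngineMallocTraced(std::size_t Size)
{
    void* Ptr = EngineMalloc(Size);
    const std::uintptr_t Begin = reinterpret_cast<std::uintptr_t>(Ptr);
    // Break the moment the Unreal allocator hands out a block overlapping
    // memory the audio library's dlmalloc already considers live.
    if (Begin <= kSuspectAddress && kSuspectAddress < Begin + Size)
        __debugbreak(); // halts right here under the debugger
    return Ptr;
}
```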

I dug into the Unreal allocation code. The pointer being returned for use was somewhat before the audio object’s vfptr. I did a sizeof of our networking class and added it to that pointer, and the result landed 16 bytes past the vfptr. Clearly, there wasn’t actually enough room for this Unreal object here. What was going on!? I looked at the logic it had for checking the size and noticed the size it was checking against was 16 bytes less than the actual size. I did another sizeof for sanity and the numbers still mismatched.

Where did Unreal’s number come from? Well, it appeared to be cached off in static memory corresponding to the object type. I took a look at the UnrealScript code for the class. The last two member pointers weren’t listed in the usual place there! But they were in the header. UnrealScript allows you to declare some code to be thrown into the generated header with a cpptext block, and the two additional members were being declared within that block in the UnrealScript code. It turns out Unreal lets you declare members there, but then just ignores them when actually allocating space for the object! An ancient bug hidden within the UE3 engine itself.
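To illustrate the mismatch, here’s a made-up class in the same shape; the UnrealScript in the comment and every name are invented for this sketch:

```cpp
// Forward declaration is enough; we only ever form pointers to it.
class UObject;

// The UnrealScript side would have looked something like:
//
//   class MyNetThing extends Object native;
//   var int ScriptA;
//   var int ScriptB;
//   cpptext
//   {
//       UObject* ExtraA; // declared only inside the cpptext block
//       UObject* ExtraB;
//   }
//
// The generated header then contains all four members:
class UMyNetThing
{
public:
    int      ScriptA; // counted by the script compiler's size metadata
    int      ScriptB;
    UObject* ExtraA;  // NOT counted: cpptext is pasted verbatim into the
    UObject* ExtraB;  //   header and never factored into the cached size
};

// C++ knows the true footprint...
static_assert(sizeof(UMyNetThing) ==
                  2 * sizeof(int) + 2 * sizeof(UObject*),
              "no padding assumed for this illustration");

// ...but the engine allocates from the cached script-side size, which is
// 2 * sizeof(UObject*) -- 16 bytes on a 64-bit target -- too small, so the
// object's tail lands in memory another allocator thinks it owns.
```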

Well, that was a bit of a tangent for one bug, but I hope you found it as interesting as I did when I uncovered it. Quickly changing topics…there’s plenty more to discuss about this project, but I suppose I’ll settle for addressing just one last thing.

A week before we sent the final builds off to be vetted by Sony and Microsoft, I noticed while working on a networking bug that each time I created a new character…my previous character was gone the next time I restarted the game! I just happened to notice my character list wasn’t getting bigger and tested it out. I figured it was likely some debug functionality proving to be a pain, but I checked a release build just to be sure. The same thing happened. The characters were legitimately being deleted by final code. This was a big problem.

The bug itself wasn’t too interesting. A separate save path had been carved out of the code for autosaving networked games. It was terribly redundant, but more importantly…it used the wrong index when saving, so it wiped whatever other character happened to be in that slot. A clear problem with a clear fix. I ran some tests and it resolved the issue, so I went to the lead to get authorization to submit the code.
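Reduced to a hedged sketch with invented names (the real code lived in the game’s save system, and the exact indices involved are my illustration):

```cpp
#include <cstddef>
#include <vector>

struct CharacterSave { /* ... save payload ... */ };

struct SaveSystem
{
    std::vector<CharacterSave> Slots;
    int ActiveSlotIndex = 0;     // slot the current character was loaded from
    int NetworkSessionIndex = 0; // unrelated index the autosave path grabbed

    void WriteSlot(int Index, const CharacterSave& Data)
    {
        Slots[static_cast<std::size_t>(Index)] = Data;
    }
};

void AutosaveNetworkGame(SaveSystem& Saves, const CharacterSave& Active)
{
    // BUG (roughly what the redundant network-autosave path did):
    //   Saves.WriteSlot(Saves.NetworkSessionIndex, Active);
    // That wipes whichever character happens to live in that slot.

    // FIX: write back to the slot the active character actually came from.
    Saves.WriteSlot(Saves.ActiveSlotIndex, Active);
}
```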

“I don’t know if that’s entirely safe this close to build finalization,” he stated. I was a bit taken aback, so I restated that the issue was that a random character save got deleted every single time you played online. But he had understood the first time. “That’s not as bad as the game crashing.” A true statement, but one I felt lacked all the context of the situation. I simply told him that this was still one of the worst problems a game can have, so we really wanted it in the build. He said to find someone to test it more thoroughly and he would consider it.

We had no internal QA, and I sure wasn’t allowed to tell 2K QA about this horrible problem so close to finalization, but I talked to some people and got a member of QA pulled off another project for a few days to help with testing. I stayed in contact with him; he verified specifically that the fix resolved the save issue in a variety of different setups, then just played through various single-player and multiplayer stages of the game for two days straight. Everything was good. I went back to the lead to get permission to submit the fix again.

He replied, “Absolutely not.” If I was stunned the last time, I nearly fell over in shock this time. “I’ve decided that it’s not worth the risk, and 2K QA haven’t noticed it, so we aren’t obligated to fix it. The bug stays.” I was silent for a few moments before stating the logic that was plain as day to me: “Either Sony and Microsoft find this problem, and 2K is mad that our code is bad and we have to submit the fix anyway…or it makes it to retail and 2K and the players are mad and we have to submit the fix anyway, after our company’s reputation takes some damage. Regardless, we are submitting this fix in the end.” The lead simply said I was generalizing the situation and that I had no idea what would actually happen.

I had never done this before, nor have I since, but I felt the judgement was so poor that I went above his head to inform a higher-up in the company. The higher-up basically said I should respect my senior and accept that his judgement was probably better than mine since he had more experience. “We can’t try to be virtuous for the good of the game when we’re not being paid for it. We have to think about our own employees…what if someone has to work OT because this fix causes the game to crash? That’s not healthy.”

It was at this moment that I realized I was not going to work at a “work for hire” company much longer, nor ever again. The realization hit like a sack of bricks to the face. No one cared about the player experience or the state of the game. We were just there to do a job and get paid. I am one of the most cautious people about making unnecessary changes to a game…but even in a work-for-hire environment, I’d still say this decision involved a hefty amount of paranoia and a significant lack of foresight.

So what happened? Well, Sony noticed that all the saves were being deleted when they played online sessions and failed the submission. Microsoft did not. What do you think happened after that? Naturally, we branched the code to have a PS4-specific version, submitted the fix only to that branch, and resubmitted. I protested again, but it fell on deaf ears. It was so silly at that point that it was hard to even stay mad. So, the save deletions made it to retail on XB1 only. A few thousand players got incredibly pissed off within the first few days after release, and 2K got us to put out an emergency day 1 hotfix that had very little testing.

I suppose it was fixed in the end at least, but only after some reviews slammed the game as the buggiest of the year, and a few headaches could certainly have been avoided with no difference in risk. I am still proud to have worked on it and helped get it into a better state, but I am definitely glad to be out of the “work for hire” business.