I think spoken ( or shouted! ) phrase detection, rather than speech recognition, will be more prevalent.
As there are only so many orders you can give in a game ( e.g. “Shoot the feckin fecker” rather than an order like “4 pints of Guinness mate” ), the challenge would be to quickly do a fuzzy comparison between what is said and the phrases it might be similar to ( if any ).
Modern GPUs can do additional non-3D processing like comparing data ( images ), physics ( latest Havok announcement ), and some sound processing ( assuming there is no dedicated SPU… if there is, assume the SPU is similar to a GPU in terms of performance on dedicated sound data ). So it should be possible to quickly take a live recorded spoken phrase, apply effects very fast ( stretch, distort ), compare sections of what was spoken to certain recorded sounds ( or shapes of sound phrases ), and see which phrases more or less match up. Then, if one is found, issue that “order guinness” command to the AI barstaff.
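To make the "stretch and compare shapes" idea a bit more concrete, here is a rough sketch of how it might look. It is not from any particular engine: I've picked dynamic time warping as one possible way to do the stretching, it runs in plain Python on the CPU rather than on a GPU, and the names ( envelope, dtw_distance, match_phrase, TEMPLATES ), the toy sample data and the threshold value are all made up for illustration.

```python
import math


def envelope(samples, frame=2):
    """Coarse 'shape' of a sound: mean absolute amplitude per frame."""
    shape = []
    for i in range(0, len(samples), frame):
        chunk = samples[i:i + frame]
        shape.append(sum(abs(s) for s in chunk) / len(chunk))
    return shape


def dtw_distance(a, b):
    """Dynamic time warping cost between two sequences.

    Either sequence may be stretched in time to line up with the other.
    The total cost is divided by len(a) + len(b) so that long and short
    phrases can be compared against the same threshold.
    """
    if not a or not b:
        return math.inf
    prev = [0.0] + [math.inf] * len(b)
    for x in a:
        cur = [math.inf] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            cur[j] = abs(x - y) + min(prev[j], cur[j - 1], prev[j - 1])
        prev = cur
    return prev[len(b)] / (len(a) + len(b))


def match_phrase(spoken, templates, threshold=0.05, frame=2):
    """Return the name of the closest template, or None if nothing is close.

    The threshold is an arbitrary illustrative value, not a tuned one.
    """
    spoken_shape = envelope(spoken, frame)
    best_name, best_cost = None, math.inf
    for name, samples in templates.items():
        cost = dtw_distance(spoken_shape, envelope(samples, frame))
        if cost < best_cost:
            best_name, best_cost = name, cost
    return best_name if best_cost <= threshold else None


# Toy "recordings": one burst of sound versus two bursts.
TEMPLATES = {
    "shoot": [0, 0, 1, -1, 0, 0],
    "order guinness": [0, 0, 1, -1, 0, 0, 1, -1, 0, 0],
}

# A slower, drawn-out version of the two-burst phrase.
spoken = [0, 0, 0, 0, 1, -1, 1, -1, 0, 0, 1, -1, 1, -1, 1, -1, 0, 0]
print(match_phrase(spoken, TEMPLATES))
```

The threshold is what covers the "if anything" case: if nothing scores close enough, match_phrase returns None and no order gets issued. A real version would compare something richer than a loudness envelope ( frequency content, for example ), and the inner comparison loop is the part you would want to push onto the GPU or sound hardware.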