speech recognition in games

Welcome to our forums. These forums were active from 2003-2014. We have now decided to close them down, but will leave them here as an archive.

This topic contains 6 replies, has 7 voices, and was last updated by  obscure 11 years, 4 months ago.

  • Author
    Posts
  • #5171

    hannaleen
    Participant
  • #30296

    Nifty
    Participant

    Take a quick look at Unreal Tournament 2004. I’m sure other games must use this technique, but UT2004 is the only one I know of for certain.

    UT uses Windows’ own speech-to-text features and then interprets the text. There’s no need for the game programmers to try to reinvent the wheel, because the OS takes care of interpreting the speech and the game need only work with its own custom library…

    At least that’s how it seems to work. Improving your Windows voice profile certainly improves the responsiveness of the game.

  • #30304

    keyo
    Participant

    Here’s a good review of the (weird) virtual pet game Seaman for the Sega Dreamcast, which made use of the DC’s microphone add-on:
    http://www.gamenationtv.com/reviews/seaman.shtml

  • #30678

    Steph
    Participant

    I use speech recognition software daily and have done so for a number of years now: initially IBM’s ViaVoice (mid-90s), then experiments with other lesser-known packages and with Windows speech-to-text when XP appeared with the feature, before finally settling on Dragon NaturallySpeaking 2 years ago.

    I can’t say I’ve much experience of SR in games (but can say with some authority I’ve plenty of game experience – over 20 years’ worth :lol:), but one thing that strikes me, still to this day, is the amount of computing resources required for a stable, working solution.

    From experience, Dragon is best (I use SR in both French and English, and I’ve been told my English ain’t so bad/accented :wink: – I actually use Dragon more in English than French, but get fewer errors in French than English… go figure! :roll: ) but it does need quite a bit of oomph CPU- and RAM-wise – surprisingly so for RAM, which makes much more of a difference than CPU where accuracy is concerned. It’s reasonably accurate on a P-M 1.6 with 512MB of DDR RAM, much more accurate on an A64 3000+ with 1GB of DDR RAM, but still not 100%, or even 95% for that matter.

    My concern for a stable in-game solution, putting aside the closed nature of console hardware for a moment (but which you’d have to contend with as I expect more and more XB360/PS3 to use the feature), is what impact this feature would have on game performance and *possibly* what trade offs would have to be implemented to maintain the gameplay experience (FPS/bots/eye candy).

    Moreover, I would imagine that there would be quite a vast amount of work to be done when you’re contemplating localisation – you can’t expect all of your market to be fluent in English, and if you were still providing an English-only solution, you’d have to compensate for accents, at the risk of putting off basically any gamer who’s not in the US, UK, IE, AU, NZ etc. So, in that respect, perhaps voice recognition is not yet mainstream for cost reasons, rather than tech reasons.

    But it would be nice to -say- play BF2 SP with bots with which you can interact through voice recognition (instead of using the Q key and selecting a standard msg with mouse), and even roll this feature into MP also (and in doing so, mitigate the amount of faffing about you have to do to get Teamspeak going and/or reduce bandwidth requirements of the BF2 built-in VoIP app). So, there certainly is bags of potential for the technology in games still…

  • #30699

    steve_skittles
    Participant

    I think voice recognition could be put to good use. I like the BF2 idea, but I also think that a tactical shooter on the Revolution could utilise voice recognition: for example, pointing to a location on screen with the controller and issuing a command to your squad, e.g. take cover, give covering fire and so on. So yeah, I think voice recognition can be a good feature if used wisely. As for voice recognition nowadays, well, I haven’t seen anything yet that has caught my eye, but I haven’t played UT, so maybe that’s good. I don’t know really :(

  • #30702

    mal
    Participant

    I think spoken ( or shouted! ) phrase detection, rather than speech recognition, will be more prevalent.

    As there are only so many orders you can give in a game ( eg “Shoot the feckin fecker” rather than an order like “4 pints of guinness mate” ), the challenge would be to quickly allow for fuzzy comparison between what is said, and what it might be similar to ( if anything ).
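One way to picture the fuzzy comparison described above is a minimal Python sketch that works on text, assuming some recogniser has already turned the speech into a string. The order list, the `match_order` name and the 0.6 cutoff are illustrative assumptions, not anything from a real game; `difflib.get_close_matches` is from the Python standard library.

```python
import difflib

# Hypothetical order vocabulary; a real game would define its own.
ORDERS = ["take cover", "covering fire", "attack", "hold position"]


def match_order(heard, cutoff=0.6):
    """Return the known order closest to the recognised text, or None.

    `cutoff` is an assumed similarity threshold between 0 and 1; anything
    scoring below it is treated as "not an order at all".
    """
    normalised = heard.lower().strip()
    matches = difflib.get_close_matches(normalised, ORDERS, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

A slightly misheard "take cova" would still resolve to "take cover", while something unrelated such as the pints order would come back as None, so the game can ignore it.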

    With modern GPUs being able to do additional non-3D processing like comparing data ( images ), and even physics ( latest Havok announcement ), and some sound processing ( assuming there is no dedicated SPU… if there is, assume the SPU is similar to a GPU in performance terms for processing dedicated sound data ), it should be possible to quickly take a live recorded spoken phrase, apply effects very fast ( stretch, distort, compare sections of what was spoken to certain recorded sounds, or shapes of sound phrases ), and see what phrases more or less match up. Then, if one is found, issue that “order guinness” command to the AI barstaff.

    Mal
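The "stretch and compare shapes of sound phrases" idea in the post above resembles dynamic time warping (DTW), a classic technique the post itself does not name. The sketch below is only an illustration under that assumption: the sequences stand in for some per-frame feature such as loudness, and the template values, function names and `max_distance` threshold are all made up.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D feature sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances while b holds (stretch)
                cost[i][j - 1],      # b advances while a holds (stretch)
                cost[i - 1][j - 1],  # both advance together
            )
    return cost[n][m]


def best_template(spoken, templates, max_distance):
    """Return the name of the closest template, or None if nothing is close."""
    best_name, best_dist = None, float("inf")
    for name, template in templates.items():
        dist = dtw_distance(spoken, template)
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist <= max_distance else None


# Hypothetical pre-recorded phrase "shapes" (e.g. coarse loudness envelopes).
TEMPLATES = {"attack": [1, 3, 2], "take cover": [2, 2, 0, 1]}
```

Because DTW lets one sequence linger while the other advances, a phrase spoken at half speed can still line up with its template. This plain-Python version is quadratic in sequence length, which is the kind of work the post suggests pushing onto faster hardware.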

  • #31005

    obscure
    Participant

    Quote:
    My concern for a stable in-game solution, putting aside the closed nature of console hardware for a moment (but which you’d have to contend with as I expect more and more XB360/PS3 to use the feature), is what impact this feature would have on game performance and *possibly* what trade offs would have to be implemented to maintain the gameplay experience (FPS/bots/eye candy).

    From a development point of view this is certainly the big one. The large system resource requirements would currently be a real problem.

    Quote:
    Moreover, I would imagine that there would be quite a vast amount of work to be done when you’re contemplating localisation…

    Not as much as you might think, provided that the voice recognition was used in a structured way within the game. For example, in a military game you would have standard commands such as “attack”, “enemy 12-o-clock”, “take cover”, “suppressing fire” etc. From a programming point of view these are just command 1, command 2, command 3 etc. The program knows that “attack” is command 1, and when it detects it, it tells the game what to do. If you replace the English dictionary/database with an Italian database, the Italian equivalent of attack would still be command 1* and the program would deal with it in exactly the same way.
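The command-number idea above could look something like this in Python. It is a sketch only: the `Command` enum, the `PHRASES` table and `lookup` are hypothetical names, and the Italian phrases are illustrative, unvetted translations rather than real localisation data.

```python
from enum import IntEnum


class Command(IntEnum):
    """Language-independent command numbers used by the game logic."""
    ATTACK = 1
    TAKE_COVER = 2
    SUPPRESSING_FIRE = 3


# One phrase table per locale; only these tables change when localising.
# The Italian entries are illustrative, unvetted translations.
PHRASES = {
    "en": {
        "attack": Command.ATTACK,
        "take cover": Command.TAKE_COVER,
        "suppressing fire": Command.SUPPRESSING_FIRE,
    },
    "it": {
        "attacco": Command.ATTACK,
        "al riparo": Command.TAKE_COVER,
        "fuoco di copertura": Command.SUPPRESSING_FIRE,
    },
}


def lookup(locale, phrase):
    """Map a recognised phrase to its command number, or None if unknown."""
    return PHRASES[locale].get(phrase.lower().strip())
```

The game code only ever sees `Command.ATTACK`, so swapping the "en" table for the "it" one leaves the rest of the program untouched.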

    Quote:
    But it would be nice to -say- play BF2 SP with bots with which you can interact through voice recognition…

    If we ignore the system resource issue then this becomes quite easy. Point your aiming cross-hair at an enemy machine gun nest and say/shout “suppressing fire” and the program would tell your AI team mates what map co-ordinate you were looking at and what command number “suppressing fire” was. They would then use their AI to find the nearest cover and go into their suppressing fire routine (pop off a few shots, duck down, rinse and repeat). The enemy AI would know that they are on the receiving end of suppressing fire and would act accordingly (duck down, attempt to spot your team, attempt to return fire but in a less coordinated fashion than if they were not under fire).
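A rough Python sketch of that flow, with the cross-hair co-ordinate and the command number bundled into one order and handed to each bot. Every name here (`Order`, `Bot`, `issue_order`, the command number 3, the cover points) is a hypothetical stand-in, and the "AI" is reduced to picking the nearest cover point.

```python
import math
from dataclasses import dataclass

SUPPRESSING_FIRE = 3  # hypothetical command number


@dataclass
class Order:
    command: int
    target: tuple  # (x, y) map co-ordinate under the cross-hair


@dataclass
class Bot:
    name: str
    position: tuple
    state: str = "idle"
    target: tuple = None

    def receive(self, order, cover_points):
        if order.command == SUPPRESSING_FIRE:
            # Move to the nearest cover, then start the suppressing-fire routine.
            self.position = min(
                cover_points, key=lambda c: math.dist(c, self.position)
            )
            self.target = order.target
            self.state = "suppressing"


def issue_order(command, crosshair_target, squad, cover_points):
    """Send one voice command plus the looked-at co-ordinate to every bot."""
    order = Order(command, crosshair_target)
    for bot in squad:
        bot.receive(order, cover_points)
    return order
```

The enemy side could be handled the same way, for example by switching enemy bots near `order.target` into a "suppressed" state, though that part is left out of this sketch.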

    Quote:
    As there are only so many orders you can give in a game ( eg “Shoot the feckin fecker” rather than an order like “4 pints of guinness mate” ), the challenge would be to quickly allow for fuzzy comparison between what is said, and what it might be similar to ( if anything ).

    This is actually an AI problem, not a voice recognition issue. In a military game you could get around this by making it part of the game play that orders must be given clearly. The military train to use standard commands so that everyone clearly understands. “Enemy 12-o-clock” is a clear command warning of an enemy and its location. “Look out, Krauts” would be a poor command, and if the AI didn’t understand it, it would be reasonable to have them hit the dirt but not know what to do until a clear command was given.

    *Actually the Italian equivalent of “attack” is “surrender” but that is a different issue ;)
