Cloud Speech-to-Text

Speech-to-text conversion powered by machine learning and available for short-form or long-form audio.

Powerful speech recognition

Google Cloud Speech-to-Text enables developers to convert audio to text by applying powerful neural network models in an easy-to-use API. The API recognizes 120 languages and variants to support your global user base. You can enable voice command-and-control, transcribe audio from call centers, and more. It can process real-time streaming or prerecorded audio, using Google’s machine learning technology.

Powered by machine learning

Apply the most advanced deep-learning neural network algorithms to audio for speech recognition with unparalleled accuracy. Cloud Speech-to-Text accuracy improves over time as Google improves the internal speech recognition technology used by Google products.

Recognizes 120 languages and variants

Cloud Speech-to-Text can support your global user base, recognizing 120 languages and variants. You can also filter inappropriate content in text results for all languages.

Automatically identifies spoken language

With Cloud Speech-to-Text, you can identify which language is spoken in an utterance (limited to four candidate languages). This can be used for voice search (such as “What is the temperature in Paris?”) and command use cases (such as “Turn the volume up.”).

Returns text transcription in real time for short-form or long-form audio

Cloud Speech-to-Text can stream text results, immediately returning text as it’s recognized from streaming audio or as the user is speaking. Alternatively, Cloud Speech-to-Text can return recognized text from audio stored in a file. It’s capable of analyzing both short-form and long-form audio.

Offers selection of pre-built models, tailored for your use case

Cloud Speech-to-Text comes with multiple pre-built speech recognition models so you can optimize for your use case (such as voice commands). For example, the pre-built video transcription model is ideal for indexing or subtitling video or multispeaker content, and uses machine learning technology similar to that used for YouTube captioning.

Automatically transcribes proper nouns and context-specific formatting

Cloud Speech-to-Text is tailored to work well with real-life speech and can accurately transcribe proper nouns (such as Sundar Pichai) and appropriately format language (such as dates and phone numbers). Google supports more than 10 times as many proper nouns as there are words in the entire Oxford English Dictionary.

Cloud Speech-to-Text features

Speech-to-text conversion powered by machine learning.

Automatic Speech Recognition

Automatic Speech Recognition (ASR), built on deep-learning neural networks, powers applications such as voice search and speech transcription.

Global Vocabulary

Recognizes 120 languages and variants with an extensive vocabulary.

Phrase Hints

Speech recognition can be customized to a specific context by providing a set of words and phrases that are likely to be spoken. This is especially useful for adding custom words and names to the vocabulary and in voice-control use cases.
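
As a rough illustration of phrase hints, the sketch below assembles a recognition config as a plain Python dictionary. The field names (languageCode, speechContexts, phrases) follow the v1 REST JSON layout as understood here and should be checked against the current API reference. The helper build_config_with_hints is hypothetical and sends nothing over the network.

```python
import json


def build_config_with_hints(language_code, phrases):
    """Return a recognition config dict that carries phrase hints.

    Hypothetical helper: it only assembles JSON-ready data.
    """
    # Drop blank entries and duplicates while keeping the caller's order.
    seen = set()
    cleaned = []
    for phrase in phrases:
        phrase = phrase.strip()
        if phrase and phrase not in seen:
            seen.add(phrase)
            cleaned.append(phrase)

    config = {"languageCode": language_code}
    if cleaned:
        # Assumed field layout: a list of contexts, each holding phrases.
        config["speechContexts"] = [{"phrases": cleaned}]
    return config


config = build_config_with_hints(
    "en-US",
    ["Sundar Pichai", "turn the volume up", " ", "Sundar Pichai"],
)
print(json.dumps(config, indent=2))
```

Keeping the hint list short and specific to the current screen or task is usually a sensible starting point, since the hints are meant to describe what is likely to be spoken in that context.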

Real-time Streaming or Prerecorded Audio Support

Audio input can be streamed from an application’s microphone or sent from a prerecorded audio file (inline or through Google Cloud Storage). Multiple audio encodings are supported, including FLAC, AMR, PCMU, and Linear-16.
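
The two ways of supplying prerecorded audio can be sketched as follows. This is a minimal example that only builds the request body. The field names (content, uri, encoding, sampleRateHertz) and the encoding identifiers, including MULAW for PCMU, are assumptions based on the v1 REST API, and the bucket path is made up.

```python
import base64

# Assumed API identifiers for the four encodings named above.
ENCODINGS = {
    "FLAC": "FLAC",
    "AMR": "AMR",
    "PCMU": "MULAW",
    "Linear-16": "LINEAR16",
}


def build_audio(source):
    """Build the audio part of a request from raw bytes or a gs:// URI."""
    if isinstance(source, (bytes, bytearray)):
        # Inline audio travels as base64 text inside the JSON body.
        return {"content": base64.b64encode(bytes(source)).decode("ascii")}
    if isinstance(source, str) and source.startswith("gs://"):
        # Audio already in Google Cloud Storage is passed by reference.
        return {"uri": source}
    raise ValueError("expected raw audio bytes or a gs:// URI")


def build_request(source, encoding_name, sample_rate_hertz, language_code="en-US"):
    """Combine a config and an audio source into one request body."""
    if encoding_name not in ENCODINGS:
        raise ValueError(f"unsupported encoding: {encoding_name}")
    return {
        "config": {
            "encoding": ENCODINGS[encoding_name],
            "sampleRateHertz": sample_rate_hertz,
            "languageCode": language_code,
        },
        "audio": build_audio(source),
    }


inline_request = build_request(b"\x00\x01", "Linear-16", 16000)
stored_request = build_request("gs://example-bucket/call.flac", "FLAC", 44100)
print(inline_request["audio"], stored_request["audio"])
```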

Auto-Detect Language BETA

When you need to support multilingual scenarios, you can now specify two to four language codes and Cloud Speech-to-Text will identify the correct language spoken and provide the transcript.
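
A minimal sketch of the two-to-four language rule is shown below. It assumes the request carries one primary languageCode plus a list named alternativeLanguageCodes, which matches the REST layout as understood here but should be verified. The helper name is hypothetical.

```python
def build_language_config(language_codes):
    """Split two to four language codes into a primary code and alternatives."""
    # De-duplicate while keeping the caller's order of preference.
    codes = list(dict.fromkeys(language_codes))
    if not 2 <= len(codes) <= 4:
        raise ValueError("auto-detection expects two to four distinct language codes")
    return {
        "languageCode": codes[0],
        "alternativeLanguageCodes": codes[1:],
    }


language_config = build_language_config(["en-US", "fr-FR", "de-DE"])
print(language_config)
```

Validating the count on the client side gives a clearer error than waiting for the service to reject the request.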

Noise Robustness

Handles noisy audio from many environments without requiring additional noise cancellation.

Inappropriate Content Filtering

Filter inappropriate content in text results for some languages.

Automatic Punctuation BETA

Accurately punctuates transcriptions (e.g., commas, question marks, and periods) with machine learning.

Model Selection BETA

Choose from a selection of four pre-built models: default, voice commands and search, phone calls, and video transcription.
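
Choosing among the four models might look like the sketch below. The identifiers on the right-hand side (default, command_and_search, phone_call, video) are the model names as best recalled from the API reference and are labeled as assumptions. The mapping and helper are illustrative only.

```python
# Use-case labels from the list above mapped to assumed API model names.
MODELS = {
    "default": "default",
    "voice commands and search": "command_and_search",
    "phone calls": "phone_call",
    "video transcription": "video",
}


def build_model_config(use_case, language_code="en-US"):
    """Return a config dict that selects the model for a given use case."""
    try:
        model = MODELS[use_case]
    except KeyError:
        raise ValueError(f"unknown use case: {use_case!r}") from None
    return {"languageCode": language_code, "model": model}


model_config = build_model_config("phone calls")
print(model_config)
```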

Speaker Diarization BETA

Know who said what – you can now get automatic predictions about which of the speakers in a conversation spoke each utterance.
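
To show how per-word speaker predictions can be turned into a readable transcript, the sketch below groups consecutive words that share a speaker tag. The input layout (a list of dictionaries with word and speakerTag keys) is an assumption modeled on the word-level results the API is understood to return, and the sample data is invented.

```python
from itertools import groupby


def group_by_speaker(words):
    """Collapse consecutive words with the same speaker tag into utterances."""
    return [
        (tag, " ".join(item["word"] for item in run))
        for tag, run in groupby(words, key=lambda item: item["speakerTag"])
    ]


# Invented sample data in the assumed word-level layout.
words = [
    {"word": "hello", "speakerTag": 1},
    {"word": "there", "speakerTag": 1},
    {"word": "hi", "speakerTag": 2},
    {"word": "how", "speakerTag": 1},
    {"word": "are", "speakerTag": 1},
    {"word": "you", "speakerTag": 1},
]

utterances = group_by_speaker(words)
for tag, text in utterances:
    print(f"Speaker {tag}: {text}")
```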

Multichannel Recognition BETA

In multiparticipant recordings where each participant is recorded in a separate channel (e.g., a phone call with two channels or a video conference with four channels), Cloud Speech-to-Text recognizes each channel separately and then annotates the transcripts so that they follow the order in which the participants spoke.
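
The idea of putting separately recognized channels back into spoken order can be sketched as a simple merge by start time. This is not the service's own implementation. The data layout (a channel number mapped to a time-sorted list of start time and text pairs) is hypothetical, and the sample call is invented.

```python
import heapq


def interleave_channels(per_channel):
    """Merge per-channel utterance lists into a single timeline.

    per_channel maps a channel number to a list of (start_seconds, text)
    pairs, each list already sorted by start time.
    """
    tagged = (
        [(start, channel, text) for start, text in utterances]
        for channel, utterances in per_channel.items()
    )
    # heapq.merge keeps the overall order without re-sorting everything.
    return list(heapq.merge(*tagged))


# Invented two-channel phone call.
call = {
    1: [(0.0, "Thanks for calling."), (4.2, "Sure, one moment.")],
    2: [(1.8, "Hi, I need help with my order.")],
}

timeline = interleave_channels(call)
for start, channel, text in timeline:
    print(f"[{start:5.1f}s] channel {channel}: {text}")
```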

