rhubarb-lip-sync/lib/pocketsphinx-5prealpha-2015.../doc/pocketsphinx_batch.1

251 lines
5.6 KiB
Groff
Raw Normal View History

2015-10-19 19:45:08 +00:00
.TH POCKETSPHINX_BATCH 1 "2007-08-27"
.SH NAME
pocketsphinx_batch \- Run speech recognition in batch mode
.SH SYNOPSIS
.B pocketsphinx_batch
.RI \fB\-hmm\fR
\fIhmmdir\fR
\fB\-dict\fR
\fIdictfile\fR
[\fI options \fR]...
.SH DESCRIPTION
.PP
Run speech recognition over a list of utterances in batchmode. A list
of arguments follows:
.TP
.B \-adchdr
Size of audio file header in bytes (headers are ignored)
.TP
.B \-adcin
Input is raw audio data
.TP
.B \-agc
Automatic gain control for c0 ('max', 'emax', 'noise', or 'none')
.TP
.B \-agcthresh
Initial threshold for automatic gain control
.TP
.B \-allphone
phoneme decoding with phonetic lm
.TP
.B \-allphone_ci
Perform phoneme decoding with phonetic lm and context-independent units only
.TP
.B \-alpha
Preemphasis parameter
.TP
.B \-argfile
file giving extra arguments.
.TP
.B \-ascale
Inverse of acoustic model scale for confidence score calculation
.TP
.B \-aw
Inverse weight applied to acoustic scores.
.TP
.B \-backtrace
Print results and backtraces to log file.
.TP
.B \-beam
Beam width applied to every frame in Viterbi search (smaller values mean wider beam)
.TP
.B \-bestpath
Run bestpath (Dijkstra) search over word lattice (3rd pass)
.TP
.B \-bestpathlw
Language model probability weight for bestpath search
.TP
.B \-build_outdirs
Create missing subdirectories in output directory
.TP
.B \-cepdir
files directory (prefixed to filespecs in control file)
.TP
.B \-cepext
Input files extension (suffixed to filespecs in control file)
.TP
.B \-ceplen
Number of components in the input feature vector
.TP
.B \-cmn
Cepstral mean normalization scheme ('current', 'prior', or 'none')
.TP
.B \-cmninit
Initial values (comma-separated) for cepstral mean when 'prior' is used
.TP
.B \-compallsen
Compute all senone scores in every frame (can be faster when there are many senones)
.TP
.B \-ctl
file listing utterances to be processed
.TP
.B \-ctlcount
No. of utterances to be processed (after skipping \fB\-ctloffset\fR entries)
.TP
.B \-ctlincr
Do every Nth line in the control file
.TP
.B \-ctloffset
No. of utterances at the beginning of \fB\-ctl\fR file to be skipped
.TP
.B \-ctm
output in CTM file format (may require post-sorting)
.TP
.B \-debug
level for debugging messages
.TP
.B \-dict
pronunciation dictionary (lexicon) input file
.TP
.B \-dictcase
Dictionary is case sensitive (NOTE: case insensitivity applies to ASCII characters only)
.TP
.B \-dither
Add 1/2-bit noise
.TP
.B \-doublebw
Use double bandwidth filters (same center freq)
.TP
.B \-ds
Frame GMM computation downsampling ratio
.TP
.B \-fdict
word pronunciation dictionary input file
.TP
.B \-feat
Feature stream type, depends on the acoustic model
.TP
.B \-featparams
containing feature extraction parameters.
.TP
.B \-fillprob
Filler word transition probability
.TP
.B \-frate
Frame rate
.TP
.B \-fsg
format finite state grammar file
.TP
.B \-fsgctl
file listing FSG file to use for each utterance
.TP
.B \-fsgdir
directory for FSG files
.TP
.B \-fsgext
extension for FSG files (including leading dot)
.TP
.B \-fsgusealtpron
Add alternate pronunciations to FSG
.TP
.B \-fsgusefiller
Insert filler words at each state.
.TP
.B \-fwdflat
Run forward flat-lexicon search over word lattice (2nd pass)
.TP
.B \-fwdflatbeam
Beam width applied to every frame in second-pass flat search
.TP
.B \-fwdflatefwid
Minimum number of end frames for a word to be searched in fwdflat search
.TP
.B \-fwdflatlw
Language model probability weight for flat lexicon (2nd pass) decoding
.TP
.B \-fwdflatsfwin
Window of frames in lattice to search for successor words in fwdflat search
.TP
.B \-fwdflatwbeam
Beam width applied to word exits in second-pass flat search
.TP
.B \-fwdtree
Run forward lexicon-tree search (1st pass)
.TP
.B \-hmm
containing acoustic model files.
.TP
.B \-hyp
output file name
.TP
.B \-hypseg
output with segmentation file name
.TP
.B \-input_endian
Endianness of input data, big or little, ignored if NIST or MS Wav
.TP
.B \-jsgf
grammar file
.TP
.B \-keyphrase
to spot
.TP
.B \-kws
file with keyphrases to spot, one per line
.TP
.B \-kws_delay
Delay to wait for best detection score
.TP
.B \-kws_plp
Phone loop probability for keyword spotting
.TP
.B \-kws_threshold
Threshold for p(hyp)/p(alternatives) ratio
.TP
.B \-latsize
Initial backpointer table size
.TP
.B \-lda
containing transformation matrix to be applied to features (single-stream features only)
.TP
.B \-ldadim
Dimensionality of output of feature transformation (0 to use entire matrix)
.TP
.B \-lifter
Length of sin-curve for liftering, or 0 for no liftering.
.TP
.B \-lm
trigram language model input file
.TP
.B \-lmctl
a set of language model
.PP
The
.B \-hmm
and
.B \-dict
arguments are always required. Either
.B \-lm
or
.B \-fsg
is required, depending on whether you are using a statistical language
model or a finite-state grammar. To do batchmode recognition, you
will need to specify a control file, using
.B \-ctl
This is a simple text file containing one entry per line. Each entry
is the name of an input file relative to the
.B \-cepdir
directory, and without the filename extension (which is given in the
.B \-cepext
argument).
.PP
If you are using acoustic feature files as input (see
.BR sphinx_fe (1)
for information on how to generate these), you can also specify a subpart
of a file, using the following format:
.PP
.RS
.B FILENAME START\-FRAME END\-FRAME UTTERANCE-ID
.RE
.SH AUTHOR
Written by numerous people at CMU from 1994 onwards. This manual page
by David Huggins-Daines <dhuggins@cs.cmu.edu>
.SH COPYRIGHT
Copyright \(co 1994-2007 Carnegie Mellon University. See the file
\fICOPYING\fR included with this package for more information.
.br
.SH "SEE ALSO"
.BR pocketsphinx_continuous (1),
.BR sphinx_fe (1).
.br