c:\prosogram\
prosogram.praat    script to make prosograms
prosoprep.praat    script to compute parameter files
prosomain.praat    include file containing main procedure etc.
segment.praat      include file to obtain various types of segmentation
stylize.praat      include file to make stylization
util.praat         include file with utilities
histogram.praat    include file for computing histogram
prosoplot.praat    include file containing plotting procedures

You only have to remember the first filename.
Several speech segmentation types are available. The choice of the segmentation has an impact on the resulting stylisation. In all cases, the time intervals will be adjusted to the voiced regions for which pitch is defined. Moreover, pitch discontinuities (such as octave jumps) will lead to a truncation of the nucleus region to be stylized.
Some types of segmentation require an annotation file, which is called a TextGrid in Praat. A TextGrid object (or the corresponding file) contains one or more layers (called "tiers") of text labels which are time-aligned with the speech signal. Typically a TextGrid is used to store a phonetic alignment, indicating which part of the speech signal corresponds to which sound.
The segmentation types are listed below.
| Manual segmentation into vocalic nuclei |
This approach uses a TextGrid
in which some tier contains a phonetic alignment (one sound per interval)
indicating at least the vowels.
The tier used is the one whose name starts with phon
(for example phon, phonemes, phons, phones ...);
if no such tier exists, the first tier in the TextGrid is used.
The stylized portions depend on
(1) the vowel boundaries, and
(2) the intensity drop relative to the local peak inside the vowel.
(Note: This was the only segmentation method available in versions before version 1.3.6.) |
| External segmentation | This requires a TextGrid with a tier, the name of which starts with segm. This tier contains units (such as vowels, syllables, syllabic nuclei...) which will be stylized if their label is recognized as a vowel. This method applies no segmentation on the basis of intensity. Typically, it will be used when one wants to use an external segmentation or when one wants to avoid the nucleus segmentation. |
| Automatic detection of syllabic nuclei | This approach uses a segmentation into local peaks in the intensity of band-pass (300-3500 Hz) filtered speech, adjusted on the basis of the intensity (full bandwidth). The TextGrid is optional and is used only for plotting. |
| Automatic segmentation into syllables and syllabic nuclei | This approach is similar to the previous one, but it also tries to identify syllable boundaries. |
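The tier-selection rule described for manual segmentation (use the tier whose name starts with phon, otherwise the first tier) can be illustrated with a short Python sketch. This is not the code used by the Prosogram scripts; the function name `select_phonetic_tier` is hypothetical, and case-sensitive matching is an assumption.

```python
def select_phonetic_tier(tier_names, prefix="phon"):
    """Return the index of the tier to use for the phonetic alignment.

    Picks the first tier whose name starts with `prefix`; if there is
    none, falls back to the first tier (index 0).
    Assumption: the comparison is case-sensitive.
    """
    if not tier_names:
        raise ValueError("the TextGrid has no tiers")
    for index, name in enumerate(tier_names):
        if name.startswith(prefix):
            return index
    return 0


# A TextGrid with a word tier followed by a phoneme tier.
print(select_phonetic_tier(["words", "phonemes"]))
# A TextGrid without any tier named phon...: the first tier is used.
print(select_phonetic_tier(["syll", "words"]))
```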
The annotation file (TextGrid) may contain any number of tiers. The non-automatic segmentation types require an interval tier used for the identification of vowel-like intervals or voiced parts. As a result the consonant labels are not required. When using an external segmentation which does not recognize vowel timbre, one can use arbitrary vowel labels. Typically a second tier contains the text representation (1 word per interval).
These input files should have the same base filename, but a different filename extension, e.g.
abc001.wav
abc001.TextGrid
The prosogram script will look for several parameter files.
If these files are available in the directory of the speech file,
they will be read. Otherwise the parameters will be calculated
and will be saved in files, which will be available in the directory
of the speech file after the script terminates.
The parameter files have the same base filename, but different filename
extensions.
For instance for the file abc001.wav, we have the following files:
abc001.Pitch
abc001_BP.Intensity
(this is used in automatic segmentation based on band-pass filtered speech)
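The naming convention above lends itself to a quick check of which files already exist next to a speech file. The following Python sketch is only an illustration: the helper names are hypothetical, and it covers only the file names mentioned in this section (there may be other parameter files).

```python
from pathlib import Path


def companion_files(speech_file):
    """Map a speech file to the companion files named in this section."""
    path = Path(speech_file)
    folder, stem = path.parent, path.stem
    return {
        "textgrid": folder / f"{stem}.TextGrid",          # annotation file
        "pitch": folder / f"{stem}.Pitch",                # pitch parameter file
        "bp_intensity": folder / f"{stem}_BP.Intensity",  # band-pass intensity
    }


def missing_parameter_files(speech_file):
    """List the parameter files that would still have to be calculated."""
    files = companion_files(speech_file)
    return [p for key, p in files.items()
            if key != "textgrid" and not p.exists()]


for kind, path in companion_files("abc001.wav").items():
    print(kind, path.name)
```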
When processing large speech files, parameter calculation takes quite some time, so it is convenient to create the parameter files before making the actual prosograms. The Prosogram script contains a command to calculate the pitch of a speech file and to write it to the directory, replacing a previous version, if available. Alternatively, one may use the script prosoprep.praat.
An additional advantage of using prosoprep.praat for parameter calculation is that it can be called by praatcon, the command-line version of the Praat program, which runs faster than Praat because it doesn't use graphics.
C:\prosogram> praatcon prosoprep.praat *.wav 60 450 0.005 0
Note that the arguments on the command line are the same as those which appear in the form when the script is run from within Praat: filename, minimum pitch, maximum pitch, frame period and so on.
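When the parameter calculation is launched from another program, the command line can be assembled from its parts. The Python sketch below only reproduces the example command shown above; the function name is hypothetical, and the meaning of any argument after the frame period is not documented here, so such arguments are passed through unchanged.

```python
def prosoprep_command(files="*.wav", min_pitch=60, max_pitch=450,
                      frame_period=0.005, trailing=("0",),
                      executable="praatcon"):
    """Build the argument list for the prosoprep.praat example.

    `trailing` holds the arguments that follow the frame period; they are
    copied as-is because their meaning is not described in this section.
    """
    args = [executable, "prosoprep.praat", files,
            str(min_pitch), str(max_pitch), str(frame_period)]
    args.extend(str(value) for value in trailing)
    return args


print(" ".join(prosoprep_command()))
```

The resulting list can be handed to `subprocess.run(..., cwd=...)` with the script directory as working directory. In that case the wildcard is passed to praatcon unexpanded, as in the example above.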
When running prosogram.praat, a form appears which allows the user to determine which prosograms are made, and how. A screen resolution of at least 1024 by 768 pixels is needed to display the entire form.
All options have default values, such that the program usually produces the desired results without changing these options. The only exception is the field for the input filename(s).
Examples
Output pages will have numbered filenames.
The script can perform several tasks. These are listed below.
| Prosogram and prosodic profile | Performs all necessary steps: parameter loading or calculation, loading or computation of segmentation, stylization, and prosodic profile calculation. The prosograms are saved in graphics files. |
| Interactive prosogram | Performs all steps and draws prosogram and segmentation in a graphics window. This window enables the user to interactively scroll through the signal, play parts or time intervals (by clicking on an interval of the segmentation), and to resynthesize the signal using the stylization as the pitch contour. |
| Prosodic profile only (no drawing) | Performs all steps except drawing and saving graphics files. This is used for statistical analysis of large files or sets of files. |
| Recalculate pitch for entire sound | This is used to replace an existing pitch parameter file
by another one using different settings for pitch detection
(such as lower and upper limits of the accepted pitch range).
When Prosogram needs the pitch parameter, it will first check whether a pitch file is available in the directory of the speech file. The pitch detection settings are not stored in the pitch file and therefore Prosogram cannot check whether they are identical to the ones indicated by the user. When you need settings for pitch detection that are different from the ones used for the available pitch file, you should run this command. |
| Make automatic segmentation into syllables and save | Calculates a segmentation of the speech signal into syllables (or pseudo-syllables), based on acoustic criteria (rather than phonological criteria). This segmentation is saved to a TextGrid file with a filename consisting of the basename of the speech file followed by _auto.TextGrid. |
| Plot pitch in semitones, with annotation. No stylisation. | Plots pitch tracing together with the selected annotation tiers, in the same manner the prosograms are plotted. No segmentation or stylization is made. |
Depending on the task you have selected, you can now select the options that are relevant for that task. Irrelevant options are ignored.
| Input files |
Indicate the speech file or files to process.
Use a wildcard (*) to select multiple files, e.g.
abc*.wav
When multiple files are selected, the program will analyze them in alphanumeric order, e.g. "a01.wav", "a02.wav", "a24.wav", "b00.wav", ...
If no directory is specified in the field "Input files", the active directory is used, i.e. the directory from which the scripts are loaded. Include the directory (starting at the drive specification) if necessary, e.g.
c:\my_dir\my_corpus\*.wav (for Windows)
c:/my_dir/my_corpus/*.wav (also accepted on Windows)
/Users/my_dir/my_corpus/*.wav (for Mac)
When the field "Input files" is left empty, a dialog window will pop up to select the file to be analysed. In this case, supply the other parameters before selecting the input file. |
| Time range (s) | Start time and end time for the analysis may be specified. The default end time value "0.0 (=all)" automatically sets the end time to the end of the speech signal. |
| F0 detection range (Hz) | Minimum and maximum frequencies used for F0 detection,
with default values of 60 Hz and 450 Hz, respectively.
These values may be adjusted to accommodate speakers with very low
or high pitch registers (such as young children) and to avoid
octave jumps and other discontinuities caused by an inadequate F0 detection range.
When F0 detection errors are found, you should select the task "Recalculate pitch for entire sound" with appropriate F0 range settings, in order to obtain new pitch data. |
| Parameter calculation |
The "Partial" mode is useful when analysing huge signal files, for which no parameter files are available or for which only part of the phonetic segmentation is available. It avoids unnecessary calculations. |
| Frame period | Time interval between successive values of pitch and intensity. See below for details. |
| Segmentation method |
Select one of the following segmentation types, which are described in detail
here.
|
| Thresholds | Thresholds used by the stylization algorithm: the glissando threshold, the differential glissando threshold and the minimum duration of tonal segments. |
| Save intermediate data |
Saves data in files for later use.
|
| View | Select a particular prosogram format: compact, compact rich, wide, wide rich. See the website for illustrations of these formats. |
| Time interval per strip | Time interval for one strip, i.e. one prosogram. The default value of 3 seconds is recommended, because the resulting prosograms are very readable when printed on standard A4 paper, in portrait layout. Using the same value throughout all prosograms makes them easy to interpret, as is the case with classic spectrograms. |
| Tiers to show |
Select the tiers from the TextGrid which will be plotted in the prosograms.
This makes it possible to plot a selection of tiers, to hide the others,
and to specify the order in which tiers will appear in the prosogram.
The tiers may be indicated by their number or name.
An asterisk preceding the tier number or name indicates that this tier will be converted from SAMPA to IPA. This is typically used for the phonetic transcription tier. |
| Pitch range |
Values in ST (semitones) for the minimum and maximum along the Y axis (excluding space taken by the textgrid).
The default is automatic pitch range selection, based on the distribution of the pitch targets in the stylization, with a default minimum pitch range of 2 octaves (24 ST). The calibration (horizontal dotted) lines are separated by 2 ST. |
| Output mode |
|
| Output format |
|
| Output path and filename |
Directory and filename used for writing the prosogram graphics files.
When no path is specified, output files are written to the script directory. The default value "<same_as_input>" indicates that graphics files will be saved in the directory from which the input data (speech, annotation) was read. Successive graphics files are numbered automatically: "001.eps", "002.eps", and so on. The extension depends upon the file type. An optional filename part may be specified. For instance, "TEST_" results in files "TEST_001.eps", "TEST_002.eps", etc. |
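The numbering scheme for output files described above can be reproduced with a few lines of Python, which may be useful when collecting the graphics files afterwards. This is a sketch based only on the examples given ("001.eps", "TEST_001.eps"); the function name is hypothetical, and the behaviour beyond 999 pages is not documented here.

```python
def output_filename(page_number, prefix="", extension="eps"):
    """Return the expected name of the graphics file for a given page.

    Assumption: page numbers are zero-padded to three digits, as in the
    examples "001.eps" and "TEST_002.eps".
    """
    return f"{prefix}{page_number:03d}.{extension}"


print(output_filename(1))
print(output_filename(2, prefix="TEST_"))
```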
The download page contains a ZIP archive file (testdata.zip) with a sound file and a TextGrid file for testing. It also includes a graphics file (eps) showing the kind of result you will obtain when selecting the wide rich format, with 3 tiers.
It is useful to select small speech files when testing. This gives an idea of the time required for parameter calculation and stylisation.
The prosogram script produces high quality graphics files in one of the following file formats:
To view EPS files, use the GSview viewer (for Windows, OS/2, Linux), which is freely available here. (Look for "GSview 4.9") GSview in turn requires Ghostscript, which is also available at the same place. Choose a recent version, such as "GPL Ghostscript 9.02". Download both files; then install Ghostscript first and GSview last.
The EPS files can be printed on a Postscript printer using GSview. When using a standard (non-Postscript) printer, first use GSview to convert the EPS file to a PDF file, which can be printed on a normal printer using Adobe Acrobat.
Prosograms in EMF file format may be inserted directly into Word using "Insert | Picture | From file...". They will appear on the screen and print normally.
When you include an EPS file in a Word document, Word will print the EPS graphics, but only on a Postscript printer. Moreover, Word will not display the graphics on the screen, but will show a box instead (unless you incorporate a graphics "Preview" in the EPS file). When you print the document to a normal (i.e. non-Postscript) printer, the box appears on the paper.
Prosogram files in EMF format can be inserted directly into Powerpoint.
To insert an EPS file, convert the EPS file to a bitmapped graphics file (as explained below), and include this in your Powerpoint presentation in the usual way. However, bitmapped graphics files will be scaled by Powerpoint, possibly resulting in unclear images.
GIF and PNG are bitmapped graphics. In bitmapped graphics, the image is represented as an array of pixels. When the size of such images is changed, this usually results in bad quality images.
If you don't have a Postscript printer, you have the following options. The first option is recommended.
In interactive mode, a window pops up showing the prosogram and the annotation tiers selected by the user. The top of this window shows a series of self-explanatory buttons to scroll the time axis, to play the interval shown in the window, to play the resynthesis of the signal using the stylized pitch, to display the values of the pitch target (in ST). Clicking on an interval in the annotation will play this interval. All settings (segmentation type, thresholds, analysis interval...) are chosen from the main Prosogram form.
Interactive mode is activated in the View menu of the script form.
When using interactive mode, the program will load the segmentation and stylization from data files, when available, and provided the settings selected by the user are identical to those that were used to obtain the data files.
When working on a large corpus that requires substantial computation time,
the following procedure is recommended.
First obtain the normal prosogram with the settings of your choice and with the option
"Save intermediate data" checked.
The segmentation and stylization will be saved in files.
Then run interactive mode with the same settings.
The prosogram will appear in the interactive window in a few seconds,
even for longer speech files.
Read from file... fg00150.wav
Read from file... fg00150styl.PitchTier
select Sound fg00150
To Manipulation... 0.01 60 600
select PitchTier fg00150styl
plus Manipulation fg00150
Replace pitch tier
select Manipulation fg00150
Get resynthesis (PSOLA)
The stylization uses a segmentation of the speech signal into a sequence of nuclei, ideally corresponding to syllabic nuclei. The availability of this segmentation allows for the computation of statistical data about the prosodic properties of these nuclei, as well as of the sequences of nuclei. For each input file the prosogram generates a file containing the prosodic profile of the speech signal.
A sample profile is shown below. Footnotes explain how these figures should be interpreted. However, additional information is given below.
Prosodic profile for input file: C:/corpus/Barthes.wav
Date: Sun Jun 20 15:10:44 2010
Segmentation type: asyll
Time:
total speech time=654.78(s) [Note 1]
estimated phonation time=533.27 (81.44% of speech time) [Note 2]
estimated pause time=121.51 (18.56% of speech time) [Note 3]
Nucleus: 2931 nuclei in signal
Nucleus duration:
mean=0.086(s) stdev=0.048 summed nucleus duration=253.15 (s)
Global pitch measures:
Quantiles of min and max F0 values of nuclei before stylisation:
2%=65.29(Hz), 5%=68.69(Hz), 50%=89.73(Hz), 95%=144.47(Hz), 98%=165.24(Hz)
Quantiles for median pitch values of nuclei before stylisation:
2%=67.70(Hz), 5%=70.61(Hz), 50%=88.96(Hz), 95%=145.00(Hz), 98%=162.36(Hz)
Pitch measures for stylized nuclei:
min=60.98(Hz), max=440.88(Hz), mean of median=97.85 [Note 7]
Quantiles of low & high pitch values of nuclei after stylisation:
2%=67.01(Hz), 5%=70.34(Hz), 50%=89.25(Hz), 95%=146.27(Hz), 98%=164.38(Hz)
Intrasyllabic pitch interval:
dynamic=16.17%, of which: rising=20.04%, falling=79.96% [Note 4]
mean=-1.68(ST) stdev=2.42
sum=-4929.74(ST) sum/nucleus_time=-19.47(ST/s) [Note 5]
trajectory=7433.36(ST) trajectory/nucleus_time=29.36(ST/s) [Note 6]
Intersyllabic pitch interval:
mean=0.51(ST) stdev=5.07
sum=1490.60(ST) sum/inter_nucleus_time=-19.47(ST/s) [Note 5]
trajectory=9575.19(ST) trajectory/inter_nucleus_time=34.18(ST/s) [Note 6]
All pitch intervals:
trajectory(speech)=17008.55(ST) [Note 1,2,6]
trajectory(speech)/speech_time=25.98(ST/s)
trajectory(phonation)/phonation_time=28.74(ST/s)
Notes
[1] speech time = internucleus time + intranucleus time + pause time
[2] phonation time = internucleus time + intranucleus time
[3] estimated pause = when internucleus time >= 0.3
[4] dynamic = when pitch variation exceeds glissando threshold
rising = when sum of upward intervals > sum of abs(downward intervals)
[5] sum = sum of signed pitch intervals (falls and rises compensate)
[6] trajectory = sum of absolute pitch intervals (falls and rises add up)
[7] mean of median = mean of median pitch of all nuclei
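Notes [1] to [3] can be read as a simple bookkeeping rule over the gaps between successive nuclei. The Python sketch below applies that rule literally to a list of nucleus time intervals. It is an illustration only, not the code used by Prosogram: the function name is hypothetical, and silence before the first or after the last nucleus is ignored.

```python
PAUSE_THRESHOLD = 0.3  # seconds, taken from note [3]


def time_profile(nuclei, pause_threshold=PAUSE_THRESHOLD):
    """Split time into intranucleus, internucleus and pause time.

    `nuclei` is a list of (start, end) pairs in seconds, sorted by start
    time and not overlapping.
    """
    intra = sum(end - start for start, end in nuclei)
    inter = 0.0
    pause = 0.0
    for (_, prev_end), (next_start, _) in zip(nuclei, nuclei[1:]):
        gap = next_start - prev_end
        if gap >= pause_threshold:
            pause += gap   # note [3]: long gaps count as pauses
        else:
            inter += gap
    phonation = inter + intra   # note [2]
    speech = phonation + pause  # note [1]
    return {"intranucleus": intra, "internucleus": inter,
            "pause": pause, "phonation": phonation, "speech": speech}


profile = time_profile([(0.0, 0.1), (0.15, 0.25), (0.8, 0.9)])
for name, seconds in profile.items():
    print(f"{name}={seconds:.2f}(s)")
```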
For some properties, absolute as well as normalized values are shown. The latter are computed by dividing the absolute value by the relevant time interval. Here is an example. The pitch trajectory indicates the sum of absolute pitch intervals, i.e. without taking into account the direction (upward or downward) of the movement. The normalized pitch trajectory indicates the total trajectory divided by time; cumulated intrasyllabic pitch movement is divided by cumulated intrasyllabic time, intersyllabic pitch movement is divided by intersyllabic time, total pitch movement is divided by total speech time.
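The difference between the signed sum and the trajectory, and the normalization by time, can be made concrete with a small Python sketch. The function names are hypothetical. The interval formula is the standard definition of a semitone (12 per octave on a logarithmic frequency scale), and the last two lines simply redo the division for the intrasyllabic figures of the sample profile above.

```python
import math


def semitone_interval(from_hz, to_hz):
    """Pitch interval in semitones (ST) between two frequencies."""
    return 12.0 * math.log2(to_hz / from_hz)


def pitch_sum(intervals_st):
    """Signed sum: falls and rises compensate (note [5])."""
    return sum(intervals_st)


def pitch_trajectory(intervals_st):
    """Sum of absolute intervals: falls and rises add up (note [6])."""
    return sum(abs(interval) for interval in intervals_st)


def normalized(value, duration_s):
    """Absolute value divided by the relevant time interval (ST/s)."""
    return value / duration_s


intervals = [2.0, -3.0, 1.5]         # made-up intervals, in ST
print(pitch_sum(intervals))          # rises and falls partly cancel
print(pitch_trajectory(intervals))   # total distance covered

# Intrasyllabic figures from the sample profile, nucleus time 253.15 s.
print(round(normalized(-4929.74, 253.15), 2))
print(round(normalized(7433.36, 253.15), 2))
```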
The profile is written to a file named basename + _profile.txt. Moreover, a spreadsheet is generated containing the values in the prosodic profile of each file being processed. This file is named basename + _spreadsheet.txt. When speakers and stimulus types are combined in separate files, the spreadsheet will show the profile results for the speakers and stimulus types.
The input text file contains 3 columns: label, start_time, end_time. Columns may be separated by any of the separator characters given in 'seps$', typically a blank, a tab or a comma. Time intervals in the input file may be contiguous, but need not be. However, time intervals should not overlap. Times may be specified in the time unit selected in the script form. The labels should be given in a form such that Prosogram is able to identify the vowels (see above). Input files can be specified using wildcards, to convert multiple files in one step. Output files are written to the directory of the input files.
The script is available here.
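Before running the conversion, it can be useful to check that an input text file follows the three-column layout and contains no overlapping intervals. The Python sketch below is a stand-alone validator, not part of the script itself; the function name and the `seps` default (blank, tab, comma) are assumptions based on the description above, and labels are assumed not to contain separator characters.

```python
import re


def parse_intervals(text, seps=" \t,"):
    """Parse lines of 'label start_time end_time' and check for overlap.

    Returns a list of (label, start, end) tuples sorted by start time.
    Raises ValueError on malformed lines or overlapping intervals.
    """
    splitter = re.compile("[" + re.escape(seps) + "]+")
    intervals = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # skip empty lines
        fields = splitter.split(line)
        if len(fields) != 3:
            raise ValueError(f"line {line_number}: expected 3 columns")
        label, start, end = fields[0], float(fields[1]), float(fields[2])
        if end <= start:
            raise ValueError(f"line {line_number}: end time before start")
        intervals.append((label, start, end))
    intervals.sort(key=lambda item: item[1])
    for (_, _, prev_end), (label, start, _) in zip(intervals, intervals[1:]):
        if start < prev_end:  # contiguous intervals are allowed
            raise ValueError(f"interval '{label}' overlaps the previous one")
    return intervals


sample = "a 0.10 0.25\nt,0.25,0.30\ni\t0.40\t0.55\n"
for label, start, end in parse_intervals(sample):
    print(label, start, end)
```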