Prosogram user guide

Installation

Segmentation types

Several speech segmentation types are available. The choice of the segmentation has an impact on the resulting stylisation. In all cases, the time intervals will be adjusted to the voiced regions for which pitch is defined. Moreover, pitch discontinuities (such as octave jumps) will lead to a truncation of the nucleus region to be stylized.

TextGrid annotation files

Some types of segmentation require an annotation file, which is called a TextGrid in Praat. A TextGrid object (or the corresponding file) contains one or more layers (called "tiers") of text labels which are time-aligned with the speech signal. Typically a TextGrid is used to store a phonetic alignment, indicating which part of the speech signal corresponds to which sound.

The segmentation types are listed below.

Input files

Prosograms are made using the script prosogram.praat.

The annotation file (TextGrid) may contain any number of tiers. The non-automatic segmentation types require an interval tier used for the identification of vowel-like intervals or voiced parts. As a result the consonant labels are not required. When using an external segmentation which does not recognize vowel timbre, one can use arbitrary vowel labels. Typically a second tier contains the text representation (1 word per interval).

These input files should have the same base filename, but a different filename extension, e.g.

Parameter files and efficient parameter calculation

The prosogram script will look for several parameter files. If these files are available in the directory of the speech file, they will be read. Otherwise the parameters will be calculated and will be saved in files, which will be available in the directory of the speech file after the script terminates.
The parameter files have the same base filename, but different filename extensions. For instance, for the file abc001.wav, we have the following files:

When processing large speech files, parameter calculation takes quite some time, so it is convenient to create the parameter files before making the actual prosograms. The Prosogram script contains a command to calculate the pitch of a speech file and to write it to the directory, replacing a previous version, if available. Alternatively, one may use the script prosoprep.praat.

An additional advantage of using prosoprep.praat for parameter calculation is that it can be called by praatcon, the command-line version of the Praat program, which runs faster than Praat because it doesn't use graphics.

Note that the arguments on the command line are the same as those which appear in the form when the script is run from within Praat: filename, minimum pitch, maximum pitch, frame period and so on.

Loading the prosogram script

Scripts are loaded in the normal way.

The form for specifying input files and options

When running prosogram.praat, a form appears which allows the user to determine which prosograms are made, and how. A screen resolution of at least 1024 by 768 pixels is needed to display the entire form.

[Figure: Prosogram form]

All options have default values, such that the program usually produces the desired results without changing these options. The only exception is the field for the input filename(s).

Examples

Output pages will have numbered filenames.

Selecting a task

[Figure: Prosogram form, task selection]

The script can perform several tasks. These are listed below.

Depending on the task you have selected, you can now select the options that are relevant to that task. Irrelevant options are ignored.

Selecting input files

The program can make single prosograms, a sequence of prosograms for a portion of the speech file, or a sequence of prosograms for a set of input files covering an entire speech corpus.

Setting analysis parameters

Plotting options

Output file options

Running the program

Testing the program

The download page contains a ZIP archive file (testdata.zip) with a sound file and a TextGrid file for testing. It also includes a graphics file (EPS) showing the kind of result you will obtain when selecting the wide rich format, with 3 tiers.

It is useful to select small speech files when testing. This gives an idea of the time required for parameter calculation and stylisation.

Viewing and printing prosograms

The prosogram script produces high quality graphics files in one of the following file formats:

How do I view the prosograms?

To view EPS files, use the GSview viewer (for Windows, OS/2, Linux), which is freely available here. (Look for "GSview 4.9") GSview in turn requires Ghostscript, which is also available at the same place. Choose a recent version, such as "GPL Ghostscript 9.02". Download both files; then install Ghostscript first and GSview last.

How do I print prosograms?

The EPS files can be printed on a Postscript printer using GSview. When using a standard (non-Postscript) printer, first use GSview to convert the EPS file to a PDF file, which can be printed on a normal printer using Adobe Acrobat.

How to include prosograms in a Word document?

Prosograms in EMF file format may be inserted directly into Word using "Insert | Picture | From file...". They will appear on the screen and print normally.

When you include an EPS file in a Word document, Word will print the EPS graphics, but only on a Postscript printer. Moreover, Word will not display the graphics on the screen, but will show a box instead (unless you incorporate a graphics "Preview" in the EPS file). When you print the document to a normal (i.e. non-Postscript) printer, the box appears on the paper.

How to include prosograms in a Powerpoint presentation?

Prosogram files in EMF format can be inserted directly into Powerpoint.

To insert an EPS file, convert the EPS file to a bitmapped graphics file (as explained below), and include this in your Powerpoint presentation in the usual way. However, bitmapped graphics files will be scaled by Powerpoint, possibly resulting in unclear images.

How do I convert EPS files to other graphics formats?

GSview can convert EPS files to other graphics formats, such as Postscript, PDF, PNG..., using "File | Convert..."

GIF and PNG are bitmapped graphics formats. In bitmapped graphics, the image is represented as an array of pixels. Resizing such an image usually degrades its quality.

How to display prosograms in HTML documents?

Convert the EPS file to a graphics format (such as GIF or JPEG) which can be displayed by the browser.

How to print a Word document including EPS files without a Postscript printer?

If you don't have a Postscript printer, you have the following options. The first option is recommended.

Interactive mode

In interactive mode, a window pops up showing the prosogram and the annotation tiers selected by the user. The top of this window shows a series of self-explanatory buttons to scroll the time axis, to play the interval shown in the window, to play the resynthesis of the signal using the stylized pitch, and to display the values of the pitch targets (in ST). Clicking on an interval in the annotation will play this interval. All settings (segmentation type, thresholds, analysis interval...) are chosen from the main Prosogram form.

[Figure: Interactive mode window]

Interactive mode is activated in the View menu of the script form.

When using interactive mode, the program loads the segmentation and stylization from data files, provided these files are available and the settings selected by the user are identical to those that were used to obtain them.

When working on a large corpus that requires substantial computation time, the following procedure is recommended.
First obtain the normal prosogram with the settings of your choice and with the option "Save intermediate data" checked. The segmentation and stylization will be saved in files. Then run interactive mode with the same settings. The prosogram will appear in the interactive window in a few seconds, even for longer speech files.

Advanced topics

Annotation labels recognized as vowels

Intervals are considered vowels

Frame period

This is the time interval between successive values of pitch, intensity or other parameters. It should be identical for all these parameters. A frame period of 0.005 s (i.e. 5 ms, or a frame rate of 200 Hz) is recommended for high precision prosograms. To obtain parameter files with the same frame rate (using prosogram or prosoprep), it may be necessary to rename or delete the previous parameter files if they were computed using a different frame period.

Exporting the stylization to another program

Select the "Save intermediate data" option. A file in the PitchTier format will be saved when the script ends. (The filename is basenamestyl.Pitchtier, where basename represents the basename of the speech file.) Open this file in Praat for further conversions depending on your needs, e.g. to save it as a headerless spreadsheet file. The PitchTier file contains the sequence of (Time, Frequency) coordinates used in the stylization. Frequency values are specified in Hz.

Semitones

In prosograms pitch is plotted on a semitone scale relative to 1 Hz (giving positive values only for a normal pitch range). In Praat, semitones may be measured relative to 100 Hz (giving negative ST values for values below 100 Hz) or to 1 Hz.
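
The two reference values differ only by a constant offset, since a semitone value is 12 times the base-2 logarithm of the frequency ratio. The following Python sketch illustrates the conversion; the function names are illustrative and are not part of the Prosogram script or of Praat.

```python
import math

def hz_to_semitones(f_hz, ref_hz=1.0):
    """Convert a frequency in Hz to semitones relative to ref_hz."""
    if f_hz <= 0 or ref_hz <= 0:
        raise ValueError("frequencies must be positive")
    return 12.0 * math.log2(f_hz / ref_hz)

def semitones_to_hz(st, ref_hz=1.0):
    """Convert a semitone value relative to ref_hz back to Hz."""
    return ref_hz * 2.0 ** (st / 12.0)

# 80 Hz is positive relative to 1 Hz, but negative relative to 100 Hz.
st_re_1 = hz_to_semitones(80.0, ref_hz=1.0)
st_re_100 = hz_to_semitones(80.0, ref_hz=100.0)

# Offset between the two scales: the ST value of 100 Hz relative to 1 Hz.
offset = hz_to_semitones(100.0, ref_hz=1.0)
```

Adding the offset (about 79.73 ST) to a value measured relative to 100 Hz gives the value relative to 1 Hz, which is the scale used in prosograms.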

External pitch extraction

This is no longer supported.

Resynthesized speech based on the stylized pitch

Prosodic profile

The stylization uses a segmentation of the speech signal into a sequence of nuclei, ideally corresponding to syllabic nuclei. The availability of this segmentation allows for the computation of statistical data about the prosodic properties of these nuclei, as well as of the sequences of nuclei. For each input file the prosogram generates a file containing the prosodic profile of the speech signal.

A sample profile is shown below. The notes at the end of the profile explain how the figures should be interpreted; additional information is given after the sample.

 Prosodic profile for input file: C:/corpus/Barthes.wav

 Date: Sun Jun 20 15:10:44 2010
 Segmentation type: asyll
 Time:  
     total speech time=654.78(s) [Note 1]
     estimated phonation time=533.27 (81.44% of speech time) [Note 2]
     estimated pause time=121.51 (18.56% of speech time) [Note 3]
 Nucleus: 2931 nuclei in signal
 Nucleus duration: 
     mean=0.086(s) stdev=0.048 summed nucleus duration=253.15 (s)
 Global pitch measures: 
     Quantiles of min and max F0 values of nuclei before stylisation: 
       2%=65.29(Hz), 5%=68.69(Hz), 50%=89.73(Hz), 95%=144.47(Hz), 98%=165.24(Hz) 
     Quantiles for median pitch values of nuclei before stylisation: 
       2%=67.70(Hz), 5%=70.61(Hz), 50%=88.96(Hz), 95%=145.00(Hz), 98%=162.36(Hz) 
     Pitch measures for stylized nuclei: 
       min=60.98(Hz), max=440.88(Hz), mean of median=97.85 [Note 7]
     Quantiles of low & high pitch values of nuclei after stylisation: 
       2%=67.01(Hz), 5%=70.34(Hz), 50%=89.25(Hz), 95%=146.27(Hz), 98%=164.38(Hz) 
 Intrasyllabic pitch interval: 
     dynamic=16.17%, of which: rising=20.04%, falling=79.96% [Note 4]
     mean=-1.68(ST) stdev=2.42
     sum=-4929.74(ST) sum/nucleus_time=-19.47(ST/s) [Note 5]
     trajectory=7433.36(ST) trajectory/nucleus_time=29.36(ST/s) [Note 6]
 Intersyllabic pitch interval: 
     mean=0.51(ST) stdev=5.07
     sum=1490.60(ST) sum/inter_nucleus_time=-19.47(ST/s) [Note 5]
     trajectory=9575.19(ST) trajectory/inter_nucleus_time=34.18(ST/s) [Note 6]
 All pitch intervals: 
     trajectory(speech)=17008.55(ST) [Note 1,2,6]
     trajectory(speech)/speech_time=25.98(ST/s)
     trajectory(phonation)/phonation_time=28.74(ST/s)

 Notes 
  [1] speech time = internucleus time + intranucleus time + pause time 
  [2] phonation time = internucleus time + intranucleus time 
  [3] estimated pause = when internucleus time >= 0.3
  [4] dynamic = when pitch variation exceeds glissando threshold
      rising = when sum of upward intervals > sum of abs(downward intervals)
  [5] sum = sum of signed pitch intervals (falls and rises compensate)
  [6] trajectory = sum of absolute pitch intervals (falls and rises add up)
  [7] mean of median = mean of median pitch of all nuclei

For some properties, absolute as well as normalized values are shown. The latter are computed by dividing the absolute value by the relevant time interval. Here is an example. The pitch trajectory indicates the sum of absolute pitch intervals, i.e. without taking into account the direction (upward or downward) of the movement. The normalized pitch trajectory indicates the total trajectory divided by time; cumulated intrasyllabic pitch movement is divided by cumulated intrasyllabic time, intersyllabic pitch movement is divided by intersyllabic time, total pitch movement is divided by total speech time.
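
The difference between the signed sum, the trajectory and their normalized values can be sketched in a few lines of Python. This is only an illustration of the definitions in notes 5 and 6, not the code used by the script; the function name and the three-interval input are invented for the example.

```python
def pitch_interval_stats(intervals_st, time_s):
    """Summarize pitch intervals (in ST) observed over time_s seconds."""
    signed_sum = sum(intervals_st)                  # falls and rises compensate
    trajectory = sum(abs(x) for x in intervals_st)  # falls and rises add up
    return {
        "sum": signed_sum,
        "trajectory": trajectory,
        "sum_per_s": signed_sum / time_s,           # normalized, in ST/s
        "trajectory_per_s": trajectory / time_s,    # normalized, in ST/s
    }

# Invented data: a rise of 2 ST, a fall of 3 ST and a rise of 1 ST in 1.5 s.
stats = pitch_interval_stats([2.0, -3.0, 1.0], time_s=1.5)

# Normalization applied to the intrasyllabic figures of the sample profile:
# trajectory 7433.36 ST over a summed nucleus duration of 253.15 s.
intra_trajectory_per_s = 7433.36 / 253.15
```

In the invented example the signed sum is 0 ST although the trajectory is 6 ST (4 ST/s), which is why the two measures are reported separately. The last line gives about 29.36 ST/s, the value shown in the sample profile.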

The profile is written to a file named basename + _profile.txt. Moreover, a spreadsheet is generated containing the values in the prosodic profile of each file being processed. This file is named basename + _spreadsheet.txt. When speakers and stimulus types are combined in separate files, the spreadsheet will show the profile results for the speakers and stimulus types.

Additional tools

labels2textgrid.praat

This script converts a text file (or files) containing a phonetic alignment into a TextGrid file for Praat.

The input text file contains 3 columns: label, start_time, end_time. Columns may be separated by any of the separator characters given in 'seps$', typically a blank, a tab or a comma. Time intervals in the input file may be contiguous, but need not be. However, time intervals should not overlap. Times are specified in the time unit selected in the script form. The labels should be given in a form such that prosogram is able to identify the vowels (see above). Input files can be specified using wildcards, to convert multiple files in one step. Output files are written to the directory of the input files.
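
To check an alignment file before conversion, the constraints above can be tested with a short Python sketch. It is not the Praat script itself: the function names, the default separators and the time_scale parameter (a factor converting the input unit to seconds) are assumptions made for the illustration, and labels containing a separator character are not handled.

```python
import re

def parse_alignment(text, seps=" \t,", time_scale=1.0):
    """Parse 'label start end' lines into (label, start_s, end_s) tuples."""
    splitter = re.compile("[" + re.escape(seps) + "]+")
    intervals = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # skip empty lines
        fields = splitter.split(line)
        if len(fields) != 3:
            raise ValueError(f"line {line_no}: expected 3 columns")
        start = float(fields[1]) * time_scale
        end = float(fields[2]) * time_scale
        if end <= start:
            raise ValueError(f"line {line_no}: end time not after start time")
        intervals.append((fields[0], start, end))
    intervals.sort(key=lambda iv: iv[1])
    # Gaps between intervals are allowed, overlaps are not.
    for prev, cur in zip(intervals, intervals[1:]):
        if cur[1] < prev[2]:
            raise ValueError(f"intervals {prev[0]!r} and {cur[0]!r} overlap")
    return intervals

def fill_gaps(intervals, empty_label=""):
    """Insert empty-labelled intervals so that the sequence is contiguous."""
    filled = []
    for label, start, end in intervals:
        if filled and start > filled[-1][2]:
            filled.append((empty_label, filled[-1][2], start))
        filled.append((label, start, end))
    return filled

sample = "a 0.10 0.20\nt,0.20,0.25\n\ni\t0.40\t0.50"
tier = fill_gaps(parse_alignment(sample))
```

The gap-filling step is included because an interval tier in a TextGrid covers its time domain without holes, so unlabelled stretches have to be represented by empty intervals.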

The script is available here.

eps_conv.praat

This script converts one or more EPS files into other graphics formats such as JPEG, PDF, PNG, GIF. It calls Ghostscript in the background. Read the beginning of the script to configure it for your computer.
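
For readers who prefer to call Ghostscript directly, the Python sketch below builds a typical conversion command. It is not taken from eps_conv.praat: the function names are hypothetical, the executable name "gs" is an assumption (on Windows the console executable usually has a different name, such as gswin64c), and only formats for which Ghostscript provides a standard output device are covered, so GIF is left out.

```python
import subprocess

# Ghostscript output devices for a few common target formats.
DEVICES = {"png": "png16m", "jpeg": "jpeg", "pdf": "pdfwrite"}

def ghostscript_command(eps_path, out_path, fmt="png", dpi=300, gs_exe="gs"):
    """Build the Ghostscript argument list for converting one EPS file."""
    if fmt not in DEVICES:
        raise ValueError(f"unsupported format: {fmt}")
    cmd = [
        gs_exe,
        "-dSAFER", "-dBATCH", "-dNOPAUSE",  # non-interactive batch run
        "-dEPSCrop",                        # crop the page to the EPS bounding box
        f"-sDEVICE={DEVICES[fmt]}",
    ]
    if fmt != "pdf":
        cmd.append(f"-r{dpi}")              # resolution matters for bitmaps only
    cmd += [f"-sOutputFile={out_path}", eps_path]
    return cmd

def convert_eps(eps_path, out_path, **options):
    """Run Ghostscript; raises CalledProcessError if the conversion fails."""
    subprocess.run(ghostscript_command(eps_path, out_path, **options), check=True)

# Building the command does not require Ghostscript to be installed.
example = ghostscript_command("abc001.eps", "abc001.png")
```

Choosing a sufficiently high resolution at conversion time avoids having to scale the bitmap afterwards.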

The script is available here.


Last modification : 2011-12-15