QuidiVidi Help

Welcome to QuidiVidi!

QuidiVidi is a free-of-charge, freely distributable application for analyzing linguistic transcripts in CHAT format.

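You can calculate measures including MLU, MLT, word frequencies, lexical class percentages, length distributions, and lexical diversity; each is described in its own section below.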

You can also filter the input on tier-specific search terms.

Overview

The QuidiVidi main window consists of two tabs, Console and Input. The console displays program output, such as calculations of MLU and MLT (these can also be directed to a file instead). You can't edit the console contents, but you can copy from it or save it to a file.

The Input tab is one of two methods for providing CHAT input to the various QuidiVidi commands (you can also select a list of files from any of the command dialogs). You can open a CHAT file (File > Open) in this tab. The full set of editing functions, such as cut and paste, is available, and, as with the console, you can save the contents to a file.

You can find (Edit > Find) text in the console, and find and replace in the Input tab. The contents of both tabs can be printed.

Commands for doing analyses of CHAT transcripts are found in the Analytics menu.

The dialogs for each analytic command give you the option of selecting a list of files to run the operation on. The file chooser dialog lists all CHAT files in the selected folder. Move the selected files to the Selected list on the right by clicking Add Selected, or double-click files individually. Similarly, you can remove files from the Selected list by double-clicking them.

MLU

MLU, or mean length of utterance, is an age-independent measure of a subject's linguistic development, introduced in the 1970s by the researcher Roger Brown. It is defined as the number of morphemes (or words) produced by a specific speaker in the transcript (or a portion of it) divided by the number of that speaker's utterances. An utterance, corresponding to a main tier in a CHAT file (such as *CHI or *MOT), is a continuous stream of spoken output without a significant break.

In common with all the QuidiVidi command dialogs, you can choose to read the input to the MLU calculation from the Input tab or from a list of files. Depending on your choice, the Choose button will enable you to select accordingly. Both forms of input are shared among the various QuidiVidi analytics commands: for example, if you populate the Input tab or file list from the MLU dialog, the same selections will be in effect if you run the MLT command. If you select a list of files, it will be displayed in a tooltip if you hover the mouse over the Choose button.

You can direct the output to either the console or an Excel file.

It is usual to base the MLU on the number of morphemes (calculated from the %mor sub-tier in the CHAT input) but you may base it on words instead.

By default, utterances containing unintelligible words (marked with ‘xxx’, ‘yyy’, and ‘www’, in the main tier in a CHAT file) will be ignored. You may choose to override this. However, these words themselves will always be ignored in the MLU count. As well, any words starting with one of the characters 0, &, *, -, + will always be ignored, as will the filler words hm, oh, uh, ah, and rah.
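The exclusion rules above can be sketched in Python as follows. This is only an illustration: the function names are hypothetical, and splitting on whitespace and matching case-sensitively are assumptions, not a description of how QuidiVidi itself is implemented.

```python
# Illustrative sketch only; names are hypothetical, not QuidiVidi internals.
UNINTELLIGIBLE = {"xxx", "yyy", "www"}
FILLERS = {"hm", "oh", "uh", "ah", "rah"}
IGNORED_PREFIXES = ("0", "&", "*", "-", "+")

def countable_words(utterance):
    """Return the words of a main-tier utterance that would count toward MLU."""
    return [
        word for word in utterance.split()
        if word not in UNINTELLIGIBLE
        and word not in FILLERS
        and not word.startswith(IGNORED_PREFIXES)
    ]

def has_unintelligible(utterance):
    """True if the utterance contains xxx, yyy, or www (skipped by default)."""
    return any(word in UNINTELLIGIBLE for word in utterance.split())
```

For example, countable_words("oh I want xxx &um cookie") keeps only I, want, and cookie, and has_unintelligible reports that the utterance would be skipped under the default setting.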

Some investigators may choose to skip the first few utterances in order to give the subject (usually a child) time to become comfortable with the environment. As well, you can base the MLU calculation on a maximum number of utterances (this maximum does not include the skipped utterances).

As an alternative measure, some investigators are interested in the MLU of the five longest utterances (MLU5) instead of or in addition to the usual MLU calculation.

Click Go to carry out the calculation. QuidiVidi will display the source of the input (a file name or the Input tab), the raw number of utterances and morphemes/words, the MLU, and the standard deviation of the utterance lengths.

Input tab
Utterances: 262, Morphemes: 916, MLU: 3.496, Std dev: 2.681
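In the sample output above, 916 morphemes over 262 utterances gives 916 / 262 ≈ 3.496. The sketch below shows one way the calculation described in this section could be written in Python, including the skip, maximum, and MLU5 options. The function names are hypothetical, and the use of the population standard deviation is an assumption, since this help text does not say which form QuidiVidi reports.

```python
import statistics

def mlu(lengths, skip=0, max_utterances=None):
    """Mean and standard deviation of utterance lengths.

    lengths: morphemes (or words) per utterance, in transcript order.
    skip: number of initial utterances to leave out.
    max_utterances: cap applied after skipping (None means no cap).
    """
    kept = lengths[skip:]
    if max_utterances is not None:
        kept = kept[:max_utterances]
    mean = sum(kept) / len(kept)
    # Assumption: population standard deviation.
    return mean, statistics.pstdev(kept)

def mlu5(lengths):
    """Mean length of the five longest utterances."""
    longest = sorted(lengths, reverse=True)[:5]
    return sum(longest) / len(longest)
```

With lengths [9, 1, 2, 3, 4, 5, 6], skipping one utterance and capping at four bases the mean on [1, 2, 3, 4].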

MLT

MLT, or mean length of turn, is a measure analogous to MLU. A turn in a transcript of a conversation is defined as one or more consecutive utterances by the same speaker without another speaker intervening.
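As a rough illustration of this definition, the following Python sketch groups consecutive utterances by speaker into turns and computes an MLT in words. The function names and the (speaker, text) input format are assumptions made for the example, and the sketch does not apply any of the word-exclusion options.

```python
from itertools import groupby

def turns(utterances):
    """Group consecutive utterances by the same speaker into turns.

    utterances: list of (speaker, text) pairs in transcript order.
    Returns a list of (speaker, [texts]) pairs, one per turn.
    """
    return [
        (speaker, [text for _, text in group])
        for speaker, group in groupby(utterances, key=lambda u: u[0])
    ]

def mlt_in_words(utterances, speaker):
    """Mean number of words per turn for one speaker."""
    counts = [
        sum(len(text.split()) for text in texts)
        for who, texts in turns(utterances)
        if who == speaker
    ]
    return sum(counts) / len(counts)
```

Here two consecutive *CHI utterances form a single turn, and a later *CHI utterance after *MOT speaks starts a new one.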

The MLT can be measured in words or seconds provided the transcript has been annotated to record the length of each utterance. If you choose to measure in seconds, any utterances that do not have a recorded length will be ignored.

By default, unintelligible words will not be counted, but utterances with such words or with no (intelligible) words will be counted. You can change any of these options. Any words starting with one of the characters 0, &, *, -, + will always be ignored.

You can skip the first few utterances in order to give the subject (usually a child) time to become comfortable with the environment, or restrict the MLT calculation to a maximum number of utterances.

Click Go to carry out the calculation.

Input tab
Turns: 627, Words: 2088, MLT: 3.33, Std dev: 3.343

Frequencies

The Frequencies command enumerates the occurrences of individual words in the transcript for a specific speaker. For example, you would use this command if you want to know the top ten words that a child speaker uses in a transcript.

You can also use this command to count morphemes or lemmas instead of words. (A lemma is the “root” form of a word. If you are counting lemmas, cat and cats will be regarded as the same token, as will has, had and have.)

The results will be sorted by frequency, and within groups of words of the same frequency, alphabetically; you can change this to pure alphabetical order.

You can restrict the search to a set of words from a file, or to words that match a pattern (regular expression). See the Find section for more information on patterns.

You can choose to display only the top results.
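The sorting and "top results" options described above can be sketched in Python like this. The function name and parameters are hypothetical, and the sketch compares words case-sensitively, which may not match how QuidiVidi orders them.

```python
from collections import Counter

def word_frequencies(words, top=None, alphabetical=False):
    """Count words and return (word, count) pairs.

    Default order: descending frequency, then alphabetical within ties.
    alphabetical=True: pure alphabetical order.
    top: keep only the first `top` results (None keeps everything).
    """
    counts = Counter(words)
    if alphabetical:
        items = sorted(counts.items())
    else:
        items = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return items[:top]
```

Sorting on the pair (negative count, word) is what produces the alphabetical ordering within each group of equally frequent words.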

By default, unintelligible words will not be counted, but you can change this option. Any words starting with one of the characters 0, &, *, -, + will always be ignored.

Click Go to carry out the search.

File: Input tab
and: 156
a: 124
the: 68
one: 60
what's: 54
what: 52
my: 48
has: 40
that: 40
he: 38
where's: 38
I: 34
...

Morphemes

The Morphemes command searches a transcript for occurrences of the fourteen English morphemes enumerated by Roger Brown in his study of the acquisition of grammatical structures. These are, in typical order of acquisition:

  1. Present progressive (-ing; coded in CHAT as -PRESP)
  2. The prepositions in and on
  3. Regular plural (-s; -PL)
  4. Irregular past tense (went, etc.; &PAST)
  5. Possessive ('s; ~poss)
  6. Uncontractible copula (is; cop|be)
  7. The articles a and the
  8. Regular past tense (-ed; -PAST)
  9. Regular third person singular (-s; -3S)
  10. Irregular third person singular (has, does; &3S)
  11. Uncontractible auxiliary (is; aux|be&3S)
  12. Contractible copula (is; ~cop|be)
  13. Contractible auxiliary (is; ~aux|be&3S)

Click Go to carry out the search.

File,PRESP,prep:in,prep:on,PL,irr_PAST,poss,un_cop,det:art,PAST,3S,irr_3S,un_aux,con_cop,con_aux

Input tab,48,15,4,33,27,21,33,134,17,14,3,6,29,14
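Because the result is comma-separated, it is easy to read back into other tools. The following Python sketch, which assumes you have copied or saved the output shown above as text, turns each row into a dictionary keyed by the column names.

```python
import csv
import io

# Sample output copied from the example above.
output = (
    "File,PRESP,prep:in,prep:on,PL,irr_PAST,poss,un_cop,det:art,"
    "PAST,3S,irr_3S,un_aux,con_cop,con_aux\n"
    "Input tab,48,15,4,33,27,21,33,134,17,14,3,6,29,14\n"
)

# Each row becomes a dict mapping column name to the value as a string.
rows = list(csv.DictReader(io.StringIO(output)))
first = rows[0]
article_count = int(first["det:art"])
```

Here article_count holds the number of articles (a, the) found in the Input tab.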

Lexical classes

Lexical classes reads a transcript and calculates the percentage of words in each lexical class produced by the selected speaker: nouns, verbs, adjectives, determiners, and so forth. These calculations are based on the coding in the %mor sub-tier.

Click Go to carry out the calculation.

total words = 2424.
nouns: 462 (19.06%), verbs: 378 (15.59%), adjectives: 69 (2.85%), adverbs: 165 (6.81%),
determiners: 196 (8.09%), modifiers: 74 (3.05%), conjunctions: 43 (1.77%), coordinators: 81 (3.34%),
pronouns: 463 (19.1%), prepositions: 114 (4.7%), quantifiers: 34 (1.4%), onomatopoeia: 3 (0.12%),
negators: 5 (0.21%), singing: 8 (0.33%), co: 233 (9.61%), other: 96 (3.96%)
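Each percentage is the class count divided by the total number of words; for example, 462 nouns out of 2424 words is 19.06%. The Python sketch below illustrates the idea under a simplifying assumption: each %mor token has the form class|stem (such as n|dog-PL), and the class is whatever precedes the first "|". The function name is hypothetical, and clitic forms joined with ~ are not handled.

```python
from collections import Counter

def class_percentages(mor_tokens):
    """Return {class: (count, percentage)} for a list of %mor tokens.

    Assumption: the lexical class is the text before the first '|'.
    Tokens without a '|' (for example punctuation) are ignored.
    """
    classes = Counter(
        token.split("|", 1)[0] for token in mor_tokens if "|" in token
    )
    total = sum(classes.values())
    return {
        name: (count, round(100 * count / total, 2))
        for name, count in classes.items()
    }
```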

Maximums

Click Maximums to find the longest utterances or words in a transcript. You can count utterance length in words or morphemes. You can limit the output to the top one or more results, or to words or utterances with a specific minimum length.

By default, unintelligible words and utterances with unintelligible words will not be counted, but you can modify these settings. You can also choose to allow duplicates in the output.

Click Go to carry out the search.

Input tab
Top 5 utterances by word count:
"and now I abcs and sing with me" (8 words)
"I don't want to I don't want to" (8 words)
"all of the train track pieces are broken" (8 words)
"mom I want more bubbles I want more bubbles" (9 words)
"and grew until his ceiling hung with vines and the walls 
   became the world all around" (16 words)
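A minimal Python sketch of this kind of search is shown below. The function name and parameters are hypothetical, utterance length is counted in whitespace-separated words only, and results are returned shortest first to mirror the sample output.

```python
def longest_utterances(utterances, top=5, min_length=1, allow_duplicates=False):
    """Return the `top` longest utterances as (word_count, text) pairs."""
    seen = set()
    candidates = []
    for text in utterances:
        n_words = len(text.split())
        if n_words < min_length:
            continue
        if not allow_duplicates and text in seen:
            continue
        seen.add(text)
        candidates.append((n_words, text))
    # Take the longest first, then reverse so the longest is listed last.
    best = sorted(candidates, key=lambda c: c[0], reverse=True)[:top]
    return best[::-1]
```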

Comparisons

The Comparison command enables you to track how a token (such as a word, morpheme, or phonetic rendition of a sound) in one sub-tier corresponds to the token at the matching position in another tier. For example, you can get a list of all the different ways that a speaker pronounces each word (or a subset) in a transcript. This kind of comparison is sometimes referred to as a “model-replica” comparison.

You need to select a base tier against which the comparisons will be made, and then a comparison tier. To see how words are pronounced, you would choose Main as the base tier and Phonetic as the comparison tier.

Optionally, you can restrict the comparison by entering a pattern in one or both of the Match fields. For example, to track how the segment ‘ɹd’ is pronounced, you would choose the Model tier as the base tier, enter this segment in its Match field, and then choose Phonetic as the comparison tier.

Click Go to carry out the comparison.

animals
    1 ˈæ̃nɪmoz
another
    1 əˈnʌvə
are
    2 ə
at
    2 ɪʔ
    1 ˈæʔ
back
    1 ˈbæk
backpack
    1 ˈbækpæ
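The position-by-position matching behind this output can be sketched in Python as follows. The function name and input format are hypothetical, and skipping utterances whose tiers have different numbers of tokens is an assumption; this help text does not say how QuidiVidi treats misaligned tiers.

```python
from collections import Counter, defaultdict

def compare_tiers(pairs):
    """Tally how each base-tier token is rendered in the comparison tier.

    pairs: iterable of (base_tokens, comparison_tokens), one per utterance.
    Returns {base_token: Counter({comparison_token: count})}.
    """
    table = defaultdict(Counter)
    for base_tokens, comparison_tokens in pairs:
        if len(base_tokens) != len(comparison_tokens):
            continue  # assumption: misaligned utterances are skipped
        for base, comparison in zip(base_tokens, comparison_tokens):
            table[base][comparison] += 1
    return table
```

Calling table["at"].most_common() on the result lists the renditions of at with the most frequent first, as in the sample above.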

Lengths

The Lengths command displays the distribution of lengths of four quantities in a transcript: words (in characters), utterances (in words), turns (in utterances), and turns (in words).

It also calculates the mean for each of these quantities.

By default, the distributions for values one through twelve are displayed, but you can adjust this.

Unintelligible words will always be ignored when calculating word lengths and are ignored by default for other calculations.

Words by length in characters
lengths:     1     2     3     4     5     6    7+  Mean
*CHI        78   112   442   268   131   116    82  3.76
*MOT       215   555  1039  1005   652   421   415  3.99
Utterances by length in words
lengths:     1     2     3     4     5     6    7+  Mean
*CHI        97    61    68    52    30    10    40  3.13
*MOT       135    92   103    86    87    52   260  4.34
Turns by length in utterances
lengths:     1     2     3     4     5     6    7+  Mean
*CHI       275    33     4     0     1     0     0  1.14
*MOT       144    75    38    18    14     4    20  2.28
Turns by length in words
lengths:     1     2     3     4     5     6    7+  Mean
*CHI        79    48    58    44    27     5    52  3.37
*MOT        38    23    25    29    20    20   158  5.12
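Each row above is a histogram whose last column collects every value at or above the cutoff (here 7+). A Python sketch of one such row follows; the function name is hypothetical, and computing the mean from the true lengths rather than the capped ones is an assumption.

```python
from collections import Counter

def length_distribution(lengths, max_bucket=7):
    """Return (row, mean) for a list of positive integer lengths.

    row[i] is the number of items of length i + 1; the final entry
    counts everything of length max_bucket or more.
    """
    counts = Counter(min(n, max_bucket) for n in lengths)
    row = [counts.get(i, 0) for i in range(1, max_bucket + 1)]
    mean = sum(lengths) / len(lengths)
    return row, round(mean, 2)
```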

Diversity

Lexical diversity refers to the richness of the subject's vocabulary. A common measure of diversity is TTR, or type-token ratio: the number of unique words in a transcript divided by the total number of words. (You can obtain this value in QuidiVidi via the Frequencies command.) However, TTR has a drawback: its value tends to decrease for longer transcripts, since the speaker will repeat already-used words as the conversation goes on.

The diversity measure used here is that proposed by Durán et al. in their 2004 article Developmental Trends in Lexical Diversity (Applied Linguistics 25/2: 220-242).

They observe the following relationship between diversity (D), sample size (N), and TTR (T) for small sample sizes:

\[ T = \frac{D}{N}\left[\left(1 + \frac{2N}{D}\right)^{1/2} - 1\right] \]

The procedure for obtaining D is as follows:

  1. Set N equal to 35 and randomly select N words from the transcript
  2. Calculate T by dividing the number of unique words in the random sample by N
  3. Do the above 100 times, recording T each time.
  4. A value for D cannot be obtained algebraically from the above equation and the 100 observed values of T, but the task can be solved as a nonlinear least squares problem, for which a number of algorithms exist. QuidiVidi uses the golden-section search method.
  5. Repeat the above for sample sizes (N) up to and including 50.
  6. Also using the Golden-section search method, calculate the optimum D for all observations.
  7. Repeat all of the above two more times.
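The procedure above can be sketched in Python as follows. This is a simplified illustration, not QuidiVidi's code: it assumes the published model T = (D/N)[(1 + 2N/D)^(1/2) - 1], fits a single D to the mean TTR at each sample size (combining steps 4 and 6), performs one run rather than three, and searches for D in the arbitrary range 1 to 500. All function names are hypothetical.

```python
import math
import random

def model_ttr(n, d):
    """TTR predicted by the model for sample size n and diversity d."""
    return (d / n) * (math.sqrt(1 + 2 * n / d) - 1)

def golden_section_min(f, lo, hi, tol=1e-6):
    """Minimise a unimodal function f on the interval [lo, hi]."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) < f(d):
            b = d  # the minimum lies in [a, d]
        else:
            a = c  # the minimum lies in [c, b]
    return (a + b) / 2

def mean_ttr(words, n, trials=100, rng=random):
    """Average TTR over `trials` random samples of n words each."""
    total = 0.0
    for _ in range(trials):
        sample = rng.sample(words, n)
        total += len(set(sample)) / n
    return total / trials

def fit_d(observed, lo=1.0, hi=500.0):
    """Least-squares D for a list of (n, observed TTR) pairs."""
    def squared_error(d):
        return sum((t - model_ttr(n, d)) ** 2 for n, t in observed)
    return golden_section_min(squared_error, lo, hi)

def estimate_d(words, sizes=range(35, 51), trials=100, rng=random):
    """One run of the sampling procedure; words must number at least 50."""
    observed = [(n, mean_ttr(words, n, trials, rng)) for n in sizes]
    return fit_d(observed)
```

Golden-section search only needs function values, not derivatives, which makes it a convenient fit for a one-parameter least squares problem like this one.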

You have the option of basing your calculations on words, word stems (the default), or word lemmas. If you choose stems, words like dog and dogs or hard and hardly will be considered the same word, but irregular verb forms such as had will be treated as separate words, as will irregular plurals such as mouse and mice. The 's in words such as he's will be treated as the word be; can't will be treated as the words can and not.

Unintelligible words will always be ignored.

Filter

The Filters command enables you to display tiers and sub-tiers in a transcript that match one or more patterns. You can select a main tier (such as child or mother) and up to three sub-tiers. You can leave a pattern blank, which will cause all occurrences of the tier underneath a matching main tier to be displayed. You may choose to display only tiers in which all sub-tiers match, or those in which any sub-tier matches.

Find & Replace

The Edit > Find and Replace command enables you to search for text in the console, and search and replace text in the Input tab.

Regular expressions

You can carry out searches using regular expressions (sometimes called “patterns”) in both tabs. This enables you to match a class of text strings instead of just the literal find text. If you're not familiar with these expressions, here's a brief overview:

If you want to search for any of the special characters * + | $ in a regular expression, prefix it with a backslash (for example, ^n\|dog will find the text n|dog at the start of a token).

Note: The regular expression syntax corresponds to that used in the Python programming language and many others. You can find out more information by searching online for “Python regular expressions”.
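Since the syntax follows Python's, you can try a pattern in Python before using it in QuidiVidi. The short example below, with made-up tokens, shows why the backslash matters: an unescaped | means "or", so the pattern ^n|dog matches any token that starts with n or contains dog.

```python
import re

tokens = ["n|dog", "n|dog-PL", "pro|it", "v|hotdog"]

# Escaped: the | is literal, so only tokens starting with "n|dog" match.
escaped = [t for t in tokens if re.search(r"^n\|dog", t)]

# Unescaped: read as "starts with n" OR "contains dog".
unescaped = [t for t in tokens if re.search(r"^n|dog", t)]
```

The escaped pattern keeps n|dog and n|dog-PL only, while the unescaped one also picks up v|hotdog.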

You can also take advantage of regular expressions in the replace text in the Input tab. This is done using group references. A group is an expression surrounded by parentheses in the find text. In the replace text, you can refer to the text that a group matches using the notation \n where n is the group number, counting from left to right in the find text.

If the find text is m(.)(.)(.)(.), there are four groups, each of which matches a single character (represented by a dot). You could have a replace expression \4\3\2\1 where each backslash followed by a number refers to a group. If the edit pane contains “media”, the find text will match it, and applying replace would change it to “aide”, by inserting each character matched by the groups in the order group 4, group 3, group 2, group 1 (in effect, reverse order).
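The same find and replace can be reproduced with Python's re module, which uses the same group-reference notation. Whether QuidiVidi calls this exact function internally is not stated here; the snippet only demonstrates the notation.

```python
import re

# Four groups, each capturing one character after the "m".
find_text = r"m(.)(.)(.)(.)"
# Emit the captured characters in reverse order.
replace_text = r"\4\3\2\1"

result = re.sub(find_text, replace_text, "media")
```

The groups capture e, d, i, and a, so result is "aide".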

Styles

Click Tools > Styles to set styles for the console and the Input tab: the font family and font size (in points); whether the text is bold, italic, or monospace; and the text color and background color.

You can also adjust some aspects of the interface styles: the color of toolbar buttons; the font, font size, and color of menu items; and the help font, font size, text color, and background color.

About the name

QuidiVidi is named for Quidi Vidi Lake (if we're being honest, a pond) in St. John's, Newfoundland and Labrador, Canada. Most people pronounce it “kiddy viddy” ['kɪdi 'vɪdi]; some say “kitty vitty”; and the occasional person even says “kwyda vyda”. The origin of the name is unknown and subject to much speculation.

—Updated: June 28, 2025.
Copyright © 2025 Rodney Boyd