Welcome to QuidiVidi!
QuidiVidi is a free-of-charge, freely distributable application for analyzing linguistic transcripts in CHAT format.
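You can calculate measures such as mean length of utterance (MLU), mean length of turn (MLT), word and morpheme frequencies, utterance and word lengths, and lexical diversity.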
You can also filter the input on tier-specific search terms.
The QuidiVidi main window consists of two tabs, Console and Input. The console displays program output, such as calculations of MLU and MLT (these can also be directed to a file instead). You can't edit the console contents, but you can copy from it or save it to a file.
The Input tab is one of two methods for providing CHAT input to the various QuidiVidi commands (you can also select a list of files from any of the command dialogs). You can open a CHAT file (File > Open) in this tab. The full set of editing functions, such as cut and paste, is available, and, as with the console, you can save the contents to a file.
You can find (Edit > Find) text in the console, and find and replace in the Input tab. The contents of both tabs can be printed.
Commands for doing analyses of CHAT transcripts are found in the Analytics menu.
The dialogs for each analytic command give you the option of selecting a list of files to run the operation on. The file chooser dialog lists all CHAT files in the selected folder. Move the selected files to the Selected list on the right by clicking Add Selected, or double-click files individually. Similarly, you can remove files from the Selected list by double-clicking them.
MLU, or mean length of utterance, is an age-independent measure of a subject's linguistic development, originated in the 1970s by researcher Roger Brown. It is defined as the number of morphemes (or words) produced by a specific speaker in the transcript (or portion thereof) divided by the number of utterances by that speaker. An utterance, corresponding to a main tier in a CHAT file (such as *CHI or *MOT), is a continuous stream of spoken output without a significant break.
In common with all the QuidiVidi command dialogs, you can choose to read the input to the MLU calculation from the Input tab or from a list of files. Depending on your choice, the Choose button will enable you to select accordingly. Both forms of input are shared among the various QuidiVidi analytics commands: for example, if you populate the Input tab or file list from the MLU dialog, the same selections will be in effect if you run the MLT command. If you select a list of files, it will be displayed in a tooltip if you hover the mouse over the Choose button.
You can direct the output to either the console or an Excel file.
It is usual to base the MLU on the number of morphemes (calculated from the %mor sub-tier in the CHAT input) but you may base it on words instead.
By default, utterances containing unintelligible words (marked with ‘xxx’, ‘yyy’, and ‘www’ in the main tier in a CHAT file) will be ignored. You may choose to override this. However, these words themselves will always be ignored in the MLU count.
As well, any words starting with one of the characters 0, &, *, -, + will always be ignored, as will the filler words hm, oh, uh, ah, and rah.
Some investigators may choose to skip the first few utterances in order to give the subject (usually a child) time to become comfortable with the environment. As well, you can base the MLU calculation on a maximum number of utterances (this maximum does not include the skipped utterances).
As an alternative measure, some investigators are interested in the MLU of the five longest utterances (MLU5) instead of or in addition to the usual MLU calculation.
Click Go to carry out the calculation. QuidiVidi will display the source of the input (a file name or the Input tab), the raw number of utterances and morphemes/words, the MLU, and the standard deviation of the utterance lengths.
Input tab Utterances: 262, Morphemes: 916, MLU: 3.496, Std dev: 2.681
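The calculation described above can be sketched in Python as follows. This is a minimal illustration, not QuidiVidi's own code: it counts words rather than morphemes (morpheme counts would need the %mor sub-tier), the function and parameter names are hypothetical, and the use of the population standard deviation is an assumption.

```python
from statistics import mean, pstdev

IGNORED_PREFIXES = ("0", "&", "*", "-", "+")   # words starting with these are never counted
FILLERS = {"hm", "oh", "uh", "ah", "rah"}      # filler words are never counted
UNINTELLIGIBLE = {"xxx", "yyy", "www"}         # unintelligible markers are never counted


def countable(words):
    """Return only the words that contribute to an utterance's length."""
    return [w for w in words
            if not w.startswith(IGNORED_PREFIXES)
            and w.lower() not in FILLERS
            and w not in UNINTELLIGIBLE]


def mlu(utterances, skip=0, maximum=None, keep_unintelligible=False):
    """Return (MLU, standard deviation) for a list of utterance strings."""
    lengths = []
    for utterance in utterances[skip:]:
        # The maximum applies after skipping (assumption: it counts used utterances).
        if maximum is not None and len(lengths) >= maximum:
            break
        words = utterance.split()
        if not keep_unintelligible and any(w in UNINTELLIGIBLE for w in words):
            continue  # default: ignore the whole utterance
        lengths.append(len(countable(words)))
    if not lengths:
        return 0.0, 0.0
    return mean(lengths), pstdev(lengths)


sample = ["I want more", "uh the dog", "xxx ball", "&um go"]
print(mlu(sample))
```

With the sample above, the third utterance is dropped by default, and the remaining lengths are 3, 2, and 1 words.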
MLT, or mean length of turn, is a measure analogous to MLU. A turn in a transcript of a conversation is defined as one or more utterances by the same speaker without another speaker intervening.
The MLT can be measured in words or, provided the transcript has been annotated to record the length of each utterance, in seconds. If you choose to measure in seconds, any utterances that do not have a recorded length will be ignored.
By default, unintelligible words will not be counted, but utterances with such words or with no (intelligible) words will be counted. You can change any of these options.
Any words starting with one of the characters 0, &, *, -, + will always be ignored.
You can skip the first few utterances in order to give the subject (usually a child) time to become comfortable with the environment, or restrict the MLT calculation to a maximum number of utterances.
Click Go to carry out the calculation.
Input tab Turns: 627, Words: 2088, MLT: 3.33, Std dev: 3.343
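Grouping utterances into turns can be sketched as follows. This is an illustrative outline measured in words only; the (speaker, text) input format and the function names are assumptions, not QuidiVidi's internal representation.

```python
from itertools import groupby


def turns(utterances):
    """Group consecutive (speaker, text) utterances by the same speaker into turns."""
    return [(speaker, [text for _, text in group])
            for speaker, group in groupby(utterances, key=lambda u: u[0])]


def mlt(utterances, speaker):
    """Mean number of words per turn for one speaker."""
    word_counts = [sum(len(text.split()) for text in texts)
                   for who, texts in turns(utterances) if who == speaker]
    return sum(word_counts) / len(word_counts) if word_counts else 0.0


conversation = [
    ("CHI", "hi"),
    ("CHI", "want ball"),        # same speaker, so still the first turn
    ("MOT", "okay here you go"),
    ("CHI", "thanks"),
]
print(mlt(conversation, "CHI"))  # two turns of 3 words and 1 word
```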
The Frequencies command enumerates the occurrences of individual words in the transcript for a specific speaker. For example, you would use this command if you want to know the top ten words that a child speaker uses in a transcript.
You can also use this command to count morphemes or lemmas instead of words. (A lemma is the “root” form of a word. If you are counting lemmas, cat and cats will be regarded as the same token, as will has, had and have.)
The results will be sorted by frequency, and within groups of words of the same frequency, alphabetically; you can change this to pure alphabetical order.
You can restrict the search to a set of words from a file, or to words that match a pattern (regular expression). See the Find section for more information on patterns.
You can choose to display only the top results.
By default, unintelligible words will not be counted, but you can change this option.
Any words starting with one of the characters 0, &, *, -, + will always be ignored.
Click Go to carry out the search.
File: Input tab
and: 156
a: 124
the: 68
one: 60
what's: 54
what: 52
my: 48
has: 40
that: 40
he: 38
where's: 38
I: 34
...
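The sort order and options described above can be sketched with Python's Counter. The function name and parameters are hypothetical, and matching the pattern against the whole word is an assumption about how the restriction is applied.

```python
import re
from collections import Counter


def word_frequencies(words, top=None, pattern=None, alphabetical=False):
    """Count words; sort by descending frequency, then alphabetically."""
    if pattern is not None:
        words = [w for w in words if re.fullmatch(pattern, w)]
    counts = Counter(words)
    if alphabetical:
        ordered = sorted(counts.items())
    else:
        ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ordered if top is None else ordered[:top]


words = "the cat and the dog and the bird".split()
for word, count in word_frequencies(words, top=3):
    print(f"{word}: {count}")
```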
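The Morphemes command searches a transcript for occurrences of the fourteen English morphemes enumerated by Roger Brown in his study of the acquisition of grammatical structures. These are, in typical order of acquisition (with the column label used in the output in parentheses):
1. present progressive -ing (PRESP)
2. preposition in (prep:in)
3. preposition on (prep:on)
4. regular plural (PL)
5. irregular past tense (irr_PAST)
6. possessive (poss)
7. uncontractible copula (un_cop)
8. articles (det:art)
9. regular past tense (PAST)
10. regular third person singular (3S)
11. irregular third person singular (irr_3S)
12. uncontractible auxiliary (un_aux)
13. contractible copula (con_cop)
14. contractible auxiliary (con_aux)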
Click Go to carry out the search.
File,PRESP,prep:in,prep:on,PL,irr_PAST,poss,un_cop,det:art,PAST,3S,irr_3S,un_aux,con_cop,con_aux
Input tab,48,15,4,33,27,21,33,134,17,14,3,6,29,14
%mor sub-tier.
Click Go to carry out the calculation.
total words = 2424. nouns: 462 (19.06%), verbs: 378 (15.59%), adjectives: 69 (2.85%), adverbs: 165 (6.81%), determiners: 196 (8.09%), modifiers: 74 (3.05%), conjunctions: 43 (1.77%), coordinators: 81 (3.34%), pronouns: 463 (19.1%), prepositions: 114 (4.7%), quantifiers: 34 (1.4%), onomatopoeia: 3 (0.12%), negators: 5 (0.21%), singing: 8 (0.33%), co: 233 (9.61%), other: 96 (3.96%)
Click Maximums to find the longest utterances or words in a transcript. You can count utterance length in words or morphemes. You can limit the output to the top one or more results, or to words or utterances with a specific minimum length.
By default, unintelligible words and utterances with unintelligible words will not be counted, but you can modify these settings. You can also choose to allow duplicates in the output.
Click Go to carry out the search.
Input tab
Top 5 utterances by word count:
"and now I abcs and sing with me" (8 words)
"I don't want to I don't want to" (8 words)
"all of the train track pieces are broken" (8 words)
"mom I want more bubbles I want more bubbles" (9 words)
"and grew until his ceiling hung with vines and the walls became the world all around" (16 words)
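A search for the longest utterances can be sketched as below. The names are hypothetical, utterance length is counted in words, and the handling of ties (transcript order) is an assumption.

```python
def longest_utterances(utterances, top=5, min_length=1, allow_duplicates=False):
    """Return (length, utterance) pairs for the longest utterances, longest first."""
    seen = set()
    candidates = []
    for utterance in utterances:
        length = len(utterance.split())
        if length < min_length:
            continue
        if not allow_duplicates and utterance in seen:
            continue  # report each distinct utterance only once
        seen.add(utterance)
        candidates.append((length, utterance))
    # sorted() is stable, so utterances of equal length keep transcript order
    return sorted(candidates, key=lambda pair: -pair[0])[:top]


transcript = ["I want more", "look", "I want more", "the big red ball rolled away"]
for length, utterance in longest_utterances(transcript, top=2):
    print(f'"{utterance}" ({length} words)')
```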
The Comparison command enables you to track how a token (such as a word, morpheme, or phonetic rendition of a sound) in one sub-tier corresponds to the token at the matching position in another tier. For example, you can get a list of all the different ways that a speaker pronounces each word (or a subset) in a transcript. This kind of comparison is sometimes referred to as a “model-replica” comparison.
You need to select a base tier against which the comparisons will be made, and then a comparison tier. To see how words are pronounced, you would choose Main as the base tier and Phonetic as the comparison tier.
Optionally, you can restrict the comparison by entering a pattern in one or both of the Match fields. For example, to track how the segment ‘ɹd’ is pronounced, you would choose the Model tier as the base tier, enter this segment in its Match field, and then choose Phonetic as the comparison tier.
Click Go to carry out the comparison.
animals   1 ˈæ̃nɪmoz
another   1 əˈnʌvə
are       2 ə
at        2 ɪʔ
          1 ˈæʔ
back      1 ˈbæk
backpack  1 ˈbækpæ
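A position-by-position comparison of this kind can be sketched as follows. The input format (pairs of base-tier and comparison-tier strings with the same number of tokens) and the function name are assumptions made for illustration.

```python
import re
from collections import Counter, defaultdict


def compare_tiers(tier_pairs, match=None):
    """Map each base-tier token to a count of the tokens found at the same position."""
    table = defaultdict(Counter)
    for base, comparison in tier_pairs:
        for base_token, other_token in zip(base.split(), comparison.split()):
            if match is None or re.search(match, base_token):
                table[base_token][other_token] += 1
    return table


pairs = [
    ("at the back", "ɪʔ də ˈbæk"),
    ("look at me", "lʊk ˈæʔ mi"),
    ("at home", "ɪʔ hom"),
]
for word, renditions in sorted(compare_tiers(pairs, match="^at$").items()):
    for rendition, count in renditions.most_common():
        print(word, count, rendition)
```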
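The Lengths command displays the distribution of lengths of four quantities in a transcript: words by length in characters, utterances by length in words, turns by length in utterances, and turns by length in words.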
It also calculates the mean for each of these quantities.
By default, the distributions for values one through twelve are displayed, but you can adjust this.
Unintelligible words will always be ignored when calculating word lengths and are ignored by default for other calculations.
Words by length in characters
lengths:     1     2     3     4     5     6    7+   Mean
*CHI        78   112   442   268   131   116    82   3.76
*MOT       215   555  1039  1005   652   421   415   3.99

Utterances by length in words
lengths:     1     2     3     4     5     6    7+   Mean
*CHI        97    61    68    52    30    10    40   3.13
*MOT       135    92   103    86    87    52   260   4.34

Turns by length in utterances
lengths:     1     2     3     4     5     6    7+   Mean
*CHI       275    33     4     0     1     0     0   1.14
*MOT       144    75    38    18    14     4    20   2.28

Turns by length in words
lengths:     1     2     3     4     5     6    7+   Mean
*CHI        79    48    58    44    27     5    52   3.37
*MOT        38    23    25    29    20    20   158   5.12
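One row of such a distribution, with a final overflow column like "7+", can be sketched as below. The function name and the max_bucket parameter are hypothetical; in this sketch, items of length zero contribute to the mean but not to any column, which is an assumption.

```python
from collections import Counter


def length_distribution(lengths, max_bucket=7):
    """Return (counts for 1..max_bucket, mean); the last count also covers longer items."""
    counts = Counter(min(n, max_bucket) for n in lengths)
    row = [counts.get(i, 0) for i in range(1, max_bucket + 1)]
    mean = sum(lengths) / len(lengths) if lengths else 0.0
    return row, mean


utterance_lengths = [1, 1, 2, 3, 7, 9, 12]
row, mean = length_distribution(utterance_lengths)
print(row, round(mean, 2))
```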
Lexical diversity refers to the richness of the subject's vocabulary. A common measure of diversity is TTR, or type-token ratio: the number of unique words in a transcript divided by the total number of words. (You can obtain this value in QuidiVidi via the Frequencies command.) However, TTR has a drawback in that the value tends to decrease for longer transcripts, since the speaker will repeat already-used words as the conversation goes on.
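As a small illustration of TTR and its sensitivity to transcript length, here is a minimal sketch (the function name is hypothetical):

```python
def type_token_ratio(words):
    """Number of unique words divided by the total number of words."""
    return len(set(words)) / len(words) if words else 0.0


short_sample = ["the", "cat", "the", "dog"]
long_sample = short_sample * 2   # same vocabulary, twice as many tokens

print(type_token_ratio(short_sample))  # 3 types / 4 tokens
print(type_token_ratio(long_sample))   # 3 types / 8 tokens
```

The second sample uses exactly the same vocabulary, yet its TTR is half that of the first, which is the drawback described above.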
The diversity measure used here is that proposed by Durán et al. in their 2004 article Developmental Trends in Lexical Diversity (Applied Linguistics 25/2: 220-242).
They observe the following relationship between diversity (D), sample size (N), and TTR (T) for small sample sizes:
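In the literature on D, this relationship is usually written as below, with T the TTR of a sample of N tokens. It is reconstructed here from the published model rather than taken from QuidiVidi itself, so treat the exact form as an assumption:

```latex
T = \frac{D}{N}\left[\left(1 + \frac{2N}{D}\right)^{1/2} - 1\right]
```

For a fixed N, a larger D gives a T closer to 1, so a higher D indicates a more diverse vocabulary.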
The procedure for obtaining D is as follows:
You have the option of basing your calculations on words, word stems (the default), or word lemmas. If you choose stems, words like dog and dogs or hard and hardly will be considered the same word, but irregular verb forms such as had will be treated as a separate word, as will irregular plurals such as mouse and mice. The 's in words such as he's will be treated as the word be; can't will be treated as the words can and not.
Unintelligible words will always be ignored.
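A D estimate of this kind can be sketched as follows. The sketch follows the commonly described approach of averaging the TTR over random samples of several sizes and then finding the D whose model curve best fits those averages. The model formula, the sample sizes (35 to 50 tokens), the number of trials, and the simple grid search are all assumptions and may differ from what QuidiVidi does; the function names are hypothetical.

```python
import math
import random


def model_ttr(d, n):
    """TTR predicted by the D model for a sample of n tokens (assumed form)."""
    return (d / n) * (math.sqrt(1 + 2 * n / d) - 1)


def mean_sampled_ttr(tokens, n, trials, rng):
    """Average TTR over random samples of n tokens drawn without replacement."""
    total = 0.0
    for _ in range(trials):
        sample = rng.sample(tokens, n)
        total += len(set(sample)) / n
    return total / trials


def estimate_d(tokens, sizes=range(35, 51), trials=100, seed=0):
    """Grid-search the D (1.0 to 200.0) that best fits the observed TTR curve."""
    rng = random.Random(seed)   # fixed seed so the sketch is repeatable
    observed = [(n, mean_sampled_ttr(tokens, n, trials, rng)) for n in sizes]

    def squared_error(d):
        return sum((ttr - model_ttr(d, n)) ** 2 for n, ttr in observed)

    candidates = [x / 10 for x in range(10, 2001)]
    return min(candidates, key=squared_error)


repetitive = ["a", "b", "c", "d", "e"] * 20                  # 100 tokens, 5 types
varied = [f"w{i}" for i in range(60)] + ["the"] * 40         # 100 tokens, 61 types
print(estimate_d(repetitive), estimate_d(varied))
```

The token list must contain at least as many tokens as the largest sample size. A transcript with a richer vocabulary should yield a larger D than a repetitive one of the same length.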
The Filters command enables you to display tiers and sub-tiers in a transcript that match one or more patterns. You can select a main tier (such as child or mother) and up to three sub-tiers. You can leave a pattern blank, which will cause all occurrences of the tier underneath a matching main tier to be displayed. You may choose to display only tiers in which all sub-tiers match, or those in which any sub-tier matches.
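The all/any matching rule can be sketched as follows. The dictionary layout used for utterances, the tier names in the example, and the function name are assumptions made for illustration only; a blank pattern is treated as matching everything.

```python
import re


def filter_utterances(utterances, main_pattern, sub_patterns, require_all=True):
    """Keep utterances whose main tier and sub-tiers match the given patterns."""
    combine = all if require_all else any
    kept = []
    for utterance in utterances:
        if main_pattern and not re.search(main_pattern, utterance["main"]):
            continue
        # Blank patterns impose no restriction, so only non-blank ones are checked.
        checks = [re.search(pattern, utterance["subs"].get(tier, "")) is not None
                  for tier, pattern in sub_patterns.items() if pattern]
        if not checks or combine(checks):
            kept.append(utterance)
    return kept


transcript = [
    {"main": "*CHI: I want cookie", "subs": {"%mor": "pro|I v|want n|cookie"}},
    {"main": "*MOT: you want it", "subs": {"%mor": "pro|you v|want pro|it"}},
    {"main": "*CHI: doggie", "subs": {"%mor": "n|doggie"}},
]
for hit in filter_utterances(transcript, r"^\*CHI", {"%mor": r"v\|want"}):
    print(hit["main"])
```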
The Edit > Find and Replace command enables you to search for text in the console, and search and replace text in the Input tab.
You can carry out searches using regular expressions (sometimes called “patterns”) in both tabs. This enables you to match a class of text strings instead of just the literal find text. If you're not familiar with these expressions, here's a brief overview:
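A dot (.) matches any single character: a.m will match “arm”, “aim”, and “alm”.
An asterisk (*) matches zero or more of the preceding character: ar*m will match “am”, “arm”, and “arrrrm”.
A plus sign (+) matches one or more of the preceding character: ar+m will match “arm” and “arrrrm”, but not “am”.
A question mark (?) matches zero or one of the preceding character: ar?m will match “am” and “arm”, but not “arrm”.
A caret (^) matches at the beginning of a line: ^cat will match “cat” at the beginning of a line but not “black cat”. Similarly, the “$” character will match at the end of a line: cat$ will match “cat” at the end of a line but not “cattle”.
Square brackets match any one of the enclosed characters: [ai]t matches “at” and “it”.
A vertical bar (|) separates alternatives: in|of|by will match any of “in”, “of”, and “by”.
Parentheses group part of an expression: (ar|bi|ca)t will match any of “art”, “bit”, and “cat”.
If you want to search for any of the special characters * + | $ in a regular expression, prefix it with a backslash (for example, ^n\|dog will find the text n|dog at the start of a token).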
Note: The regular expression syntax corresponds to that used in the Python programming language and many others. You can find out more information by searching online for “Python regular expressions”.
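Because the syntax corresponds to Python's, the examples in this overview can be tried directly with Python's re module. The helper name below is hypothetical; re.search finds a match anywhere in the text, which is assumed to be how Find behaves.

```python
import re


def matches(pattern, text):
    """True if the pattern is found anywhere in the text."""
    return re.search(pattern, text) is not None


examples = [
    ("a.m", "aim"),
    ("ar*m", "am"),
    ("ar+m", "am"),
    ("ar?m", "arrm"),
    ("^cat", "black cat"),
    ("cat$", "cattle"),
    ("[ai]t", "it"),
    ("in|of|by", "of"),
    ("(ar|bi|ca)t", "bit"),
    (r"^n\|dog", "n|dog"),   # the backslash makes | a literal character
]

for pattern, text in examples:
    print(f"{pattern!r} in {text!r}: {matches(pattern, text)}")
```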
You can also take advantage of regular expressions in the replace text in the Input tab. This is done using group references. A group is an expression surrounded by parentheses in the find text. In the replace text, you can refer to the text that a group matches using the notation \n where n is the group number, counting from left to right in the find text.
If the find text is m(.)(.)(.)(.), there are four groups, each of which matches a single character (represented by a dot). You could have a replace expression \4\3\2\1 where each backslash followed by a number refers to a group. If the edit pane contains “media”, the find text will match it, and applying replace would change it to “aide”, by inserting each character matched by the groups in the order group 4, group 3, group 2, group 1 (in effect, reverse order).
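The same group-reference example can be reproduced with Python's re.sub, which uses the same \n notation in the replacement text. This only illustrates the syntax; it is not a description of how QuidiVidi performs the replacement.

```python
import re

find_text = r"m(.)(.)(.)(.)"     # four groups, one character each
replace_text = r"\4\3\2\1"       # the groups in reverse order

result = re.sub(find_text, replace_text, "media")
print(result)
```

The leading m is part of the match but not part of any group, so it does not appear in the result.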
Click Tools > Styles to set styles for the console and the Input tab: the font family and font size (in points); whether the text is bold, italic, or monospace; and the text color and background color.
You can also adjust some aspects of the interface styles: the color of toolbar buttons; the font, font size, and color of menu items; and the help font, font size, text color, and background color.
QuidiVidi is named for Quidi Vidi Lake (if we're being honest, a pond), in St. John's, Newfoundland and Labrador, Canada. Most people pronounce it “kiddy viddy” ['kɪdi 'vɪdi]; some say “kitty vitty”; and the occasional person even says “kwyda vyda”. The origin of the name is unknown and subject to much speculation.
—Updated: June 28, 2025.
Copyright © 2025 Rodney Boyd