W3C

Voice Extensible Markup Language (VoiceXML) Version 2.0
Candidate Recommendation Disposition of Comments

This version:
January 19, 2004
Editor:
Scott McGlashan, Hewlett-Packard

Abstract

This document details the responses made by the Voice Browser Working Group to issues raised during the Candidate Recommendation review (beginning 28th January 2003 and ending 10th April 2003) of Voice Extensible Markup Language (VoiceXML) Version 2.0. Comments were provided by Voice Browser Working Group members, other W3C Working Groups, and the public via the www-voice-request@w3.org (archive) mailing list.

Status

This document of the W3C's Voice Browser Working Group describes the disposition of comments as of January 19, 2004 on the Voice Extensible Markup Language (VoiceXML) Version 2.0 Candidate Recommendation. It may be updated, replaced or rendered obsolete by other W3C documents at any time.

For background on this work, please see the Voice Browser Activity Statement.

1. Introduction

This document describes the disposition of comments in relation to the Voice Extensible Markup Language (VoiceXML) Version 2.0 (http://www.w3.org/TR/2003/CR-voicexml20-20030220/). Each issue is described by the name of the commentator, a description of the issue, and either the resolution or the reason that the issue was not resolved.

The full set of Issues raised for the Voice Extensible Markup Language (VoiceXML) Version 2.0 since August 2000, their resolution and in most cases the reasoning behind the resolution are available from http://www.w3.org/Voice/Group/2004/voicexml-change-requests.htm [W3C Members Only]. This document provides the analysis of the issues that were submitted and resolved as part of the Last Call Review.

Notation: Each original comment is tracked by a "(Change) Request" [R] designator. Each point within that original comment is identified by a point number. For example, "R5-1" is the first point in the fifth change request for the specification.

2. Comments

Item    Commentator    Nature    Disposition
CR1-1    Arnaud Vallee    Clarification / Typographical / Editorial (§2.1)     accepted (no reply)
CR2-1    Arnaud Vallee    Technical Error (§2.2)     accepted (no reply)   
CR3-1    Arnaud Vallee    Clarification / Typographical / Editorial (§2.1)     accepted (no reply)   
CR4-1    Arnaud Vallee    Clarification / Typographical / Editorial (§2.1)     accepted (no reply)   
CR5-1    Guillaume Berche     Clarification / Typographical / Editorial (§2.1)     accepted   
CR5-2    Guillaume Berche     Clarification / Typographical / Editorial (§2.1)     accepted   
CR5-3    Guillaume Berche     Clarification / Typographical / Editorial (§2.1)     accepted   
CR5-4    Guillaume Berche     Clarification / Typographical / Editorial (§2.1)     accepted   
CR5-5    Guillaume Berche     Clarification / Typographical / Editorial (§2.1)     accepted   
CR5-6    Guillaume Berche     Clarification / Typographical / Editorial (§2.1)     accepted   
CR5-7    Guillaume Berche     Clarification / Typographical / Editorial (§2.1)     accepted   
CR5-8    Guillaume Berche     Clarification / Typographical / Editorial (§2.1)     accepted   
CR5-9    Guillaume Berche     Clarification / Typographical / Editorial (§2.1)     accepted   
CR5-10    Guillaume Berche     Clarification / Typographical / Editorial (§2.1)     accepted   
CR5-11    Guillaume Berche     Clarification / Typographical / Editorial (§2.1)     accepted   
CR5-12    Guillaume Berche     Clarification / Typographical / Editorial (§2.1)     accepted   
CR5-13    Guillaume Berche     Clarification / Typographical / Editorial (§2.1)     accepted   
CR5-14    Guillaume Berche     Clarification / Typographical / Editorial (§2.1)     accepted   
CR5-15    Guillaume Berche     Clarification / Typographical / Editorial (§2.1)     accepted   
CR5-16    Guillaume Berche     Change to Existing Feature (§2.3)     accepted   
CR6-1    Guillaume Berche     Clarification / Typographical / Editorial (§2.1)     accepted   
CR6-2    Guillaume Berche     Clarification / Typographical / Editorial (§2.1)     accepted   
CR6-3    Guillaume Berche     Change to Existing Feature (§2.3)     accepted   
CR6-4    Guillaume Berche     Clarification / Typographical / Editorial (§2.1)     accepted   
CR6-5    Guillaume Berche     Clarification / Typographical / Editorial (§2.1)     accepted   
CR6-6    Guillaume Berche     Clarification / Typographical / Editorial (§2.1)     accepted   
CR6-7    Guillaume Berche     Clarification / Typographical / Editorial (§2.1)     accepted   
CR6-8    Guillaume Berche     Clarification / Typographical / Editorial (§2.1)     accepted   
CR6-9    Guillaume Berche     Clarification / Typographical / Editorial (§2.1)     accepted   
CR6-10    Guillaume Berche     Technical Error (§2.2)     accepted   
CR6-11    Guillaume Berche     Technical Error (§2.2)     accepted   
CR6-12    Guillaume Berche     Clarification / Typographical / Editorial (§2.1)     accepted   
CR6-13    Guillaume Berche     Clarification / Typographical / Editorial (§2.1)     accepted   
CR7-1    Max Froumentin     Clarification / Typographical / Editorial (§2.1)     accepted   
CR8-1    Matt Porter    Clarification / Typographical / Editorial (§2.1)     accepted (no reply)   
CR9-1    John Voger    Clarification / Typographical / Editorial (§2.1)     accepted (no reply)   
CR10-1    Philippe Le Hegaret     Clarification / Typographical / Editorial (§2.1)     accepted (no reply)   
CR11-1    C. M. Sperberg-McQueen    Clarification / Typographical / Editorial (§2.1)     accepted (no reply)   
CR11-2    C. M. Sperberg-McQueen    Clarification / Typographical / Editorial (§2.1)     accepted (no reply)   
CR11-3    C. M. Sperberg-McQueen    Clarification / Typographical / Editorial (§2.1)     accepted (no reply)   
CR11-4    C. M. Sperberg-McQueen    Clarification / Typographical / Editorial (§2.1)     accepted (no reply)   
CR11-5    C. M. Sperberg-McQueen    Clarification / Typographical / Editorial (§2.1)     accepted (no reply)   
CR11-6    C. M. Sperberg-McQueen    Clarification / Typographical / Editorial (§2.1)     accepted (no reply)   
CR11-7    C. M. Sperberg-McQueen    Clarification / Typographical / Editorial (§2.1)     accepted (no reply)   
CR11-8    C. M. Sperberg-McQueen    Clarification / Typographical / Editorial (§2.1)     accepted (no reply)   
CR12-1    Guillaume Berche    Clarification / Typographical / Editorial (§2.1)     accepted   
CR12-2    Guillaume Berche    Clarification / Typographical / Editorial (§2.1)     accepted   
CR12-3    Guillaume Berche    Clarification / Typographical / Editorial (§2.1)     accepted   
CR13-1    Greg FitzPatrick    Clarification / Typographical / Editorial (§2.1)     accepted (no reply)   
CR14-1    Guillaume Berche    Clarification / Typographical / Editorial (§2.1)     accepted   
CR14-2    Guillaume Berche    Clarification / Typographical / Editorial (§2.1)     accepted   
CR15-1    Ufuk Kayserilioglu    Clarification / Typographical / Editorial (§2.1)     accepted   
CR16-1    Mark Clark    Clarification / Typographical / Editorial (§2.1)     accepted (no reply)   
CR17-1    Robert Barkan    Clarification / Typographical / Editorial (§2.1)     accepted   
CR18-1    Mark Clark    Change to Existing Feature (§2.3)     accepted (no reply)   
CR19-1    Pavel Cenek     Feature Request (§2.4)     accepted   
CR19-2    Pavel Cenek     Feature Request (§2.4)     accepted   

2.1 Clarifications, Typographical, and Other Editorial

Issue CR1-1

From Arnaud Vallee

I have a question about where the error.badfetch is thrown and caught when a
called document has non existent root document.

Take the following scenario.

The document 1 makes a transition to document 2 whose root document does not
exist.  document 1 and document 2 have error.badfetch handler at the document
level.  Where is the error supposed to be caught?

I think the question could be the same for the following assertion: If a
document's application attribute refers to a document that also has an
application attribute specified, an error.semantic event is thrown.

As I did not get any answer to the message, I post my query one more time.
The issue is as follows:

    In a document named doc1.vxml, which is a root document (do not
    specify an application attribute in the vxml tag), we transition to a
    document doc2.vxml.  doc2.vxml refers to a non existing root document
    (i.e., application attribute set to doc2-root-unexisting.vxml).

As the spec says (chap 1.5.2), " If a document refers to a non-existent
application root document, an error.badfetch event is thrown ", an
error.badfetch is thrown in this case.

The question: where is the error thrown, or in other way, where do i put the
error.badfetch handler to catch the error?

I see 2 possibilities:
- in doc1.vxml, which means that if a document refers to a non existing root
document, it is a badfetch to try to get this document.
- in doc2.vxml, which means that current document has to be initialized before
getting and initializing the root document.

I think this is the same issue with the following assertion in chapter 1.5.2:
"If a document's application attribute refers to a document that also has an
application attribute specified, an error.semantic event is thrown. "

except that, in this case, the error.semantic could also be catched in the
first root document.

Analysis:
[Pavel Cenek]
I am not member of WBWG, so my answer is only a guess. I also waited for 
an authorized answer and therefore haven't reacted on your first attempt.

> The issue is as follows:
>
> In a document named doc1.vxml, which is a root document (do not specify an
> application attribute in the vxml tag), we transition to a document doc2.vxml.
> doc2.vxml refers to a non existing root document (i.e., application
> attribute set to doc2-root-unexisting.vxml).
>
> As the spec says (chap 1.5.2),
> " If a document refers to a non-existent application root document, an
> error.badfetch event is thrown ",
> an error.badfetch is thrown in this case.
>
> The question: where is the error thrown, or in other way, where do i put
> the error.badfetch handler to catch the error?

The transition is caused by <goto> or <submit>, etc, therefore I would 
apply the rules for these tags (which should be the same for all of 
them). For <goto>, spec says:
"Note that for errors which occur during a dialog or document 
transition, the scope in which errors are handled is platform specific."

> I see 2 possibilities:
> - in doc1.vxml, which means that if a document
> refers to a non existing root document, it is a badfetch to try to get this
> document.

In my opinion this possibility is more logical.

> - in doc2.vxml, which means that current document has to be initialized
> before getting and initializing the root document.

> I think this is the same issue with the following assertion in chapter
> 1.5.2: "If a document's application attribute refers to a document that also
> has an application attribute specified, an error.semantic event is thrown. "

I think it would be valuable to mention the citation above also in the 
chapter one.

Resolution: rejected

The specification allows the error.badfetch event to be thrown in either the referring document or the referred document. To guarantee that the error is caught, catch handlers need to be specified in both documents. This error handling pattern is illustrated in numerous tests in our implementation report.
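
The pattern described above can be sketched as follows. This is a minimal illustration only, not taken from the implementation report: the file names come from the commentator's scenario, while the prompt wording and dialog ids are assumptions. Each document declares its own document-level handler, so the event is caught whichever document the platform chooses as the scope for the error.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- doc1.vxml: the referring document (prompt text is hypothetical) -->
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- Handles the case where the platform throws in the referring document -->
  <catch event="error.badfetch">
    <prompt>Sorry, the next page could not be loaded.</prompt>
    <exit/>
  </catch>
  <form id="start">
    <block>
      <goto next="doc2.vxml"/>
    </block>
  </form>
</vxml>
```

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- doc2.vxml: refers to an application root document that does not exist -->
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml"
      application="doc2-root-unexisting.vxml">
  <!-- Handles the case where the platform throws in the referred document -->
  <catch event="error.badfetch">
    <prompt>Sorry, the application could not be loaded.</prompt>
    <exit/>
  </catch>
  <form id="main">
    <block>
      <prompt>Welcome.</prompt>
    </block>
  </form>
</vxml>
```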

Email Trail:

Issue CR3-1

From Arnaud Vallee

chapter 2.4 of the VoiceXML (24 April 2002)

Attributes of filled are:

mode Either all (the default), or any. If any, this action is executed when
     any of the specified input items is filled by the last user input. If
     all, this action is executed when all of the mentioned input items are
     filled, and at least one has been filled by the last user input. A
     <filled> element in an input item cannot specify a mode.

namelist The input items to trigger on. For a <filled> in a form, namelist
     defaults to the names (explicit and implicit) of the form's input
     items. A <filled> element in an input item cannot specify a
     namelist; the namelist in this case is the input item name. Note that
     control items are not permitted in this list.

As i understand these attributes are not permitted in filled elements which
are child of input item.  But the spec do not say what happens in this case:
- ignore those attributes?
- throw an error (semantic)?

Furthermore, control items are not permitted in namelist. I suppose any
other ECMA variable are not permitted neither. But how a voice browser should
handle that case? Ignore the non-input variable elements or throw an error
(semantic)?

Resolution: accepted with modifications

The specification will be modified so that, when a document contains a <filled> element that is a child of an input item and specifies either a 'mode' or a 'namelist' attribute, the platform throws an error.badfetch. In addition, the specification will make clear that an error.badfetch is thrown when the document contains a <filled> element whose namelist attribute references a control item variable.
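
As a hedged illustration of this resolution, the sketch below shows the two permitted placements of <filled>. The form, field names and prompts are hypothetical, and the built-in "digits" and "boolean" field types are assumed to be supported by the platform.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="order">
    <field name="quantity" type="digits">
      <prompt>How many would you like?</prompt>
      <!-- Permitted: no mode or namelist; the trigger is implicitly "quantity".
           Adding mode="any" or namelist="quantity" to this element would,
           under the resolution, result in error.badfetch. -->
      <filled>
        <prompt>Got it.</prompt>
      </filled>
    </field>
    <field name="confirm" type="boolean">
      <prompt>Shall I place the order?</prompt>
    </field>
    <!-- Permitted: a form-level filled may set mode and namelist,
         provided namelist names input items only (no control items) -->
    <filled mode="all" namelist="quantity confirm">
      <prompt>Thank you.</prompt>
    </filled>
  </form>
</vxml>
```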

Email Trail:

Issue CR4-1

From Arnaud Vallee

The bargeintype property is defined as follows:

"speech: The prompt will be stopped as soon as speech or DTMF input is
detected. The prompt is stopped irrespective of whether or not the input
matches a grammar. "

Would this mean that even if no dtmf grammar is active and the user enter a
dtmf, the prompt should be stopped?

Resolution: accepted with modifications

Yes. If bargeintype is speech, the prompt is stopped as soon as speech or DTMF input is detected, regardless of whether the input matches a grammar. Whether DTMF grammars are active does not affect this. Setting the inputmodes property to voice should prevent DTMF from barging in on the prompts (although some platforms may have difficulty separating in-band DTMF from speech). The specification will be clarified by adding the words "and irrespective of which grammars are active." to the end of the sentence "The prompt is stopped irrespective of whether or not the input matches a grammar" in Table 38.
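
A minimal sketch of the property settings discussed in this resolution is given below. The form, field and prompt text are hypothetical, and the built-in "boolean" field type is assumed to be supported. With inputmodes restricted to voice, DTMF input should not barge in on the prompt, subject to the platform caveat noted above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="welcome">
    <!-- Stop the prompt on any detected input, matched or not -->
    <property name="bargeintype" value="speech"/>
    <!-- Accept voice input only, so that DTMF should not interrupt the prompt -->
    <property name="inputmodes" value="voice"/>
    <field name="answer" type="boolean">
      <prompt bargein="true">Would you like to hear the long announcement?</prompt>
    </field>
  </form>
</vxml>
```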

Email Trail:

Issue CR5-1

From Guillaume Berche

0- Precise the value of the _dtmf special variable when a grammar element is
specified in a choice element.

As specified in the section "2.2 Menus", paragraph "Choice element": "If a
<grammar> element is specified in <choice>, then the external grammar is
used instead of an automatically generated grammar."

However, in such case it is not clear what value will be assigned in the
_dtmf special variable while executing an enumerate element.

Suggested text modification to "2.2.4 ENUMERATE":

"This specifier may refer to two special variables: _prompt is the choice's
prompt, and _dtmf is the choice's assigned DTMF sequence. **If no DTMF
sequence is assigned to the choice element or if a <grammar> element is
specified in <choice> then the _prompt variable is assigned the ECMAScript
undefined value.**"

Resolution: accepted with modifications

We accept the suggested text but will re-word it more precisely (e.g. '_dtmf' instead of '_prompt').
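
For context, the following sketch shows <enumerate> referring to the two special variables. The menu id, choice targets and wording are hypothetical. Here every choice has a DTMF sequence (assigned automatically because the menu sets dtmf="true"), so _dtmf has a value for each choice; the resolution concerns the case where no sequence is assigned.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <menu id="main" dtmf="true">
    <prompt>
      <!-- _prompt is the choice's prompt; _dtmf is its assigned DTMF sequence -->
      <enumerate>
        For <value expr="_prompt"/>, press <value expr="_dtmf"/>.
      </enumerate>
    </prompt>
    <choice next="sports.vxml">sports</choice>
    <choice next="weather.vxml">weather</choice>
    <choice next="news.vxml">news</choice>
  </menu>
</vxml>
```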

Email Trail:

Issue CR5-2

From Guillaume Berche

1- Precise semantics of id attribute of form and menu

The id attribute is optional according to the schema. However the
specifications do not seem to precise how the interpreter should handle
dialogs without specified id.

Suggested text modification to section "2.1 Forms":

"id         The optional name of the form. If specified, the form can be
referenced within the document or from another document. For instance <form
id="weather">, <goto next="#weather">. **If not specified, an internal name
is generated by the interpreter instead.**"

Suggested text modification to section "2.2 Menus":

"id The optional identifier of the menu. It allows the menu to be the target
of a <goto> or a <submit>. **If not specified, an internal name is generated
by the interpreter instead.**"

Resolution: rejected

If no explicit id is specified, then the developer is not interested in referring to the form or menu element. Whether or not the platform generates an internal name is a vendor-specific issue.

Email Trail:

Issue CR5-3

From Guillaume Berche

2- Precise that <value> should be ignored if the expression resolves to
ECMAScript undefined

There are cases where it is difficult to know whether a variable (such as
special variable as _dtmf) has a non-null value without writing an explicit
if statement. To avoid this, it would be convenient if value elements would
be silently ignored if their expressions resolved into the ECMAScript
undefined value (whereas references to undeclared variables would keep
throwing an error.semantic event).

Suggested text modification to section "4.1.4 <value> Element":

"expr The ECMAScript expression which provides the text to render, or
resolves into a special variable such as _prompt or _dtmf as specified in
section "2.2 Menus" paragraph "Enumerate element". If the expression
resolves into the ECMAScript undefined value, then the value element is
silently ignored. However, if the expression refers to an undeclared
variable, then an error.semantic event is thrown."

Resolution: rejected

As pointed out, the developer can always write explicit code to check the value of variables. The value of providing a 'convenience' interpretation is not clear to us.
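
A minimal sketch of such an explicit check is shown below, assuming a hypothetical dialog-scoped variable named nickname that is declared without an initial value and is therefore undefined. The <value> element is only rendered when the variable holds a value.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="greeting">
    <!-- Declared without expr, so it starts out as ECMAScript undefined -->
    <var name="nickname"/>
    <block>
      <if cond="nickname != undefined">
        <prompt>Hello <value expr="nickname"/>.</prompt>
      <else/>
        <prompt>Hello.</prompt>
      </if>
    </block>
  </form>
</vxml>
```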

Email Trail:

Issue CR5-4

From Guillaume Berche

3- Precise the value of _prompt when an option has no nested CDATA

As specified in "2.3.1.3. Fields Using Option Lists": "The default
assignment is the CDATA content of the <option> element with leading and
trailing white space removed. If this does not exist, then the DTMF sequence
is used instead."

Since the value of the _prompt variable is computed from the CDATA content,
what values is assigned to the _prompt variable when no CDATA content is
available in an option element? If the undefined value is assigned to the
_prompt special variable, would a <value expr="_prompt"> element fail?

Suggested modification: "if no CDATA is available from the <option> or
<choice> element, then the _prompt special variable is assigned the
undefined ECMAScript value."

Resolution: rejected

Having considered various alternatives including your suggestion, the group felt that at this stage in the process it is better to leave the behavior undefined and thereby platform-specific. A later version of VoiceXML may provide a more optimal solution.

Email Trail:

Issue CR5-5

From Guillaume Berche

4- precise the semantics of the value attribute of option elements

Section "2.3.1.3. Fields Using Option Lists" specifies the following: "value
The string to assign to the field's form item variable when a user selects
this option, whether by speech or DTMF. The default assignment is the CDATA
content of the <option> element with leading and trailing white space
removed. If this does not exist, then the DTMF sequence is used instead. "

However, the DTMF sequence is optional according to the schema.
Consequently, it would be useful to precise the behavior if unspecified

Suggested text modification to section "2.3.1.3. Fields Using Option Lists":

"Each <option> element contains PCDATA that is used to generate a speech
grammar. This follows the grammar generation method described for <choice>
in Section 2.2. Attributes may be used to specify a DTMF sequence for each
option and to control the value assigned to the field's form item variable.
Each option should at least define a DTMF sequence through the dtmf
attribute or contain CDATA content specifying the matching speech element,
otherwise an error.badfetch event is thrown."

Resolution: accepted with modifications

We will modify the specification so that, where neither CDATA content nor a DTMF sequence is specified, the default for the value attribute is undefined and the form field item is not filled.
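
The defaulting rules for the value attribute can be illustrated with the sketch below. The field name, option contents and prompts are hypothetical; the comments restate the rules quoted from Section 2.3.1.3 together with this resolution.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="menu">
    <field name="maincourse">
      <prompt>Please select an entree.</prompt>
      <!-- Explicit value: the field is assigned "fish" -->
      <option dtmf="1" value="fish">swordfish</option>
      <!-- No value attribute: defaults to the CDATA content, "roast beef" -->
      <option dtmf="2">roast beef</option>
      <!-- No value and no CDATA: defaults to the DTMF sequence, "3" -->
      <option dtmf="3"/>
      <!-- An option with no value, no CDATA and no dtmf would, under this
           resolution, have an undefined value and not fill the field -->
      <filled>
        <prompt>You selected <value expr="maincourse"/>.</prompt>
      </filled>
    </field>
  </form>
</vxml>
```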

Email Trail:

Issue CR5-6

From Guillaume Berche

5- Precise the format of the _dtmf special variable.

Section "2.2 Menus", paragraph "Enumerate element" states that "specifier
may refer to two special variables: _prompt is the choice's prompt, and
_dtmf is the choice's assigned DTMF sequence." However it does not precise
how the DTMF sequence is formatted (whether there are white space delimiters
that makes the string suitable for direct inclusion within a speech prompt)

Suggested text modification to section "2.2 Menus", paragraph "Enumerate
element":
"_prompt is the choice's prompt, and _dtmf is the choice's assigned DTMF
sequence formatted as a string holding the DTMF keystrokes separated by
white spaces (making it suitable for inclusion within a speech prompt)"

Resolution: accepted with modifications

The specification will be modified so that the format of _dtmf is a normalized representation of the dtmf sequence (i.e. single whitespace between DTMF tokens).

Email Trail:

Issue CR5-7

From Guillaume Berche

6- Precise the semantics of the dtmf attribute of option elements

Suggested modification to section "2.3.1.3. Fields Using Option Lists":

"dtmf    An **optional** DTMF sequence for this option. It is equivalent to
a simple DTMF <grammar> and DTMF properties (Section 6.3.3) apply to
recognition of the sequence. Unlike DTMF grammars, whitespace is optional:
dtmf="123#" is equivalent to dtmf="1 2 3 #". **If unspecified, no DTMF
grammar is associated to this option, meaning that this option can not be
matched using a DTMF**"

Rationale: it would make sense to add an option similar to the menu's dtmf
attribute so that dtmf sequence is automatically generated. Without this
attribute, how would an VXML author prevent the automatic generation of DTMF
grammars that may override other grammars (such as links)?
In addition, we would also need to specify what happens if a specified
option's dtmf attributes overlaps an automatically assigned dtmf. Should
this throw an "error.semantic" event as for choice elements or should we
rather apply the default grammar precedence algorithm to select the matching
element?

Resolution: accepted with modifications

We accept the suggested modification to 2.3.1.3 concerning the description of the dtmf attribute based on an alternative rationale; namely, that this is good clarification independent of the new features you mentioned in your rationale.

Email Trail:

Issue CR5-8

From Guillaume Berche

7- Precise semantics of Clear element.

Section "5.3.3 CLEAR" states that "The <clear> element resets one or more
form items" However, the definition of the namelist attribute adds that
"this [i.e. the namelist] can include variable names other than form items"
Besides, in the case where the namelist includes variable names other than
form items, what is the variable scope in which the variable must be defined
to be cleared?

Since a Clear element is an executable which may be included in a catch
element, which variable scope does it targets? In other words, would the
reset of a non-form item variable target the anonymous, dialog, document or
application-level scope?
[In addition, the Clear element may be invoked outside of the FIA (such as
during the document initialization), in which the notion of active element
is not clear, so relying on the scope of the active element as the scope in
which a variable should be cleared is ambiguous.]

Suggested text modification to Section "5.3.3 CLEAR":
"The <clear> element resets one or more form items, and possibly other
variables which are not form items. For each specified variable name, the
variable is resolved in the closest enclosing scope of the currently active
element as described in section "5.1.3 Referencing Variables". To remove
ambiguity, each variable name in the namelist may be prefixed with a scope
name as described in section "5.1.3 Referencing Variables".

Once a declared variable has been identified as declared in a given scope S,
its value is assigned the ECMAScript undefined value. In addition, if the
variable name corresponds to a form item in scope S, then the form item's
prompt counter and event counters are reset."

Resolution: accepted with modifications

We accept that the clear element should be clarified as your text suggests. However, we will modify the wording so that (a) variable references are resolved relative to the current scope as described in section 5.1.3, and (b) in the case of initialization, variable references are handled the same as for other ECMAScript variables.
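
As a sketch of the clarified behavior, the example below clears two form items and one variable that is not a form item. All names and prompts are hypothetical, and the built-in "digits" and "boolean" field types are assumed to be supported. The names in the namelist are unprefixed, so they are resolved relative to the current scope.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="survey">
    <!-- A dialog-scoped variable that is not a form item -->
    <var name="attempts" expr="0"/>
    <field name="rating" type="digits">
      <prompt>Please rate our service from one to five.</prompt>
    </field>
    <field name="confirm" type="boolean">
      <prompt>You said <value expr="rating"/>. Is that correct?</prompt>
      <filled>
        <if cond="!confirm">
          <!-- Resets both form items and sets attempts to undefined -->
          <clear namelist="rating confirm attempts"/>
        </if>
      </filled>
    </field>
  </form>
</vxml>
```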

Email Trail:

Issue CR5-9

From Guillaume Berche

8- Precise that var name attribute does not support scope prefixes

Suggested text modification to section "5.3.1 VAR":
 "name        The name of the variable that will hold the result. **Unlike
the name attribute of assign element, this attribute should not contain dots
(and in particular a scope prefix). The scope in which the variable is
defined is determined from the position in the document at which the var
element is declared.**"

Resolution: accepted with modifications

We accept the suggestion but will modify the text style for consistency with the rest of the document.

Email Trail:

Issue CR5-10

From Guillaume Berche

9- Precise that the assign's name attribute does support scope prefixes

The scope in which a variable is resolved is currently not clear. The
accepted scope prefix in the name attribute is also not clear.

Suggested text modification to section "5.3.2 ASSIGN"

"name The name of the variable being assigned to. As specified in section
"5.1.2 Variable Scopes", the corresponding variable should have been
previously declared otherwise an error.semantic event is thrown. By default,
the scope in which the variable is resolved is the closest enclosing scope
of the currently active element. To remove ambiguity, the variable name may
be prefixed with a scope name as described in section "5.1.3 Referencing
Variables". Note however that the name must refer to a variable and can not
refer to a property of an ECMAScript object or can not be a complex
ECMAScript
expression."

Resolution: accepted with modifications

We accept the suggested text modification but not the final line beginning "Note however", since it is permissible to assign to the property of an object; the second example in 5.3.2 makes this clear - <assign name="document.mycost" expr="document.mycost+14"/>.
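
The contrast between <var> (issue CR5-9) and <assign> can be sketched as follows. The variable count and the form id are hypothetical; the document.mycost assignment is the one cited from Section 5.3.2.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- var: unprefixed name; document scope follows from its position -->
  <var name="mycost" expr="10"/>
  <form id="pricing">
    <!-- var: unprefixed name; dialog scope follows from its position -->
    <var name="count" expr="0"/>
    <block>
      <!-- assign: unprefixed name, resolved from the current scope outwards -->
      <assign name="count" expr="count + 1"/>
      <!-- assign: a scope prefix is allowed and removes any ambiguity -->
      <assign name="document.mycost" expr="document.mycost+14"/>
    </block>
  </form>
</vxml>
```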

Email Trail:

Issue CR5-11

From Guillaume Berche

10- Precise evaluation order of log attributes versus nested text/value, and
constraints on attributes

Suggested modification to section "5.3.13 LOG":

"label An **optional** string which may be used, for example, to indicate
the purpose of the log.
expr An **optional** ECMAscript expression evaluating to a string.
"

"The <log> element may contain any combination of text (CDATA) and <value>
elements. The generated message consists of the concatenation of the
evaluation of the ECMAscript expression followed in their respective order
by the nested text and the string form of the value of the "expr" attribute
of the <value> elements."

Resolution: accepted with modifications

We accept the clarification of 'optional' but not the last paragraph describing the order of evaluation - the order is already specified as document order.
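
A short sketch of a <log> element that uses both optional attributes together with mixed text and <value> content is given below. The variable names and values are hypothetical; where and how the message is recorded is platform-specific.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="checkout">
    <var name="customer" expr="'Alice'"/>
    <var name="total" expr="42"/>
    <block>
      <!-- label and expr are both optional; the nested content is
           evaluated in document order -->
      <log label="billing" expr="'total=' + total">
        Order confirmed for <value expr="customer"/>
      </log>
    </block>
  </form>
</vxml>
```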

Email Trail:

Issue CR5-12

From Guillaume Berche

11- Precise ordering of anonymous grammar generated for dtmfterm

As specified in section "2.3.6. RECORD": "The <record> element contains a
'dtmfterm' attribute as a developer convenience. A 'dtmfterm' attribute with
the value 'true' is equivalent to the definition of a local DTMF grammar
which matches any DTMF input. "

However, it is legal to have nested grammars in a record element. For
instance, a DTMF grammar that matches only the # key. It is not clear which
grammar would match because the precedence is not described.

Suggested text modification to section "2.3.6. RECORD": "The <record>
element contains a 'dtmfterm' attribute as a developer convenience. A
'dtmfterm' attribute with the value 'true' is equivalent to the definition
of a local DTMF grammar which matches any DTMF input. Any nested grammar
element will have precedence over this anonymous local grammar (even though
usefulness of such nested grammar is not clear)."

Resolution: accepted with modifications

We accept the suggested clarification of the 'dtmfterm' attribute, but reject the suggested priority order when both the attribute and local grammars are specified. That is, we maintain that the dtmfterm attribute has priority over local grammars. Developers who want full control can omit the dtmfterm attribute and write their own local grammar.
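
The two approaches can be sketched as below. This is an illustration under stated assumptions: the form item names, prompts, durations and the inline grammar are hypothetical, and the second item sets dtmfterm="false" explicitly rather than relying on the attribute's default. The first recording ends on any key press; the second relies only on a local DTMF grammar that matches the # key.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="voicemail">
    <!-- Convenience form: any DTMF input terminates the recording -->
    <record name="greeting" beep="true" maxtime="10s" dtmfterm="true">
      <prompt>Record your greeting after the beep. Press any key to finish.</prompt>
    </record>
    <!-- Full control: dtmfterm disabled, with a developer-written local grammar -->
    <record name="message" beep="true" maxtime="30s" dtmfterm="false">
      <grammar mode="dtmf" version="1.0" root="hash">
        <rule id="hash" scope="public"> # </rule>
      </grammar>
      <prompt>Record your message after the beep. Press pound to finish.</prompt>
    </record>
  </form>
</vxml>
```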

Email Trail:

Issue CR5-13

From Guillaume Berche

12- Precise the semantics of the timeout property for the record element

The specs currently state the following "A timeout interval is defined to
begin immediately after prompt playback (including the 'beep' tone if
defined) and its duration is determined by the 'timeout' property. If the
timeout interval is exceeded before recording begins, then a <noinput> event
is thrown. "

However, how the "recording begins" is not clearly defined. I would assume
that when the platform supports speech recognition during recording, the
recording begins as soon as speech is provided by the remote end. However
the specification is not clear on whether in this case the platform should
remove the silence from the end of the first beep prompt up to the first
recognised speech. It is not clear either whether background noise or music
should trigger beginning of recording. For platforms not supporting speech
recognition during recording I believe this timeout property should be
ignored.

Suggested text modification to section "2.3.6. RECORD":

"A timeout interval is defined to begin immediately after prompt playback
(including the 'beep' tone if defined) and its duration is determined by the
'timeout' property. If the timeout interval is exceeded before recording
begins, then a <noinput> event is thrown. When the platform supports
detection of silence, the recording begins as soon as leading silence
(following the 'beep' tone if defined) completes. Note that whether the
recording would include the leading silence is platform specific. For
platforms not supporting silence detection, this property is ignored and no
<noinput> event is ever raised during a recording."

Resolution: accepted with modifications

We believe that when recording begins is clearly defined: in Section 2.3.6, it states:

"A recording begins at the earliest after the playback of any prompts (including the 'beep' tone if defined). As an optimization, a platform may begin recording when the user starts speaking."

i.e. the recording may include initial silence, etc. if the platform does not use the optimization (e.g. voice activity detection). With the optimization, the recording can begin with the user's speech. Whether music or other audio triggers voice activity detection is platform-specific. Note that this behavior applies independent of whether speech recognition is supported (while the recording and recognition processes use the same audio data stream, these processes are independent and therefore their voice activity detection mechanisms may be different).

The timeout interval is clearly defined: "A timeout interval is defined to begin immediately after prompt playback (including the 'beep' tone if defined) and its duration is determined by the 'timeout' property."

The timeout interval has an effect on both recording and recognition (which are logically independent).

For recording, the impact is specified in "If the timeout interval is exceeded before recording begins, then a <noinput> event is thrown." In the case of non-optimized recording, recording always begins after prompt playback, so <noinput> would never be thrown. With optimized recording, however, <noinput> may be thrown if no voice activity is detected before the timeout interval elapses.

For recognition, the situation is more complex. We are modifying the specification (due to implementation report feedback) so that if recognition is supported during recording (this is an optional feature), then only non-local speech grammars are active. If a non-local speech grammar is matched by audio input, then execution is immediately transferred to its enclosing element. This raises the issue of whether a <noinput> or <nomatch> could be thrown by the recognition process. A <noinput> could be generated if the timeout interval has elapsed. A <nomatch> could be generated if the audio triggers recognition but does not match the active grammar. Our belief is that throwing these events from the recognition process during recording is undesirable and not what VoiceXML authors expect. Consequently, we are considering clarifying the specification to make it clear that <noinput> and <nomatch> events are never thrown from the recognition process during recording.

Email Trail:

Issue CR5-14

From Guillaume Berche

13- Precise that maxtime record attribute is mandatory and has no defaults

Suggested text modification to section "2.3.6. RECORD":
"maxtime The maximum duration to record. **This attribute must be specified
as it has no default value. If not specified an error.badfetch event is
thrown.**"

Resolution: rejected

The default value of the maxtime attribute is already specified as platform-dependent (see Table 16).

Email Trail:

Issue CR5-15

From Guillaume Berche

14- Precise that if value is used outside of a prompt element it inherits
default prompt parameters

The prompt element defines that if its attributes are not specified, they
default to values specified by properties. However, for the value element,
the specification do not precise how default values are computed.

Suggested text addition to section "4.1.4 <value> Element":
"The manner in which the value attribute is played is controlled by the
surrounding speech synthesis markup in the case the expression resolves to a
string. In the case the expression resolves to a special variable such as
_prompt, then the prompt attributes are inherited from the enclosing element
of the definition of the referenced element.

If no surrounding prompt element nor SSML tag is available, then the default
attributes of a prompt element (such as bargein, timeout or language) are
applied.

Consequently, the two following constructions are equivalent.
<catch event="noinput">
  <value expr="'please retry'"/>
</catch>

<catch event="noinput">
  <prompt>
      <value expr="'please retry'"/>
  </prompt>
</catch>
"

Resolution: accepted with modifications

We accept that clarification is required but not the proposed modification. We will clarify in 4.1.2 that, where prompt content is specified without a <prompt> element, the prompt attributes are defined as specified in Table 33.

Email Trail:

Issue CR6-1

From Guillaume Berche

1- Precise that buffered non-matching DTMF are discarded when an ASR grammar
matches.

It is unclear in the specifications whether the following document

<form name="form1">
  <field>
     <grammar src="builtin:grammar/boolean"/>
     <grammar src="builtin:dtmf/digits?length=4"/>
  </field>
  <filled>
     <goto next="#form2"/>
  </filled>
</form>

<form name="form2">
  <field>
     <grammar src="builtin:dtmf/digits?length=1"/>
  </field>
  <filled>
     <prompt>thanks for the dtmf</prompt>
  </filled>
  <noinput>
     <prompt>DTMF was discarded</prompt>
  </noinput>
</form>

By pressing the 1 key and speaking "yes" and waiting for the input timeout.
Should the interpreter play the "thanks for the dtmf" prompt or the "DTMF
was discarded" prompt?

Suggested solution: specify that partially buffered data are flushed in case
of grammar match in another mode.

Resolution: accepted with modifications

We will modify the specification to make it clear that this is a platform-specific issue (i.e. platforms may differ in whether or not they discard buffered non-matching DTMF when an ASR grammar matches).

Email Trail:

Issue CR6-2

From Guillaume Berche

2a- Rationale for not accepting local ruleref in inline SRGS grammars?

Can you please provide rationale for not accepting ruleref elements with
pure fragment URLs? Why would this be rejected in grammars provided inline
in VXML documents? What is the reason driving this restriction and forcing
to use remote grammars for any grammar using private rules?

Resolution: accepted with modifications

This is probably a misunderstanding on both sides. In section 3.1.1.4, the paragraph beginning "When referencing an external grammar, the value of src attribute ..." describes which values for the src attribute are permitted and which are not (the last paragraph of this section). It makes no statement about inline grammars. In particular, "Local rule reference: a fragment-only URI is not permitted. (See definition in Section 2.2.1 of [SRGS]). A fragment-only URI value for the src attribute causes an error.semantic event." is intended to indicate that a fragment-only URI value is not permitted for the src attribute of a VoiceXML <grammar> element. The simplest clarification is to start the last paragraph of this section "**And** the following are the forms of rule reference defined by [SRGS] that are not supported in VoiceXML 2.0. ...". For <ruleref>s in inline grammars, it is possible to refer to rules within the same grammar or in an external grammar. What is not possible is to reference rules within a different inline grammar in a VoiceXML document, since the URI then points at a VoiceXML document, not a grammar document. We believe that this is clearly implied by VoiceXML and SRGS (especially with the clarification above) and that a separate clarification is not required.

Email Trail:

Issue CR6-4

From Guillaume Berche

3- Precise that when transitionning to a document (without fragment in the
URI) and the transitionned document has no form, then the interpreter exits

Rationale: it can not be requested that every document have at least a
dialog (because a root application may only define variables or links),
however when transitionning to a document (without specifying a dialog) and
this document has no dialog defined, then the execution stops.

Suggested modification to section "5.3.7 GOTO"

"If the form item, dialog or document to transition to is not valid (i.e.
the form item, dialog or document does not exist), an error.badfetch must be
thrown. Note that for errors which occur during a dialog or document
transition, the scope in which errors are handled is platform specific. For
errors which occur during form item transition, the event is handled in the
dialog scope. If the document to transition has no dialog defined (and no
specific dialog was specified), then the execution stops."

Resolution: rejected

We believe the specification is already precise: a target document without a dialog is not valid, so an error.badfetch is thrown, as already stated in 5.3.7.

Email Trail:

Issue CR6-5

From Guillaume Berche

4- Precise Prompt selection algorithm when the Prompt element appears as
executable content.

It does not seem clear from the examples provided in section "4.1.6 Prompt
Selection" whether the "prompt tappering" mechanism is supposed to be
applied when a prompt element appears as executable content.

For instance in the following case:

<field ...>
<help>
   <prompt count="1"> prompt 1 </prompt>
   <prompt count="3"> prompt 2 </prompt>
   <goto next="#form2"/>
   <prompt count="4"> prompt 3 </prompt>

</help>
</field>

Which prompt should be heard when the prompt counter of the current form
item (the field in this same) is 4? Applying the algorithm described in
section "4.1.6 Prompt Selection" would result in having the "prompt 3"
speech text to be heard, however it would be very confusing from the VXML
author point of view because it would be expected that after the goto
element no more executable content would be executed as specified in
Appendix C in the definition of the "execute" term.

Suggested modification to section "4.1.6 Prompt Selection":

"Each input item, <initial>, and menu has an internal prompt counter that is
reset to one each time the form or menu is entered. Whenever the system uses
a prompt, its associated prompt counter is incremented. This is the
mechanism supporting tapered prompts within form item elements. **When a
prompt element is specified as executable content (e.g. inside a catch or
filled element) then its count element is ignored and all prompts contained
in this element as queued in document order)**"

Resolution: rejected

As stated in 5.3.5 the count attribute on prompts in executable content is meaningless.

Email Trail:

Issue CR6-6

From Guillaume Berche

5- Precise the value of name$.inputmode when a transfer is not interrupted
by user input

Suggested modification to "Table 22: <transfer> Shadow Variables"

"name$.inputmode     The input mode of the terminating command (dtmf or
voice) or **undefined if the transfer was not interrupted by a grammar
match**"

Resolution: accepted

We will apply the suggested modification.

Email Trail:

Issue CR6-7

From Guillaume Berche

6- Correct typo in example of Section "4.1.3 Audio Prompting"

The extension of the file should rather be .vxml to not introduce confusion.
"<goto next="./make_bid.html"/>"

Resolution: accepted

We will correct the typo.

Email Trail:

Issue CR6-8

From Guillaume Berche

7- Precise that alternate audio is recursive:

According to the schema, the following vxml fragment is legal

<prompt>
  <audio src="http://www.dummy.org/main.wav" >
    <audio src="http://www.dummy.org/alternate1.wav" >
        <audio src="http://www.dummy.org/alternate2.wav"/>
    </audio>
  </audio>
</prompt>

Can you please confirm my understanding of the specification: I understand
that if both main.wav and alternate1.wav can not be played, but
alternate2.wav can be played, then alternate2.wav will be played and no
error will be thrown.

Resolution: accepted

Your understanding is correct. No modifications will be made to the text since we believe this is sufficiently clear already.
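
The fallback order confirmed above can be sketched in Python. This is only an illustration: the Audio class, the select_audio function and the playable callback are hypothetical names, and alternate content is simplified to a single nested audio source.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Audio:
    """An <audio> element reduced to its source and one nested alternate."""
    src: str
    alternate: Optional["Audio"] = None


def select_audio(node: Optional[Audio],
                 playable: Callable[[str], bool]) -> Optional[str]:
    """Return the first source that can be played, walking inwards.

    Returns None when no source in the chain is playable; what the
    platform does in that case is outside the scope of this sketch.
    """
    while node is not None:
        if playable(node.src):
            return node.src
        node = node.alternate  # fall back to the nested alternate content
    return None


chain = Audio("main.wav", Audio("alternate1.wav", Audio("alternate2.wav")))
chosen = select_audio(chain, lambda src: src == "alternate2.wav")
```

Here only alternate2.wav is playable, so it is the source selected and no error is modelled.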

Email Trail:

Issue CR6-9

From Guillaume Berche

8- Precise behavior of submit if undeclared/unvalid variables are references
in submit's namelist attributes

The specifications section "5.3.8 SUBMIT" states the following

"The list of variables to submit. By default, all the named input item
variables are submitted. If a namelist is supplied, it may contain
individual variable references which are submitted with the same
qualification used in the namelist. Declared VoiceXML and ECMAScript
variables can be referenced."

It does not specify the expected behavior in case an undeclared variable or
an invalid variable name is referenced in the namelist attribute.

Suggested modification to section "5.3.8 SUBMIT":
"namelist          The list of variables to submit. By default, all the
named input item variables are submitted. If a namelist is supplied, it may
contain individual variable references which are submitted with the same
qualification used in the namelist. Declared VoiceXML and ECMAScript
variables can be referenced. **If an undeclared or invalid variable name is
referenced then an "error.semantic" event is thrown**"

Resolution: accepted with modifications

We will modify the specification to clarify that an error.semantic is thrown when an undeclared variable is referenced, including reference within the namelist of a submit element (as well as exit, return, and subdialog elements).
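
A minimal Python sketch of this rule follows. It assumes a single flat table of declared variables and treats qualified names as plain keys; the SemanticError class and the resolve_namelist function are hypothetical names used only for illustration.

```python
class SemanticError(Exception):
    """Stands in for the error.semantic event."""


def resolve_namelist(namelist: str, declared: dict) -> dict:
    """Collect the variables named in a space-separated namelist.

    Raises SemanticError as soon as an undeclared name is referenced.
    """
    submitted = {}
    for name in namelist.split():
        if name not in declared:
            raise SemanticError(f"undeclared variable: {name}")
        submitted[name] = declared[name]
    return submitted


variables = {"city": "Paris", "date": "2004-01-19"}
ok = resolve_namelist("city date", variables)
```

Calling resolve_namelist("city country", variables) raises SemanticError because country was never declared.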

Email Trail:

Issue CR6-12

From Guillaume Berche

11- Typo in section "2.3.6. RECORD"

The second sentence of the extract below seems incomplete, I don't get the
impact of the timeout interval on having a record variable unfilled.

"If no audio is collected during execution of <record>, then the record
variable remains unfilled (note). This can occur, for example, when DTMF or
speech input is received during prompt playback or the timeout interval (if
the developer wants input during prompt playback to initiate recording, then
prompts should be placed in an immediately preceding <field> with a zero
timeout). "

Resolution: accepted

We will modify the text so that the second sentence reads "This can occur, for example, when DTMF or speech input is received during prompt playback or *before* the timeout interval *expires* ..."

Email Trail:

Issue CR6-13

From Guillaume Berche

12- Typo in "Last Call Disposition of Comments"

The table in section "2. Comments" has an invalid "disposition" content: all
items are marked as accepted whereas this is not the case.

Resolution: accepted

No action since this document will be replaced by a CR disposition of comments document.

Email Trail:

Issue CR7-1

From Max Froumentin

I would like to object that all the examples in VoiceXML2 come with an
XML declaration and a schemaLocation attribute. It makes the language
appear unneccesarily complex. The Hello World example would be much
simpler as:

<vxml xmlns="http://www.w3.org/2001/vxml" version="2.0">
  <form>
    <block>Hello World!</block>
  </form>
</vxml>

schemaLocation bothers me more than by just making the examples hard to 
read. It suggests that the declaration is mandatory (which the XMLSchema
refutes), or even that the use of the schema is.

Resolution: rejected

It is good practise to provide the XML declaration (even though it is not mandatory). Providing the schemaLocation allows documents to be validated automatically by various tools, although as you correctly point out neither the attribute nor schema are mandatory.

Email Trail:

Issue CR8-1

From Matt Porter

this has to do with Guillaume's question...
let me elaborate on an issue with <record> that i dont understand.  Given this dialog...
 
<?xml version="1.0" encoding="UTF-8"?> 
<vxml version="2.0">
<form>
   <record  name="msg" beep="true" maxtime="10s" finalsilence="4000ms" dtmfterm="true" type="audio/x-wav">
       <prompt timeout="5s">Record a message after the beep.</prompt>
       <noinput>
            I didn't hear anything, please try again.
       </noinput>
      </record>
</form>
</vxml>

 
what if the user does not say anything ( no audio is collected because of
silence detection or whatever ), but terminates the recording with a DTMF.  it
seems to me the "termchar" shadow variable should hold the key they pressed,
and the "noinput" event would still be thrown...is this correct?
 
The <record> section seems to need more clarification....

Resolution: rejected with modifications

If dtmfterm is set to true, recording is terminated when any dtmf key is pressed ("Any DTMF keypress matching an active grammar terminates recording") but if no audio has been collected, then the record variable is not filled ("If no audio is collected during execution of <record>, then the record variable remains unfilled.") and consequently no shadow variables are assigned. The FIA then applies as normal without a noinput event being thrown; in your example, the prompt would be read again and another attempt at recording initiated. This is analogous to the situation with complex grammar results that don't assign any values to form input item variables: no noinput event is thrown and the FIA applies as normal. Finally, note that there may be information available in these situations via application.lastresult$ as described in 5.1.5. We will modify the specification to make it clearer that information may be available via application.lastresult$ in these situations.

Email Trail:

Issue CR9-1

From John Voger

Under section 3.1.1.3 Grammar Weight.



The last paragraph contains


..... real speech and textual data on a paricular platform."


Please replace "paricular" with "particular"

Resolution: accepted

We will correct the typo.

Email Trail:

Issue CR10-1

From Philippe Le Hegaret

[ECMASCRIPT] 
        " Standard ECMA-262 ECMAScript Language Specification  ",
        Standard ECMA-262, December 1999.
        See http://www.ecma.ch/ecma1/STAND/ECMA-262.htm
        
should read


[ECMASCRIPT] 
        " Standard ECMA-262 ECMAScript Language Specification  ",
        Standard ECMA-262, December 1999.
        See
        http://www.ecma-international.org/publications/standards/ECMA-262.HTM

Resolution: accepted

We will update the reference.

Email Trail:

Issue CR11-1

From C. M. Sperberg-McQueen

1. Several complex type definitions in vxml.xsd have <choice> model
groups that contain a single particle consisting of a reference to a
group. For example:

   <xsd:complexType name="basic.event.handler" mixed="true">
     <xsd:choice minOccurs="0" maxOccurs="unbounded">
       <xsd:group ref="executable.content" />
     </xsd:choice>
     <xsd:attributeGroup ref="EventHandler.attribs" />
   </xsd:complexType>

Since the particle in the group executable.content is also a <choice>,
this content model becomes

   <xsd:choice minOccurs="0" maxOccurs="unbounded">
     <xsd:choice>
       <xsd:group ref="audio"/>
       <xsd:element ref="assign"/>
       <xsd:element ref="clear"/>
       ... ...
     </xsd:choice>
   </xsd:choice>

The outer <choice> is clearly redundant.  The complex type definition
can be simplified to:

   <xsd:complexType name="basic.event.handler" mixed="true">
     <xsd:group ref="executable.content" minOccurs="0" 
maxOccurs="unbounded" />
     <xsd:attributeGroup ref="EventHandler.attribs" />
   </xsd:complexType>

We think such a simplification makes the schema easier to follow and
we recommend the change.

Resolution: accepted

Change applied.

Email Trail:

Issue CR11-2

From C. M. Sperberg-McQueen

2. Some contents may usefully be constrained more tightly than the
schema now constrains them. For example, the <if> element is declared
as:

   <xsd:element name="if">
     <xsd:complexType mixed="true">
       <xsd:choice minOccurs="0" maxOccurs="unbounded">
         <xsd:group ref="executable.content" />
         <xsd:element ref="elseif" />
         <xsd:element ref="else" />
       </xsd:choice>
       <xsd:attributeGroup ref="If.attribs" />
     </xsd:complexType>
   </xsd:element>

Since there is no order or occurence constraint, instances such as the
following are all valid, which seems too flexible.

   <if>
     ...
     <else/>
     ...
     <else/>
     ...
     <elseif/>
     ...
   </if>

The content can be changed to the following to ensure that all
<elseif> elements occur before <else> and that there is no more than
one <else> element:

   <xsd:element name="if">
     <xsd:complexType mixed="true">
       <xsd:sequence>
         <xsd:group ref="executable.content" minOccurs="0"
                    maxOccurs="unbounded" />
         <xsd:sequence minOccurs="0" maxOccurs="unbounded">
           <xsd:element ref="elseif" />
           <xsd:group ref="executable.content" minOccurs="0"
                      maxOccurs="unbounded" />
         </xsd:sequence>
         <xsd:sequence minOccurs="0" maxOccurs="1">
           <xsd:element ref="else" />
           <xsd:group ref="executable.content" minOccurs="0"
                      maxOccurs="unbounded" />
         </xsd:sequence>
       </xsd:sequence>
       <xsd:attributeGroup ref="If.attribs" />
     </xsd:complexType>
   </xsd:element>

(In passing, we note that on general principles, we believe the
language would be easier to describe and use if the 'elseif' and
'else' elements (and a 'then' element) were not empty elements
followed by appropriate executable content, but non-empty elements
which contained the appropriate executable content.  We recognize
that this may not be a feasible change at this stage in the life of
VoiceXML.)

Resolution: accepted

Change applied. We will look into changing the if-then-else structure in a future version of the language.
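
The ordering constraint expressed by the revised content model can be sketched in Python as follows. The function name is hypothetical, and every child other than elseif and else is assumed to be executable content.

```python
from typing import Iterable


def valid_if_children(children: Iterable[str]) -> bool:
    """Check the child element order of an <if> against the tighter model:
    executable content, then any number of <elseif> sections, then at most
    one <else> section.
    """
    seen_else = False
    for tag in children:
        if tag == "else":
            if seen_else:
                return False  # a second <else> is not allowed
            seen_else = True
        elif tag == "elseif" and seen_else:
            return False      # <elseif> may not follow <else>
    return True
```

The instance quoted in the comment, with two else elements followed by an elseif, is rejected by this check, while the usual elseif-then-else order is accepted.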

Email Trail:

Issue CR11-3

From C. M. Sperberg-McQueen

3. The element "output" in vxml.xsd is declared as abstract, and not
used or referenced anywhere else. The declaration may be removed.

Resolution: accepted

Element removed.

Email Trail:

Issue CR11-4

From C. M. Sperberg-McQueen

4. The VariableName.datatype in vxml-datatypes.xsd has a pattern:

   <xsd:pattern value="['$'\c]+" />

The character '$' in the range doesn't need the quotation mark, and as
written the value will accept single quotation marks where a dollar
sign or \c is expected.  We suspect this is not intended.

Resolution: accepted

Change applied.
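
The effect of the stray quotation marks can be reproduced with Python's re module. Python has no equivalent of the XML Schema \c escape, so the NAME_CHARS class below is only a rough stand-in, and the "intended" pattern is an assumption about the shape of the fix rather than the text of the revised schema.

```python
import re

# Rough stand-in for XML Schema's \c (name characters); an approximation.
NAME_CHARS = r"\w.:\-"

# Pattern as originally written: the quotes are members of the character class.
as_written = re.compile(rf"['$'{NAME_CHARS}]+")

# Assumed intent: only the dollar sign and name characters are allowed.
intended = re.compile(rf"[${NAME_CHARS}]+")

shadow_ok_written = as_written.fullmatch("name$.inputmode") is not None
shadow_ok_intended = intended.fullmatch("name$.inputmode") is not None
quote_ok_written = as_written.fullmatch("it's") is not None
quote_ok_intended = intended.fullmatch("it's") is not None
```

Both patterns accept a shadow variable name such as name$.inputmode, but only the pattern as written also accepts a value containing a single quotation mark.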

Email Trail:

Issue CR11-5

From C. M. Sperberg-McQueen

5. The ContentType.datatype in vxml-datatypes.xsd is defined as a list
of string. Since string may contain whitespaces, the definition should
perhaps be changed to a list of token; this is less subject to
misunderstanding by readers of the schema.

Resolution: accepted

Change applied.

Email Trail:

Issue CR11-6

From C. M. Sperberg-McQueen

6. According to the comments in the annotations,
VariableNames.datatype, RestrictedVariableNames.datatype, and
EventNames.datatype are lists of atomic VariableName.datatype,
RestrictedVariableName.datatype and EventNames.datatype
respectively. We believe they should be defined as such rather than as
NMTOKENS or other types:

   <xsd:simpleType name="RestrictedVariableNames.datatype">
     <xsd:annotation>
       <xsd:documentation>space separated list of restricted
         variable names </xsd:documentation>
     </xsd:annotation>
     <xsd:list itemType="RestrictedVariableName.datatype"/>
   </xsd:simpleType>

   <xsd:simpleType name="VariableNames.datatype">
     <xsd:annotation>
       <xsd:documentation>space separated list of variable names
         including shadow variables</xsd:documentation>
     </xsd:annotation>
     <xsd:list itemType="VariableName.datatype"/>
   </xsd:simpleType>

   <xsd:simpleType name="EventNames.datatype">
     <xsd:annotation>
       <xsd:documentation>space separated list of
         EventName.datatype</xsd:documentation>
     </xsd:annotation>
     <xsd:list itemType="EventName.datatype"/>
   </xsd:simpleType>

Resolution: accepted

Change applied.

Email Trail:

Issue CR11-7

From C. M. Sperberg-McQueen

7. Some suggestions for simple type Repeat-prob.datatype in
grammar-core.xsd:

a. The base type might better be made decimal instead of float. It
should be noted that decimal is not a subtype of float and their
mappings from the lexical space to the value space are different. For
example, '1.1' may be rounded to some float value different from
exactly 1.1. Such behavior is not expected in decimal.

b. The maxInclusive value is 1.0, while the patterns allow any
positive values less than 10. They should be made consistent.

c. The pattern ([0-9]+)? should probably be replaced with the
equivalent pattern [0-9]*.

Resolution: accepted

Changes applied.
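
Points (a) and (c) of the comment can be checked with a short Python sketch. The as_float32 helper is a hypothetical name; it rounds a lexical value to single precision via the struct module, on the assumption that this mirrors the 32-bit value space of the schema float type.

```python
import re
import struct
from decimal import Decimal


def as_float32(lexical: str) -> float:
    """Round a lexical value to the nearest single-precision float."""
    return struct.unpack("f", struct.pack("f", float(lexical)))[0]


# (a) A float cannot hold 1.1 exactly, whereas a decimal can.
float_is_exact = Decimal(as_float32("1.1")) == Decimal("1.1")
decimal_is_exact = Decimal("1.1") * 10 == Decimal("11")

# (c) The two patterns accept exactly the same strings.
samples = ["", "0", "123", "1a", "a1"]
patterns_agree = all(
    bool(re.fullmatch(r"([0-9]+)?", s)) == bool(re.fullmatch(r"[0-9]*", s))
    for s in samples
)
```

The sample strings are arbitrary; they only spot-check the equivalence and do not prove it.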

Email Trail:

Issue CR11-8

From C. M. Sperberg-McQueen

8. The commented-out pattern constraint in
RestrictedVariableName.datatype in vxml.xsd needs to be removed or
fixed.

Resolution: accepted

Change applied.

Email Trail:

Issue CR12-1

From Guillaume Berche

1- precise behavior when only activated grammars are disabled by "inputmodes"
property

In the following example, what is the expected behavior? Should an
error.semantic be thrown as would if no grammar was activated as described in
section "3.1.4 Activation of Grammars"? Should the grammars considered rather
as activated but would not match as described in section "6.3.6 Miscellaneous
Properties" (inputmodes property) ", and thus lead to a nomatch event to be
thrown?


Section "3.1.4 Activation of Grammars" states that "If no grammars are active
when an input is expected, the platform must throw an error.semantic event".

Section "6.3.6 Miscellaneous Properties" states that "For instance, voice-only
grammars may be active when the inputmode is restricted to DTMF. Those
grammars would not be matched, however, because the voice input modality is
not active. "

<menu>
  <prompt>
    Choose wind speed and after temperature then finaly ask for leave choice test.
  </prompt>
  <choice next="#exacte_rain"> rain humidity </choice>
  <choice next="#approx_wind"> wind speed </choice>
  <choice next="#approx_weat">temperature celcius</choice>
  <choice next="#exacte_leave">Leave choice test </choice>
</menu>

Suggested modification to Section "6.3.6 Miscellaneous Properties" (inputmodes
definition) "[..] For instance, voice-only grammars may be active when the
inputmode is restricted to DTMF. Those grammars would not be matched, however,
because the voice input modality is not active. If among all grammars active
none can be matched because their associated input modality is not enabled,
then a nomatch event is thrown."

Resolution: rejected

Your question is not very clear, but given a menu with active speech grammars and no user input, a noinput event would be thrown. This also applies if the input mode is set to dtmf only; an error.semantic event would not be thrown, since the statement in 3.1.4 only applies when there are no active grammars, and there are active grammars in this example, even though their input mode is disabled. In essence, grammar activation is separate from input mode activation.
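
The separation between grammar activation and input mode activation can be sketched in Python. The function and its return labels are hypothetical, and the sketch assumes that input arriving in a disabled modality is simply not perceived, so it is treated the same as no input.

```python
from typing import Optional, Set


def input_outcome(active_grammar_modes: Set[str],
                  inputmodes: Set[str],
                  user_input_mode: Optional[str]) -> str:
    """Model the outcome of one input collection phase.

    active_grammar_modes -- modes ("voice", "dtmf") of the active grammars
    inputmodes           -- modalities enabled by the inputmodes property
    user_input_mode      -- modality of the user's input, or None if silent
    """
    if not active_grammar_modes:
        # Section 3.1.4: no grammars are active when input is expected.
        return "error.semantic"
    if user_input_mode is None or user_input_mode not in inputmodes:
        # Grammars are active, but no usable input was received.
        return "noinput"
    # Input arrived in an enabled modality; match or nomatch is decided later.
    return "recognize"
```

For the menu in the comment, the voice grammars stay active when inputmodes is restricted to dtmf, so the first branch is never taken and a silent caller gets noinput.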

Email Trail:

Issue CR12-2

From Guillaume Berche

2- out-of-date fetching algorithm: maxage defaults to property value

Section "6.1.2 Caching" specifies the following:

"[...]
If a maxage value is provided,
[...]
Otherwise,
If the resource has expired,
Perform maxstale check.
Otherwise, use the cached copy."

I understand that the predicate "If a maxage value is provided" is always
true, as there are default values for the different maxage properties
(audiomaxage, documentmaxage, grammarmaxage, objectmaxage, scriptmaxage...) as
specified in section "6.3.5 Fetching Properties"

Suggested modification to section "6.1.2 Caching": remove the "If a maxage
value is provided, " part and the corresponding "otherwise" statement.

Resolution: accepted with modifications

We will clarify that maxage and maxstale properties are allowed to have no default value whatsoever. If the value is not provided by the author, and the platform does not provide a default value, then the value is undefined and the 'Otherwise' clause of the algorithm applies. All other properties must provide a default value (either as given by the specification or by the platform).
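
A Python sketch of the caching decision with optional maxage and maxstale values is given below. The branches under "If a maxage value is provided" are elided in the quotation above, so their form here is an assumption, as are the function name, the parameter names and the use of seconds as the unit.

```python
from typing import Optional


def use_cached_copy(age: float,
                    expired_for: float,
                    maxage: Optional[float] = None,
                    maxstale: Optional[float] = None) -> bool:
    """Decide whether a resource already in the cache may be used.

    age         -- age of the cached copy in seconds
    expired_for -- seconds past its expiration time (0 or less: not expired)
    maxage      -- None when neither author nor platform provides a value
    maxstale    -- None when neither author nor platform provides a value
    """
    expired = expired_for > 0

    def maxstale_check() -> bool:
        # Without a maxstale value, an expired copy is never used.
        return maxstale is not None and expired_for <= maxstale

    if maxage is not None:
        if age > maxage:
            return False  # too old: fetch from the server instead
        return maxstale_check() if expired else True
    # The 'Otherwise' clause: no maxage value is defined.
    return maxstale_check() if expired else True
```

With both values undefined, the decision depends only on whether the cached copy has expired, which is the case the resolution says the 'Otherwise' clause covers.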

Email Trail:

Issue CR12-3

From Guillaume Berche

3- schema forbids empty catch event name

Section "5.2.4 Catch Element Selection " specifies the following "The name of
a thrown event matches the catch element event name if it is an exact match, a
prefix match or the catch event name is not specified. A prefix match occurs
when the catch element event attribute is a token prefix of the name of the
event being thrown, where the dot is the token separator, all trailing dots
are removed, and the empty string matches everything. "

However, the schema forbids an empty string event specification as illustrated below:

    <xsd:element name="catch">
        <xsd:complexType>
            <xsd:complexContent mixed="true">
                <xsd:extension base="basic.event.handler">
                   <xsd:attribute name="event" type="EventNames.datatype"/>
                </xsd:extension>
            </xsd:complexContent>
        </xsd:complexType>
    </xsd:element>


    <xsd:simpleType name="EventNames.datatype">
        <xsd:annotation>
            <xsd:documentation>space separated list of EventName.datatype</xsd:documentation>
        </xsd:annotation>
        <xsd:restriction base="xsd:NMTOKENS"/>
    </xsd:simpleType>


The schema specifications
(http://www.w3.org/TR/2001/REC-xmlschema-2-20010502/#NMTOKENS) defines
NMTOKENS as the following:

"3.3.5 NMTOKENS
[Definition:]   NMTOKENS represents the NMTOKENS attribute type from [XML
1.0 (Second Edition)]. The value space of NMTOKENS is the set of finite, non-zero-length sequences of NMTOKENs."

Resolution: accepted with modifications

We will modify the description in 5.2.4 to make it clearer that event names cannot be empty strings (i.e. event="" is illegal) but can be unspecified (i.e. <catch> ....) and can prefix match when dots are removed (e.g. event="." will match any event).
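
The matching rule described above can be sketched in Python for a single event name. The function name is hypothetical, None stands for an unspecified event attribute, and the space-separated list form of the attribute is not handled.

```python
from typing import Optional


def catch_matches(catch_event: Optional[str], thrown: str) -> bool:
    """Return True if a catch element with this event name catches `thrown`."""
    if catch_event is None:
        return True  # event attribute not specified: catches everything
    if catch_event == "":
        raise ValueError('event="" is not a legal event name')
    stripped = catch_event.rstrip(".")  # trailing dots are removed
    if stripped == "":
        return True  # e.g. event=".": the empty prefix matches any event
    catch_tokens = stripped.split(".")
    thrown_tokens = thrown.split(".")
    # Exact match or token prefix match, with the dot as separator.
    return thrown_tokens[:len(catch_tokens)] == catch_tokens
```

Under this sketch "error" catches "error.badfetch", while "error.bad" does not, because prefixes are compared token by token rather than character by character.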

Email Trail:

Issue CR13-1

From Greg FitzPatrick

4.1 Prompts  (from Version 2.0 - 20 February 2003)


does not mention the version attribute 

as described by the DTD and schema

<!ATTLIST prompt
    bargein %boolean; #IMPLIED
    bargeintype %bargeintype; #IMPLIED
    cond %expression; #IMPLIED
    count %integer; #IMPLIED
    xml:lang NMTOKEN #IMPLIED
    timeout %duration; #IMPLIED
    xml:base %uri; #IMPLIED
    version CDATA #FIXED "1.0"
>

Resolution: rejected

Since the version is fixed, we see no reason to discuss it further in the text.

Email Trail:

Issue CR14-1

From Guillaume Berche

1- Precise that timeout attribute of the <prompt/> element only applies if
the prompt element is not empty

The <prompt> element is designed to queue prompt element for play. As a side
effect it also set the timeout for the next input collection. However, the
specifications do not describe the expected behavior if the prompt element
is empty: should the side-effect still apply? This would make little sense:
this would be a synonym for a "set next timeout" command without queueing
any prompt.

Suggested addition to section "4.1 Prompts":

"Note that an empty prompt such as "<prompt [...]/>" will be silently
ignored and in particular would not set the timeout of the next input
collection phase"


Dependency: IR testsuite case #539 (389/389.vxml)

Resolution: rejected

The timeout property applies as normal even if there is no content in the prompt.

Email Trail:

Issue CR14-2

From Guillaume Berche

2- Precise behavior if "cond" attribute of form item does not resolve into
an EcmaScript boolean value

The VXML specification state the following in section "2.1.3 Form Item
Variables and Conditions":

"cond   An expression to evaluate in conjunction with the test of the form
item variable. If absent, this defaults to true, or in the case of
<initial>, a test to see if any input item variable has been filled in."

However, the specifications do not detail the expected behavior if the
expression does not resolve to a boolean.

Suggested modification to "2.1.3 Form Item Variables and Conditions"
"cond   An expression to evaluate in conjunction with the test of the form
item variable. If absent, this defaults to true, or in the case of
<initial>, a test to see if any input item variable has been filled in. If
the evaluation of the expression results into an error or does not resolve
into an ECMAScript boolean value, then an error.semantic event is thrown."

Resolution: rejected

In the specific description of cond attributes it states that it is "An expression that must evaluate to true after conversion to boolean in order for the form item to be visited". Boolean conversion of an ECMAScript expression always returns either true or false.
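
To show why the conversion always yields true or false, here is a Python model of ECMAScript boolean conversion. The mapping of Python values to ECMAScript values (None for undefined and null, other Python objects for ECMAScript objects) is an assumption made for this sketch.

```python
import math


def to_boolean(value) -> bool:
    """Approximate ECMAScript ToBoolean for a Python stand-in value."""
    if value is None:
        return False  # undefined and null convert to false
    if isinstance(value, bool):
        return value
    if isinstance(value, (int, float)):
        # Zero and NaN convert to false, every other number to true.
        return not (value == 0 or math.isnan(value))
    if isinstance(value, str):
        return len(value) > 0  # only the empty string converts to false
    return True  # every object converts to true, even an empty array
```

So a cond expression that evaluates to the string "false" still converts to true, and no error.semantic is needed to cover non-boolean results.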

Email Trail:

Issue CR15-1

From Ufuk Kayserilioglu

We are trying to implement the <record> tag in our Voice Browser in a
comformant way; however, we cannot understand what, clearly, are the
requirements from a browser for this tag. My points can be summed up as
follows:

I) The main confusion arises form the behaviour of bargein="true"
prompts in <record>. According to Fig 7 in section 2.3.6 (lower left
corner) bargein controls apply to audio queued within <record>. On the
other hand, a few lines below, it is stated:

"A /recording begins/ at the earliest after the playback of any prompts
(including the 'beep' tone if defined). As an optimization, a platform
may begin recording when the user starts speaking."

Now, if recording does not begin DURING the prompt playback, then how
can those prompts be barged-in? Or, should we understand that if the
user barges-in with voice during prompt playback THEN recording should
be started? A clarification of how <record> and audio queued within
<record> with barge-in interacts, in our opinion, is badly needed.

II) The second comment that baffles us in the spec is:

"If no audio is collected during execution of <record>, then the record
variable remains unfilled (note
<http://www.w3.org/TR/voicexml20/#unfilled_record>). This can occur, for
example, when DTMF or speech input is received during prompt playback or
the timeout interval (if the developer wants input during prompt
playback to initiate recording, then prompts should be placed in an
immediately preceding <field> with a zero timeout)." (Section 2.3.6)

This comment is weird in two ways:

  1) How can record variable be unfilled "when DTMF or speech input is
received during ... the timeout interval"? This seems to be the primary
method of filling a record variable.

  2) We cannot grasp, in any way, how it would be possible to achieve
what the spec author has stated within the parentheses. If there is a
preceding <field> with zero timeout then:
    i) if the user starts speaking while the prompts in the <field> are
playing then the input goes to the processing of the field and will be
matched to whatever grammar is specified for it, or will throw a "nomatch",
    ii) else if the user waits for the prompts to finish, then a
"noinput" event will be thrown.
  In neither case, will the input be going into the <record> tag that
succeeds the <field> tag. If the spec is trying to say something else
then it should be clearly explained.

Resolution: rejected with modifications

I) Prompts can be barged in on if active DTMF grammars are defined (active speech grammars too, but the ability to combine recognition and recording may be removed from the specification due to a lack of implementation support). II.1) This can occur with DTMF input when recording is triggered by voice activity detection (i.e. as a platform optimization, instead of recording starting immediately after prompt playback, recording only begins when voice activity is detected). II.2) We agree this is confusing (it was intended to cover another use case), so we will remove the text in parentheses: "(if the developer wants input during prompt playback to initiate recording, then prompts should be placed in an immediately preceding <field> with a zero timeout)".

Email Trail:

Issue CR16-1

From Mark Clark

In section 4.1.5 you make the following statement:

"In the case where several prompts are queued, the bargein attribute of each
prompt is honored during the period of time in which that prompt is playing"

I am concerned about the scenario where a barge in *true* prompt is followed
by a barge in *false* prompt while waiting for speech input. I have not yet
encountered a speech recognition engine that allows recognition to ignore
input once recognition waiting has begun.

It would seem reasonable to me that for Speech Recognition, if a barge in
*true* is followed by a barge in *false* prompt, that the *false* (and any
subsequent false settings) would be ignored by the speech recognition until
the next transition state.

I see no problem for the reverse condition. If the first prompt is barge in
*false*, then just play the prompt without starting recognition. Only when a
barge in *true* prompt is encountered is the recognition waiting started.

Resolution: rejected

It is possible to implement *true* to *false* bargein by re-starting recognition. We realize this may not be the perfect solution, but we are reluctant to change it at this stage in the standards process. A future version of the language may provide a better solution.
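One possible reading of "re-starting recognition" is sketched below in ECMAScript. It is an assumption about how a platform might sequence its actions, not a description of any actual implementation: the function planRecognition, the prompt objects, and the action strings are all hypothetical names introduced for this example.

```javascript
// Hypothetical planner. "prompts" is an array of { name, bargein } objects
// in queue order. The result is the ordered list of actions a platform
// might take so that each prompt's bargein setting is honored while it plays.
function planRecognition(prompts) {
  var actions = [];
  var listening = false;
  for (var i = 0; i < prompts.length; i++) {
    var prompt = prompts[i];
    if (prompt.bargein && !listening) {
      // A bargein="true" prompt is about to play: recognition must be active.
      actions.push("start-recognition");
      listening = true;
    } else if (!prompt.bargein && listening) {
      // A bargein="false" prompt follows a bargein="true" one:
      // stop recognition so that input cannot interrupt this prompt.
      actions.push("stop-recognition");
      listening = false;
    }
    actions.push("play:" + prompt.name);
  }
  if (!listening) {
    // Re-start recognition once the queue has finished playing.
    actions.push("start-recognition");
  }
  actions.push("wait-for-input");
  return actions;
}

var trueThenFalse = planRecognition([
  { name: "a", bargein: true },
  { name: "b", bargein: false }
]);
var falseThenTrue = planRecognition([
  { name: "a", bargein: false },
  { name: "b", bargein: true }
]);
```

In this sketch the true-to-false case costs one extra stop and start of recognition, while the false-to-true case needs no restart, which matches the commenter's observation about the reverse condition.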

Email Trail:

Issue CR17-1

From Robert Barkan

In all revs of the VXML 2.0 spec, Appendix J (Changes from VoiceXML 1.0),
"Modified Elements" section, it says:
    added "error.unsupported.language" pre-defined error type (5.3.6) 

However, the reference to section 5.3.6 points to "REPROMPT", which doesn't
have this error listed, and I don't understand any scenarios where REPROMPT
could throw this event.

We are working on a project porting a product from VXML 1.0 to 2.0, and if
this change actually does impact the REPROMPT element, we need to understand
it better.

Alternately, is it possible that this is a typo in the spec, and that instead
of "5.3.6", it should really refer to section "5.2.6" - "Event Types" which
would make complete sense?

Resolution: accepted

It is a typo and will be corrected to 5.2.6.

Email Trail:

2.2 Technical Errors

Issue CR2-1

From Arnaud Vallee

I need some clarification on this point:

2.3.7.1 Blind Transfer

With a blind transfer, an attempt is made to connect the original caller with
the callee. Any prompts preceeding the <transfer>, as well as prompts
within the <transfer>, are queued and played before the transfer attempt
begins; bargein properties apply as normal.

As the transfer is modal, a bargein can happen only if we define a grammar
under transfer.  But what is the consequence of matching the grammar with a
recognition result while the prompts are played?  What will be the value of the
transfer item variable?
Analysis:
[Teemu Tingander]

As you said, transfer is always modal, so the grammars that are inside the
<transfer> element are field item grammars and as such they should fill
the field item specified by the name attribute. But because this is a
transfer and the specification says that a match in a grammar of the transfer
should terminate the transfer, my opinion is that the field should be filled
with 'near_end_disconnect' and the shadow variables set as they should be:
f$.duration=0.0,f$.utterance=<what-was-recognized>,f$.inputmode=inputmode..

You have a point here that the specification really does distinguish between
the cases

    The possible outcomes for a bridge transfer before the connection to
the callee is established are:
and
    The possible outcomes for a bridge transfer after the connection to
the callee is established are:

And it is not clearly said what should be done if bargein happens. This case
should be defined in the first one of those cases. 

This same issue arises with blind as well as bridge transfer, and I used
'near_end_disconnect' to indicate that the caller has requested to cancel or
disconnect the call. 

As for the tagging of those grammars, if someone really finds some
reason for that, could they explain it in more depth?

[Ken Rehor]
Your summary is correct. The result should be 'near_end_disconnect' if a
caller cancels a transfer by barging in on a prompt, for both blind and bridge
transfers. This is because prompts are queued and played to completion before
the call transfer begins in either case.

The shadow variables would be filled as you describe.

This will be clarified in a future revision of the specification.

Resolution: accepted

Following the thread responses by Teemu Tingander and Ken Rehor, the specification will be modified to indicate that the transfer item variable will have the value 'near_end_disconnect' if a caller cancels a transfer by barging in on a prompt, for both blind and bridge transfers, and that the shadow variables will be filled as described above.
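The values agreed in the thread can be summarized in a small ECMAScript sketch. The function name bargeinCancelledTransfer and the shape of the returned object are assumptions made for illustration; only the value 'near_end_disconnect' and the three shadow variable names (duration, utterance, inputmode) come from the discussion above.

```javascript
// Hypothetical helper: models the state of a transfer form item after the
// caller barges in on a prompt and so cancels the transfer before it starts.
function bargeinCancelledTransfer(utterance, inputmode) {
  return {
    // Value of the transfer item variable, for both blind and bridge transfers.
    value: "near_end_disconnect",
    // Shadow variables as described in the thread.
    shadow: {
      duration: 0,          // no call was connected, so the duration is zero
      utterance: utterance, // what was recognized
      inputmode: inputmode  // mode of the input, for example "voice" or "dtmf"
    }
  };
}

// Example input; the utterance "cancel" is an assumed value for illustration.
var result = bargeinCancelledTransfer("cancel", "voice");
```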

Email Trail:

Issue CR6-10

From Guillaume Berche

9- Inconsistent variable scope description:

In section "5.1.2 Variable Scopes", the dialog variable scope is defined as:

"dialog     Each dialog (<form> or <menu>) has a dialog scope that exists
while the user is visiting that dialog, and which is visible to the elements
of that dialog. Dialog variables are declared by <var> and <script> child
elements of <form> and by the various form item elements.  "

However, a block element is also a form item; as such, variables defined in
it are part of the dialog scope.

Then the anonymous variable scope is defined as:

"(anonymous)  Each <block>, <filled>, and <catch> element defines a new
anonymous scope to contain variables declared in that element."

This definition is in contradiction with the first definition which
specified that the block element had its variables assigned into the dialog
scope.


Correction suggestion to section "5.1.2 Variable Scopes":
"(anonymous)  Each <filled>, and <catch> element defines a new anonymous
scope to contain variables declared in that element." (note block was
removed from the description to make it consistent with the definition of
the dialog scope)

Resolution: accepted with modifications

We reject the suggested correction, but accept that a clarification is required in the definition of dialog scope, namely, that form element item names are being referred to.

Email Trail:

Issue CR6-11

From Guillaume Berche

10- Catch only applies to input items, not to control items

Section "5.2.2 Catch" states that "The catch element associates a catch with
a document, dialog, or form item." However, the schema for block is the
following:

"    <xsd:element name="block">
        <xsd:complexType mixed="true">
            <xsd:choice minOccurs="0" maxOccurs="unbounded">
                <xsd:group ref="executable.content"/>
            </xsd:choice>
            <xsd:attributeGroup ref="Form-item.attribs"/>
        </xsd:complexType>
    </xsd:element>"

Suggested correction to Section "5.2.2 Catch":
"The catch element associates a catch with a document, dialog, or a form
item except for blocks."

Resolution: accepted

We will apply the suggested correction.

Email Trail:

2.3 Requests for Change to Existing Features

Issue CR5-16

From Guillaume Berche

15- Inconsistent behavior of option vs choice.

- Choice can have nested grammars that override the default grammar whereas
options can not
- Choice can have nested audio as alternate prompts whereas options can not
- Menus can have a dtmf boolean flag to turn on automatic dtmf grammar
generation whereas options within fields can not.

Suggested modification: upgrade options so that they are equivalent to
choice and only differ in the treatment of a match (which in the case of
options does not trigger a transition)

Resolution: rejected

Making them consistent at this stage in the specification is problematic. However, we will consider this issue for a future version of VoiceXML.

Email Trail:

Issue CR6-3

From Guillaume Berche

2b- Schema imposes that grammar rule roots and [private] rule ids are unique
among grammar elements on a same VXML document.

The VXML schema imposes the following constraint to the root attribute of
the grammar element:

    <xsd:simpleType name="Root.datatype">
        <xsd:annotation>
            <xsd:documentation>does not expression the constraint that NULL
VOID GARBAGE
are illegal as rule name</xsd:documentation>
        </xsd:annotation>
        <xsd:restriction base="xsd:IDREF">
            <xsd:pattern value="[^.:-]+"/>
        </xsd:restriction>
    </xsd:simpleType>

I understand this implies that it is illegal to have two different grammars
referring to distinct root rules with the same name.


In addition, the VXML schema imposes the following constraint to the id
attribute of the rule element:

    <xsd:simpleType name="Id.datatype">
        <xsd:annotation>
            <xsd:documentation>
does not expression the constraint that NULL VOID GARBAGE are illegal as
rule name
</xsd:documentation>
        </xsd:annotation>
        <xsd:restriction base="xsd:ID">
            <xsd:pattern value="[^.:-]+"/>
        </xsd:restriction>
    </xsd:simpleType>

I understand this implies that it is illegal in a same VXML document to have
two different grammars with private rules that have the same id. To me this
defeats the purpose of SRGS private rules (even if referencing to one inline
private is currently forbidden in VXML as noted in remark #2a)


Suggested modification: investigate modification of the VXML schema to waive
the restrictions described above.

Resolution: rejected

This is a common problem with using ids on elements from multiple namespaces within the same document. The W3C Schema working group are aware of the problem and we may be able to provide a better solution in future versions of VoiceXML.

Email Trail:

Issue CR18-1

From Mark Clark

Is there really no way to specify a URI for an external SSML file in VXML 2.0?
I am looking for an "src=" attribute of the <prompt> element that could
specify resources whose mime types are either "application/ssml+xml" or
"text/plain". This would be analogous to the "src=" attribute of the
<grammar> element that takes a URI specifying a resource whose mime type is
"application/srgs+xml" or "application/srgs". Currently it appears that all
Speech markup must be in line.

Am I missing something?

Resolution: rejected

The specification doesn't provide a mechanism to reference external SSML documents. This will be considered for a future version of the language.

Email Trail:

2.4 Requests for New Feature

Issue CR19-1

From Pavel Cenek

1. What did you say?
--------------------
In a real dialog in a noisy environment or when a user is not
concentrated, it can happen that a user does not understand properly a
system prompt and wants the system to repeat it. The last prompt should be
repeated without increasing the prompt counter in order to repeat really
the same prompt. It is not possible to do it in the current version of
VoiceXML.

I suggest the following solution: Add an attribute to the <reprompt> tag,
which allows prompts to be repeated without increasing the prompt counter.

Resolution: rejected

This will be considered for a future version of the language.

Email Trail:

Issue CR19-2

From Pavel Cenek

2. detection and handling of multiple fills of one slot
-------------------------------------------------------
VoiceXML provides no means for detection and handling the situation when a
slot value is re-specified.

In real conversation it can happen that a participant specifies a piece of
information twice with different value. The normal reaction is that the
other participant detects this situation and asks the first one for a
clarification. VoiceXML has no means for doing this.

I suggest the following solution: Define a standard event, e.g.
slotredefinition.slotname that would be thrown in such a case and the old
value would be contained in the _message variable in the <catch> tag's
anonymous scope.

Resolution: rejected

This can be done within the current specification by storing the values and, when new input is received, comparing the stored values with the latest values. A future version of the language may provide a more flexible approach along the lines you suggest.
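The store-and-compare approach could look roughly like the following ECMAScript sketch, for example inside a <script> element. SlotTracker, its update method, and the slot name "city" are hypothetical names chosen for this illustration; they are not defined by VoiceXML.

```javascript
// Hypothetical helper: remembers the last value seen for each slot and
// reports when a new input re-specifies a slot with a different value.
function SlotTracker() {
  this.stored = {};
}

SlotTracker.prototype.update = function (slotName, newValue) {
  var hadValue = Object.prototype.hasOwnProperty.call(this.stored, slotName);
  var oldValue = hadValue ? this.stored[slotName] : undefined;
  // Always keep the latest value for the next comparison.
  this.stored[slotName] = newValue;
  return {
    redefined: hadValue && oldValue !== newValue, // true only on a changed value
    oldValue: oldValue,
    newValue: newValue
  };
};

var tracker = new SlotTracker();
var first = tracker.update("city", "Boston");  // first fill of the slot
var second = tracker.update("city", "Austin"); // same slot, different value
var third = tracker.update("city", "Austin");  // same slot, same value again
```

An application could test the redefined flag after each input and, when it is set, prompt the caller to confirm which of oldValue and newValue was intended.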

Email Trail: