Table of contents
1. Rationale
2. Definitions
3. The data injection gateway
3.1 Calling an XML document
3.2 Functional limitation of the gateway
4. Translating XML identifiers
4.1 The DIML variable namespace
4.1.1 DIML namespace rules
4.1.2 The standard scoping of DIML variables
4.1.3 Translation constraints
4.2 Constructing the DIML translated identifier
4.2.1 Identifying the element
4.2.2 Source document identification
4.3 General syntax for XML derived identifiers
4.3.1 Expanded form
4.3.2 Expanded form examples
4.3.3 Canonical forms
5. The %xml statement
5.1 Syntax overview
5.2 The source attribute
5.3 The scope attribute
5.4 The select attribute
5.4.1 General syntax for the tree selector
5.4.2 Selecting a single element
5.4.3 Selecting XML elements
5.4.4 Selecting attribute values of XML elements
5.4.5 The "innerText" meta-attribute
5.4.6 The "innerXml" meta-attribute
5.4.7 The "innerHtml" meta-attribute
5.4.8 Using regexp patterns for selecting elements
5.4.9 The super-meta '**' as an element
5.5 The alias attribute
5.6 The nobr attribute
5.7 The flat attribute
A. Appendix
Appendix 1 : References
Appendix 2 : Special thanks

Author: V.G. FREMAUX
E.I.S.T.I. / Cergy, France
Applied Research Laboratory
Internet and Network Applications

Version: 1.0 / January 2002

0. Document scope

This project grew out of the establishment of the Applied Research Laboratory of the International Graduate School for Computing Sciences in Cergy (France). We have had enough success so far with the initial implementation of the DIML processor to keep specifying and extending the process. At the time of writing, we do not yet know what will become of this document: whether it will be published one day, submitted to the W3C, or discussed in a larger community by some other route. We have nevertheless tried to write it in close accordance with the actual requirements of the Internet community.

1. Rationale

The present document defines a strategy for feeding DIML pages with data extracted from standardized XML documents.

The general encoding of an electronic document that uses dynamic content generation, however it is published, can be divided into four syntactic subsets. The first layer represents the procedural process by which resources and content elements are fetched and prepared in order to produce the document. The second layer is the encoding syntax for presentation options, which has been efficiently condensed by stylesheets based either on cascading attributes (CSS) or on pipelined transformers (XSL). Then comes the layout, or the physical organization of the distinct parts of the information, in other words the document macro-structure. This syntactic encoding defines the location and the occupied space, and implies a semantic hierarchy or categorization of content sequences (in-line sequences). The last layer is the encoding of the content proper, generally using some set of symbols (character set).

DIML optimizes document structure management by allowing extensive reuse of the code sequences that encode the content, and thus allows an efficient separation between layout and process. Conversely, DIML does not include any strong mechanism to dissociate semantic content from structure and layout. Someone who is exclusively responsible for writing the "message" cannot edit and update the information in a readable and natural way. The real information is usually mixed and merged with layout and structural HTML code.

We thought, when we started this study, that DIML could be a "soft" alternative to XML. We are now confident that combining XML with DIML is of great interest for highly automated document processing. XML allows a very clean expression of the basic semantics, without any influence from layout or presentation. This elementary construct may be sufficient to provide qualified parts of the literals involved in a content, each uniquely identifiable and thus extractable and processable as a unit. Minimal, basic XML, when processed by the DIML processor, can be the native form of a large amount of literal content in highly automated document generation applications.

2. Definitions

XML
"eXtensible Markup Language", a general specification for tag-based languages that encode documents by describing their structure and defining the content of each element.

3. The Data Injection Gateway

3.1 Calling An XML Document

The XML Data Injection Gateway translates the elements of an XML file for use by the ESSI processor. After an XML document has been called, its XML elements (or some defined subset of them) are translated into DIML variables. These variables can then be used like any other DIML variable, whether it was obtained by compiling a template, invoking scripts, or retrieving standard variables.

The gateway will mainly be used for injecting literal static content that is part of some dynamically generated content. This interface should be useful for gaining tighter management control over informational texts, user support messages, multi-language adaptation, or dynamic process configuration, and will allow application logic to be developed and tested while the final wording is written elsewhere. The final content, formatted in a simple XML construction, will fit accurately into the locations reserved for it.

Let's examine a first example.

Consider this document extract, a user screen in a web-based application:

<%if ((%FORM::what% ne "documentation") 
	and (%FORM::what% ne "demos") 
		and (%AUTH::OK% ne "1")) %>
<DIV ID="member_result_shadow" 
	STYLE="position : absolute ; 
	       display : block ; 
          width : 450px ; 
          height : 200px ; 
          top : 143px ; 
          left : 203px ; 
          background-color : 
          rgb(69,77,94)"></DIV>
<DIV ID="member_result" 
   STYLE="position : absolute ; 
          display : block ; 
          width : 450px ; 
          height : 200px ; 
          top : 140px ; 
          left : 200px ; 
          background-color : 
          rgb(200,0,0) ; 
          text-align : center ; 
          vertical-align : middle">
<TABLE WIDTH=90% BORDER=0 CELLSPACING=5 CELLPADDING=0>
<TR><TD ALIGN=right COLSPAN=2>
</TD></TR>
<TR>
<TD VALIGN=top>
   <P CLASS="error"><B>Authentication Error.</B> 
<P CLASS="error">Your login or your password is wrong. You are not admitted in the download zone.
<P CLASS="error"><A 
   HREF="Javascript:history.back()" 
   CLASS="error">Downloads gate</A>
</TD>
</TR>
</TABLE>
</DIV>
<%endif %>

It displays a pseudo dialog box when an error condition is detected:

Authentication Error.

Your login or your password is wrong. You are not admitted in the download zone.

Downloads gate

Whoever is responsible for the meaning of the message only cares about the following elements:

"Authentication error.",

"Your login or your password is wrong. You are not admitted in the download zone."

and

"Downloads gate".

Every other code sequence encodes either structure (DIV, location, conditional DIML expression, etc.) or formatting (classes, STYLE attributes, etc.). This example sequence does not contain any significant dynamic (process) encoding.

The XML Gateway of the DIML processor allows a separate XML file to define these sentences in a very clean and readable way, and gives DIML a means to extract these content elements and inject them at their appropriate locations in the DIML flow.

The above file would be split into an XML file:

<?xml version="1.0" ?>
<error>
	<title>Authentication error.</title>
	<text>Your login or your password is wrong. You are not admitted in the download zone.</text>
	<link_label>Downloads gate</link_label>
</error>

and a DIML injection framework:

<%if ((%FORM::what% ne "documentation") and (%FORM::what% ne "demos") 
and (%AUTH::OK% ne "1")) %>
<DIV ID="resultat_membre_shadow" STYLE="position : absolute ; 
                                           display : block ; 
                                           width : 450px ; 
                                           height : 200px ; 
                                           top : 143px ; 
                                           left : 203px ; 
                                           background-color : 
                                           rgb(69,77,94)"></DIV>
<DIV ID="resultat_membre" STYLE="position : absolute ; 
                                    display : block ; 
                                    width : 450px ; 
                                    height : 200px ; 
                                    top : 140px ; 
                                    left : 200px ; 
                                    background-color : rgb(200,0,0) ; 
                                    text-align : center ; 
                                    vertical-align : middle">
<TABLE WIDTH=90% BORDER=0 CELLSPACING=5 CELLPADDING=0>
<TR><TD ALIGN=right COLSPAN=2>
</TD></TR>
<TR>
<TD VALIGN=top><P CLASS="error"><B><%%error::title%%></B> 
<P CLASS="error"><%%error::text%%>
<P CLASS="error">
<A HREF="Javascript:history.back()" CLASS="error"><%%error::link_label%%></A>
</TD>
</TR>
</TABLE> 
</DIV>
<%endif %>
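The split between the XML file and the DIML framework can be illustrated with a minimal Python sketch. It is not the ESSI processor: the function names `xml_to_variables` and `inject` are hypothetical, and the sketch assumes that the scope is `error` and that each direct child of the root becomes one `scope::name` variable.

```python
import re
import xml.etree.ElementTree as ET

ERROR_XML = """<error>
    <title>Authentication error.</title>
    <text>Your login or your password is wrong. You are not admitted in the download zone.</text>
    <link_label>Downloads gate</link_label>
</error>"""

# A shortened stand-in for the DIML framework shown above.
TEMPLATE = (
    '<P CLASS="error"><B><%%error::title%%></B>\n'
    '<P CLASS="error"><%%error::text%%>\n'
    '<A CLASS="error"><%%error::link_label%%></A>'
)


def xml_to_variables(xml_text, scope):
    """Map each direct child of the root to a 'scope::name' variable (assumed convention)."""
    root = ET.fromstring(xml_text)
    return {f"{scope}::{child.tag}": (child.text or "") for child in root}


def inject(template, variables):
    """Replace every <%%name%%> call; unknown names are left untouched."""
    def substitute(match):
        return variables.get(match.group(1), match.group(0))
    return re.sub(r"<%%(.+?)%%>", substitute, template)


variables = xml_to_variables(ERROR_XML, "error")
page = inject(TEMPLATE, variables)
```

With this arrangement the person writing the messages only edits the XML text, while the layout stays in the template.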

The XML Gateway for DIML MUST:

  • Identify the elements of an XML file and be able to discriminate any instance.
  • Select a subset of the instances.
  • Convert XML discriminators (implicit or explicit) into DIML variable names.
  • Allow repeated operations on the same XML document.
  • Allow several XML documents based on the same DTD, taking care of possible name conflicts.
  • Allow HTML content (namespace:html) to be extracted as literal content.

3.2 Limits Of The Gateway

The XML Gateway WILL NOT verify that the document is well-formed. Verifying conformance to a DTD would add significant overhead to the production process and would hurt performance. Documents are assumed to be well-formed in a general XML context, that is, to have correct SGML element/attribute constructs.

The Gateway WILL NOT consider, process, or apply any XSL rule or stylesheet call. The Gateway IS NOT intended to become a complete XML parser. However, some syntax used in XSLT filters may be reused by the gateway to avoid confusion, where the exact or a very similar meaning can be kept.

The content of XML elements may contain:

  • #CDATA sequences (literal text)
  • HTML tags, which MUST be isolated in an explicit xmlns:html namespace
  • Javascript sequences within an <html:SCRIPT>...</html:SCRIPT> element
  • SGML entities
  • DIML statements or variable calls, which should be parsed when the translated data is actually used

    4. XML Identifier Translation

    4.1 The DIML Namespace

    4.1.1 Naming Rules In The DIML Namespace

    DIML leaves the organization of namespaces and scoping free by allowing extended characters in variable names. The external variable delimiter %...% allows the use of such an extended character set. Variable names such as:

    • <%%VARIABLE%%> (1)
    • <%%FUNCTION()%%> (2)
    • <%%VARIABLE1->VARIABLE2%%> (3)
    • <%%SCOPE::VARIABLE%%> (4)
    • <%%SCOPE::SUBSCOPE::VARIABLE%%> (4)
    • <%%TABLE§0%%> (5)

    use characters that are unusual compared to the usual tokens of C, Java, or other languages.

    This syntactic flexibility allows developers to express their own syntactic semantics, denoting some pseudo-behaviour of such variables. In the above examples:

    • (1) the simplest scalar variable
    • (2) a "function-like" semantic, such as a CGI call ("action" or "method" like)
    • (3) a data-structure-like semantic
    • (4) explicitly namespaced variables
    • (5) array semantic

    The extra symbol set opens up a wide range of new semantics, provided the uniqueness of the DIML identifier is preserved.

    However the identifier is constructed, the ESSI processor does not care about the implicit or conventional meaning intended by the developer. The semantics of the variable remain the developer's own responsibility.

    4.1.2 Standard Scopes For DIML Names

    The DIML specification 1.0 defines some standard scopes that were recognized in the first implementations of the ESSI processor.

    The ESSI processor uses the following scopes:

    <%%ENV::variable_suffix%%>
    The standard scope for the system environment variables that were passed to the currently executing instance of the ESSI process. This environment set is fed by the Web server with the CGI gateway variables (deprecated, but kept for compatibility).

    <%%CGI::variable_suffix%%>
    The specific scope of variables created by the HTTP server when reporting the CGI interface. This variable scope was implemented in later versions of the processor.

    <%%FORM::variable_suffix%%>
    The scope of any variable passed through the CGI gateway using forms.

    User scopes
    Any explicit scope obtained by using the conventional scope sequence "::" at least once, whether it is meaningful or not.

    The base scope
    Any DIML variable name that does not use the scope sequence "::".

    The ESSI processor in version 2.1 adds a new standard scope:

    <%%FILE::variable_suffix%%>
    Used for file content when uploading a file through a multipart/form-data [MIME1] form.
    4.1.3 Translation Constraints

    Integrating XML elements in a DIML document framework assumes that names are translated into a name scope that DIML has access to. Constructing such a scope is subject to some constraints:

    • The resulting XML scope SHOULD prevent any name collision with the other standard or user scopes (specifying an explicit output scope for the XML parser is a good way to avoid such collisions).
    • The XML scope MUST avoid collisions with other XML extractions.
    • Translation must not be DTD dependent, and should neither add to nor be influenced by the rules within the DTD.
    • The translation MUST assume that some elements may have multiple instances.
    • The most reasonable basis for the name translation is the DTD. But the DTD is a model: it defines element types and the rules to assemble them. The translation must provide a mechanism to identify each actual instance of the elements.
    • The translation should construct readable names that are easy to write in the DIML source. A process that resolved names two or three text lines long would be of no use at all.
    • The translation should provide rules to extract the distinct interpretations of an element, whether it is terminal or not.

    4.2 Constructing The DIML Name

    4.2.1 Identifying The Element

    Injecting XML elements assumes that an XML file has been opened and that its elements have been identified. Analysing an XML file to identify elements is a three-stage process:

    1. The element is located somewhere in the hierarchical construct of the document. Qualifying the element within the file scope may be achieved by expressing a path to the desired element:

      As an example, let's consider the file /home/diml/xml/xmlsample.xml:

      <?xml version="1.0" ?>
      <root>
         <box1>
            <box2>
               <element>sample1</element>
            </box2>
         </box1>
      </root>
      

      The string "sample1" is the actual value of an XML element that should be associated with the path:

      (root.)box1.box2.element.value

      Note: The first term, root, could have been omitted since it is the unique root of the XML file. But as the root may be any element and does not always have the same name, we decided it would remain explicit in the path.

      Using this qualification to identify the value is not sufficient in the general context of XML translation. It assumes that no element can be used more than once as a direct child of any other element; otherwise qualifying and identifying the element becomes ambiguous. The elements of the following file could not all be fully identified:

      <?xml version="1.0" ?>
      <root>
         <box1>
            <box2>
               <element>sample1</element>
            </box2>
         </box1>
         <box1>
            <box2>
               <element>sample2</element>
            </box2>
         </box1>
      </root>
      

      Note that this file would be valid against the following DTD:

      <!ELEMENT root (box1+)>
      <!ELEMENT box1 (box2)>
      <!ELEMENT box2 (element)>
      <!ELEMENT element (#PCDATA)>
      
    2. The issue comes from the possible multiplication of box1 instances. Discriminating between multiple instances of box1 within the root element becomes necessary. The element name only has a qualifying (typing) status (and associated rules, but this is not relevant here). A unique access discriminator can be constructed using XML attributes. Consider the source file:

      <?xml version="1.0" ?>
      <root>
         <box1 id="1">
            <box2>
               <element>sample1</element>
            </box2>
         </box1>
         <box1 id="2">
            <box2>
               <element attr1="attrvalue1">sample2</element>
            </box2>
         </box1>
      </root>
      

      Note that using an attribute to identify the element is only necessary for elements that are multiple instances of the same element (DTD occurrence modifiers * or +). In the above document, a key inspired by the XSL matcher gives an acceptable path to the value "sample2":

      root.box1(@id=='2').box2.element.value

      We could even reach a particular value of an XML attribute using the following identifier:

      root.box1(@id=='2').box2.element.attr1

      When no attribute is available to help discriminate the element, the meta-attribute @@, which numbers the occurrences of an element within a node, should be used. As an example:

      root.box1(@@=='2').box2.element.attr1

      designates the attribute "attr1" of the element named "element" in the "box2" box of the third "box1" box in "root", i.e. in the document (the counter starts at 0).

    3. Finally, if several files based on the same DTD are opened, their variable sets should not overlap. The identifier should therefore contain a source-dependent part, for instance when injecting the values of the element root.box1(@id=='2').box2.element.innerText from both the /home/diml/xml/xmlsample.xml and /home/diml/xml/xmlsample2.xml files.
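The path notation used in this section can be resolved mechanically. The following Python sketch is only an illustration under stated assumptions: the function name `resolve` is hypothetical, only the `(@attr=='value')` and `(@@=='n')` discriminator forms shown above are handled, and the last step is either `value` or an attribute name.

```python
import re
import xml.etree.ElementTree as ET

SAMPLE = """<root>
   <box1 id="1">
      <box2><element>sample1</element></box2>
   </box1>
   <box1 id="2">
      <box2><element attr1="attrvalue1">sample2</element></box2>
   </box1>
</root>"""

# One path step: a name, optionally followed by (@attr=='value') or (@@=='n').
STEP = re.compile(r"^(\w+)(?:\((@@|@\w+)=='([^']*)'\))?$")


def resolve(xml_text, path):
    """Follow a dotted path from the root and return the addressed value."""
    steps = re.findall(r"\w+(?:\([^)]*\))?", path)
    root = ET.fromstring(xml_text)
    first = STEP.match(steps[0])
    if first is None or first.group(1) != root.tag:
        raise KeyError(f"path must start at the root element {root.tag!r}")
    node = root
    for step in steps[1:-1]:
        match = STEP.match(step)
        if match is None:
            raise KeyError(f"malformed step {step!r}")
        name, key, wanted = match.groups()
        candidates = node.findall(name)
        if key == "@@":
            # Ordinal meta-attribute: 0-based rank among same-named siblings.
            rank = int(wanted)
            candidates = candidates[rank:rank + 1]
        elif key is not None:
            candidates = [c for c in candidates if c.get(key[1:]) == wanted]
        if len(candidates) != 1:
            raise KeyError(f"{step!r} does not identify exactly one element")
        node = candidates[0]
    last = steps[-1]
    return node.text if last == "value" else node.attrib[last]
```

A path that leaves a repeated element undiscriminated, such as `root.box1.box2.element.value` on this sample, raises `KeyError` in this sketch, which mirrors the ambiguity described in stage 2.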
    4.2.2 Identifying the source

    In theory, discriminating the source of the data should rely on a URI, which would oblige us to use a fully specified identifier such as:

    /home/diml/xml/xmlsample.xml#root.box1(@id=='2').box2.element.innerText

    or, if the file is a network reachable file:

    http://www.diml.net/xml/xmlsample.xml#root.box1(@id=='2').box2.element.innerText

    In practice, manipulating fully specified references should be avoided, because all the advantages of HTML meta-tagging (i.e. keeping the source flow as readable as possible) would be lost.

    We propose another way to discriminate data injected from distinct sources that have the same internal structure. The XML to DIML translation performs an indirection, defining a distinct scope for the generated variables each time an %xml statement is called (see the scope attribute).

    4.3 General syntax for the XML derived identifiers

    4.3.1 Expanded form

    Translated identifiers are available like any other DIML variable, according to the following scheme:

    <%%scope_id::XML_tree_identifier%%>
    

    in which:

    scope_id is the scoping identifier as defined by the scope attribute in the %xml statement.
    XML_tree_identifier is the hierarchical access key to the data, which conforms to the following symbolic syntax:

    XML_tree_identifier ::= [ *[ node_id "." ] XML_tree_identifier ] | node_id
    node_id             ::= element_id ?[ "(" selector ")" ]
    element_id          ::= #NMTOKEN
    selector            ::= #CDATA | [ quotemark #CDATA quotemark ]
    quotemark           ::= '"'
    

    A selector value can only be specified if an explicit selector was used in the "<%xml" statement or if a default ordinal selection was made (implicit @@ selector). Elements that resolve to a single descendant will not have any explicit selector. When an element has multiple descendants and no selector has been defined, all the children are assigned the value of the default ordinal counter @@ as discriminator.

    Note 1: The ordinal selector @@ counts the instances of a specific child element within its parent's scope. Its value will always be 0 for the first child instance. Children that are distinct elements have independent counters.

    Note 2: The selector name has been hidden in the syntax of the XML element identifier to shorten the expression. The consequence is that no more than one selector can be used for a specific XML tree node. This selector is:

    • The explicit selector mentioned in the XML invocation.
    • The ordinal (@@) selector as a default.

    A further release of this documentation will discuss whether this specification is too restrictive.

    4.3.2 Expanded form examples

    Let us consider the following XML file:

    <?xml version="1.0" ?>
    <root>
       <box1 id="1">
          <box2>
             some box2 content
             <element attrib1="value1" attrib2="value2">sample a</element>
             <element attrib2="value2" attrib1="value2">sample b</element>
          </box2>
       </box1>
       <box1 id="2">
          <box2>
             <element attrib1="value1">sample2</element>
             <souselement>subsample1</souselement>
             <souselement>subsample2</souselement>
             <souselement>subsample3</souselement>
             <souselement>subsample4</souselement>
          </box2>
       </box1>
    </root>
    

    The expanded access keys below are valid:

    <%%SAMP1::box1("1").box2.element("value1").innerText%%>
    This key points to the string "sample a", the textual content of the element [element] selected by the value "value1" of the explicit selector "@attrib1", in the unique element [box2] (unindexed because it is unique), in the element [box1] selected by the value "1" of the explicit "@id" selector. This assumes that the extraction filter explicitly defined the selectors @id and @attrib1 in the expression:
    <%xml ... select="box1(@id).box2.element(@attrib1).*" ...%>
    <%%SAMP1::box1("1").box2.element("value1").attrib2%%>
    This key points to the string "value2", the attrib2 attribute of the [element] selected by the value "value1" of the explicit selector "@attrib1", in the unique element [box2], still unindexed, in the element [box1] selected by the value "1" of the explicit selector "@id". This relies on the same assumptions as above.
    <%%SAMP1::box1("2").box2.souselement(3).innerText%%>
    This key points to the string "subsample4", the textual content of the element [souselement] selected by the ordinal selector @@="3", in the unique element [box2], in the element [box1] selected by the value "2" of the explicit selector "@id". The same assumption still holds.
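The way these expanded keys are formed can be sketched in Python. This is one reading of the rules in 4.3.1, not the processor's actual algorithm: the function `translate` and its `selectors` argument are hypothetical, explicit selector values are written in quotes, repeated elements without an explicit selector receive the 0-based ordinal counter, and unique elements stay unindexed.

```python
import xml.etree.ElementTree as ET
from collections import Counter

SAMPLE = """<root>
   <box1 id="1">
      <box2>
         <element attrib1="value1" attrib2="value2">sample a</element>
         <element attrib2="value2" attrib1="value2">sample b</element>
      </box2>
   </box1>
   <box1 id="2">
      <box2>
         <element attrib1="value1">sample2</element>
         <souselement>subsample1</souselement>
         <souselement>subsample2</souselement>
         <souselement>subsample3</souselement>
         <souselement>subsample4</souselement>
      </box2>
   </box1>
</root>"""


def translate(xml_text, scope, selectors):
    """Return {expanded identifier: value} for attributes and leaf text.

    `selectors` maps an element name to its explicit selector attribute,
    for example {"box1": "id"}; other repeated elements use the ordinal @@.
    """
    variables = {}

    def walk(node, prefix):
        totals = Counter(child.tag for child in node)
        seen = Counter()
        for child in node:
            ordinal = seen[child.tag]  # one independent counter per element name
            seen[child.tag] += 1
            if child.tag in selectors:
                key = f'{child.tag}("{child.get(selectors[child.tag])}")'
            elif totals[child.tag] > 1:
                key = f"{child.tag}({ordinal})"
            else:
                key = child.tag  # unique child: no selector needed
            path = f"{prefix}.{key}" if prefix else key
            for name, value in child.attrib.items():
                variables[f"{scope}::{path}.{name}"] = value
            if len(child) == 0:
                variables[f"{scope}::{path}.innerText"] = child.text or ""
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return variables


variables = translate(SAMPLE, "SAMP1", {"box1": "id", "element": "attrib1"})
```

The root element is left out of the generated keys here, as in the three keys above.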
    4.3.3 Short form

    The short form is useful when extracting data from documents generated by XML-capable authoring applications, whose structure and density are much heavier than those of simple data sheets based on trivial DTDs.

    The complexity of real-life documents limits the usability of the expanded form, because of the depth of the element tree. Some data may nevertheless have to be fetched from deep within the document structure. The short form of translated identifiers allows data to be extracted in this context without making the DIML flow too inextricable.

    5 The %xml statement

    5.1 Common syntax

    Using variables from an XML flow assumes a previously executed call that describes and processes the XML data, and passes values to a translated DIML variable set.

    This results in a brand new DIML statement, whose general form is:

    <%xml source="filename" scope="variable_scope" select="XML_selector" alias="alias_prefix" %>
    

    Here is an application to a local (relatively accessed) XML file:

    <%xml source="../xml/xmlsample.xml" scope="SAMP1" select="box1(@id=~'valid')" %>
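As an illustration of the statement's shape, the sketch below extracts the attributes of an `<%xml ... %>` statement into a dictionary. The function name `parse_xml_statement` is hypothetical, and the sketch assumes that every attribute value is written between double quotes, as in the general form above.

```python
import re

STATEMENT = re.compile(r"<%xml\s+(.*?)\s*%>", re.DOTALL)
ATTRIBUTE = re.compile(r'(\w+)="([^"]*)"')


def parse_xml_statement(text):
    """Return the attributes of the first <%xml ... %> statement found in text."""
    match = STATEMENT.search(text)
    if match is None:
        raise ValueError("no <%xml ... %> statement found")
    attributes = dict(ATTRIBUTE.findall(match.group(1)))
    if "source" not in attributes:
        raise ValueError("the source attribute is missing")
    return attributes


parsed = parse_xml_statement(
    '<%xml source="../xml/xmlsample.xml" scope="SAMP1" select="box1(@id=~\'valid\')" %>'
)
```

Single quotes inside the select value pass through untouched, which is why the selector can quote its operands with them.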
    

    5.2 The source attribute

    The source attribute specifies the XML source file. It may contain a valid physical URI, depending on the conventions of the actual operating system. Alternatively (version 3.0 of the DIML processor), this attribute can be associated with a DIML symbol containing a well-formed XML sequence.

    Examples :

    <%xml source="../xml/sample.xml"   -> relative physical access
    <%xml source="D:/dimlweb/xml/sample.xml"  -> Windows absolute access
    <%xml source="/home/www/dimlweb/xml/sample.xml"  -> Unix/Linux absolute access
    <%xml source=%XML_SEQUENCE%  -> localized XML fragment
    

    5.3 The scope attribute

    The scope attribute defines the DIML variable scope in which the produced variables will be set. This scope can be any valid DIML name, using any character except %, [, ], '\n' and '\r'.

    Note: scoping prefixes that include a subscope separator (::) are allowed. Scope names should not end with the scope separator (::), since the DIML engine adds it, although the DIML processor is not required to check for this situation.

    Examples :

    <%xml source="../foo.xml" scope="XMLDATA" ...
    <%xml source="../foo.xml" scope="XML::FOO" ...
    

    Variables generated by the DIML translation will start with the prefixes "XMLDATA::" and "XML::FOO::" respectively. Scoping output variable sets is valuable when multiple XML documents sharing the same DTD are opened.
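The prefixing rule can be shown with a few lines of Python. The helper `scoped_name` is hypothetical. Rejecting the forbidden characters follows the rule stated in this section, whereas silently dropping a trailing "::" is a choice made for this sketch only, since the specification does not require the processor to check for it.

```python
FORBIDDEN = set("%[]\n\r")


def scoped_name(scope, identifier):
    """Build the full DIML variable name for an identifier in a given scope."""
    if any(ch in FORBIDDEN for ch in scope):
        raise ValueError(f"invalid character in scope name {scope!r}")
    if scope.endswith("::"):
        # Sketch-only tolerance: the separator is added below anyway.
        scope = scope[:-2]
    return f"{scope}::{identifier}"


first = scoped_name("XMLDATA", "box1.box2.element.innerText")
second = scoped_name("XML::FOO", "box1.box2.element.innerText")
```

Two documents sharing one DTD produce the same identifier, and only the scope prefix keeps the resulting variables apart.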

    5.4 The select attribute

    The select attribute is the hardest to explain. It acts as an XSL-like selector on the source XML tree, to reduce the number of symbols the translation generates in the DIML variable space. A selection can point to a unique element, a recurring element at a defined hierarchy level, an element matching a particular DTD definition, or an element that has specific values for some attributes.

    5.4.1 Common form of the selector

    Selecting an element in an XML flow assumes resolving an access path to that element in the XML name structure. To reach the element's value, some tree nodes must be evaluated in the right order, and a selection filter applied to them.

    The common form of the selector is a dotted sequence of element identifiers, denoting an inclusion path from the XML root element. It matches the following BNF (Backus-Naur Form) definition:

    XML_tree_identifier ::= [ *[ node_id "." ] XML_tree_identifier ] | node_id
    node_id             ::= element_id ?[ "(" selector ")" ] | '*'
    element_id          ::= #NMTOKEN | '*'
    selector            ::= [ *selector_expr "," selector ] | selector_expr
    selector_expr       ::= var_name [ pattern_operator pattern | comparison_operator operand ]
    var_name            ::= "@" attribute_id
    comparison_operator ::= "=" | "<" | ">" | "<=" | ">=" | "!="
    operand             ::= #CDATA | [ quotemark #CDATA quotemark ] | '*'
    quotemark           ::= '"'
    pattern_operator    ::= "=~" |"!~"
    pattern             ::= #EREGEXP
    

    #EREGEXP is a POSIX 2 extended regular expression.
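A selector written in this grammar can be split into its nodes with a small hand-written parser. The Python sketch below is an assumption-laden illustration: `parse_selector` is a hypothetical name, operands may be quoted with either single or double quotes (the examples in this document use both), and a comma inside a regular expression pattern is not supported.

```python
import re

# One selector expression: @attribute, optionally followed by operator and operand.
# Two-character operators come first so that "<=" is not read as "<".
EXPR = re.compile(r"^@([\w-]+)\s*(?:(=~|!~|<=|>=|!=|=|<|>)\s*(.+))?$")


def split_outside_parens(text, separator):
    """Split text on separator, ignoring separators nested inside parentheses."""
    parts, depth, current = [], 0, []
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == separator and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def parse_selector(selector):
    """Return a list of (element_id, [(attribute, operator, operand), ...])."""
    nodes = []
    for part in split_outside_parens(selector, "."):
        name, _, rest = part.partition("(")
        if not re.fullmatch(r"[\w-]+|\*", name):
            raise ValueError(f"bad element identifier: {name!r}")
        expressions = []
        if rest:
            if not rest.endswith(")"):
                raise ValueError(f"unbalanced parenthesis in {part!r}")
            for raw in rest[:-1].split(","):
                match = EXPR.match(raw.strip())
                if match is None:
                    raise ValueError(f"bad selector expression: {raw!r}")
                attribute, operator, operand = match.groups()
                if operand is not None:
                    operand = operand.strip("\"'")
                expressions.append((attribute, operator, operand))
        nodes.append((name, expressions))
    return nodes


nodes = parse_selector("box1(@id=~'valid').box2.element(@attrib1).*")
```

Here `nodes` holds four entries: `box1` with one pattern expression on `@id`, `box2` without a selector, `element` with the bare `@attrib1` selector, and the `*` meta.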

    At a single node stage, selection uses existing attributes as a node discriminator. This specification does not mandate any behaviour for the case where multiple sibling elements share the same set of attribute values (incomplete discrimination). In any case, this specification assumes that the complete signature of an element (element name + attribute set) is a unique discriminator for any node in the XML document.

    Please note the following definitions before reading further:

    Selection node
    Part of the selector corresponding to a particular tree level of the XML document.
    Discriminated element
    An XML element matching the selector super-tree (i.e. whose selectability has not been disabled at an upper level in the XML tree).
    Fully discriminated element
    An element matched by a selection node that has a unique solution in the XML document.
    Selected element
    Any element matching the selection node, when there are multiple solutions for the selector.
    Discriminator
    Expression used to select elements on an attribute basis.
    Identifier
    The XML element's name.
    Node selector
    The complete expression acting at a single node level to discriminate, select or reject values.
    Ignored element
    An element which is not discriminated nor selected.
    5.4.2 Selection of one single XML value

    Element selection is considered to have a unique solution when the following conditions are met:

    • The first selection node is the XML root element, a unique direct sub-element, or a fully discriminated direct sub-element of the root.
    • All multiple elements along the path are fully discriminated.
    • All node selectors along the path have a unique solution.
    • The last identifier may be an attribute name or one of the reserved "innerText", "innerXml" or "innerHtml" keys.
    5.4.3 Selecting an element by its reference

    A selector that stops at an element name, without naming any attribute or any of the special meta-attributes "innerText", "innerHtml" or "innerXml", is called a "reference selection in the XML tree". This selector form extracts the XML subtree contained in the discriminated element. It is strictly equivalent to requesting the "innerXml" attribute.

    Example :

    to be written...
    
    5.4.4 Extracting XML element attributes

    The attribute values of XML elements also contain useful information. The selector should be able to point to the attribute values of a discriminated node. Any valid attribute of an XML element is thus reachable using a selector of the following generic syntax:

    XML_attribute_identifier ::= XML_tree_identifier "." [ attribute_id | reserved_attribute ]
    reserved_attribute ::= "innerText" | "innerXml" | "innerHtml"
    

    XML_tree_identifier was defined previously in this specification.

    The implicit "innerText", "innerXml" and "innerHtml" are meta-attributes.

    5.4.5 The "innerText" virtual attribute

    Selecting the content of an XML element can mean several things. When the data source is a linear document that XML marks up by qualifying elements, it is possible to consider the literal content sequence contained in the element. For example, consider the following XML fragment:

     
    <root>
       <adresse>
          <numero>20</numero>
          <voie>Av du Parc</voie>
          <codeposte><I>95011</I></codeposte>
          <ville>CERGY</ville>
          <pays>FRANCE</pays>
       </adresse>
    </root>
    

    it may be desirable to retrieve the whole address as a single text fragment. This is the role of the "innerText" attribute, which removes all markup below the element level it is applied to in the selector. For example, the selector:

     
    adresse.innerText
    

    builds a single variable containing the sequence:

    20 Av du Parc 95011 CERGY FRANCE

    Note that line endings are preserved inside the "adresse" element, so that the textualized data remains identifiable. More generally, and to simplify parsing, all whitespace characters are preserved by the textualization filter.
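The behaviour of "innerText" can be approximated with Python's standard `xml.etree.ElementTree` module, whose `itertext()` method yields every text fragment below an element. This is a sketch of the idea rather than the processor's implementation, and the function name `inner_text` is an assumption.

```python
import xml.etree.ElementTree as ET

ADDRESS = """<root>
   <adresse>
      <numero>20</numero>
      <voie>Av du Parc</voie>
      <codeposte><I>95011</I></codeposte>
      <ville>CERGY</ville>
      <pays>FRANCE</pays>
   </adresse>
</root>"""


def inner_text(element):
    """Concatenate all text below element, dropping every tag but keeping whitespace."""
    return "".join(element.itertext())


adresse = ET.fromstring(ADDRESS).find("adresse")
text = inner_text(adresse)
```

Line breaks and indentation between the child elements survive in `text`; collapsing whitespace, for instance with `" ".join(text.split())`, gives the one-line sequence shown above.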

    5.4.6 The "innerXml" virtual attribute

    The "innerXml" virtual attribute also provides a particular interpretation of the XML element selected by the filter. This meta-attribute represents the whole content of the XML element, without any filtering.
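An unfiltered "innerXml" view can be sketched with `xml.etree.ElementTree` by serializing each child of the element in turn. The function name `inner_xml` is hypothetical, and the exact serialization (attribute quoting, entity escaping) is whatever `ET.tostring` produces, which may differ from the ESSI processor's output.

```python
import xml.etree.ElementTree as ET

FRAGMENT = """<adresse>
   <numero>20</numero>
   <codeposte><I>95011</I></codeposte>
</adresse>"""


def inner_xml(element):
    """Return the content of element with all nested markup kept as is."""
    parts = [element.text or ""]
    for child in element:
        # ET.tostring also emits the text that follows the child (its tail).
        parts.append(ET.tostring(child, encoding="unicode"))
    return "".join(parts)


adresse = ET.fromstring(FRAGMENT)
content = inner_xml(adresse)
postal_code = inner_xml(adresse.find("codeposte"))
```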

    5.4.7 The "innerHtml" virtual attribute

    The "innerHtml" virtual attribute is the last form of content representation. It allows any HTML tags to be preserved when the element is interpreted. Taking the previous example again,

     
    adresse.innerHtml
    

    builds a single variable containing the sequence

    20
    Av du Parc
    95011
    CERGY
    FRANCE

    5.4.8 Using the '*' meta

    The '*' meta may be used in place of a node of the selector expression. It indicates that, at this tree level, the identity of the element is arbitrary. If an explicit selection expression is defined for this node (the expression then has the form "*(@selector operator operand)"), the selector resolution is applied to every element at this tree level, even if the discriminating attribute does not apply to it. In that case, the element cannot be selected.
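
    The resolution of one selector node, with or without '*', can be sketched as follows. This Python example is an assumption-laden illustration: it relies on xml.etree.ElementTree, supports only an equality test on the discriminating attribute, and the function name match_level and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET


def match_level(parent, name, attr=None, value=None):
    """Select the children of `parent` that match one selector node.

    `name` is an element name, or '*' for any element at this level.
    When `attr` is given, only elements whose `attr` equals `value` are
    kept; an element that lacks the attribute can never be selected.
    """
    selected = []
    for child in parent:
        if name != "*" and child.tag != name:
            continue
        if attr is not None and child.get(attr) != value:
            continue
        selected.append(child)
    return selected


doc = ET.fromstring('<root><a id="1"/><b id="1"/><c/><d id="2"/></root>')
any_level = [e.tag for e in match_level(doc, "*")]
with_id_1 = [e.tag for e in match_level(doc, "*", "id", "1")]
print(any_level)
print(with_id_1)
```

    In this sketch the element "c" is tested like the others, but it carries no "id" attribute and is therefore never selected by the discriminated form.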

    5.5 The alias attribute

    The alias attribute reduces the translated DIML identifier to a canonical form in which only the selector values are mentioned. The purpose of this canonicalization is to make the resulting variables more concise and to integrate them better into the DIML stream.

    The value of this attribute is a prefix used to generate the variable name. The translated variable is formed differently depending on the result of the selector:

    • If the selector resolves to a single element, the alias value is the name of the generated variable.

    • If the selector matches multiple elements, the variable is generated as an n-dimensional associative array, n being the number of

      6 Examples

      Consider a source XML file:

      <?xml version="1.0" ?>
      <root>
         <box1 id="1">
            <box2>
               du contenu de box2
               <element attrib1="value1" attrib2="value2">sample a</element>
               <element attrib2="value3" attrib1="value2">sample b</element>
            </box2>
         </box1>
         <box1 id="2">
            <box2>
               <element attrib1="value1">sample2</element>
            </box2>
         </box1>
      </root>
      

      6.1 Single-resolution selections

      6.1.1 Fully discriminated element

      
      <%xml source="fich.xml" 
      select="box1(@id='1').box2.element(@attrib1='value2')" 
      alias="ELM1" %>
      

      <element attrib2="value3" attrib1="value2">sample b</element>

      Note that this expression assigns the value to the DIML variable %ELM1%.

      6.1.2 Fully discriminated content

      
      <%xml source="fich.xml" 
      scope="SAMP1" select="box1(@id='1').box2.element(@attrib1='value2').innerText" 
      alias="ELM" %>
      

      sample b

      Note that this expression assigns the value to the DIML variable %SAMP1::ELM%.

      6.1.3 Attribute of a discriminated element

      
      <%xml source="fich.xml" 
      scope="SAMP3" select="box1(@id='1').box2.element(@attrib1='value2').attrib2" 
      alias="ATTR" %>
      

      value3

      Note that this expression assigns the value to the DIML variable %SAMP3::ATTR%. Note also that the uniqueness of the discrimination on the "attrib2" attribute is due only to the selection by the "attrib1" attribute.
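
      The three selections of section 6.1 can be replayed with a small resolver. The following Python sketch is not the DIML processor: it assumes xml.etree.ElementTree, supports only the "name(@attribute='value')" node form with an equality test, assumes that attribute values contain no "." character, and uses the hypothetical names select and STEP. It prints what a plain attribute-equality reading of each selector yields on the source file above.

```python
import re
import xml.etree.ElementTree as ET

SOURCE = """<root>
   <box1 id="1">
      <box2>
         du contenu de box2
         <element attrib1="value1" attrib2="value2">sample a</element>
         <element attrib2="value3" attrib1="value2">sample b</element>
      </box2>
   </box1>
   <box1 id="2">
      <box2>
         <element attrib1="value1">sample2</element>
      </box2>
   </box1>
</root>"""

# One selector node: a name (or '*'), optionally followed by (@attr='value').
STEP = re.compile(r"(\w+|\*)(?:\(@(\w+)='([^']*)'\))?")


def select(root, selector):
    """Resolve a dotted selector; return a list of elements or of strings."""
    nodes = [root]
    steps = selector.split(".")
    for i, step in enumerate(steps):
        m = STEP.fullmatch(step)
        if m is None:
            raise ValueError("unsupported selector node: " + step)
        name, attr, value = m.groups()
        children = [c for n in nodes for c in n
                    if name in ("*", c.tag)
                    and (attr is None or c.get(attr) == value)]
        if not children and i == len(steps) - 1 and attr is None:
            # The last node names no child element: read it as a
            # meta-attribute or as an ordinary attribute.
            if name == "innerText":
                return ["".join(n.itertext()) for n in nodes]
            return [n.get(name) for n in nodes if n.get(name) is not None]
        nodes = children
    return nodes


root = ET.fromstring(SOURCE)
base = "box1(@id='1').box2.element(@attrib1='value2')"
print([ET.tostring(e, encoding="unicode").strip() for e in select(root, base)])
print(select(root, base + ".innerText"))
print(select(root, base + ".attrib2"))
```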

      6.2 Multiple-result selections

      6.2.1 Element without a discriminant

      
      <%xml source="fich.xml" 
      select="box1.box2.element(@attrib1='value1')" 
      alias="ELM1" %>
      

      ELM1§1§1§value1 : sample a
      ELM1§2§1§value1 : sample2

      The selector's lack of discrimination comes from its inability to determine which "box1" element is meant. This indeterminacy is not resolved at the element selection level, because the node selector used lets two element instances through, one in the first "box1" and the other in the second.
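
      The variable names listed in this example suggest a composition rule: the alias, then the position of each undiscriminated node, then the selector value, joined by "§". The following Python sketch reproduces the listing under that assumption, which is inferred from the example and not stated by this specification. The function name multi_keys is hypothetical, positions are assumed to be 1-based, and the path box1.box2.element is hard-coded for brevity.

```python
import xml.etree.ElementTree as ET

SOURCE = """<root>
   <box1 id="1">
      <box2>
         du contenu de box2
         <element attrib1="value1" attrib2="value2">sample a</element>
         <element attrib2="value3" attrib1="value2">sample b</element>
      </box2>
   </box1>
   <box1 id="2">
      <box2>
         <element attrib1="value1">sample2</element>
      </box2>
   </box1>
</root>"""


def multi_keys(root, alias, attr, value):
    """Build alias§i§j§value keys for box1.box2.element(@attr='value').

    i and j are the assumed 1-based positions of the undiscriminated
    "box1" and "box2" nodes; `value` is the discriminating selector value.
    """
    result = {}
    for i, box1 in enumerate(root.findall("box1"), start=1):
        for j, box2 in enumerate(box1.findall("box2"), start=1):
            for element in box2.findall("element"):
                if element.get(attr) == value:
                    result[f"{alias}§{i}§{j}§{value}"] = element.text
    return result


variables = multi_keys(ET.fromstring(SOURCE), "ELM1", "attrib1", "value1")
for name, content in variables.items():
    print(name, ":", content)
```

      Under this naming assumption, two matching elements inside the same "box2" would receive the same key; the sketch does not attempt to resolve that case.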

      6.2.2 Attribute on a non-discriminated element

      
      <%xml source="fich.xml" 
      select="box1(@id='1').box2.element.attrib1" 
      alias="ATTR2" %>
      

      ATTR2§1§1§1 : value1
      ATTR2§1§1§2 : value2

      In this case, the first indeterminacy comes from calling the "element" element without a discriminant. Calling the "attrib1" attribute does not resolve this indeterminacy.


      All material is copyleft V.G. FREMAUX (EISTI France) 1999 to 2003 except explicitly mentioned