OBSERVATIONS ON XML CORE SERVICES - MSXML 4.0 RTM

By Peter A. Bromberg, Ph.D.



Microsoft has released the RTM (release to manufacturing) version of MSXML 4.0, now called Microsoft XML Core Services. As with any new technology, there are both pluses and minuses. I'll try to cover both here. First, some of the good things:



New functionality added

This is the RTM (supported, production quality) release of Microsoft® XML Core Services (MSXML) 4.0, formerly called the Microsoft XML Parser. This version has a number of improvements compared to MSXML 3.0:

* Support for the World Wide Web Consortium (W3C) final recommendation for XML Schema, with both DOM and SAX.

* Substantially faster XSLT engine. Microsoft claims its tests show a speedup of about 4x, and in some scenarios 8x.

* New and substantially faster SAX parser, which is also available in DOM through the NewParser property [use dom.setProperty("NewParser", true)].

* Better support for sequential architectures and streamed XML processing based on SAX 2, including DOM-SAX integration and HTML generation.

* Improved standards conformance and scalability. Specifically, the following old, non-conformant technologies have been removed: old XSL with XSLPattern; uuid namespaces for XDR; the proprietary XmlParser object; and the normalize-line-breaks property in SAX. Corresponding standard technologies (XSLT 1.0, XPath 1.0, and http-based namespaces for XDR and SAX2) have been available since MSXML 3.0.

* True side-by-side functionality, which ensures that MSXML 4.0 can work without any collision with previous or future versions of MSXML. As a result, replace mode is removed completely. XmlInst.exe will not work with this release. Version-independent ProgIDs, such as DOMDocument, are also removed. You'll have to "bite the bullet" and use DOMDocument.4.0 to get MSXML 4.0 functionality.

* A number of bug fixes.

* The msxml4 RTM cab file, which enables redistribution of MSXML over Internet or intranet.


Because version-independent ProgIDs existed in previous releases, but have been removed from MSXML 4.0, installing this release will make them nonfunctional. To avoid this, run the following two commands from the command line before installing this release.
        regsvr32 /u msxml4.dll
        regsvr32 msxml3.dll
 
This will restore version-independent ProgIDs to point to MSXML 3.0. It is important that you do this before installing this release.


New Features:
XML Schema Support

The latest version of Microsoft XML Core Services (MSXML 4.0) complies with the World Wide Web Consortium (W3C) 2001 XML Schema Recommendation.
Numerous features in this version provide XML Schema support. You can validate XML against XML schemas in both SAX and DOM using either an external schema cache or xsi:schemaLocation/xsi:noNamespaceSchemaLocation attributes. Although there is no XPath 2.0 yet, MSXML 4.0 provides extension functions, permitted by standards, to support handling XSD types in XPath and XSLT.

MSXML 4.0 also provides a way to reach schema information in validated documents using type discovery in SAX and the Schema Object Model (SOM) in DOM. In addition to this added support for the final XML Schema (XSD) recommendation, MSXML continues to support XML-Data Reduced (XDR) and document type definition (DTD) validations.
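As a rough illustration of DOM validation with an external schema cache, here is a minimal JScript-style sketch. The function name validateWithSchemaCache, the file names, and the "urn:books" namespace are placeholders invented for this example, not anything from Microsoft's documentation; it assumes a Windows script host where MSXML 4.0 is registered.

```javascript
// Validate an XML file against an XSD held in an external schema cache.
// Returns an object of the form { valid: Boolean, reason: String }.
// (Function name and arguments are hypothetical, for illustration only.)
function validateWithSchemaCache(xmlPath, targetNamespace, xsdPath) {
    // Load the schema into a cache, keyed by its target namespace.
    var cache = new ActiveXObject("MSXML2.XMLSchemaCache.4.0");
    cache.add(targetNamespace, xsdPath);

    var doc = new ActiveXObject("MSXML2.DOMDocument.4.0");
    doc.async = false;           // load synchronously
    doc.validateOnParse = true;  // validate while loading
    doc.schemas = cache;         // attach the external schema cache

    if (doc.load(xmlPath)) {
        return { valid: true, reason: "" };
    }
    return { valid: false, reason: doc.parseError.reason };
}

// Runs only under a host that provides ActiveXObject and WScript (e.g. cscript.exe).
if (typeof ActiveXObject !== "undefined" && typeof WScript !== "undefined") {
    var result = validateWithSchemaCache("books.xml", "urn:books", "books.xsd");
    WScript.Echo(result.valid ? "Valid" : "Invalid: " + result.reason);
}
```

Note the version-specific ProgIDs: with MSXML 4.0 there is no version-independent name that would give you the 4.0 schema cache.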

Performance Improvements:

MSXML 4.0 provides a new, faster XML parser and a substantially improved XSLT engine. You can use the new parser with DOM by setting the NewParser property to True, e.g. xmlDoc.setProperty("NewParser", true).
The new parser does not yet support asynchronous DOM load or DTD validation. However, everything else functions the same way as with the old parser, only faster. Microsoft's tests of MSXML 4.0 showed about 2x better performance for pure parsing, and more than 4x better performance for XSLT transformation. Other test claims I've seen show up to 8x better performance. My own tests are consistent with this, but I would hesitate to make such claims myself because my tests, mostly because of time constraints, have been less than "lab quality".

Extended Support For Sequential XML Processing

MSXML 4.0 provides extended support for sequential XML processing architectures based on the SAX2 API. This includes:
* Integration between the DOM and SAX parsing models
* Ability to generate HTML output
* Ability to plug a SAX content handler into the output of the XSLT processor
* Tracking of namespace declarations
You can now use the MXXMLWriter object to generate SAX events from a DOM tree. You can also build a DOM tree out of SAX events. This feature allows you to closely integrate DOM and SAX in your applications. For developers who really want to integrate complex XML processing, this opens up a whole new world of efficiencies.

A new object, MXHTMLWriter, enables you to output HTML using a stream of SAX events in much the same way that the <xsl:output> element in XSLT can generate HTML from a result tree. The new MXHTMLWriter object provides support for high-performing Active Server Pages, which can now read XML documents with a SAX reader, put those documents through custom SAX filters, and output the data to the user as an HTML page. The MXHTMLWriter object is also useful for a number of other applications such as the manual generation of HTML pages. You will also find corresponding classes in the .NET platform to do things like this, with fine-grained control over the output.
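The simplest version of this pipeline (a SAX reader feeding MXHTMLWriter directly, with no custom filters in between) might look something like the following JScript-style sketch. The function name xmlFileToHtml and the file name are placeholders of mine, and I'm assuming the writer's default string output; treat it as a starting point rather than a reference implementation.

```javascript
// Parse an XML file with the SAX reader and let MXHTMLWriter
// serialize the resulting event stream as HTML text.
// (Function name and file name are hypothetical, for illustration only.)
function xmlFileToHtml(xmlPath) {
    var reader = new ActiveXObject("MSXML2.SAXXMLReader.4.0");
    var writer = new ActiveXObject("MSXML2.MXHTMLWriter.4.0");

    // The writer acts as the reader's content handler, so every SAX
    // event raised while parsing goes straight to the HTML output.
    reader.contentHandler = writer;
    reader.parseURL(xmlPath);

    // Assumes the writer's output property is left at its default (a string).
    return writer.output;
}

// Runs only under a host that provides ActiveXObject and WScript (e.g. cscript.exe).
if (typeof ActiveXObject !== "undefined" && typeof WScript !== "undefined") {
    WScript.Echo(xmlFileToHtml("page.xml"));
}
```

In an ASP page the returned string would be written to the response instead of echoed.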

The XSLT processor can now accept the SAX content handler as output. This means that the chain of SAX filters can directly process the transformed XML. You can use this feature to eliminate XML regeneration and reparsing, allowing XML documents to be consumed immediately by an application when incoming XML documents need to be translated to the same dictionary.

The new MXNamespaceManager object allows you to programmatically track namespace declarations and resolve them, either in the current context or in the context of a particular DOM node. Of course MSXML supports namespaces and can automatically resolve names of elements and attributes, but there are many cases in which an attribute's value or an element's content uses qualified names. The MXNamespaceManager object tracks and resolves these qualified names.

Separate WinHTTP Version 5.0 Component

The former functions of the ServerHTTPRequest component are now provided by the separate WinHTTP component. This is a new server-side component that provides reliable HTTP stack functionality (it has been in a separate beta as a stand-alone component for some months, with its own Microsoft newsgroup). Without the WinHTTP component, ServerHTTPRequest and DOM/SAX in server-side mode cannot access HTTP-based data. When you install MSXML 4.0 on a computer running Windows NT, 2000, or XP, you automatically get the WinHTTP component. Windows 95, 98, and Me cannot support WinHTTP. You can still install MSXML 4.0 on Windows 98 or Windows Me, but you will have to use the default DOM/SAX mode, or the XMLHTTPRequest object, which uses WinInet.

The RTM release provides more compact, faster, and more conformant XML processing components to be used in a server-side environment with enterprise-grade systems. MSXML 4.0 can still be used on the client side in a controlled environment where you can ensure installation of the component on client machines, as in cases with Intranet or trusted site environments and applications. Now let's look at a few of the negatives (at least for some of us).

NewParser Property to Use the New Parser with DOMDocument (Transitional):

The NewParser internal property is a True/False flag that indicates whether MSXML uses the old or the new internal parser when loading DOMDocument objects.

IMPORTANT: If you want to use the new faster parser, you must explicitly set this flag to "true"!

This property is transitional for the period while MSXML provides a choice of two internal parsers. The new parser is faster and more reliable, but it does not yet support asynchronous mode or DTD validation. Once the new parser has been updated to provide for asynchronous mode and DTD validation, this property will always be set to True.

If the NewParser property is set to False, which is the current default setting, subsequent DOMDocument objects are loaded using the old parser.
If this property is set to True, subsequent DOMDocument objects are loaded using the new parser.
For example, the following code makes a DOMDocument object use the new parser when loading.
        xmldoc.setProperty("NewParser", true);


Side-by-Side Functionality and the Removal of Replace Mode:

XMLInst.exe is Gone!

Up through MSXML 3.0, you could use replace mode to make the latest MSXML component stand in for MSXML 2.0, which Internet Explorer 5.0 and 5.5 used for presenting XML when browsing. Now replace mode is completely removed from MSXML 4.0, so it cannot be substituted for MSXML 2.0 in Internet Explorer. That means that if Internet Explorer is your default program to open XML files, and you double-click an XML document, Internet Explorer will not use MSXML 4.0 to show it. MSXML 4.0 can still be used in the traditional way to manipulate XML within an HTML page using a script.

Removal of Version-Independent ProgIDs

Version-independent ProgIDs are gone. This provides true side-by-side installation, compared to previous versions in which some ProgIDs were upgraded with the installation of a new version of MSXML. Now CreateObject("MSXML2.DOMDocument") will not instantiate the MSXML 4.0 DOM, but a previous version (if it is registered). If you want to use MSXML 4.0, you must use a ProgID like this: CreateObject("MSXML2.DOMDocument.4.0"). With C++ and Visual Basic you will create "MSXML2.DOMDocument40". Similar changes will be necessary with all other MSXML objects in order to use the MSXML 4.0 version.

The reason for this change is to improve the maintainability of code which otherwise would be error-prone when unexpected changes occur in the environment. Version-independent ProgIDs were great for developers trying MSXML, but proved risky in a production environment. If a user developed code with version-independent ProgIDs, expecting MSXML 3.0 to be in place, and later installed or reinstalled SQL Server, for example, they might find that they were using MSXML 2.6 instead of MSXML 3.0. Removing version-independent ProgIDs in MSXML 4.0 kind of "bites the bullet", eliminating such instability, and improves MSXML as a server-side enterprise-grade component.

Side-by-Side Functionality

The release version of MSXML 4.0 is shipped with the same DLL names (msxml4.dll, msxml4r.dll, and msxml4a.dll) as in preview releases. With version-independent ProgIDs removed, this guarantees that MSXML 4.0 will not interfere with any versions of MSXML (2.0, 2.6, or 3.0) previously installed. If you have code that uses version-independent ProgIDs to instantiate MSXML 3.0 or 3.0 SP1, the installation of MSXML 4.0 RTM should have no effect whatsoever. Windows XP side-by-side installation does this in an even more integrated manner. With Windows XP, you can use the special side-by-side functionality to manage how your applications use MSXML and which versions (starting from MSXML 4.0) they use. You'll create a Windows XP application manifest that links your application to a specific version of MSXML 4.0.

Important Notes

If you have MSXML 4.0 Previews installed (April or July Technical Preview Release of MSXML 4.0):

Direct upgrade from previews to RTM is not supported. You will have to uninstall the preview, and after that install RTM. You might have to manually unregister and remove msxml4*.dll files from your system32 directory. To unregister the MSXML 4.0 preview, run:

regsvr32 /u msxml4.dll

If you have the April Technical Preview Release of MSXML 4.0 installed:

Note that version-independent ProgIDs have been removed from MSXML 4.0 (despite having existed in the April release), so installing this release will make them non-functional. This might seriously affect a number of applications (such as Microsoft Visual Studio® .NET setup) that use MSXML 3.0. To avoid this problem, run the following two commands from the command line and delete msxml4*.dll files from the system32 directory before installing this release.

regsvr32 /u msxml4.dll
regsvr32 msxml3.dll

Note that after unregistration you might have to manually delete msxml4*.dll files from your system32 directory.

Some Final Comments:

MSXML 4.0 RTM represents what I believe is the final stage in the evolution of Microsoft's COM-compliant XML technologies outside of ".NET". Developers and organizations who want to be able to upgrade their code base would be well advised to use application-wide global constants for the instantiation of version-specific ProgIDs. In this manner an entire application's source code base can be upgraded by simply doing a search and replace of, for example, strMSXMLVersionNum=".3.0" and changing the "3" to a "4". In addition, it is possible to guard the temporary NewParser property in code with an "if" test such as: if(strMSXMLVersionNum==".4.0") xmlDoc.setProperty("NewParser", true).
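One way to put that advice into practice is sketched below in JScript-style code. The constant MSXML_VERSION_SUFFIX and the helpers msxmlProgId and createDomDocument are names I made up for this illustration; the only assumption about MSXML itself is the "MSXML2.<class>.<version>" ProgID pattern and the NewParser property described above.

```javascript
// The one place in the application where the MSXML version is chosen.
// Change ".4.0" to ".3.0" here to move the whole code base back to MSXML 3.0.
var MSXML_VERSION_SUFFIX = ".4.0";

// Build a version-specific ProgID, e.g. "MSXML2.DOMDocument.4.0".
function msxmlProgId(className) {
    return "MSXML2." + className + MSXML_VERSION_SUFFIX;
}

// Create a DOM document for the configured version. The transitional
// NewParser flag is only set when running against MSXML 4.0.
function createDomDocument() {
    var doc = new ActiveXObject(msxmlProgId("DOMDocument"));
    if (MSXML_VERSION_SUFFIX === ".4.0") {
        doc.setProperty("NewParser", true);
    }
    return doc;
}
```

Every other object (XSLTemplate, SAXXMLReader, and so on) can go through msxmlProgId in the same way, so an upgrade becomes a one-line change rather than a search and replace.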

However, conversion may not be that smooth. Developers should be ready to run into additional problems, most of which will revolve around the fact that older, non-standards-compliant code in XPath-type statements, replacement-type variables such as "$any$", and other previously acceptable constructs will no longer work under MSXML 4.0 RTM. It's time to meet the W3C and bring the company's code base up to standards if you wanna play, and unfortunately it may not be a picnic.

Peter Bromberg is an independent consultant specializing in distributed .NET solutions, a Senior Programmer/Analyst in Orlando, and a co-developer of the NullSkull.com developer website. He can be reached at info@eggheadcafe.com