Learning and Using Jakarta Digester

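This article describes how to use the Apache Jakarta Commons Digester framework to turn XML documents into Java bean objects. The framework grew out of the Jakarta Struts Web toolkit and processes XML documents according to patterns and rules. The article presents an example document and bean classes, shows how to configure the rules at run time with the xmlrules package, and closes with a brief summary of the framework.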

by Philipp K. Janert, Ph.D.
10/23/2002
Turning an XML document into a corresponding hierarchy of Java bean objects is a fairly common task. In a previous article, I described how to accomplish this using the standard SAX and DOM APIs.

Although powerful and flexible, both APIs are, in effect, too low-level for the specific task at hand. Furthermore, the unmarshalling procedure itself requires a fair amount of coding: a parse stack must be maintained when using SAX, and the DOM tree must be navigated when using DOM.

This is where the Apache Jakarta Commons Digester framework comes in.

The Jakarta Digester Framework
The Jakarta Digester framework grew out of the Jakarta Struts Web toolkit. Originally developed to process the central struts-config.xml configuration file, the framework was soon recognized as more generally useful and was moved to the Jakarta Commons project, the stated goal of which is to provide a "repository of reusable Java components." The most recent version, Digester 1.3, was released on August 13, 2002.

The Digester class lets the application programmer specify a set of actions to be performed whenever the parser encounters certain simple patterns in the XML document. The Digester framework comes with 10 prepackaged "rules," which cover most of the required tasks when unmarshalling XML (such as creating a bean or setting a bean property), but each user is free to define and implement his or her own rules, as necessary.

The Example Document and Beans
In this example, we will unmarshal the same XML document that we used in the previous article:

<?xml version="1.0"?>

<catalog library="somewhere">

   <book>
      <author>Author 1</author>
      <title>Title 1</title>
   </book>

   <book>
      <author>Author 2</author>
      <title>His One Book</title>
   </book>

   <magazine>
      <name>Mag Title 1</name>

      <article page="5">
         <headline>Some Headline</headline>
      </article>

      <article page="9">
         <headline>Another Headline</headline>
      </article>
   </magazine>

   <book>
      <author>Author 2</author>
      <title>His Other Book</title>
   </book>

   <magazine>
      <name>Mag Title 2</name>

      <article page="17">
         <headline>Second Headline</headline>
      </article>
   </magazine>

</catalog>
The bean classes are also the same, except for one important change: In the previous article, I had declared these classes to have package scope -- primarily so that I could define all of them in the same source file! Using the Digester framework, this is no longer possible; the classes need to be declared as public (as is required for classes conforming to the JavaBeans specification):

import java.util.Vector;

public class Catalog {
   private Vector books;
   private Vector magazines;

   public Catalog() {
      books = new Vector();
      magazines = new Vector();
   }

   public void addBook( Book rhs ) {
      books.addElement( rhs );
   }
   public void addMagazine( Magazine rhs ) {
      magazines.addElement( rhs );
   }

   public String toString() {
      String newline = System.getProperty( "line.separator" );
      StringBuffer buf = new StringBuffer();

      buf.append( "--- Books ---" ).append( newline );
      for( int i=0; i<books.size(); i++ ){
         buf.append( books.elementAt(i) ).append( newline );
      }

      buf.append( "--- Magazines ---" ).append( newline );
      for( int i=0; i<magazines.size(); i++ ){
         buf.append( magazines.elementAt(i) ).append( newline );
      }

      return buf.toString();
   }
}

--------------------------------------------------------------------------------

public class Book {
   private String author;
   private String title;

   public Book() {}

   public void setAuthor( String rhs ) { author = rhs; }
   public void setTitle(  String rhs ) { title  = rhs; }

   public String toString() {
      return "Book: Author='" + author + "' Title='" + title + "'";
   }
}

--------------------------------------------------------------------------------

import java.util.Vector;

public class Magazine {
   private String name;
   private Vector articles;

   public Magazine() {
      articles = new Vector();
   }

   public void setName( String rhs ) { name = rhs; }

   public void addArticle( Article a ) {
      articles.addElement( a );
   }

   public String toString() {
      StringBuffer buf = new StringBuffer( "Magazine: Name='" + name + "' ");
      for( int i=0; i<articles.size(); i++ ){
         buf.append( articles.elementAt(i).toString() );
      }
      return buf.toString();
   }
}

--------------------------------------------------------------------------------

public class Article {
   private String headline;
   private String page;

   public Article() {}

   public void setHeadline( String rhs ) { headline = rhs; }
   public void setPage(     String rhs ) { page     = rhs; }

   public String toString() {
      return "Article: Headline='" + headline + "' on page='" + page + "' ";
   }
}

 


Specifying Patterns and Rules
The Digester class processes the input XML document based on patterns and rules. The patterns must match XML elements, based on their name and location in the document tree. The syntax used to describe the matching patterns loosely resembles XPath match patterns: the pattern catalog matches the top-level <catalog> element, the pattern catalog/book matches a <book> element nested directly inside a <catalog> element, but nowhere else in the document, etc.


 

All patterns are absolute: the entire path from the root element on down has to be specified. The only exceptions are patterns containing the wildcard operator *: the pattern */name will match a <name> element anywhere in the document. Also note that there is no need for a special designation for the root element, since all paths are absolute.
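
The matching behavior described above can be sketched in a few lines of plain Java. This is only an illustration of the two cases mentioned in the text (absolute patterns and a leading */ wildcard); the class and method names are made up for this sketch, and Digester's own rule lookup handles more than this, for example choosing among several candidate patterns.

```java
public class PatternMatchSketch {

    /**
     * Returns true if rulePattern matches elementPath, where elementPath is the
     * slash-separated list of element names from the root down to the current element.
     */
    static boolean matches(String rulePattern, String elementPath) {
        if (rulePattern.startsWith("*/")) {
            // Wildcard: only the tail of the path has to match.
            String tail = rulePattern.substring(2);
            return elementPath.equals(tail) || elementPath.endsWith("/" + tail);
        }
        // Absolute: the whole path from the root has to match.
        return rulePattern.equals(elementPath);
    }

    public static void main(String[] args) {
        System.out.println(matches("catalog/book", "catalog/book"));          // true
        System.out.println(matches("catalog/book", "catalog/magazine/book")); // false
        System.out.println(matches("*/name", "catalog/magazine/name"));       // true
        System.out.println(matches("name", "catalog/magazine/name"));         // false
    }
}
```

The last call shows why no root marker is needed: a pattern without a wildcard is always compared against the full path.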


Whenever the Digester encounters one of the specified patterns, it performs the actions that have been associated with it. In this, the Digester framework is of course related to a SAX parser (and in fact, the Digester class implements org.xml.sax.ContentHandler and maintains the parse stack). All rules to be used with the Digester must extend org.apache.commons.digester.Rule -- which in itself exposes methods similar to the SAX ContentHandler callbacks: begin() and end() are called when the opening and closing tags of the matched element are encountered.

The body() method is called with the character content nested inside of the matched element, and finally, there is a finish() method, which is called once parsing is complete, to provide a hook for any final clean-up chores. Most application developers will not have to concern themselves with these functions, however, since the standard rules that ship with the framework are likely to provide all desired functionality.
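
To make the relationship to SAX concrete, here is a minimal sketch of a content handler that tracks the current element path and dispatches to callbacks registered per pattern. It uses only the JDK; MiniRule and RuleDispatchSketch are hypothetical names invented for this illustration and are not the real Rule or Digester classes, and the order in which body and end are fired here is an assumption of the sketch.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class RuleDispatchSketch {

    /** Hypothetical stand-in for a rule; not org.apache.commons.digester.Rule. */
    interface MiniRule {
        void begin(Attributes attrs);
        void body(String text);
        void end();
    }

    /** Parses xml and returns the callbacks fired for the given patterns, in order. */
    static List<String> trace(String xml, String... patterns) {
        final List<String> log = new ArrayList<>();
        final Map<String, MiniRule> rules = new HashMap<>();
        for (final String p : patterns) {
            rules.put(p, new MiniRule() {
                public void begin(Attributes attrs) { log.add("begin " + p); }
                public void body(String text)       { log.add("body " + p + " '" + text + "'"); }
                public void end()                   { log.add("end " + p); }
            });
        }

        DefaultHandler handler = new DefaultHandler() {
            private final Deque<String> path = new ArrayDeque<>();        // element names, root first
            private final Deque<StringBuilder> text = new ArrayDeque<>(); // one buffer per open element

            @Override
            public void startElement(String uri, String local, String qName, Attributes attrs) {
                path.addLast(qName);
                text.addLast(new StringBuilder());
                MiniRule rule = rules.get(String.join("/", path));
                if (rule != null) rule.begin(attrs);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.getLast().append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                String content = text.removeLast().toString().trim();
                MiniRule rule = rules.get(String.join("/", path));
                if (rule != null) {
                    rule.body(content); // character data first ...
                    rule.end();         // ... then the closing-tag callback
                }
                path.removeLast();
            }
        };

        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
        return log;
    }

    public static void main(String[] args) {
        String xml = "<magazine><article page=\"5\">"
                   + "<headline>Some Headline</headline></article></magazine>";
        for (String line : trace(xml, "magazine/article", "magazine/article/headline")) {
            System.out.println(line);
        }
    }
}
```

This is the bookkeeping that has to be written by hand with raw SAX, and that the Digester takes over.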

To unmarshal a document, then, create an instance of the org.apache.commons.digester.Digester class, configure it if necessary, specify the required patterns and rules, and finally, pass a reference to the XML file to the parse() method. This is demonstrated in the DigesterDriver class below. (The filename of the input XML document must be specified on the command line.)

import org.apache.commons.digester.*;

import java.io.*;
import java.util.*;

public class DigesterDriver {

   public static void main( String[] args ) {

      try {
         Digester digester = new Digester();
         digester.setValidating( false );

         // <catalog>: create the root bean
         digester.addObjectCreate( "catalog", Catalog.class );

         // <book>: create a Book, fill it from its child elements, add it to the Catalog
         digester.addObjectCreate( "catalog/book", Book.class );
         digester.addBeanPropertySetter( "catalog/book/author", "author" );
         digester.addBeanPropertySetter( "catalog/book/title", "title" );
         digester.addSetNext( "catalog/book", "addBook" );

         // <magazine>: create a Magazine and set its name from the <name> element
         digester.addObjectCreate( "catalog/magazine", Magazine.class );
         digester.addBeanPropertySetter( "catalog/magazine/name", "name" );

         // <article>: page comes from an attribute, headline from a child element
         digester.addObjectCreate( "catalog/magazine/article", Article.class );
         digester.addSetProperties( "catalog/magazine/article", "page", "page" );
         digester.addBeanPropertySetter( "catalog/magazine/article/headline" );
         digester.addSetNext( "catalog/magazine/article", "addArticle" );

         // add the finished Magazine to the Catalog
         digester.addSetNext( "catalog/magazine", "addMagazine" );

         File input = new File( args[0] );
         Catalog c = (Catalog)digester.parse( input );

         System.out.println( c.toString() );

      } catch( Exception exc ) {
         exc.printStackTrace();
      }
   }
}
After instantiating the Digester, we specify that it should not validate the XML document against a DTD -- because we did not define one for our simple Catalog document. Then we specify the patterns and the associated rules: the ObjectCreateRule creates an instance of the specified class and pushes it onto the parse stack. The SetPropertiesRule sets a bean property to the value of an XML attribute of the current element -- following the pattern, the first argument is the name of the attribute, and the second is the name of the property.

Whereas SetPropertiesRule takes the value from an attribute, BeanPropertySetterRule takes the value from the raw character data nested inside of the current element. It is not necessary to specify the name of the property to set when using BeanPropertySetterRule: it defaults to the name of the current XML element. In the example above, this default is being used in the rule definition matching the catalog/magazine/article/headline pattern. Finally, the SetNextRule passes the object on top of the parse stack to the named method on the object below it -- it is commonly used to insert a finished bean into its parent. The finished bean itself is removed from the stack by the ObjectCreateRule that pushed it, once the element completes.

Note that it is possible to register several rules for the same pattern. If this occurs, the rules are executed in the order in which they are added to the Digester when the opening tag is encountered, and in reverse order at the closing tag -- for instance, to deal with the <article> element, found at catalog/magazine/article, we first create the appropriate article bean, then set the page property, and finally, at the closing tag, insert the completed article bean into its magazine parent.
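
The effect of these rules on the parse stack for a single <article> element can be walked through by hand. The sketch below is a plain-Java simulation, not Digester code: the class names are stand-ins, the beans are simplified, and the closing-tag steps are collapsed into their net effect (the child is handed to its parent and leaves the stack).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class ParseStackSketch {

    // Simplified stand-ins for the Article and Magazine beans.
    static class Article {
        String page;
        String headline;
    }

    static class Magazine {
        final List<Article> articles = new ArrayList<>();
        void addArticle(Article a) { articles.add(a); }
    }

    /** Replays, step by step, what the rules do for one <article> inside a <magazine>. */
    static Magazine simulate() {
        Deque<Object> stack = new ArrayDeque<>();

        // <magazine>: the object-create rule pushes the parent bean.
        Magazine magazine = new Magazine();
        stack.push(magazine);

        // <article page="5">: object-create pushes the child, then set-properties fills it.
        stack.push(new Article());
        ((Article) stack.peek()).page = "5";

        // <headline>Some Headline</headline>: the property setter acts on the top of the stack.
        ((Article) stack.peek()).headline = "Some Headline";

        // </article>: net effect of set-next plus the end of object-create.
        Article finished = (Article) stack.pop();
        ((Magazine) stack.peek()).addArticle(finished);

        return magazine;
    }

    public static void main(String[] args) {
        Magazine m = simulate();
        Article a = m.articles.get(0);
        System.out.println(a.headline + " on page " + a.page);
    }
}
```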

Invoking Arbitrary Functions
It is possible not only to set bean properties, but also to invoke arbitrary methods on objects in the stack. This is accomplished using the CallMethodRule to specify the method name and, optionally, the number and type of arguments passed to it. Subsequent specifications of the CallParamRule define the parameter values to be passed to the invoked functions. The values can be taken either from named attributes of the current XML element, or from the raw character data contained by the current element. For instance, rather than using the BeanPropertySetterRule in the DigesterDriver implementation above, we could have achieved the same effect by calling the property setter explicitly, and passing the data as a parameter:

   digester.addCallMethod( "catalog/book/author", "setAuthor", 1 );
   digester.addCallParam( "catalog/book/author", 0 );
The first line gives the name of the method to call (setAuthor()), and the expected number of parameters (1). The second line says to take the value of the function parameter from the character data contained in the <author> element and pass it as the first element in the array of arguments (i.e., the array element with index 0). Had we also specified an attribute name (e.g., digester.addCallParam( "catalog/book/author", 0, "author" );), the value would have been taken from the respective attribute of the current element instead.

One important caveat: confusingly, digester.addCallMethod( "pattern", "methodName", 0 ); does not specify a call to a method taking no arguments -- instead, it specifies a call to a method taking one argument, the value of which is taken from the character data of the current XML element! We therefore have yet another way to implement a replacement for BeanPropertySetterRule:

   digester.addCallMethod( "catalog/book/author", "setAuthor", 0 );

To call a method that truly takes no parameters, use digester.addCallMethod( "pattern", "methodName" );.
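
What such a method call amounts to can be sketched with plain reflection. The helper below is an assumption-laden illustration, not how CallMethodRule is implemented: it only handles String parameters, and the class and method names are invented for the sketch. It shows the idea of collecting parameter values first and then invoking a named method on the bean.

```java
import java.lang.reflect.Method;
import java.util.Arrays;

public class CallMethodSketch {

    /** Minimal bean with one setter, mirroring the Book bean from the example. */
    public static class Book {
        private String author;
        public void setAuthor(String rhs) { author = rhs; }
        public String getAuthor() { return author; }
    }

    /** Looks up a public method taking only String arguments and invokes it. */
    static void call(Object target, String methodName, String... params) {
        try {
            Class<?>[] types = new Class<?>[params.length];
            Arrays.fill(types, String.class);
            Method method = target.getClass().getMethod(methodName, types);
            method.invoke(target, (Object[]) params);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot call " + methodName, e);
        }
    }

    public static void main(String[] args) {
        Book book = new Book();
        // The single argument plays the role of the character data of <author>.
        call(book, "setAuthor", "Author 1");
        System.out.println(book.getAuthor()); // Author 1
    }
}
```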

Summary of Standard Rules


 

Below are brief descriptions of all of the standard rules.

Creational
ObjectCreateRule: Creates an object of the specified class using its default constructor and pushes it onto the stack; it is popped when the element completes. The class to instantiate can be given through a class object or the fully-qualified class name.

FactoryCreateRule: Creates an object using a specified factory class and pushes it onto the stack. This can be useful for classes that do not provide a default constructor. The factory class must implement the org.apache.commons.digester.ObjectCreationFactory interface.

Property Setters
SetPropertiesRule: Sets one or several named properties on the bean on top of the stack using the values of named XML element attributes. Attribute names and property names are passed to this rule in String[] arrays. (Typically used to handle XML constructs like <article page="10">.)

BeanPropertySetterRule: Sets a named property on the bean on top of the stack to the character data enclosed by the current XML element. (Example: <page>10</page>.)

SetPropertyRule: Sets a property on the bean on top of the stack. Both the property name and the value to which this property will be set are given as attributes to the current XML element. (Example: <article key="page" value="10" />.)

Parent/Child Management
SetNextRule: Passes the object on top of the stack to a named method on the object immediately below it. Typically used to insert a completed bean into its parent.

SetTopRule: Passes the second-to-top object on the stack to a named method on the object on top of the stack. This is useful if the child object exposes a setParent method, rather than the other way around.

SetRootRule: Calls a method on the object at the bottom of the stack, passing the object on top of the stack as argument.

Arbitrary Method Calls
CallMethodRule: Calls an arbitrary named method on the bean on top of the stack. The method may take an arbitrary set of parameters. The values of the parameters are given by subsequent applications of the CallParamRule.

CallParamRule: Represents the value of a method parameter. The value of the parameter is either taken from a named XML element attribute, or from the raw character data enclosed by the current element. This rule requires that its position in the parameter list be specified by an integer index.

Specifying Rules in XML: Using the xmlrules Package

So far, we have specified the patterns and rules programmatically at compile time. While conceptually simple and straightforward, this feels a bit odd: the entire framework is about recognizing and handling structure and data at run time, but here we go fixing the behavior at compile time! Large numbers of fixed strings in source code typically indicate that something is being configured (rather than programmed), which could be (and probably should be) done at run time instead.

The org.apache.commons.digester.xmlrules package addresses this issue. It provides the DigesterLoader class, which reads the pattern/rule pairs from an XML document and returns a digester already configured accordingly. The XML document configuring the Digester must comply with the digester-rules.dtd, which is part of the xmlrules package.

Below are the contents of the configuration file (named rules.xml) for the example application. I want to point out several things here.

Patterns can be specified in two different ways: either as attributes to each XML element representing a rule, or using the <pattern> element. The pattern defined by the latter is valid for all contained rule elements. Both ways can be mixed, and <pattern> elements can be nested -- in either case, the pattern defined by the child element is appended to the pattern defined in the enclosing <pattern> element.

The <alias> element is used with the <set-properties-rule> to map an XML attribute to a bean property.

Finally, using the current release of the Digester package, it is not possible to specify the BeanPropertySetterRule in the configuration file. Instead, we are using the CallMethodRule to achieve the same effect, as explained above.

<?xml version="1.0"?>

<digester-rules>
   <object-create-rule pattern="catalog" classname="Catalog" />
   <set-properties-rule pattern="catalog" >
      <alias attr-name="library" prop-name="library" />
   </set-properties-rule>

   <pattern value="catalog/book">
      <object-create-rule classname="Book" />
      <call-method-rule pattern="author" methodname="setAuthor"
                 paramcount="0" />
      <call-method-rule pattern="title" methodname="setTitle"
                 paramcount="0" />
      <set-next-rule methodname="addBook" />
   </pattern>

   <pattern value="catalog/magazine">
      <object-create-rule classname="Magazine" />

      <call-method-rule pattern="name" methodname="setName" paramcount="0" />

      <pattern value="article">
         <object-create-rule classname="Article" />
         <set-properties-rule>
            <alias attr-name="page" prop-name="page" />
         </set-properties-rule>   
         <call-method-rule pattern="headline" methodname="setHeadline"
             paramcount="0" />
         <set-next-rule methodname="addArticle" />
      </pattern>

      <set-next-rule methodname="addMagazine" />
   </pattern>
</digester-rules>
Since all the actual work has now been delegated to the Digester and DigesterLoader classes, the driver class itself becomes trivially simple. To run it, specify the catalog document as the first command line argument, and the rules.xml file as the second. (Confusingly, the DigesterLoader will not read the rules.xml file from a File or an org.xml.sax.InputSource, but requires a URL -- the File reference in the code below is therefore transformed into an equivalent URL.)

import org.apache.commons.digester.*;
import org.apache.commons.digester.xmlrules.*;

import java.io.*;
import java.util.*;

public class XmlRulesDriver {
   public static void main( String[] args ) {
      try {

         File input = new File( args[0] );
         File rules = new File( args[1] );

         Digester digester = DigesterLoader.createDigester( rules.toURL() );

         Catalog catalog = (Catalog)digester.parse( input );
         System.out.println( catalog.toString() );
 
      } catch( Exception exc ) {
         exc.printStackTrace();
      }
   }
}
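
A side note on the File-to-URL conversion used in the driver: File.toURL() was deprecated in later Java releases because it does not escape characters that are illegal in URLs, such as spaces. The sketch below shows the usual replacement, going through a URI first; the class name and the file name are hypothetical and only serve the illustration.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;

public class FileToUrlSketch {

    /** Converts a file reference to a URL, escaping characters such as spaces. */
    static URL toUrl(File file) {
        try {
            return file.toURI().toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("not convertible to a URL: " + file, e);
        }
    }

    public static void main(String[] args) {
        // The file does not need to exist for the conversion to work.
        System.out.println(toUrl(new File("my rules.xml")));
    }
}
```

The resulting URL could be passed to DigesterLoader.createDigester() in place of rules.toURL().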
Conclusion
This concludes our brief overview of the Jakarta Commons Digester package. Of course, there is more. One topic ignored in this introduction is XML namespaces: Digester allows you to specify rules that only act on elements defined within a certain namespace.

We mentioned briefly the possibility of developing custom rules, by extending the Rule class. The Digester class exposes the customary push(), peek(), and pop() methods, giving the individual developer freedom to manipulate the parse stack directly.

Lastly, note that there is an additional package providing a Digester implementation which deals with RSS (Rich Site Summary)-formatted newsfeeds. The Javadoc tells the full story.

References
Jakarta Commons Digester Homepage
"Simple XML Parsing with SAX and DOM"
Jakarta Struts Homepage
"Java & XML Data Binding" -- XML Data Binding addresses the general problem of making XML data available in applications.
Programming Jakarta Struts -- the upcoming book on Jakarta Struts at O'Reilly.
Philipp K. Janert, Ph.D. is a Software Project Consultant, server programmer, and architect.

 
