jakarta-bcel-dev mailing list archives

From jvan...@apache.org
Subject cvs commit: jakarta-bcel/xdocs/stylesheets project.xml
Date Fri, 09 Nov 2001 23:25:10 GMT
jvanzyl     01/11/09 15:25:10

  Modified:    xdocs/stylesheets project.xml
  Added:       xdocs    manual.xml
  Log:
  - manual converted to anakia format, some further formatting is required
    and the eps images still need to be converted.
  
  Revision  Changes    Path
  1.1                  jakarta-bcel/xdocs/manual.xml
  
  Index: manual.xml
  ===================================================================
  <?xml version="1.0"?>
  <document>
  
    <properties>
      <author email="markus.dahm@inf.fu-berlin.de">Markus Dahm</author>
      <title>Byte Code Engineering Library (BCEL) v1</title>
    </properties>
  
    <body>
  
    <section name="Abstract">
    <p>
      Extensions  and improvements of the  programming language Java and its
      related  execution environment (Java   Virtual Machine,  JVM)  are the
      subject  of a large number  of research  projects and proposals. There
      are  projects, for  instance, to add   parameterized types to Java, to
      implement ``Aspect-Oriented Programming'', to perform sophisticated
      static analysis, and to improve the run-time performance.
    </p>
  
    <p>
      Since Java classes are compiled into portable binary class files
      (called <em>byte code</em>), the most convenient and
      platform-independent way to implement these improvements is not to
      write a new compiler or change the JVM, but to transform the byte
      code. These transformations can be performed either after
      compile-time or at load-time. Many programmers do this by
      implementing their own specialized byte code manipulation tools, which
      are, however, of limited re-usability.
    </p>
  
    <p>
      To deal with the necessary class file transformations, we introduce an
      API that helps developers   to  conveniently   implement   their
      transformations.
    </p>
    </section>
  
    <section name="Introduction">
    <p>
      The  Java language  [] has  become very  popular  and many
      research projects  deal with further  improvements of the  language or
      its run-time behavior.  The possibility  to extend a language with new
      concepts  is surely  a  desirable feature,  but implementation  issues
      should be hidden from the user.  Fortunately, the concepts of the Java Virtual Machine 
      permit  the user-transparent  implementation of  such  extensions with
      relatively little effort.
    </p>
  
    <p>
      Because the target language of  Java is an interpreted language with a
      small  and  easy-to-understand  set  of instructions  (the  <em>byte
      code</em>), developers  can implement  and test their  concepts in  a very
      elegant way.   One can  write a plug-in  replacement for  the system's
      class loader which is  responsible for dynamically loading class files
      at  run-time  and  passing the  byte  code  to  the Virtual Machine (see  section
      ).  Class loaders may  thus be used to intercept
      the loading process and transform classes before they are actually
      executed by the JVM [].  While the original class
      files always remain unaltered, the behavior of the class loader may be
      reconfigured for every execution or instrumented dynamically.
    </p>
    
    <p>
      The <font face="helvetica">BCEL </font>API (Byte Code Engineering Library), formerly known as
      JavaClass, is a toolkit for the static analysis and dynamic creation
      or transformation of Java class files.  It enables developers to
      implement the desired features on a high level of abstraction without
      handling all the internal details of the Java class file format and
      thus re-inventing the wheel every time.  <font face="helvetica">BCEL </font>is written entirely in
      Java and freely available under the terms of the Apache Software
      License.
    </p>
  
    <p>
      This paper is structured as follows:  We give a  brief description of
      the Java Virtual Machine and the class  file format in section .  Section
      introduces  the <font face="helvetica">BCEL </font>API.   Section 
      describes  some typical  application areas  and example  projects. The
      appendix contains code examples that are too long to be presented in
      the main part of this paper. All examples are included in the
      downloadable distribution.
    </p>
  
    </section>
  
    <section name="Related work">
    <p>
      There are  a number  of proposals and  class libraries that  have some
      similarities with  BCEL: The JOIE [] toolkit can
      be used to instrument class loaders with dynamic behavior.  Similarly,
      ``Binary  Component Adaptation''  [] allows  components  to be
      adapted and  evolved on-the-fly.  Han  Lee's ``Byte-code Instrumenting
      Tool'' [] allows the user  to insert calls to analysis methods
      anywhere in the  byte code.  The Jasmin language  [] can be
      used  to   hand-write  or  generate   pseudo-assembler  code.   D-Java
      [] and JCF [] are class viewing tools.
    </p>
  
    <p>
      In contrast to these projects, <font face="helvetica">BCEL </font>is intended 
      to be a general purpose
      tool  for ``byte  code engineering''.   It gives  full control  to the
      developer on a high level of  abstraction and is not restricted to any
      particular application area.
    </p>
    </section>
  
    <section name="The Java Virtual Machine">
    <p>
      Readers already familiar with the Java Virtual Machine and the Java class file format
      may want to skip this section and proceed with section .
    </p>
  
    <p>
      Programs written  in the  Java language are  compiled into  a portable
      binary format called <em>byte  code</em>.  Every class is represented by
      a  single class  file  containing  class related  data  and byte  code
      instructions. These  files are loaded dynamically  into an interpreter
      (Java Virtual Machine, JVM) and executed.
    </p>
  
    <p>
      Figure 1 illustrates the procedure of compiling and
      executing a Java class: The source file (<tt>HelloWorld.java</tt>) is
      compiled into a Java class file (<tt>HelloWorld.class</tt>), loaded by
      the byte code interpreter and executed. In order to implement
      additional features, researchers may want to transform class files
      (drawn with bold lines) before they are actually executed. This
      application area is one of the main issues of this article.
    </p>
    
    <p>
    <img src="images/jvm.gif"/>
    <br/>
    Figure 1: Compilation and execution of Java classes
    </p>
      
    <p>
      Note that the general term ``Java'' has two meanings: on the one
      hand, the Java programming language, and on the other hand, the Java
      Virtual Machine, which is not targeted exclusively by the Java
      language but may be used by other languages as well (e.g. Eiffel []
      or Ada []).  We assume the
      reader to  be familiar with  the Java language  and to have  a general
      understanding of the Virtual Machine.
    </p>
  
    </section>
  
    <section name="2.1 Java class file format">
    <p>
      Giving a  full overview of  the design issues  of the Java  class file
      format and the  associated byte code instructions is  beyond the scope
      of this paper.   We will just give a  brief introduction covering the
      details  that  are  necessary  for  understanding  the  rest  of  this
      paper. The format of class files and the byte code instruction set are
      described in more detail in the ``Java Virtual Machine Specification'' []
      and in []. In particular, we will not deal with the security
      constraints that the Java Virtual Machine has to check at run-time, i.e. the byte code
      verifier.
    </p>
  
    <p>
      Figure 2 shows a simplified example of the contents
      of a  Java class file:  It starts with  a header containing  a ``magic
      number'' (<tt>0xCAFEBABE</tt>) and the version number, followed by the
      <em>constant pool</em>, which can be roughly thought of as the text segment of an
      executable, the  <em>access rights</em>  of the class  encoded by  a bit
      mask, a list of interfaces  implemented by the class, lists containing
      the  fields and  methods of  the  class, and  finally the  <em>class
      attributes</em>, e.g.  the  <tt>SourceFile</tt> attribute telling the name
      of  the source  file.  Attributes  are  a way  of putting  additional,
      e.g. user-defined,  information into class file  data structures.  For
      example, a  custom class  loader may evaluate  such attribute  data in
      order to perform its  transformations.  The JVM specification declares
      that unknown, i.e.  user-defined attributes must be ignored by any Virtual Machine 
      implementation.
    </p>
  
    <p>
    <img src="images/classfile.gif"/>
    <br/>
    Figure 2: Java class file format
    </p>    
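
    <p>
      The start of this layout can be read with a few lines of plain Java.
      The following is a minimal sketch, not part of the BCEL API: the class
      name <tt>ClassFileHeader</tt> and the hand-written sample bytes are
      assumptions made for illustration. It reads the magic number, the two
      version fields, and the constant pool count in the order the JVM
      specification prescribes.
    </p>

```java
import java.nio.ByteBuffer;

/** Reads the fixed-size fields at the very start of a class file. */
final class ClassFileHeader {
    final int magic;
    final int minorVersion;
    final int majorVersion;
    final int constantPoolCount;

    private ClassFileHeader(ByteBuffer in) {
        // ByteBuffer is big-endian by default, as is the class file format.
        magic = in.getInt();                                     // u4 magic
        minorVersion = Short.toUnsignedInt(in.getShort());       // u2 minor_version
        majorVersion = Short.toUnsignedInt(in.getShort());       // u2 major_version
        constantPoolCount = Short.toUnsignedInt(in.getShort());  // u2 constant_pool_count
    }

    /** Parses the header and rejects input that lacks the magic number. */
    static ClassFileHeader parse(byte[] classFile) {
        ClassFileHeader header = new ClassFileHeader(ByteBuffer.wrap(classFile));
        if (header.magic != 0xCAFEBABE) {
            throw new IllegalArgumentException("not a Java class file");
        }
        return header;
    }

    public static void main(String[] args) {
        // Hand-written sample bytes: magic, minor 3, major 45, pool count 16.
        byte[] sample = {
            (byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 3, 0, 45, 0, 16
        };
        ClassFileHeader header = parse(sample);
        System.out.println("version " + header.majorVersion + "." + header.minorVersion
            + ", constant_pool_count " + header.constantPoolCount);
    }
}
```

    <p>
      Everything after these ten bytes (the constant pool entries, access
      flags, interfaces, fields, methods, and attributes) has a variable
      length and is deliberately left out of the sketch.
    </p>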
  
    <p>
      Because all of the information needed to dynamically resolve the
      symbolic references to classes, fields and methods at run-time is
      coded with string constants, the constant pool in fact makes up the largest
      portion of an average class file, approximately 60% [].
      The byte code instructions themselves make up just 12%.
    </p>
    
    <p>
      The right upper box shows a  ``zoomed'' excerpt of the constant pool, while the
      rounded box below depicts  some instructions that are contained within
      a  method  of the  example  class.  These  instructions represent  the
      straightforward translation of the well-known statement:
    </p>
    
    <source>
    System.out.println("Hello, world");
    </source>
  
    <p>
      The first instruction loads the contents of the field <tt>out</tt> of
      class <tt>java.lang.System</tt> onto the operand stack. This is an
      instance of the class <tt>java.io.PrintStream</tt>. The <tt>ldc</tt>
      (``load constant'') instruction pushes a reference to the string "Hello, world" onto
      the stack. The next instruction invokes the instance method
      <tt>println</tt>, which takes both values as parameters (instance
      methods always implicitly take an instance reference as their first
      argument).
    </p>
    
    <p>
      Instructions,  other  data  structures   within  the  class  file  and
      constants  themselves  may  refer  to  constants  in  the  constant pool.  Such
      references are implemented via fixed indexes encoded directly into the
      instructions.   This  is illustrated  for  some  items  of the  figure
      emphasized    with   a    surrounding   box.
    </p>
    
    <p>
      For  example,  the  <tt>invokevirtual</tt>  instruction  refers  to  a
      <tt>Methodref</tt> constant that contains information about the name
      of the called method, the signature (i.e. the encoded argument and
      return types), and to which class the method belongs. In fact, as
      emphasized by the boxed value, the <tt>Methodref</tt> constant itself
      just refers to other entries holding the real data, e.g.  it refers to
      a <tt>ConstantClass</tt> entry containing  a symbolic reference to the
      class <tt>java.io.PrintStream</tt>.   To keep the  class file compact,
      such  constants  are   typically  shared  by  different  instructions.
      Similarly, a field is represented by a <tt>Fieldref</tt> constant that
      includes information about the name, the type and the containing class
      of the field.
    </p>
  
    <p>
      The constant pool basically holds the following types  of constants: References
      to methods, fields and  classes, strings, integers, floats, longs, and
      doubles.
    </p>
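
    <p>
      The chain of index references described above can be modelled with a
      toy constant pool. This is only a sketch under simplifying assumptions:
      the entry classes (<tt>Utf8</tt>, <tt>ClassRef</tt>,
      <tt>NameAndType</tt>, <tt>Methodref</tt>) and the sample pool are
      hypothetical stand-ins, not BCEL classes, and real class files store
      these entries as tagged binary records.
    </p>

```java
/** A toy constant pool that mirrors the index-based references of class files. */
final class ConstantPoolSketch {
    static final class Utf8 {
        final String text;
        Utf8(String text) { this.text = text; }
    }

    static final class ClassRef {
        final int nameIndex;
        ClassRef(int nameIndex) { this.nameIndex = nameIndex; }
    }

    static final class NameAndType {
        final int nameIndex;
        final int signatureIndex;
        NameAndType(int nameIndex, int signatureIndex) {
            this.nameIndex = nameIndex;
            this.signatureIndex = signatureIndex;
        }
    }

    static final class Methodref {
        final int classIndex;
        final int nameAndTypeIndex;
        Methodref(int classIndex, int nameAndTypeIndex) {
            this.classIndex = classIndex;
            this.nameAndTypeIndex = nameAndTypeIndex;
        }
    }

    /** Follows the indexes of a Methodref entry down to the strings they name. */
    static String resolve(Object[] pool, int methodrefIndex) {
        Methodref method = (Methodref) pool[methodrefIndex];
        ClassRef owner = (ClassRef) pool[method.classIndex];
        NameAndType nameAndType = (NameAndType) pool[method.nameAndTypeIndex];
        return text(pool, owner.nameIndex) + "." + text(pool, nameAndType.nameIndex)
            + " " + text(pool, nameAndType.signatureIndex);
    }

    private static String text(Object[] pool, int index) {
        return ((Utf8) pool[index]).text;
    }

    /** Builds a small sample pool; index 0 is unused, as in real class files. */
    static Object[] samplePool() {
        return new Object[] {
            null,
            new Methodref(2, 4),
            new ClassRef(3),
            new Utf8("java/io/PrintStream"),
            new NameAndType(5, 6),
            new Utf8("println"),
            new Utf8("(Ljava/lang/String;)V")
        };
    }

    public static void main(String[] args) {
        System.out.println(resolve(samplePool(), 1));
    }
}
```

    <p>
      Because every reference is an index, two instructions that call the
      same method can share entry 1, which is what keeps class files compact.
    </p>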
    
    </section>
    
    <section name="2.2 Byte code instruction set">
    <p>
      The JVM  is a  stack-oriented interpreter that  creates a  local stack
      frame of fixed size for every method invocation. The size of the local
      stack has to  be computed by the compiler.  Values  may also be stored
      intermediately in a frame area containing <em>local variables</em> which
      can  be used  like  a set  of  registers.  These  local variables  are
      numbered from 0 to 65535, i.e. you have a maximum of 65536 local
      variables. The stack frames of caller and callee method are
      overlapping, i.e.  the caller  pushes arguments onto the operand stack
      and the called method receives them in local variables.
    </p>
    
    <p>
      The byte code instruction set currently consists of 212 instructions;
      44 opcodes are marked as reserved and may be used for future
      extensions   or   intermediate   optimizations  within   the   Virtual
      Machine. The instruction set can be roughly grouped as follows:
    </p>
    
    <p>
      <b>Stack operations:</b>
       Constants can be pushed onto the stack either
      by loading them from the constant pool with the <tt>ldc</tt> instruction or with
      special ``short-cut''  instructions where the operand  is encoded into
      the  instructions, e.g.   <tt>iconst_0</tt> or  <tt>bipush</tt> (push
      byte value).
    </p>
    
    <p>
      <b>Arithmetic  operations:</b>
         The  instruction  set   of  the  Java Virtual Machine 
      distinguishes  its  operand  types  using  different  instructions  to
      operate on values of a specific type. Arithmetic operations starting
      with <tt>i</tt>, for example, denote an integer operation:
      <tt>iadd</tt>, for instance, adds two integers and pushes the result back onto the
      stack. The Java types <tt>boolean</tt>, <tt>byte</tt>,
      <tt>short</tt>, and <tt>char</tt> are handled as integers by the JVM.
    </p>
      
    <p>
      <b>Control flow:</b>
       There  are branch instructions like <tt>goto</tt>
      and   <tt>if_icmpeq</tt>,    which   compares   two    integers   for
      equality.  There  is  also   a  <tt>jsr</tt>  (jump  sub-routine)  and
      <tt>ret</tt> pair of instructions that  is  used to implement the
      <tt>finally</tt> clause of  <tt>try-catch</tt> blocks.  Exceptions may
      be thrown with the <tt>athrow</tt> instruction.
      Branch  targets  are coded  as  offsets  from  the current  byte  code
      position, i.e. with an integer number.
    </p>
    
    <p>
      <b>Load   and   store operations</b>
          for   local   variables   like
      <tt>iload</tt> and  <tt>istore</tt>.  There are also  array operations
      like <tt>iastore</tt> which stores an integer value into an array.
    </p>
    
    <p>
      <b>Field access:</b>
       The  value of an instance field  may be retrieved
      with <tt>getfield</tt> and written with <tt>putfield</tt>.  For static
      fields,   there    are   <tt>getstatic</tt>   and   <tt>putstatic</tt>
      counterparts.
    </p>
    
    <p>
      <b>Method  invocation:</b>
       Methods  may  either be  called via  static
      references with <tt>invokestatic</tt> or be bound virtually with the
      <tt>invokevirtual</tt>  instruction. Super  class methods  and private
      methods are invoked with <tt>invokespecial</tt>.
    </p>
      
    <p>
      <b>Object  allocation:</b>
        Class  instances  are allocated  with  the
      <tt>new</tt>  instruction, arrays  of basic  type  like <tt>int[]</tt>
      with <tt>newarray</tt>, arrays  of references like <tt>String[][]</tt>
      with <tt>anewarray</tt> or <tt>multianewarray</tt>.
    </p>
    
    <p>
      <b>Conversion and type checking:</b>
        For stack operands of basic type
      there  exist casting  operations  like <tt>f2i</tt>  which converts  a
      float  value into  an integer.   The validity  of a  type cast  may be
      checked with  <tt>checkcast</tt> and the  <tt>instanceof</tt> operator
      can be directly mapped to the equally named instruction.
    </p>
  
    <p>
      Most  instructions  have a  fixed  length,  but  there are  also  some
      variable-length instructions: In particular, the <tt>lookupswitch</tt>
      and  <tt>tableswitch</tt> instructions,  which are  used  to implement
      <tt>switch()</tt> statements. Since the number of <tt>case</tt>
      clauses may vary, these instructions contain a variable number of
      branch targets.
    </p>
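
    <p>
      How long such an instruction becomes can be computed from its position
      and the number of cases. The sketch below follows the layout given in
      the JVM specification, which additionally requires zero to three
      padding bytes after the opcode so that the operands start at a multiple
      of four; the class and method names are assumptions made for this
      example.
    </p>

```java
/** Computes the encoded length of the two variable-length switch instructions. */
final class SwitchLength {
    /** Padding bytes needed after an opcode located at the given offset. */
    static int padding(int opcodeOffset) {
        return (4 - (opcodeOffset + 1) % 4) % 4;
    }

    /** opcode + padding + default, low, high (4 bytes each) + one offset per case. */
    static int tableswitchLength(int opcodeOffset, int low, int high) {
        return 1 + padding(opcodeOffset) + 12 + 4 * (high - low + 1);
    }

    /** opcode + padding + default, npairs (4 bytes each) + 8 bytes per match/offset pair. */
    static int lookupswitchLength(int opcodeOffset, int pairs) {
        return 1 + padding(opcodeOffset) + 8 + 8 * pairs;
    }

    public static void main(String[] args) {
        System.out.println(tableswitchLength(0, 1, 3));
        System.out.println(lookupswitchLength(3, 2));
    }
}
```

    <p>
      Note that the length depends on the offset of the instruction itself,
      so moving a switch instruction within a method can change its size.
    </p>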
  
    <p>
      We  will not list  all byte  code instructions  here, since  these are
      explained in detail in the JVM specification. The opcode names are
      mostly self-explanatory, so understanding the following code examples
      should be fairly intuitive.
    </p>
  
    </section>
  
    <section name="2.3 Method code">
    <p>
      Non-abstract methods  contain an attribute  (<tt>Code</tt>) that holds
      the following data: The maximum  size of the method's stack frame, the
      number   of   local   variables    and   an   array   of   byte   code
      instructions. Optionally,  it may  also contain information  about the
      names of local variables and source file line numbers that can be used
      by a debugger.
    </p>
    
    <p>
      Whenever  an exception is thrown, the  JVM performs exception handling
      by looking   into a  table  of exception  handlers.   The table  marks
      handlers, i.e.  pieces  of code, to  be responsible for  exceptions of
      certain types  that  are raised   within a  given  area  of  the  byte
      code. When there is no appropriate handler the exception is propagated
      back to the caller of the method. The handler information is itself
      stored in an attribute contained within the <tt>Code</tt> attribute.
    </p>
    
    </section>
    
    <section name="2.4 Byte code offsets">
    <p>
      Targets  of  branch instructions  like  <tt>goto</tt>  are encoded  as
      relative offsets  in the array  of byte codes. Exception  handlers and
      local variables refer to absolute addresses within the byte code. The
      former contain references to the start and the end of the
      <tt>try</tt> block, and to the handler code. The latter
      mark the range in which a local variable is valid, i.e. its scope.
      This makes it  difficult to insert or delete code  areas on this level
      of abstraction, since one has  to recompute the offsets every time and
      update the referring objects. We will see in section 
      how <font face="helvetica">BCEL </font>remedies this restriction.
    </p>
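
    <p>
      The bookkeeping that makes manual insertion tedious can be sketched as
      follows. This is an illustration only, not how BCEL is implemented; the
      class <tt>OffsetFixup</tt> and its methods are hypothetical, and the
      sketch assumes that every address at or after the insertion point moves
      by the number of inserted bytes.
    </p>

```java
/** Recomputes a relative branch offset after bytes are inserted into a method. */
final class OffsetFixup {
    /** Absolute target of a branch instruction located at pc. */
    static int target(int pc, int relativeOffset) {
        return pc + relativeOffset;
    }

    /** New position of an address after count bytes are inserted at insertAt. */
    static int shift(int address, int insertAt, int count) {
        return address >= insertAt ? address + count : address;
    }

    /** Relative offset the branch must carry once the insertion has happened. */
    static int fixedOffset(int pc, int relativeOffset, int insertAt, int count) {
        int newPc = shift(pc, insertAt, count);
        int newTarget = shift(target(pc, relativeOffset), insertAt, count);
        return newTarget - newPc;
    }

    public static void main(String[] args) {
        // A branch at offset 1 that jumps 7 bytes ahead; 3 bytes are inserted at offset 4.
        System.out.println(fixedOffset(1, 7, 4, 3));
    }
}
```

    <p>
      A branch that jumps across the insertion point needs a new offset,
      while a branch that lies entirely before or after it keeps its old one.
      The same shift has to be applied to the absolute addresses stored in
      exception handlers and local variable ranges.
    </p>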
  
    </section>
  
  
    <section name="2.5 Type information">
    <p>
      Java is  a type-safe language and  the information about  the types of
      fields,    local    variables,    and    methods    is    stored    in
      <em>signatures</em>. These are strings stored  in the constant pool and encoded in
      a special  format.  For example the  argument and return  types of the
      <tt>main</tt> method
    </p>
  
    <source>
    public static void main(String[] argv)
    </source>
  
    <p>
    are represented by the signature
    </p>
  
    <source>
    ([Ljava/lang/String;)V
    </source>
  
    <p>
      Classes and arrays are internally represented by strings like
      <tt>"java/lang/String"</tt>, basic types like <tt>float</tt> by an
      integer number. Within signatures they are represented by single
      characters, e.g., <tt>"I"</tt> for integer.
    </p>
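
    <p>
      The encoding rules are simple enough to write down directly. The
      following sketch is not taken from BCEL; the class <tt>Signatures</tt>
      and its two methods are assumed names. It encodes basic types as single
      characters, class names in the <tt>L...;</tt> form, and each array
      dimension as a leading <tt>[</tt>.
    </p>

```java
/** Encodes Java type names into the signature format used in class files. */
final class Signatures {
    /** Encodes one type name such as "int" or "java.lang.String[]". */
    static String encode(String javaType) {
        if (javaType.endsWith("[]")) {
            // One '[' per array dimension, followed by the element type.
            return "[" + encode(javaType.substring(0, javaType.length() - 2));
        }
        switch (javaType) {
            case "void":    return "V";
            case "boolean": return "Z";
            case "byte":    return "B";
            case "char":    return "C";
            case "short":   return "S";
            case "int":     return "I";
            case "long":    return "J";
            case "float":   return "F";
            case "double":  return "D";
            default:        return "L" + javaType.replace('.', '/') + ";";
        }
    }

    /** Encodes a method signature: argument types in parentheses, then the return type. */
    static String method(String returnType, String... argTypes) {
        StringBuilder signature = new StringBuilder("(");
        for (String argType : argTypes) {
            signature.append(encode(argType));
        }
        return signature.append(')').append(encode(returnType)).toString();
    }

    public static void main(String[] args) {
        System.out.println(method("void", "java.lang.String[]"));
        System.out.println(method("int", "int"));
    }
}
```

    <p>
      The second call in <tt>main</tt> corresponds to the signature
      <tt>(I)I</tt> of the <tt>fac</tt> method used in the code example of
      this paper.
    </p>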
  
    </section>
  
    <section name="2.6 Code example">
    <p>
      The following example program prompts for a number and prints its
      factorial (called ``faculty'' in the code). The <tt>readLine()</tt> method reading from the
      standard input may raise an <tt>IOException</tt>, and if a malformed
      number is passed to <tt>parseInt()</tt>, it throws a
      <tt>NumberFormatException</tt>. Thus, the critical area of code must be
      encapsulated in a <tt>try-catch</tt> block.
    </p>
    
    <source>  
      import java.io.*;

      public class Faculty {
        private static BufferedReader in =
          new BufferedReader(new InputStreamReader(System.in));

        public static final int fac(int n) {
          return (n == 0) ? 1 : n * fac(n - 1);
        }

        public static final int readInt() {
          int n = 4711;
          try {
            System.out.print("Please enter a number&#62; ");
            n = Integer.parseInt(in.readLine());
          } catch(IOException e1) {
            System.err.println(e1);
          } catch(NumberFormatException e2) {
            System.err.println(e2);
          }
          return n;
        }

        public static void main(String[] argv) {
          int n = readInt();
          System.out.println("Faculty of " + n + " is " + fac(n));
        }
      }
    </source>
  
    <p>
      This code example  typically compiles to the following  chunks of byte
      code:
    </p>
    
    <source>
      0:  iload_0
      1:  ifne            #8
      4:  iconst_1
      5:  goto            #16
      8:  iload_0
      9:  iload_0
      10: iconst_1
      11: isub
      12: invokestatic    Faculty.fac (I)I (12)
      15: imul
      16: ireturn
  
      LocalVariable(start_pc = 0, length = 16, index = 0:int n)
    </source>
  
    <p>
      The  method <tt>fac</tt>  has only  one local  variable,  the argument
      <tt>n</tt>, stored in  slot 0.  This variable's scope  ranges from the
      start of  the byte  code sequence to  the very  end.  If the  value of
      <tt>n</tt> (stored  in local variable  0, i.e. the value  fetched with
      <tt>iload_0</tt>) is  not equal  to 0, the  <tt>ifne</tt> instruction
      branches to  the byte code at offset  8, otherwise a 1  is pushed onto
      the operand stack  and the control flow branches  to the final return.
      For ease of reading, the offsets of the branch instructions, which are
      actually   relative,  are displayed  as  absolute  addresses in  these
      examples.
    </p>
    
    <p>
      If  recursion has to  continue, the  arguments for  the multiplication
      (<tt>n</tt>  and <tt>fac(n -  1)</tt>) are  evaluated and  the results
      pushed onto the operand stack.  After the multiplication operation has
      been performed the function returns the computed value from the top of
      the stack.
    </p>
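
    <p>
      To see the operand stack and the relative branch offsets at work, the
      listing of <tt>fac</tt> can be executed by a very small interpreter.
      The sketch below is a teaching aid under strong assumptions: it knows
      only the eight opcodes that occur in the listing, keeps <tt>n</tt> in
      a plain Java variable instead of a local variable array, and
      hard-wires <tt>invokestatic</tt> to a recursive call instead of
      resolving constant pool entry 12. The opcode values are those of the
      JVM specification; the class name <tt>FacInterpreter</tt> is made up.
    </p>

```java
/** Interprets the byte code of fac() shown in the listing above. */
final class FacInterpreter {
    // Opcode values as defined by the JVM specification.
    static final int ICONST_1 = 0x04;
    static final int ILOAD_0 = 0x1a;
    static final int ISUB = 0x64;
    static final int IMUL = 0x68;
    static final int IFNE = 0x9a;
    static final int GOTO = 0xa7;
    static final int IRETURN = 0xac;
    static final int INVOKESTATIC = 0xb8;

    static final int[] CODE = {
        ILOAD_0,                   //  0
        IFNE, 0x00, 0x07,          //  1: relative offset 7, i.e. target 8
        ICONST_1,                  //  4
        GOTO, 0x00, 0x0b,          //  5: relative offset 11, i.e. target 16
        ILOAD_0,                   //  8
        ILOAD_0,                   //  9
        ICONST_1,                  // 10
        ISUB,                      // 11
        INVOKESTATIC, 0x00, 0x0c,  // 12: constant pool index 12 (Faculty.fac)
        IMUL,                      // 15
        IRETURN                    // 16
    };

    /** Reads the signed 16-bit branch offset that follows the opcode at pc. */
    static int offset(int pc) {
        return (short) (CODE[pc + 1] * 256 + CODE[pc + 2]);
    }

    static int run(int n) {
        int[] stack = new int[3];  // operand stack; the method never needs more than 3 slots
        int sp = 0;                // index of the next free stack slot
        int pc = 0;                // offset of the current instruction
        while (true) {
            int opcode = CODE[pc];
            switch (opcode) {
                case ILOAD_0:  stack[sp++] = n; pc += 1; break;
                case ICONST_1: stack[sp++] = 1; pc += 1; break;
                case ISUB: { int b = stack[--sp]; int a = stack[--sp]; stack[sp++] = a - b; pc += 1; break; }
                case IMUL: { int b = stack[--sp]; int a = stack[--sp]; stack[sp++] = a * b; pc += 1; break; }
                case IFNE: { int value = stack[--sp]; pc += (value != 0) ? offset(pc) : 3; break; }
                case GOTO: pc += offset(pc); break;
                // Assumption: the call target is fac itself; the argument is replaced by the result.
                case INVOKESTATIC: stack[sp - 1] = run(stack[sp - 1]); pc += 3; break;
                case IRETURN: return stack[--sp];
                default: throw new IllegalStateException("unexpected opcode " + opcode);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(run(5));
    }
}
```

    <p>
      Both branch instructions add their offset to the position of the
      branch instruction itself, which is why the listing can show the
      targets as the absolute addresses 8 and 16.
    </p>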
  
    <source>
      0:  sipush        4711
      3:  istore_0
      4:  getstatic     java.lang.System.out Ljava/io/PrintStream;
      7:  ldc           "Please enter a number&#62; "
      9:  invokevirtual java.io.PrintStream.print (Ljava/lang/String;)V
      12: getstatic     Faculty.in Ljava/io/BufferedReader;
      15: invokevirtual java.io.BufferedReader.readLine ()Ljava/lang/String;
      18: invokestatic  java.lang.Integer.parseInt (Ljava/lang/String;)I
      21: istore_0
      22: goto          #44
      25: astore_1
      26: getstatic     java.lang.System.err Ljava/io/PrintStream;
      29: aload_1
      30: invokevirtual java.io.PrintStream.println (Ljava/lang/Object;)V
      33: goto          #44
      36: astore_1
      37: getstatic     java.lang.System.err Ljava/io/PrintStream;
      40: aload_1
      41: invokevirtual java.io.PrintStream.println (Ljava/lang/Object;)V 
      44: iload_0
      45: ireturn
  
      Exception handler(s) = 
      From    To      Handler Type
      4       22      25      java.io.IOException(6)
      4       22      36      NumberFormatException(10)
  
    </source>
    
    <p>
      First the local variable <tt>n</tt> (in slot 0) is initialized to the
      value 4711. The next instruction, <tt>getstatic</tt>, loads the
      static <tt>System.out</tt> field onto the stack. Then a string is
      loaded and printed, and a number is read from the standard input and
      assigned to <tt>n</tt>.
    </p>
  
    <p>
      If    one   of   the    called   methods    (<tt>readLine()</tt>   and
      <tt>parseInt()</tt>) throws  an exception, the  Java Virtual Machine calls one  of the
      declared exception  handlers, depending on the type  of the exception.
      The <tt>try</tt>-clause  itself does not  produce any code,  it merely
      defines the range in which  the following handlers are active.  In the
      example  the specified  source  code area  maps  to a  byte code  area
      ranging from offset 4 (inclusive)  to 22 (exclusive).  If no exception
      has occurred (``normal'' execution flow), the <tt>goto</tt>
      instructions branch past the handler code. There the value of
      <tt>n</tt> is loaded and returned.
    </p>
  
    <p>
      For  example the handler   for <tt>java.io.IOException</tt>  starts at
      offset 25. It simply prints the error  and branches back to the normal
      execution flow, i.e. as if no exception had occurred.
    </p>
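
    <p>
      The table lookup itself can be sketched in a few lines. The code below
      is an illustration, not the JVM's or BCEL's implementation: the class
      <tt>HandlerLookup</tt> is hypothetical, the table rows are copied from
      the listing above, and the return value -1 is an assumed convention for
      ``no handler found, propagate to the caller''.
    </p>

```java
/** Looks up the handler for an exception raised at a given byte code offset. */
final class HandlerLookup {
    /** One row of the exception table; "to" is exclusive. */
    static final class Entry {
        final int from;
        final int to;
        final int handler;
        final String type;

        Entry(int from, int to, int handler, String type) {
            this.from = from;
            this.to = to;
            this.handler = handler;
            this.type = type;
        }

        /** True if pc lies in the protected range, including from and excluding to. */
        boolean covers(int pc) {
            if (pc >= from) {
                return to > pc;
            }
            return false;
        }

        /** True if the thrown object is an instance of the declared type or a subclass. */
        boolean catches(Throwable thrown) {
            try {
                return Class.forName(type).isInstance(thrown);
            } catch (ClassNotFoundException e) {
                return false;
            }
        }
    }

    // The exception table of readInt() from the listing above.
    static final Entry[] TABLE = {
        new Entry(4, 22, 25, "java.io.IOException"),
        new Entry(4, 22, 36, "java.lang.NumberFormatException")
    };

    /** Returns the handler offset, or -1 if the exception goes to the caller. */
    static int findHandler(int pc, Throwable thrown) {
        for (Entry entry : TABLE) {
            if (entry.covers(pc)) {
                if (entry.catches(thrown)) {
                    return entry.handler;
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(findHandler(18, new NumberFormatException("abc")));
    }
}
```

    <p>
      The rows are searched in table order, so when several handlers cover
      the same range the first matching one wins.
    </p>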
  
    </section>
  
    <section name="3 The BCEL API">
    <p>
      The <font face="helvetica">BCEL </font>API abstracts from  the concrete circumstances of the Java Virtual Machine and
      how  to  read and  write  binary Java  class  files.   The API  mainly
      consists of three parts:
    </p>
  
    <p>
  
      <ol type="1">
      <li> A package that contains classes that describe ``static''
      constraints of class files, i.e., it reflects the class file format and
      is not intended for byte code modifications.  The classes may be
      used to read and write class files from or to a file.  This is
      especially useful for analyzing Java classes without having the
      source files at hand.  The main data structure is called
      <tt>JavaClass</tt>, which contains methods, fields, etc.</li>
  
      <li> A  package to dynamically generate  or modify <tt>JavaClass</tt>
      objects.  It  may be  used  e.g. to  insert  analysis  code, to  strip
      unnecessary  information from class  files, or  to implement  the code
      generator back-end of a Java compiler.</li>
  
      <li> Various code examples and  utilities like a class file viewer, a
      tool  to convert class  files into  HTML, and  a converter  from class
      files to the Jasmin assembly language [].</li>
      </ol>
    </p>
    
    </section>
    
    <section name="3.1 JavaClass">
    <p>
      The  ``static''  component of  the  <font face="helvetica">BCEL </font>API  resides in  the  package
       and represents class files.  All of the
      binary   components  and   data   structures  declared   in  the   JVM
      specification  [] and described  in section  <a href="#sec:jvm">2</a> are
      mapped to classes.  Figure 3 shows a UML diagram of the
      hierarchy of  classes of the  <font face="helvetica">BCEL </font>API.  Figure   in the
      appendix also  shows a  detailed diagram of  the <tt>ConstantPool</tt>
      components.
    </p>
    
    <p>
    <img src="eps/javaclass.gif"/>
    <br/>
    Figure 3: UML diagram for the <font face="helvetica">BCEL </font>API
    </p>
  
    <p>
      The  top-level data  structure  is <tt>JavaClass</tt>,  which in  most
      cases is created by  a <tt>ClassParser</tt> object that is capable
      of parsing  binary class files. A  <tt>JavaClass</tt> object basically
      consists of  fields, methods, symbolic  references to the  super class
      and to the implemented interfaces.
    </p>
    
    <p>
      The  constant pool serves  as some  kind of  central repository  and is  thus of
      outstanding  importance  for  all  components.   <tt>ConstantPool</tt>
      objects contain  an array of fixed size  of <tt>Constant</tt> entries,
      which may be retrieved via the <tt>getConstant()</tt> method taking an
      integer  index as argument.  Indexes to  the constant pool may be  contained in
      instructions as well as in other components of a class file and in constant pool 
      entries themselves.
    </p>
    
    <p>
      Methods and  fields contain  a signature, symbolically  defining their
      types.   Access  flags  like  <tt>public static  final</tt>  occur  in
      several  places  and  are  encoded   by  an  integer  bit  mask,  e.g.
      <tt>public static final</tt> corresponds to the Java expression
    </p>
  
  
    <source>
    int access_flags = ACC_PUBLIC | ACC_STATIC | ACC_FINAL;
    </source>
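
    <p>
      The same bit mask can be tried out without BCEL. The sketch below uses
      the constants of <tt>java.lang.reflect.Modifier</tt> from the standard
      library as a stand-in for the <tt>ACC_</tt> constants; for these three
      flags both use the values defined by the JVM specification (0x0001,
      0x0008, and 0x0010). The class name <tt>AccessFlags</tt> is an
      assumption made for the example.
    </p>

```java
import java.lang.reflect.Modifier;

/** Builds and inspects the access flag bit mask for "public static final". */
final class AccessFlags {
    static final int PUBLIC_STATIC_FINAL =
        Modifier.PUBLIC | Modifier.STATIC | Modifier.FINAL;

    public static void main(String[] args) {
        int flags = PUBLIC_STATIC_FINAL;
        System.out.println("0x" + Integer.toHexString(flags));  // the combined bit mask
        System.out.println(Modifier.toString(flags));           // decoded back into keywords
        System.out.println(Modifier.isStatic(flags));           // test of a single flag
    }
}
```

    <p>
      Since each flag occupies its own bit, flags can be combined with a
      bitwise OR and tested individually without affecting each other.
    </p>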
  
    <p>
      As mentioned in section <a href="#sec:format">2.1</a> already, several components
      may contain <em>attribute</em> objects: classes, fields, methods, and
      <tt>Code</tt> objects (introduced in section <a href="#sec:code2">2.3</a>).  The
      latter is an attribute itself that contains the actual byte code
      array, the maximum stack size, the number of local variables, a table
      of handled exceptions, and some optional debugging information coded
      as <tt>LineNumberTable</tt> and <tt>LocalVariableTable</tt>
      attributes. Attributes are in general specific to some data structure,
      i.e. no two components share the same kind of attribute, though this
      is not explicitly forbidden. In the figure the <tt>Attribute</tt>
      classes are marked with the component they belong to.
    </p>
  
    </section>
    
    <section name="3.2 Class repository">
    <p>
      Using the provided <tt>Repository</tt> class, reading class files into
      a <tt>JavaClass</tt> object is quite simple:
    </p>
  
    <source>
    JavaClass clazz = Repository.lookupClass("java.lang.String");
    </source>
  
    <p>
      The repository also contains methods providing the dynamic equivalent
      of the <tt>instanceof</tt> operator, and other useful routines:
    </p>
  
    <source>
    if(Repository.instanceOf(clazz, super_class)) {
      ...
    }
    </source>
  
    </section>
    
    <section name="3.2.1 Accessing class file data">
  
    <p>
      Information within the class file components may be accessed like Java
      Beans via intuitive set/get methods.  All of them also define a
      <tt>toString()</tt> method so that implementing a simple class viewer
      is very easy. In fact all of the examples used here have been produced
      this way:
    </p>
  
    <source>
    System.out.println(clazz);
    printCode(clazz.getMethods());
    ...
    public static void printCode(Method[] methods) {
      for(int i=0; i &lt; methods.length; i++) {
        System.out.println(methods[i]);
  
        Code code = methods[i].getCode();
        if(code != null) // Non-abstract method
          System.out.println(code);
      }
    }
    </source>
  
    </section>
  
    <section name="3.2.2 Analyzing class data">
    <p>
      Last but not least, <font face="helvetica">BCEL </font>supports the <em>Visitor</em> design
      pattern [],  so one can write visitor  objects to traverse
      and analyze the contents of a class file. Included in the distribution
      is a  class <tt>JasminVisitor</tt> that converts class  files into the
      Jasmin assembler language [].
    </p>
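
    <p>
      For readers unfamiliar with the pattern, the following sketch shows its
      general shape. It deliberately does not use the BCEL classes: the node
      types <tt>FieldNode</tt> and <tt>MethodNode</tt>, the
      <tt>Visitor</tt> interface, and the <tt>Printer</tt> visitor are all
      hypothetical and only stand in for the components of a class file.
    </p>

```java
/** A minimal Visitor sketch over hypothetical class file components. */
final class VisitorSketch {
    interface Visitor {
        void visitField(FieldNode field);
        void visitMethod(MethodNode method);
    }

    interface Node {
        void accept(Visitor visitor);
    }

    static final class FieldNode implements Node {
        final String name;
        FieldNode(String name) { this.name = name; }
        public void accept(Visitor visitor) { visitor.visitField(this); }
    }

    static final class MethodNode implements Node {
        final String name;
        MethodNode(String name) { this.name = name; }
        public void accept(Visitor visitor) { visitor.visitMethod(this); }
    }

    /** A visitor that collects one line of text per visited component. */
    static final class Printer implements Visitor {
        final StringBuilder out = new StringBuilder();

        public void visitField(FieldNode field) {
            out.append("field ").append(field.name).append('\n');
        }

        public void visitMethod(MethodNode method) {
            out.append("method ").append(method.name).append('\n');
        }
    }

    /** Sends one Printer through all members and returns the collected text. */
    static String describe(Node... members) {
        Printer printer = new Printer();
        for (Node member : members) {
            member.accept(printer);
        }
        return printer.out.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe(new FieldNode("in"), new MethodNode("fac")));
    }
}
```

    <p>
      A new analysis is added by writing another <tt>Visitor</tt>
      implementation; the node classes stay unchanged, which is what makes
      the pattern attractive for tools that traverse class files.
    </p>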
  
    </section>
  
    <section name="3.3 ClassGen">
    <p>
      This part of the API (package ) supplies
      an abstraction level for creating or transforming class files
      dynamically.  It makes the static constraints of Java class files like
      the hard-coded byte code addresses generic.  The generic constant pool, for
      example, is implemented by the class <tt>ConstantPoolGen</tt> which
      offers methods for adding different types of constants.  Accordingly,
      <tt>ClassGen</tt> offers an interface to add methods, fields, and
      attributes.  Figure 4 gives an overview of this part of
      the API.
    </p>
  
    <p>
      <img src="images/classgen.gif"/>
      <br/>
      Figure 4: UML diagram of the ClassGen API
    </p>
  
    </section>
  
    <section name="3.3.1 Types">
    <p>
      We abstract from the concrete details of the type signature syntax
      (see <a href="#sec:types">2.5</a>) by introducing the <tt>Type</tt> class, which is
      used, for example, by methods to define their return and argument
      types.  Concrete sub-classes are <tt>BasicType</tt>,
      <tt>ObjectType</tt>, and <tt>ArrayType</tt> which consists of the
      element type and the number of dimensions. For commonly used types the
      class offers some predefined constants.  For example the method
      signature of the <tt>main</tt> method as shown in section
      <a href="#sec:types">2.5</a> is represented by:
    </p>
  
    <source>
    Type   return_type = Type.VOID;
    Type[] arg_types   = new Type[] { new ArrayType(Type.STRING, 1) };
    </source>
  
    <p>
      <tt>Type</tt> also contains methods to convert types into textual
      signatures and vice versa. The sub-classes contain implementations of
      the routines and constraints specified by the Java Language
      Specification.
    </p>
  
    </section>
  
    <section name="3.3.2 Generic fields and methods">
    <p>
      Fields  are represented  by  <tt>FieldGen</tt> objects,  which may  be
      freely  modified  by  the  user.   If  they  have  the  access  rights
      <tt>static final</tt>, i.e. are constants  and of basic type, they may
      optionally have an initializing value.
    </p>
    
    <p>
      Generic  methods contain  methods  to add  exceptions  the method  may
      throw,  local variables, and  exception handlers.  The latter  two are
      represented by  user-configurable objects as  well.  Because exception
      handlers  and   local  variables  contain  references   to  byte  code
      addresses, they  also take the role of  an <em>instruction targeter</em>
      in   our  terminology.    Instruction  targeters   contain   a  method
      <tt>updateTarget()</tt>    to   redirect    a    reference.    Generic
      (non-abstract) methods refer  to <em>instruction lists</em> that consist
      of  instruction  objects.   References  to  byte  code  addresses  are
      implemented by  handles to instruction  objects. This is  explained in
      more detail in the following sections.
    </p>
    
    <p>
      The maximum stack size needed by the method and the maximum number of
      local variables used may be set manually or computed automatically via
      the <tt>setMaxStack()</tt> and <tt>setMaxLocals()</tt> methods.
    </p>
  
    </section>
  
    <section name="3.3.3 Instructions">
    <p>
      Modeling instructions as objects may look somewhat odd at first sight,
      but in fact enables programmers to obtain a high-level view upon
      control flow without handling details like concrete byte code offsets.
      Instructions consist of a tag, i.e. an opcode, their length in bytes
      and an offset (or index) within the byte code. Since many instructions
      are immutable, the <tt>InstructionConstants</tt> interface offers
      shareable predefined ``fly-weight'' constants to use.
    </p>
    
    <p>
      Instructions are grouped via sub-classing; the type hierarchy of
      instruction classes is illustrated by an (incomplete) figure
      in the appendix.  The most important family of
      instructions are the <em>branch instructions</em>, e.g.  <tt>goto</tt>,
      that branch to targets somewhere within the byte code.  Obviously,
      this makes them candidates for playing an <tt>InstructionTargeter</tt>
      role, too. Instructions are further grouped by the interfaces they
      implement: there are, e.g., <tt>TypedInstruction</tt>s that are
      associated with a specific type, like <tt>ldc</tt>, or
      <tt>ExceptionThrower</tt> instructions that may raise exceptions when
      executed.
    </p>
    
    <p>
      All instructions can be traversed via <tt>accept(Visitor v)</tt> methods,
      i.e., the Visitor design pattern. There is, however, a special trick
      in these methods that allows the handling of certain instruction
      groups to be merged. The <tt>accept()</tt> methods do not only call the
      corresponding <tt>visit()</tt> method, but call the <tt>visit()</tt>
      methods of their respective super classes and implemented interfaces
      first, i.e. the most specific <tt>visit()</tt> call is last. Thus one
      can group the handling of, say, all <tt>BranchInstruction</tt>s into
      one single method.
    </p>
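
    <p>
      The calling order described above can be sketched with a miniature
      hierarchy in plain Java. All names below (<tt>SketchVisitor</tt>,
      <tt>Branch</tt>, <tt>Goto</tt>, <tt>IfNe</tt>, <tt>BranchCounter</tt>)
      are hypothetical and chosen for this illustration only; they are not
      the BCEL classes. The sketch merely shows how calling the super class
      <tt>visit</tt> method first lets one method handle a whole group.
    </p>

```java
// Hypothetical miniature hierarchy for illustration; not the BCEL classes.
interface SketchVisitor {
  void visitBranch(Branch b);   // called for every branch instruction
  void visitGoto(Goto g);       // called for goto only
  void visitIfNe(IfNe i);       // called for ifne only
}

abstract class Branch {
  abstract void accept(SketchVisitor v);
}

class Goto extends Branch {
  void accept(SketchVisitor v) {
    v.visitBranch(this);  // super class first
    v.visitGoto(this);    // most specific call last
  }
}

class IfNe extends Branch {
  void accept(SketchVisitor v) {
    v.visitBranch(this);  // super class first
    v.visitIfNe(this);    // most specific call last
  }
}

// Handles all branches in one method and ignores the specific callbacks.
class BranchCounter implements SketchVisitor {
  int branches = 0;
  public void visitBranch(Branch b) { branches++; }
  public void visitGoto(Goto g) { }
  public void visitIfNe(IfNe i) { }
}

class VisitorDemo {
  public static void main(String[] args) {
    Branch[] code = { new Goto(), new IfNe(), new Goto() };
    BranchCounter counter = new BranchCounter();
    for (Branch b : code) b.accept(counter);
    System.out.println("branches seen: " + counter.branches);
  }
}
```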
    
    <p>
      For debugging purposes  it may even make sense  to ``invent'' your own
      instructions. In a sophisticated code generator like the one used as a
      backend of  the Barat framework  one often has  to insert
      temporary  <tt>nop</tt> (No  operation) instructions.   When examining
      the produced  code it may  be very difficult  to track back  where the
      <tt>nop</tt>  was actually  inserted.  One  could think  of  a derived
      <tt>nop2</tt>   instruction   that   contains   additional   debugging
      information. When  the instruction  list is dumped  to byte  code, the
      extra data is simply dropped.
    </p>
    
    <p>
      One  could also  think  of  new byte  code  instructions operating  on
      complex numbers that  are replaced by normal byte  code upon load-time
      or are recognized by a new JVM.
    </p>
    
    </section>
  
    <section name="3.3.4 Instruction lists">
    <p>
      An <em>instruction list</em> is implemented by a list of
      <em>instruction handles</em> encapsulating instruction objects.
      References to instructions in the list are thus not implemented by
      direct pointers to instructions but by pointers to instruction
      <em>handles</em>. This makes appending, inserting and deleting areas of
      code very simple. Since we use symbolic references, computation of
      concrete byte code offsets does not need to occur until finalization,
      i.e.  until the user has finished the process of generating or
      transforming code.  We will use the terms instruction handle and
      instruction synonymously throughout the rest of the paper.
      Instruction handles may contain additional user-defined data using the
      <tt>addAttribute()</tt> method.
    </p>
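
    <p>
      The benefit of this indirection can be sketched in a few lines of
      plain Java. The classes <tt>SketchHandle</tt> and <tt>SketchList</tt>
      below are hypothetical stand-ins, not the BCEL implementation; the
      instruction lengths are those of the JVM instructions named. The
      sketch only illustrates that a branch keeps a reference to a handle
      and that concrete offsets are assigned once, at finalization.
    </p>

```java
// Hypothetical stand-ins for illustration only; these are not BCEL classes.
class SketchHandle {
  final String mnemonic;   // e.g. "goto"
  final int length;        // length of the instruction in bytes
  SketchHandle next;       // successor in the list
  SketchHandle target;     // symbolic branch target, or null
  int position = -1;       // byte code offset, unknown until finalization

  SketchHandle(String mnemonic, int length) {
    this.mnemonic = mnemonic;
    this.length = length;
  }
}

class SketchList {
  SketchHandle head, tail;

  // Append at the end of the list and return the new handle.
  SketchHandle append(String mnemonic, int length) {
    SketchHandle h = new SketchHandle(mnemonic, length);
    if (head == null) { head = h; } else { tail.next = h; }
    tail = h;
    return h;
  }

  // Insert before the given handle (assumed to be in the list).
  SketchHandle insertBefore(SketchHandle point, String mnemonic, int length) {
    SketchHandle h = new SketchHandle(mnemonic, length);
    h.next = point;
    if (head == point) {
      head = h;
    } else {
      SketchHandle p = head;
      while (p.next != point) p = p.next;
      p.next = h;
    }
    return h;
  }

  // Finalization: only now are concrete offsets assigned.
  void setPositions() {
    int offset = 0;
    for (SketchHandle h = head; h != null; h = h.next) {
      h.position = offset;
      offset += h.length;
    }
  }
}

class SketchListDemo {
  public static void main(String[] args) {
    SketchList il = new SketchList();
    SketchHandle g = il.append("goto", 3);
    SketchHandle t = il.append("aconst_null", 1);
    g.target = t;                   // symbolic reference to a handle
    il.insertBefore(t, "nop", 1);   // does not invalidate the branch
    il.setPositions();
    System.out.println("goto branches to offset " + g.target.position);
  }
}
```

    <p>
      Because the branch refers to the handle rather than to a number, the
      later insertion of the <tt>nop</tt> needs no fix-up of the
      <tt>goto</tt>.
    </p>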
    
    <p>
      <b>Appending</b>
      One can append instructions or other instruction lists anywhere in an
      existing list.  The instructions are appended after the given
      instruction handle.  All append methods return a new instruction
      handle, which may then be used, for example, as the target of a
      branch instruction.
    </p>
  
    <source>
    InstructionList il = new InstructionList();
    ...
    GOTO g = new GOTO(null);
    il.append(g);
    ...
    InstructionHandle ih = il.append(InstructionConstants.ACONST_NULL);
    g.setTarget(ih);
    </source>
  
    <p>
      <b>Inserting</b>
      Instructions may be  inserted anywhere into an existing  list.  They are
      inserted  before the  given  instruction handle.   All insert  methods
      return a  new instruction handle which  may then be used  as the start
      address of an exception handler, for example.
    </p>
  
    <source>
    InstructionHandle start = il.insert(insertion_point,
                                        InstructionConstants.NOP);
    ...
    mg.addExceptionHandler(start, end, handler, "java.io.IOException");
    </source>
  
    <p>
      <b>Deleting</b>
      Deletion of instructions is also very straightforward; all instruction
      handles and the contained instructions within a given range are
      removed from the instruction list and disposed.  The <tt>delete()</tt>
      method may however throw a <tt>TargetLostException</tt> when there are
      instruction targeters still referencing one of the deleted
      instructions.  The user is forced to handle such exceptions in a
      <tt>try-catch</tt> block and redirect these references elsewhere. The
      <em>peep hole</em> optimizer described in section  gives a
      detailed example for this.
    </p>
  
    <source>
    try {
      il.delete(first, last);
    } catch(TargetLostException e) {
      InstructionHandle[] targets = e.getTargets();
      for(int i=0; i &lt; targets.length; i++) {
        InstructionTargeter[] targeters = targets[i].getTargeters();
        for(int j=0; j &lt; targeters.length; j++)
           targeters[j].updateTarget(targets[i], new_target);
      }
    }
    </source>
  
    <p>
      <b>Finalizing</b>
      When the instruction list is ready to be dumped to pure byte code, all
      symbolic references must be mapped to real byte code offsets.  This is
      done by the <tt>getByteCode()</tt> method which is called by default
      by <tt>MethodGen.getMethod()</tt>. Afterwards you should call
      <tt>dispose()</tt> so that the instruction handles can be reused
      internally. This helps to reduce memory usage.
    </p>
    
    <source>
    InstructionList il = new InstructionList();
  
    ClassGen  cg = new ClassGen("HelloWorld", "java.lang.Object",
                                "&lt;generated&gt;", ACC_PUBLIC | ACC_SUPER,
                                null);
    ConstantPoolGen cp = cg.getConstantPool(); // constant pool of the new class
    MethodGen mg = new MethodGen(ACC_STATIC | ACC_PUBLIC,
                                 Type.VOID, new Type[] { 
                                   new ArrayType(Type.STRING, 1) 
                                 }, new String[] { "argv" },
                                 "main", "HelloWorld", il, cp);
    ...
    cg.addMethod(mg.getMethod());
    il.dispose(); // Reuse instruction handles of list
    </source>
  
    </section>
  
    <section name="3.3.5 Code example revisited">
    <p>
      Using  instruction lists gives  us a  generic view  upon the  code: In
      Figure 5 we again present the code chunk of the
      <tt>readInt()</tt>   method  of   the  faculty   example   in  section
      <a href="#sec:fac">2.6</a>:  The local  variables <tt>n</tt>  and  <tt>e1</tt> both
      hold two references to  instructions, defining their scope.  There are
      two <tt>goto</tt>s branching  to the <tt>iload</tt> at the  end of the
      method. One of the exception handlers is displayed, too: it references
      the start and the end of the <tt>try</tt> block and also the exception
      handler code.
    </p>
    
    <p>
      <img src="images/il.gif"/>
      <br/>
      Figure 5: Instruction list for <tt>readInt()</tt> method
    </p>
    
    </section>
    
    <section name="3.3.6 Instruction factories">
    <p>
      To simplify the creation of certain instructions the user can use the
      supplied <tt>InstructionFactory</tt> class, which offers a lot of
      useful methods to create instructions from scratch. Alternatively, one
      can also use <em>compound instructions</em>: When producing byte code,
      some patterns typically occur very frequently, for instance the
      compilation of arithmetic or comparison expressions.  You certainly do
      not want to rewrite the code that translates such expressions into
      byte code in every place they may appear. In order to support this,
      the <font face="helvetica">BCEL </font>API includes a <em>compound instruction</em> (an interface with
      a single <tt>getInstructionList()</tt> method).  Instances of classes
      implementing this interface may be used in any place where normal
      instructions would occur, particularly in append operations.
    </p>
  
    <p>
      <b>Example: Pushing constants</b>
      Pushing constants  onto the  operand stack may  be coded  in different
      ways.  As   explained  in   section  <a href="#sec:code">2.2</a>  there   are  some
      ``short-cut'' instructions that can be  used to make the produced byte
      code  more  compact.  The   smallest  instruction  to  push  a  single
      <tt>1</tt> onto  the stack is  <tt>iconst_1</tt>, other possibilities
      are <tt>bipush</tt> (can be used to push values between -128 and 127),
      <tt>sipush</tt>  (between  -32768 and  32767),  or <tt>ldc</tt>  (load
      constant from constant pool).
    </p>
    
    <p>
      Instead of repeatedly selecting  the most compact instruction in, say,
      a switch, one can  use the compound <tt>PUSH</tt> instruction whenever
      pushing a constant  number or string. It will  produce the appropriate
      byte code instruction and insert entries into the constant pool if necessary.
    </p>
  
    <source>
    il.append(new PUSH(cp, "Hello, world"));
    il.append(new PUSH(cp, 4711));
    </source>
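
    <p>
      The selection of the most compact instruction for an integer constant
      can be sketched as follows. <tt>PushSketch</tt> is a hypothetical
      helper written for this illustration and is not the implementation of
      <tt>PUSH</tt>; it only applies the value ranges given above, plus the
      <tt>iconst_m1</tt> to <tt>iconst_5</tt> short-cuts of the JVM
      instruction set, and returns mnemonics instead of instruction objects.
    </p>

```java
// Hypothetical helper for illustration; not the implementation of PUSH.
class PushSketch {
  // Returns the mnemonic of the most compact instruction pushing the int v.
  static String select(int v) {
    if (inRange(v, -1, 5)) return v == -1 ? "iconst_m1" : "iconst_" + v;
    if (inRange(v, -128, 127)) return "bipush " + v;       // one operand byte
    if (inRange(v, -32768, 32767)) return "sipush " + v;   // two operand bytes
    return "ldc " + v;  // the value has to go into the constant pool
  }

  // True if lo, v and hi are in non-decreasing order.
  static boolean inRange(int v, int lo, int hi) {
    return !(lo > v || v > hi);
  }
}

class PushSketchDemo {
  public static void main(String[] args) {
    int[] values = { 1, -1, 100, 4711, 100000 };
    for (int v : values) System.out.println(v + " -> " + PushSketch.select(v));
  }
}
```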
  
    </section>
        
    <section name="3.3.7 Code patterns using regular expressions">
    <p>
      When  transforming  code, for  instance  during  optimization or  when
      inserting analysis  method calls,  one typically searches  for certain
      patterns  of  code to  perform  the  transformation  at.  To  simplify
      handling such situations <font face="helvetica">BCEL </font>introduces a special feature: One can
      search  for  given code  patterns  within  an  instruction list  using
      <em>regular  expressions</em>.   In  such expressions,  instructions  are
      represented by symbolic names, e.g.  "<tt>`IfInstruction'</tt>".  Meta
      characters  like  <tt>+</tt>, <tt>*</tt>,  and  <tt>(..|..)</tt> have  their
      usual meanings. Thus, the expression
    </p>
    
    <source>
    "`NOP'+(`ILOAD__'|`ALOAD__')*"
    </source>
  
    <p>
      represents a  piece of  code consisting of  at least  one <tt>NOP</tt>
      followed  by   a  possibly   empty  sequence  of   <tt>ILOAD</tt>  and
      <tt>ALOAD</tt> instructions.
    </p>
  
    <p>
      The  <tt>search()</tt> method  of class  <tt>FindPattern</tt>  gets an
      instruction list and a regular  expression as arguments and returns an
      array  describing   the  area  of   matched  instructions.  Additional
      constraints to  the matching  area of instructions,  which cannot  be
      implemented via  regular expressions, may be  expressed via <em>code
      constraints</em>.
    </p>
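
    <p>
      One way such a search could work is to encode every instruction as a
      single character and hand the resulting string to an ordinary regular
      expression engine. The sketch below assumes <tt>java.util.regex</tt>
      and a made-up encoding; <tt>PatternSketch</tt> is hypothetical and
      says nothing about how <tt>FindPattern</tt> is actually implemented.
      It searches for the pattern shown above, at least one <tt>NOP</tt>
      followed by any number of <tt>ILOAD</tt> and <tt>ALOAD</tt>
      instructions.
    </p>

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; not the implementation of FindPattern.
class PatternSketch {
  // Encoding chosen for this example only:
  // n = nop, i = iload variants, a = aload variants, ? = anything else.
  static char encode(String mnemonic) {
    if (mnemonic.equals("nop")) return 'n';
    if (mnemonic.startsWith("iload")) return 'i';
    if (mnemonic.startsWith("aload")) return 'a';
    return '?';
  }

  // Returns { start, end } instruction indices (end exclusive) of the
  // first match of NOP+(ILOAD|ALOAD)*, or null if nothing matches.
  static int[] search(String[] code) {
    StringBuilder sb = new StringBuilder();
    for (String m : code) sb.append(encode(m));
    Matcher matcher = Pattern.compile("n+(i|a)*").matcher(sb);
    if (!matcher.find()) return null;
    return new int[] { matcher.start(), matcher.end() };
  }
}

class PatternSketchDemo {
  public static void main(String[] args) {
    String[] code = { "iconst_0", "nop", "nop", "iload_1", "aload_0", "ireturn" };
    int[] range = PatternSketch.search(code);
    System.out.println("match from " + range[0] + " to " + range[1]);
  }
}
```

    <p>
      Since each instruction maps to exactly one character, the positions
      reported by the matcher are directly usable as instruction indices.
    </p>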
    
    </section>
    
    <section name="3.3.8 Example: Optimizing boolean expressions">
    <p>
      In Java, the boolean values <tt>true</tt> and <tt>false</tt> are mapped
      to 1 and 0, respectively. Thus,
      the simplest way to evaluate boolean expressions is to push a 1 or a 0
      onto the operand stack depending on the truth value of the expression.
      But this way, the subsequent combination of boolean expressions (e.g.
      with <tt>&amp;&amp;</tt>) yields long chunks of code that push lots of 1s and
      0s onto the stack.
    </p>
  
    <p>
      When the code has been finalized these chunks can be optimized with a
      <em>peep hole</em> algorithm: An <tt>IfInstruction</tt> (e.g. the
      comparison of two integers: <tt>if_icmpeq</tt>) that either produces
      a 1 or a 0 on the stack and is followed by an <tt>ifne</tt>
      instruction (branch if the stack value is not 0) may be replaced by the
      <tt>IfInstruction</tt> with its branch target replaced by the target
      of the <tt>ifne</tt> instruction.
    </p>
  
    <p>
      The  applied code  constraint  object ensures  that  the matched  code
      really  corresponds to  the targeted  expression  pattern.  Subsequent
      application of this algorithm removes all unnecessary stack operations
      and  branch instructions from  the byte  code. If  any of  the deleted
      instructions  is still  referenced by  an <tt>InstructionTargeter</tt>
      object, the reference has to be updated in the <tt>catch</tt>-clause.
    </p>
  
    <p>
      Code example  gives a  verbose example of how to create
      a class  file, while  example  shows  how to  implement a
      simple  peephole optimizer  and how  to deal  with <tt>TargetLost</tt>
      exceptions.
    </p>
  
    <p>
      <b>Example application:</b>
      The expression:
    </p>
  
    <source>
    if((a == null) || (i &lt; 2))
      System.out.println("Ooops");
    </source>
  
    <p>
      can be mapped to both of the chunks of byte code shown in figure
      . The left column represents the unoptimized code while
      the right column displays the same code after an aggressively
      optimizing peep hole algorithm has been applied:
    </p>
    
    <p>
      FIX ME!
    </p>
  
    </section>
    
    <section name="4 Application areas">
    <p>
      There are many possible application areas for <font face="helvetica">BCEL </font>ranging from class
      browsers, profilers, byte code optimizers, and compilers to
      sophisticated run-time analysis tools and extensions to the Java
      language.
    </p>
  
    <p>
      Compilers like the Barat compiler use <font face="helvetica">BCEL </font>to implement a
      byte code generating back end.  Other possible application areas are
      the static analysis of byte code or examining the
      run-time behavior of classes by inserting calls to profiling methods
      into the code. Further examples are extending Java with Eiffel-like
      assertions, automated delegation, or
      with the concepts of ``Aspect-Oriented Programming''.
    </p>
  
    </section>
  
    <section name="4.1 Class loaders">
    <p>
      Class loaders  are responsible for  loading class files from  the file
      system  or  other resources  and  passing the  byte  code  to the  Virtual Machine.
      A custom  <tt>ClassLoader</tt> object may be used
      to  intercept the  standard procedure  of  loading a  class, i.e.  the
      system class loader, and  perform some transformations before actually
      passing the byte code to the JVM.
    </p>
    
    <p>
      A  possible  scenario is  described  in figure 7:
      During run-time the Virtual Machine requests a custom class loader to load a given
      class.  But before  the JVM  actually sees  the byte  code,  the class
      loader makes  a ``side-step'' and performs some  transformation to the
      class. To  make sure that  the modified byte  code is still  valid and
      does not violate any of the  JVM's rules it is checked by the verifier
      before the JVM finally executes it.
    </p>
    
    <p>
      <img src="images/classloader.gif"/>
      <br/>
      Figure 7: Class loaders
    </p>
  
    <p>
      Using class loaders  is an elegant way of extending  the Java Virtual Machine with new
      features  without   actually  modifying  it.    This  concept  enables
      developers to use <em>load-time reflection</em> to implement their ideas
      as opposed to  the static reflection supported by  the Java Reflection
      API.  Load-time transformations supply the user with
      a new  level of abstraction.   The user is not  strictly tied to  the static
      constraints of the  original authors of the classes  but may customize
      the applications  with third-party code  in order to benefit  from new
      features. Such  transformations may be executed on  demand and neither
      interfere with other users, nor alter the original byte code. In fact,
      class loaders may even create  classes <em>ad hoc</em> without loading a
      file at all.
    </p>
    
    </section>
    
    <section name="4.1.1 Example: Poor Man's Genericity">
    <p>
      The ``Poor Man's Genericity'' project, which extends Java
      with parameterized classes, for example, uses <font face="helvetica">BCEL </font>in two places to
      generate instances of parameterized classes: during compile-time (the
      standard <tt>javac</tt> with some slightly changed classes) and at
      run-time using a custom class loader.  The compiler puts some
      additional type information into class files, which is evaluated at
      load-time by the class loader.  The class loader performs some
      transformations on the loaded class and passes it to the VM. The
      following algorithm illustrates how the load method of the class
      loader fulfills the request for a parameterized class,
      e.g. <tt>Stack&lt;String&gt;</tt>:
    </p>
    
    <p>
      <ol type="1">
      <li>  Search for class <tt>Stack</tt>, load it, and check for a
      certain class attribute containing additional type information.
      The attribute defines the ``real'' name of the class,
      i.e. <tt>Stack&lt;A&gt;</tt>.</li>
  
      <li>  Replace all occurrences of and references to the formal type
      <tt>A</tt> with references to the actual type <tt>String</tt>. For
      example the method

      <source>
      void push(A obj) { ... }
      </source>
    
      <p>
        becomes
      </p>
  
      <source>
      void push(String obj) { ... }
      </source>
      </li>
  
      <li> Return the resulting class to the Virtual Machine.</li>
      </ol>
    </p>
    
    </section>
    
  </body>
  </document>
  
  
  
  1.3       +1 -0      jakarta-bcel/xdocs/stylesheets/project.xml
  
  Index: project.xml
  ===================================================================
  RCS file: /home/cvs/jakarta-bcel/xdocs/stylesheets/project.xml,v
  retrieving revision 1.2
  retrieving revision 1.3
  diff -u -r1.2 -r1.3
  --- project.xml	2001/10/30 16:49:46	1.2
  +++ project.xml	2001/11/09 23:25:10	1.3
  @@ -7,6 +7,7 @@
     <body>
       <menu name="BCEL">
         <item name="Overview" href="/index.html"/>
  +      <item name="Manual" href="/manual.html"/>
       </menu>
     </body>
   </project>
  
  
  

--
To unsubscribe, e-mail:   <mailto:bcel-dev-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:bcel-dev-help@jakarta.apache.org>

