jakarta-oro-user mailing list archives

From Chandramouli Kharidehal <Ckharide...@sapient.com>
Subject RE: Doubt about ORO
Date Mon, 07 Jan 2002 07:29:39 GMT
Thanks a lot again. So I can use a regexp string in my Java class.
For example, \d matches a numeric character and \w matches an
alphanumeric character. What should I use to match Unicode characters?
I hope my question is understandable.

This is the code I have. To incorporate a regular expression, I just
pass it as a parameter to this class. For example, to have a regular
expression that accepts only word characters, I would write
new RegexFormatter("\\w"). But when I say 'w' here, does it cover
Unicode characters as well, or only the alphanumeric characters?




// jakarta apache imports for the ORO regular expression library
import org.apache.oro.text.regex.Perl5Compiler;
import org.apache.oro.text.regex.Perl5Matcher;
import org.apache.oro.text.regex.Perl5Substitution;
import org.apache.oro.text.regex.Pattern;
import org.apache.oro.text.regex.Util;
import org.apache.oro.text.regex.MatchResult;
import org.apache.oro.text.regex.MalformedPatternException;

// framework imports
public class RegexFormatter implements Formatter {

    // Possible error codes
    public static final String MATCH_FAILURE_ERROR = "MatchFailure";

    /**
     * <p>Regular expression compiler.  Only one is needed, as it is a
     * stateless factory class.</p>
     */
    protected static final Perl5Compiler compiler = new Perl5Compiler();


    /**
     * <p>Regular expression/pattern associated with this RegexFormatter
     * for validation purposes.</p>
     */
    protected Pattern regex;

    /**
     * <p>The regular expression pattern that is matched against when
     * formatting. The first match group will replace $1 in the output
     * substitution expression.</p>
     */
    protected Pattern outMatch;

    /**
     * <p>The substitution that will be used for output of the formatted
     * input.</p>
     */
    protected Perl5Substitution outSubstitution;

    /**
     * <p>Defines whether the substitution will be global.</p>
     */
    protected boolean globalSubstitution;

    /**
     * <p>Creates a new RegexFormatter object with the specified regular
     * expression.</p>
     *
     * @param <code>expression</code> the regular expression used to
     *              validate Strings with this instance of RegexFormatter.
     * @throws <code>InvalidConfigurationException</code> thrown when a
     *               MalformedPatternException is caught due to an invalid
     *               regular expression String being supplied. Thrown as a
     *               RuntimeException due to the way Format objects are
     *               usually instantiated (statically).
     */
    public RegexFormatter(String expression) {
        try {
            // Compile the regular expression object
            this.regex = RegexFormatter.compiler.compile(expression);
        }
        catch (MalformedPatternException mpe) {
            Logger.error(Logger.PRODUCER_FORMAT, "RegexFormatter::<init>. Class "
                + "could not be instantiated due to malformed pattern supplied. "
                + "Pattern: " + expression);
            Logger.error(Logger.PRODUCER_FORMAT, "RegexFormatter::<init>. "
                + "MalformedPatternException is: " + mpe.toString());

            // Rethrow the exception after logging the errors.
            throw new InvalidConfigurationException(
                "RegexFormatter::<init> caught a MalformedPatternException. ", mpe);
        }
    }


    /**
     * <p>Creates a new RegexFormatter object with the specified regular
     * expression.</p>
     *
     * @param <code>expression</code> the regular expression used to
     *              validate Strings with this instance of RegexFormatter.
     * @param <code>outMatch</code> the regular expression used to match
     *              against during substitution. The first match group will
     *              replace $1 in the substitution, the second group will
     *              replace $2 and so on.
     * @param <code>outSubst</code> the output substitution expression. A
     *              '$1' in this string will be replaced by the first
     *              matched group.
     * @param <code>globalSubst</code> defines whether the substitution
     *              will be applied to every match found or only to the
     *              first one.
     * @throws <code>InvalidConfigurationException</code> thrown when a
     *               MalformedPatternException is caught due to an invalid
     *               regular expression String being supplied. Thrown as a
     *               RuntimeException due to the way Format objects are
     *               usually instantiated (statically).
     */
    public RegexFormatter(String expression, String outMatch,
                          String outSubst, boolean globalSubst) {
        try {
            // Compile the regular expression object
            this.regex = RegexFormatter.compiler.compile(expression);

            // Compile the out match
            this.outMatch = RegexFormatter.compiler.compile(outMatch);

            // Create a substituter with the substitution string
            this.outSubstitution = new Perl5Substitution(outSubst);

            // Define if this substitution will be global
            this.globalSubstitution = globalSubst;
        }
        catch (MalformedPatternException mpe) {
            // Either pattern may be the malformed one, so log both.
            Logger.error(Logger.PRODUCER_FORMAT, "RegexFormatter::<init>. Class "
                + "could not be instantiated due to malformed pattern supplied. "
                + "Patterns: " + expression + ", " + outMatch);
            Logger.error(Logger.PRODUCER_FORMAT, "RegexFormatter::<init>. "
                + "MalformedPatternException is: " + mpe.toString());

            // Rethrow the exception after logging the errors.
            throw new InvalidConfigurationException(
                "RegexFormatter::<init> caught a MalformedPatternException. ", mpe);
        }
    }


    /**
     * <p>Implementation of the abstract parseObject() method from
     * Formatter. Major method of this class. Applies a regular expression
     * to a String to determine if it matches. If a match is found, the
     * matching subsection of the string is returned. If a match is not
     * found then a ParsingException is thrown.  See class level javadocs
     * for examples of usage.</p>
     *
     * @param <code>text</code> string to be matched
     * @return <code>Object</code> a String containing the matching section
     *               of the String passed in.
     * @throws <code>ParsingException</code> thrown when the String does not
     *               match the regular expression.
     */
    public Object parseObject(String text) throws ParsingException {
        Perl5Matcher matcher = new Perl5Matcher();

        if ( !matcher.contains(text, this.regex) ) {
            // If the string passed in does not match the regular expression
            // against which it is being validated, throw a ParsingException.
            // Since this can happen fairly frequently we don't want to do a
            // lot of expensive concatenation for the message.
            throw new ParsingException(MATCH_FAILURE_ERROR);
        }

        // Returns the section of the string which matched the regex.
        return matcher.getMatch().toString();
    }


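    /**
     * <p>Implemented for compatibility with the Formatter class. Returns
     * the input without modification when no output match pattern was
     * configured. Otherwise applies the output substitution, or returns
     * null if the input does not match the output match pattern.</p>
     *
     * @param <code>input</code> an Object which should always be a String.
     * @return <code>String</code> the formatted String, or null.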
     */
    public String format(Object input) {
        if (this.outMatch == null) {
            return input.toString();
        }

        Perl5Matcher matcher = new Perl5Matcher();

        if (!matcher.contains(input.toString(), this.outMatch)) {
            // No match against the output pattern: nothing to format.
            return null;
        }

        // Substitute every match, or only the first, as configured.
        return Util.substitute(matcher,
                               this.outMatch,
                               this.outSubstitution,
                               input.toString(),
                               this.globalSubstitution ? Util.SUBSTITUTE_ALL : 1);
    }
}

-----Original Message-----
From: Daniel F. Savarese [mailto:dfs@savarese.org]
Sent: Monday, January 07, 2002 2:53 AM
To: ORO Users List
Subject: Re: Doubt about ORO 



In message <295A9D64E5DC2D469405DE8037DDAB694FBA57@delmmsx01.sapient.com>,
Chan
>The ORO packages work well for ASCII Character set
>But my doubt does it work for UTF-8 also !! 

As someone else mentioned, UTF-8 is a method of encoding Unicode as a
series of bytes, so the question doesn't make a lot of sense given
that Java characters are always a 16-bit representation of Unicode.  I
assume you mean "Do the ORO packages work with Java character values
greater than 255?"  The answer is yes for everything except for the
.awk package, which only works with character values 0-255.

daniel
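
To illustrate the distinction drawn in the reply above, here is a minimal
sketch that uses only the standard Java library, not ORO. It assumes a
Java 7 or later runtime (for StandardCharsets), and the class and method
names (Utf8VsChars, charCount, utf8ByteCount) are made up for this
illustration. It shows that a String holds 16-bit char values, some of
them greater than 255, and that UTF-8 only enters the picture when the
text is encoded into bytes.

```java
import java.nio.charset.StandardCharsets;

public class Utf8VsChars {

    // Number of 16-bit Java char values held by the string.
    static int charCount(String s) {
        return s.length();
    }

    // Number of bytes the same text occupies once encoded as UTF-8.
    static int utf8ByteCount(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // Two Devanagari letters (U+0928, U+092E); both values exceed 255.
        String sample = "\u0928\u092E";

        System.out.println((int) sample.charAt(0));  // prints 2344
        System.out.println(charCount(sample));       // prints 2
        System.out.println(utf8ByteCount(sample));   // prints 6
    }
}
```

A regular expression library that operates on String or char[] input sees
the two char values, never the six encoded bytes, which is why the
question is better phrased in terms of char values above 255.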




