Java RegEx


Introduction

Regular expressions (RegEx) are used in:

  1. Validation Frameworks
  2. Pattern matching
  3. Translation

The Pattern and Matcher classes are in the java.util.regex package and were introduced in Java 1.4.

Pattern: A Pattern is the compiled representation of a regular expression. A Pattern object is created with the static factory method compile() of the Pattern class.

Signature:

	public static Pattern compile(String regularExpression);

Matcher: A Matcher object searches a target string for matches of a given pattern. A Matcher object is created with the matcher() method of the Pattern class.

Signature:

	public Matcher matcher(CharSequence input);
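A compiled Pattern is immutable, so one Pattern can be reused to create a separate Matcher for each target string, while each Matcher keeps its own search state. The sketch below assumes that usage; the class PatternReuseDemo and its helper countMatches are hypothetical names chosen for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: one compiled Pattern, one Matcher per target.
class PatternReuseDemo {

    // Counts how many times the compiled pattern occurs in the target string.
    static int countMatches(Pattern p, String target) {
        Matcher m = p.matcher(target); // a fresh Matcher holds the search state
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Compile once with the factory method, then reuse for several targets.
        Pattern p = Pattern.compile("ab");
        System.out.println(countMatches(p, "ababbaba")); // 3
        System.out.println(countMatches(p, "xyz"));      // 0
    }
}
```

Compiling once and reusing the Pattern avoids re-parsing the regular expression for every target string.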

Methods of Matcher class

find(): Attempts to find the next match and returns true if one is found; otherwise returns false.

Signature:

	public boolean find();

start(): Returns the start index of the current match.

Signature:

	public int start();

end(): Returns the index just after the last character of the current match (end index + 1).

Signature:

	public int end();

group(): Returns the substring that matched the regular expression.

Signature:

	public String group();
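Because end() is exclusive, the text returned by group() is exactly target.substring(start(), end()), and end() - start() is the length of the match. The following sketch shows this with a literal pattern; the class MatchPositionsDemo and the method describeMatches are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class showing how start(), end() and group() relate.
class MatchPositionsDemo {

    // Returns "start-end:group" for every match of regex in target.
    static List<String> describeMatches(String regex, String target) {
        Matcher m = Pattern.compile(regex).matcher(target);
        List<String> result = new ArrayList<>();
        while (m.find()) {
            // end() is exclusive, so target.substring(m.start(), m.end())
            // is the same text that m.group() returns.
            result.add(m.start() + "-" + m.end() + ":" + m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints [0-3:cat, 7-10:cat]; the last end() equals target.length().
        System.out.println(describeMatches("cat", "cat concat"));
    }
}
```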

Sample Code

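import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegExDemo {

	public static void main(String[] args) {
		// Compile the regular expression "ab"
		Pattern p = Pattern.compile("ab");
		// Create a Matcher that searches the target string
		Matcher m = p.matcher("ababbaba");
		int count = 0;
		// find() moves to the next match until no more matches are left
		while(m.find()) {
			count++;
			// Print start index, end index (exclusive) and matched text
			System.out.println(m.start() + "..." + m.end() + "..." + m.group());
		}
		System.out.println("Number of occurrence : " + count);
	}

}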
Output
0...2...ab
2...4...ab
5...7...ab
Number of occurrence : 3

Regular Expressions

1. Character Classes

Expression       Meaning
[abc]            a OR b OR c
[^abc]           Any character except a, b and c
[a-z]            a to z, i.e. any lowercase letter
[A-Z]            A to Z, i.e. any uppercase letter
[a-zA-Z]         Any lowercase OR uppercase letter
[0-9]            Any digit 0 to 9
[a-zA-Z0-9]      Any alphanumeric character
[^a-zA-Z0-9]     Any non-alphanumeric character, i.e. special characters

Note: do not put spaces inside the brackets. [a-z A-Z] would also match the space character.
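To see what each character class accepts, the sketch below counts how many single characters of one sample string match each class from the table. It also shows the space pitfall: [a-z A-Z] matches the space in "a b", while [a-zA-Z] does not. The class CharacterClassDemo, the helper count and the sample strings are hypothetical choices for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: counts single-character matches per class.
class CharacterClassDemo {

    // Each class matches exactly one character, so this counts the
    // characters of target that belong to the class.
    static int count(String characterClass, String target) {
        Matcher m = Pattern.compile(characterClass).matcher(target);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String target = "a3z#K@9";
        String[] classes = {"[abc]", "[^abc]", "[a-z]", "[A-Z]",
                "[a-zA-Z]", "[0-9]", "[a-zA-Z0-9]", "[^a-zA-Z0-9]"};
        for (String c : classes) {
            System.out.println(c + " -> " + count(c, target));
        }
        // A space inside the brackets becomes part of the class.
        System.out.println(count("[a-z A-Z]", "a b")); // 3
        System.out.println(count("[a-zA-Z]", "a b"));  // 2
    }
}
```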

2. Predefined Character Classes

Expression   Meaning
\s           Any whitespace character (space, tab, newline, etc.)
\S           Any character except whitespace
\d           Any digit 0-9
\D           Any character except a digit
\w           Any word character, i.e. [a-zA-Z_0-9] (alphanumeric or underscore)
\W           Any non-word character (special characters)
.            Any character, including special characters (line terminators are excluded by default)
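A quick way to check these definitions is to count the matches of each predefined class in one sample string. Note that in Java source the backslash must be doubled, so the regex \d is written as the string literal "\\d". The sketch assumes the default flags (so . does not match the newline); PredefinedClassDemo and count are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class for the predefined character classes.
class PredefinedClassDemo {

    // Counts single-character matches of regex in target.
    static int count(String regex, String target) {
        Matcher m = Pattern.compile(regex).matcher(target);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // Sample: letter, digit, space, underscore, letter, '!', newline.
        String target = "a1 _B!\n";
        // Backslashes are doubled: "\\s" in Java source is the regex \s.
        String[] regexes = {"\\s", "\\S", "\\d", "\\D", "\\w", "\\W", "."};
        for (String r : regexes) {
            System.out.println(r + " -> " + count(r, target));
        }
    }
}
```

With this sample, \w counts the underscore as a word character (4 matches), and . matches 6 of the 7 characters because it skips the newline.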

3. Quantifiers

Quantifiers specify the number of occurrences of a character.

Expression   Meaning
a            Exactly one a (1)
a+           At least one a (1 or more)
a*           Any number of a's, including zero (0 or more)
a?           At most one a (0 or 1)
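The sketch below illustrates the quantifiers in two ways: find() with a+ returns each run of a's as one match (Java quantifiers are greedy by default), and the static Pattern.matches(regex, input) checks whether the whole input fits the pattern. QuantifierDemo and groups are hypothetical names for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class for the quantifiers a, a+, a* and a?.
class QuantifierDemo {

    // Returns the text of every match of regex in target, in order.
    static List<String> groups(String regex, String target) {
        Matcher m = Pattern.compile(regex).matcher(target);
        List<String> found = new ArrayList<>();
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // a+ is greedy: each run of a's is matched as a single group.
        System.out.println(groups("a+", "abaabaaab")); // [a, aa, aaa]

        // Pattern.matches tests the entire input against the regex.
        System.out.println(Pattern.matches("a", "a"));   // true
        System.out.println(Pattern.matches("a+", ""));   // false
        System.out.println(Pattern.matches("a*", ""));   // true
        System.out.println(Pattern.matches("a?", "aa")); // false
    }
}
```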

Split

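A target string can be split around matches of a delimiter, itself given as a regular expression, using the split() method of the Pattern class.

Signature:

	public String[] split(CharSequence input);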
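Since the delimiter is a regular expression, several separator characters can be handled at once. The sketch below splits on one or more spaces or commas; SplitDemo, splitWords and the sample input are hypothetical. String.split(regex) gives the same result as Pattern.compile(regex).split(str), but compiling the Pattern once is useful when the same delimiter is applied to many strings.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class for Pattern.split.
class SplitDemo {

    // Delimiter: one or more spaces or commas.
    private static final Pattern DELIMITER = Pattern.compile("[ ,]+");

    static String[] splitWords(String input) {
        return DELIMITER.split(input);
    }

    public static void main(String[] args) {
        String[] parts = splitWords("java, regex  pattern,matcher");
        System.out.println(Arrays.toString(parts)); // [java, regex, pattern, matcher]
    }
}
```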