Java RegEx
Introduction
RegEx is used in:
- Validation Frameworks
- Pattern matching
- Translation
The Pattern and Matcher classes are in the java.util.regex package and were introduced in Java 1.4.
Pattern : A Pattern is the compiled object representation of a regular expression. A Pattern object is created with the static factory method compile() of the Pattern class.
Signature:
public static Pattern compile(String regularExpression);
Matcher : A Matcher object is used to match a given pattern against the target String. A Matcher object is created with the matcher() method of the Pattern class.
Signature:
public Matcher matcher(CharSequence target);
Methods of Matcher class
find() : Attempts to find the next match and returns true if one is found; otherwise returns false.
Signature:
public boolean find();
start() : Returns the start index of the most recent match.
Signature:
public int start();
end() : Returns the index just past the last matched character, that is, the index of the last matched character + 1.
Signature:
public int end();
group() : Returns the substring that matched the regular expression in the most recent match.
Signature:
public String group();
Sample Code
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegExDemo {
    public static void main(String[] args) {
        // Compile the regular expression once, then match it against the target
        Pattern p = Pattern.compile("ab");
        Matcher m = p.matcher("ababbaba");
        int count = 0;
        // Each successful find() moves on to the next non-overlapping match
        while (m.find()) {
            count++;
            System.out.println(m.start() + "..." + m.end() + "..." + m.group());
        }
        System.out.println("Number of occurrence : " + count);
    }
}
Output
0...2...ab
2...4...ab
5...7...ab
Number of occurrence : 3
Regular Expressions
1. Character Classes
Expression | Meaning |
---|---|
[abc] | a OR b OR c |
[^abc] | Any character except a, b, and c |
[a-z] | a to z, i.e. any lowercase letter |
[A-Z] | A to Z, i.e. any uppercase letter |
[a-zA-Z] | Any lowercase or uppercase letter |
[0-9] | Any digit 0 to 9 |
[a-zA-Z0-9] | Any alphanumeric character |
[^a-zA-Z0-9] | Any character other than an alphanumeric one, i.e. special characters |
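
The character classes in the table can be tried out with a small sketch like the one below. The class name `CharClassDemo`, the helper `findAll`, and the target string are made up for illustration; only `Pattern.compile`, `matcher`, `find`, and `group` come from java.util.regex. Note that the classes are written without spaces, because a space inside the brackets would itself become one of the accepted characters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CharClassDemo {
    // Hypothetical helper: collects every match of regex in target, in order.
    static List<String> findAll(String regex, String target) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(target);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String target = "a3b#k@9z"; // illustrative input
        System.out.println(findAll("[abc]", target));        // [a, b]
        System.out.println(findAll("[^abc]", target));       // [3, #, k, @, 9, z]
        System.out.println(findAll("[a-z]", target));        // [a, b, k, z]
        System.out.println(findAll("[0-9]", target));        // [3, 9]
        System.out.println(findAll("[a-zA-Z0-9]", target));  // [a, 3, b, k, 9, z]
        System.out.println(findAll("[^a-zA-Z0-9]", target)); // [#, @]
    }
}
```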
2. Pre-defined Character classes
Expression | Meaning |
---|---|
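\s | Any whitespace character (space, tab, newline, etc.) |
\S | Any character except whitespace |
\d | Any digit 0-9 |
\D | Any character except a digit |
\w | Any word character (letter, digit, or underscore) |
\W | Any character except a word character (special characters) |
. | Any character, including special characters (by default, line terminators are excluded) |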
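
As a rough illustration of the predefined classes, the sketch below counts how often each one matches in a sample string. The class `PredefinedClassDemo`, the helper `countMatches`, and the input are hypothetical. Inside a Java string literal the backslash has to be doubled, so `\s` is written as `"\\s"`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PredefinedClassDemo {
    // Hypothetical helper: counts the matches of regex in target.
    static int countMatches(String regex, String target) {
        Matcher m = Pattern.compile(regex).matcher(target);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String target = "Id_7 is ok!"; // 11 characters, 2 of them spaces
        System.out.println(countMatches("\\s", target)); // 2  (the two spaces)
        System.out.println(countMatches("\\S", target)); // 9  (everything else)
        System.out.println(countMatches("\\d", target)); // 1  (the 7)
        System.out.println(countMatches("\\D", target)); // 10
        System.out.println(countMatches("\\w", target)); // 8  (letters, 7, and _)
        System.out.println(countMatches("\\W", target)); // 3  (two spaces and !)
        System.out.println(countMatches(".", target));   // 11 (every character)
    }
}
```

The `\w` count includes the underscore, which is why it is 8 rather than 7 here.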
Quantifiers
Used to specify the number of occurrences of a character.
Expression | Meaning |
---|---|
a | Exactly one a (1) |
a+ | At least one a (1 or more) |
a* | Any number of a's, including none (0 or more) |
a? | At most one a (0 or 1) |
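
One way to see the quantifiers at work is to test whole strings with the static `Pattern.matches(regex, input)` method, which returns true only if the entire input matches. The class `QuantifierDemo` and the helper `row` below are hypothetical names used only for this sketch.

```java
import java.util.regex.Pattern;

class QuantifierDemo {
    // Hypothetical helper: reports whether each input matches regex as a whole.
    static String row(String regex, String... inputs) {
        StringBuilder sb = new StringBuilder(regex).append(" :");
        for (String input : inputs) {
            sb.append(' ').append(Pattern.matches(regex, input));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Inputs tested in order: empty string, "a", "aaa"
        System.out.println(row("a",  "", "a", "aaa")); // a : false true false
        System.out.println(row("a+", "", "a", "aaa")); // a+ : false true true
        System.out.println(row("a*", "", "a", "aaa")); // a* : true true true
        System.out.println(row("a?", "", "a", "aaa")); // a? : true true false
    }
}
```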
Split
A target string can be split around matches of a given delimiter pattern using the split() method of the Pattern class.
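
A minimal sketch of splitting, assuming the delimiter is compiled into a Pattern first and `split(CharSequence)` is then called on it. The class `SplitDemo`, the helper `splitOn`, and the sample strings are illustrative only.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class SplitDemo {
    // Hypothetical helper: splits target around every match of the delimiter regex.
    static String[] splitOn(String regex, String target) {
        Pattern delimiter = Pattern.compile(regex);
        return delimiter.split(target);
    }

    public static void main(String[] args) {
        // Split on a single whitespace character
        System.out.println(Arrays.toString(splitOn("\\s", "Java is simple")));   // [Java, is, simple]
        // A dot must be escaped or bracketed, since a bare . matches any character
        System.out.println(Arrays.toString(splitOn("[.]", "www.example.com"))); // [www, example, com]
        // Comma followed by optional whitespace
        System.out.println(Arrays.toString(splitOn(",\\s*", "a, b,c")));        // [a, b, c]
    }
}
```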