Java RegEx
Introduction
RegEx is used in:
- Validation Frameworks
- Pattern matching
- Translation
The Pattern and Matcher classes are in the java.util.regex package and were introduced in Java 1.4.
Pattern : A Pattern is the object representation of a regular expression. A Pattern object is created using the compile() factory method of the Pattern class.
Signature:
public static Pattern compile(String regularExpression);
Matcher : A Matcher object is used to match a given pattern against the target String. A Matcher object is created using the matcher() method of the Pattern class.
Signature:
public Matcher matcher(CharSequence target);
Methods of Matcher class
find() : Attempts to find the next match and returns true if one is found; otherwise returns false.
Signature:
public boolean find();
start() : Returns the starting index of the match.
Signature:
public int start();
end() : Returns the index just past the last character of the match (ending index + 1).
Signature:
public int end();
group() : Returns the substring that matched the regular expression.
Signature:
public String group();
Sample Code
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegExDemo {
    public static void main(String[] args) {
        // Compile the regular expression, then bind it to the target string
        Pattern p = Pattern.compile("ab");
        Matcher m = p.matcher("ababbaba");
        int count = 0;
        // Each call to find() moves on to the next non-overlapping match
        while (m.find()) {
            count++;
            System.out.println(m.start() + "..." + m.end() + "..." + m.group());
        }
        System.out.println("Number of occurrence : " + count);
    }
}
Output
0...2...ab
2...4...ab
5...7...ab
Number of occurrence : 3
Regular Expressions
1. Character Classes
| Expression | Meaning |
|---|---|
| [abc] | a, b, or c |
| [^abc] | Any character except a, b, and c |
| [a-z] | a to z, i.e. any lowercase letter |
| [A-Z] | A to Z, i.e. any uppercase letter |
| [a-zA-Z] | Any lowercase or uppercase letter |
| [0-9] | Any digit, 0 to 9 |
| [a-zA-Z0-9] | Any alphanumeric character |
| [^a-zA-Z0-9] | Any character other than alphanumeric, i.e. special characters |
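To see the character classes in action, here is a minimal sketch that runs several of them against one target string. The class name CharacterClassDemo, the helper findAll, and the target "a7b@Z" are made up for illustration; only Pattern, Matcher, find() and group() come from the API described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CharacterClassDemo {
    // Hypothetical helper: collects every substring of target matched by regex, in order.
    static List<String> findAll(String regex, String target) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(target);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String target = "a7b@Z";
        System.out.println(findAll("[abc]", target));        // [a, b]
        System.out.println(findAll("[^abc]", target));       // [7, @, Z]
        System.out.println(findAll("[a-z]", target));        // [a, b]
        System.out.println(findAll("[0-9]", target));        // [7]
        System.out.println(findAll("[a-zA-Z0-9]", target));  // [a, 7, b, Z]
        System.out.println(findAll("[^a-zA-Z0-9]", target)); // [@]
    }
}
```

Note that the ranges are written without spaces: a space inside the brackets would itself become one of the characters the class matches.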
2. Pre-defined Character classes
| Expression | Meaning |
|---|---|
| \s | Any whitespace character (space, tab, newline, etc.) |
| \S | Any character except whitespace |
| \d | Any digit, 0-9 |
| \D | Any character except a digit |
| \w | Any word character (alphanumeric or underscore) |
| \W | Any character except a word character (special characters) |
| . | Any character, including special characters (line terminators are excluded by default) |
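The predefined classes can be tried out the same way. The sketch below replaces every match with '*' so that it is easy to see which characters each class picks up. The class name PredefinedClassDemo, the helper mask, and the target "a1 _#" are assumptions for illustration; Matcher.replaceAll() is a standard method, although it is not covered in the method list above. In a Java string literal the backslash has to be doubled, so \d is written as "\\d".

```java
import java.util.regex.Pattern;

class PredefinedClassDemo {
    // Hypothetical helper: replaces every match of regex in target with '*'.
    static String mask(String regex, String target) {
        return Pattern.compile(regex).matcher(target).replaceAll("*");
    }

    public static void main(String[] args) {
        String target = "a1 _#";
        System.out.println(mask("\\d", target)); // a* _#
        System.out.println(mask("\\D", target)); // *1***
        System.out.println(mask("\\w", target)); // ** *#
        System.out.println(mask("\\W", target)); // a1*_*
        System.out.println(mask("\\s", target)); // a1*_#
        System.out.println(mask("\\S", target)); // ** **
        System.out.println(mask(".", target));   // *****
    }
}
```

The \w and \W lines show that the underscore counts as a word character, while the space and '#' do not.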
Quantifiers
Used to specify the number of occurrences of a character.
| Expression | Meaning |
|---|---|
| a | Exactly one a (1) |
| a+ | At least one a (1 or more) |
| a* | Any number of a's, including none (0 or more) |
| a? | At most one a (0 or 1) |
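One way to check the quantifiers is to test whole strings against each expression. The sketch below uses the static Pattern.matches() method, which returns true only when the entire input matches (unlike find(), which looks for a match anywhere in the target). The class name QuantifierDemo and the helper matchesWhole are made up for illustration.

```java
import java.util.regex.Pattern;

class QuantifierDemo {
    // Hypothetical helper: true only if the whole input matches the regex.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        String[] regexes = {"a", "a+", "a*", "a?"};
        String[] inputs = {"", "a", "aaa"};
        // Print the result for every regex/input combination
        for (String regex : regexes) {
            for (String input : inputs) {
                System.out.println(regex + " vs \"" + input + "\" : " + matchesWhole(regex, input));
            }
        }
    }
}
```

With these inputs, a accepts only "a"; a+ accepts "a" and "aaa" but not the empty string; a* accepts all three; a? accepts the empty string and "a" but not "aaa".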
Split
A target string can be split around a given delimiter using the split() method of the Pattern class.
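As a minimal sketch of splitting, the example below compiles the delimiter as a Pattern and calls split() on it. The class name SplitDemo, the helper splitOn, and the sample strings are assumptions for illustration. Because the delimiter is itself a regular expression, a literal dot has to be escaped as "\\." so that it is not read as "any character".

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class SplitDemo {
    // Hypothetical helper: splits target around every match of the delimiter regex.
    static String[] splitOn(String delimiterRegex, String target) {
        Pattern p = Pattern.compile(delimiterRegex);
        return p.split(target);
    }

    public static void main(String[] args) {
        // Split on a single whitespace character
        System.out.println(Arrays.toString(splitOn("\\s", "Java RegEx Demo"))); // [Java, RegEx, Demo]
        // Split on a literal dot
        System.out.println(Arrays.toString(splitOn("\\.", "a.b.c")));           // [a, b, c]
    }
}
```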