3

I would like a regular expression to match given a partial or camel cased string. For example, if the search set contains the string "MyPossibleResultString" I want to be able to match it with the likes of the following:

  • MyPossibleResultString
  • MPRS
  • MPRString
  • MyPosResStr
  • M

I'd also like to include wildcard matching, e.g.:

  • MyP*RString
  • *PosResString
  • My*String

If it's not clear what I mean, the only example I can think of is Eclipse's "Open Type" dialog which is pretty much the exact behaviour I'm looking for. I'm not too up on the use of regexes, so I'm not sure if it matters if I'm looking for a solution in Java.

2
  • If Eclipse has pretty much the exact behavior you're looking for, then couldn't you just look in Eclipse's source code? Commented Apr 13, 2009 at 21:12
  • I could, I'll consider that option as I continue to look for an answer myself :-) Commented Apr 13, 2009 at 21:26

5 Answers 5

4

Ok so I can't really see why you would need the wildcard feature if you can already support the matching described in the first example. This is what I put together. Given a query string query, you use a regular expression to create a regular expression:

String re = "\\b(" + query.replaceAll("([A-Z][^A-Z]*)", "$1[^A-Z]*") + ".*?)\\b";

For example the query MyPosResStr will become the regex:

\\b(My[^A-Z]*Pos[^A-Z]*Res[^A-Z]*Str[^A-Z]*.*?)\\b

You then use this regex for your matching using the Matcher.find method to get something like this:

public static String matchCamelCase(String query, String str) {
    query = query.replaceAll("\\*", ".*?");
    String re = "\\b(" + query.replaceAll("([A-Z][^A-Z]*)", "$1[^A-Z]*") + ".*?)\\b";

    System.out.println(re);
    Pattern regex = Pattern.compile(re);

    Matcher m = regex.matcher(str);

    if  (m.find()) {
        return m.group();
    } else return null;
}

This will return the first match to your camel case query in the string str.

EDIT: I have added a line to handle wildcards since in my tired stupor I didn't appreciate the need for them

Sign up to request clarification or add additional context in comments.

The first example does not include the case where I want "*PRS" to match, where "PRS" should not.
I was going to say it doesn't work for a few patterns before you edited, now the only example that doesn't match is the likes of 'M'/'MP'/'MPR' without the explicit dangling wildcard. I'm sure it's just a small factor, I do like that it's more concise (I'm sure I'll be able to read it soon :-D) +1
Thank for catching that! You are correct, thats another case I missed. Coming right up.
If the target string started with lower-case "my", what would the regexp look like? i.e. mPosResStr should match.
2

As danbruc said, you have to generate a new regex for each new query. This code should do what you want.

public Pattern queryToPattern(String query) {
    StringBuilder sb = new StringBuilder();
    char[] chars = query.toCharArray();
    boolean incamel = false;
    for (int i=0; i < chars.length; i++) {
        if (chars[i] == '*') {
                            if (!incamel)
                    sb.append(".*");
        } else if (Character.isUpperCase(chars[i])) {
            if (incamel) {
                sb.append(".*");
            }
            sb.append(chars[i]);
            incamel = true;
        } else {
            sb.append(chars[i]);
        }

    }
    sb.append(".*");
    return Pattern.compile(sb.toString());
}

A query of: MyP*RString

Creates a pattern of: My.* P.* R.* String.*

This worked a treat, +1 for now, but I will let time mature to see if another answer is somehow better (in my ignorance I can't really judge this at the moment). Thanks.
1

It is not possible to do this with a single regular expression. You will have to build a regular expression based on the input and use this to search. It is easy to see that you cannot use a single regex - the user can search for any (cammel cased) string and so your regex needs to match any (cammel cased) string but than it is not a search anymore.

So do you mean, for example, the input string "MyPosRS" would be parsed to generate the regex for "MyPosRS", which could then be matched on?
Yes, you have to generate a regex based on the user input and use this to search. You cannot build one now, because you do not know the input hence not what the regex must match.
Okay, that makes sense. Seems pretty obvious now... +1
0

You could try something like:

class RegexTransformer {
    public String fromQuery(String query) {
        StringBuilder sb = new StringBuilder();
        sb.append("^");
        sb.append(query.replaceAll("(\\p{Upper}(?!\\p{Lower}))",
                "$1\\\\p{Alpha}*?"));
        sb.append("$");
        return sb.toString();
    }
}

See Pattern API for description of negative lookahead assertions (?!pat), POSIX character classes \p{class}, and reluctant quantifiers *?.

Example test case:

import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertTrue;

import org.junit.Test;

public class RegexTransformerTest {

    private RegexTransformer rt = new RegexTransformer();

    @Test
    public void testQueries() {
        String in = "MyPossibleResultString";
        String q1 = "MyPossibleResultString";
        String q2 = "MPRS";
        String q3 = "MPRString";
        String q4 = "MyPosResStr"; // this wont work
        String q5 = "M";

        test(in, q1, "^MyPossibleResultString$");
        test(in, q2, "^M\\p{Alpha}*?P\\p{Alpha}*?R\\p{Alpha}*?S\\p{Alpha}*?$");
        test(in, q3, "^M\\p{Alpha}*?P\\p{Alpha}*?R\\p{Alpha}*?String$");
        test(in, q5, "^M\\p{Alpha}*?$");
    }

    private void test(String in, String query, String expected) {
        assertEquals("transform", expected, rt.fromQuery(query));
        assertTrue("match", in.matches(rt.fromQuery(query)));
    }
}

Comments

0

Il-Bhima's answear is great, but I find that this code works better for me (pardon my C#, but it's the same):

pattern = Regex.Escape(pattern);
pattern = pattern.Replace(@"\*", ".*?");
pattern = Regex.Replace(pattern, "([A-Z][^A-Z]*)", "$1[^A-Z]*?") + ".*";

Notice the ".* " at the end that allows incomplete "startof" phrases (also allows not specifying all of the capital letters) Also, the asterisk after the "[^A-Z]*" matcher fixes problems like q4 in toolkit's answer where small letters were provided after the capital ones (they should appear directly after the capital letter, not before the next one).

Comments

Your Answer

Draft saved
Draft discarded

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.