Channel: Adam Cameron's Dev Blog

Regex help please

G'day:
I'm hoping Peter Boughton or Ben Nadel might see this, or someone else I'm unaware of who is good @ regular expression patterns.

Here's the challenge...



Given this string:

Lorem ipsum dolor sit

I want to extract the leading sub-string which:
  • is no more than n characters long;
  • breaks at the end of the previous whole word, rather than in the middle of a word;
  • if not even the first word fits within n characters, still matches the whole first word, even though the sub-string is then longer than n.
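
To pin those rules down, here's a rough non-regex restatement of them. It's only a sketch for checking candidate patterns against: trimToWordPlain is a made-up name, and it assumes words are separated by single spaces (listToArray() drops empty elements, so runs of spaces would collapse).

```cfml
// trimToWordPlain.cfm (hypothetical reference version, not the regex solution)
string function trimToWordPlain(required string string, required numeric index){
    var result = "";
    for (var word in listToArray(string, " ")){
        // what the result would be if this word were appended
        var candidate = len(result) ? result & " " & word : word;
        // stop once the next word would push past the limit,
        // but never before the first word has been taken
        if (len(result) && len(candidate) > index){
            break;
        }
        result = candidate;
    }
    return result;
}
```

With the sample string, trimToWordPlain("Lorem ipsum dolor sit", 3) gives "Lorem" (the first word is kept even though it's longer than 3), and trimToWordPlain("Lorem ipsum dolor sit", 11) gives "Lorem ipsum".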

I've come up with this:

// trimToWord.cfm
string function trimToWord(required string string, required numeric index){
    return reReplace(string, "^((?:.{1,#index#}(?=\s|$)\b)|(?:.+?\b)).*", "\1", "ONE");
}

It works, but that regex is a bit hoary.

Here's a visual representation of it (courtesy of regexper.com), by way of explanation:
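
In text form, the pattern is one capture group holding two alternative branches, followed by a catch-all for the rest of the string. Here's a sketch that builds the very same regex from commented parts; trimToWordVerbose and the variable names are mine and purely illustrative:

```cfml
// same regex as trimToWord(), just assembled from commented parts
string function trimToWordVerbose(required string string, required numeric index){
    // branch 1: up to `index` characters, ending at a word boundary
    // that is followed by whitespace or the end of the string
    var withinLimit = "(?:.{1,#index#}(?=\s|$)\b)";
    // branch 2 (fallback): the shortest leading run that ends on a word boundary;
    // for a string starting with a word character, that's the whole first word
    var firstWord = "(?:.+?\b)";
    // capture whichever branch matched, then swallow the rest of the string
    var pattern = "^(" & withinLimit & "|" & firstWord & ").*";
    return reReplace(string, pattern, "\1", "ONE");
}
```

Because the concatenated pattern is character-for-character the one in trimToWord(), it should behave identically; the split just makes it easier to see which branch handles which rule.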



Anyone fancy improving it for me?

Here are some unit tests to run your suggestions through:

// TestCase.cfc
component extends="testbox.system.BaseSpec" {

    function beforeAll(){
        include "trimToWord.cfm";
        variables.sample = "Lorem ipsum dolor sit";
    }

    function run(){
        describe("Tests for trimToWord()", function(){
            it("works when the trim point is smaller than the first word 'Lorem'", function(index){
                for (var i=1; i <= 5; i++){
                    expect(
                        trimToWord(sample, i)
                    ).toBe(left(sample, 5), "trimming @ #i#");
                }
            });
            it("works when the trim point is between the first and second words 'Lorem ipsum'", function(index){
                for (var i=6; i <= 10; i++){
                    expect(
                        trimToWord(sample, i)
                    ).toBe(left(sample, 5), "trimming @ #i#");
                }
            });
            it("works when the trim point is between the second and third words 'Lorem ipsum dolor'", function(index){
                for (var i=11; i <= 16; i++){
                    expect(
                        trimToWord(sample, i)
                    ).toBe(left(sample, 11), "trimming @ #i#");
                }
            });
            it("works when the trim point is between the third and fourth words 'Lorem ipsum dolor sit'", function(index){
                for (var i=17; i <= 20; i++){
                    expect(
                        trimToWord(sample, i)
                    ).toBe(left(sample, 17), "trimming @ #i#");
                }
            });
            it("works when the trim point is at the end of the string 'Lorem ipsum dolor sit'", function(index){
                expect(
                    trimToWord(sample, 21)
                ).toBe(sample);
            });
        });
    }
}

Cheers.

--
Adam



