Kenny's Two Pennies

...and that's just about what it's worth
  Home |  Contact |   |  Login

Regex problem / solution

My brother gave me a list of zip codes and their cities.  I needed this list to construct a query that gives revenue information by city, given a couple of tables -- one with account numbers and their zip code, and another with account numbers and revenue.

The revenue information query was easy.  The harder part was getting the danged list of zip codes / city pairs into a table.

Here is what I had to work with:

91605 Los Angeles
91606 Los Angeles
91607 Los Angeles
91630 Thousand Oaks
93115 Fillmore
93552 Palmdale
(etc)

There were only about 60 lines of this stuff, and I figured the easy way would be paste the list into SQL Server's Management Studio query window, do a search and replace to convert the list into a bunch of INSERT statments, and then execute, to actually insert the pairs into a table I already had in place.

Piece o' cake, huh?

I thought I had regular expressions down.  Turns out I pretty much did, but it was the replace string that kicked my butt.  Here was my matching string:

^{[0-9]+}:b+{.*}

It basically says, from the start of a line, match up a string of 1 or more digits and "tag" them for replacement.  Then expect 1 or more spaces or tabs, and then a string of character up to the end of the line.  Tag the string of characters as well.

And here is the replacement string:

insert into zipcities(zipcode, city) values('\1', '\2')

This creates a list of my "insert" commands.  the \1 substitutes the first "tagged" data (the 1 or more digits), and the \2 does the same with the city names.

Here is the result:

insert into zipcities(zipcode, city) values('91606', 'Los Angeles')
insert into zipcities(zipcode, city) values('91605', 'Los Angeles')
insert into zipcities(zipcode, city) values('91607', 'Los Angeles')
insert into zipcities(zipcode, city) values('91630', 'Thousand Oaks')
insert into zipcities(zipcode, city) values('93115', 'Fillmore')
insert into zipcities(zipcode, city) values('93552', 'Palmdale')

Nifty, huh?

The part that kicked my butt?  I naturally assumed that the replacement string followed regular expression rules (the parentheses needed to be escaped with a leading '\' character).  Turns out the replacement string does  not use any regular expression syntax at all.  It just uses the \1, \2, etc. for repacements.  I think this replacement syntax is not universal -- Management Studio does it this way but other engines may use another replacement syntax.  I'm not sure about that though.

There, I have documented this for future reference!

Print | posted on Saturday, September 29, 2007 9:42 AM |

Comments have been closed on this topic.