How to turn a 4 digits sequence into string and and get 5 digits sequence instead ?

  • Hello,

    I have a list of zip codes (~38000) and some of them are malformed.

    How could i detect in a list if a zip code has only 4 digits instead of 5 , if so i would like to add 0 at the head of it to get 5 digits at the end.

    ie , the first 3 zip codes of that list are correct, the last is not ( only 4 digits )

    10800 VILLY-LE-BOIS

    How could i turn “1090 FRANCHELEINS” to “01090 FRANCHELEINS” ?

    I was thinking about checking if the 5th character is a ‘space’ if not add ‘0’ at the end of the string.

    Many thanks for your support,

  • Find What: \b(\d{4})\b\x20
    Replace: 0$1\x20

    • \b = boundary – means a transition from something to word/number or word/number to something (0-width; doesn’t consume a character in the string)
    • () = capture whatever’s inside
    • \d{4} = exactly four digits
    • \x20 = space character
    • $1 = the first captured ()

    Thus, it will look for “boundary / 4 digits / boundary / space”, and replace it with "0 / original 4 digits / space`

  • Many thanks, it works perfectly.

    If the list starts differently, as follow :
    ( zip codes placed at the end of the string , all zip codes start after “.” character )


    How could i turn 4 digits ones to 5 digits, starting with 0xxxx ?
    In that case “.” has not a defined place due to city name length variation…

  • Just remove the \x20 from the end of both. \b matches the boundary between . and a number as well.
    This version will replace any 4-digit number with a 5-digit number.

    Find What: \b(\d{4})\b
    Replace: 0$1

    Note that \b will find lots of boundaries: start of line, end of line, start of word (ie, transition from punctuation or space to alphanumeric), end of word (ie, transition from alphanumeric to punctuation or space). So it should work wherever your zip code lies.

    Note that _ counts as alphanumeric, not punctuation, so this regex would not match BRESSE_1000, though it does match BRESSE.1000. Also, \b does not match the boundary between letters and numbers, so it will not match BRESSE1000 or 1000BRESSE. Using lookaheads or lookbehinds, those would be doable, too…

  • Thank you very much Peter.

  • Hello, @laurent-coulon, @peterjones and All,

    Even more simple :

    SEARCH \b\d{4}\b OR \<\d{4}\>

    REPLACE 0$0

    Just use the $0 syntax, in replacement, which always represents the entire match. So, it avoids to add parentheses in order to store the match at group 1 :-)



Log in to reply