Can Regex paste values into another file?



  • Dear Folks

    I want to know if regex can be used to extract phone numbers from a list and paste them in order into another text file.

    My Excel address book has addresses with telephone numbers at the end of the line. If at all possible, I would like to extract them into another text file.

    Can I use regex to achieve this?

    Best regards and thanks in advance.



  • The extraction of phone numbers would most likely be achieved by either bookmarking the relevant lines with a regex (regular expression) so they could be removed later, or more than likely with a regex to remove all but the phone numbers. So this would be a “destructive” search and replace.
    The insertion of these numbers into another file could also be achieved with a regex, but again it would require setting up a particular way.

    From this you might gather that it ain’t gonna be quick and would very likely be a bit messy.

    Of course N++ can utilize additional plugins such as python-script. This programming language could easily do what you require, but there are far fewer python users than regex users and you might not want to be left with something you could not support later on, without learning the language yourself.

    As the source is Excel, why not just keep to that environment and learn it’s macro code (VBA). I’d think either way you will have to learn something, I’d suggest keep it all in the one package.

    I’ve done Excel programming and whilst the initial learning path can be steep, there are lots of code snippets on the web. You may even find something very close to what you want and thus get off to a great start without much effort.

    Terry



  • Thanks Terry R!
    The main reason I started the notepad++ way was because the ‘text to column’ in excel was turning all digits separated by either a dash or a forward slash into all sorts of dates. I tried very hard (by way of youtube and excel forums,) but I was unable to stop excel from doing this.

    I do not want to take this into an Excel discussion, but there are other problems also. Sometimes when I copy paste from and into excel into text editors, apostrophes appear here and there (both ways)

    But my first two problems (to append a hash mark to all lines staring with a digit and swapping the first and second blocks of text) were solved so elegantly with regex, that I thought may be this is the way to go. As for the process being messy or destructive, I have no issues. The original file is secure and I am just using a lot of copies as I try to get my address book into shape. And only copies of the address column.

    Best regards.



  • @C-Ananda-Rao

    While Terry makes some good points, reading your OP makes me think that your request is doable. How about showing some sample data? [Best to mock some up data in Notepad++, then indent each line 4 spaces before pasting the data here]



  • Dear Mr Scott Sumner
    I am pasting some lines from my address book after adding 4 spaces to the beginning of every line.
    The 10 digit numbers (continuous digits, no dash character) are mobile numbers. Those digits in the form of 09876-987654 are landline numbers. I replaced all comma space by just commas to make it uniform.

    What I would like to happen is for:-
    Another text file to be created with 12 empty lines.
    Paste only telephone numbers line by line.
    If there is no telephone number, just put “none” in that line.

    F Mothar(V),Hagrin(M),1234554321,1122334455
    W/o Torinfel Gerjip,#12-98/C, M Fasthut(V),Beliperi(M),0987-654321,9988776655
    #67-7-A,L Fandrims Colony,Pelletin(T&M),9182736450,01928-4456689
    Rastaferap,Railway Colony,Molarrint(U)
    Dentenalaity(V),Dalrufatha(M),01234-556677,08723-98765
    H/o Hirundinaines Thomme,Lombef(V&M),9786751342,0987-654321
    Gyrrichalon(V),#9-8-76/D,Algewive(M)
    Vasternalik(V),Lombef(M),9786756453,0789-124578
    #11-42-V,Yendraiken Juhar,Tricherenti(V),Doorgassu(R)
    Gotorana Colony,Symboll,Pallore(TM&D),9753124680
    Bavemisuram(V),Gaujunar Dettil(P),Votari(M),9247482263
    

    Thanks and best regards!



  • @C-Ananda-Rao

    I would do it this way, which leaves lines with no phone numbers at the end totally blank; if you really need the word none there you could follow it up by replacing ^$ with none.

    Find what zone: (?-s)^.*?([0-9-]*)$
    Replace with zone: \1
    Match case checkbox: ticked or unticked doesn’t matter
    Wrap around checkbox: As you like it
    Search mode: Regular expression
    . matches newline checkbox: ticked or unticked doesn’t matter
    Action: Press Replace / Replace All button

    Here’s an explanation of how it works:

    THE FIND EXPRESSION:

    (?-s)^.*?([0-9-]*)$

    • [Use these options for the whole regular expression][1 ] (?-s)
      • [(hyphen inverts the meaning of the letters that follow)][1 ] -
      • [Dot doesn’t match line breaks][1 ] s
    • [Assert position at the beginning of a line (at beginning of the string or after a line break character) (carriage return and line feed, form feed)][2 ] ^
    • [Match any single character that is NOT a line break character (line feed, carriage return, form feed)][3 ] .*?
      • [Between zero and unlimited times, as few times as possible, expanding as needed (lazy)][4 ] *?
    • [Match the regex below and capture its match into backreference number 1][5 ] ([0-9-]*)
      • [Match a single character present in the list below][6 ] [0-9-]*
        • [Between zero and unlimited times, as many times as possible, giving back as needed (greedy)][7 ] *
        • [A character in the range between “0” and “9”][6 ] 0-9
        • [The literal character “-”][8 ] -
    • [Assert position at the end of a line (at the end of the string or before a line break character) (carriage return and line feed, form feed)][2 ] $

    THE REPLACE EXPRESSION:

    \1

    • [Insert the text that was last matched by capturing group number 1][9 ] \1
    • [Your regular expression may find zero-length matches][10 ]
      • [boost::regex 1.54–1.57 allows a zero-length match at the position where the previous match ends.][10 ]
      • [boost::regex 1.54–1.57 attempts the next match at the same position as the previous match if it was zero-length and may find a non-zero-length match at the same position.][10 ]

    Created with RegexBuddy

    [1 ]: http://www.regular-expressions.info/modifiers.html
    [2 ]: http://www.regular-expressions.info/anchors.html
    [3 ]: http://www.regular-expressions.info/dot.html
    [4 ]: http://www.regular-expressions.info/repeat.html#lazy
    [5 ]: http://www.regular-expressions.info/brackets.html
    [6 ]: http://www.regular-expressions.info/charclass.html
    [7 ]: http://www.regular-expressions.info/repeat.html
    [8 ]: http://www.regular-expressions.info/characters.html
    [9 ]: http://www.regular-expressions.info/replacebackref.html
    [10 ]: http://www.regular-expressions.info/zerolength.html



  • Now that we start to see the data and the output it does certainly become more doable.

    I suppose I should have asked for more detail at the start. I had visions of the file that the numbers to be inserted into already containing records and we’d have to artificially create some unique key in both so we could “data match”.

    So I was considering creating a “key” for each number, something like 6 characters of a surname, followed by 2 of something else reasonably unique, christian name or road name, then followed by “-0”
    In the 2nd file (which I thought would also contain names etc), make the same key, followed by a “-1”. Merge the 2 files and then sort alphabetically. This would put the phone number line alongside it’s receiving record. Then a regex would move the number from the first line of the “couple” to the 2nd line.

    I am a bit confused now though as “new” text file will be blank. I don’t see the sense of that. I’d just massage the data in the first file, destroy any unwanted information, then pad out with extra blank lines. There is no need to also copy that to another file, if this is the only data that will be in that file.

    Terry



  • Dear Mr Sumner
    Where is the ^$ sequence in the ‘Find what’ zone you have specified? the caret appears after the 1st right parenthesis and the dollar sign appears at the end. They do not occur together in that sequence. It is important for me to have no blanks when I take the problem back into Excel. So it is important for me to have ‘none’ or ‘null’ or some such where ever there are no telephone number sequences.

    Also, when I tried this, it only gives me one telephone number. Mind you, there are 2 telephone numbers in some lines. 2 mobile numbers or 2 landline numbers or 1 mobile and 1 landline number or only 1 telephone number - either mobile or a landline in yet other lines.

    The rigorous aspect about these telephone numbers is that mobile numbers are a sequence of 10 continuous digits, without any spaces or other characters.

    The landline numbers all begin with a zero (0) and have a ‘-’ character somewhere.

    Or was my data not representative enough?

    I have trimmed leading and trailing spaces.

    Am I missing something?

    Best regards!



  • Dear Mr Terry R

    I am depending upon the sequence of lines. They all have a serial number. In other words, each and every address line is on a line by itself. The next address starts at the beginning of the next line.

    In my imagination, I want to create a text file with the same number of lines in my Excel address book. The lines are all empty. Now I go to my text file where I have each address on a separate line. I then do 2 things:

    1. Delete all characters from those lines without phone numbers AND put ‘null’ or ‘none’ or ‘empty’ in all those lines.
    2. Paste all phone numbers (only) into other lines. Whether 1 or 2 telephone numbers or whether 1 mobile or 1 landline number.

    Best regards



  • Hello, @c-ananda-rao and All,

    I think that the following regex could achieve what you want, easily :-))

    As you said :

    I replaced all comma space by just commas to make it uniform.

    I, then, simply supposed that your phone numbers are separated with space(s), comma(s) or a mix range of these symbols !


    So, starting with your sample text, below, copied in a New # file :

    F Mothar(V),Hagrin(M),1234554321,1122334455
    W/o Torinfel Gerjip,#12-98/C, M Fasthut(V),Beliperi(M),0987-654321,9988776655
    #67-7-A,L Fandrims Colony,Pelletin(T&M),9182736450,01928-4456689
    Rastaferap,Railway Colony,Molarrint(U)
    Dentenalaity(V),Dalrufatha(M),01234-556677,08723-98765
    H/o Hirundinaines Thomme,Lombef(V&M),9786751342,0987-654321
    Gyrrichalon(V),#9-8-76/D,Algewive(M)
    Vasternalik(V),Lombef(M),9786756453,0789-124578
    #11-42-V,Yendraiken Juhar,Tricherenti(V),Doorgassu(R)
    Gotorana Colony,Symboll,Pallore(TM&D),9753124680
    Bavemisuram(V),Gaujunar Dettil(P),Votari(M),9247482263
    
    • Open the Replace dialog ( Ctrl + H )

    • Select the Regular expression search mode

    • Tick the Wrap around option

    • Use the following regex S/R :

    SEARCH (?-s)(?=^.*?(([, ]+[\d-]+)+)).+|.+

    REPLACE ?1\1:None

    • And, finally, click once on the Replace All button or several times on the Replace button

    You should get your expected text :

    ,1234554321,1122334455
    ,0987-654321,9988776655
    ,9182736450,01928-4456689
    None
    ,01234-556677,08723-98765
    ,9786751342,0987-654321
    None
    ,9786756453,0789-124578
    None
    ,9753124680
    ,9247482263
    

    Notes :

    • As usual, the (?-s) modifier means any dot ( . ) matches one standard character only ( and not the line–breaks )

    • The regex is made up of two alternatives, separated with the | symbol

      • The first alternative is the regex (?=^.*?(([, ]+[\d-]+)+)).+ which matches all the standard characters of the current line ( .+ ), ONLY IF the condition of the positive look-ahead is true. That is to say, if a non null range of consecutive phone numbers, preceded by, either, spaces and/or commas, exists in current line

      • If, at beginning of line, the previous alternative cannot be matched, then the regex engine tries the second alternative, .+, which, again, matches all the standard characters of the current line, this time, WITHOUT any condition

    • In the look-around structure, the part ([, ]+[\d-]+)+ represents the consecutive list of phone numbers, surrounded, itself, with parentheses in order to store the results as group 1, used, later, in replacement

    • In Replacement, we use the conditional replacement structure ?1\1:None

      • If group1 exists, the list of phone number(s) ( \1 ) is only written

      • If group1 does not exist, the string None replaces the entire line matched

    Remarks :

    • Note that this regex is “auto-convergent” ! If you re-run the regex, against the obtained results, it just replaces all lines, as previously :-))

    • Depending of your mother language, you may change the None English word, as you like !

    Best Regards,

    guy038

    P.S. :

    Of course, in order to delete the first comma, beginning each line and, for example, change any other comma into some space characters, use the regex S/R :

    ^,|(,)

    ?1\x20\x20\x20

    And you’ll get the text :

    1234554321   1122334455
    0987-654321   9988776655
    9182736450   01928-4456689
    None
    01234-556677   08723-98765
    9786751342   0987-654321
    None
    9786756453   0789-124578
    None
    9753124680
    9247482263
    

    Notes :

    • This regex searches, either, for :

      • A comma, beginning the line ( ^, )

      • Any other comma, stored as group 1

    • In replacement :

      • If the first alternatives matches, nothing is rewritten

      • If the second alternative matches, 3 space characters are rewritten.

    • Note that you may enter true space chars, instead of the escape syntax \x20, in the replacement zone !



  • @C-Ananda-Rao

    Well, when I asked for data, I guess I should have asked for an explanation of that data to accompany it… I suppose I assumed that the phone number was the last part of every line where digits were present. Anyway, hopefully @guy038’s response provides the help you needed.


Log in to reply