Add a prefix to each string in a column



  • Hi guys,
    I have been searching on this subject for a long time, but couldn't find anything about it.

    I want to add a prefix (A_) to each string in a column. The whole line looks like this:

    .3 |1180| 1|1234 |abcd | 0| 0| 0| 2.0000|mm | 0 |W78 | 50|No | IC10 IC9

    and the result should look like this:

    .3 |1180| 1|1234 |abcd | 0| 0| 0| 2.0000|mm | 0 |W78 | 50|No | A_IC10 A_IC9

    Do you know what to do?



  • Hello @yascha-badenhop,

    Welcome to the Notepad++ Community !

    As usual, the problem is not to build the right S/R but, simply, to define in which cases the S/R should occur ;-))

    • Must the S/R occur on the last column of each line ?

    • Must the S/R occur on the 15th column of each line, like in your example ?

    • Must the S/R occur right before any string IC##, whatever its column, for any line ?

    • Must the S/R occur right before any string IC##, in the last column of each line ?

    • Must the S/R occur right before any string IC## of the 15th column of each line ?

    From your answer, it won’t be difficult to find out the correct regex S/R !

    Best Regards,

    guy038



  • Hello @guy038
    This case is quite easy, but I also have, for example, R10 / C10 / vdr10 / xtal10. I would like to add the prefix to all strings in column 15 (separated with | ).
    I actually wanted to edit my last post, but I couldn't anymore - Sorry for the indistinct question :)



  • Hi, @yascha-badenhop, and All,

    OK, I get the problem ! I’ll try to describe my different steps in order to reach the right solution. If you’re in a hurry, just skip this section ;-))

    • Firstly, I tried to imagine a way to reach the 15th column, with the | separator

      • A column is composed of some standard characters, different from |, followed by the | separator, which can be regex-translated as [^|\r\n]+\|. Indeed, [^|\r\n] represents any char different from | and from any EOL character, repeated one or more times ( + ) and followed by a literal pipe character \| ( it must be escaped because it’s a regex meta-character )

      • To get the 15th column, we need to skip the first 14 columns, from the beginning of the line, so the regex ^(?:[^|\r\n]+\|){14}. Note that I’m using a non-capturing group (?:........) as we do not care about the contents of these different columns and we don’t need to store them for further recall !

      • Then we place the \K regex feature, which discards everything matched so far from the final match and sets the working position right after the | separator of the 14th column

    • Secondly, I considered the contents of the 15th column :

      • Seemingly, it contains several words, each of them preceded by a space char ( from your second post, we learn that these may be either R10, C10, vdr10 or xtal10 ). However, I preferred to suppose that the first word of the 15th column could directly follow the | separator.

      • So, for the moment, the search regex is ^(?:[^|\r\n]+\|){14}\K\x20*\w+, as \x20* stands for any range of space chars, even none, and \w+ for a non-empty run of word characters. But, as we must insert the A_ string between the possible blank character(s) and the subsequent word, I enclosed them between parentheses, which define two groups 1 and 2 ( as, you remember, the very first group is a non-capturing one ), giving the regex ^(?:[^|\r\n]+\|){14}\K(\x20*)(\w+)

    • Thirdly, I thought about the way of matching the second and subsequent words of the 15th column :

      • I first thought about adding an alternative ( | ) to the search regex, in order to search for another word, preceded by space character(s), also enclosed in parentheses as we need the contents for replacement => the regex ^(?:[^|\r\n]+\|){14}\K(\x20*)(\w+)|(\x20+)(\w+). However, this does not work : when the present working location of the regex engine is after the 15th column, and your text contains other columns, the second alternative of the search regex would match any subsequent word ! Not what we want, obviously !

      • The solution is to use the \G feature, which requires that the next regex match begins at the exact location where the previous match ended. So, once all the blocks “space(s) + word” ( \x20+\w+ ) of the 15th column have been matched, the process stops. Indeed, the first word of the 16th column is not adjacent to the last word of the 15th column, because of the | separator, which breaks the \G condition !

    • Finally a solution for the search regex could be :

    SEARCH ^(?:[^|\r\n]+\|){14}\K(\x20*)(\w+)|\G(\x20+)(\w+)
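For anyone who wants to see the \G idea outside Notepad++, here is a rough Python sketch. Python's built-in re module has neither \K nor \G, so this only emulates them and is not how the Boost engine works internally: `Pattern.match(line, pos)` succeeds only if the match starts exactly at `pos`, which plays the role of \G here. The names HEAD, FIRST, NEXT and contiguous_words are made up for this illustration.

```python
import re

HEAD = re.compile(r"(?:[^|\r\n]+\|){14}")  # the first 14 columns (the part before \K)
FIRST = re.compile(r"(\x20*)(\w+)")        # first word of column 15, spaces optional
NEXT = re.compile(r"(\x20+)(\w+)")         # later words, spaces required


def contiguous_words(line):
    """Return the (start, end) spans of the words reached in column 15."""
    head = HEAD.match(line)
    if head is None:
        return []  # fewer than 14 complete columns
    spans = []
    m = FIRST.match(line, head.end())
    while m is not None:
        spans.append(m.span(2))
        # match() is anchored at the given position, so the chain stops as
        # soon as something other than "space(s) + word" follows.
        m = NEXT.match(line, m.end())
    return spans


line = "a|" * 14 + " R10  C10|x y|"
words = [line[s:e] for s, e in contiguous_words(line)]
print(words)
```

The words x and y of the 16th column are never reached, because the | separator breaks the chain of adjacent matches.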

    • Fourthly, I built the replacement regex :

      • Note that groups 1 and 2 store the first space character(s) and word characters of the 15th column

      • Groups 3 and 4 store the second and subsequent space character(s) and word characters of the 15th column

      • So, the use of a conditional replacement (?#........), where # is the group number, is needed and gives the replacement regex (?2\1A_\2)(?4\3A_\4). This means that :

        • If group 2 exists, we rewrite the space(s) characters first ( \1), followed with the string A_, and, finally, the word characters ( \2 )

        • If group 4 exists, we rewrite the space(s) characters first ( \3), followed with the string A_, and, finally, the word characters ( \4 )

      • After a while, I realized that these two cases are mutually exclusive. And, as an undefined group n, written \n in the replacement, is treated as an empty group by the Boost regex engine, I got the final replacement regex : \1\3A_\2\4
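The "mutually exclusive groups" trick can be tried in Python too, since re.sub (Python 3.5 and later) also replaces an unmatched group with an empty string. This is only a simplified sketch: the pattern below is an assumption for illustration, where ^ stands in for the 14-column prefix plus \K and the input is a single column, so no \G is needed.

```python
import re

# Simplified stand-in: "^" replaces the 14-column prefix + \K, and the input is
# a single column, so the second alternative cannot wander into other columns.
PATTERN = re.compile(r"^(\x20*)(\w+)|(\x20+)(\w+)")

column = " R10     C10 vdr10  xtal10"

# Groups 1-2 and groups 3-4 never take part in the same match, so the unused
# pair contributes nothing to the replacement text.
result = PATTERN.sub(r"\1\3A_\2\4", column)
print(result)
```

Each word keeps its own leading spaces and gets the A_ prefix, whichever alternative matched it.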


    So, @yascha-badenhop, the final regex S/R, to solve your problem, is :

    SEARCH ^(?:[^|\r\n]+\|){14}\K(\x20*)(\w+)|\G(\x20+)(\w+)

    REPLACE \1\3A_\2\4
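If the file ever has to be processed by a script instead of Notepad++, a different technique gives the same result on the line from the first post: split on |, prefix the words of the 15th field, and join again. This is a minimal sketch, not a translation of the regex above. The function name prefix_column and its parameters are hypothetical, and, unlike the \G chain, it prefixes every word of the column, even after punctuation, and it also accepts empty columns.

```python
import re


def prefix_column(line, col=15, prefix="A_", sep="|"):
    """Prefix every word of the col-th field (1-based) of a separated line."""
    fields = line.split(sep)
    if len(fields) < col:
        return line  # the line has no such column: leave it untouched
    fields[col - 1] = re.sub(
        r"\w+", lambda m: prefix + m.group(0), fields[col - 1]
    )
    return sep.join(fields)


line = ".3 |1180| 1|1234 |abcd | 0| 0| 0| 2.0000|mm | 0 |W78 | 50|No | IC10 IC9"
print(prefix_column(line))
```

A function is used for the replacement so that a prefix containing a backslash would be inserted literally.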

    To test it, I considered the 2-column sample text R10 C10 vdr10 xtal10|xtal10 vdr10 C10 R10|, with various ranges of space chars before the words. This sample is then repeated 9 times, giving 18 columns in total :

     R10     C10 vdr10  xtal10|xtal10     vdr10 C10     R10| R10     C10 vdr10  xtal10|xtal10     vdr10 C10     R10| R10     C10 vdr10  xtal10|xtal10     vdr10 C10     R10| R10     C10 vdr10  xtal10|xtal10     vdr10 C10     R10| R10     C10 vdr10  xtal10|xtal10     vdr10 C10     R10| R10     C10 vdr10  xtal10|xtal10     vdr10 C10     R10| R10     C10 vdr10  xtal10|xtal10     vdr10 C10     R10| R10     C10 vdr10  xtal10|xtal10     vdr10 C10     R10| R10     C10 vdr10  xtal10|xtal10     vdr10 C10     R10|
    

    Now :

    • Open the Replace dialog ( Ctrl + H )

    • Type in the regex ^(?:[^|\r\n]+\|){14}\K(\x20*)(\w+)|\G(\x20+)(\w+), in the Find what: zone

    • Type in the regex \1\3A_\2\4 in the Replace with: zone

    • Preferably, tick the Wrap around option

    • Choose the Regular expression search mode

    • Click once on the Replace All button, exclusively ( Because of the \K syntax, you must not use the Replace button ! )

    You should get your expected text :

     R10     C10 vdr10  xtal10|xtal10     vdr10 C10     R10| R10     C10 vdr10  xtal10|xtal10     vdr10 C10     R10| R10     C10 vdr10  xtal10|xtal10     vdr10 C10     R10| R10     C10 vdr10  xtal10|xtal10     vdr10 C10     R10| R10     C10 vdr10  xtal10|xtal10     vdr10 C10     R10| R10     C10 vdr10  xtal10|xtal10     vdr10 C10     R10| R10     C10 vdr10  xtal10|xtal10     vdr10 C10     R10| A_R10     A_C10 A_vdr10  A_xtal10|xtal10     vdr10 C10     R10| R10     C10 vdr10  xtal10|xtal10     vdr10 C10     R10|
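To double-check that only the 15th column changes, the same sample can be run through a small Python sketch. Since Python's re module lacks \K, the first 14 columns are captured in a group and written back unchanged, which is an emulation rather than the Notepad++ behaviour itself. FIRST_14 and add_prefix are hypothetical names.

```python
import re

# Group 1: the first 14 columns with their separators (kept as-is).
# Group 2: the contents of the 15th column.
FIRST_14 = re.compile(r"^((?:[^|\r\n]+\|){14})([^|\r\n]*)", re.MULTILINE)


def add_prefix(text, prefix="A_"):
    def fix(m):
        head, col15 = m.group(1), m.group(2)
        return head + re.sub(r"\w+", lambda w: prefix + w.group(0), col15)

    return FIRST_14.sub(fix, text)


pair = " R10     C10 vdr10  xtal10|xtal10     vdr10 C10     R10|"
sample = pair * 9  # 18 columns, as in the test text above
result = add_prefix(sample)

before, after = sample.split("|"), result.split("|")
changed = [i + 1 for i, (a, b) in enumerate(zip(before, after)) if a != b]
print(changed)
```

The list of changed column numbers should contain the 15th column only.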
    

    If you want to be more restrictive and match the exact words that you spoke of in your second post, we just have to change the regex \w+ into a non-capturing group followed by 10, so (?:xtal|vdr|R|C)10, giving the final regex S/R :

    SEARCH ^(?:[^|\r\n]+\|){14}\K(\x20*)((?:xtal|vdr|R|C)10)|\G(\x20+)((?:xtal|vdr|R|C)10)

    REPLACE \1\3A_\2\4

    This second solution is even better, as it prevents a second unwanted execution of the regex S/R, leading, for instance, to a 15th column like | A_A_IC10 A_A_IC9
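The "safe to run twice" property can be sketched in Python as well. The lookarounds (?<!\S) and (?!\w) are my own additions, assumed here to stand in for the "word directly after spaces or the separator" condition that \K and \G give in Notepad++. WORD and prefix_known_words are hypothetical names.

```python
import re

# A known word must start the field or follow a space, and must not be
# followed by another word character.
WORD = re.compile(r"(?<!\S)(?:xtal|vdr|R|C)10(?!\w)")


def prefix_known_words(line, col=15, prefix="A_"):
    fields = line.split("|")
    if len(fields) >= col:
        fields[col - 1] = WORD.sub(lambda m: prefix + m.group(0), fields[col - 1])
    return "|".join(fields)


line = "|".join(["x"] * 14 + [" R10 C10 vdr10 xtal10"])
once = prefix_known_words(line)
twice = prefix_known_words(once)
print(once == twice)
```

On the second pass, each known word is preceded by the _ of its prefix, so nothing matches and no A_A_ appears.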

    Best Regards,

    guy038

