Using RegEx to format text for import



  • I have this text

    With lines like these

    820 Review Conputer Process Annually No 2016-06-02 2019-05-31
    1064 Review customer documentation Annually No 2016-07-25 2019-07-25
    1282 Monitor and Clean up the Mailbox Weekly No 2015-09-21 2020-05-12
    1644 Generic paragraph for monthly report Monthly Yes 2015-10-19 2017-04-13

    I want to keep the text how ever I need to set a delimiter between each field.
    I created a regex to Identify the fields. However I cannot seem to figure out how to use it to “replace” the correct spaces to ; as field seperator.

    Sorry for the portato code…
    (\d{3,4}).(?<=)(.) (?=(Weekly|Monthly|Semi-annually|Quarterly|Annually|Bi-monthly))(.)(No|Yes).(\d{4}-\d{2}-\d{2}).(\d{4}-\d{2}-\d{2})

    I need to get the above text to look like this
    820;Review Conputer Process;Annually;No;2016-06-02;2019-05-31
    1064;Review customer documentation;Annually;No;2016-07-25;2019-07-25
    1282;Monitor and Clean up the Mailbox;Weekly;No;2015-09-21;2020-05-12
    1644;Generic paragraph for monthly report;Monthly;Yes;2015-10-19;2017-04-13

    the first segment can be either 3 og 4 digits. There is no lenght on the second, the third can be Weekly,Monthly,Semi-annually,Quarterly,Annually or Bi-monthly and the forth one can be either Yes or No and the lat two is just date.

    I have to do it one segment of a time, that I ok. Doesn’t need to be fancy.

    Hope you can help.



  • @Dennis-Andersen

    from the given data I would think this should do it
    find what:^(\d{3,4})\h(.+?)\h(Weekly|Monthly|Semi-annually|Quarterly|Annually|Bi-monthly)\h(No|Yes)\h(\d{4}-\d{2}-\d{2})\h(\d{4}-\d{2}-\d{2})
    repalce with: $1;$2;$3;$4;$5;$6



  • Hello, @dennis-andersen, and All,

    Ah ! @ekopalypse beat me to it ;-))

    This other regex S/R should also work :

    SEARCH (?-si)^\d{3,4}\K\h+|\h+(?!.+(Weekly|Monthly|Semi-annually|Quarterly|Annually|Bi-monthly))

    REPLACE ;

    Notes :

    • As usual, the in-line modifiers (?-si) ensure that :

      • Any dot regex symbol . represents a single standard character ( NOT like-breaks )

      • The search is processed in a non-insensitive way

    • Then the regex contains two alternatives, separated by the | alternation symbol. So the regex engine searches, from current location, for, either :

      • Some horizontal blank characters, after a 3 or 4-digits number, at beginning of lines and, due to the \K syntax, only matches these blank char(s)    ^\d{3,4}\K\h+

      • Some horizontal blank characters, ONLY IF NOT  followed by a non-null range of standard chars before the periodicity field    \h+(?!.+(Weekly|Monthly|Semi-annually|Quarterly|Annually|Bi-monthly))

    • Due the \K syntax, you cannot use the step by step replacement with the Replace button. You must use the Replace All button, exclusively

    Best Regards,

    guy038

    BTW, @dennis-andersen, you very well described your goal and data which greatly helped us to figure out regex solutions ;-))



  • @Ekopalypse
    @guy038

    You guys are the best!!!

    One day I hope to be half as smart as you!!

    I got help and I learned something. Best day of the year so far…

    Thanks and have a great day.
    Dennis



  • Hello

    • List itemIdentify the common text string you wish to remove, in this example, “Accepts Google Pay”

    • List itemClick the Column Options dropdown arrow and select Set regular expression. The Set regular expression dialog box appears.

    • List itemIn the Match box, enter “Accepts Google Pay\s+(.+)” to identify the Accepts Google Pay string and then capture the remainder of the string in the first capture capture group.

    • List itemIn the Replace box, enter $1 to print out the first capture group. The prefix disappears from the column data, leaving the remainder of the string intact.

    I hope this information will be usefull for you.
    Thank you


Log in to reply