Community
    • Login

    Using RegEx to format text for import

    Scheduled Pinned Locked Moved Help wanted · · · – – – · · ·
    5 Posts 4 Posters 538 Views
    Loading More Posts
    • Oldest to Newest
    • Newest to Oldest
    • Most Votes
    Reply
    • Reply as topic
    Log in to reply
    This topic has been deleted. Only users with topic management privileges can see it.
    • D
      Dennis Andersen
      last edited by

      I have this text

      With lines like these

      820 Review Conputer Process Annually No 2016-06-02 2019-05-31
      1064 Review customer documentation Annually No 2016-07-25 2019-07-25
      1282 Monitor and Clean up the Mailbox Weekly No 2015-09-21 2020-05-12
      1644 Generic paragraph for monthly report Monthly Yes 2015-10-19 2017-04-13

      I want to keep the text how ever I need to set a delimiter between each field.
      I created a regex to Identify the fields. However I cannot seem to figure out how to use it to “replace” the correct spaces to ; as field seperator.

      Sorry for the portato code…
      (\d{3,4}).(?<=)(.) (?=(Weekly|Monthly|Semi-annually|Quarterly|Annually|Bi-monthly))(.)(No|Yes).(\d{4}-\d{2}-\d{2}).(\d{4}-\d{2}-\d{2})

      I need to get the above text to look like this
      820;Review Conputer Process;Annually;No;2016-06-02;2019-05-31
      1064;Review customer documentation;Annually;No;2016-07-25;2019-07-25
      1282;Monitor and Clean up the Mailbox;Weekly;No;2015-09-21;2020-05-12
      1644;Generic paragraph for monthly report;Monthly;Yes;2015-10-19;2017-04-13

      the first segment can be either 3 og 4 digits. There is no lenght on the second, the third can be Weekly,Monthly,Semi-annually,Quarterly,Annually or Bi-monthly and the forth one can be either Yes or No and the lat two is just date.

      I have to do it one segment of a time, that I ok. Doesn’t need to be fancy.

      Hope you can help.

      EkopalypseE 1 Reply Last reply Reply Quote 2
      • EkopalypseE
        Ekopalypse @Dennis Andersen
        last edited by

        @Dennis-Andersen

        from the given data I would think this should do it
        find what:^(\d{3,4})\h(.+?)\h(Weekly|Monthly|Semi-annually|Quarterly|Annually|Bi-monthly)\h(No|Yes)\h(\d{4}-\d{2}-\d{2})\h(\d{4}-\d{2}-\d{2})
        repalce with: $1;$2;$3;$4;$5;$6

        1 Reply Last reply Reply Quote 4
        • guy038G
          guy038
          last edited by guy038

          Hello, @dennis-andersen, and All,

          Ah ! @ekopalypse beat me to it ;-))

          This other regex S/R should also work :

          SEARCH (?-si)^\d{3,4}\K\h+|\h+(?!.+(Weekly|Monthly|Semi-annually|Quarterly|Annually|Bi-monthly))

          REPLACE ;

          Notes :

          • As usual, the in-line modifiers (?-si) ensure that :

            • Any dot regex symbol . represents a single standard character ( NOT like-breaks )

            • The search is processed in a non-insensitive way

          • Then the regex contains two alternatives, separated by the | alternation symbol. So the regex engine searches, from current location, for, either :

            • Some horizontal blank characters, after a 3 or 4-digits number, at beginning of lines and, due to the \K syntax, only matches these blank char(s)    ^\d{3,4}\K\h+

            • Some horizontal blank characters, ONLY IF NOT  followed by a non-null range of standard chars before the periodicity field    \h+(?!.+(Weekly|Monthly|Semi-annually|Quarterly|Annually|Bi-monthly))

          • Due the \K syntax, you cannot use the step by step replacement with the Replace button. You must use the Replace All button, exclusively

          Best Regards,

          guy038

          BTW, @dennis-andersen, you very well described your goal and data which greatly helped us to figure out regex solutions ;-))

          1 Reply Last reply Reply Quote 5
          • D
            Dennis Andersen
            last edited by

            @Ekopalypse
            @guy038

            You guys are the best!!!

            One day I hope to be half as smart as you!!

            I got help and I learned something. Best day of the year so far…

            Thanks and have a great day.
            Dennis

            1 Reply Last reply Reply Quote 3
            • Makwana PrahladM
              Makwana Prahlad Banned
              last edited by

              Hello

              • List itemIdentify the common text string you wish to remove, in this example, “Accepts Google Pay”

              • List itemClick the Column Options dropdown arrow and select Set regular expression. The Set regular expression dialog box appears.

              • List itemIn the Match box, enter “Accepts Google Pay\s+(.+)” to identify the Accepts Google Pay string and then capture the remainder of the string in the first capture capture group.

              • List itemIn the Replace box, enter $1 to print out the first capture group. The prefix disappears from the column data, leaving the remainder of the string intact.

              I hope this information will be usefull for you.
              Thank you

              1 Reply Last reply Reply Quote -2
              • First post
                Last post
              The Community of users of the Notepad++ text editor.
              Powered by NodeBB | Contributors