Community
    • Login

    Using RegEx to format text for import

    Scheduled Pinned Locked Moved Help wanted · · · – – – · · ·
    5 Posts 4 Posters 877 Views 1 Watching
    Loading More Posts
    • Oldest to Newest
    • Newest to Oldest
    • Most Votes
    Reply
    • Reply as topic
    Log in to reply
    This topic has been deleted. Only users with topic management privileges can see it.
    • D Offline
      Dennis Andersen
      last edited by

      I have this text

      With lines like these

      820 Review Conputer Process Annually No 2016-06-02 2019-05-31
      1064 Review customer documentation Annually No 2016-07-25 2019-07-25
      1282 Monitor and Clean up the Mailbox Weekly No 2015-09-21 2020-05-12
      1644 Generic paragraph for monthly report Monthly Yes 2015-10-19 2017-04-13

      I want to keep the text how ever I need to set a delimiter between each field.
      I created a regex to Identify the fields. However I cannot seem to figure out how to use it to “replace” the correct spaces to ; as field seperator.

      Sorry for the portato code…
      (\d{3,4}).(?<=)(.) (?=(Weekly|Monthly|Semi-annually|Quarterly|Annually|Bi-monthly))(.)(No|Yes).(\d{4}-\d{2}-\d{2}).(\d{4}-\d{2}-\d{2})

      I need to get the above text to look like this
      820;Review Conputer Process;Annually;No;2016-06-02;2019-05-31
      1064;Review customer documentation;Annually;No;2016-07-25;2019-07-25
      1282;Monitor and Clean up the Mailbox;Weekly;No;2015-09-21;2020-05-12
      1644;Generic paragraph for monthly report;Monthly;Yes;2015-10-19;2017-04-13

      the first segment can be either 3 og 4 digits. There is no lenght on the second, the third can be Weekly,Monthly,Semi-annually,Quarterly,Annually or Bi-monthly and the forth one can be either Yes or No and the lat two is just date.

      I have to do it one segment of a time, that I ok. Doesn’t need to be fancy.

      Hope you can help.

      EkopalypseE 1 Reply Last reply Reply Quote 2
      • EkopalypseE Offline
        Ekopalypse @Dennis Andersen
        last edited by

        @Dennis-Andersen

        from the given data I would think this should do it
        find what:^(\d{3,4})\h(.+?)\h(Weekly|Monthly|Semi-annually|Quarterly|Annually|Bi-monthly)\h(No|Yes)\h(\d{4}-\d{2}-\d{2})\h(\d{4}-\d{2}-\d{2})
        repalce with: $1;$2;$3;$4;$5;$6

        1 Reply Last reply Reply Quote 4
        • guy038G Offline
          guy038
          last edited by guy038

          Hello, @dennis-andersen, and All,

          Ah ! @ekopalypse beat me to it ;-))

          This other regex S/R should also work :

          SEARCH (?-si)^\d{3,4}\K\h+|\h+(?!.+(Weekly|Monthly|Semi-annually|Quarterly|Annually|Bi-monthly))

          REPLACE ;

          Notes :

          • As usual, the in-line modifiers (?-si) ensure that :

            • Any dot regex symbol . represents a single standard character ( NOT like-breaks )

            • The search is processed in a non-insensitive way

          • Then the regex contains two alternatives, separated by the | alternation symbol. So the regex engine searches, from current location, for, either :

            • Some horizontal blank characters, after a 3 or 4-digits number, at beginning of lines and, due to the \K syntax, only matches these blank char(s)    ^\d{3,4}\K\h+

            • Some horizontal blank characters, ONLY IF NOT  followed by a non-null range of standard chars before the periodicity field    \h+(?!.+(Weekly|Monthly|Semi-annually|Quarterly|Annually|Bi-monthly))

          • Due the \K syntax, you cannot use the step by step replacement with the Replace button. You must use the Replace All button, exclusively

          Best Regards,

          guy038

          BTW, @dennis-andersen, you very well described your goal and data which greatly helped us to figure out regex solutions ;-))

          1 Reply Last reply Reply Quote 5
          • D Offline
            Dennis Andersen
            last edited by

            @Ekopalypse
            @guy038

            You guys are the best!!!

            One day I hope to be half as smart as you!!

            I got help and I learned something. Best day of the year so far…

            Thanks and have a great day.
            Dennis

            1 Reply Last reply Reply Quote 3
            • Makwana PrahladM Offline
              Makwana Prahlad Banned
              last edited by

              Hello

              • List itemIdentify the common text string you wish to remove, in this example, “Accepts Google Pay”

              • List itemClick the Column Options dropdown arrow and select Set regular expression. The Set regular expression dialog box appears.

              • List itemIn the Match box, enter “Accepts Google Pay\s+(.+)” to identify the Accepts Google Pay string and then capture the remainder of the string in the first capture capture group.

              • List itemIn the Replace box, enter $1 to print out the first capture group. The prefix disappears from the column data, leaving the remainder of the string intact.

              I hope this information will be usefull for you.
              Thank you

              1 Reply Last reply Reply Quote -2

              Hello! It looks like you're interested in this conversation, but you don't have an account yet.

              Getting fed up of having to scroll through the same posts each visit? When you register for an account, you'll always come back to exactly where you were before, and choose to be notified of new replies (either via email, or push notification). You'll also be able to save bookmarks and upvote posts to show your appreciation to other community members.

              With your input, this post could be even better 💗

              Register Login
              • First post
                Last post
              The Community of users of the Notepad++ text editor.
              Powered by NodeBB | Contributors