Using RegEx to format text for import

Dennis Andersen

I have this text

With lines like these

820 Review Conputer Process Annually No 2016-06-02 2019-05-31
1064 Review customer documentation Annually No 2016-07-25 2019-07-25
1282 Monitor and Clean up the Mailbox Weekly No 2015-09-21 2020-05-12
1644 Generic paragraph for monthly report Monthly Yes 2015-10-19 2017-04-13

I want to keep the text how ever I need to set a delimiter between each field.
I created a regex to Identify the fields. However I cannot seem to figure out how to use it to “replace” the correct spaces to ; as field seperator.

Sorry for the portato code…
(\d{3,4}).(?<=)(.) (?=(Weekly|Monthly|Semi-annually|Quarterly|Annually|Bi-monthly))(.)(No|Yes).(\d{4}-\d{2}-\d{2}).(\d{4}-\d{2}-\d{2})

I need to get the above text to look like this
820;Review Conputer Process;Annually;No;2016-06-02;2019-05-31
1064;Review customer documentation;Annually;No;2016-07-25;2019-07-25
1282;Monitor and Clean up the Mailbox;Weekly;No;2015-09-21;2020-05-12
1644;Generic paragraph for monthly report;Monthly;Yes;2015-10-19;2017-04-13

the first segment can be either 3 og 4 digits. There is no lenght on the second, the third can be Weekly,Monthly,Semi-annually,Quarterly,Annually or Bi-monthly and the forth one can be either Yes or No and the lat two is just date.

I have to do it one segment of a time, that I ok. Doesn’t need to be fancy.

Hope you can help.

Ekopalypse

@Dennis-Andersen

from the given data I would think this should do it
find what:^(\d{3,4})\h(.+?)\h(Weekly|Monthly|Semi-annually|Quarterly|Annually|Bi-monthly)\h(No|Yes)\h(\d{4}-\d{2}-\d{2})\h(\d{4}-\d{2}-\d{2})
repalce with: $1;$2;$3;$4;$5;$6

guy038

Hello, @dennis-andersen, and All,

Ah ! @ekopalypse beat me to it ;-))

This other regex S/R should also work :

SEARCH (?-si)^\d{3,4}\K\h+|\h+(?!.+(Weekly|Monthly|Semi-annually|Quarterly|Annually|Bi-monthly))

REPLACE ;

Notes :

As usual, the in-line modifiers (?-si) ensure that :
- Any dot regex symbol . represents a single standard character ( NOT like-breaks )
- The search is processed in a non-insensitive way
Then the regex contains two alternatives, separated by the | alternation symbol. So the regex engine searches, from current location, for, either :
- Some horizontal blank characters, after a 3 or 4-digits number, at beginning of lines and, due to the \K syntax, only matches these blank char(s) ^\d{3,4}\K\h+
- Some horizontal blank characters, ONLY IF NOT followed by a non-null range of standard chars before the periodicity field \h+(?!.+(Weekly|Monthly|Semi-annually|Quarterly|Annually|Bi-monthly))
Due the \K syntax, you cannot use the step by step replacement with the Replace button. You must use the Replace All button, exclusively

Best Regards,

guy038

BTW, @dennis-andersen, you very well described your goal and data which greatly helped us to figure out regex solutions ;-))

Dennis Andersen

@Ekopalypse
@guy038

You guys are the best!!!

One day I hope to be half as smart as you!!

I got help and I learned something. Best day of the year so far…

Thanks and have a great day.
Dennis

Makwana Prahlad

Hello

List itemIdentify the common text string you wish to remove, in this example, “Accepts Google Pay”
List itemClick the Column Options dropdown arrow and select Set regular expression. The Set regular expression dialog box appears.
List itemIn the Match box, enter “Accepts Google Pay\s+(.+)” to identify the Accepts Google Pay string and then capture the remainder of the string in the first capture capture group.
List itemIn the Replace box, enter $1 to print out the first capture group. The prefix disappears from the column data, leaving the remainder of the string intact.

I hope this information will be usefull for you.
Thank you