    Regex -- how would I do this ...

      Ian Alex

      Looking at this example from an old dictionary text file,

      [babafa]
      {n.}
      scent organ of possum

      Is there a regex that can handle a variable-length string (i.e. different words) inside the square brackets, recognize the left curly bracket at the start of the next line, and place a # character to the left of the left square bracket? For example:

      #[babafa]
      {n.}
      scent organ of possum

      What this is about: I have an unformatted text dump of a 1980s dictionary with thousands of entries of this sort, each of which needs a unique field code in front of its headword. The # character will do for now as the unique character.

      Unfortunately, the square brackets in the source file are used in multiple contexts, such as speech examples, because they delimit [any vernacular language expression].

      So what I am relying on here, to find and tag the headwords that begin a dictionary entry, is that after every headword, the next line has a {part of speech} encased by curly brackets.

      That’s why I’m hoping there is a way using regex to tag the beginning of the headword string with a ‘#’ character or some such, based on the presence of a ‘{’ left curly bracket at the beginning of the next line.

        Ian Alex

        The following code works, but is there a better way to write it?

        Using Regular expression mode, the goal is to have Notepad++ replace all text strings of the type:
        [babafa]
        {n.}

        with

        #[babafa]
        {n.}

        by relying on the appearance of a left-curly-bracket on the line below the string enclosed in square brackets:

        Find what: ^[(.*)]\r\n+({)
        Replace with: #[$1]\r\n$2

          Scott Sumner @Ian Alex

          @Ian-Alex ,

          I’m not sure how the find regex you specified worked for you; it did not work for me, and I see some obvious problems with it. The big thing is that some of the symbols you are searching for (brackets and braces) have special meaning to the regex engine. If you want to search for them literally, they have to be “escaped”, that is, preceded by a backslash.

          THIS MIGHT EXPLAIN IT: Perhaps when you posted here, you didn’t examine the preview window closely enough. Posting on this website sometimes gobbles up your intended characters, so you have to use the escape/backslash technique here, too!

          Regardless, this simplified find and replace pair worked for me on your sample data:

          Find what: ^\[.+\]\R\{
          Replace with: #$0

          Some points to note:

          \R is a shorthand line-ending notation; it matches any line ending, including \r\n on Windows

          $0 in the replacement is shorthand for the entire text matched by the find expression
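
For anyone who wants to try the same idea outside Notepad++, here is a minimal Python sketch of the find/replace pair above. It is not a translation of Notepad++'s own engine: Python's `re` module has no `\R`, so `\r?\n` is swapped in, and `\g<0>` plays the role of `$0`. The sample text and the names `HEADWORD` and `tag_headwords` are made up for illustration.

```python
import re

# Sample based on the question, with Windows (\r\n) line endings.
# The last line is a bracketed expression that is NOT a headword.
text = (
    "[babafa]\r\n"
    "{n.}\r\n"
    "scent organ of possum\r\n"
    "He said [any vernacular language expression] aloud.\r\n"
)

# Assumption of this sketch: \r?\n stands in for \R, which Python's re
# module does not support. Brackets and the brace are escaped so they
# are matched literally.
HEADWORD = re.compile(r"^\[.+\]\r?\n\{", re.MULTILINE)


def tag_headwords(s: str) -> str:
    """Prefix '#' to each bracketed line that is followed by a '{' line."""
    # \g<0> is Python's spelling of $0: the whole matched text.
    return HEADWORD.sub(r"#\g<0>", s)


tagged = tag_headwords(text)
print(tagged)

# Without the backslashes, [(.*)] is a character class that matches a
# single character out of ( . * ), so the pattern as it appeared in the
# earlier post finds nothing in this sample.
unescaped = re.compile(r"^[(.*)]\r\n+({)", re.MULTILINE)
print(unescaped.search(text))
```

Because the match has to begin at the start of a line and end with the brace on the next line, the bracketed phrase in the middle of the last sample line is left untagged.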
