specific find/replace function for numeric wildcard plus specific character?

Help wanted

    David Stambler
    last edited by Sep 11, 2017, 5:48 PM

    We have a specific find/replace problem that I can’t find a solution for. I’ve done a few searches without finding what we need, so perhaps Notepad++ has the answer.

    We have a very long, 16,000-row book index that we are trying to simplify. In it, disease names are followed by page numbers, some of which have an f or t appended, e.g.:

    • granuloma 553 irritant/toxic causes 251t
    • eruption 551Ð552, 552f idiopathic facial
    • pathogenesis 545, 546f
    • 833 adhesion molecules 1684f, 1685t adhesion proteins

    We want to remove all the numbers plus the letter t or f. I think we need a find/replace for a wildcard number string PLUS the letter t or f.

    The problem is that we can match a wildcard string of 3 characters + f, but then any word with three letters followed by an f is also removed, which we don’t want.

    Is there a formula in Notepad++ where we can specify a numeric wildcard string plus a designated character for removal and/or replacement with a space?

    Please advise, and thank you!

      Scott Sumner @David Stambler
      last edited by Sep 11, 2017, 8:05 PM

      @David-Stambler

      My suggestion is to look at regular-expression search and replace. You should think in terms of “regular expressions” rather than “wildcards”.

      Here’s a small example: you can remove “three digits plus an f” with the following settings:

      Find-what zone: \d{3}f
      Replace-with zone: make sure this zone is empty
      Search mode: Regular expression
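To see what this pattern does to the sample data, here is a minimal sketch in Python rather than Notepad++. It assumes Python's `re` module treats `\d{3}f` the same way Notepad++'s Regular expression mode does, which should hold for a pattern this simple; the function name `remove_three_digits_f` is made up for illustration.

```python
import re

# Exactly three digits followed by a literal "f", as in the example above.
THREE_DIGITS_F = re.compile(r"\d{3}f")

def remove_three_digits_f(line: str) -> str:
    # An empty replacement mirrors leaving the Replace-with zone empty.
    return THREE_DIGITS_F.sub("", line)

for line in ["pathogenesis 545, 546f",
             "833 adhesion molecules 1684f, 1685t adhesion proteins"]:
    print(repr(remove_three_digits_f(line)))
```

The second line shows why more care is needed: only the last three digits of 1684f are removed (leaving a stray 1), and 1685t is untouched because the pattern only looks for f.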

      But really, based on your sample data, I think your conversion is fairly complex, so I encourage you to read up on regular expressions to get the best possible result. Good luck.

        Meta Chuh moderator
        last edited by Sep 12, 2017, 4:59 AM

        @David-Stambler

        open replace menu (ctrl+h)
        search mode: regular expression

        find text: \d{1,5}(f|t)
        replace text:
        (leave empty)
        

        this removes any run of 1, 2, 3, 4 or 5 digits that is directly followed by f or t, together with that letter
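A small Python sketch of the same replacement, run over the four sample lines from the question. It assumes Python's `re` module behaves like Notepad++'s regex mode for this pattern; `remove_page_suffixes` is a hypothetical helper name.

```python
import re

# 1 to 5 digits followed by "f" or "t"; (f|t) matches the same text as [ft].
ONE_TO_FIVE_DIGITS_FT = re.compile(r"\d{1,5}(f|t)")

def remove_page_suffixes(line: str) -> str:
    # Empty replacement, matching "replace text: (leave empty)".
    return ONE_TO_FIVE_DIGITS_FT.sub("", line)

sample = [
    "granuloma 553 irritant/toxic causes 251t",
    "eruption 551Ð552, 552f idiopathic facial",
    "pathogenesis 545, 546f",
    "833 adhesion molecules 1684f, 1685t adhesion proteins",
]
for line in sample:
    print(repr(remove_page_suffixes(line)))
```

Every suffixed page number goes, but the commas and spaces around them stay behind; for example, the last line becomes `833 adhesion molecules ,  adhesion proteins`.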

          AdrianHHH
          last edited by AdrianHHH Sep 12, 2017, 8:29 AM (posted Sep 12, 2017, 8:27 AM)

          This is easy in Notepad++ with a regular expression, but it is also easy to get the expression wrong and delete or change the wrong things. So first, make your own backup copy of the file rather than relying only on the backups Notepad++ keeps. Then it is easy to revert if something goes wrong.

          Also note that “wildcards” and “regular expressions” are different. Notepad++ supports regular expressions; it does not support wildcards.

          All the numbers to be altered in the example text have 3 or 4 digits, are preceded by a space, and are followed by the f or t and then a space, a comma, or the end of the line. These strings can be removed by replacing " \d{3,4}[ft](,| |$)" (note the leading space) with a single space. That leaves the text as:

          granuloma 553 irritant/toxic causes
          eruption 551Ð552, idiopathic facial
          pathogenesis 545,
          833 adhesion molecules adhesion proteins
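The same Replace All can be checked outside Notepad++ with the Python sketch below. `re.MULTILINE` is used so that `$` matches at the end of every line, which is assumed to mirror Notepad++'s default behaviour; the variable names are illustrative only. Some lines keep a trailing space or a double space that the listing above does not show.

```python
import re

# Space, 3 or 4 digits, f or t, then a comma, a space, or the end of the line.
# re.MULTILINE makes "$" match at the end of every line, not just the string.
SUFFIXED_PAGE = re.compile(r" \d{3,4}[ft](,| |$)", re.MULTILINE)

text = (
    "granuloma 553 irritant/toxic causes 251t\n"
    "eruption 551Ð552, 552f idiopathic facial\n"
    "pathogenesis 545, 546f\n"
    "833 adhesion molecules 1684f, 1685t adhesion proteins"
)

# Replace each match with a single space, as described above.
result = SUFFIXED_PAGE.sub(" ", text)
for line in result.splitlines():
    print(repr(line))
```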

          The question does not make clear whether the remaining numbers, i.e. those without a t or f, should also be removed. If they should, the regular expression above can be changed to “(^| )\d{3,4}[ft]?(,| |$)”, again replacing with a single space. That gives:

          granuloma irritant/toxic causes
          eruption 551Ð552, idiopathic facial
          pathogenesis
          adhesion molecules adhesion proteins
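A similar Python sketch for the broader pattern, again with `re.MULTILINE` so that `^` and `$` apply to each line (assumed to match Notepad++'s default) and with a single space as the replacement. The expected strings keep the stray spaces that the listing above hides.

```python
import re

# Line start or a space, 3-4 digits, an optional f or t ([ft]?), then a comma,
# a space, or the line end. Bare page numbers like "553" now match too.
ANY_PAGE = re.compile(r"(^| )\d{3,4}[ft]?(,| |$)", re.MULTILINE)

text = (
    "granuloma 553 irritant/toxic causes 251t\n"
    "eruption 551Ð552, 552f idiopathic facial\n"
    "pathogenesis 545, 546f\n"
    "833 adhesion molecules 1684f, 1685t adhesion proteins"
)

result = ANY_PAGE.sub(" ", text)
for line in result.splitlines():
    print(repr(line))
```

"551Ð552" survives because neither number is preceded by a space or the start of the line, which is the first issue listed next.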

          This leaves 3 issues that I can see.

          • The “551Ð552”. There should not be many such numbers left, so they can be searched for and removed manually.
          • Leading and trailing spaces on lines. Use menu => Edit => Blank operations => Trim leading and trailing spaces.
          • Embedded double spaces (these come from the two replacements at “1684f, 1685t”). Just do a Replace All of two spaces with one space, and repeat until no more changes are found.
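The last two clean-up steps can be sketched in Python as a hypothetical `tidy` function. It strips spaces and tabs from both ends of each line (similar to, but not guaranteed identical to, Notepad++'s trim command) and collapses any run of two or more spaces in one pass with the pattern ` {2,}`, which should also work as a single Regular expression Replace All in Notepad++. The input is the output of the broader pattern on the sample lines, stray spaces included.

```python
import re

def tidy(text: str) -> str:
    """Trim each line and collapse runs of spaces to a single space."""
    cleaned = []
    for line in text.splitlines():
        # Remove leading and trailing spaces and tabs on this line.
        line = line.strip(" \t")
        # Replace any run of two or more spaces with one space in a single pass.
        line = re.sub(r" {2,}", " ", line)
        cleaned.append(line)
    return "\n".join(cleaned)

after_regex = (
    "granuloma irritant/toxic causes \n"
    "eruption 551Ð552, idiopathic facial\n"
    "pathogenesis  \n"
    " adhesion molecules  adhesion proteins"
)
print(tidy(after_regex))
```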

          Note that using a simple regular expression such as “\d{3,4}[ft]” is unwise for this task. It would change, for example, “ulna 123fibia radius 45678tibia pelvis” to “ulna ibia radius 4ibia pelvis”, removing wanted letters and leaving some digits in place.
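A short Python sketch of this pitfall, comparing the unanchored pattern with the anchored one from earlier in the post. It assumes Python's `re` matches these patterns the way Notepad++ does; the string is the made-up example from the text.

```python
import re

sample = "ulna 123fibia radius 45678tibia pelvis"

# Unanchored: any 3-4 digits followed by f or t, even inside a word or a longer number.
naive = re.sub(r"\d{3,4}[ft]", "", sample)

# Anchored: needs a space before the digits and a comma, space, or line end
# after the suffix, so nothing in this string matches.
anchored = re.sub(r" \d{3,4}[ft](,| |$)", " ", sample, flags=re.MULTILINE)

print(naive)     # ulna ibia radius 4ibia pelvis
print(anchored)  # ulna 123fibia radius 45678tibia pelvis
```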

          The Community of users of the Notepad++ text editor.