    • Coises

      Search++: A work in progress

      Notepad++ & Plugin Development
      4 Votes
      18 Posts
      395 Views
      Coises

      @guy038 said in Search++: A work in progress:

      So, if I understand you clearly, we need to transform the selection(s) into Marked Text first, and then use the Find in Mark Text option?

      Yes; or click the Tools button, open Settings and check Convert selections to marked text before beginning a stepwise search to have Search++ do it automatically. Otherwise, multiple searches that don’t affect the selection (like Count or Find All or Replace All) will work within the selection, but only the first stepwise Find (or the preliminary find in a stepwise Replace) will be constrained to the selection, since after that the original selection will be gone.

      When clicking on the Regex button, do we use your Unicode search engine, as in Columns++, or is it a mix of the Columns++ version and ICU?

      It’s the Columns++ search engine, except for one thing. Previously I could not figure out how to incorporate ICU4C into the plugin, so for Columns++ I devised a Python program that reads several of the Unicode character data files and writes C++ code that compiles into a gigantic table containing the information I needed. I stumbled on the way to use ICU4C shortly before I began working on Search++, so instead of building and using those tables, Search++ goes straight to ICU4C for information (questions like, “What is the general category of this character?” or “Is this a lower case character?”).
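As a rough illustration of the table-generating approach described above (not the actual Columns++ generator), here is a minimal Python sketch. It assumes Python's built-in `unicodedata` module as the data source instead of the Unicode data files themselves, and the function name `emit_category_table` and array name `kGeneralCategory` are hypothetical.

```python
import unicodedata


def emit_category_table(first: int, last: int, name: str = "kGeneralCategory") -> str:
    """Return C++ source for an array holding the general category
    of every code point in the inclusive range first..last.

    Hypothetical helper: the real generator and its table layout are
    not shown in the post.
    """
    rows = []
    for cp in range(first, last + 1):
        cat = unicodedata.category(chr(cp))  # two-letter category, e.g. "Lu"
        rows.append(f'    "{cat}",  // U+{cp:04X}')
    body = "\n".join(rows)
    return f"static const char* const {name}[] = {{\n{body}\n}};\n"


# Emit a tiny table for U+0041..U+0043 (A, B, C).
print(emit_category_table(0x41, 0x43))
```

A table built this way is frozen at whatever Unicode version the generator saw, which is one practical difference from asking ICU4C at run time.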

      It might turn out that this will have an efficiency impact (better or worse? — I don’t know). It should fix some of the errors in Columns++, like [[:lower:]] missing characters that are lower case but not letters.
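To make "lower case but not letters" concrete, here is a small Python sketch using the standard `unicodedata` module. It only illustrates the Unicode properties involved and says nothing about how Columns++ or Search++ implement `[[:lower:]]`. The two sample characters are my own choice: U+2170 has general category Nl (letter number) and U+24D0 has So (other symbol), yet both carry the Unicode Lowercase property, which is what CPython's `str.islower()` is based on as far as I know.

```python
import unicodedata

# 'a', SMALL ROMAN NUMERAL ONE, CIRCLED LATIN SMALL LETTER A
samples = ["a", "\u2170", "\u24D0"]


def describe(ch: str):
    """Return (name, general category, islower) for one character."""
    return (unicodedata.name(ch), unicodedata.category(ch), ch.islower())


for ch in samples:
    print(describe(ch))
```

A test such as "general category is Ll" would accept only the first of these, while a test on the Lowercase property accepts all three.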

      Oddly, if we choose the ICU button, the Replace and Replace All buttons are not greyed and seem functional, contrary to what you said ?!

      They’re not disabled, but all they do is return the message, “Command not implemented.”

      Can you recommend a few websites that discuss ICU and the specifics of Unicode word boundaries?

      I don’t really have anything except the Unicode documentation. In my brief testing, the practical effect in English is that words like can't are recognized as a single word. Most regular expression engines define a word boundary (\b) in terms of what is a word character (\w). The regular expression engine in ICU lets you do that, but it also provides an option to use Unicode word boundaries to define \b.
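The difference can be sketched in Python, whose `re` module defines `\b` in terms of `\w` and has no Unicode word boundary mode. The second pattern below is only a rough stand-in for the Unicode rule that keeps an apostrophe between letters inside a word. It is an assumption made for illustration, not ICU's algorithm.

```python
import re

text = "I can't go"

# \b derived from \w: the apostrophe is not a word character,
# so "can't" is split into "can" and "t".
by_word_chars = re.findall(r"\b\w+\b", text)

# Rough approximation of the Unicode behaviour: allow a straight or
# curly apostrophe when it sits between two runs of word characters.
approx_unicode = re.findall(r"\w+(?:['\u2019]\w+)*", text)

print(by_word_chars)   # ['I', 'can', 't', 'go']
print(approx_unicode)  # ['I', "can't", 'go']
```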

      Presently, when hitting the ICU button, are searches like \p{alphabetic} or \p{XID_Continue} possible against my Total_Chars file of 325,590 characters?

      Yes. You can even use things like \p{script=Greek}. Unfortunately, I haven’t been able to find any place where ICU documents its own regular expression syntax. The regular-expressions.info web site includes ICU among the regex dialects it shows.

    • JAK

      How to compare 2 text files and delete duplicates

      Help wanted · · · – – – · · ·
      0 Votes
      11 Posts
      538 Views
      guy038

      Hi, @jak, @peterjones, @pnedev, @phil-pascal and All,

      Ah…, OK :! But it’s quite funny, @jak, because it’s exactly what I proposed in my last two posts !!!

      Giving the initial contents of your two files :

      The Original Music Collection which contains :

          Coldplay yellow
          Elton John Rocket man
          Beatles abbey road
          Beatles Love me do
          Beatles hey Jude
          Monkees daydream believer

      The New Music file which contains :

          Beatles abbey road
          Beatles Love me do
          Beatles hey Jude
          Monkees daydream believer
          The move blackberry way

      (1) By running a search/replacement with regular expressions :

      I append, at the very end of the New Music file, the contents of the Original Music Collection file, after a line of at least 3 equal signs, giving this temporary New Music file :

          Beatles abbey road
          Beatles Love me do
          Beatles hey Jude
          Monkees daydream believer
          The move blackberry way
          ==========
          Coldplay yellow
          Elton John Rocket man
          Beatles abbey road
          Beatles Love me do
          Beatles hey Jude
          Monkees daydream believer

      Now :

      Switch to the New Music file

      Open the Replace dialog ( Ctrl + H )

      Uncheck all the option boxes

      SEARCH (?x-is) ^ (.+\R) (?= (?s) .+? ^===+ .+ ^ \1) | (?s) ^ ===.+

      REPLACE Leave EMPTY

      Tick the Wrap around option

      Select the Regular expression search mode ( IMPORTANT )

      Click on the Replace All button

      => You should get your expected New Music file, below :

          The move blackberry way

      Save the modification of the New Music file

      (2) By using the ComparePlus plugin :

      First, use the Plugins > ComparePlus > Diff Visual Filters... option

      Check the Hide added/removed lines option and confirm this choice with the OK button

      Then, use the specific option Plugins > ComparePlus > Find Unique lines

      Now, select the New Music file ( IMPORTANT )

      Run the Plugins > ComparePlus > Delete all/selected visible lines option

      Run the Plugins > ComparePlus > Clear Active Compare option

      Finally, save the modifications of the New Music file ( IMPORTANT )
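Outside Notepad++, both methods above amount to keeping only the lines of the New Music file that do not occur in the Original Music Collection. A minimal Python sketch of that idea follows. The function name `lines_not_in` is hypothetical, the two lists stand in for the file contents, and the comparison is exact and case-sensitive, as in the regex with `(?-i)`.

```python
def lines_not_in(new_lines, original_lines):
    """Return the lines of new_lines that do not appear in
    original_lines, preserving their order in new_lines."""
    seen = set(original_lines)  # set lookup keeps the filter linear
    return [line for line in new_lines if line not in seen]


original = [
    "Coldplay yellow",
    "Elton John Rocket man",
    "Beatles abbey road",
    "Beatles Love me do",
    "Beatles hey Jude",
    "Monkees daydream believer",
]
new = [
    "Beatles abbey road",
    "Beatles Love me do",
    "Beatles hey Jude",
    "Monkees daydream believer",
    "The move blackberry way",
]

print(lines_not_in(new, original))  # ['The move blackberry way']
```

With real files, the two lists would come from reading each file and splitting it into lines. Trailing whitespace or differing line endings would make otherwise identical lines compare as different.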

      Best Regards,

      guy038