Find identical paragraphs

  • Hi everyone,

    I’m new to this community and I’m no developer at all.
    I use Notepad++ to edit song texts; it saves me time with things like deleting all punctuation, adding a character at the end of every line, etc.

    But there is one feature I can’t seem to find :

    I need to highlight all the repeated paragraphs in a text, with a different style for every paragraph. For example :

    I am thinking of you
    In my sleepless solitude tonight
    If it’s wrong to love you
    Then my heart just won’t let me be right
    I’d give my all to have
    Just one more night with you
    I’d risk my life to feel
    Your body next to mine

    Baby can you feel me
    Imagining I’m looking in your eyes
    I can see you clearly
    Vividly emblazoned in my mind
    'Cause I can’t go on
    Living in the memory of our song
    I’d give my all for your love tonight

    And yet you’re so far
    Like a distant star
    I’m wishing on tonight
    I’d give my all to have
    Just one more night with you
    I’d risk my life to feel
    Your body next to mine

    'Cause I can’t go on
    Living in the memory of our song
    I’d give my all for your love tonight

    Here you can see I found all the paragraphs that are repeated and applied a different style for each one.
    BUT additionally, and this is the point that’s not easy to explain:
    I need it to be done automatically, so that I don’t have to select paragraph 1, search for its repetitions, highlight it, then do the same thing with paragraph 2, etc.
    I hope I’m explaining clearly…

    It would help me find all the parts of a song that are identical (choruses, bridges…) at once.

  • Your explanation is very clear (thus far).
    Thank you for that.

    One problem is, there doesn’t seem to be any real delimiter between “paragraphs”. Everything runs together. This is going to cause trouble for what you want to do.

    Second problem: You are limited to five or maybe six different styles. Also, styles aren’t permanent, so what do you intend to do with this data once it is styled?

    So maybe tell us a bit more about what your intentions are, to get full advice on how to proceed… (sorry, but know this may invoke MORE problems…)

  • @Caro-Chennouf ,

    Notepad++ is a text editor, not a word processor. It cannot store information like italics or bold in the file itself. So if you saved the text file and someone else opened it on another machine or in another editor, it would just be plain text again. In fact, if you did the highlighting in Notepad++ and then exited, the highlighting would be gone when you reloaded the file.

    With some regex searching, you might be able to handle a semi-automated process… ahh, @Alan-Kilborn just posted, and started asking the questions that would need to be answered in order to proceed.

    (I tried coming up with a regex that would match one or more lines in a row that have an exact match later on, but with the “or more”, I couldn’t get it to work right on my first attempt; as I have time, I might think about it more… Or a better regex guru than I might be able to beat me to it)

  • @Caro-Chennouf ,

    I figured out why my regex wasn’t working, so was able to get it to properly match an N-line paragraph that has a repeat later.

    1. If you’re in v8.x, go to Settings > Preferences > Highlighting
      • Go to the Mark All section and uncheck Match whole word only
    2. Go to the first line (Ctrl+Home)
    3. FIND the first instance of each paragraph
      • Search > Find
      • FIND WHAT = (?-s)((^.+?(\R|\Z))+)(?=(?s:.*)\1)
      • Search Mode = ☑ Regular Expression
      • FIND NEXT
        => this highlights the first paragraph that is repeated somewhere else
    4. Use Search > Mark All > Using #th Style (or right click context menu > Style All Occurrences of Token)
      => all of the instances of that first paragraph should be marked with that style number
    5. Use Search > Next or F3 from the editor window, or the FIND NEXT button in the FIND dialog, to select the next “first instance of a paragraph”, and style using a different #th style. Repeat as necessary.

    So it’s not 100% automated, but it’s better than manually having to find each paragraph (chorus or bridge). And remember, this will not be saved in the file; the next time you open it, you’ll have to do it again.
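
    If a fully automatic pass matters more than the highlighting, the same idea (find each run of lines that shows up again later) can be sketched outside Notepad++. The Python below is only a rough illustration, not a Notepad++ feature: the function name find_repeated_blocks and the min_lines parameter are made up for this example. It reports line numbers instead of applying styles, and it treats blank lines as boundaries that a repeated block cannot cross.

```python
def find_repeated_blocks(text, min_lines=2):
    """Return [(block_lines, [1-based start line, ...]), ...] for every run of
    consecutive non-blank lines that occurs more than once in `text`."""
    lines = [line.strip() for line in text.splitlines()]
    covered = [False] * len(lines)  # lines already assigned to a group
    groups = []
    i = 0
    while i < len(lines):
        if covered[i] or not lines[i]:
            i += 1
            continue
        # Longest run starting at line i that also starts at a later line j,
        # without the two copies overlapping and without crossing a blank line.
        best = 0
        for j in range(i + 1, len(lines)):
            n = 0
            while (i + n < j and j + n < len(lines)
                   and lines[i + n] and lines[i + n] == lines[j + n]):
                n += 1
            best = max(best, n)
        if best < min_lines:
            i += 1
            continue
        block = lines[i:i + best]
        # Collect every non-overlapping occurrence of this block.
        starts = []
        s = i
        while s + best <= len(lines):
            if lines[s:s + best] == block:
                starts.append(s + 1)
                for k in range(s, s + best):
                    covered[k] = True
                s += best
            else:
                s += 1
        groups.append((block, starts))
        i += best
    return groups


# Small made-up sample; a real run would read the song file instead.
sample = "verse a\nverse b\nchorus 1\nchorus 2\n\nverse c\nchorus 1\nchorus 2\n"
for number, (block, starts) in enumerate(find_repeated_blocks(sample), 1):
    print(f"Block {number} starts at lines {starts}:")
    for line in block:
        print("    " + line)
```

    With the line numbers in hand, each group could still be styled by hand in Notepad++. The search is a simple greedy one (it takes the longest match from the first unclaimed line), so unusual texts with partially overlapping repeats may be grouped differently than a person would group them.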
