How to bookmark only the first occurrences of multiple search results?



  • Hello Viktória,

    First, let me make clear that this script is not using regular expressions at all.
    It just takes the keywords as strings and tries to find them in the text.

    Concerning the 2nd script, I think I've got it now:
    it should search for each keyword, in a loop, until a sentence is returned which
    has not been returned yet.

    Here it is:

    # loop over the keywords
    for word in keywords:
        # find the first position of this keyword
        position = editor.findText(FINDOPTION.WHOLEWORD, 0, end_position, word)
        # if we found the position of the keyword
        while position is not None:
            # get the line which contains the match
            _new_line = editor.getLine(editor.lineFromPosition(position[0]))
            # check if the line hasn't been added yet
            if _new_line not in new_file_content:
                # append it to the new file content
                new_file_content += _new_line
                break
            # line was already added, so look for the next occurrence
            position = editor.findText(FINDOPTION.WHOLEWORD, position[1]+1, end_position, word)
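
    The same idea can be tried outside Notepad++ with plain Python. This is only a sketch under stated assumptions: first_unseen_lines is a hypothetical helper, a \b…\b regex stands in for FINDOPTION.WHOLEWORD, and a list of already picked lines replaces the new_file_content string.

```python
import re

def first_unseen_lines(text, keywords):
    """For each keyword, pick the first line containing it that was not picked before."""
    lines = text.splitlines()
    picked = []
    for word in keywords:
        # assumption: case-insensitive whole-word match, approximating FINDOPTION.WHOLEWORD
        pattern = re.compile(r'\b' + re.escape(word) + r'\b', re.IGNORECASE)
        for line in lines:
            if pattern.search(line) and line not in picked:
                picked.append(line)
                break  # one line per keyword, then move on to the next keyword
    return picked

text = """Das Glück war das die Polizei die Antwort kannte.
Die Antwort gefällt mir.
Das Glück hat ihn verlassen, die Polizei verfolgt ihn.
Rufen Sie die Polizei!"""

result = first_unseen_lines(text, ["die Antwort", "die Polizei"])
for line in result:
    print(line)
```

    Here "die Polizei" skips the first sentence, because "die Antwort" has already claimed it, and takes the third one instead.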
    

    Sometimes I can't see the wood for the trees. :-)

    Cheers
    Claudia



  • @Claudia-Frank

    Bingo!
    My default encoding is UTF-8 without BOM so I’m not sure what was the reason for this alteration but your guess was right, the keywords-file was in UTF-8-BOM.

    I converted it to UTF-8 without BOM and that solved the issue; the empty line is no longer needed for the process to work flawlessly, thank you!

    My explanation worked as well, because your modified 2nd script does exactly what I was looking for; I’m much obliged. (I’m happy about that initial misunderstanding anyway, because this way I have 5 scripts for different tasks. :-))
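
    A short plain-Python sketch of what a BOM does to the first keyword. It assumes the keywords file is decoded as plain UTF-8; the sample bytes are made up for illustration and this is not the code of the actual script.

```python
import codecs

# hypothetical content of a keywords file saved as UTF-8 with BOM
raw = codecs.BOM_UTF8 + "stecken\nbesuchen\n".encode("utf-8")

# plain 'utf-8' keeps the BOM as an invisible character on the first line
plain = raw.decode("utf-8").splitlines()
# 'utf-8-sig' drops a leading BOM if there is one
clean = raw.decode("utf-8-sig").splitlines()

print(repr(plain[0]))
print(repr(clean[0]))
print(plain[0] == "stecken", clean[0] == "stecken")
```

    With the BOM glued to it, the first keyword is no longer equal to "stecken", which would fit a first keyword that is never found.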


    Finally, to clear up something:

    We changed some lines due to this UTF-stuff like:
    console.write('word:{}\nlength:{}\n'.format(word.encode('utf-8'),len(word.encode('utf-8')))) and
    _keywords = [line.strip() for line in f if len(line.strip()) > 0]

    in the scriptbase.

    Now that we figured out the BOM-issue, can I return to the initial script and its variants or for safety’s sake should I rather keep the version with these modified lines? What do you suggest?



  • @Viktoria-Ontapado

    Use the current (modified) version, because the line

    _keywords = [line.strip() …

    is needed; don’t change it.

    The two lines starting with console.write can be deleted if you wish, but I would keep them
    and comment them out (right-click on the line and use the context menu to comment), because
    if one day something doesn’t work you can just uncomment them again and have
    some simple debugging, which could help to find out what the cause is.
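
    As a rough plain-Python illustration of both points: the list comprehension drops blank lines and surrounding whitespace, and the debug output can sit behind a switch instead of being commented out. load_keywords and DEBUG are hypothetical names, and print stands in for console.write.

```python
import io

DEBUG = False  # hypothetical switch; set to True to get the debug output back

def load_keywords(lines, debug=DEBUG):
    # keep only non-empty lines, without leading/trailing whitespace
    keywords = [line.strip() for line in lines if len(line.strip()) > 0]
    if debug:
        for word in keywords:
            print('word:{}\nlength:{}'.format(word, len(word.encode('utf-8'))))
    return keywords

# stand-in for the opened keywords file, with blank and padded lines
sample = io.StringIO("stecken\n\n  besuchen  \n\ndie Antwort\n")
keywords = load_keywords(sample)
print(keywords)
```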

    Glad to see that it finally works and I hope you can benefit from it.

    Cheers
    Claudia



  • @Claudia-Frank

    All right, I’ll follow your suggestions.

    Again, I’d like to say a big thank you from the bottom of my heart; you provided invaluable help and I’m beyond grateful.

    Have a nice week,
    Viktória



  • Hi, @viktoria-ontapado,

    Sorry, but, the last two days, I was far away from my beloved laptop :-D ( Actually, it’s quite an antiquated machine !! )

    I’m pleased, Viktoria, that @claudia-frank succeeded in creating your four customized Python scripts. She did great work, indeed :-))

    From your problem, it’s quite easy to understand that dealing with scripts is much more powerful than running a couple of regexes !

    However, just for fun, I tried to imagine how to solve your case #2 with regexes ! So, in a new tab, copy the 8 regexes, corresponding to the eight keywords, followed by your 20-sentence example :

    (?i-s)(?!.+#)(\bstecken\b)(?s)(.*)
    (?i-s)(?!.+#)(\bbesuchen\b)(?s)(.*)
    (?i-s)(?!.+#)(\bdie Antwort\b)(?s)(.*)
    (?i-s)(?!.+#)(\bfertig\b)(?s)(.*)
    (?i-s)(?!.+#)(\bzuletzt\b)(?s)(.*)
    (?i-s)(?!.+#)(\bdie Polizei\b)(?s)(.*)
    (?i-s)(?!.+#)(\bdas Glück\b)(?s)(.*)
    (?i-s)(?!.+#)(\bauch\b)(?s)(.*)
    ------------------------------------------------------------
    Sie stecken in Schwierigkeiten.
    Komm mich besuchen.
    Stecken Sie Ihre Waffe ins Halfter!
    Das Glück war das die Polizei die Antwort kannte.
    Die Antwort gefällt mir.
    Ich bin jetzt fertig.
    Er lachte zuletzt.
    Das Glück hat ihn verlassen, die Polizei verfolgt ihn.
    Wann hast du sie zuletzt gesehen?
    Ich liebe dich.
    Ich bin auch siebzehn.
    Ich bin auch achtzehn.
    Rufen Sie die Polizei!
    Ich bin auch zwanzig.
    Wann kann ich dich besuchen?
    Das Glück war ihm hold.
    Mach das fertig.
    Das Glück ist nicht so launenhaft.
    Steht mir dieses Kleid?
    Ich esse.
    
    • Select the first regex ( (?i-s)(?!.+#)(\bstecken\b)(?s)(.*) )

    • Open the Replace dialog ( Ctrl + H )

    • Inside the replacement zone, type in the regex \1#\2

    • Click on the Replace All button

    => Only one replacement is performed : A # symbol is added, right after the word stecken

    • Now, select the second regex (?i-s)(?!.+#)(\bbesuchen\b)(?s)(.*)

    • To UPDATE the Replace dialog, which is unfocused, just use, again the Ctrl + H shortcut ! ( Nice trick, indeed ! )

    • Click, again, on the Replace All button

    => This time, a # symbol is added at the end of the word besuchen, in the second sentence

    • Go on, selecting the third regex (?i-s)(?!.+#)(\bdie Antwort\b)(?s)(.*)

    And so on …

    Once the 8 regexes have been executed, you should get the text below :

    Sie stecken# in Schwierigkeiten.
    Komm mich besuchen#.
    Stecken Sie Ihre Waffe ins Halfter!
    Das Glück war das die Polizei die Antwort# kannte.
    Die Antwort gefällt mir.
    Ich bin jetzt fertig#.
    Er lachte zuletzt#.
    Das Glück hat ihn verlassen, die Polizei# verfolgt ihn.
    Wann hast du sie zuletzt gesehen?
    Ich liebe dich.
    Ich bin auch# siebzehn.
    Ich bin auch achtzehn.
    Rufen Sie die Polizei!
    Ich bin auch zwanzig.
    Wann kann ich dich besuchen?
    Das Glück# war ihm hold.
    Mach das fertig.
    Das Glück ist nicht so launenhaft.
    Steht mir dieses Kleid?
    Ich esse.
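
    The marking steps can also be approximated in plain Python. Python's re module handles inline modifiers differently from the Notepad++ regex engine, so this sketch makes two substitutions that are assumptions of mine: the IGNORECASE flag replaces (?i-s), and count=1 stands in for the trailing (?s)(.*) that swallows the rest of the file. mark_first_free is a hypothetical helper name.

```python
import re

def mark_first_free(text, keyword):
    # (?!.+#): no '#' may follow the keyword on the same line
    pattern = r'(?!.+#)(\b' + re.escape(keyword) + r'\b)'
    # count=1: only the first qualifying occurrence gets a '#'
    return re.sub(pattern, r'\1#', text, count=1, flags=re.IGNORECASE)

text = (
    "Das Glück war das die Polizei die Antwort kannte.\n"
    "Die Antwort gefällt mir.\n"
    "Das Glück hat ihn verlassen, die Polizei verfolgt ihn.\n"
    "Das Glück war ihm hold.\n"
)

for keyword in ["die Antwort", "die Polizei", "das Glück"]:
    text = mark_first_free(text, keyword)

print(text)
```

    On this four-sentence extract, the # lands on the same three sentences as in the full result above.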
    

    => The keyword matched, on each line, is easily visible, thanks to the # symbol, added to that keyword !


    Finally, to delete all lines which do NOT contain a # symbol, as well as the symbol itself, perform the S/R :

    SEARCH ^[^#\r\n]+\R|#

    REPLACE Empty

    Sie stecken in Schwierigkeiten.
    Komm mich besuchen.
    Das Glück war das die Polizei die Antwort kannte.
    Ich bin jetzt fertig.
    Er lachte zuletzt.
    Das Glück hat ihn verlassen, die Polizei verfolgt ihn.
    Ich bin auch siebzehn.
    Das Glück war ihm hold.
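
    For comparison, a plain-Python sketch of the same clean-up. Python's re module has no \R, so an explicit alternation of line endings is used in its place, which is a simplification. The marked sample is a shortened extract of the text above.

```python
import re

marked = (
    "Sie stecken# in Schwierigkeiten.\n"
    "Komm mich besuchen#.\n"
    "Stecken Sie Ihre Waffe ins Halfter!\n"
    "Ich liebe dich.\n"
    "Ich bin auch# siebzehn.\n"
)

# first alternative: a whole line without '#', including its line ending
# second alternative: a single '#'
cleanup = re.compile(r'^[^#\r\n]+(?:\r\n|\n|\r)|#', re.MULTILINE)
result = cleanup.sub('', marked)

print(result)
```

    A last line without # and without a final line break would not match the first alternative, so the text should end with a line break.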
    

    Et voilà !


    Notes :

    • The general regex is (?i-s)(?!.+#)(\bKeyWord\b)(?s)(.*)

    • As usual, the modifiers (?i-s) force a case-insensitive search and tell the regex engine that the dot matches a single standard character only, not End of Line characters

    • The part (?!.+#) is a negative look-ahead, which means that an overall match is possible only if no # symbol can be found further on in the current line, after the start of the keyword

    • If so, the regex (\bKeyWord\b) looks for the exact word “KeyWord”, stored as group 1, due to the parentheses

    • Then, the modifier (?s) implies that, from now on, the dot matches any single character, even End of line characters

    • Finally, the part (.*) stores, as group 2, all the text, after the current keyword, till the very end of the file

    • In replacement, the current keyword \1 is rewritten, followed by a # character, itself followed by the remaining text \2
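
    The look-ahead on its own can be tried out in plain Python, where (?!.+#) behaves the same way. The three sample lines are variations made up for illustration, with # symbols placed by hand.

```python
import re

# the keyword "auch", allowed only if no '#' follows it on the same line
keyword_free = re.compile(r'(?i)(?!.+#)\bauch\b')

lines = [
    "Ich bin auch siebzehn.",   # no '#' at all
    "Ich bin auch# achtzehn.",  # '#' after the keyword
    "Ich# bin auch zwanzig.",   # '#' before the keyword
]

results = [bool(keyword_free.search(line)) for line in lines]
print(results)
```

    Note the third line: a look-ahead only inspects what comes after the current position, so a # located before the keyword does not prevent a match.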


    To end, @viktoria, I hope that you’re quite aware that the order of search of the different keywords may change the sentences found by the scripts !

    Indeed, if, for instance, the string das Glück is searched, BEFORE the string die Polizei, the sentence Rufen Sie die Polizei! will be found and NOT the sentence Das Glück war ihm hold. !! Yeah, not easy to get all pieces of information, in one go :-((
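
    This order effect can be reproduced with a plain-Python approximation of the script logic (first line, per keyword, which has not been picked yet). pick_lines is a hypothetical helper and \b…\b stands in for the whole-word search, so this is a sketch and not the script itself.

```python
import re

def pick_lines(lines, keywords):
    picked = []
    for word in keywords:
        # assumption: case-insensitive whole-word match
        pattern = re.compile(r'\b' + re.escape(word) + r'\b', re.IGNORECASE)
        for line in lines:
            if pattern.search(line) and line not in picked:
                picked.append(line)
                break
    return picked

lines = [
    "Das Glück war das die Polizei die Antwort kannte.",
    "Das Glück hat ihn verlassen, die Polizei verfolgt ihn.",
    "Rufen Sie die Polizei!",
    "Das Glück war ihm hold.",
]

polizei_first = pick_lines(lines, ["die Antwort", "die Polizei", "das Glück"])
glueck_first = pick_lines(lines, ["die Antwort", "das Glück", "die Polizei"])

print(polizei_first[-1])
print(glueck_first[-1])
```

    With die Polizei searched first, the last sentence picked is Das Glück war ihm hold. ; with das Glück searched first, it is Rufen Sie die Polizei! instead.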

    Best Regards,

    guy038



  • @guy038

    Very impressive, guy038, thank you very much. Though I have my beautiful scripts now thanks to Claudia, I worked through your regex-based solution. Along with your notes, so much can be learnt, really.

    Take care,
    Viktória

