I would like to group all similar domains, not by alphabet.

    guy038
    last edited by guy038 Oct 7, 2023, 6:53 PM Oct 7, 2023, 6:37 PM

    Hi, @mohammad-al-thobiti, @peterjones, @coises and All,

    Ah, ah ah… I’m very happy to announce that I’ve found out a general regex which can handle any number of sections ;-))


    So, let’s begin with this simple INPUT text, whose lines contain from 2 to 14 sections ( one line of each ), pasted in a new tab :

    abc.def
    abc.def.ghi
    abc.def.ghi.jkl
    abc.def.ghi.jkl.mno
    abc.def.ghi.jkl.mno.pqr
    abc.def.ghi.jkl.mno.pqr.stu
    abc.def.ghi.jkl.mno.pqr.stu.vwx
    abc.def.ghi.jkl.mno.pqr.stu.vwx.yz0
    abc.def.ghi.jkl.mno.pqr.stu.vwx.yz0.123
    abc.def.ghi.jkl.mno.pqr.stu.vwx.yz0.123.456
    abc.def.ghi.jkl.mno.pqr.stu.vwx.yz0.123.456.789
    abc.def.ghi.jkl.mno.pqr.stu.vwx.yz0.123.456.789.€±¶
    abc.def.ghi.jkl.mno.pqr.stu.vwx.yz0.123.456.789.€±¶.Ø÷ß
    
    • Move to the very beginning of the file ( Ctrl + Home )

    First, we add the |. string at the beginning of every line, with the following regex S/R :

    • SEARCH (?x-s) ^ (?= . )

    • REPLACE |.

    Thus, we get :

    |.abc.def
    |.abc.def.ghi
    |.abc.def.ghi.jkl
    |.abc.def.ghi.jkl.mno
    |.abc.def.ghi.jkl.mno.pqr
    |.abc.def.ghi.jkl.mno.pqr.stu
    |.abc.def.ghi.jkl.mno.pqr.stu.vwx
    |.abc.def.ghi.jkl.mno.pqr.stu.vwx.yz0
    |.abc.def.ghi.jkl.mno.pqr.stu.vwx.yz0.123
    |.abc.def.ghi.jkl.mno.pqr.stu.vwx.yz0.123.456
    |.abc.def.ghi.jkl.mno.pqr.stu.vwx.yz0.123.456.789
    |.abc.def.ghi.jkl.mno.pqr.stu.vwx.yz0.123.456.789.€±¶
    |.abc.def.ghi.jkl.mno.pqr.stu.vwx.yz0.123.456.789.€±¶.Ø÷ß
    
    • Move to the very beginning of the file ( Ctrl + Home )

    Now, this is the main regex S/R :

    • SEARCH (?x-s) ^ ( .* \| ) ( (?: \. (?: (?! \| ) \S )+ )+ ) ( \. (?: (?! \. ) \S )+ )

    • REPLACE \1\3|\2

    • Click 14 times on the Replace All button, until you get the message Replace All: 0 occurrences were replaced from caret to end-of-file

    => You should get the temporary text :

    |.def|.abc
    |.ghi|.def|.abc
    |.jkl|.ghi|.def|.abc
    |.mno|.jkl|.ghi|.def|.abc
    |.pqr|.mno|.jkl|.ghi|.def|.abc
    |.stu|.pqr|.mno|.jkl|.ghi|.def|.abc
    |.vwx|.stu|.pqr|.mno|.jkl|.ghi|.def|.abc
    |.yz0|.vwx|.stu|.pqr|.mno|.jkl|.ghi|.def|.abc
    |.123|.yz0|.vwx|.stu|.pqr|.mno|.jkl|.ghi|.def|.abc
    |.456|.123|.yz0|.vwx|.stu|.pqr|.mno|.jkl|.ghi|.def|.abc
    |.789|.456|.123|.yz0|.vwx|.stu|.pqr|.mno|.jkl|.ghi|.def|.abc
    |.€±¶|.789|.456|.123|.yz0|.vwx|.stu|.pqr|.mno|.jkl|.ghi|.def|.abc
    |.Ø÷ß|.€±¶|.789|.456|.123|.yz0|.vwx|.stu|.pqr|.mno|.jkl|.ghi|.def|.abc
    
    • Move to the very beginning of the file ( Ctrl + Home )

    Now, we just need to get rid of all the | chars as well as the FIRST dot of each line, with the regex S/R :

    SEARCH (?x) ^ \| \. | \|

    REPLACE Leave EMPTY

    And we get our expected OUTPUT text :

    def.abc
    ghi.def.abc
    jkl.ghi.def.abc
    mno.jkl.ghi.def.abc
    pqr.mno.jkl.ghi.def.abc
    stu.pqr.mno.jkl.ghi.def.abc
    vwx.stu.pqr.mno.jkl.ghi.def.abc
    yz0.vwx.stu.pqr.mno.jkl.ghi.def.abc
    123.yz0.vwx.stu.pqr.mno.jkl.ghi.def.abc
    456.123.yz0.vwx.stu.pqr.mno.jkl.ghi.def.abc
    789.456.123.yz0.vwx.stu.pqr.mno.jkl.ghi.def.abc
    €±¶.789.456.123.yz0.vwx.stu.pqr.mno.jkl.ghi.def.abc
    Ø÷ß.€±¶.789.456.123.yz0.vwx.stu.pqr.mno.jkl.ghi.def.abc
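
For readers who would rather script this, the net effect of the three S/R above is to reverse the dot-separated sections of each line. Here is a minimal Python sketch of that idea. It is not part of the Notepad++ procedure, and the function name `reverse_sections` is hypothetical.

```python
def reverse_sections(line):
    """Reverse the dot-separated sections of one address."""
    return ".".join(reversed(line.split(".")))


# A few lines taken from the INPUT text above
sample = ["abc.def", "abc.def.ghi", "abc.def.ghi.jkl"]
for address in sample:
    print(reverse_sections(address))
```

Applying the same function a second time restores the original address, which is what a reverse step after sorting would need.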
    

    So, @mohammad-al-thobiti :

    • Apply all the above steps against your real file

    • Run the Edit > Line Operations > Sort Lines Lexicographically Ascending option ( No [rectangular] selection is needed as you just keep the addresses )

    • Possibly, add back the 127.0.0.1 IPV4 address, followed with two space chars with the regex S/R :

      • SEARCH (?x-s) ^ (?= . )

      • REPLACE (127.0.0.1 )

    Best Regards,

    guy038

    P.S. :

    I’ve just seen the @coises’s solution. I’m going to have a look at it, as it could be simpler than my regex solution !

      guy038
      last edited by guy038 Oct 8, 2023, 9:20 AM Oct 7, 2023, 10:30 PM

      Hi, @mohammad-al-thobiti, @peterjones, @coises and All,

      @mohammad-al-thobiti, you should use the @coises’s approach, which works much better than mine !!

      In addition, in my previous post, I forgot to add the reverse regex to use once your sort is done :-((

      The @coises’s method, on the other hand, avoids any additional S/R and correctly mentions the reverse regex to run after the sort operation

      Note that I slightly modified the first two @coises’s regexes to get more rigorous ones ( See, at the end of this post, the reason for these changes )


      Thus, I would propose this road map :

      • First, from the @coises’s post, I would use this alternate regex formulation :

        • SEARCH (?x) ( [^.\r\n]+ ) \. ( [^!\r\n]+ )

        • REPLACE \2!\1

        • Click 14 times on the Replace All button

      • Run the Edit > Line Operations > Sort Lines Lexicographically Ascending option ( No [rectangular] selection is needed as you just keep the addresses )

      • Thirdly, once the sort is done, from the @coises’s post, use the alternate reverse regex S/R :

        • SEARCH (?x) ( [^!\r\n]+ ) ! ( [^.\r\n]+ )

        • REPLACE \2.\1

        • Click 14 times on the Replace All button

      => You should get all your addresses back, in the right order
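
The road map above can be imitated outside Notepad++ to see what each pass does. The sketch below reuses the two regexes unchanged (Python's `re` module also accepts the `(?x)` flag) and repeats each substitution until nothing is replaced, like clicking Replace All until 0 occurrences are reported. The function names are hypothetical, Python's `sorted` is assumed to stand in for the Sort Lines Lexicographically Ascending option, and the addresses are assumed to contain no `!` character.

```python
import re

# Same patterns as the two S/R above
FORWARD = re.compile(r"(?x) ( [^.\r\n]+ ) \. ( [^!\r\n]+ )")
BACKWARD = re.compile(r"(?x) ( [^!\r\n]+ ) ! ( [^.\r\n]+ )")


def replace_until_stable(pattern, replacement, text):
    """Imitate clicking Replace All until 0 occurrences are replaced."""
    while True:
        text, count = pattern.subn(replacement, text)
        if count == 0:
            return text


def sort_by_reversed_sections(lines):
    """Reverse the sections, sort the lines, then restore the original order of sections."""
    text = "\n".join(lines)
    text = replace_until_stable(FORWARD, r"\2!\1", text)    # a.b.c -> c!b!a
    text = "\n".join(sorted(text.split("\n")))               # plain code-point sort
    text = replace_until_stable(BACKWARD, r"\2.\1", text)   # c!b!a -> a.b.c
    return text.split("\n")


print(sort_by_reversed_sections(
    ["b.example.org", "a.example.com", "example.com", "z.example.com"]))
```

With this sample, the three `example.com` addresses end up next to each other, ahead of the `example.org` one.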

      Best Regards,

      guy038

      P.S. :

      @coises uses the \s class of characters, which is equivalent to any of the 25 characters, below, with the regex :

      (?x) \t | \n | \x{000B} | \x{000C} | \r | \x{0020} | \x{0085} | \x{00A0} | \x{1680} | [\x{2000}-\x{200B}] | \x{2028} | \x{2029} | \x{202F} | \x{3000}

      In the highly unlikely event that one of these characters is included in some addresses, I preferred to use the [\r\n] regex, which ONLY avoids these 2 EOL chars in addresses, instead of using the \s regex !

        Mohammad Al Thobiti
        last edited by Oct 8, 2023, 6:24 AM

        Thank you for your efforts, my friends.
        I would like to tell you that the result is excellent.
        H5.png
        But an idea came to me: why not just delete the subdomain and keep only the main domain and then delete the similar ones?
        Is there a way to delete long link extensions? And keep the main domain?

        Example:

          Mohammad Al Thobiti @Mohammad Al Thobiti
          last edited by Oct 8, 2023, 6:25 AM

          4c731646-06c6-46ae-9da6-090c683c1e75-image.png

            guy038
            last edited by guy038 Oct 8, 2023, 9:13 AM Oct 8, 2023, 9:05 AM

            Hi, @mohammad-al-thobiti and All,

            May I rephrase your question ? Let’s see if we mean the same goal !

            So, for example, from the INPUT text, below :

            abc.def.ghi.jkl.example.com
            all.net
            abc.def.example.com
            abc.my_site.com
            abc.def.ghi.all.net
            my_site.com
            abc.def.all.net
            abc.def.ghi.jkl.mno.opq.my_site.com
            example.com
            

            With the following regex S/R :

            SEARCH (?x) ^ (?: [\w-]+ \. )* ( [\w-]+ \. [\w-]+ ) $

            REPLACE \1

            We would get that text :

            example.com
            all.net
            example.com
            my_site.com
            all.net
            my_site.com
            all.net
            my_site.com
            example.com
            

            Then, using the Edit > Line Operations > Remove Duplicate Lines option, we would end up with this OUTPUT :

            example.com
            all.net
            my_site.com
            

            If this is exactly what you expect, just go ahead !
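
As a cross-check, the two steps above (keep the last two sections, then remove duplicate lines) can be sketched in Python with the same regex, written without free-spacing. The names `MAIN_DOMAIN` and `main_domains` are hypothetical, and keeping the first occurrence of each duplicate is an assumption chosen to reproduce the OUTPUT shown above.

```python
import re

# Same regex as the S/R above, without the (?x) free-spacing layout
MAIN_DOMAIN = re.compile(r"^(?:[\w-]+\.)*([\w-]+\.[\w-]+)$")


def main_domains(lines):
    """Keep the last two sections of each address, then drop duplicate lines."""
    reduced = [MAIN_DOMAIN.sub(r"\1", line) for line in lines]
    # dict.fromkeys keeps the first occurrence of each line, in order
    return list(dict.fromkeys(reduced))


addresses = [
    "abc.def.ghi.jkl.example.com",
    "all.net",
    "abc.def.example.com",
    "abc.my_site.com",
    "abc.def.ghi.all.net",
    "my_site.com",
    "abc.def.all.net",
    "abc.def.ghi.jkl.mno.opq.my_site.com",
    "example.com",
]
print(main_domains(addresses))
```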

            BR

            guy038

              Mohammad Al Thobiti
              last edited by Oct 8, 2023, 3:36 PM

              I would like to thank you for your useful information and those who contributed to this topic.

              This article has become my reference. It works nicely. Yes, we mean the same goal !
              e1cc5900-4c79-4442-b4b0-f6e05814193c-image.png
              Right now. If you can sort or collect them from the most similar domains, please let me know.

              There are too many lines, starting with the most similar domain.

              Reason: To find the most “worried” domains because it takes many lines. I will delete them later, as you told us. However, the goal is to discover more domains with many characters or long URLs and I will block them in other programs.

              How can this be done?

                Coises @Mohammad Al Thobiti
                last edited by Coises Oct 8, 2023, 5:03 PM Oct 8, 2023, 5:02 PM

                @Mohammad-Al-Thobiti said in I would like to group all similar domains, not by alphabet.:

                If you can sort or collect them from the most similar domains, please let me know.

                Here’s a way; enter:

                Find what : ^(!*([^\r\n]+))\R(!*)\2$
                Replace with : !\3\1

                and Replace All repeatedly until there are no more changes.

                Each line in the result will have an exclamation point at the beginning for each additional occurrence of the following text; so:

                argh.com
                argh.com
                asdf.net
                asdf.net
                asdf.net
                asdf.net
                ef.org
                ef.org
                ef.org
                ef.org
                ef.org
                fasde.com
                fasde.com
                fasde.com
                fasde.com
                gorch.net
                gorch.net
                gorch.net
                

                would become:

                !argh.com
                !!!asdf.net
                !!!!ef.org
                !!!fasde.com
                !!gorch.net
                

                You can then sort that to put them in order by the number of exclamation points at the beginning.
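
To see why repeating the replacement converges, here is a small Python imitation of the method. It is only a sketch: Python's `re` module has no `\R`, so a plain `\n` line ending is assumed, the function name `collapse_duplicates` is hypothetical, and, as the regex only compares neighbouring lines, identical lines are assumed to be adjacent already.

```python
import re

# Same idea as the Find what expression above, with \n instead of \R
PAIR = re.compile(r"^(!*([^\r\n]+))\n(!*)\2$", re.MULTILINE)


def collapse_duplicates(text):
    """Merge adjacent identical lines, adding one '!' per extra occurrence."""
    while True:
        text, count = PAIR.subn(r"!\3\1", text)
        if count == 0:   # same stop condition as "no more changes"
            return text


lines = (["argh.com"] * 2 + ["asdf.net"] * 4 + ["ef.org"] * 5
         + ["fasde.com"] * 4 + ["gorch.net"] * 3)
print(collapse_duplicates("\n".join(lines)))
```

Each pass merges pairs of neighbouring lines and adds up their exclamation points, so the number of passes grows slowly even for lines repeated many times.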

                  Mohammad Al Thobiti @Coises
                  last edited by Oct 8, 2023, 8:27 PM

                  Thank you. I will clarify the issue.
                  H7.png

                    Coises @Mohammad Al Thobiti
                    last edited by Coises Oct 8, 2023, 9:29 PM Oct 8, 2023, 9:28 PM

                    @Mohammad-Al-Thobiti said in I would like to group all similar domains, not by alphabet.:

                    Thank you. I will clarify the issue.

                    Using the method I suggested, if you start with:

                    zindova.net
                    zindova.net
                    zinfandelreviews.net
                    zinfandelreviews.net
                    zingardi.net
                    zingcoach.net
                    zinoiosijek031.net
                    zinoiosijek031.net
                    zinoiosijek031.net
                    zinoiosijek031.net
                    zinoiosijek031.net
                    zinoiosijek031.net
                    zinoiosijek031.net
                    zinoiosijek031.net
                    zinoiosijek031.net
                    zinoiosijek031.net
                    zinoiosijek031.net
                    zinoiosijek031.net
                    zinoiosijek031.net
                    zinoiosijek031.net
                    zinoiosijek031.net
                    zinoiosijek031.net
                    zinoiosijek031.net
                    zinoiosijek031.net
                    zinoiosijek031.net
                    zinoiosijek031.net
                    zinoiosijek031.net
                    zinoiosijek031.net
                    zinoiosijek031.net
                    zinoiosijek031.net
                    zinoiosijek031.net
                    ziobaweek.net
                    ziomik.net
                    zioninfosystems.net
                    zioninfosystems.net
                    zip-archive.net
                    

                    and do the repeated Replace Alls, then sort, you’ll get:

                    !!!!!!!!!!!!!!!!!!!!!!!!zinoiosijek031.net
                    !zindova.net
                    !zinfandelreviews.net
                    !zioninfosystems.net
                    zingardi.net
                    zingcoach.net
                    ziobaweek.net
                    ziomik.net
                    zip-archive.net
                    

                    The first line represents 25 occurrences — there are 24 leading exclamation points. Each of the next three lines represent two occurrences (one leading exclamation point). The remaining lines occurred only once.

                    Is that not what you needed to accomplish?

                      guy038
                      last edited by guy038 Oct 8, 2023, 11:05 PM Oct 8, 2023, 11:01 PM

                      Hello, @mohammad-al-thobiti, @peterjones, @coises and All,

                      Oh, @coises, your method of finding out how many times each address occurs, is very elegant and really clever ! I’d never have thought of such sophistication on my own :-((


                      So, @mohammad-al-thobiti, starting with this INPUT file :

                      zindova.net
                      zindova.net
                      zinfandelreviews.net
                      zinfandelreviews.net
                      zingardi.net
                      zingcoach.net
                      zinoiosijek031.net
                      zinoiosijek031.net
                      zinoiosijek031.net
                      zinoiosijek031.net
                      zinoiosijek031.net
                      zinoiosijek031.net
                      zinoiosijek031.net
                      zinoiosijek031.net
                      zinoiosijek031.net
                      zinoiosijek031.net
                      zinoiosijek031.net
                      zinoiosijek031.net
                      zinoiosijek031.net
                      zinoiosijek031.net
                      zinoiosijek031.net
                      zinoiosijek031.net
                      zinoiosijek031.net
                      zinoiosijek031.net
                      zinoiosijek031.net
                      zinoiosijek031.net
                      zinoiosijek031.net
                      zinoiosijek031.net
                      zinoiosijek031.net
                      zinoiosijek031.net
                      zinoiosijek031.net
                      ziobaweek.net
                      ziomik.net
                      zioninfosystems.net
                      zioninfosystems.net
                      zip-archive.net
                      zip-archive.net
                      zip.net
                      zip.net
                      zip.net
                      zipaphoto.net
                      zipbah.net
                      zipexpose.net
                      ziph.next
                      ziph.next
                      

                      I’m using the @coises’s regex, that I slightly modified :

                      SEARCH (?x-s) ^ ( !* ( .+ ) ) \R ( !* ) \2 $

                      REPLACE !\3\1

                      • Run it several times, in order to get the message Replace All: 0 occurrences were replaced

                      => You should get this text :

                      !zindova.net
                      !zinfandelreviews.net
                      zingardi.net
                      zingcoach.net
                      !!!!!!!!!!!!!!!!!!!!!!!!zinoiosijek031.net
                      ziobaweek.net
                      ziomik.net
                      !zioninfosystems.net
                      !zip-archive.net
                      !!zip.net
                      zipaphoto.net
                      zipbah.net
                      zipexpose.net
                      !ziph.next
                      

                      Now, we just add one ! character to get the exact number of occurrences of each address

                      SEARCH (?x-s) ^ (?= . )

                      REPLACE !

                      !!zindova.net
                      !!zinfandelreviews.net
                      !zingardi.net
                      !zingcoach.net
                      !!!!!!!!!!!!!!!!!!!!!!!!!zinoiosijek031.net
                      !ziobaweek.net
                      !ziomik.net
                      !!zioninfosystems.net
                      !!zip-archive.net
                      !!!zip.net
                      !zipaphoto.net
                      !zipbah.net
                      !zipexpose.net
                      !!ziph.next
                      
                      • Then, run the Edit > Line Operations > Sort Lines Lexicographically Ascending

                      => We get this sorted text :

                      !!!!!!!!!!!!!!!!!!!!!!!!!zinoiosijek031.net
                      !!!zip.net
                      !!zindova.net
                      !!zinfandelreviews.net
                      !!zioninfosystems.net
                      !!zip-archive.net
                      !!ziph.next
                      !zingardi.net
                      !zingcoach.net
                      !ziobaweek.net
                      !ziomik.net
                      !zipaphoto.net
                      !zipbah.net
                      !zipexpose.net
                      

                      Finally, running the four regex S/R, below :

                      SEARCH  (?x-s) ^ !                         (?= [^!\r\n]+ )        REPLACE   1\t
                      SEARCH  (?x-s) ^ !!                        (?= [^!\r\n]+ )        REPLACE   2\t
                      SEARCH  (?x-s) ^ !!!                       (?= [^!\r\n]+ )        REPLACE   3\t
                      SEARCH  (?x-s) ^ !!!!!!!!!!!!!!!!!!!!!!!!! (?= [^!\r\n]+ )        REPLACE  25\t
                      

                      => You should end up with this OUTPUT text :

                      25	zinoiosijek031.net
                      3	zip.net
                      2	zindova.net
                      2	zinfandelreviews.net
                      2	zioninfosystems.net
                      2	zip-archive.net
                      2	ziph.next
                      1	zingardi.net
                      1	zingcoach.net
                      1	ziobaweek.net
                      1	ziomik.net
                      1	zipaphoto.net
                      1	zipbah.net
                      1	zipexpose.net
                      

                      Which should be a practical document to exploit !
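
The four S/R above only cover the counts present in this sample ( 1, 2, 3 and 25 ). As a sketch of a general version, outside Notepad++, a short Python function can replace any leading run of ! characters with its length, followed by a tab. The function name `count_prefix_to_number` is hypothetical.

```python
import re


def count_prefix_to_number(text):
    """Replace each leading run of '!' with its length and a tab."""
    return re.sub(
        r"^(!+)(?=[^!\r\n])",
        lambda m: "{}\t".format(len(m.group(1))),
        text,
        flags=re.MULTILINE,
    )


print(count_prefix_to_number("!!!zip.net\n!!zindova.net\n!zingardi.net"))
```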

                      Best Regards,

                      guy038

                        Mohammad Al Thobiti @guy038
                        last edited by Oct 9, 2023, 5:49 AM

                        @guy038 and All,

                        Thank you my friends for the amazing results.

                        We have received the correct information: the most commonly used domains, cleverly identified.

                        In any case, as you can see in the picture, the outcome is wonderful.

                        H8.png

                        How do I write the code like this that you mentioned:

                        SEARCH  (?x-s) ^ !                         (?= [^!\r\n]+ )        REPLACE   1\t
                        SEARCH  (?x-s) ^ !!                        (?= [^!\r\n]+ )        REPLACE   2\t
                        SEARCH  (?x-s) ^ !!!                       (?= [^!\r\n]+ )        REPLACE   3\t
                        SEARCH  (?x-s) ^ !!!!!!!!!!!!!!!!!!!!!!!!! (?= [^!\r\n]+ )        REPLACE  25\t
                        

                        Do I have to keep writing such lines until I reach Col: 6,597? I think that is difficult and takes a lot of time. And I want to thank all of you for all your efforts on this fascinating topic.

                          Coises @guy038
                          last edited by Oct 9, 2023, 6:16 AM

                          @guy038 OK, I’ll take the bait… If you really want to count exclamation points:

                          Add the missing exclamation point, but also add a separator:

                          Find what : ^(!*)
                          Replace with : $1!/\t

                          Now, group by tens:

                          Find what : (!{10})+
                          Replace with : $0/

                          followed by:

                          Find what : !{10}
                          Replace with : !

                          Repeat the above two steps until the first step finds nothing.

                          Now, count the exclamation points in each digit and remove the forward slashes:

                          Find what : (?:(!{9})|(!{8})|(!{7})|(!{6})|(!{5})|(!{4})|(!{3})|(!{2})|(!{1})|())/
                          Replace with : (?{1}9)(?{2}8)(?{3}7)(?{4}6)(?{5}5)(?{6}4)(?{7}3)(?{8}2)(?{9}1)(?{10}0)

                          This assumes there are no exclamation points or forward slashes elsewhere in the text. Of course, the forward slash (/) can be replaced with any character that is not in use.
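
The steps above amount to a unary-to-decimal conversion, one digit at a time. The Python sketch below follows the same sequence so that each stage can be inspected. It is only an imitation under stated assumptions: the function name `exclamations_to_decimal` is hypothetical, and the final conditional replacement is swapped for a small function that returns the number of exclamation points in each group, since Python's `re` module has no `(?{1}9)` replacement syntax.

```python
import re


def exclamations_to_decimal(text):
    """Turn n-1 leading '!' per line into the decimal number n and a tab."""
    # Add the missing exclamation point, a '/' separator and a tab
    text = re.sub(r"^(!*)", r"\1!/\t", text, flags=re.MULTILINE)
    # Group by tens until no run of ten exclamation points is left
    while re.search(r"(?:!{10})+", text):
        text = re.sub(r"(?:!{10})+", r"\g<0>/", text)   # open a new digit position
        text = re.sub(r"!{10}", "!", text)               # ten units become one ten
    # Each '/' now closes one digit made of 0 to 9 exclamation points
    return re.sub(r"(!{0,9})/", lambda m: str(len(m.group(1))), text)


print(exclamations_to_decimal("!" * 24 + "zinoiosijek031.net\nzingardi.net"))
```

As in the original steps, this assumes that neither ! nor / occurs elsewhere in the text.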

                            guy038
                            last edited by guy038 Oct 9, 2023, 9:46 PM Oct 9, 2023, 9:41 PM

                            Hi, @mohammad-al-thobiti and All,

                            Let’s recapitulate from the very beginning !


                            If we start with this kind of INPUT text :

                            127.0.0.1   a.z.xy.dummy-hyphen.org
                            127.0.0.1   a.example.com
                            127.0.0.1   cdef.x.example.com
                            127.0.0.1   my_site.net
                            127.0.0.1   b.dummy-hyphen.org
                            127.0.0.1   b.cde.fgh.example.com
                            127.0.0.1   abc.defgji.kkkkk.my_site.net
                            127.0.0.1   cd.xyztuv.ab-cd.4567.example.com
                            127.0.0.1   dummy-hyphen.org
                            127.0.0.1   example.com
                            

                            With the following regex S/R :

                            SEARCH (?x) ^ \Q127.0.0.1 \E \h+ (?: [\w-]+ \. )* ( [\w-]+ \. [\w-]+ ) $

                            REPLACE \1

                            => We just keep the main domain :

                            dummy-hyphen.org
                            example.com
                            example.com
                            my_site.net
                            dummy-hyphen.org
                            example.com
                            my_site.net
                            example.com
                            dummy-hyphen.org
                            example.com
                            

                            Of course, your present file deals with about 181,170 lines !

                            So, instead of using the last @coises’s method to find out the different occurrences of each line ( again, a very clever method ! ), I will simplify the goal by using a Python script to get the job done more quickly !

                            This script is adapted from one of @alan-kilborn’s scripts. I named this script Count_Strings_Occurences.py

                            # -*- coding: utf-8 -*-
                            
                            '''
                            
                            Adapted from :  https://community.notepad-plus-plus.org/topic/20598/show-a-list-of-same-word-before-replacement/2  and  .../20
                            
                            
                            By DEFAULT, this script PASTES, in a NEW tab,   a SORTED list of ALL the STRINGS of the CURRENT file, with their NUMBER of occurrences
                            
                            IF a NORMAL selection EXISTS, the script PASTES a SORTED list of ALL the STRINGS of the SELECTION,    with their NUMBER of occurrences
                            
                            
                            NOTES : 
                            
                            - The CURRENT file being processed does NOT need to be SORTED, in any way !
                            
                            - If you want a SORTED list of ALL the LINES  with their NUMBER of occurrences, don't FORGET to INCLUDE all the POSSIBLE chars of the lines in the REGEX !
                            
                                For example, if file may contain the line 'zip-archive.net', the REGEX, after editor.research, should be  r'[\w.-]+', which includes the DOT and the DASH !
                            
                            '''
                            
                            from Npp import editor, notepad   # 'notepad' is needed for notepad.new(), further down
                            
                            sel_start = 0
                            sel_end = editor.getLength()
                            # Refer to :  https://community.notepad-plus-plus.org/topic/22378/pythonscript-ops-on-selection-if-any-all-text-otherwise/3
                            
                            sel_start, sel_end = editor.getUserCharSelection()
                            
                            word_matches = []
                            def match_found(m): word_matches.append(editor.getTextRange(m.span(0)[0], m.span(0)[1]))
                            
                            editor.research(r'[\w.-]+', match_found, 0 , sel_start, sel_end)
                            
                            histogram_dict = {}
                            
                            for word in word_matches:
                                if word not in histogram_dict:
                                    histogram_dict[word] = 1
                                else:
                                    histogram_dict[word] += 1
                            
                            output_list = []
                            
                            for k in histogram_dict: output_list.append('{0:.<50} {1}'.format(k, histogram_dict[k]))
                            
                            #for k in histogram_dict: output_list.append('{}={}'.format(k, histogram_dict[k]))   # INITIAL format of Alan Kilborn
                            
                            # For SPECIFICATIONS on the OUTPUT format, refer to :
                            
                            # https://docs.python.org/2.7/library/string.html#format-specification-mini-language
                            # https://docs.python.org/2.7/library/string.html#format-examples
                            
                            output_list.sort()
                            editor.copyText('\r\n'.join(output_list))
                            
                            notepad.new()
                            editor.paste()
                            
                            # console.clear() ; editor.research (r'\w+', lambda m: console.write (m.group(0) + '\n'))
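
For anyone who wants to try the counting logic without Notepad++, here is a minimal stand-alone sketch in plain Python. It does not use the PythonScript plugin at all: the function name `count_strings` is hypothetical, `collections.Counter` replaces the hand-written histogram, and the text is passed in as a string instead of being read from the editor. The regex and the output format are the ones used in the script above.

```python
import re
from collections import Counter


def count_strings(text, pattern=r"[\w.-]+"):
    """Return a sorted list of 'string......... count' lines."""
    counts = Counter(re.findall(pattern, text))
    return sorted("{0:.<50} {1}".format(word, n) for word, n in counts.items())


sample = "zip.net\nzindova.net\nzip.net\nzip-archive.net\n"
print("\n".join(count_strings(sample)))
```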
                            

                            • So, select the random list of 168 lines, below.

                            Note that I suppose that the IPV4 addresses and the sub-domains were previously deleted

                            zioninfosystems.net
                            zingcoach.net
                            ziph.net
                            zinoiosijek031.net
                            zindova.net
                            zip.net
                            zinoiosijek031.net
                            zip-archive.net
                            zinfandelreviews.net
                            zindova.net
                            zip.net
                            zinoiosijek031.net
                            zioninfosystems.net
                            zip-archive.net
                            ziomik.net
                            zioninfosystems.net
                            zip.net
                            zindova.net
                            zindova.net
                            ziph.net
                            ziph.net
                            zinfandelreviews.net
                            zinoiosijek031.net
                            zindova.net
                            zioninfosystems.net
                            zindova.net
                            zip.net
                            zindova.net
                            ziph.net
                            zinfandelreviews.net
                            zinoiosijek031.net
                            ziph.net
                            zinfandelreviews.net
                            zinoiosijek031.net
                            zinfandelreviews.net
                            zinfandelreviews.net
                            ziobaweek.net
                            zinoiosijek031.net
                            zinfandelreviews.net
                            zindova.net
                            zindova.net
                            zinoiosijek031.net
                            zinoiosijek031.net
                            zipaphoto.net
                            zinfandelreviews.net
                            zinfandelreviews.net
                            zingardi.net
                            zip.net
                            zipexpose.net
                            zindova.net
                            zip-archive.net
                            zip-archive.net
                            zindova.net
                            zioninfosystems.net
                            zipexpose.net
                            zipaphoto.net
                            ziph.net
                            zipbah.net
                            zinoiosijek031.net
                            zinfandelreviews.net
                            zip.net
                            zindova.net
                            zip.net
                            zindova.net
                            zingcoach.net
                            zinoiosijek031.net
                            zip.net
                            ziomik.net
                            zindova.net
                            zinoiosijek031.net
                            zioninfosystems.net
                            ziph.net
                            zioninfosystems.net
                            zinfandelreviews.net
                            zingardi.net
                            zinoiosijek031.net
                            zingardi.net
                            zingardi.net
                            ziph.net
                            zingardi.net
                            zinoiosijek031.net
                            zinoiosijek031.net
                            zingcoach.net
                            zindova.net
                            zip.net
                            zindova.net
                            zip-archive.net
                            ziph.net
                            ziobaweek.net
                            zinfandelreviews.net
                            zip.net
                            zinoiosijek031.net
                            zip.net
                            ziomik.net
                            zingardi.net
                            zindova.net
                            zinfandelreviews.net
                            ziph.net
                            ziobaweek.net
                            zinoiosijek031.net
                            zindova.net
                            zinfandelreviews.net
                            zip.net
                            zingcoach.net
                            zip-archive.net
                            zip-archive.net
                            zindova.net
                            zinfandelreviews.net
                            zingardi.net
                            zioninfosystems.net
                            zinoiosijek031.net
                            ziph.net
                            zioninfosystems.net
                            ziobaweek.net
                            zingcoach.net
                            ziph.net
                            zinoiosijek031.net
                            ziobaweek.net
                            zinfandelreviews.net
                            zip.net
                            zinoiosijek031.net
                            ziph.net
                            zinfandelreviews.net
                            zindova.net
                            zindova.net
                            zindova.net
                            zip-archive.net
                            zip.net
                            ziph.net
                            zindova.net
                            zioninfosystems.net
                            zinoiosijek031.net
                            zinoiosijek031.net
                            ziph.net
                            zinfandelreviews.net
                            zip.net
                            zingcoach.net
                            zinfandelreviews.net
                            zinoiosijek031.net
                            zingardi.net
                            zip-archive.net
                            zip.net
                            zinfandelreviews.net
                            zinoiosijek031.net
                            zindova.net
                            zinfandelreviews.net
                            zip.net
                            zioninfosystems.net
                            zingardi.net
                            zioninfosystems.net
                            zip-archive.net
                            zingcoach.net
                            zinoiosijek031.net
                            ziomik.net
                            zip.net
                            zingardi.net
                            zinfandelreviews.net
                            zip-archive.net
                            zindova.net
                            ziomik.net
                            zinoiosijek031.net
                            zindova.net
                            zinoiosijek031.net
                            zinfandelreviews.net
                            zinfandelreviews.net
                            ziph.net
                            zinoiosijek031.net
                            zingardi.net
                            
                            • Run the Plugins > Python Script > Scripts > Count_Strings_Occurrences.py Python script

                            => At once, a new tab will open with all the results :

                            zindova.net....................................... 26
                            zinfandelreviews.net.............................. 24
                            zingardi.net...................................... 11
                            zingcoach.net..................................... 7
                            zinoiosijek031.net................................ 28
                            ziobaweek.net..................................... 5
                            ziomik.net........................................ 5
                            zioninfosystems.net............................... 12
                            zip-archive.net................................... 11
                            zip.net........................................... 18
                            zipaphoto.net..................................... 2
                            zipbah.net........................................ 1
                            zipexpose.net..................................... 2
                            ziph.net.......................................... 16
                            

                            Note that the entries are sorted by line contents, so that you can easily locate any of them !
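For readers without the plugin at hand, the counting step can be approximated in plain Python. This is only a minimal sketch, not the actual Count_Strings_Occurrences.py script: the function name `count_lines` is hypothetical, and the padding width of 50 is an assumption inferred from the numbers starting at column 52 in the results above.

```python
from collections import Counter

def count_lines(text, width=50):
    """Count identical non-empty lines; return report rows sorted by line contents."""
    counts = Counter(line.strip() for line in text.splitlines() if line.strip())
    # Pad each entry with dots up to `width` characters, then add a space and the count
    # (width=50 is an assumption based on the layout of the results shown in the post)
    return [f"{name.ljust(width, '.')} {counts[name]}" for name in sorted(counts)]

sample = "zip.net\nziph.net\nzip.net\nzindova.net\nzip.net\n"
report = count_lines(sample)
for row in report:
    print(row)
```

Sorting the keys as plain strings gives the same kind of ordering as the list above, for example `zip.net` before `ziph.net`.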

                            • Thus, do a zero-length RECTANGULAR selection of all the numbers of this new tab, at column 52

                            • Run the Edit > Line Operations > Sort Lines As Integers Descending option

                            => You should get your expected OUTPUT :

                            zinoiosijek031.net................................ 28
                            zindova.net....................................... 26
                            zinfandelreviews.net.............................. 24
                            zip.net........................................... 18
                            ziph.net.......................................... 16
                            zioninfosystems.net............................... 12
                            zingardi.net...................................... 11
                            zip-archive.net................................... 11
                            zingcoach.net..................................... 7
                            ziobaweek.net..................................... 5
                            ziomik.net........................................ 5
                            zipaphoto.net..................................... 2
                            zipexpose.net..................................... 2
                            zipbah.net........................................ 1
                            

                            Here you are !
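The descending sort on the counts can also be sketched outside Notepad++. The snippet below is an illustration only: `sort_by_count_desc` is a hypothetical helper, and it relies on Python's stable sort so that rows with equal counts keep their previous (alphabetical) order, which matches the output shown above. Whether Notepad++'s own sort guarantees the same tie order is not something this sketch claims.

```python
import re

def sort_by_count_desc(rows):
    """Sort report rows by their trailing integer, largest first."""
    def trailing_count(row):
        # The count is the run of digits at the very end of the row
        m = re.search(r"(\d+)\s*$", row)
        return int(m.group(1)) if m else 0
    # sorted() is stable, even with reverse=True, so ties keep their input order
    return sorted(rows, key=trailing_count, reverse=True)

rows = [
    "zingardi.net.... 11",
    "zinoiosijek031.net.... 28",
    "zip-archive.net.... 11",
    "ziobaweek.net.... 5",
]
for row in sort_by_count_desc(rows):
    print(row)
```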


                            • Proceed, in the same way, with your present file

                            • Switch to your file tab ( selection and sort are not required )

                            • Run again the Plugins > Python Script > Scripts > Count_Strings_Occurrences.py Python script

                            • In the opened new tab, do a zero-length RECTANGULAR selection of all the numbers, at column 52

                            • Run the Edit > Line Operations > Sort Lines As Integers Descending option

                            Bingo !

                            Best Regards,

                            guy038

                            • M
                              Mohammad Al Thobiti @guy038
                              last edited by Oct 10, 2023, 10:13 AM

                              Hello @guy038 & All

                              Thank you. Everything is OK, but I need to add more than one domain to the Find what: field,

                              as you can see
                              5c203023-6172-4dfa-ad8b-8e404222160d-image.png
                              this is the code:

                              \.real-news-online.com, myvnc.com, 1example.com, 2example.com\s
                              

                              Is this the correct method?
                              If not, how can I do this?

                              • M
                                Mohammad Al Thobiti @Mohammad Al Thobiti
                                last edited by Oct 10, 2023, 10:24 AM

                                8fc6c18d-b7fd-4c96-993d-d93f1cd3d202-image.png

                                • M
                                  Mohammad Al Thobiti @Mohammad Al Thobiti
                                  last edited by Oct 10, 2023, 10:41 AM

                                  .example.com|.example2.com|.example3.com\s

                                  OK
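A note on the alternation posted just above: in a regex, `|` has the lowest precedence, so the trailing `\s` applies only to the last alternative, and an unescaped `.` matches any character rather than a literal dot. The sketch below shows one way to group the alternatives and escape the dots. It uses Python's `re` module, not Notepad++'s Boost engine, and `build_domain_pattern` is a hypothetical helper; the grouped pattern it produces, `\.(?:example\.com|example2\.com|example3\.com)\s`, uses only syntax that Boost regex also supports, as far as I know.

```python
import re

def build_domain_pattern(domains):
    """Build a regex matching a dot, any one of the listed domains, then whitespace."""
    # re.escape turns each '.' into a literal dot instead of "any character"
    alternatives = "|".join(re.escape(d) for d in domains)
    # The non-capturing group makes the leading dot and trailing \s apply to every alternative
    return re.compile(r"\.(?:" + alternatives + r")\s")

pattern = build_domain_pattern(["example.com", "example2.com", "example3.com"])
print(bool(pattern.search("mail.example2.com\n")))  # True
print(bool(pattern.search("mail.exampleXcom\n")))   # False
```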

                                  • M
                                    Mohammad Al Thobiti @Mohammad Al Thobiti
                                    last edited by Oct 10, 2023, 10:59 AM

                                    e090ecbe-d21f-4d02-83f0-f6c59c3ee670-image.png

                                    • M
                                      Mohammad Al Thobiti
                                      last edited by Oct 10, 2023, 11:20 AM

                                      c1b1e91a-6465-4333-b905-6fe87431ce9d-image.png

                                      • PeterJonesP
                                        PeterJones @Mohammad Al Thobiti
                                        last edited by Oct 10, 2023, 12:57 PM

                                        @Mohammad-Al-Thobiti said in I would like to group all similar domains, not by alphabet.:

                                        How can I do this?

                                        At some point, you need to take the lessons you’ve been taught through the dozens of regexes that people have handed you throughout this discussion, and try to figure it out yourself.

                                        ----

                                        Useful References

                                        • Please Read Before Posting
                                        • Template for Search/Replace Questions
                                        • Formatting Forum Posts
                                        • Notepad++ Online User Manual: Searching/Regex
                                        • FAQ: Where to find other regular expressions (regex) documentation

                                        ----

                                        Please note: This Community Forum is not a data transformation service; you should not expect to be able to always say “I have data like X and want it to look like Y” and have us do all the work for you. If you are new to the Forum, and new to regular expressions, we will often give help on the first one or two data-transformation questions, especially if they are well-asked and you show a willingness to learn; and we will point you to the documentation where you can learn how to do the data transformations for yourself in the future. But if you repeatedly ask us to do your work for you, you will find that the patience of usually-helpful Community members wears thin. The best way to learn regular expressions is by experimenting with them yourself, and getting a feel for how they work; having us spoon-feed you the answers without you putting in the effort doesn’t help you in the long term and is uninteresting and annoying for us.

                                        • M
                                          Mohammad Al Thobiti @PeterJones
                                          last edited by Oct 10, 2023, 12:58 PM

                                          @PeterJones
                                          OK, thank you.
