    How to find and remove duplicate strings of alphanumeric characters from multiple files?

    Help wanted · 27 posts · 5 posters · 8.5k views
    • Ramanand Jhingade (in reply to Ramanand Jhingade)

      @Ramanand-Jhingade …for both alphanumeric strings on consecutive rows as well as if they are not on consecutive rows

      • Terry R

        @Ramanand-Jhingade said in How to find and remove duplicate strings of alphanumeric characters from multiple files?:

        How to find and remove duplicate strings of alphanumeric characters from multiple files?

        That is a question that has so many answers. You need to provide us with an example of what you are looking for.

        1. How long is the string of characters? You seem to suggest a complete line.
        2. Is the string case sensitive, so that “how” does not equal “How”?
        3. Is there just one string to look for, or several?
        4. What do you want to do with the duplicates: remove all of them, the first, or the last one?

        There may be other questions as well. It is not worth anyone’s time trying to help you if you don’t provide the necessary information.

        Terry

        • Ramanand Jhingade (in reply to Terry R)

          @Terry-R Answers to your questions:-

          • Yes, complete lines.
          • Yes, it is case sensitive.
          • Multiple lines, but I want to do one line at a time so that I can check the result after each find-and-remove command is executed.
          • Remove all duplicates but keep the original.

          Thanks for your time and help!
          • Terry R

            @Ramanand-Jhingade said in How to find and remove duplicate strings of alphanumeric characters from multiple files?:

            Remove all duplicates but keep the original

            Keeping the original will pose problems. However, there is a solution. First read the excellent solution provided by @guy038 here. His solution leaves ONLY the last copy of each line. The only change to make is to use (?-is) instead of (?-s), as this makes the match case sensitive.

            To keep the first copy instead, reverse the lines; see the posts that follow in the above link. There @guy038 suggests adding a line number to the start of each line, sorting numerically descending, removing the duplicates, and lastly re-sorting numerically ascending.

            Just remember to use the “Replace” button, not “Replace All”, as that lets you see (and remove) the duplicates one by one.
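
To see why the numbering trick ends up keeping the original copy, the steps can be simulated outside Notepad++. This is only a minimal Python sketch of the procedure as described here, not @guy038's actual regex; the function name `keep_first_via_reverse` and the sample lines are made up for illustration.

```python
def keep_first_via_reverse(lines):
    """Simulate: number the lines, reverse, keep the last copy, restore order."""
    # Step 1: attach a line number to every line (here as a tuple).
    numbered = list(enumerate(lines, start=1))
    # Step 2: sort numerically descending, so the file is upside down.
    numbered.sort(key=lambda pair: pair[0], reverse=True)
    # Step 3: drop every line whose text occurs again further down,
    # which leaves only the LAST copy in the reversed order.
    kept = []
    for index, (number, text) in enumerate(numbered):
        later_texts = [t for _, t in numbered[index + 1:]]
        if text not in later_texts:
            kept.append((number, text))
    # Step 4: sort numerically ascending and strip the numbers again.
    kept.sort(key=lambda pair: pair[0])
    return [text for _, text in kept]


sample = ["alpha", "beta", "alpha", "Alpha", "beta", "gamma"]
result = keep_first_via_reverse(sample)
print(result)
```

The comparison is case sensitive, so "Alpha" survives next to "alpha", and the last copy in the reversed order is the first copy in the original order.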

            Give that a go.

            Terry

            • Ramanand Jhingade (in reply to Terry R)

              @Terry-R I did what @guy038 typed there but it is removing even the brackets in the CSS section of my .html files/pages (CSS can have just one bracket in a line). Any better suggestion?

              • Terry R

                @Ramanand-Jhingade said in How to find and remove duplicate strings of alphanumeric characters from multiple files?:

                but it is removing even the brackets in the CSS section of my .html files/pages

                I asked whether you meant complete lines, to which you said yes. So now you have changed the request.

                How can you expect us to help you if you don’t provide enough information and examples?

                At this point I’m out.

                Terry

                • Ramanand Jhingade (in reply to Terry R)

                  @Terry-R Complete lines, yes, but it should not remove the brackets from the CSS sections of the webpages - some of those lines are just a single bracket (in the CSS section). @guy038 can probably help.

                  • Ramanand Jhingade (in reply to Terry R)

                    @Terry-R @guy038 Let me be more specific. I want to check for matter between < and > and replace any duplicated matter but keep the original. I don't mind using the "Replace" function instead of the "Replace all" function so that I can check what happens each time.
                      • Ramanand Jhingade (in reply to Ramanand Jhingade)

                        @guy038 @Terry-R Sorry, I was unable to post it correctly. It should find and replace duplicate matter between “<” and “>” and keep the original

                        • Terry R (in reply to Ramanand Jhingade)

                          @Ramanand-Jhingade said in How to find and remove duplicate strings of alphanumeric characters from multiple files?:

                          but it is removing even the brackets in the CSS section of my .html files/pages (CSS can have just one bracket in a line). Any better suggestion?

                          You have yet to supply any examples. Regardless, I will give you some help.

                          Since you are using the “Replace” button to see changes one at a time, press the “Find Next” button instead whenever the search stops on a line you don’t want to change. That leaves the current selection untouched and moves on to the next occurrence.

                          It might be possible to adjust the regex to ignore lines based on the characters they contain, but for that you need to provide examples.

                          Terry

                          • Ramanand Jhingade (in reply to Terry R)

                            @Terry-R Suppose I want to avoid removing the flower brackets (curly braces),

                            {

                            and

                            }

                            how do I do so?

                            • PeterJones (in reply to Ramanand Jhingade)

                              @Ramanand-Jhingade ,

                              As has been explained to you before, this process works best if you show two black boxes: one with the way your data is now (“have”), and the second with the way you want the data to look (“want”).

                              have:

                              blah
                              {
                              blah
                              }
                              

                              want:

                              something
                              <
                              yada
                              >
                              

                              Since you don’t seem to remember this guidance, I will re-post it. If you don’t follow these guidelines, you will find it hard to convince people to help you, especially since you’re asking so many questions.

                              ----

                              Do you want regex search/replace help? Then please be patient and polite, show some effort, and be willing to learn; answer questions and requests for clarification that are made of you. All example text should be marked as literal text using the </> toolbar button or manual Markdown syntax. To make regex in red (and so they keep their special characters like *), use backticks, like `^.*?blah.*?\z`. Screenshots can be pasted from the clipboard to your post using Ctrl+V to show graphical items, but any text should be included as literal text in your post so we can easily copy/paste your data. Show the data you have and the text you want to get from that data; include examples of things that should match and be transformed, and things that don’t match and should be left alone; show edge cases and make sure your examples are as varied as your real data. Show the regex you already tried, and why you thought it should work; tell us what’s wrong with what you do get. Read the official NPP Searching / Regex docs and the forum’s Regular Expression FAQ. If you follow these guidelines, you’re much more likely to get helpful replies that solve your problem in the shortest number of tries.

                              • Ramanand Jhingade (in reply to PeterJones)

                                @PeterJones @guy038 I would like all the matter/characters between the < and > which have duplicates to be removed.
                                For example,

                                <link rel="stylesheet" type="text/css" href="engine1/style.css" media="screen">
                                <link href="css/style.css" rel="stylesheet" type="text/css" media="all">
                                <link rel="stylesheet" type="text/css" href="engine1/style.css" media="screen">
                                <link href="css/style.css" rel="stylesheet" type="text/css" media="all">
                                

                                should become:-

                                <link rel="stylesheet" type="text/css" href="engine1/style.css" media="screen">
                                <link href="css/style.css" rel="stylesheet" type="text/css" media="all">
                                • Terry R

                                  @Ramanand-Jhingade said in How to find and remove duplicate strings of alphanumeric characters from multiple files?:

                                  I would like all the matter/characters between the < and > which have duplicates to be removed.

                                  Thank you for finally giving us some examples. From those examples I can say that the original solution I linked from @guy038 along with his subsequent line numbering and reversing should do that. But I think you probably know that already.

                                  What is still missing is examples for the “Suppose I want to avoid removing the flower brackets” question you asked earlier. Can you provide examples of the lines they are in, or are these brackets on lines by themselves, possibly with spaces before the bracket? This is very important information.

                                  Terry

                                  • Ramanand Jhingade (in reply to Terry R)

                                    @Terry-R In the CSS section of my webpages (just after my meta tags right at the top), some lines are just { or } and I don’t want those to be removed or replaced

                                    • Alan Kilborn (in reply to Ramanand Jhingade)

                                      @Ramanand-Jhingade said:

                                      I would like all the matter/characters between the < and > which have duplicates to be removed.
                                      For example…

                                      @Terry-R said

                                      From those examples I can say that the original solution I linked … along with … subsequent line numbering and reversing should do that.

                                      I’m confused why this command was not recommended for this task, given the example data provided:

                                      [Screenshot: 21d1f4cf-22d2-483c-8ee6-ed609c1b4b27-image.png]

                                      • Terry R (in reply to Alan Kilborn)

                                        @Alan-Kilborn OP wants to use “Replace” in single mode, presumably to verify each removal. And considering he also has the {} characters to contend with I don’t blame him.

                                        Terry

                                        • Terry R (in reply to Ramanand Jhingade)

                                          @Ramanand-Jhingade said in How to find and remove duplicate strings of alphanumeric characters from multiple files?:

                                          some lines are just { or } and I don’t want those to be removed or replaced

                                          As you never showed me examples of these lines, I have to assume some things.

                                          Consider adding ^(?=[^\{\}]+$) to the regex immediately after the (?-is), if that’s what you have. This lookahead scans the line and succeeds only if there are NO { or } characters on it. Only then is the line checked as a possible duplicate.
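
The effect of the lookahead on its own can be tried out on a few sample lines. The following is a rough Python sketch rather than the Notepad++ (Boost) engine itself: the helper name `is_checked_for_duplicates` and the sample lines are assumptions made for illustration, and the character class additionally excludes the newline so the check stays on a single line.

```python
import re

# Python port of the suggested lookahead. Braces need no escaping inside
# a character class here; "\n" is excluded so the check stays on one line.
NO_BRACES = re.compile(r"^(?=[^{}\n]+$)", re.MULTILINE)


def is_checked_for_duplicates(line):
    """Return True if the lookahead lets this line through."""
    return NO_BRACES.match(line) is not None


samples = ['<link href="css/style.css" rel="stylesheet">', "{", "  }", "body {", ""]
for line in samples:
    print(repr(line), is_checked_for_duplicates(line))
```

In this sketch a line is skipped if it contains a brace anywhere, so a CSS line such as `body {` is left alone as well, not only lines that consist of a single bracket. Empty lines are skipped too, because the `+` needs at least one character.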

                                          Terry

                                          • Ramanand Jhingade (in reply to Terry R)

                                            @Terry-R Thanks. I don’t want to mess things up, so please clarify: should I use (?-is)^(?=[^\{\}]+$)^\d+\h+(.+\R)(?=(?is:.*)^\d+\h+\1) in Regular expression mode with the “new line” option checked/ticked, and click on the Find button?
                                            If everything is done correctly, can I use Replace All for all files in the folder? @Alan-Kilborn says in the post you linked to that searching backwards will not give accurate results, but since we are reversing the order of the lines and then searching, that should not be a problem, right (it will not be searching backwards)?
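
One low-risk way to check the combined expression before running it on real files is to try an approximate port of it on a small sample. The sketch below uses Python's `re` module, which is not the engine Notepad++ uses: `\h` is written as `[ \t]`, `\R` as `\n`, and `(?-is)` is dropped because Python is case sensitive and `.` skips newlines by default. The function name `remove_duplicates` and the shortened sample lines are made up for illustration.

```python
import re

# Approximate Python port of the combined expression from this thread.
DUPLICATE = re.compile(
    r"^(?=[^{}\n]+$)\d+[ \t]+(.+\n)(?=(?s:.*)^\d+[ \t]+\1)",
    re.MULTILINE,
)


def remove_duplicates(lines):
    """Keep the first copy of each line; never touch lines with braces."""
    # Number the lines and put them in descending order, as in the thread.
    numbered = [f"{n} {text}\n" for n, text in enumerate(lines, start=1)]
    reversed_text = "".join(reversed(numbered))
    # Equivalent of "Replace All" with an empty replacement.
    cleaned = DUPLICATE.sub("", reversed_text)
    # Restore ascending order and strip the number prefix again.
    remaining = sorted(cleaned.splitlines(), key=lambda s: int(s.split()[0]))
    return [line.split(" ", 1)[1] for line in remaining]


page = [
    '<link href="engine1/style.css" media="screen">',
    '<link href="css/style.css" media="all">',
    "{",
    '<link href="engine1/style.css" media="screen">',
    "}",
    "{",
    '<link href="css/style.css" media="all">',
    "}",
]
deduplicated = remove_duplicates(page)
print("\n".join(deduplicated))
```

On this sample the two repeated link lines are removed and all four brace lines stay. Because the search runs forwards through the reversed lines, no backward search is involved in the sketch. It says nothing about how Notepad++ behaves across several files, so trying the real expression on copies of the files first remains the safer route.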

                                            The Community of users of the Notepad++ text editor.