
    How to find a specific string (in my case it's a link) that ends with a quotation mark.

    15 Posts 3 Posters 3.6k Views
    • Fujosej Fujo

      I have a wall of text like this.
      https://imgur.com/ZyahUCT

      I want to mark all the Twitch links, including the channel names, and copy them. That's the basic idea.

      The only solution I have is to search for this:
      https://imgur.com/m4RdOIf

      After that (this is where I'm stuck), it should mark everything up to the quotation mark,
      like this:
      https://imgur.com/oNneMqd

      I am sure there is a way to achieve this using regex. Please help me, guys.

      • Terry R

        Hello @Fujosej-Fujo
        The following regex will capture (highlight) the remainder of the text you seek. I’m not sure how it will help you though as you suggest you need to further manipulate the text which is highlighted.

        So on the Mark function use:
        Find What:https://www.twitch.tv/[^"]+

        As I was not able to determine the exact quote character you use, this may need adjusting. If my regex does NOT get the right text, copy the closing quote character from your file and use it in place of mine in the regex. Quotes can be problematic as there is more than one kind (straight and curly, for example).

        Let me know how you get on and especially if you require further help.

        Just so you know how the regex works: it looks for the first bit (which you already had), followed by ANY character as long as it isn’t a quote, for as many characters as can be found. It therefore stops just short of the quote character.
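For anyone who wants to try the same pattern outside Notepad++, here is a minimal Python sketch. The sample text is made up (the original file is only shown as a screenshot), and the dots are escaped so they match literal dots only, which the regex above does not do.

```python
import re

# Hypothetical sample; the poster's real file is not reproduced here.
text = ('<a href="https://www.twitch.tv/alpha">a</a> '
        '<a href="https://www.twitch.tv/beta">b</a>')

# Literal prefix, then one or more characters that are not a double quote.
pattern = re.compile(r'https://www\.twitch\.tv/[^"]+')

links = pattern.findall(text)
print(links)
```

Each match stops just before the closing quote, which is the same behaviour as Mark in Notepad++ with this pattern.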

        Terry

          • Fujosej Fujo

          Works perfectly, thank you.

          Now I need to copy all this text (370 matches).
          https://imgur.com/zcKjxAS

          Is there a way to do this? I think that shouldn’t be too difficult.

            • Terry R

            That was what I was referring to. Marking just shows you the occurrences. It doesn’t help with additional processing.

            What I think you really want to do is work on a copy of the file, in which you selectively destroy some of the text, leaving only the references you want.

            Here’s what I’d do. Make a copy of the file (into another tab of Notepad++).
            Use the following regex to remove all unwanted text. This is only a rough job. You may still need to remove lines where the text does NOT occur.
            Find What:^.+?(https://www.twitch.tv/[^"]+).+
            Replace With:\1

            So ANY line with the text you want will ONLY have that text remaining on the line. All other lines without it will be unchanged. I’d then use a line sort function (Edit, Line Operations, Sort Lines Lexicographically Ascending). This will put all the lines you want to keep together. Remove all others. I could spend more time on a regex that would do ALL this but this is quick and easy to do.
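A small Python sketch of what this line-by-line replace does, on made-up data. It assumes the dot does not match line breaks (as in the regex above) and escapes the literal dots. Note what happens when a line holds more than one link: only the first one on that line survives.

```python
import re

# Hypothetical two-line sample: the first line has two links, the second has none.
text = ('x "https://www.twitch.tv/alpha" y "https://www.twitch.tv/beta" z\n'
        'no link here')

# ^.+?  : shortest run of text from the start of the line up to the link
# (...) : the link itself, captured as group 1
# .+    : the rest of the line, which is discarded
result = re.sub(r'^.+?(https://www\.twitch\.tv/[^"]+).+', r'\1', text,
                flags=re.MULTILINE)
print(result)
```

Lines without a link are left unchanged, so they still have to be removed by hand or by sorting.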

            See how you go with this.

            Terry

              • Terry R

              @Fujosej-Fujo

              I probably should ask the question: does any occurrence of the text you want cross over lines? I see one of the highlighted instances was right at the end of a line. Currently I haven’t catered for that situation, so check (manually if need be) for any other twitch.tv occurrences that did NOT get marked.

              Terry

                • Fujosej Fujo

                In my file I have only 10 lines, like this:
                https://imgur.com/s7JCMfP

                Maybe because of that I can’t achieve what I want.

                After Find and Replace
                https://imgur.com/ESaV0VU
                it deletes almost everything I need and leaves me with 8 matches of twitch.

                Like this
                https://imgur.com/SGHI8W2

                Here is the txt file that I am using; maybe this way it will be easier to figure out.
                http://rgho.st/6TlbcFmlG

                  • Terry R

                  @Fujosej-Fujo
                  Your file was a great help. I definitely needed to see that, as it showed me that the lines were very long, with multiple occurrences on each line of the text you want.

                  So here is my revised set of steps.

                  1. Make a copy in another tab of Notepad++.
                  2. Use the Replace function to remove all line endings (carriage return, line feed).
                    Find What:\R
                    Replace With:(leave this field empty)

                  Now everything is on 1 line (it may not look that way if you have word wrap turned on).

                  3. Use the Replace function to remove ALL unwanted text.
                    Find What:.+?(https://www.twitch.tv/[^"]+)
                    Replace With:\1

                  This will remove all unwanted text except for whatever follows the last occurrence.

                  4. Now put all occurrences of the text we want on separate lines.
                    Find What:(https://www.twitch.tv/)
                    Replace With:\r\n\1

                  Once this step is completed, go to the last line and remove the extra text behind the portion you want to keep.
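The same sequence can be sketched in Python on made-up data. Two assumptions: Python's re module has no \R, so the line endings are spelled out, and the literal dots are escaped. The leftover text on the last line shows up exactly as described.

```python
import re

# Hypothetical sample with two lines and one link on each.
text = 'aa "https://www.twitch.tv/alpha" bb\r\ncc "https://www.twitch.tv/beta" dd'

# Remove all line endings so everything sits on one line.
one_line = re.sub(r'\r\n|\r|\n', '', text)

# Drop the text in front of each link, keeping only group 1.
stripped = re.sub(r'.+?(https://www\.twitch\.tv/[^"]+)', r'\1', one_line)

# Start a new line before every link.
listed = re.sub(r'(https://www\.twitch\.tv/)', r'\r\n\1', stripped)

for line in listed.splitlines():
    print(repr(line))
```

The first line comes out empty and the last one still carries the trailing `" dd`, which is the bit that needs removing by hand.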

                  Again this is a quick process, I haven’t spent much time on making it do everything. Sometimes quick and easy steps are better than trying to cover ALL bases and using a long winded approach.

                  Have a go and let us know.

                  Terry

                    • Terry R

                    @Fujosej-Fujo
                    I have had a slightly longer look at the file you provided. I note that you mentioned quotes; however, your initial regex did NOT include them. In the file there appear to be some instances of twitch.tv without quotes. I’m not sure you actually intend to capture those as well.

                    I’ve made a revised regex which doesn’t need so many steps; however, it will still require the final file to be edited a bit. Once you try it you will see what I mean. Some of the lines stand out very easily as not being correct.

                    Find What:.+?"(https://www.twitch.tv/[^"]+)
                    Replace With:\1\r\n

                    So there is no need to remove carriage returns, but you will need to remove those lines that DON’T start with “https”. This can be done with Mark, with “Bookmark line” ticked; the bookmarked lines can then be removed (Search, Bookmark, Remove Bookmarked Lines).
                    Find What:^[^h]
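A rough Python equivalent of these two steps, on made-up data. It uses \n line endings throughout (an assumption made so that the dot stops at line breaks in the same way as in Notepad++) and escapes the literal dots.

```python
import re

# Hypothetical sample: two links on the first line, one on the second.
text = ('aa "https://www.twitch.tv/alpha" bb "https://www.twitch.tv/beta" cc\n'
        'dd "https://www.twitch.tv/gamma" ee')

# Keep only the quoted link (without the opening quote), then break the line.
replaced = re.sub(r'.+?"(https://www\.twitch\.tv/[^"]+)', r'\1\n', text)

# Drop every line whose first character is not "h", as ^[^h] plus bookmarks would.
links = [ln for ln in replaced.splitlines() if not re.match(r'[^h]', ln)]
print(links)
```

Before the filter, the intermediate result still contains stray lines such as `" cc` and `" ee`; those are the ones that "stand out" and get removed.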

                    Terry

                      • Fujosej Fujo

                      This is amazing, thank you very much, works beautifully.

                        • guy038

                        Hello @fujosej-fujo, @terry-r and All,

                        Thanks, @fujosej-fujo, for your new 6.txt text file. It’s always better to work on “real” data ;-))

                        I think, @terry-r, that all the work can be reduced to a single regex S/R ;-))


                        So, @fujosej-fujo, basically, you’re searching for any area of text :

                        • Beginning with "https://www.twitch.tv

                        • Ending at the first next quote char "

                        This regex, which searches for such an area, is :

                        (?s-i)"https://www.twitch.tv.+?"

                        Notes :

                        • The (?s-i) modifiers, at the beginning, mean that :

                          • Any dot meta-character ( . ) represents any single character at all, even EOL ones ( (?s) )

                          • The regex engine will search in a case-sensitive way ( (?-i) )

                        • Then, it searches the literal string "https://www.twitch.tv

                        • Finally the part .+?" finds the shortest area of any character, till the first next quote char "
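To see why the lazy quantifier matters here, a short Python sketch on a made-up string compares .+? with a greedy .+ in the same position. Assumptions: re.DOTALL stands in for (?s), case sensitivity is Python's default, and the literal dots are escaped.

```python
import re

# Hypothetical sample containing two quoted links.
sample = '"https://www.twitch.tv/alpha" x "https://www.twitch.tv/beta"'

# Lazy: stops at the first closing quote after each link.
lazy = re.findall(r'"https://www\.twitch\.tv.+?"', sample, flags=re.DOTALL)

# Greedy: runs on to the last quote it can find.
greedy = re.findall(r'"https://www\.twitch\.tv.+"', sample, flags=re.DOTALL)

print(lazy)
print(greedy)
```

The lazy form returns the two links separately, while the greedy form returns one match spanning the whole sample.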


                        Now that we have built this first regex to match the zones to extract, we create a second regex which contains the first one, surrounded with parentheses in order to store its value as group 1 for the replacement, using this syntax :

                        SEARCH (?s-i).+?(Your regex)|.+

                        Thus, this leads to the correct regex S/R, below :

                        SEARCH (?s-i).+?("https://www.twitch.tv.+?")|.+

                        REPLACE \1\r\n

                        => From your new 6.txt file, 366 replacements occur and you get a neat list of 365 links ;-))

                        Notes :

                        • After the modifiers, the part .+? matches the shortest part from, either, the beginning of file or the end of the previous match, until the expression "https://www.twitch.tv.+?"

                        • In replacement, we rewrite, only, the expected group1, which must be extracted, followed with a line-break

                        • Near the end of the file, when no more "https://www.twitch.tv can be found, the regex engine uses the second alternative .+, after the alternation symbol |, which grabs all text till the very end of the file, as the (?s) modifier is still active !

                        • This time, as group1 is not defined, the replacement simply deletes this last unwanted part
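The single S/R above can be tried in Python as a sketch, on made-up data rather than the 6.txt file. Assumptions: re.DOTALL is passed instead of the (?s-i) prefix, since Python's re does not take the modifiers in that form, and the literal dots are escaped.

```python
import re

# Hypothetical sample: two quoted links plus trailing text after the last one.
text = 'aa "https://www.twitch.tv/alpha" bb\r\ncc "https://www.twitch.tv/beta" dd'

pattern = re.compile(r'.+?("https://www\.twitch\.tv.+?")|.+', flags=re.DOTALL)

# subn also reports how many replacements were made.
# Group 1 is empty when the second alternative matches, so the tail is deleted.
result, count = pattern.subn(r'\1\r\n', text)

links = [ln for ln in result.splitlines() if ln]
print(count, links)
```

Here three replacements yield two links: the extra replacement is the final .+ alternative swallowing the tail, the same pattern as 366 replacements giving 365 links.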


                        Refer also to this more complete post, on that topic ( How to extract all the results matched ) :

                        https://notepad-plus-plus.org/community/topic/12710/marked-text-manipulation/8

                        Best Regards,

                        guy038

                          • Terry R

                          @guy038
                          You’re mostly correct, except that I believe the OP didn’t want the quote characters included, they were just to delimit the text he DID want. Thus your regex should be
                          Find What:(?s-i).+?"(https://www.twitch.tv.+?)"|.+
                          the Replace With field is as stated.

                          Terry

                            • Fujosej Fujo

                            Once again, thank you guys! Works like a charm.

                              • Fujosej Fujo

                              I have one more question, sorry)

                              So let's say I have this list:
                              http://rgho.st/6k9G4rHSh
                              I want to add ‘s’ to http and also a ‘www.’ before twitch.
                              Is there a way to do this with regex?

                                • Terry R

                                @Fujosej-Fujo
                                This is actually very easy. You only need to search for http:// and use https://www. (including the trailing dot) as the replacement.
                                You could even do this with the Replace function set to “Normal” search mode, as there aren’t any special characters like those used previously (.+? etc.).

                                Find What:http://
                                Replace With:https://www.

                                As I said, this can be done in either Normal or Regular expression mode; it won’t matter.
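The same plain-text swap in Python, as a sketch. The list entries are made up and assumed to look like http://twitch.tv/name, since the linked file is not reproduced here; entries that already contain www. would end up with it twice.

```python
# Hypothetical list entries in the assumed http://twitch.tv/... form.
urls = ['http://twitch.tv/alpha', 'http://twitch.tv/beta']

# A literal replacement is enough; no regex is needed.
fixed = [u.replace('http://', 'https://www.') for u in urls]
print(fixed)
```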

                                Terry

                                  • Fujosej Fujo

                                  Thank you very much. That was easier than I thought.

                                  The Community of users of the Notepad++ text editor.