    How to find a specific string (in my case, a link) that ends with a quotation mark

    • Fujosej Fujo

      My file has only 10 lines, like this:
      https://imgur.com/s7JCMfP

      Maybe because of that I can't achieve what I want.

      After Find and Replace
      https://imgur.com/ESaV0VU
      it deletes almost everything I need and leaves me with 8 matches of twitch.

      Like this:
      https://imgur.com/SGHI8W2

      Here is the txt file I am using; maybe this will make it easier to figure out.
      http://rgho.st/6TlbcFmlG

      • Terry R

        @Fujosej-Fujo
        Your file was a great help. I definitely needed to see it, as it showed me that the lines were very long, with multiple occurrences of the text you want on each line.

        So here is my revised set of steps.

        1. Make a copy of the file in another Notepad++ tab.
        2. Use the Replace function to remove all line endings (carriage returns and line feeds).
          Find What: \R
          Replace With: (leave this field empty)

        Now everything is on one line (it may not look that way if you have word wrap turned on).

        3. Use the Replace function to remove ALL unwanted text.
          Find What: .+?(https://www.twitch.tv/[^"]+)
          Replace With: \1

        This removes all unwanted text except whatever follows the last occurrence.

        4. Now put each occurrence of the text we want on its own line.
          Find What: (https://www.twitch.tv/)
          Replace With: \r\n\1

        Once this step is completed, go to the last line and remove the extra text behind the portion you want to keep.
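
For readers who would rather script it, the steps above can be sketched in Python. This is only an illustration under stated assumptions: Python's `re` module is not the regex engine Notepad++ uses and has no `\R`, so the line endings are listed explicitly; `\n` stands in for `\r\n`; the dots in the URL are escaped, which the post does not do; and the function name and sample text are made up for the example.

```python
import re

# Hypothetical helper; the dots are escaped so they match literal dots only.
TWITCH = r"https://www\.twitch\.tv/"


def extract_links_stepwise(text: str) -> str:
    # Step 2: remove all line endings (explicit list, since re has no \R).
    one_line = re.sub(r"\r\n|\r|\n", "", text)
    # Step 3: drop the text in front of each link, keeping only the link.
    stripped = re.sub(r'.+?(' + TWITCH + r'[^"]+)', r"\1", one_line)
    # Step 4: start a new line before every link.
    return re.sub("(" + TWITCH + ")", r"\n\1", stripped)


sample = 'junk "https://www.twitch.tv/alpha" more\r\njunk "https://www.twitch.tv/beta" tail'
print(extract_links_stepwise(sample))
```

On this sample the last output line still ends with `" tail`, which is the extra text that has to be trimmed by hand.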

        Again this is a quick process, I haven’t spent much time on making it do everything. Sometimes quick and easy steps are better than trying to cover ALL bases and using a long winded approach.

        Have a go and let us know.

        Terry

        • Terry R

          @Fujosej-Fujo
          I have had a slightly longer look at the file you provided. You mentioned quotes, but your initial regex did NOT include them. The file appears to contain some instances of twitch.tv without quotes, and I'm not sure you actually intend to capture those as well.

          I’ve made a revised regex which doesn’t need so many steps, however it will still require the final file to be edited a bit. Once you try it you will see what I mean. Some of the lines stick out very easily as not being correct.

          Find What: .+?"(https://www.twitch.tv/[^"]+)
          Replace With: \1\r\n

          So there is no need to remove the carriage returns, but you will need to remove the lines that DON'T start with “https”. This can be done with the Mark function with the bookmark option ticked; the bookmarked lines can then be removed.
          Find What: ^[^h]
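
As a rough Python equivalent of this one-pass replace plus clean-up (a sketch, not the Notepad++ behaviour itself): `.` does not cross line breaks by default in Python's `re`, `\n` stands in for `\r\n`, and the bookmark-and-delete step becomes a simple filter on lines starting with "https". The function name and sample text are hypothetical.

```python
import re

# Quote in front of the link is required, but kept outside the captured group.
LINK = r'.+?"(https://www\.twitch\.tv/[^"]+)'


def extract_quoted_links(text: str) -> list:
    # Put each captured link on its own line; leftovers stay on their own lines.
    replaced = re.sub(LINK, r"\1\n", text)
    # Stand-in for marking and deleting the lines that do not start with "https".
    return [line for line in replaced.splitlines() if line.startswith("https")]


sample = 'junk "https://www.twitch.tv/alpha" more\njunk "https://www.twitch.tv/beta" tail'
print(extract_quoted_links(sample))
```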

          Terry

          • Fujosej Fujo

            This is amazing, thank you very much. It works beautifully.

            • guy038

              Hello @fujosej-fujo, @terry-r and All,

              Thanks, @fujosej-fujo, for your new 6.txt text file. It’s always better to work on “real” data ;-))

              I think, @terry-r, that all the work can be reduced to a single regex S/R ;-))


              So, @fujosej-fujo, basically, you’re searching for any area of text :

              • Beginning with "https://www.twitch.tv

              • Ending at the first next quote char "

              This regex, which searches for such an area, is :

              (?s-i)"https://www.twitch.tv.+?"

              Notes :

              • The (?s-i) modifiers, at the beginning, mean that:

                • The dot meta-character ( . ) matches absolutely any single character, including EOL ones ( (?s) )

                • The regex engine searches in a case-sensitive way ( (?-i) )

              • Then, it searches the literal string "https://www.twitch.tv

              • Finally the part .+?" finds the shortest area of any character, till the first next quote char "
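
To try this matching regex outside Notepad++, a minimal Python sketch might look like the following. It rests on a few assumptions: as far as I know, Python's `re` only accepts flag removal in the scoped `(?-i:...)` form, so `(?s)` is passed as `re.DOTALL` and `(?-i)` is dropped because matching is case-sensitive by default; the dots in the URL are escaped; and the helper name and sample text are invented.

```python
import re

# (?s) becomes re.DOTALL; case-sensitive matching is already Python's default.
QUOTED_LINK = re.compile(r'"https://www\.twitch\.tv.+?"', re.DOTALL)


def find_quoted_links(text: str) -> list:
    # Each match runs from the opening quote to the first closing quote after it.
    return QUOTED_LINK.findall(text)


sample = 'x "https://www.twitch.tv/alpha" y "HTTPS://WWW.TWITCH.TV/nope" z'
print(find_quoted_links(sample))
```

The upper-case link is skipped because the search is case-sensitive.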


              Now that we have built this first regex to match the zones to extract, we create a second regex that contains it. The first regex is surrounded with parentheses so that its match is stored as group 1 for the replacement:

              SEARCH (?s-i).+?(Your regex)|.+

              Thus, this leads to the correct regex S/R, below :

              SEARCH (?s-i).+?("https://www.twitch.tv.+?")|.+

              REPLACE \1\r\n

              => From your new 6.txt file, 366 replacements occur and you get a neat list of 365 links ;-))

              Notes :

              • After the modifiers, the part .+? matches the shortest part from, either, the beginning of file or the end of the previous match, until the expression "https://www.twitch.tv.+?"

              • In replacement, we rewrite, only, the expected group1, which must be extracted, followed with a line-break

              • Near the end of the file, when no more "https://www.twitch.tv can be found, the regex engine falls back to the second alternative .+, after the alternation symbol |, which grabs all the remaining text up to the very end of the file, as the (?s) modifier is still active!

              • This time, as group 1 is not defined, the replacement simply deletes this last unwanted part and writes only the line-break, which is why 366 replacements produce 365 links
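
The same search and replace can be approximated in Python to see how the fallback alternative behaves. This is a sketch under assumptions: `re.DOTALL` replaces `(?s)`, `\n` replaces `\r\n`, the dots in the URL are escaped, and it relies on Python 3.5 or later, where an unmatched group in the replacement expands to an empty string. The function name and sample text are hypothetical.

```python
import re

# First alternative captures a quoted link; the second swallows the leftover tail.
PATTERN = re.compile(r'.+?("https://www\.twitch\.tv.+?")|.+', re.DOTALL)


def extract_all(text: str) -> tuple:
    # subn returns (new_text, number_of_replacements).
    return PATTERN.subn(r"\1\n", text)


sample = 'junk "https://www.twitch.tv/alpha" more\njunk "https://www.twitch.tv/beta" tail'
result, count = extract_all(sample)
print(count)
print(result)
```

Here two links come out of three replacements: the last replacement is the tail being swapped for a bare line break, the same pattern as the 366 and 365 figures above.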


              Refer also to this more complete post, on that topic ( How to extract all the results matched ) :

              https://notepad-plus-plus.org/community/topic/12710/marked-text-manipulation/8

              Best Regards,

              guy038

              • Terry R

                @guy038
                You’re mostly correct, except that I believe the OP didn’t want the quote characters included, they were just to delimit the text he DID want. Thus your regex should be
                Find What: (?s-i).+?"(https://www.twitch.tv.+?)"|.+
                the Replace With field is as stated.
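
A small Python sketch of this variant, with the quotes moved outside the capturing group so that they are dropped from the output. The assumptions are the usual ones for this illustration: `re.DOTALL` stands in for `(?s)`, `\n` for `\r\n`, the dots in the URL are escaped, and the helper name and sample are made up.

```python
import re

# The quotes delimit the link but sit outside group 1, so they are not kept.
PATTERN = re.compile(r'.+?"(https://www\.twitch\.tv.+?)"|.+', re.DOTALL)


def extract_unquoted(text: str) -> list:
    replaced = PATTERN.sub(r"\1\n", text)
    # Drop the empty line left behind when the trailing text is removed.
    return [line for line in replaced.splitlines() if line]


sample = 'junk "https://www.twitch.tv/alpha" more\njunk "https://www.twitch.tv/beta" tail'
print(extract_unquoted(sample))
```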

                Terry

                • Fujosej Fujo

                  Once again, thank you guys! Works like a charm.

                  • Fujosej Fujo

                    I have one more question, sorry.

                    Let's say I have this list:
                    http://rgho.st/6k9G4rHSh
                    I want to add an ‘s’ to http and also a ‘www.’ before twitch.
                    Is there a way to do this with regex?

                    • Terry R

                      @Fujosej-Fujo
                      This is actually very easy. You only need to search for http:// and replace it with https://www. (including the trailing dot).
                      You could even do this with the Replace function set to “Normal” mode, as there are no special characters such as the .+? used previously.

                      Find What: http://
                      Replace With: https://www.

                      As I said this can be either as normal mode or regular expression mode, it won’t matter.
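
In script form, the same replacement could look like this minimal Python sketch. The first function mirrors the plain replace; the second is a narrower regex variant, added here as an assumption, that only touches links pointing at twitch.tv without "www.". Both function names and the sample are hypothetical.

```python
import re


def upgrade_links(text: str) -> str:
    # Plain-text replacement, the equivalent of "Normal" search mode.
    return text.replace("http://", "https://www.")


def upgrade_twitch_links(text: str) -> str:
    # Narrower regex variant: leaves any non-twitch http:// links untouched.
    return re.sub(r"http://twitch\.tv", "https://www.twitch.tv", text)


sample = "http://twitch.tv/alpha\nhttp://twitch.tv/beta"
print(upgrade_links(sample))
```

The plain replace rewrites every http:// in the text, so the narrower variant is the safer choice if the list might contain other links.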

                      Terry

                      • Fujosej Fujo

                        Thank you very much. That was easier than I thought.
