    regex: Match everything up to linebreak but not linebreak

    • Neil Schipper @Hellena Crainicu

      @Hellena-Crainicu It looks like you are asking about usage of Python’s regex machinery, and not the regex within Notepad++. Is this correct?

      • Hellena Crainicu

        I work only with notepad++, just running the code in Python.

        • Neil Schipper @Hellena Crainicu

          @Hellena-Crainicu But you’re asking about a regex to feed into a call to re.findall(), correct? Or are you asking how to convert lines of text that look like your <title>..</title> example that are in a text file loaded in the np++ editor?

          If it’s the latter, I have a solution but I’m confused.

          • Hellena Crainicu

            @Neil-Schipper I am using \w+, as you can see. But I need to stop selecting at the | character, otherwise I will get my-name-is-peter-prince-justin.html instead of my-name-is-peter.html

            • Neil Schipper @Hellena Crainicu

              @Hellena-Crainicu I’m not getting the clarity I’m hoping for. Here are two very different things people do on computers:

              1. running a python program that processes an input file, and maybe changes it or produces an output file, etc.

              2. having a file loaded in an editor, and running a search and replace operation on it

              Which of these are you trying to do (that requires regex assistance as you described)?

              • Hellena Crainicu

                It is just about the regex… maybe @guy038 can help me. He is the master of regex.

                • Neil Schipper

                  For my own amusement, I solved the problem in the editor.

                  I broke the problem into:

                  1. consume from start of line through the first ‘>’
                  2. capture everything up to and excluding (space followed by literal ‘|’) into group 1
                  3. consume everything else up to and including EOL

                  The search phrase ^.*?>(.+?)(?= \|).*?$ does this. Then replace with \1.html. Then a separate S&R can convert all spaces to ‘-’.

                  But I still don’t know what you’re asking for, because you refuse to tell me!

                  • Neil Schipper

                    Again, for my own amusement (since I’ve never used re.sub() before, only match & split):

                    >>> import re
                    >>> t1 = re.sub(r"^.*?>(.+?)(?= \|).*?$", r"\1.html", "<title>My name is Peter | Prince Justin (en)</title>")
                    >>> t2 = re.sub(r"\s", r"-", t1)
                    >>> t2
                    'My-name-is-Peter.html'
                    
                    • Hellena Crainicu

                      I must split all html files, not just one. I don’t think I can use the replacement…

                          new_filename = title.get_text() 
                          new_filename = new_filename.lower()
                          words = re.findall(r'\w+', new_filename)
                          new_filename = '-'.join(words)
                          new_filename = new_filename + '.html'
                          print(new_filename)
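
A substitution can still be applied to many titles, one at a time. The following is a minimal sketch, assuming each title sits on a single line and contains a space followed by `|`; it reuses the two re.sub() calls from the post above. The helper name `filename_from_title_line` and the second sample title are made up for illustration.

```python
import re

def filename_from_title_line(line):
    # Hypothetical helper: keep the text between the first '>' and ' |'.
    stem = re.sub(r"^.*?>(.+?)(?= \|).*?$", r"\1", line)
    # Turn whitespace into dashes, lowercase, and add the extension.
    return re.sub(r"\s", "-", stem).lower() + ".html"

# Sample inputs; the second title is invented for this example.
title_lines = [
    "<title>My name is Peter | Prince Justin (en)</title>",
    "<title>Another page | Prince Justin (en)</title>",
]

for line in title_lines:
    print(filename_from_title_line(line))
# my-name-is-peter.html
# another-page.html
```

If a line contains no space followed by `|`, the first pattern does not match and the line passes through unchanged, so real input may need an extra check.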
                      
                      • Hellena Crainicu

                        I have now tried this regex: \w+.*(?= \|)

                        words = re.findall(r"\w+.*(?= \|)", new_filename)

                        It almost works, but I get: my name is peter.html (without the dashes)
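
The missing dashes can be traced to what re.findall() returns here. With `.*` in the pattern, the whole stretch before the pipe is one match, so the list has a single element and `'-'.join()` has nothing to join. A small sketch, assuming a lowercased sample title like the one used earlier in the thread:

```python
import re

# Assumed sample input, already lowercased as in the earlier snippet.
new_filename = "my name is peter | prince justin (en)"

words = re.findall(r"\w+.*(?= \|)", new_filename)
print(words)   # ['my name is peter']

# Joining a one-element list adds no separators at all.
joined = '-'.join(words) + '.html'
print(joined)  # my name is peter.html
```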

                        • Alan Kilborn

                          You guys are OFF-TOPIC.
                          This is not an appropriate place to discuss Python’s regular expression engine.
                          Please find a more appropriate forum for that and confine discussions here to Notepad++ related topics.
                          Just because you write Python code in Notepad++ doesn’t make discussion of that code a Notepad++ topic.

                          • Hellena Crainicu

                            I found the regex I needed: \b\w+\b(?=[\w\s]+\|)

                            In Python it should be:

                            words = re.findall(r'\b\w+\b(?=[\w\s]+\|)', new_filename)

                            Thanks @Neil-Schipper, you gave me a good idea ;)
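
For reference, the final pattern can be checked end to end with a short sketch. The helper name `words_before_pipe` and the sample title are assumptions for illustration; only the regex comes from the post.

```python
import re

def words_before_pipe(title_text):
    # Hypothetical helper: each word must be followed only by word
    # characters and whitespace up to a '|'.
    return re.findall(r'\b\w+\b(?=[\w\s]+\|)', title_text.lower())

words = words_before_pipe("My name is Peter | Prince Justin (en)")
new_filename = '-'.join(words) + '.html'
print(new_filename)  # my-name-is-peter.html
```

One limitation worth knowing: because the lookahead allows only word characters and whitespace before the pipe, a word followed by punctuation is skipped, and a title with no pipe yields an empty list. For example, `words_before_pipe("Peter's page | Site")` returns `['s', 'page']`.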

                            The Community of users of the Notepad++ text editor.