regex: Match everything up to linebreak but not linebreak
-
@Hellena-Crainicu It looks like you are asking about usage of Python’s regex machinery, and not the regex within Notepad++. Is this correct?
-
I work only with notepad++, just running the code in Python.
-
@Hellena-Crainicu But you’re asking about a regex to feed into a call to re.findall(), correct? Or are you asking how to convert lines of text that look like your <title>..</title> example that are in a text file loaded in the np++ editor? If it’s the latter, I have a solution but I’m confused.
-
@Neil-Schipper I am using \w+ as you can see. But I need to stop selecting at the linebreak |, otherwise I will get my-name-is-peter-prince-justin.html instead of my-name-is-peter.html
-
@Hellena-Crainicu I’m not getting the clarity I’m hoping for. Here are two very different things people do on computers:
- running a Python program that processes an input file, and maybe changes it or produces an output file, etc.
- having a file loaded in an editor, and running a search and replace operation on it
Which of these are you trying to do (that requires regex assistance as you described)?
-
It is just about the regex… maybe @guy038 can help me. He is the master of regex.
-
For my own amusement, I solved the problem in the editor.
I broke the problem into:
- consume from start line to first ‘>’
- capture everything up to and excluding (space followed by literal ‘|’) into group 1
- consume everything else up to and including EOL
The search phrase ^.*?>(.+?)(?= \|).*?$ does this. Then replace with \1.html. Then a separate S&R can convert all spaces to ‘-’.
But I still don’t know what you’re asking for, because you refuse to tell me!
-
Again, for my own amusement (since I’ve never used re.sub() before, only match & split):
>>> import re
>>> t1 = re.sub(r"^.*?>(.+?)(?= \|).*?$", r"\1.html", "<title>My name is Peter | Prince Justin (en)</title>")
>>> t2 = re.sub(r"\s", r"-", t1)
>>> t2
'My-name-is-Peter.html'
-
I must split all html files, not just one. I don’t think I can use the replacement…
new_filename = title.get_text()
new_filename = new_filename.lower()
words = re.findall(r'\w+', new_filename)
new_filename = '-'.join(words)
new_filename = new_filename + '.html'
print(new_filename)
-
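A note on the concern above: the substitution is not tied to a single string, since it can be run once per title inside a loop. The following is only a sketch under assumptions: the `lines` list is made-up sample data standing in for the titles pulled from each HTML file, and the second title is invented for illustration.

```python
import re

# Hypothetical sample input: one <title> line per HTML file.
lines = [
    "<title>My name is Peter | Prince Justin (en)</title>",
    "<title>Another page here | Some Site</title>",
]

# Same pattern as in the earlier post: skip to the first '>',
# capture everything before " |", then consume the rest of the line.
pattern = re.compile(r"^.*?>(.+?)(?= \|).*?$")

names = []
for line in lines:
    name = pattern.sub(r"\1.html", line)      # "My name is Peter.html"
    names.append(re.sub(r"\s", "-", name))    # whitespace -> '-'

print(names)   # ['My-name-is-Peter.html', 'Another-page-here.html']
```

Lowercasing, as in the snippet above, could be added with `.lower()` on each result.
-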
I am now trying this regex: \w+.*(?= \|)
words = re.findall(r"\w+.*(?= \|)", new_filename)
It almost works, but I get: my name is peter.html (without the hyphens)
-
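A small sketch of why the two attempts so far behave as described, using the lowercased sample title from this thread (the variable names are only for illustration). With `\w+` alone every word is a separate match, including those after the `|`; with `\w+.*` the greedy `.*` swallows the spaces, so `re.findall()` returns a single string and `'-'.join()` has nothing to join.

```python
import re

# Sample title taken from the thread, already lowercased.
title = "my name is peter | prince justin (en)"

# \w+ on its own also picks up the words after the '|'.
all_words = re.findall(r"\w+", title)
print("-".join(all_words) + ".html")   # my-name-is-peter-prince-justin-en.html

# \w+.* is one greedy match that runs up to the " |", spaces included,
# so findall returns a one-item list.
greedy = re.findall(r"\w+.*(?= \|)", title)
print(greedy)                          # ['my name is peter']
print("-".join(greedy) + ".html")      # my name is peter.html
```
-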
You guys are OFF-TOPIC.
This is not an appropriate place to discuss Python’s regular expression engine.
Please find a more appropriate forum for that and confine discussions here to Notepad++ related topics.
Just because you write Python code in Notepad++ doesn’t make discussion of that code a Notepad++ topic.
-
I found the regex I needed: \b\w+\b(?=[\w\s]+\|)
In Python it should be:
words = re.findall(r'\b\w+\b(?=[\w\s]+\|)', new_filename)
Thanks @Neil-Schipper, you gave me a good idea ;)
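-
For anyone landing here later, a short sketch that checks the final regex against the sample title and compares it with one possible alternative that was not proposed in the thread: cutting the string at the first `|` before collecting words. Both function names are hypothetical.

```python
import re

def words_lookahead(title):
    # Regex from the post above: a word counts only if nothing but
    # word characters and whitespace separates it from a later '|'.
    return re.findall(r"\b\w+\b(?=[\w\s]+\|)", title)

def filename_from_title(title):
    # Alternative (an assumption, not from the thread):
    # keep the part before the first '|', then collect the words.
    head = title.split("|", 1)[0]
    return "-".join(re.findall(r"\w+", head.lower())) + ".html"

title = "my name is peter | prince justin (en)"
print(words_lookahead(title))        # ['my', 'name', 'is', 'peter']
print(filename_from_title(title))    # my-name-is-peter.html
```

One difference to be aware of: the lookahead only allows word characters and whitespace between a word and the `|`, so a title with punctuation before the `|` (a comma or apostrophe, say) loses the words ahead of that punctuation. The split version keeps them.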