Remove "duplicate" lines based on only a specific part of the line
-
Hi everyone,
how can I remove “duplicate” lines based on only a specific part of the lines?
Example:
I have these lines / this text:
23.03.2025 09:10:01
23.03.2025 09:20:01
23.03.2025 09:40:01
…
23.03.2025 17:50:01
23.03.2025 18:00:01
23.03.2025 18:10:01
24.03.2025 07:30:06
24.03.2025 07:40:01
24.03.2025 07:50:01
…
24.03.2025 18:50:01
24.03.2025 19:00:01
24.03.2025 19:10:01
25.03.2025 06:30:04
25.03.2025 06:40:01
25.03.2025 06:50:01
…
25.03.2025 09:50:01
25.03.2025 10:00:01
25.03.2025 10:10:01
26.03.2025 06:51:54
26.03.2025 07:00:01
26.03.2025 07:10:01
…and so on.
The “…” is only a placeholder in this example; every line has the same format: a date followed by a time.
What I want is to keep only the first and the last line of each date/day.
How can I do this?
Using a plugin?
Using regex?
If it is regex I need the exact expression because I do not know the regex syntax at all!
Thank you very much and
kind regards
Thomas
-
@inkognito said in Remove "duplicate" lines based on only a specific part of the line:
If it is regex I need the exact expression because I do not know the regex syntax at all!
This regular expression seems like it will help you.
Find What:
(?-s)^(([^ ]+).+\R)(.+\R)*(\2.+\R?)
Replace With:
${1}${4}
As this is a regular expression, the search mode needs to be set to “Regular expression”. Make sure the cursor is at the start of the first line: the expression has to begin matching at the first line of a date in order to pair it correctly with the last line of that same date. It also works if there are only 2 lines with the same date; in that case it simply keeps both lines.
Just so you have a bit of background on how it works:
([^ ]+)
captures the date. Since your example didn’t show how a date like 3.03.2025 would actually be written (it could be anything from 3.3.2025 to 03.03.2025), I capture “non-blanks”, i.e. everything up to the first space. This might be an issue if you have other sorts of lines (text?) among the date lines, but I assumed not.
You should run this on a copy of your file first and spot-check that it works as expected. If not, then you likely have other details you did not explain which will cause issues. Let us know how it goes, and if changes are needed, explain where it failed.
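If you want to experiment with the same logic outside Notepad++, here is a minimal sketch of the expression translated to Python's `re` module. This is an illustration only: Notepad++ itself uses the Boost regex engine, and Python's `re` has no `\R`, so the line break is written out (assuming LF or CRLF line endings) and `[^\r\n]` stands in for `.` under `(?-s)`. The sketch also makes one deliberate change, assuming every line looks like `DD.MM.YYYY hh:mm:ss`: a space is required right after the date, both when capturing it and after `\2`. Without that, a date that occurs on only one line can backtrack to a shorter prefix such as `2` and swallow the following days.

```python
import re

# Assumes LF or CRLF line endings (Python's re has no \R).
EOL = r"\r?\n"
# [^\r\n] stands in for "." under (?-s): it never crosses a line break.
LINE_REST = r"[^\r\n]+"

# Translation of (?-s)^(([^ ]+).+\R)(.+\R)*(\2.+\R?) with one tweak:
# a space is required after the date (in group 2 and after \2), so the
# captured date cannot shrink to a shorter prefix while backtracking.
PATTERN = re.compile(
    rf"^(([^ \r\n]+) {LINE_REST}{EOL})({LINE_REST}{EOL})*(\2 {LINE_REST}(?:{EOL})?)",
    re.MULTILINE,
)


def keep_first_and_last(text: str) -> str:
    """Replace each run of same-date lines with its first and last line."""
    # \1 and \4 play the role of ${1}${4} in the Notepad++ replacement.
    return PATTERN.sub(r"\1\4", text)


sample = (
    "23.03.2025 09:10:01\n"
    "23.03.2025 09:20:01\n"
    "23.03.2025 18:10:01\n"
    "24.03.2025 07:30:06\n"
    "24.03.2025 07:40:01\n"
    "24.03.2025 19:10:01\n"
)
print(keep_first_and_last(sample), end="")
```

As with the Notepad++ version, the greedy `(...)*` group runs ahead and then backs up to the last line that starts with the same date, so the lines must already be in date order.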
Terry
-
@Terry-R said in Remove "duplicate" lines based on only a specific part of the line:
(?-s)^(([^ ]+).+\R)(.+\R)*(\2.+\R?)
This is REALLY GREAT!
How can you do something like that?
Really amazing, many, many thanks!
But I have another question:
How can I apply it to the text file without opening it in Notepad++?
Are there any corresponding options over the command line?
Thank you and kind regards
Thomas
-
@inkognito said in Remove "duplicate" lines based on only a specific part of the line:
How can I apply it to the text file without opening it in Notepad++?
Are there any corresponding options over the command line?
Two very leading but somewhat loosely defined questions.
Find in Files (Search > Find in Files…) lets you select a set of files using filters: the directory (optionally including sub-folders and hidden folders), the file type and a partial filename. With Replace in Files, Notepad++ (NPP) then edits those files without showing them in the NPP view(s). So you could say they aren’t opened by NPP, if what you mean is “do I see them in the NPP view”.
I am aware that NPP can be given an “auto-start” capability that runs when it loads; the information is in the FAQ section. It uses the PythonScript plugin along with an auto-start script file called “startup.py”: as soon as NPP loads, it starts processing the commands in this script file.
There is another plugin called NppExec, and a post late last year suggested it might offer a way to auto-start commands once NPP is loaded (see post #26006). Unfortunately I haven’t seen much else on NppExec; it doesn’t tend to get much exposure.
If you mean editing these files without NPP loading at all, then that is a whole other question and outside the scope of this forum. Command-line tools such as AWK, SED or GREP might offer you that functionality, but as I say, that is not for this forum to answer; you would need to visit a forum for whichever tool you wish to use. The regular expression I provided might work as-is in another tool, but it would likely need some massaging to perform correctly.
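For anyone who does want to do this without Notepad++, a general-purpose script doesn't even need a regular expression: group consecutive lines by their date and keep the first and last line of each group. Below is a minimal Python sketch (Python is just one possible choice, not something mentioned in the thread), assuming the date is everything before the first space, that all lines for a given date are consecutive as in the example, and that the file is UTF-8. The script name `keep_first_last.py` is hypothetical.

```python
import sys
from itertools import groupby
from typing import Iterable, Iterator, Sequence


def date_key(line: str) -> str:
    """Return the date part, assumed to be everything before the first space."""
    return line.split(" ", 1)[0]


def first_and_last_per_date(lines: Iterable[str]) -> Iterator[str]:
    """Yield the first and last line of each run of lines sharing a date."""
    # groupby only groups *consecutive* lines with the same key, which
    # fits a log that is already in date order.
    for _, group in groupby(lines, key=date_key):
        day = list(group)
        yield day[0]
        if len(day) > 1:  # a day with a single line is kept once
            yield day[-1]


def main(argv: Sequence[str]) -> None:
    # Hypothetical usage: python keep_first_last.py input.txt > output.txt
    if len(argv) != 2:
        print("usage: python keep_first_last.py INPUT_FILE > OUTPUT_FILE", file=sys.stderr)
        return
    # Universal-newline reading turns CRLF into "\n"; text-mode stdout
    # writes the platform's native line ending back out.
    with open(argv[1], encoding="utf-8") as infile:
        sys.stdout.writelines(first_and_last_per_date(infile))


if __name__ == "__main__":
    main(sys.argv)
```

Run it as `python keep_first_last.py input.txt > output.txt` and compare the output with the original before replacing anything. Unlike the regex, a date that appears on only one line is simply kept once.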
Terry