Search yields irrelevant results. I'm stuck :)



  • Hi all, first post here. I’ve looked all over but found no answer.

    I’m trying to edit the XML version of a Microsoft Word file to remove a date stamp, using a 2015 solution from the Microsoft forum. It suggests using Notepad++ to find and replace, using RegEx:

    • Choose Search>Replace.
    • In the Find what: field, paste: w:date="[\d\W]\w[\d\W]\w"
    • In the Replace with: field, leave it blank.
    • In Search Mode, check Regular expression.
    • Click on Replace All

    My problem is that when I search for w:date="[\d\W]\w[\d\W]\w" I get totally irrelevant results. It finds short phrases like <w:del or <w:ins, but never w:date.

    An example of one it should find: w:date=“2021-06-17T14:34:00Z”.
    That would be buried within an expression like
    <w:ins w:id=“8” w:author=“Author Name” w:date=“2021-06-17T14:34:00Z”>.

    Even when I delete the rest of the phrase and search for just the word date, it doesn’t find the right things.

    Any idea what I might be doing wrong? Thanks for any help!



  • @Jeremy-Sorkin ,

    I don’t know. But you don’t show us what you’re doing, so it’s hard to tell.

    For example, the regex you showed in this forum doesn’t match the one that was in the MS forum: w:date="[\d\W]*\w[\d\W]*\w" . The expression shown there cannot possibly match <w:del or <w:ins. If you are matching those, there is something you haven’t shown us or explained correctly.
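
    To make the difference between the two expressions concrete, here is a minimal sketch in Python rather than in Notepad++ itself. It assumes Python's `re` module treats `\d`, `\w` and `\W` the same way Notepad++'s regex engine does for this plain-ASCII sample; the names `WITH_STARS`, `WITHOUT_STARS` and `sample` are invented for the illustration.

```python
import re

# Pattern as quoted from the MS forum, with the * quantifiers intact.
WITH_STARS = re.compile(r'w:date="[\d\W]*\w[\d\W]*\w"')
# Pattern as it appeared in the first post, with the * quantifiers missing.
WITHOUT_STARS = re.compile(r'w:date="[\d\W]\w[\d\W]\w"')

sample = '<w:ins w:id="8" w:author="Author Name" w:date="2021-06-17T14:34:00Z">'

with_stars_match = WITH_STARS.search(sample)
without_stars_match = WITHOUT_STARS.search(sample)

# With the quantifiers, the whole attribute is matched.
print(with_stars_match.group(0) if with_stars_match else None)
# Without them, exactly four characters are allowed between the quotes,
# so the timestamp cannot match and the result is None.
print(without_stars_match)
```

    Neither version can match `<w:ins` or `<w:del`, because both require the literal text `w:date="` at the start of the match.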

    Next, your example data doesn’t match the regex: your example data shows “smart quotes” rather than "ASCII quotes", but your regex shows it is looking for ASCII quotes.
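
    The quote mismatch can be checked the same way. This is a rough Python sketch, again assuming Python's `re` module behaves like Notepad++'s engine for this input; `PATTERN`, `ascii_quoted`, `smart_quoted` and `has_smart_quotes` are hypothetical names used only here.

```python
import re

PATTERN = re.compile(r'w:date="[\d\W]*\w[\d\W]*\w"')

# The attribute as it would appear in the XML file (ASCII quotes).
ascii_quoted = 'w:date="2021-06-17T14:34:00Z"'
# The attribute as the forum displayed it (curly "smart" quotes).
smart_quoted = 'w:date=\u201c2021-06-17T14:34:00Z\u201d'

def has_smart_quotes(text):
    """Return True if the text contains curly double quotes."""
    return '\u201c' in text or '\u201d' in text

ascii_found = PATTERN.search(ascii_quoted) is not None
smart_found = PATTERN.search(smart_quoted) is not None

print(ascii_found, has_smart_quotes(ascii_quoted))
print(smart_found, has_smart_quotes(smart_quoted))
```

    The regex only looks for the ASCII quote character, so text pasted with curly quotes is not matched even though it looks almost identical on screen.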

    Finally, when I use the regex from the other forum, it matches what I would expect. (I used Mark rather than Find or Replace, so that it would show all the matches.)

    [screenshot: 5bcf620d-8a1a-46be-bb21-82733b43ae83-image.png]

    So if done the way that I’ve interpreted, it does work.

    If it’s not working for you, then you need to do a better job of describing what you’re doing, and showing example data that’s not misrepresented by the forum. I will give you my generic regex-question advice; take it to heart and show that you’ve read and understood the advice by replying in a way that is consistent with that advice, especially in presenting data and giving enough examples and/or screenshots so that we know what you’re actually doing.

    ----

    Do you want regex search/replace help? Then please be patient and polite, show some effort, and be willing to learn; answer questions and requests for clarification that are made of you. All example text should be marked as literal text using the </> toolbar button or manual Markdown syntax. To make regex in red (and so they keep their special characters like *), use backticks, like `^.*?blah.*?\z`. Screenshots can be pasted from the clipboard to your post using Ctrl+V to show graphical items, but any text should be included as literal text in your post so we can easily copy/paste your data. Show the data you have and the text you want to get from that data; include examples of things that should match and be transformed, and things that don’t match and should be left alone; show edge cases and make sure your examples are as varied as your real data. Show the regex you already tried, and why you thought it should work; tell us what’s wrong with what you do get. Read the official NPP Searching / Regex docs and the forum’s Regular Expression FAQ. If you follow these guidelines, you’re much more likely to get helpful replies that solve your problem in the shortest number of tries.

    PS:

    that’s what I mean by </> above



  • I forgot to include the example data that I used. Bad me.

    <w:ins w:id="8" w:author="Author Name" w:date="2021-06-17T14:34:00Z">
    <w:ins w:id="4" w:author="Something Else" w:dateddontmatch="2021-06-17T14:34:00Z">
    <w:ins w:id="4" w:author="Something Else" w:date="2021-06-17T14:34:00Z">
    <w:ins w:id="4" w:author="Something Else" w:ins="2021-06-17T14:34:00Z"><!-- no match -->
    <w:ins w:id="4" w:author="Something Else" w:del="2021-06-17T14:34:00Z"><!-- no match -->
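
    For anyone who wants to reproduce the result outside Notepad++, the sketch below runs the MS forum regex over the five example lines above and performs the same replace-with-nothing step. It is written in Python under the assumption that its `re` module agrees with Notepad++'s engine on this ASCII data; `PATTERN`, `lines`, `matched` and `stripped` are names made up for the example.

```python
import re

PATTERN = re.compile(r'w:date="[\d\W]*\w[\d\W]*\w"')

lines = [
    '<w:ins w:id="8" w:author="Author Name" w:date="2021-06-17T14:34:00Z">',
    '<w:ins w:id="4" w:author="Something Else" w:dateddontmatch="2021-06-17T14:34:00Z">',
    '<w:ins w:id="4" w:author="Something Else" w:date="2021-06-17T14:34:00Z">',
    '<w:ins w:id="4" w:author="Something Else" w:ins="2021-06-17T14:34:00Z"><!-- no match -->',
    '<w:ins w:id="4" w:author="Something Else" w:del="2021-06-17T14:34:00Z"><!-- no match -->',
]

# Which lines contain a match (the equivalent of Mark in Notepad++).
matched = [PATTERN.search(line) is not None for line in lines]

# Replace every match with an empty string (the equivalent of Replace All
# with an empty "Replace with" field).
stripped = [PATTERN.sub('', line) for line in lines]

for hit, before, after in zip(matched, lines, stripped):
    print('match   ' if hit else 'no match', after)
```

    Only the first and third lines change. The space that preceded the attribute is left behind before the closing `>`, which is still valid XML.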
    


  • @PeterJones
    Thank you so much for your detailed response and attempts to pick apart where I might have gone wrong!

    I just spent a couple of hours struggling and struggling… But after all that, I realized that Notepad++ was highlighting the beginning of the phrase containing the search results in purple, as well as the desired timestamp within it in orange. My eye was only going to that first highlight.

    [screenshot: 0142a56e-d883-4c9a-bfd5-91e41702f95c-image.png]

    I really, really appreciate that you took the time to try to solve my problem, and now I’ll know what to look for in the future. Have a wonderful day!



  • @Jeremy-Sorkin said in Search yields irrelevant results. I'm stuck :):

    But after all that, I realized that Notepad++ was highlighting the beginning of the phrase containing the search results in purple, as well as the desired timestamp within it in orange.

    Based on that, I can tell that you have Settings > Preferences > Highlight Matching Tags > ☑ Enable checked. Un-checking that toggle will turn off the highlighting of the <w:ins (and of the later closing </w:ins>, which you haven’t shown in the screenshot). With that option enabled, any time your cursor or selection is inside an HTML/XML start or end tag, both the start and end tags are highlighted, so you know where you are.



  • @PeterJones
    Good to know, noted for next time! In my screenshot, there’s one purple </w:ins tag hiding in there ;)

    Thanks again and be well.



  • I’m kind of unclear on this thread…

    But I wanted to ask: Does Notepad++ really make a Find Next search result unclear in this situation? I’d think that the selection marking it uses for this would always be clear, above and beyond any other type of highlighting that is going on. Not so?

