    De-duplicate chunks of text? (screenshot)

    • John Drachenberg

      Hi. I’m trying to remove duplicates from, and sort, a huge number of Vivaldi browser bookmarks. This screenshot shows what each bookmark looks like individually. I’m looking for a way to find every chunk of text that begins with { \n “date_added”: and runs to the closing }, treat each chunk as an individual entity, and then check whether any duplicate chunks exist, repeating that for every unique chunk… automagically…

      Essentially, I need duplicate file finder software but for chunks of text.

      Any ideas? Thanks much.

      • Terry R

        Well @John-Drachenberg, I’d try to grab each record as the group of 5 lines from the date_added line to the url line, combining them into 1 line (so replacing the CR/LF with some other delimiter). I’d then create a "key" at the start of each line, possibly the main part of the url, excluding any /folder names. Then sort all the lines so that possible duplicate urls end up next to each other. To my mind, two urls that match on the main part are worth a closer look even if the /folder names/ portion differs.
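
        Outside the editor, the same steps could be sketched in Python. This is only a rough illustration, not a tested recipe for the real file: it assumes every bookmark is a flat { ... } block with no nested braces that starts with "date_added" and contains a "url" line, and the names keyed_lines, CHUNK_RE and URL_RE are made up for this example. The "key" here is the host part of the url.

```python
import re
from urllib.parse import urlsplit

# Assumption: each bookmark is a flat {...} block (no nested braces, no "}"
# inside names) that starts with "date_added" and has a "url" line.
CHUNK_RE = re.compile(r'\{\s*"date_added".*?\}', re.DOTALL)
URL_RE = re.compile(r'"url"\s*:\s*"([^"]*)"')


def keyed_lines(text, delimiter=" | "):
    """Collapse each bookmark chunk to one line, prefix it with a key, sort."""
    lines = []
    for match in CHUNK_RE.finditer(text):
        chunk = match.group(0)
        url_match = URL_RE.search(chunk)
        if url_match is None:
            continue  # chunk has no url line, so there is nothing to key on
        # Key = host part of the url, ignoring any /folder names.
        key = urlsplit(url_match.group(1)).netloc.lower()
        # Replace the line breaks inside the chunk with the delimiter.
        one_line = delimiter.join(
            part.strip() for part in chunk.splitlines() if part.strip()
        )
        lines.append(f"{key}\t{one_line}")
    return sorted(lines)
```

        Calling keyed_lines on the text of the bookmarks file and writing the result out one entry per line gives a sorted list in which entries for the same host sit next to each other.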

        At this point either eyeball the duplicates, or another regex could mark possible duplicates for further inspection.
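
        As an alternative to eyeballing, a few lines of Python could list the candidates. Again this is just a sketch under assumptions: it treats two bookmarks as possible duplicates when their urls share the same host, it assumes every bookmark has a "url": "..." line, and possible_duplicates and URL_RE are names invented for this example.

```python
import re
from collections import defaultdict
from urllib.parse import urlsplit

# Assumption: every bookmark carries a line of the form "url": "...".
URL_RE = re.compile(r'"url"\s*:\s*"([^"]*)"')


def possible_duplicates(text):
    """Group urls by host and keep only hosts that occur more than once."""
    by_host = defaultdict(list)
    for url in URL_RE.findall(text):
        by_host[urlsplit(url).netloc.lower()].append(url)
    # A host with a single url cannot be a duplicate, so drop it.
    return {host: urls for host, urls in by_host.items() if len(urls) > 1}
```

        The result is only a list of suspects for manual checking against the original file; nothing is removed automatically.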

        Not sure if you actually want a regex to just remove duplicates in the original file, or would be happy just getting a list of possible duplicates which you could then check against the original and remove manually.

        Terry
