• Login
Community
  • Login

Replacing in specific columns (More difficult than you'd think)

Scheduled Pinned Locked Moved Help wanted · · · – – – · · ·
regexcolumnregex
6 Posts 2 Posters 607 Views
Loading More Posts
  • Oldest to Newest
  • Newest to Oldest
  • Most Votes
Reply
  • Reply as topic
Log in to reply
This topic has been deleted. Only users with topic management privileges can see it.
  • L
    LingEd
    last edited by Jun 25, 2021, 2:42 PM

    I have some twitter data in the form of TXT files that I’m using for academic purposes. For example, here’s some tweets by Senator Chuck Schumer:
    Chuck Schumer
    As you can see there’s a tweet code, date and time, time zone code, twitter handle, and the content of the tweet, and the sections are delimited by space, which is obviously an issue as the contents of the tweets contain many spaces. I tried column select but replace in selection is greyed out
    ChuckSchumer2.png
    Replace all works fine here but I can’t select the specific columns (option is not greyed out if normal select is used). I was thinking about using Regex for this but I’m not sure how to get it to do what I want. Ideally code where I could simply replace the first five spaces in a line into commas would be ideal, since tweet code lengths can vary a little as you go back, but I don’t know how to do that. Maybe a Macro?

    P 1 Reply Last reply Jun 25, 2021, 3:00 PM Reply Quote 0
    • P
      PeterJones @LingEd
      last edited by Jun 25, 2021, 3:00 PM

      @LingEd ,

      data:

      123456786 1999-12-31 23:59:57 -0400 <username> Three! with more spaces
      123456787 1999-12-31 23:59:58 -0400 <username> Two! with more spaces
      123456788 1999-12-31 23:59:59 -0400 <username> One! with more spaces
      123456789 1900-01-01 00:00:00 -0400 <username> Happy Y2K Bug!
      
      • FIND = ^(\S+) (\S+) (\S+) (\S+)\x20
        (I used a \x20, which is equivalent to a space character at the end to make it obvious that there’s something at the end, so you will get it when you copy/paste; if typing the regex, you could just use a space after the last parentheses)
      • REPLACE = $1,$2,$3,$4,
      • Search Mode = Regular expression
      123456786,1999-12-31,23:59:57,-0400,<username> Three! with more spaces
      123456787,1999-12-31,23:59:58,-0400,<username> Two! with more spaces
      123456788,1999-12-31,23:59:59,-0400,<username> One! with more spaces
      123456789,1900-01-01,00:00:00,-0400,<username> Happy Y2K Bug!
      
      L 1 Reply Last reply Jun 25, 2021, 3:32 PM Reply Quote 1
      • L
        LingEd
        last edited by Jun 25, 2021, 3:17 PM

        @PeterJones Thanks! Works like a charm

        1 Reply Last reply Reply Quote 0
        • L
          LingEd @PeterJones
          last edited by Jun 25, 2021, 3:32 PM

          @PeterJones OK, sorry to bother, but new problem. When I import the altered files into Excel with comma as delimiter it creates too many columns when the tweets contain commas themselves. I was thinking what I could do is replace the 6th+ instances of commas (I altered the regex code to replace the first 5 commas instead of the first 4) in a line with a very uncommon character like “ɤ” and then search/replace that character in excel after the fact. The thing is, I don’t know how to write the Regex code to replace not just the 6th instance, but the 7th, 8th, 9th, etc instances. Thanks for the help!

          P 1 Reply Last reply Jun 25, 2021, 3:47 PM Reply Quote 0
          • P
            PeterJones @LingEd
            last edited by PeterJones Jun 25, 2021, 3:47 PM Jun 25, 2021, 3:47 PM

            @LingEd said in Replacing in specific columns (More difficult than you'd think):

            I altered the regex code to replace the first 5 commas instead of the first 4

            Congratulations. That means you understood what was going on. Knowing that people learn from what I write, rather than just copy/pasting and moving on, is always a good feeling.

            @PeterJones OK, sorry to bother, but new problem. When I import the altered files into Excel with comma as delimiter it creates too many columns when the tweets contain commas themselves. I was thinking what I could do is replace the 6th+ instances of commas (…) in a line with a very uncommon character like “ɤ” and then search/replace that character in excel after the fact. The thing is, I don’t know how to write the Regex code to replace not just the 6th instance, but the 7th, 8th, 9th, etc instances. Thanks for the help!

            That’s one good idea. If I were to do it that way, step 1 would be to just replace all commas with ɤ. Step 2 would be your 5-space-to-comma replacement from above.

            But since you’re trying to make valid CSV to open in Excel, CSV has a way of putting quotes around a field so that any commas inside will be treated as part of the text, not as a field separator. But that would mean that if you have any text with quotes in it, that will get messed up. But there’s a way around that by escaping the quote by changing any " to "". So my procedure for what I think what you want with your data:

            1. Search Mode = regular expression for all of this
            2. FIND = "
              REPLACE = "" to escape the quotes
            3. FIND = ^(\S+) (\S+) (\S+) (\S+) (\S+) (.*$)
              REPLACE = $1,$2,$3,$4,$5,"$6" to change spaces to commas and to put quotes around the text.
            L 1 Reply Last reply Jun 25, 2021, 5:15 PM Reply Quote 1
            • L
              LingEd @PeterJones
              last edited by Jun 25, 2021, 5:15 PM

              @PeterJones can’t believe I didn’t think of that. Thanks!

              1 Reply Last reply Reply Quote 0
              1 out of 6
              • First post
                1/6
                Last post
              The Community of users of the Notepad++ text editor.
              Powered by NodeBB | Contributors