• Login
Community
  • Login

A help with regex

Scheduled Pinned Locked Moved Help wanted · · · – – – · · ·
8 Posts 3 Posters 1.3k Views
Loading More Posts
  • Oldest to Newest
  • Newest to Oldest
  • Most Votes
Reply
  • Reply as topic
Log in to reply
This topic has been deleted. Only users with topic management privileges can see it.
  • ?
    A Former User
    last edited by Dec 21, 2018, 8:21 PM

    Guys,

    A little help?

    My database is exporting data jumping some lines of code.

    It’s exporting a CSV like this (first line is the fields):

    /--------------------/
    id, location, data, name, description
    ObjectId(“662626”), rio, 12/12/2018,
    this is a name, this is a description
    ObjectId(“2342342626”, saopaulo, 14/12/2018,
    this is also a name, another description
    /--------------------/

    I need it to be:

    /--------------------/
    id, location, data, name, description
    ObjectId(“662626)”, rio, 12/12/2018, this is a name, this is a description
    ObjectId(“2342342626”), saopaulo, 14/12/2018, this is also a name, another description
    /--------------------/

    Another catch is that not always this happens. Sometimes is in the same line, sometimes it skips 3 lines.

    The good thing is that, except the header, all lines should begin with ObjectId("

    Any suggestion?

    Thanks!

    1 Reply Last reply Reply Quote 0
    • P
      PeterJones
      last edited by Dec 21, 2018, 9:20 PM

      I can get one that works when each record is only broken into 2 lines…

      • (?s)(ObjectID\(.*?)(\R+)(.*?)(?=\R+ObjectID|\R+$)
      • \1 \3
      • Search Mode = ☑ Regular Expression

      But that won’t work when the record is broken into 3 or more lines…

      This seems to work with any number of lines, when run repeatedly (until all are merged)

      • (?-s)(ObjectID\(.*?)\h*\R+(?!^ObjectID\()
      • \1\x20
      • Search Mode = ☑ Regular Expression

      I don’t like that you have to run it multiple times, but it’s better than nothing, and will work for all multiline (and won’t mess up the single-line versions)

      • The search says “dot doesn’t match newlines”, “match literal ObjectID( followed by 0 or more non-newline characters and capture into group \1”, “match zero or more horizontal spaces (space, tab), followed by one or more newlines, but don’t capture/keep those”, and “make sure that the match is followed by another literal ObjectID(, but don’t consume that”
      • The replace says “replace everything matched (except the not-consumed stuff) with the first group, followed by the space character (aka \x20)”

      I am sure @guy038 could get it in one run of the regex engine. But I don’t have such awesome regex skills

      The second one worked on this example text using just four runs of the regex:

      ObjectId(“662626”), rio, 12/12/2018,
      this is a name, this is a description
      ObjectId(“2342342626”, saopaulo, 14/12/2018,
      this is also a name, another description
      
      ObjectId(“2342342626”, 
      saopaulo, 
      14/12/2018,
      this is also a name, 
      another description
      

      --------
      FYI: if you have further regex needs, study this FAQ and the documentation it points to. Before asking a new regex question, understand that for future requests, many of us will expect you to show what data you have (exactly), what data you want (exactly), what regex you already tried (to show that you’re showing effort), why you thought that regex would work (to prove it wasn’t just something randomly typed), and what data you’re getting with an explanation of why that result is wrong. When you show that effort, you’ll see us bend over backward to get things working for you. If you need help formatting the data so that the forum doesn’t mangle it (so that it shows “exactly”, as I said earlier), see this help-with-markdown post , where @Scott-Sumner gives a great summary of how to use Markdown for this forum’s needs.
      Please note that for all “regex” queries – or queries where you want help “matching” or “marking” or “bookmarking” a certain pattern, which amounts to the same thing – it is best if you are explicit about what needs to match, and what shouldn’t match, and have multiple examples of both in your example dataset. Often, what shouldn’t match helps define the regular expression as much or more than what should match.

      1 Reply Last reply Reply Quote 1
      • T
        Terry R
        last edited by Dec 21, 2018, 9:52 PM

        @Edgard etc

        I’m not at a PC currently so cannot provide a tested solution but if your example is accurate it would seem that the splitting occurs at a delimiter, namely the , comma. If that is the case then all we need to do is search for a comma immediately before a end of line marker, and replace both with just the comma.
        So the regex might look something like
        Find what::,$
        Replace with:,

        See if that works and let us know.

        Terry

        1 Reply Last reply Reply Quote 1
        • T
          Terry R
          last edited by Dec 21, 2018, 10:08 PM

          Actually ,\r\n for the find what might be better, that’s what I normally use. I suspect the $ amounts to a zero width position, thus it won’t actually replace the carriage return line feed.

          Terry.

          1 Reply Last reply Reply Quote 1
          • P
            PeterJones
            last edited by Dec 21, 2018, 10:21 PM

            @Terry-R,

            I thought about limiting it to comma-EOL, but the OP wasn’t clear when it split to multiple lines, whether it would always split on a comma, or if it was possible to split in the middle of a field, like:

            ObjectId(“662626”), rio, 12/12/2018,
            this is a name, this is a description
            ObjectId(“2342342626”, saopaulo, 14/12/2018,
            this is also a name, another description
            
            ObjectId(“2342342626”, 
            saopaulo, 
            14/12/2018,
            this 
            is 
            also 
            a 
            name, 
            another 
            description
            

            If it could never be in the middle of a field, then ,\h*\R -> ,\x20 will work

            1 Reply Last reply Reply Quote 0
            • P
              PeterJones
              last edited by PeterJones Dec 21, 2018, 10:26 PM Dec 21, 2018, 10:24 PM

              But that gave me another idea, if it could break inside a field: look for one or more newline, as long as it’s not followed by literal ObjectID(, and replace with a space:

              • \R+(?!ObjectID\() \h*\R+(?!ObjectID\()
              • \x20
              • Search Mode = ☑ Regular Expression

              This does it in one go, even for my most-recent data example

              edit: change to \h*\R+(?!ObjectID\(), so that any whitespace before the newline is also collapsed into the single space (\x20) in the replace string.

              1 Reply Last reply Reply Quote 2
              • ?
                A Former User
                last edited by Dec 21, 2018, 11:09 PM

                The problem is, I noticed, that not always the lines has commas. like:

                /--------------------/
                id, location, data, name, description
                ObjectId(“662626”), rio, 12/12/2018,
                this is a name, this is a description
                description continues
                more description
                ObjectId(“2342342626”, saopaulo, 14/12/2018,
                this is also a name, another description
                still description
                more
                and more
                /--------------------/

                Thats why it wont work, Terry :-(

                1 Reply Last reply Reply Quote 0
                • ?
                  A Former User
                  last edited by Dec 22, 2018, 1:20 AM

                  @PeterJones said:

                  \h*\R+(?!ObjectID()

                  Looks like it works!!! Amazing, Peter!!! I’ll test if everything is OK!!!

                  1 Reply Last reply Reply Quote 1
                  7 out of 8
                  • First post
                    7/8
                    Last post
                  The Community of users of the Notepad++ text editor.
                  Powered by NodeBB | Contributors