Community
    • Login

    A help with regex

    Scheduled Pinned Locked Moved Help wanted · · · – – – · · ·
    8 Posts 3 Posters 1.3k Views
    Loading More Posts
    • Oldest to Newest
    • Newest to Oldest
    • Most Votes
    Reply
    • Reply as topic
    Log in to reply
    This topic has been deleted. Only users with topic management privileges can see it.
    • ?
      A Former User
      last edited by

      Guys,

      A little help?

      My database is exporting data jumping some lines of code.

      It’s exporting a CSV like this (first line is the fields):

      /--------------------/
      id, location, data, name, description
      ObjectId(“662626”), rio, 12/12/2018,
      this is a name, this is a description
      ObjectId(“2342342626”, saopaulo, 14/12/2018,
      this is also a name, another description
      /--------------------/

      I need it to be:

      /--------------------/
      id, location, data, name, description
      ObjectId(“662626)”, rio, 12/12/2018, this is a name, this is a description
      ObjectId(“2342342626”), saopaulo, 14/12/2018, this is also a name, another description
      /--------------------/

      Another catch is that not always this happens. Sometimes is in the same line, sometimes it skips 3 lines.

      The good thing is that, except the header, all lines should begin with ObjectId("

      Any suggestion?

      Thanks!

      1 Reply Last reply Reply Quote 0
      • PeterJonesP
        PeterJones
        last edited by

        I can get one that works when each record is only broken into 2 lines…

        • (?s)(ObjectID\(.*?)(\R+)(.*?)(?=\R+ObjectID|\R+$)
        • \1 \3
        • Search Mode = ☑ Regular Expression

        But that won’t work when the record is broken into 3 or more lines…

        This seems to work with any number of lines, when run repeatedly (until all are merged)

        • (?-s)(ObjectID\(.*?)\h*\R+(?!^ObjectID\()
        • \1\x20
        • Search Mode = ☑ Regular Expression

        I don’t like that you have to run it multiple times, but it’s better than nothing, and will work for all multiline (and won’t mess up the single-line versions)

        • The search says “dot doesn’t match newlines”, “match literal ObjectID( followed by 0 or more non-newline characters and capture into group \1”, “match zero or more horizontal spaces (space, tab), followed by one or more newlines, but don’t capture/keep those”, and “make sure that the match is followed by another literal ObjectID(, but don’t consume that”
        • The replace says “replace everything matched (except the not-consumed stuff) with the first group, followed by the space character (aka \x20)”

        I am sure @guy038 could get it in one run of the regex engine. But I don’t have such awesome regex skills

        The second one worked on this example text using just four runs of the regex:

        ObjectId(“662626”), rio, 12/12/2018,
        this is a name, this is a description
        ObjectId(“2342342626”, saopaulo, 14/12/2018,
        this is also a name, another description
        
        ObjectId(“2342342626”, 
        saopaulo, 
        14/12/2018,
        this is also a name, 
        another description
        

        --------
        FYI: if you have further regex needs, study this FAQ and the documentation it points to. Before asking a new regex question, understand that for future requests, many of us will expect you to show what data you have (exactly), what data you want (exactly), what regex you already tried (to show that you’re showing effort), why you thought that regex would work (to prove it wasn’t just something randomly typed), and what data you’re getting with an explanation of why that result is wrong. When you show that effort, you’ll see us bend over backward to get things working for you. If you need help formatting the data so that the forum doesn’t mangle it (so that it shows “exactly”, as I said earlier), see this help-with-markdown post, where @Scott-Sumner gives a great summary of how to use Markdown for this forum’s needs.
        Please note that for all “regex” queries – or queries where you want help “matching” or “marking” or “bookmarking” a certain pattern, which amounts to the same thing – it is best if you are explicit about what needs to match, and what shouldn’t match, and have multiple examples of both in your example dataset. Often, what shouldn’t match helps define the regular expression as much or more than what should match.

        1 Reply Last reply Reply Quote 1
        • Terry RT
          Terry R
          last edited by

          @Edgard etc

          I’m not at a PC currently so cannot provide a tested solution but if your example is accurate it would seem that the splitting occurs at a delimiter, namely the , comma. If that is the case then all we need to do is search for a comma immediately before a end of line marker, and replace both with just the comma.
          So the regex might look something like
          Find what::,$
          Replace with:,

          See if that works and let us know.

          Terry

          1 Reply Last reply Reply Quote 1
          • Terry RT
            Terry R
            last edited by

            Actually ,\r\n for the find what might be better, that’s what I normally use. I suspect the $ amounts to a zero width position, thus it won’t actually replace the carriage return line feed.

            Terry.

            1 Reply Last reply Reply Quote 1
            • PeterJonesP
              PeterJones
              last edited by

              @Terry-R,

              I thought about limiting it to comma-EOL, but the OP wasn’t clear when it split to multiple lines, whether it would always split on a comma, or if it was possible to split in the middle of a field, like:

              ObjectId(“662626”), rio, 12/12/2018,
              this is a name, this is a description
              ObjectId(“2342342626”, saopaulo, 14/12/2018,
              this is also a name, another description
              
              ObjectId(“2342342626”, 
              saopaulo, 
              14/12/2018,
              this 
              is 
              also 
              a 
              name, 
              another 
              description
              

              If it could never be in the middle of a field, then ,\h*\R -> ,\x20 will work

              1 Reply Last reply Reply Quote 0
              • PeterJonesP
                PeterJones
                last edited by PeterJones

                But that gave me another idea, if it could break inside a field: look for one or more newline, as long as it’s not followed by literal ObjectID(, and replace with a space:

                • \R+(?!ObjectID\() \h*\R+(?!ObjectID\()
                • \x20
                • Search Mode = ☑ Regular Expression

                This does it in one go, even for my most-recent data example

                edit: change to \h*\R+(?!ObjectID\(), so that any whitespace before the newline is also collapsed into the single space (\x20) in the replace string.

                1 Reply Last reply Reply Quote 2
                • ?
                  A Former User
                  last edited by

                  The problem is, I noticed, that not always the lines has commas. like:

                  /--------------------/
                  id, location, data, name, description
                  ObjectId(“662626”), rio, 12/12/2018,
                  this is a name, this is a description
                  description continues
                  more description
                  ObjectId(“2342342626”, saopaulo, 14/12/2018,
                  this is also a name, another description
                  still description
                  more
                  and more
                  /--------------------/

                  Thats why it wont work, Terry :-(

                  1 Reply Last reply Reply Quote 0
                  • ?
                    A Former User
                    last edited by

                    @PeterJones said:

                    \h*\R+(?!ObjectID()

                    Looks like it works!!! Amazing, Peter!!! I’ll test if everything is OK!!!

                    1 Reply Last reply Reply Quote 1
                    • First post
                      Last post
                    The Community of users of the Notepad++ text editor.
                    Powered by NodeBB | Contributors