Filter lines

Jose Emilio Osorio

How can I filter lines with fewer or more than “X” characters? Thanks.

Terry R

Use the Search, Mark option from the menu and type the following into the “Find what” field.

^(?|.{1,37}|.{39,200})\R

In this example we mark every line that isn’t 38 characters long. To adapt it to your own x, replace 37 with x - 1 and 39 with x + 1. You may also need to raise the upper limit of 200 if a line might exceed it. Note that empty lines are not marked, because the first alternative requires at least 1 character.
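For anyone who wants to try the same idea outside Notepad++, here is a minimal Python sketch of the pattern above. It is an adaptation, not the Notepad++ regex itself: Python's `re` module has no `\R` and no branch reset group, so the sketch assumes `\r?\n` and `(?:...)` as substitutes, and the function names `build_not_x_pattern` and `mark_lines` are made up for illustration.

```python
import re

# [^\r\n] stands in for "." so that a carriage return is never counted as a
# line character; \r?\n stands in for \R (not available in Python's re).
NO_EOL = r"[^\r\n]"


def build_not_x_pattern(x, max_len=200):
    """Compile a pattern for whole lines whose length is anything but x."""
    if not 2 <= x < max_len:
        raise ValueError("this sketch assumes 2 <= x < max_len")
    return re.compile(
        rf"^(?:{NO_EOL}{{1,{x - 1}}}|{NO_EOL}{{{x + 1},{max_len}}})\r?\n",
        re.MULTILINE,
    )


def mark_lines(text, x):
    """Return the lines (without their line ending) the pattern would mark."""
    return [m.group().rstrip("\r\n") for m in build_not_x_pattern(x).finditer(text)]


sample = "a" * 38 + "\n" + "b" * 10 + "\n" + "c" * 40 + "\n"
print([len(line) for line in mark_lines(sample, 38)])  # [10, 40]
```

The 38-character line is skipped, while the 10- and 40-character lines are picked up. Only x has to be supplied; the two bounds are derived from it.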

I’ve tried to find a different method that uses the number x only once in the expression, but that currently eludes me, so this comes in as a good 2nd choice.

Terry

Terry R

My previous example will NOT grab the last line in a file unless that line ends with a carriage return/line feed. A revised regex would be:

^(?|.{1,37}|.{39,200})(?|\R|\z)

So the ‘\z’ takes care of that last line if it’s needed.
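The last-line problem can be reproduced with a small Python sketch. This is an approximation under stated assumptions: `\r?\n` replaces `\R`, `(?:...)` replaces the branch reset group, and `\Z` (end of string in Python's `re`) plays the role of `\z`. The names `WITHOUT_END`, `WITH_END` and `marked` are hypothetical.

```python
import re

NO_EOL = r"[^\r\n]"  # any character that is not part of a line ending
BODY = rf"^(?:{NO_EOL}{{1,37}}|{NO_EOL}{{39,200}})"

# First version: a line ending is mandatory after the line.
WITHOUT_END = re.compile(BODY + r"\r?\n", re.MULTILINE)
# Revised version: a line ending OR the very end of the text is accepted.
WITH_END = re.compile(BODY + r"(?:\r?\n|\Z)", re.MULTILINE)


def marked(pattern, text):
    """Return the matched lines with their line ending stripped."""
    return [m.group().rstrip("\r\n") for m in pattern.finditer(text)]


text = "short line\n" + "x" * 38 + "\n" + "last line, no EOL"

print(marked(WITHOUT_END, text))  # ['short line']
print(marked(WITH_END, text))     # ['short line', 'last line, no EOL']
```

Only the revised pattern catches the final line, because that line has no line ending after it.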

I also have another option, better than this but still not exactly what I was searching for. You could use:

^(.{38})(?|\R|\z)

to mark the lines that DO meet the criteria. Replace 38 with x, your number, and tick the ‘bookmark line’ box. Once you have done that, close the dialog; back under the Search menu there is a Bookmark option, and under it is the ability to ‘inverse bookmark’, which de-selects the lines that DO meet the criteria and instead bookmarks those that do NOT. From the same menu you can remove those lines, or cut them for pasting elsewhere.
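The bookmark-then-invert steps can be mimicked in Python as a rough sketch. Here a "bookmark" is simply a set of line indexes, which is an assumption made for illustration; `inverse_bookmarks` is a hypothetical helper, not a Notepad++ function.

```python
import re


def inverse_bookmarks(lines, x):
    """Bookmark lines of exactly x characters, then return the inverse set."""
    exact = re.compile(rf".{{{x}}}")  # becomes ".{38}" when x is 38
    bookmarked = {i for i, line in enumerate(lines) if exact.fullmatch(line)}
    return set(range(len(lines))) - bookmarked


lines = ["a" * 38, "bb", "", "c" * 38, "d" * 50]
to_remove = inverse_bookmarks(lines, 38)
kept = [line for i, line in enumerate(lines) if i not in to_remove]

print(sorted(to_remove))  # [1, 2, 4]
print(kept == ["a" * 38, "c" * 38])  # True
```

With this approach x appears only once, and the inverted set also includes the empty line, which a pattern starting at `{1,37}` would not mark.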

Terry

Jose Emilio Osorio

Thank you very much.

guy038

Hi, @terry-r and All,

In your last post, Terry, the (?|\R|\z) regex syntax is a branch reset group structure. However, as it is said here:

If you don’t use any alternation or capturing groups inside the branch reset group, then its special function doesn’t come into play. It then acts as a non-capturing group.

So, I don’t think that, in that specific case, the branch reset group syntax was necessary: a plain non-capturing group, (?:\R|\z), does the same job ;-))

And, for everybody, to clearly show the difference between a capturing list of alternatives and a branch reset group, let’s consider the two following regexes, each of which matches one of the 5-char strings axyzw, apqrw or atuvw:

A) with a list of consecutive alternatives, in a capturing group :

(a)(x(y)z|(p(q)r)|(t)u(v))(w)

- When the regex matches axyzw, group 1 = a, group 2 = xyz, group 3 = y and group 8 = w
- When the regex matches apqrw, group 1 = a, group 2 = pqr, group 4 = pqr, group 5 = q and group 8 = w
- When the regex matches atuvw, group 1 = a, group 2 = tuv, group 6 = t, group 7 = v and group 8 = w
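
The group numbering of regex A can be checked with Python's `re` module, which numbers capturing groups by their opening parenthesis in the same way. This is a small sketch; `defined_groups` is a helper name invented for this example.

```python
import re

PATTERN_A = re.compile(r"(a)(x(y)z|(p(q)r)|(t)u(v))(w)")


def defined_groups(s):
    """Map group number -> matched text, skipping groups that did not take part."""
    m = PATTERN_A.fullmatch(s)
    if m is None:
        raise ValueError(f"{s!r} is not matched by regex A")
    return {i: g for i, g in enumerate(m.groups(), start=1) if g is not None}


for s in ("axyzw", "apqrw", "atuvw"):
    print(s, defined_groups(s))
# axyzw {1: 'a', 2: 'xyz', 3: 'y', 8: 'w'}
# apqrw {1: 'a', 2: 'pqr', 4: 'pqr', 5: 'q', 8: 'w'}
# atuvw {1: 'a', 2: 'tuv', 6: 't', 7: 'v', 8: 'w'}
```

Each alternative keeps its own group numbers, so the final (w) is always group 8, whichever branch matched.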

B) with a branch reset group (NOT a capturing group itself!):

(a)(?|x(y)z|(p(q)r)|(t)u(v))(w)

- When the regex matches axyzw, group 1 = a, group 2 = y, group 3 is undefined and group 4 = w
- When the regex matches apqrw, group 1 = a, group 2 = pqr, group 3 = q and group 4 = w
- When the regex matches atuvw, group 1 = a, group 2 = t, group 3 = v and group 4 = w
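
As far as I know, Python's standard `re` module does not accept the (?|...) syntax, so the following sketch only emulates the renumbering of regex B. It matches with a non-capturing outer group and then folds the per-branch groups into shared slots. The table `RESET_SLOTS` and the function `branch_reset_groups` are hypothetical names, and the mapping is written by hand for this one regex.

```python
import re

# Same alternatives as regex B, but with ordinary numbering:
# 1 = (a), 2 = (y), 3 = (p(q)r), 4 = (q), 5 = (t), 6 = (v), 7 = (w)
PLAIN = re.compile(r"(a)(?:x(y)z|(p(q)r)|(t)u(v))(w)")

# Branch reset slot -> ordinary groups that would share that number.
RESET_SLOTS = {1: (1,), 2: (2, 3, 5), 3: (4, 6), 4: (7,)}


def branch_reset_groups(s):
    """Return the group contents as a branch reset group would number them."""
    m = PLAIN.fullmatch(s)
    if m is None:
        raise ValueError(f"{s!r} is not matched")
    return {
        slot: next((m.group(i) for i in sources if m.group(i) is not None), None)
        for slot, sources in RESET_SLOTS.items()
    }


for s in ("axyzw", "apqrw", "atuvw"):
    print(s, branch_reset_groups(s))
# axyzw {1: 'a', 2: 'y', 3: None, 4: 'w'}
# apqrw {1: 'a', 2: 'pqr', 3: 'q', 4: 'w'}
# atuvw {1: 'a', 2: 't', 3: 'v', 4: 'w'}
```

Numbering restarts at 2 in every alternative, so (w) is always group 4, and group 3 stays undefined (None here) for the branch that contains only one group.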

Another example. Given this text:

abcdefg
hijklmn
opqrstu

The regex S/R :

SEARCH (abcdefg)|(hijklmn)|(opqrstu)

REPLACE ==\1\2\3== OR, also, ==$0==

would change the text to:

==abcdefg==
==hijklmn==
==opqrstu==

Note that, in the syntax ==\1\2\3==, only one group is defined at a time, and the two others are just “empty” groups!
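The same replacement can be tried in Python as a quick sketch. It assumes a reasonably recent Python 3, where a group that did not take part in the match is replaced by an empty string, and it uses `\g<0>` as Python's spelling of the whole-match reference written $0 above.

```python
import re

text = "abcdefg\nhijklmn\nopqrstu"
pattern = re.compile(r"(abcdefg)|(hijklmn)|(opqrstu)")

# Only one of the three groups is defined per match; the others expand to "".
with_groups = pattern.sub(r"==\1\2\3==", text)
# Referring to the whole match gives the same text.
with_whole_match = pattern.sub(r"==\g<0>==", text)

print(with_groups)
print(with_groups == with_whole_match)  # True
```

Both replacements wrap each of the three lines in == markers.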

Now, with the same initial text, the regex S/R, below :

SEARCH ab(cde)fg|hi(jkl)mn|op(qrs)tu

REPLACE ==\1\2\3==

gives :

==cde==
==jkl==
==qrs==

whereas the following regex S/R, with a branch reset group and only group 1 in the replacement:

SEARCH (?|ab(cde)fg|hi(jkl)mn|op(qrs)tu)

REPLACE ==\1==

would produce the same result

…and the regex S/R :

SEARCH (?|ab(cde)fg|hi(jkl)mn|op(qrs)tu)

REPLACE ==\1\1\1==

would give :

==cdecdecde==
==jkljkljkl==
==qrsqrsqrs==
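
For comparison, here is a hedged Python sketch of these last three S/R. Since the standard `re` module does not, to my knowledge, support (?|...), the single shared group 1 is emulated with a replacement function: `only_group` is a hypothetical helper that returns whichever of the three groups took part in the match.

```python
import re

text = "abcdefg\nhijklmn\nopqrstu"
pattern = re.compile(r"ab(cde)fg|hi(jkl)mn|op(qrs)tu")


def only_group(m):
    """Return the one group that matched; m.lastindex is its number."""
    return m.group(m.lastindex)


# Ordinary numbering: all three groups are needed in the replacement.
three_refs = pattern.sub(r"==\1\2\3==", text)
# Emulated branch reset with ==\1==
once = pattern.sub(lambda m: f"=={only_group(m)}==", text)
# Emulated branch reset with ==\1\1\1==
thrice = pattern.sub(lambda m: f"=={only_group(m) * 3}==", text)

print(three_refs == once)  # True
print(thrice)
```

The first two replacements give the same text, and the last one repeats the captured part three times on each line.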

Best Regards,

guy038