replace \r\n with ' ' for lines enclosed in " "
-
Re: replace \r\n with ’ ’ for lines enclosed in " "
I have some sql queries in a text doc that needs some editing. most of the queries are on a single line. But some of them have \r\n between clauses. these are enclosed in " "
Example below are 3 sql queries. the second query (enclosed in " ") is on multiple lines. I want all sql queries on a single line
select * from table1\r\n
“select *\r\n
from table1\r\n
where column1 in (1,2,3)”\r\n
select * from table3\r\n -
@brenda-gross Please let us know how the output/desired result should be
-
@brenda-gross said in replace \r\n with ' ' for lines enclosed in " ":
I want all sql queries on a single line
I will first point out that you needed to display the example in a “code box”, reference the FAQ post here.
In your example you used double quotes and this forum tends to muck around with these so what actually posted isn’t likely to be what is copied for us to test against. Secondly, I presume you typed the
\r\n
in the example in an attempt to portray the actual line feed and carriage return shown in your real data. I also presumed that the 2nd (and 3rd) line of the multi line query are left adjusted in the real data, so no leading space, just as your example shows. 3 good reasons for using the code box to show examples.Also as the other post asks, showing the example after the data has been changed would be very helpful. For example the below solution does NOT remove the double quotes. No doubt that may be something you wanted. If so, you should be able to achieve that in a separate search/replace step.
There is a generic regex provided, again in the FAQ section here. Unfortunately if you had used it, it would still have not worked entirely correctly. As it stands, my test suggests it would have consumed the line feed/carriage return immediately following the multi-line query. I had to adjust it slightly to compensate. I will acknowledge @guy038 as the originator of that regex in the hopes he may check it and consider if it really should be adjusted for a
\R
immediately following the ESR (end search region) as I believe it should.If I have read the question correctly I think the following regex should work for you. Obviously test it on a copy of your data just in case my assumptions are incorrect or you have more information on the situation which may alter the outcome.
Using the Replace function (search mode is
regular expression
) we have:
Find What:(?-si:"(?!\R)|(?!\A)\G)(?s-i:(?!").)*?\K(?-si:\R)
Replace With: there is a single space here. If my assumption above on the leading space is incorrect, remove this space, so an empty field.Terry
-
desired output.
select * from table1\r\n “select * from table1 where column1 in (1,2,3)”\r\n select * from table3\r\n
-
@Terry-R I have a lot of testing to do (my input file is 30mb). But that seems to work for my test sample. Thanks!
-
@brenda-gross said in replace \r\n with ' ' for lines enclosed in " ":
I have a lot of testing to do (my input file is 30mb). But that seems to work for my test sample.
Good that you have shown the desired output in a code box. Unfortunate that the double quotes aren’t what I used in my solution but from your post it seems you did adjust for that. I will say though, that you still entered the
\r\n
. Was that intentional as in are these characters at the end of the line, as well as the actual line feed/carriage return characters?I would suggest turning on View, Show Symbol, show all characters, or at least show end of line. If you do, you will see something like this at the end of every line:
This is the actual carriage return and line feeds which you duplicated by typing in the \r and \n characters. It is advisable not to enter additional control characters such as these unless they are actually in the line, not as I suspect, the end of line/new line control string.As you have a very large file to process, you will want to consider some other steps:
- Before processing count the number of multi-line queries. That might seem hard but using the Find function, you will see another button called count. If you use the following in the Find What field and press count it will tell you the number.
Find What:(?-s)^“[^“”\r\n]*$
This looks for any opening double quotes without a following ending double quote on the same line, so a multi-line query. After processing you could repeat this count to confirm none left. The count in step 1 should not be larger than the processing count. If it was then definitely some multi-line queries were not changed.
There may be other steps you can take such as spot checks to confirm before accepting the results. Use the last regex in the Mark function and also click on “bookmark line”. Click Mark All. If the number is 0 then great, otherwise bookmarking will allow you to quickly progress through the file using the F2 and Shift-F2 keys to jump from one bookmark to the next.
I would suggest that if the double quotes are to be eliminated, leave that as the last step once you have completed the steps above to verify validity of the changed data.
Good luck
Terry - Before processing count the number of multi-line queries. That might seem hard but using the Find function, you will see another button called count. If you use the following in the Find What field and press count it will tell you the number.
-
Hello, @brenda-gross, @dr-ramaanand, @terry-r and All,
@terry-r, your use of the search regex
(?-si:"(?!\R)|(?!\A)\G)(?s-i:(?!").)*?\K(?-si:\R)
is just clever, because :-
Any string may occur after the first
"
character ( the word Select or else ) -
It ensure that, if the global replacement is re-run, no change will occur again, thanks to the negative look-ahead
(?!\R)
after the double-quote character
Now, Terry, we still can tune this search regex !
-
In all parts of this regex, alphabetic characters are not involved. So, we can omit the
-i
modifiers -
In the first and last part of this regex, the regex
.
character is not involved, too. Thus, we can omit the-s
modifiers as well !
Thus, the shortest syntax of the generic regex S/R is simply :
FIND
(?:"(?!\R)|(?!\A)\G)(?s:(?!").)*?\K(?:\R)
REPLACE
\x20
Note that, if the OP speak of the smart quotes
“
and”
, of Unicode valueU+201C
andU+201D
, this regex S/R becomes :FIND
(?:“(?!\R)|(?!\A)\G)(?s:(?!”).)*?\K(?:\R)
REPLACE
\x20
So, @brenda-gross, just follow this road map :
-
Open your file in Notepad++
-
Add an empty line at the very beginning of your file, for security
-
Hit the
Ctrl + Home
shortcut to move at the very beginning of your file ( IMPORTANT ) -
Open the Replace dialog
-
Uncheck all box options
-
According to your case, use, either :
-
FIND
(?:"(?!\R)|(?!\A)\G)(?s:(?!").)*?\K(?:\R)
-
FIND
(?:“(?!\R)|(?!\A)\G)(?s:(?!”).)*?\K(?:\R)
-
-
REPLACE
\x20
-
Select the
Regular expression
search mode -
Click, once only, on the
Replace All
button
=> All the lines between regular or smart quotes should be displayed in single lines !
Now, if you want to get rid of these quote characters, at beginning and end of some lines :
-
Move at the very beginning of your file ( IMPORTANT )
-
Find
^"|"$
or^“|”$
-
REPLACE
Leave EMPTY
-
Select the
Regular expression
search mode -
Click once on the
Replace All
button
Best Regards,
guy038
-