Tuesday, April 12, 2011

Converting Unix newlines (LF) to Windows newlines (CR+LF)

In one of my earlier projects, I faced an issue with newline characters in a CSV file. The file was generated on a Unix system and uploaded to a Windows server through SFTP. However, the program on the Windows server could not parse it properly, because each line ended with a Unix newline (LF) instead of a Windows newline (CR+LF).

The problem is that Unix/Linux uses a single Line Feed character (\n) as the line terminator, while Windows uses a Carriage Return + Line Feed pair (\r\n).

For those who want to know the exact difference between CR, LF and EOL, here is a brief description:

The Carriage Return (CR) character (0x0D, \r) moves the cursor to the beginning of the line without advancing to the next line. This character is used as the newline character in Commodore and early Macintosh operating systems (Mac OS 9 and earlier).

The Line Feed (LF) character (0x0A, \n) moves the cursor down to the next line without returning to the beginning of the line. This character is used as the newline character in Unix-based systems (Linux, Mac OS X, etc.).

The End of Line (EOL) sequence (0x0D 0x0A, \r\n) is not a single character but two ASCII characters: CR followed by LF. It moves the cursor both down to the next line and to the beginning of that line. This sequence is used as the newline in most non-Unix operating systems, including Microsoft Windows and Symbian OS.
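To make the byte values above concrete, here is a small sketch that prints each line ending as hexadecimal. The helper name hex_bytes is my own invention for illustration; it only wraps the standard od and tr utilities.

```shell
# Hypothetical helper: print whatever arrives on stdin as a compact hex string
hex_bytes() {
    od -An -tx1 | tr -d ' \n'   # dump bytes as hex, then strip spaces and newlines
    echo                        # finish with a newline so the output is readable
}

printf '\r'   | hex_bytes   # CR        -> 0d
printf '\n'   | hex_bytes   # LF        -> 0a
printf '\r\n' | hex_bytes   # CR + LF   -> 0d0a
```

The same helper can be pointed at the first line of a real file (for example with head -n 1) to check which line ending the file actually uses.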

I was able to resolve this issue with sed, the stream editor (http://www.grymoire.com/Unix/Sed.html).

The command matches the regexp pattern $ (the end of the line, i.e. the position just before the line-ending newline) and replaces it with '\r' (Carriage Return). Since $ matches a position rather than a character, nothing is removed: a carriage return is simply inserted before each newline. The end result is a conversion from '\n' to '\r\n', which is the newline format Windows expects.
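The substitution described above can be sketched roughly as follows. This is a reconstruction from the description, not necessarily the exact command I ran: the function name unix_to_dos is hypothetical, and it assumes GNU sed, which understands \r in the replacement text (other sed implementations may insert a literal "r" instead).

```shell
# Hypothetical helper: read Unix text on stdin, write Windows text on stdout.
# Assumes GNU sed, where \r in the replacement means a carriage return.
unix_to_dos() {
    # $ matches the end of each line; a CR is inserted there,
    # so every LF ending becomes CR + LF.
    sed 's/$/\r/'
}

# Example: convert a two-line CSV sample and inspect the resulting bytes
printf 'a,b\nc,d\n' | unix_to_dos | od -c
```

In the od output, each line should now end with \r followed by \n. For a real file, the function would typically be used with redirection, reading the Unix file on stdin and writing the converted copy to a new file. Note that running it on a file that already has Windows newlines would add a second carriage return to each line.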
