Thursday, December 23, 2010

Converting File Encodings

2010/09/28

Recently I downloaded a CSV file with the intention of extracting some data to satisfy my curiosity about something. I wrote a little Perl script to slice and dice the data, and that would have been that - except I wanted to know something quickly from the original file, so I did something like "grep whatever 2003.csv".

I got back nothing.

That's odd, I thought. I know that "whatever" is in there. So I fired up vim and did "/whatever" and, sure enough, there it was.

So why couldn't I extract it with grep?

Hmm. Let's try "more". Oops! After warning me that "2003.csv" may be a binary file and asking "See it anyway?", "more" showed me a mess.

[Screenshot: "more" output of the 16-bit file]

Well, duh, that's why I couldn't grep from the file - the darn thing is utf-16!
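To see why a plain byte search misses the word, here is a minimal Python sketch. Python is only used for illustration, and `has_utf16_bom` is a made-up helper name, not part of any library. It shows that the ASCII bytes of "whatever" never appear side by side in UTF-16 text, and that the file's first two bytes (the byte order mark) can hint at the encoding.

```python
import codecs


def has_utf16_bom(data: bytes) -> bool:
    """Return True if data starts with a UTF-16 byte order mark.

    Rough check only: a UTF-32 little-endian BOM begins with the same two bytes.
    """
    return data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE))


needle = "whatever".encode("ascii")      # the bytes a byte-oriented search looks for
as_utf8 = "whatever".encode("utf-8")     # one byte per ASCII character
as_utf16 = "whatever".encode("utf-16")   # BOM, then two bytes per character

print(needle in as_utf8)        # True
print(needle in as_utf16)       # False: a zero byte sits between the letters
print(has_utf16_bom(as_utf16))  # True
```

Not every UTF-16 file starts with a byte order mark, so a missing one proves nothing.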

So, what can you do if faced with this situation? You have a few choices. You could ask vim to rewrite it. That's easy:

 :w ++enc=latin1 

Vim can do all sorts of file encoding rewriting; see "Using another encoding" in the Vim docs.

You could use Perl to rewrite the file, though Perl has some funny ideas about what utf8 means, plus some other oddities here and there.

At the Terminal command line, you can use "iconv":

  iconv -f utf-16 -t utf-8 2003.csv  | grep whatever 

Though that gets old fast, so I just converted the file.
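A one-off conversion like that can also be scripted. The following is only a sketch in Python, assuming the input really is UTF-16 with a byte order mark; the function name `convert_encoding` and its parameter names are invented for this example.

```python
def convert_encoding(src, dst, from_enc="utf-16", to_enc="utf-8"):
    """Read src as from_enc and write the same text to dst as to_enc."""
    # newline="" on both sides leaves the original line endings untouched.
    with open(src, "r", encoding=from_enc, newline="") as fin, \
         open(dst, "w", encoding=to_enc, newline="") as fout:
        for line in fin:
            fout.write(line)
```

Calling `convert_encoding("2003.csv", "2003-utf8.csv")` (the output name is hypothetical) would leave a file that grep can search directly. Writing to a new file rather than over the original keeps the source intact if the guessed encoding turns out to be wrong.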

Wouldn't it have been nice if we never had 7 or 8 bit encodings?

Source: http://aplawrence.com/MacOSX/convert-encodings.html
