Why UTF-8 and Not ASCII for Portuguese? (PART II)

2012-08-01 Nilo Menezes

Continuation of the previous post, originally written for the Python-Brasil list:

I’ll try again, as the thread has already discussed three different things:

1. Which encoding to use in Python programs: why UTF-8 is highly recommended
2. Encodings in general, and the problems they cause and solve
3. A bug in Python on Windows, when the console is set to code page 65001

I’ll try to explain this for everyone, since it’s a recurring topic.

But before getting back to these topics, we have to go back to files.

Why UTF-8 and Not ASCII for Portuguese? (PART I)

2012-07-29 Nilo Menezes

#utf8 #python

A fellow blogger on Python-Brasil:

Others on the list have already explained why UTF-8.

I just want to point out that the subject is more complicated than it seems. For example, in Python 2.7:

# -*- coding: utf-8 -*-
print "Accents: áéíóúçãõ"
print u"Accents2: áéíóúçãõ"

Run the program above on Windows, either through IDLE or console:

C:\Users\nilo\Desktop>\Python27\python.exe test.py
Accents: ├í├®├¡├│├║├º├ú├Á
Accents2: áéíóúçãõ

You should have obtained good results only on the Accents2 line. If the string is not marked as unicode, it is simply printed as a sequence of bytes, with no translation, so the console shows the UTF-8 bytes of the source file as if each byte were a cp850 character. With the u prefix, as in Accents2, Python knows it has to translate from Unicode to cp850, which is the code page of my console here at home. On Linux, on the other hand, both lines produce correct results, since terminals there typically use UTF-8, the same encoding as the source file.
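The garbled Accents line can be reproduced without a Windows console. The sketch below is written for Python 3, not the Python 2.7 of the example, and assumes a console using cp850 as described above; the variable names are only illustrative. It encodes the accented text as UTF-8 and then reads those bytes back as cp850, which is what the console does with a byte string.

```python
# Python 3 sketch (the original example is Python 2.7).
# Assumption: the console code page is cp850, as in the post.
text = "áéíóúçãõ"

# A Python 2 byte string in a UTF-8 source file holds the raw UTF-8 bytes:
# two bytes for each accented character.
utf8_bytes = text.encode("utf-8")

# The console reads each byte as one cp850 character, hence the garbage.
garbled = utf8_bytes.decode("cp850")
print(garbled)

# A unicode string is encoded to the console's code page before printing,
# so the console reads back exactly the characters that were intended.
console_bytes = text.encode("cp850")
correct = console_bytes.decode("cp850")
print(correct)

# Byte counts show why one string turns into twice as many characters.
print(len(utf8_bytes), len(console_bytes))
```

The first print shows the same sequence as the Accents line in the console output above, and the second shows the accented characters intact.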
