python - string.decode() function in python2
So I am converting some code from Python 2 to Python 3. I don't understand the Python 2 encode/decode functionality well enough to determine what I should be doing in Python 3.
In Python 2, I can do the following things:
>>> c = '\xe5\xb8\x90\xe6\x88\xb7'
>>> print c
帐户
>>> c.decode('utf8')
u'\u5e10\u6237'
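What did I just do there? Doesn't the 'u' prefix mean unicode? Shouldn't the utf8 be '\xe5\xb8\x90\xe6\x88\xb7', since that was my input in the first place?
Your variable c was not declared as unicode (with the prefix 'u'), so it is a byte string. If you decode it using the 'latin1' encoding, you get the same values back, one character per byte: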
>>> c.decode('latin1')
u'\xe5\xb8\x90\xe6\x88\xb7'
Note that the result of decode is a unicode string:
>>> type(c)
<type 'str'>
>>> type(c.decode('latin1'))
<type 'unicode'>
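For the Python 3 side of the conversion, a minimal sketch of the same session might look like the following. It assumes the data really is UTF-8 encoded bytes. In Python 3 the Python 2 str type corresponds to bytes (written with a b prefix), and the Python 2 unicode type corresponds to str. The name text is only illustrative.

```python
# Python 3: the b prefix creates a bytes object (what Python 2 called str).
c = b'\xe5\xb8\x90\xe6\x88\xb7'

# bytes -> str (what Python 2 called unicode)
text = c.decode('utf8')

print(type(c).__name__)          # bytes
print(type(text).__name__)       # str
print(ascii(text))               # '\u5e10\u6237'

# Encoding the str with the same codec gives the original bytes back.
print(text.encode('utf8') == c)  # True
```

So the escaped bytes are the UTF-8 encoded form, and '\u5e10\u6237' is the decoded text. Both describe the same two characters.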
If you declare c as unicode and keep the same input, it will not print the same characters:
>>> c=u'\xe5\xb8\x90\xe6\x88\xb7'
>>> print c
å¸æ·
If you use the input '\u5e10\u6237', it will print the initial characters:
>>> c=u'\u5e10\u6237'
>>> print c
帐户
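The difference between the two unicode literals can be checked in Python 3, where every str literal is unicode. This is only a sketch, and the variable names are made up for illustration. It also shows one common way to repair text that was wrongly decoded as latin1, assuming the underlying bytes were UTF-8.

```python
# In a Python 3 str literal, '\xe5' means the code point U+00E5,
# not the byte 0xE5, so this is six separate characters.
mojibake = '\xe5\xb8\x90\xe6\x88\xb7'
intended = '\u5e10\u6237'

print(len(mojibake), len(intended))  # 6 2

# latin1 maps code points 0-255 to the same byte values, so encoding
# with it recovers the raw bytes, which can then be decoded as UTF-8.
repaired = mojibake.encode('latin1').decode('utf8')
print(repaired == intended)          # True
```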
Encoding and decoding are just a matter of using a table of correspondence value<->character. The thing is that the same value does not render as the same character; it depends on the encoding (i.e. the table) used.
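To see the "table" idea in action, here is a small Python 3 sketch that decodes one and the same byte with a few different codecs. The choice of latin1, cp437 and koi8_r is arbitrary; they are just single-byte encodings that ship with Python.

```python
value = b'\xe5'  # a single byte with the value 229

# Look the same value up in three different tables.
decoded = {}
for table in ('latin1', 'cp437', 'koi8_r'):
    decoded[table] = value.decode(table)
    print(table, ascii(decoded[table]))

# In UTF-8 this byte only starts a multi-byte sequence, so on its own
# it does not correspond to any character at all.
try:
    value.decode('utf8')
    utf8_failed = False
except UnicodeDecodeError as exc:
    utf8_failed = True
    print('utf8 cannot decode it:', exc.reason)
```

Each table gives a different character for the value 229, and UTF-8 rejects it outright.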
The main difficulty is when you don't know the encoding of an input string that you have to handle. Some tools can try to guess it, but they are not always successful (see https://superuser.com/questions/301552/how-to-auto-detect-text-file-encoding).
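If you have a short list of plausible encodings, one simple approach is to try them in order. The sketch below is for Python 3. The function decode_with_fallback and its default candidate list are hypothetical, not something from the standard library, and it is a guess, not real detection. Note that latin1 accepts any byte sequence, so it never fails and can silently give you the wrong text; that is why it goes last.

```python
def decode_with_fallback(data, candidates=('utf8', 'latin1')):
    """Return (text, encoding) for the first candidate that decodes data.

    Hypothetical helper: the candidate list is an assumption about
    which encodings are likely for your input.
    """
    for encoding in candidates:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError('none of the candidate encodings could decode the input')


# Valid UTF-8, so the first candidate wins.
text1, used1 = decode_with_fallback(b'\xe5\xb8\x90\xe6\x88\xb7')
print(used1, ascii(text1))

# Not valid UTF-8, so it falls through to latin1.
text2, used2 = decode_with_fallback(b'caf\xe9')
print(used2, ascii(text2))
```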