Python: Converting string.decode('utf8') from Python 2 to Python 3
I am converting some code from Python 2 to Python 3.
In Python 2, I can do the following:
>>> c = '\xe5\xb8\x90\xe6\x88\xb7'
>>> print c
帐户
>>> c.decode('utf8')
u'\u5e10\u6237'
How can I get the same output (u'\u5e10\u6237') in Python 3?
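Some background that may help: in Python 3, str has no .decode() method, and the literal '\xe5\xb8\x90\xe6\x88\xb7' (without a b prefix) is a str of six code points, not six bytes. The Python 2 output u'\u5e10\u6237' is just the repr of a unicode object; in Python 3 the closest equivalent of that escaped repr is the built-in ascii(). A minimal sketch, assuming the data arrives either as bytes or as a str whose code points are really the UTF-8 bytes (the helper name to_text is hypothetical):

```python
def to_text(data):
    """Decode UTF-8 data to str (hypothetical helper, for illustration only)."""
    if isinstance(data, bytes):
        # Python 3: decoding is done on bytes objects, not on str
        return data.decode('utf-8')
    # A str whose code points (all < 256) are really UTF-8 bytes:
    # latin-1 maps code points 0-255 back to the identical byte values.
    return data.encode('latin-1').decode('utf-8')

word = to_text(b'\xe5\xb8\x90\xe6\x88\xb7')
# ascii() is like repr() but escapes non-ASCII characters as \uXXXX
print(ascii(word))  # '\u5e10\u6237'
```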
Edit:
For anyone else with this problem: after looking at the responses, I realized that to make use of the result, each character needs to be treated as an individual element. The escaped Unicode representation '\u5e10\u6237' is a single string that does not naturally divide into parts corresponding to the original Chinese characters:
>>> c = '帐户'
>>> type(c.encode('unicode-escape').decode('ascii'))
<class 'str'>
>>> [l for l in c.encode('unicode-escape').decode('ascii')]
['\\', 'u', '5', 'e', '1', '0', '\\', 'u', '6', '2', '3', '7']
So you have to separate each character in the input string and translate each one separately into a list, unless you want to parse the result again in the next part of the program. My solution is thus:
>>> [l.encode('unicode-escape').decode('ascii') for l in c]
['\\u5e10', '\\u6237']
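The same per-character idea, wrapped in a small function as a sketch (the name escape_each is hypothetical). Because each character is encoded on its own, the output list lines up one-to-one with the input characters; ASCII characters pass through unchanged, and characters outside the Basic Multilingual Plane come out as \UXXXXXXXX escapes:

```python
def escape_each(text):
    # Encode each character independently so that parts[i]
    # always corresponds to text[i].
    return [ch.encode('unicode-escape').decode('ascii') for ch in text]

parts = escape_each('帐户')
print(parts)                     # ['\\u5e10', '\\u6237']
print(escape_each('a\u5e10'))    # ['a', '\\u5e10']
```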
An alternate solution is to turn each character into its hex representation:
>>> [hex(ord(l)) for l in c]
['0x5e10', '0x6237']
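If the hex form is used, the characters can be recovered again with int(..., 16) and chr(), and the same code points can be formatted in the \u style with a format string. A sketch; note that the '\\u{:04x}' formatting only produces a valid \u escape for code points up to U+FFFF:

```python
text = '帐户'
codes = [hex(ord(ch)) for ch in text]
# int() with base 16 accepts the '0x' prefix; chr() rebuilds the character
restored = ''.join(chr(int(h, 16)) for h in codes)
# \u-style strings built from the code points (valid for BMP characters only)
u_style = ['\\u{:04x}'.format(ord(ch)) for ch in text]
print(codes)               # ['0x5e10', '0x6237']
print(u_style)             # ['\\u5e10', '\\u6237']
print(restored == text)    # True
```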
Thanks for the help.
This is called the "unicode-escape" encoding. Here is an example of how one can achieve this behavior in Python 3:
In [11]: c = b'\xe5\xb8\x90\xe6\x88\xb7'
In [12]: d = c.decode('utf8')
In [13]: print(d)
帐户
In [14]: print(d.encode('unicode-escape').decode('ascii'))
\u5e10\u6237
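The steps in the session above can be packaged as a pair of functions, with the reverse direction added for completeness. This is a sketch with hypothetical function names; the reverse step assumes the escaped text is pure ASCII, since the unicode-escape decoder treats any non-escaped bytes as Latin-1:

```python
def to_escaped(raw: bytes) -> str:
    # UTF-8 bytes -> str -> ASCII-only text with \uXXXX escapes
    return raw.decode('utf-8').encode('unicode-escape').decode('ascii')

def from_escaped(escaped: str) -> str:
    # Reverse: interpret the \uXXXX escapes as characters again
    # (assumes `escaped` contains only ASCII characters)
    return escaped.encode('ascii').decode('unicode-escape')

raw = b'\xe5\xb8\x90\xe6\x88\xb7'
escaped = to_escaped(raw)
print(escaped)                                     # \u5e10\u6237
print(from_escaped(escaped) == raw.decode('utf-8'))  # True
```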
If you want bytes, not str, you can get rid of the .decode('ascii').
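To make the difference concrete, a short sketch of what the two variants return: .encode('unicode-escape') alone yields a bytes object, and the extra .decode('ascii') turns it into a str with the same characters:

```python
d = b'\xe5\xb8\x90\xe6\x88\xb7'.decode('utf8')
as_bytes = d.encode('unicode-escape')   # bytes with literal backslashes
as_str = as_bytes.decode('ascii')       # the same content as a str
print(type(as_bytes).__name__, as_bytes)  # bytes b'\\u5e10\\u6237'
print(type(as_str).__name__, as_str)      # str \u5e10\u6237
```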