python - Converting string.decode('utf8') from Python 2 to Python 3

- April 15, 2015

I am converting code from Python 2 to Python 3.

In Python 2, I can do the following:

>>> c = '\xe5\xb8\x90\xe6\x88\xb7'
>>> print c
帐户
>>> c.decode('utf8')
u'\u5e10\u6237'

How can I get the same output (u'\u5e10\u6237') in Python 3?

Edit

For anyone else with this problem: I realized after looking at the responses that, to make use of the result, each character needs to be treated as an individual element. The escaped Unicode representation '\u5e10\u6237' is a single string and does not naturally divide into parts that correspond to the original Chinese characters.

>>> c = '帐户'
>>> type(c.encode('unicode-escape').decode('ascii'))
<class 'str'>
>>> [l for l in c.encode('unicode-escape').decode('ascii')]
['\\', 'u', '5', 'e', '1', '0', '\\', 'u', '6', '2', '3', '7']

You have to separate each character in the input string and translate it separately into a list, unless you want to parse the result again in the next part of the program. My solution is thus:

>>> [l.encode('unicode-escape').decode('ascii') for l in c]
['\\u5e10', '\\u6237']

An alternate solution is to convert each character to the hex representation of its code point:

>>> [hex(ord(l)) for l in c]
['0x5e10', '0x6237']
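The two per-character approaches above could be wrapped up roughly as follows. This is only a sketch for Python 3: the helper names `escape_each` and `codepoints_hex` are made up for illustration and do not come from the original post. The last step shows that the hex code points can be turned back into the original string with `chr`.

```python
def escape_each(text):
    """Return one unicode-escape string per character of `text` (hypothetical helper)."""
    return [ch.encode('unicode-escape').decode('ascii') for ch in text]


def codepoints_hex(text):
    """Return the hex code point of each character of `text` (hypothetical helper)."""
    return [hex(ord(ch)) for ch in text]


c = '帐户'
escapes = escape_each(c)    # one escaped string per original character
hexes = codepoints_hex(c)   # one hex code point per original character

# Round trip: rebuild the original string from the hex code points.
restored = ''.join(chr(int(h, 16)) for h in hexes)

print(escapes)
print(hexes)
print(restored == c)
```

Note that 'unicode-escape' leaves printable ASCII characters unchanged, so for mixed input the list elements will not all have the \uXXXX form; the `hex(ord(...))` variant is uniform in that respect.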

Thanks for the help.

This is called the "unicode-escape" encoding. Here is an example of how one would achieve this behavior in Python 3:

In [11]: c = b'\xe5\xb8\x90\xe6\x88\xb7'
In [12]: d = c.decode('utf8')
In [13]: print(d)
帐户
In [14]: print(d.encode('unicode-escape').decode('ascii'))
\u5e10\u6237

If you want bytes and not str, you can get rid of the .decode('ascii').
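To make the bytes/str distinction concrete, here is a small sketch of both variants side by side in Python 3. The variable names are chosen here for illustration only.

```python
raw = b'\xe5\xb8\x90\xe6\x88\xb7'   # UTF-8 encoded bytes, as in the question

text = raw.decode('utf8')           # str: the two Chinese characters

# Without the final .decode('ascii') the result stays a bytes object.
escaped_bytes = text.encode('unicode-escape')

# With it, the result is an ordinary str holding the escape sequences.
escaped_str = escaped_bytes.decode('ascii')

print(type(escaped_bytes).__name__, escaped_bytes)
print(type(escaped_str).__name__, escaped_str)
```

Both values contain literal backslashes followed by the hex digits, not the Chinese characters themselves; decoding `escaped_bytes` with 'unicode-escape' gives back the original text.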
