python - string.decode() function in python2


So I am converting some code from Python 2 to Python 3. I don't understand the Python 2 encode/decode functionality well enough to determine what I should be doing in Python 3.

In Python 2, I can do the following things:

>>> c = '\xe5\xb8\x90\xe6\x88\xb7'
>>> print c
帐户
>>> c.decode('utf8')
u'\u5e10\u6237'

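What did I just do there? Doesn't the 'u' prefix mean unicode? Shouldn't the utf8 be '\xe5\xb8\x90\xe6\x88\xb7', since that was the input in the first place?

Your variable c was not declared as unicode (with the prefix 'u'), so it is a byte string, and decode('utf8') interprets those bytes as UTF-8 to produce a unicode string. If you decode it using the 'latin1' encoding instead, you get a unicode string whose code points have the same values as the original bytes: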

>>> c.decode('latin1')
u'\xe5\xb8\x90\xe6\x88\xb7'

Note that the result of decode is a unicode string:

>>> type(c)
<type 'str'>
>>> type(c.decode('latin1'))
<type 'unicode'>
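Since the question is about porting, here is a minimal sketch of the same session in Python 3. It assumes the data arrives as bytes (for example from a file opened in binary mode or from a socket); the variable names are only illustrative. In Python 3 the old str type corresponds to bytes, the old unicode type corresponds to str, and str no longer has a decode method.

```python
# Python 3: a byte string needs the b prefix; a plain '...' literal is text.
raw = b'\xe5\xb8\x90\xe6\x88\xb7'   # bytes: what a Python 2 '...' literal held
text = raw.decode('utf8')            # str: what Python 2 called unicode

print(type(raw), type(text))
print(text)

# encode() goes the other way, from text back to bytes.
round_trip = text.encode('utf8')
print(round_trip == raw)
```

So decode always goes from bytes to text, and encode from text to bytes.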

If you declare c as unicode and keep the same input, it will not print the same characters, because each escape is now read as a code point rather than as a byte:

>>> c=u'\xe5\xb8\x90\xe6\x88\xb7'
>>> print c
å¸æ·

If you use the input u'\u5e10\u6237' instead, it will print the initial characters:

>>> c=u'\u5e10\u6237'
>>> print c
帐户
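If you end up with text like the garbled version (UTF-8 bytes that were read as if each byte were a character), it can usually be repaired by reversing the mistake. The following is a sketch in Python 3, assuming the wrong step really was a latin1-style byte-to-character mapping; the variable names are illustrative.

```python
# Text whose code points are really the byte values of a UTF-8 sequence.
garbled = '\xe5\xb8\x90\xe6\x88\xb7'

# latin1 maps code points 0-255 back to the bytes with the same values,
# which can then be decoded with the encoding they were actually written in.
repaired = garbled.encode('latin1').decode('utf8')

print(len(garbled), len(repaired))
print(repaired)
```

This only works when every character in the garbled text has a code point below 256; otherwise encode('latin1') raises UnicodeEncodeError.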

Encoding and decoding are just a matter of using a table of correspondence value<->character. The thing is that the same value does not render as the same character, depending on the encoding (i.e. the table) used.

The main difficulty is when you don't know the encoding of an input string that you have to handle. Some tools can try to guess it, but they are not always successful (see https://superuser.com/questions/301552/how-to-auto-detect-text-file-encoding).
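To make the "table" idea concrete, here is a small Python 3 sketch that runs the same bytes through two tables, and then the same text through two tables. The choice of utf-16-be as the second encoding is only for illustration.

```python
raw = b'\xe5\xb8\x90\xe6\x88\xb7'

# Same bytes, two different tables: the resulting text differs.
for encoding in ('utf8', 'latin1'):
    decoded = raw.decode(encoding)
    print(encoding, len(decoded), ascii(decoded))

# Same text, two different tables: the resulting bytes differ.
text = '\u5e10\u6237'
utf8_bytes = text.encode('utf8')
utf16_bytes = text.encode('utf-16-be')
print(utf8_bytes)
print(utf16_bytes)
```

The bytes on their own do not say which table produced them; that information has to come from somewhere else (a header, a file format, a convention).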
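When the encoding is unknown, one simple fallback is to try a short list of candidates in order. The sketch below is a hypothetical helper (decode_with_candidates is not a standard library function), and the default candidate list is an assumption that should be adapted to the data you expect.

```python
def decode_with_candidates(raw, candidates=("utf-8", "latin-1")):
    """Return (encoding, text) for the first candidate that decodes raw.

    Hypothetical helper for illustration. A successful decode does not
    prove the guess is right: latin-1 accepts any byte sequence, so it
    belongs at the end of the list as a last resort.
    """
    for encoding in candidates:
        try:
            return encoding, raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the input")


print(decode_with_candidates(b'\xe5\xb8\x90\xe6\x88\xb7'))
print(decode_with_candidates(b'\xe9'))
```

Strict encodings such as UTF-8 go first because they reject most byte sequences that were not written in them, which makes a successful decode more meaningful.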

