Before getting into encoding issues in Python 3, the first section gives a basic introduction to bytes, ASCII, Unicode, and UTF-8. If these encodings don't make your head spin, feel free to skip it.
ASCII, Unicode, UTF-8, and GBK
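Let's start with the big brother. Like many people, after all those years at university I had long heard of ASCII. To talk about this big brother, we first have to start from the byte. A byte consists of eight bits, and each bit is either 0 or 1, so one byte can represent 2^8 = 256 values, from 00000000 to 11111111. An ASCII code occupies one byte, but the highest bit is set aside (it was often used as a parity bit), so ASCII actually uses 7 bits of the byte and can represent 2^7 = 128 characters. Back when we wrote C programs, for example, we often had to memorize that 01000001 (decimal 65) is the character 'A', and that adding 32 gives 01100001 (decimal 97), the character 'a'. Open Python today and call the chr and ord functions, and you can see Python doing this ASCII conversion for us: ord('A') returns 65 and chr(97) returns 'a'.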
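To make the 65/97 relationship concrete, here is a small sketch using only the built-in ord, chr, and format functions. The variable names are mine, chosen for illustration.

```python
# ord() maps a character to its integer code; chr() is the inverse.
upper_a = ord("A")
lower_a = ord("a")

# format(n, "08b") renders the number as eight binary digits.
print(upper_a, format(upper_a, "08b"))  # 65 01000001
print(lower_a, format(lower_a, "08b"))  # 97 01100001

# Upper and lower case differ by exactly 32, that is, by one bit.
print(lower_a - upper_a)                # 32
print(chr(upper_a + 32))                # a
```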
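The very first code, 00000000, is the null character, and it is not the only non-printing one: 33 of the 128 codes are control characters, which leaves 95 printable letters, digits, punctuation marks, and special symbols. ASCII was born in the United States, and for English, which builds words out of letters and expresses everything with words, that was enough. But people in China, Japan, Korea, and everywhere else with a different language were not satisfied. Chinese is written one character at a time, and even if ASCII pulled out all the stops and used all 256 values of a byte, it still would not be enough.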
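So Unicode came along later. A Unicode character was usually stored in two bytes, which gives 256*256 = 65,536 characters in total; this is the so-called UCS-2. Some rare characters need four bytes, the so-called UCS-4. In other words, the Unicode standard has kept growing. UCS-4 shows up less often, so for now just remember this: the original ASCII encoding used one byte, but because languages differ and there are so many characters, people moved to two bytes, and a unified Unicode encoding covering many languages appeared.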
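In Unicode, the characters that came from ASCII only need an all-zero byte added in front. The character 'a' mentioned earlier, 01100001, becomes 00000000 01100001 in Unicode. Before long the Americans were unhappy. Now that everyone was eating from the same big pot, English text that used to need one byte per character needed two, which wasted a lot of storage space and transmission bandwidth.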
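People put their ingenuity to work again, and UTF-8 appeared. Because it targets the wasted space, UTF-8 is a variable-length encoding: one byte for an English letter, usually three bytes for a Chinese character, and four bytes for some rare characters (the original design allowed up to six bytes, but the current standard stops at four). Besides solving the space problem, UTF-8 has a neat extra feature: it is compatible with big brother ASCII. Some antique software can keep working on UTF-8 text.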
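The variable length is easy to check from Python. The sketch below is only an illustration; the sample characters and variable names are my own choice, not part of the original discussion.

```python
# One sample character for each UTF-8 length from one to four bytes.
samples = ["a", "\u00e9", "中", "\U0001F600"]  # Latin a, e-acute, Chinese, emoji

utf8_lengths = {ch: len(ch.encode("utf-8")) for ch in samples}
for ch, n in utf8_lengths.items():
    print(f"U+{ord(ch):04X} -> {n} byte(s)")

# ASCII compatibility: pure ASCII text gives identical bytes either way.
ascii_text = "Hello"
same_bytes = ascii_text.encode("ascii") == ascii_text.encode("utf-8")
print(same_bytes)  # True
```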
Note that apart from English letters, which stay the same, a Chinese character usually looks different in Unicode and in UTF-8. For example, the character '中' is 01001110 00101101 in Unicode, but 11100100 10111000 10101101 in UTF-8.
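The two bit patterns are related: UTF-8 takes the 16 bits of the code point and spreads them over the template 1110xxxx 10xxxxxx 10xxxxxx. The sketch below prints both forms and rebuilds the UTF-8 bytes by hand. The helper utf8_three_byte is a hypothetical name of mine and only handles the three-byte range; real code should simply call str.encode.

```python
def utf8_three_byte(code_point: int) -> bytes:
    """Hand-rolled UTF-8 for U+0800..U+FFFF only (illustrative; ignores surrogates)."""
    if not 0x0800 <= code_point <= 0xFFFF:
        raise ValueError("only U+0800..U+FFFF take three bytes")
    return bytes([
        0xE0 | (code_point >> 12),          # 1110xxxx: top 4 bits
        0x80 | ((code_point >> 6) & 0x3F),  # 10xxxxxx: middle 6 bits
        0x80 | (code_point & 0x3F),         # 10xxxxxx: low 6 bits
    ])

ch = "中"
code_point = ord(ch)
code_point_bits = format(code_point, "016b")
utf8_bytes = ch.encode("utf-8")
utf8_bits = " ".join(format(b, "08b") for b in utf8_bytes)

print(code_point_bits[:8], code_point_bits[8:])   # 01001110 00101101
print(utf8_bits)                                  # 11100100 10111000 10101101
print(utf8_three_byte(code_point) == utf8_bytes)  # True
```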
Our motherland naturally has its own set of standards too: GB2312 and GBK. You rarely see them these days, since people generally use UTF-8 directly. The only time I remember seeing a GB-encoded web page, it was an adult website.
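Python ships a gbk codec, so the difference from UTF-8 is easy to see. A minimal sketch, with variable names of my own choosing:

```python
ch = "中"
gbk_bytes = ch.encode("gbk")     # two bytes in GBK
utf8_bytes = ch.encode("utf-8")  # three bytes in UTF-8
print(gbk_bytes)                 # b'\xd6\xd0'
print(utf8_bytes)                # b'\xe4\xb8\xad'

# Bytes must be decoded with the codec that produced them.
print(gbk_bytes.decode("gbk"))   # 中
try:
    gbk_bytes.decode("utf-8")
    decoded_as_utf8 = True
except UnicodeDecodeError:
    decoded_as_utf8 = False      # these GBK bytes are not valid UTF-8
print(decoded_as_utf8)           # False
```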
The default encoding in Python 3
Python 3 uses UTF-8 by default. You can check the default encoding with the following code, which returns 'utf-8':
import sys
sys.getdefaultencoding()
encode and decode in Python 3
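Character handling in Python 3 often involves the decode and encode functions. Being fluent with them pays off, especially when scraping web pages. As I understand it, encode turns the readable characters we see into the byte form used inside the computer. decode does exactly the opposite: it turns the byte form back into something readable, intuitive, and presentable. For example, '中'.encode() returns b'\xe4\xb8\xad', and b'\xe4\xb8\xad'.decode() returns '中'.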
\x means that what follows is hexadecimal, so \xe4\xb8\xad is the binary 11100100 10111000 10101101. In other words, encoding the character '中' into bytes gives 11100100 10111000 10101101. Likewise, decoding 11100100 10111000 10101101, that is \xe4\xb8\xad, gives back the character '中'. The complete form is b'\xe4\xb8\xad': in Python 3 a string in byte form must carry the prefix b, as in the b'xxxx' form above.
As mentioned, the default encoding in Python 3 is UTF-8, so Python handles these characters as UTF-8. That is why, even if we call encode('utf-8') to encode the character as UTF-8 explicitly, the result is the same as with a plain encode(): b'\xe4\xb8\xad'.
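Here is a short sketch of the round trip described above, showing that encode() with no argument and encode('utf-8') agree. The variable names are only for illustration.

```python
text = "中"
encoded_default = text.encode()          # no argument: UTF-8
encoded_explicit = text.encode("utf-8")  # same thing, spelled out
print(encoded_default)                      # b'\xe4\xb8\xad'
print(encoded_default == encoded_explicit)  # True

# str holds characters, bytes holds raw byte values.
print(type(text).__name__, type(encoded_default).__name__)  # str bytes
print(list(encoded_default))                                # [228, 184, 173]

# decode() reverses encode().
decoded = encoded_default.decode()
print(decoded == text)                      # True
```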
With that understood, and knowing that UTF-8 is compatible with ASCII, we can guess that the 'A' we memorized at university as ASCII 65 should decode correctly here too. Decimal 65 is 41 in hexadecimal, so let's try:
b'\x41'.decode()
The result is indeed the character 'A'.
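The same check can be taken a step further. In this sketch (names are mine), ASCII bytes decode the same way under both codecs, while the reverse does not hold: bytes above 127 are not ASCII.

```python
ascii_bytes = bytes([65, 97])       # the same two bytes as b'\x41\x61'
print(ascii_bytes)                  # b'Aa'
print(ascii_bytes.decode("utf-8"))  # Aa
print(ascii_bytes.decode("ascii"))  # Aa

# UTF-8 bytes for a Chinese character are outside the ASCII range.
try:
    b"\xe4\xb8\xad".decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False
print(ascii_ok)                     # False
```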
Converting between encodings in Python 3
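It is said that characters in a computer's memory are uniformly stored as Unicode. Only when they are written to a file, saved to disk, or sent from a server to a client (for example, front-end web page code) do they become UTF-8. What I actually care about, though, is how to show these characters in their Unicode byte form and reveal their true face in memory. Here is a magic mirror for that: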
'中'.encode('unicode-escape')
b'\\u4e2d'.decode('unicode-escape')
The first call returns b'\\u4e2d' and the second returns '中'.
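Whether you write b'\\u4e2d' or b'\u4e2d', the extra backslash seems to make no difference. That is because \u is not an escape sequence inside a bytes literal, so the backslash is kept as an ordinary character either way (newer Python versions warn about the single-backslash form). You can also see in the shell that typing '\u4e2d' directly and typing b'\u4e2d'.decode('unicode-escape') give the same thing: both print the character '中'. By contrast, '\u4e2d'.decode('unicode-escape') raises an error, because str has no decode method in Python 3. This shows that Python 3 not only supports Unicode, but also recognizes an escape in the '\uxxxx' format as a Unicode character and treats it as an ordinary str.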
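The behavior described here can be checked with a few lines. This is a minimal sketch; the variable names are mine, and I use the double-backslash spelling throughout to avoid the warning.

```python
escaped = "中".encode("unicode-escape")
print(escaped)       # b'\\u4e2d'
print(len(escaped))  # 6: backslash, u, 4, e, 2, d

restored = escaped.decode("unicode-escape")
print(restored)      # 中

# In a str literal, \u4e2d is the character itself, not six characters.
print("\u4e2d" == "中")       # True
print(len("\u4e2d"))          # 1

# str can only encode, bytes can only decode.
print(hasattr("", "decode"))  # False
print(hasattr(b"", "encode")) # False
```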
If we know a character's Unicode byte form, how do we turn it into UTF-8 bytes? With everything above understood, the idea is now clear: decode first, then encode. The code is as follows:
b'\\u4e2d'.decode('unicode-escape').encode()
Running it returns b'\xe4\xb8\xad'.
You can see that the final UTF-8 bytes are the same as the ones above, so the attempt succeeded. Conversions between other encodings presumably work the same way.
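The "decode first, then encode" idea generalizes to any pair of codecs Python knows. Below is a small sketch. The helper name transcode and its parameters are hypothetical, made up for this example rather than taken from any library.

```python
def transcode(data: bytes, source: str, target: str = "utf-8") -> bytes:
    """Decode bytes from the source codec, then re-encode them in the target codec."""
    return data.decode(source).encode(target)

from_escape = transcode(b"\\u4e2d", "unicode-escape")  # escape form -> UTF-8
from_gbk = transcode(b"\xd6\xd0", "gbk")               # GBK -> UTF-8
to_gbk = transcode(b"\xe4\xb8\xad", "utf-8", "gbk")    # UTF-8 -> GBK

print(from_escape)  # b'\xe4\xb8\xad'
print(from_gbk)     # b'\xe4\xb8\xad'
print(to_gbk)       # b'\xd6\xd0'
```

The str in the middle acts as the common currency: every conversion goes through it, so no codec has to know about any other.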
Final extras
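Remember ord from earlier? Times have changed and big brother ASCII has been absorbed, but ord still has its uses. Try ord('中') and the result is 20013. What is 20013? Try hex(ord('中')) and the result is '0x4e2d', so 20013 is the decimal value of the 4e2d we have met countless times above. As for hex, it is the function that converts a number to hexadecimal; anyone who has studied microcontrollers will be familiar with it.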
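The same relationship, going in both directions, using only built-ins (the variable name is mine):

```python
code_point = ord("中")
print(code_point)             # 20013
print(hex(code_point))        # 0x4e2d

# Going back: hexadecimal text to an integer, integer to a character.
print(int("4e2d", 16))        # 20013
print(chr(0x4E2D))            # 中

# The conventional U+XXXX notation for a code point.
print(f"U+{code_point:04X}")  # U+4E2D
```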
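One last extra, from a question I saw online. When we write something like '\u4e2d' in code, Python 3 knows what we mean. But if Python reads a file and '\u4e2d' shows up in it, does the computer no longer recognize it? Someone further down the thread gave the answer, as follows: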
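import codecs
# Open the file with the unicode-escape codec so that a literal \u4e2d
# in the file is decoded into the character it names.
with codecs.open("a.txt", "r", "unicode-escape") as f:
    u = f.read()
print(u)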
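To try this without preparing a.txt by hand, here is a self-contained sketch that writes such a file into a temporary directory and reads it back. It reads the raw bytes and decodes them in one step, which is a variation on the snippet above and not the original answer's method. The file contents are made up for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "a.txt")

    # Write the literal ASCII text \u4e2d\u6587, not the characters themselves.
    with open(path, "w", encoding="ascii") as f:
        f.write("\\u4e2d\\u6587")

    # Read the raw bytes back and decode the escapes.
    with open(path, "rb") as f:
        raw = f.read()
    text = raw.decode("unicode-escape")

print(raw)   # b'\\u4e2d\\u6587'
print(text)  # 中文
```

One caveat: the unicode-escape codec treats any byte that is not part of an escape as Latin-1, so this only suits files that are pure ASCII plus escapes. A file mixing escapes with real UTF-8 text would need different handling.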