Last time, we started our exploration of how Windows synthesizes text clipboard formats by looking at the conversion between CF_OEMTEXT and CF_TEXT. Today, we’ll look at what happens when CF_UNICODETEXT enters the picture.
The introduction of CF_UNICODETEXT means that we now have three clipboard text formats, and therefore six possible conversions. The four new conversions are:
- CF_UNICODETEXT to/from CF_TEXT.
- CF_UNICODETEXT to/from CF_OEMTEXT.
These conversions are done with the assistance of the CF_LOCALE clipboard format, which contains an LCID: a 32-bit integer that encodes a primary language (such as German), a sublanguage (such as Swiss German), and a sort rule (such as phone book). None of these details is directly relevant to character set conversion. The locale is used because both the ANSI and OEM code pages can be derived from it, so only one value needs to be recorded.¹
The system converts to/from CF_UNICODETEXT via the code page obtained from the LCID:
- LOCALE_IDEFAULTANSICODEPAGE when converting to/from CF_TEXT.
- LOCALE_IDEFAULTCODEPAGE when converting to/from CF_OEMTEXT.
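Putting all of this into a chart gives us

| To \ From      | CF_TEXT        | CF_OEMTEXT    | CF_UNICODETEXT |
|----------------|----------------|---------------|----------------|
| CF_TEXT        | nop            | OemToAnsi     | WC2MB(ANSI CP) |
| CF_OEMTEXT     | AnsiToOem      | nop           | WC2MB(OEM CP)  |
| CF_UNICODETEXT | MB2WC(ANSI CP) | MB2WC(OEM CP) | nop            |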
In the above table, WC2MB and MB2WC are shorthand for WideCharToMultiByte and MultiByteToWideChar. “ANSI CP” means “the code page reported by calling GetLocaleInfo with the LCID in the CF_LOCALE clipboard format and the LOCALE_IDEFAULTANSICODEPAGE locale attribute”. “OEM CP” is the same, but with LOCALE_IDEFAULTCODEPAGE instead of LOCALE_IDEFAULTANSICODEPAGE.
That’s great, we have all the answers in a table. But that table raises more questions!
We’ll start answering questions next time.
¹ This CF_LOCALE clipboard format existed in 16-bit Windows as well, but it wasn’t really used for anything. The people who added Unicode support to the clipboard realized, “Hey, the thing we need is already here! We just have to start using it.”
The post How does Windows synthesize CF_UNICODETEXT from CF_TEXT and vice versa? appeared first on The Old New Thing.