Last time, we started our exploration of how Windows synthesizes text clipboard formats by looking at the conversion between CF_OEM­TEXT and CF_TEXT. Today, we’ll look at what happens when CF_UNICODE­TEXT enters the picture.

The introduction of CF_UNICODE­TEXT means that we now have three clipboard text formats, and therefore six possible conversions. The four new conversions are

- CF_UNICODETEXT to/from CF_TEXT.
- CF_UNICODETEXT to/from CF_OEMTEXT.

These conversions are done with the assistance of the CF_LOCALE clipboard format, which contains an LCID: a 32-bit integer that encodes a primary language (such as German), a sublanguage (such as Swiss German), and a sort rule (such as phone book). None of these details are directly relevant to character set conversion. The locale is used because both the ANSI and OEM code pages can be derived from it, so only one value needs to be recorded.¹
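To make the LCID layout concrete, here is a minimal Python sketch that pulls the three fields apart. It assumes the documented bit layout: the low 16 bits are the language identifier (primary language in its low 10 bits, sublanguage in its high 6 bits), and the next 4 bits are the sort identifier. The function name `split_lcid` is made up for illustration and is not a Windows API.

```python
def split_lcid(lcid: int) -> dict:
    """Split a 32-bit LCID into its language and sort fields (illustrative only)."""
    lang_id = lcid & 0xFFFF               # low word: language identifier
    return {
        "primary_language": lang_id & 0x3FF,  # low 10 bits, e.g. 0x07 = German
        "sublanguage": lang_id >> 10,         # high 6 bits, e.g. 0x02 = Swiss German
        "sort_id": (lcid >> 16) & 0xF,        # e.g. 0x1 = phone book sort for German
    }


# 0x00010407: German (Germany) with the phone book sort
print(split_lcid(0x00010407))
# 0x00000807: German (Switzerland) with the default sort
print(split_lcid(0x00000807))
```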

The system converts to/from CF_UNICODE­TEXT via the code page obtained from the LCID:

- LOCALE_IDEFAULTANSICODEPAGE when converting to/from CF_TEXT.
- LOCALE_IDEFAULTCODEPAGE when converting to/from CF_OEMTEXT.
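The choice of locale attribute can be sketched in Python. This is only an illustration: the `CODE_PAGES` dictionary is a tiny hand-written stand-in for what GetLocaleInfo would report on a real system, and `code_page_for` is a hypothetical helper, not a Windows function. Masking off the sort identifier is a choice made for this sketch.

```python
# Hand-written stand-in for GetLocaleInfo, covering only a few locales.
CODE_PAGES = {
    0x0409: {"LOCALE_IDEFAULTANSICODEPAGE": 1252, "LOCALE_IDEFAULTCODEPAGE": 437},  # English (United States)
    0x0407: {"LOCALE_IDEFAULTANSICODEPAGE": 1252, "LOCALE_IDEFAULTCODEPAGE": 850},  # German (Germany)
    0x0419: {"LOCALE_IDEFAULTANSICODEPAGE": 1251, "LOCALE_IDEFAULTCODEPAGE": 866},  # Russian (Russia)
    0x0411: {"LOCALE_IDEFAULTANSICODEPAGE": 932, "LOCALE_IDEFAULTCODEPAGE": 932},   # Japanese (Japan)
}

# Which locale attribute applies to which 8-bit clipboard format.
ATTRIBUTE_FOR_FORMAT = {
    "CF_TEXT": "LOCALE_IDEFAULTANSICODEPAGE",
    "CF_OEMTEXT": "LOCALE_IDEFAULTCODEPAGE",
}


def code_page_for(lcid: int, text_format: str) -> int:
    """Return the code page used to convert between CF_UNICODETEXT and text_format."""
    lang_id = lcid & 0xFFFF  # sketch assumption: the sort identifier does not affect the code page
    return CODE_PAGES[lang_id][ATTRIBUTE_FOR_FORMAT[text_format]]


print(code_page_for(0x0409, "CF_TEXT"))     # ANSI code page for English (United States)
print(code_page_for(0x0407, "CF_OEMTEXT"))  # OEM code page for German (Germany)
```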

Putting all of this into a chart gives us

| To \ From | CF_TEXT | CF_OEMTEXT | CF_UNICODETEXT |
|---|---|---|---|
| CF_TEXT | nop | OemToAnsi | WC2MB(ANSI CP) |
| CF_OEMTEXT | AnsiToOem | nop | WC2MB(OEM CP) |
| CF_UNICODETEXT | MB2WC(ANSI CP) | MB2WC(OEM CP) | nop |

In the above table, “ANSI CP” means “the code page reported by calling Get­Locale­Info with the LCID in the CF_LOCALE clipboard format, and the LOCALE_IDEFAULT­ANSI­CODE­PAGE locale attribute”. Similarly for “OEM CP”, using LOCALE_IDEFAULT­CODE­PAGE instead of LOCALE_IDEFAULT­ANSI­CODE­PAGE.
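The four new cells of the chart can be modelled in a few lines of Python. This is a rough sketch, not how Windows implements it: Python's `str` stands in for CF_UNICODETEXT, `bytes` for the 8-bit formats, and Python's built-in codecs stand in for MultiByteToWideChar and WideCharToMultiByte. The function `synthesize` and its default code pages (1252 and 437) are assumptions for the example. The OemToAnsi and AnsiToOem cells are deliberately left out, since they were covered last time.

```python
def synthesize(data, src, dst, ansi_cp=1252, oem_cp=437):
    """Model the CF_UNICODETEXT cells of the conversion chart (illustrative only)."""
    if src == dst:
        return data  # the "nop" diagonal
    code_page = {"CF_TEXT": ansi_cp, "CF_OEMTEXT": oem_cp}
    if src == "CF_UNICODETEXT":
        # WC2MB(code page of the destination format)
        return data.encode(f"cp{code_page[dst]}", errors="replace")
    if dst == "CF_UNICODETEXT":
        # MB2WC(code page of the source format)
        return data.decode(f"cp{code_page[src]}", errors="replace")
    raise ValueError("OemToAnsi/AnsiToOem cells are not modelled in this sketch")


# The same character lands on different bytes in the two 8-bit formats.
print(synthesize("é", "CF_UNICODETEXT", "CF_TEXT"))     # encoded with code page 1252
print(synthesize("é", "CF_UNICODETEXT", "CF_OEMTEXT"))  # encoded with code page 437
print(synthesize(b"\x82", "CF_OEMTEXT", "CF_UNICODETEXT"))
```

Passing `errors="replace"` is a choice made for the sketch so that unmappable characters do not raise an exception. It is not a claim about how the system handles them.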

That’s great, we have all the answers in a table. But that table raises more questions!

We’ll start answering questions next time.

¹ This CF_LOCALE clipboard format existed in 16-bit Windows as well, but it wasn’t really used for anything. The people who added Unicode support to the clipboard realized, “Hey, the thing we need is already here! We just have to start using it.”

The post How does Windows synthesize CF_UNICODETEXT from CF_TEXT and vice versa? appeared first on The Old New Thing.

