This is a bit of research I did today for some work I’m helping out with. Say you have an HTML string (or URL) containing numeric character references, e.g. “&#27941;&#27941;&#26377;&#21619;”, that you want to convert to an actual Unicode string, in this case “津津有味”. But you don’t have the trusty HttpUtility.HtmlDecode or anything like it on hand (for example, because you’re using the .NET Compact Framework, like me). Here’s how you convert those numbers into actual Unicode characters.
[sourcecode language="csharp"]
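using System;
using System.Text;

// Parse the decimal number into an unsigned 16-bit value; ushort.TryParse is a safer choice for untrusted input
ushort mycode = Convert.ToUInt16("27941");
// Convert that value into a two-byte array (BitConverter uses the machine's byte order)
byte[] mybytes = BitConverter.GetBytes(mycode);
// Encoding.Unicode expects little-endian UTF-16, so swap the bytes on big-endian hardware
if (!BitConverter.IsLittleEndian) Array.Reverse(mybytes);
// We have our byte array, convert to a string! Tada! (mystring is now "津")
string mystring = Encoding.Unicode.GetString(mybytes);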
[/sourcecode]
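Following the TryParse suggestion in that comment, here is a minimal sketch of a safer variant. It assumes the reference is a decimal number between 0 and 65535 and that your framework version provides ushort.TryParse. TryDecodeCodePoint is a made-up name for illustration, not part of any library. It also swaps the byte-array round trip for a direct cast to char: a .NET char is already a UTF-16 code unit, so the byte-order question never comes up.
[sourcecode language="csharp"]
using System;

public static class CodePointHelper
{
    // Hypothetical helper: turns a decimal string such as "27941" into the
    // matching character. Returns false instead of throwing on bad input.
    public static bool TryDecodeCodePoint(string digits, out string result)
    {
        ushort code;
        if (!ushort.TryParse(digits, out code))
        {
            // Not a number, or outside the 0..65535 range of a ushort
            result = null;
            return false;
        }
        // A char is a 16-bit UTF-16 code unit, so a plain cast is enough;
        // no byte array or Encoding lookup is needed.
        result = ((char)code).ToString();
        return true;
    }
}
[/sourcecode]
Values in the surrogate range (55296 to 57343) will come back as a lone surrogate character, so you may want to reject those too.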
And there we have it. To parse all of the numeric references out of a longer string, you just need a simple regular expression that matches the &#digits; pattern, which I’m sure you can figure out.
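For the regular expression part, a sketch along these lines should work, assuming System.Text.RegularExpressions is available on your platform. NumericEntityDecoder and Decode are illustrative names, not an existing API. As an extra beyond the snippet above, it also accepts hexadecimal references (an ampersand, #x, hex digits, semicolon) and builds a UTF-16 surrogate pair by hand for code points above 65535, which a single ushort can’t hold.
[sourcecode language="csharp"]
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class NumericEntityDecoder
{
    // Group 1 captures decimal digits, group 2 captures hex digits after "x"/"X".
    private static readonly Regex EntityPattern =
        new Regex("&#(?:([0-9]+)|[xX]([0-9a-fA-F]+));");

    // Hypothetical helper: replaces every numeric character reference in the
    // input with the character it names; all other text is left untouched.
    public static string Decode(string input)
    {
        return EntityPattern.Replace(input, new MatchEvaluator(ReplaceEntity));
    }

    private static string ReplaceEntity(Match m)
    {
        int codePoint;
        try
        {
            codePoint = m.Groups[1].Success
                ? int.Parse(m.Groups[1].Value, CultureInfo.InvariantCulture)
                : int.Parse(m.Groups[2].Value, NumberStyles.HexNumber, CultureInfo.InvariantCulture);
        }
        catch (OverflowException)
        {
            return m.Value; // far too large to be a code point, keep the text as-is
        }

        // Reject negatives (hex parsing can wrap), surrogates, and values past U+10FFFF
        if (codePoint < 0 || codePoint > 0x10FFFF ||
            (codePoint >= 0xD800 && codePoint <= 0xDFFF))
            return m.Value;

        if (codePoint <= 0xFFFF)
            return ((char)codePoint).ToString();

        // Above U+FFFF: split into a UTF-16 surrogate pair by hand
        int offset = codePoint - 0x10000;
        char high = (char)(0xD800 + (offset >> 10));
        char low = (char)(0xDC00 + (offset & 0x3FF));
        return new string(new char[] { high, low });
    }
}
[/sourcecode]
Named entities such as &amp;amp; are left alone; handling those would need a lookup table of entity names.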
I hope that helps someone out there, because it took me some time inspecting variables in order to get it right.