Binary Serialization To / From String and Encoding

Recently somebody posted on the C# language newsgroup that they couldn't figure out how to convert an object to a string (and the reverse) since all the examples only showed how to write / read to a file.

I chimed in that I thought what the OP really meant was "how to convert a stream to a string" (as in using the BinaryFormatter for serialization), and so I posted the following sample:

Stream to string:

byte[] b = MyMemoryStream.ToArray();
string s = System.Text.Encoding.UTF8.GetString(b);


String to stream:

string s = "whatever";
byte[] b = System.Text.Encoding.UTF8.GetBytes(s);
MemoryStream ms = new MemoryStream(b);



Friend and fellow MVP Jon Skeet, who is pedantic to a fault, responded with this:

"That's a way which is almost guaranteed to lose data. Serialization with BinaryFormatter produces opaque binary data, which may very well not be a valid UTF-8 encoded string.

To convert arbitrary binary data to a string and back, I'd use Convert.ToBase64String and Convert.FromBase64String."

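Jon is absolutely correct, and I suspect many developers don't realize it: choosing what one would "think" is a broad encoding, such as UTF-8, does not guarantee data integrity. Arbitrary bytes that are not valid UTF-8 sequences get replaced during decoding, so the original data cannot be recovered from the string.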
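To make Jon's point concrete, here is a minimal sketch that pushes a few arbitrary bytes through both conversions and checks whether they come back unchanged. The class and method names (Utf8RoundTripDemo, SurvivesUtf8RoundTrip, SurvivesBase64RoundTrip) are made up for illustration, and the sample bytes are just values I picked because they are not valid UTF-8.

```csharp
using System;
using System.Linq;
using System.Text;

// Hypothetical demo class; the names are illustrative only.
public static class Utf8RoundTripDemo
{
    // bytes -> UTF-8 string -> bytes. Invalid sequences are replaced
    // with U+FFFD while decoding, so the result can differ from the input.
    public static bool SurvivesUtf8RoundTrip(byte[] data)
    {
        string s = Encoding.UTF8.GetString(data);
        byte[] back = Encoding.UTF8.GetBytes(s);
        return data.SequenceEqual(back);
    }

    // bytes -> Base64 string -> bytes. Base64 can represent any byte value.
    public static bool SurvivesBase64RoundTrip(byte[] data)
    {
        string s = Convert.ToBase64String(data);
        byte[] back = Convert.FromBase64String(s);
        return data.SequenceEqual(back);
    }
}
```

With the input new byte[] { 0x00, 0xFF, 0xFE, 0x80 }, the UTF-8 check returns false and the Base64 check returns true.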

The correct way (MSDN documentation links first):

[MSDN] Convert.ToBase64String

[MSDN] Convert.FromBase64String

And the revised code sample:

Stream to string:

byte[] b = MyMemoryStream.ToArray();
string s = Convert.ToBase64String(b);


String to stream:

string s = "whatever"; // must be valid Base64, e.g. the output of Convert.ToBase64String
byte[] b = Convert.FromBase64String(s);
MemoryStream ms = new MemoryStream(b);
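For a complete round trip, here is a rough sketch that writes some binary data to a MemoryStream, turns it into a Base64 string, and reads it back. Note the assumptions: I've swapped in BinaryWriter / BinaryReader to produce the opaque bytes instead of BinaryFormatter (which is marked obsolete in recent .NET versions), and the Base64StreamDemo class with its ToBase64 / FromBase64 helpers is hypothetical, not part of any library.

```csharp
using System;
using System.IO;

// Hypothetical helper class; BinaryWriter stands in for BinaryFormatter here.
public static class Base64StreamDemo
{
    // Stream to string: write binary data, then Base64-encode the bytes.
    public static string ToBase64(int id, string name)
    {
        using (var ms = new MemoryStream())
        using (var writer = new BinaryWriter(ms))
        {
            writer.Write(id);    // 4 raw bytes, not text
            writer.Write(name);  // length-prefixed UTF-8 string
            writer.Flush();
            return Convert.ToBase64String(ms.ToArray());
        }
    }

    // String to stream: Base64-decode the bytes, then read them back.
    public static (int Id, string Name) FromBase64(string text)
    {
        byte[] bytes = Convert.FromBase64String(text);
        using (var ms = new MemoryStream(bytes))
        using (var reader = new BinaryReader(ms))
        {
            int id = reader.ReadInt32();
            string name = reader.ReadString();
            return (id, name);
        }
    }
}
```

The text inside the payload can contain any characters (for example "$ or £?"), because Base64 is only applied to the bytes, never to the original text directly.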

Comments

  1. Anonymous, 12:03 PM

    People should implement their classes using the TextReader interface instead of StreamReader... And then instantiate a StringReader if they want to work with a string.

  2. Erm,
    picky, picky! Does this really add value to the subject at hand?

  3. Anonymous, 12:54 PM

    But what to do if your string contains characters like $ or £?
    UTF8Encoding can lose data whereas Convert.FromBase64String raises exception "Invalid character in Base-64 string".

  4. Didn't mess it up for me:

    string s = "$ or £?";
    byte[] b = System.Text.Encoding.UTF8.GetBytes(s);
    string s64 = Convert.ToBase64String(b);
    BinaryFormatter bf = new BinaryFormatter();
    MemoryStream ms = new MemoryStream();
    bf.Serialize(ms, s64);
    ms.Seek(0, SeekOrigin.Begin); // rewind before deserializing
    BinaryFormatter bf2 = new BinaryFormatter();
    object o = bf2.Deserialize(ms);
    string s8 = (string)o;
    byte[] b4 = Convert.FromBase64String(s8);
    string s5 = System.Text.Encoding.UTF8.GetString(b4);
    Console.WriteLine(s5);
    Console.ReadLine();

  5. Anonymous, 10:03 AM

    Thanks. It was the solution I was looking for.

  6. Anonymous, 1:55 PM

    Just what I needed, thanks.

  7. Anonymous, 2:52 AM

    Thank you, valuable article for years to come :)

  8. Perfect example. As far as I know, there are a lot of people with this possible "bug" in their code.

    Great.

  9. Anonymous, 9:00 AM

    It worked for me. Thanks a lot!

