Monday, September 2, 2013

String getBytes could lead to difficult bugs

If you execute the following function, what do you think the size of the 'def' byte array will be?

The logic is really simple: take an input byte array with two elements, create a string from it using 'UTF-8' encoding, then create another byte array from that string using the same UTF-8 encoding.

public static void testStringUTF8() {
    // input: two raw bytes
    byte[] abc = new byte[2];
    abc[0] = 31;
    abc[1] = -117;

    try {
        String stringAbc = new String(abc, "UTF-8");
        byte[] def = stringAbc.getBytes("UTF-8");
        if (def != null) {
            System.out.println("size of output byte array:" + def.length); // print the array size

            System.out.println(def[1]); // print the second element of the output byte array

            System.out.println(abc[1]); // print the second element of the input byte array
        }
    } catch (Exception e) {
        // "UTF-8" is always supported, so this is not expected to happen
        e.printStackTrace();
    }
}


1 comment:

  1. answer:

    size of output byte array:4

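    It was run in a US-based environment (default charset windows-1252), although the default charset does not affect the result here because UTF-8 is passed explicitly to both calls.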

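    The input byte array has 2 elements, but after the 'new String' and 'getBytes' calls with UTF-8 encoding, the output has 4. The second input byte, -117 (0x8B), is not valid UTF-8 on its own, so 'new String' replaces it with the replacement character U+FFFD, which 'getBytes' then encodes as three bytes (0xEF 0xBF 0xBD).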

    Code like this can cause bugs in compression and encryption routines, because their output is arbitrary binary data rather than valid UTF-8 text.
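
As a rough sketch of how to guard against this, the class below reuses the same two input bytes (which happen to match the gzip magic number) and shows three things: the lossy round trip, a strict decoder that reports malformed input instead of replacing it, and a Base64 round trip that keeps the bytes intact. The class name `Utf8RoundTrip` and its method names are made up for illustration, and `java.util.Base64` assumes Java 8 or later, which is newer than this post.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

// Hypothetical helper class, not part of the original post.
class Utf8RoundTrip {

    // Lossy: new String(...) silently replaces malformed input with U+FFFD.
    static byte[] lossyRoundTrip(byte[] raw) {
        String s = new String(raw, StandardCharsets.UTF_8);
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Strict check: a decoder set to REPORT throws on malformed input,
    // so we can detect bytes that are not valid UTF-8 text.
    static boolean isValidUtf8(byte[] raw) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(raw));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Lossless: Base64 maps any byte sequence to ASCII text and back.
    static byte[] base64RoundTrip(byte[] raw) {
        String text = Base64.getEncoder().encodeToString(raw);
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        byte[] abc = {31, -117}; // 0x1F 0x8B, the same input as in the post

        System.out.println(Arrays.toString(lossyRoundTrip(abc)));  // [31, -17, -65, -67]
        System.out.println(isValidUtf8(abc));                      // false
        System.out.println(Arrays.toString(base64RoundTrip(abc))); // [31, -117]
    }
}
```

If binary data such as compressed or encrypted output has to travel as a String, an encoding like Base64 is one option; whether it fits depends on where the string ends up.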