What is the best way to create a String in Java that takes up the fewest kilobytes?
-
UPDATE: Although this question is a good discussion of String compression, it is a bad idea to store a large 64 KB record in DynamoDB, because you are charged by the KB that you store. A better way to store large data is to keep the lookup data in DynamoDB, put the large data in S3, and combine the two sources when you want to bring a result back.

I need to get a String (a very, very big block of JSON) down to the smallest physical size possible so that it fits under 64 KB. So I am trying to find the most efficient way to encode and decode a String to reduce its size.

As an example, let's say I had this donut JSON, but instead of the 7 toppings there were thousands, enough to create a String larger than 64 KB. What can I do to reduce the size to less than 64 KB (apart from removing white space)?

My motivation is to store it in Amazon DynamoDB, which has a limit of 64 KB per item: http://aws.amazon.com/dynamodb/faqs/#Is_there_a_limit_on_the_number_of_attributes_an_item_can_have

    {
      "id": "0001",
      "type": "donut",
      "name": "Cake",
      "ppu": 0.55,
      "batters": {
        "batter": [
          { "id": "1001", "type": "Regular" },
          { "id": "1002", "type": "Chocolate" },
          { "id": "1003", "type": "Blueberry" },
          { "id": "1004", "type": "Devil's Food" }
        ]
      },
      "topping": [
        { "id": "5001", "type": "None" },
        { "id": "5002", "type": "Glazed" },
        { "id": "5005", "type": "Sugar" },
        { "id": "5007", "type": "Powdered Sugar" },
        { "id": "5006", "type": "Chocolate with Sprinkles" },
        { "id": "5003", "type": "Chocolate" },
        { "id": "5004", "type": "Maple" }
      ]
    }
-
Answer:
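I'd probably use java.util.zip. For the kind of string you're talking about, you'll probably compress it by a factor of at least 10 and perhaps more. The code is pretty simple (it needs imports from java.io, java.util.zip and java.nio.charset.StandardCharsets):

    ByteArrayOutputStream buffer = new ByteArrayOutputStream(65536);
    ZipOutputStream z = new ZipOutputStream(buffer);
    z.putNextEntry(new ZipEntry("data")); // a zip stream needs an entry before any write
    byte[] bytes = string.getBytes(StandardCharsets.UTF_8);
    z.write(bytes, 0, bytes.length);
    z.close(); // finishes the entry and flushes the compressed data
    int n = buffer.size();

Now buffer.toByteArray() contains your bytes. If n > 65536, then you're screwed and you need to think of a different solution.

You reverse the process to get your string back:

    InputStream is = new ByteArrayInputStream(buffer.toByteArray());
    ZipInputStream z = new ZipInputStream(is);
    z.getNextEntry(); // position the stream at the single entry
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] chunk = new byte[65536];
    int n;
    while ((n = z.read(chunk, 0, chunk.length)) != -1) out.write(chunk, 0, n); // one read may not return everything
    String string = new String(out.toByteArray(), StandardCharsets.UTF_8);

If that's not sufficient, you need to parse the JSON string into a real data structure and store that instead. That will be even more efficient in both space and time, and it is a better way to access the data than error-prone access by name, which forces a lot of code changes every time the JSON format changes. But the zip approach is simple, generic, and already solved.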
Joshua Engel at Quora
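The compress-and-check idea in the answer above can be tried end to end with a small program. The following is only a sketch, not code from the answer: it swaps in GZIPOutputStream (also in java.util.zip) because it needs no zip entry, and the class name GzipRoundTrip, its method names, and the generated test JSON are all made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not from the original answer.
class GzipRoundTrip {
    static final int LIMIT_BYTES = 65536; // the 64 KB item limit from the question

    static byte[] compress(String text) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buffer)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        } // closing the stream writes the gzip trailer
        return buffer.toByteArray();
    }

    static String decompress(byte[] data) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(data))) {
            byte[] chunk = new byte[8192];
            int n;
            // Loop until end of stream; a single read may return only part of the data.
            while ((n = gz.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Build a repetitive JSON document well over 64 KB (made-up data).
        StringBuilder json = new StringBuilder("{\"topping\":[");
        for (int i = 0; i < 5000; i++) {
            if (i > 0) json.append(',');
            json.append("{\"id\":\"").append(5000 + i).append("\",\"type\":\"Glazed\"}");
        }
        json.append("]}");

        byte[] packed = compress(json.toString());
        System.out.println("raw chars: " + json.length() + ", compressed bytes: " + packed.length);
        System.out.println("fits under limit: " + (packed.length <= LIMIT_BYTES));
        System.out.println("round trip ok: " + decompress(packed).equals(json.toString()));
    }
}
```

How much the data shrinks depends heavily on how repetitive the real JSON is, so it is worth measuring with representative records before relying on any particular ratio.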
Other answers
If your JSON has a fixed structure, the best way would be to use a popular binary serialization tool to store your data in DynamoDB. For instance, you can define a protocol buffer message (http://code.google.com/p/protobuf/) or a Thrift struct (http://thrift.apache.org/) and get a very compact binary serialization.
Soheil Hassas Yeganeh
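To give a feel for why a binary encoding of a fixed structure is smaller than JSON text, here is a hand-rolled sketch using only the JDK's DataOutputStream. It is not Protocol Buffers or Thrift, and it lacks their schema evolution and variable-length integers; the class BinaryToppingCodec and its layout (a count, then an int id and a length-prefixed string per topping) are assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical codec: illustrates fixed-structure binary packing, not protobuf/Thrift.
class BinaryToppingCodec {
    static byte[] encode(Map<Integer, String> toppings) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(toppings.size());            // 4 bytes: number of entries
            for (Map.Entry<Integer, String> e : toppings.entrySet()) {
                out.writeInt(e.getKey());             // 4 bytes: id as a number, no key name
                out.writeUTF(e.getValue());           // 2-byte length + modified UTF-8 bytes
            }
        }
        return buffer.toByteArray();
    }

    static Map<Integer, String> decode(byte[] data) throws IOException {
        Map<Integer, String> toppings = new LinkedHashMap<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                int id = in.readInt();
                toppings.put(id, in.readUTF());
            }
        }
        return toppings;
    }

    public static void main(String[] args) throws IOException {
        Map<Integer, String> toppings = new LinkedHashMap<>();
        toppings.put(5001, "None");
        toppings.put(5002, "Glazed");
        byte[] packed = encode(toppings);
        System.out.println("encoded bytes: " + packed.length);
        System.out.println("decoded: " + decode(packed));
    }
}
```

The saving comes from dropping the repeated key names, quotes and braces; a real serialization library would add versioning and tooling on top of the same idea.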
In Java, strings are stored as UTF-16 in memory. You might be able to reduce a string's size if you create your own class that stores strings as UTF-8. But in this case you're going to dump the dictionary into DynamoDB, so the size of the string inside Java is irrelevant, and so is the language you do the processing in. The only shortcut you could take is to remove the keys and assume that in every JSON document the order and meaning of every value is the same. You could probably implode the topping part of the dictionary into a string like "(id,type)5001|None,5002|Glazed,5005|Sugar...".
Cristian Andreica
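The "implode" idea from the answer above could look roughly like this. It is a sketch only: the class ToppingPacker and its methods are hypothetical, and it assumes that no id or type contains a comma or a pipe character, since those are used as separators.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: packs id/type pairs into "(id,type)id|type,id|type,...".
class ToppingPacker {
    static final String HEADER = "(id,type)";

    static String implode(Map<String, String> toppings) {
        StringJoiner joiner = new StringJoiner(",", HEADER, "");
        for (Map.Entry<String, String> e : toppings.entrySet()) {
            // Assumes neither value contains ',' or '|'.
            joiner.add(e.getKey() + "|" + e.getValue());
        }
        return joiner.toString();
    }

    static Map<String, String> explode(String packed) {
        Map<String, String> toppings = new LinkedHashMap<>();
        String body = packed.substring(HEADER.length());
        if (body.isEmpty()) {
            return toppings;
        }
        for (String pair : body.split(",")) {
            int bar = pair.indexOf('|');
            toppings.put(pair.substring(0, bar), pair.substring(bar + 1));
        }
        return toppings;
    }

    public static void main(String[] args) {
        Map<String, String> toppings = new LinkedHashMap<>();
        toppings.put("5001", "None");
        toppings.put("5002", "Glazed");
        toppings.put("5005", "Sugar");
        String packed = implode(toppings);
        System.out.println(packed); // prints (id,type)5001|None,5002|Glazed,5005|Sugar
        System.out.println(explode(packed).equals(toppings));
    }
}
```

The packed form drops the repeated "id" and "type" keys, but every reader and writer has to agree on the field order, which is the trade-off the answer describes.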
Using Snappy (http://code.google.com/p/snappy-java/) and Base64 encoding works nicely. Note that Base64 makes the encoded text about a third larger than the compressed bytes.

    // Needs java.io.IOException, java.nio.charset.StandardCharsets,
    // java.util.Base64 and org.xerial.snappy.Snappy.
    public String squash(String squashThis) throws IOException {
        byte[] compressed = Snappy.compress(squashThis.getBytes(StandardCharsets.UTF_8));
        return Base64.getEncoder().encodeToString(compressed);
    }

    public String unSquash(String unSquashThis) throws IOException {
        byte[] uncompressed = Snappy.uncompress(Base64.getDecoder().decode(unSquashThis));
        return new String(uncompressed, StandardCharsets.UTF_8);
    }
Matt Wood
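For readers who want to try the same String-in, String-out pattern without adding a dependency, here is a sketch that swaps Snappy for the JDK's Deflate streams. It is not the code from the answer above: the class name DeflateSquasher is made up, and Deflate usually trades some speed for a smaller result compared with Snappy.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical stand-in for the Snappy version, using only JDK classes.
class DeflateSquasher {
    static String squash(String text) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DeflaterOutputStream out = new DeflaterOutputStream(buffer)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } // closing finishes the compressed stream
        // Base64 output is 4 characters for every 3 compressed bytes (rounded up).
        return Base64.getEncoder().encodeToString(buffer.toByteArray());
    }

    static String unSquash(String squashed) throws IOException {
        byte[] compressed = Base64.getDecoder().decode(squashed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InflaterInputStream in = new InflaterInputStream(new ByteArrayInputStream(compressed))) {
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        StringBuilder json = new StringBuilder();
        for (int i = 0; i < 2000; i++) {
            json.append("{\"id\":\"5002\",\"type\":\"Glazed\"},"); // made-up repetitive data
        }
        String squashed = squash(json.toString());
        System.out.println("original chars: " + json.length() + ", squashed chars: " + squashed.length());
        System.out.println("round trip ok: " + unSquash(squashed).equals(json.toString()));
    }
}
```

If the store accepts raw binary values, skipping the Base64 step avoids the size increase it adds.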