What is the most efficient way to serialize in Python?
For example, before you write/set an object to memcached from Python, what is the most efficient way to serialize and deserialize that object?
Answer:
Efficiency is a somewhat overloaded term. The original question mentions memcached, so my take on efficiency is purely about the size of the serialized object, because of I/O latency and shared resource usage. I can't agree with the cPickle recommendations because they define efficiency as the time spent serializing and deserializing. In the context of this question, CPU time for serialization and deserialization is the least of your concerns, because network I/O latency will typically be the single largest contributor to your overall latency.
The most space-efficient approach will always be a custom serialization scheme for each data structure, but that would be a maintenance headache and would require significant development time for every new data structure. So I'd suggest choosing a well-known, time-tested, cross-language serialization scheme: it lowers management costs and still lets you consume data from (or contribute data to) another platform. Both Protocol Buffers [1] and Avro [2] can produce smaller output than Python's pickle protocol 2 (pickle or cPickle, it doesn't matter).
You can also compress the serialized object for additional space savings. DEFLATE or gzip come to mind: they offer significant compression with an acceptable CPU and memory footprint. If CPU usage is a concern, LZO is at least an option.
On the other hand, the Protobuf and Avro implementations for Python are written in pure Python, and Protobuf in particular can be notoriously slow. The cJSON implementation [3] for Python is native (as the name suggests) and really fast (the author claims it is up to 250 times faster than a pure-Python implementation). Its compressed output will still be a tad larger than Protobuf or Avro output, but the run-time difference can reach 70X. That can matter if you are serializing fairly large objects (> 128K) very frequently.
[1] http://code.google.com/apis/protocolbuffers/docs/overview.html
[2] http://avro.apache.org/docs/current/
[3] http://pypi.python.org/pypi/python-cjson
Berk D. Demir at Quora
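The size argument in the answer above is easy to check on your own data. The following is a minimal sketch, not something from the original answer: it uses only the Python 3 standard library (where `pickle` picks up the C accelerator that Python 2 exposed as cPickle), and it leaves out Protobuf and Avro because they need third-party packages and a schema. The `serialized_sizes` helper and the `sample` object are hypothetical names made up for illustration.

```python
import json
import pickle
import zlib


def serialized_sizes(obj):
    """Return byte counts for two stdlib encodings of obj, raw and DEFLATE-compressed."""
    encodings = {
        # Protocol 2 is the pickle protocol the answer refers to.
        "pickle_p2": pickle.dumps(obj, protocol=2),
        # Compact separators drop the spaces json.dumps adds by default.
        "json": json.dumps(obj, separators=(",", ":")).encode("utf-8"),
    }
    sizes = {}
    for name, payload in encodings.items():
        sizes[name] = len(payload)
        # zlib.compress produces a DEFLATE stream; level 6 is a middle setting.
        sizes[name + "+zlib"] = len(zlib.compress(payload, 6))
    return sizes


# Hypothetical sample object; substitute whatever you actually cache.
sample = {"user_id": 12345, "tags": ["a", "b", "c"] * 50, "scores": list(range(200))}

for name, size in sorted(serialized_sizes(sample).items()):
    print(name, size)
```

Results depend heavily on the shape of the data, so measure with objects that resemble what you actually store before picking a format.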
Other answers
Look at cPickle: http://docs.python.org/release/2.5/lib/module-cPickle.html
Giorgi Lekveishvili
http://msgpack.org/ sounds promising, but I have no first-hand experience with it. From its homepage: "MessagePack is a binary-based efficient object serialization library. It enables to exchange structured objects between many languages like JSON. But unlike JSON, it is very fast and small."
Joonathan Mägi
Something that is often overlooked: JSON is sometimes a perfectly OK way to serialize things. It's human-readable, usable from many languages, and plain text, so it compresses and stores pretty efficiently. The limitation is in getting back the original object instances. If you serialize a SQLAlchemy object instance by casting it to a dict(), you'll get back a dict, not a SQLAlchemy object. So it's not perfect for all use cases, but something to think about.
Rick Harding
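The round-trip limitation described above can be seen in a few lines. This is only a sketch: the `User` class with its `to_dict` and `from_dict` methods is a hypothetical stand-in for an ORM object, not SQLAlchemy's API.

```python
import json


class User:
    """Hypothetical stand-in for an ORM-mapped object."""

    def __init__(self, name, email):
        self.name = name
        self.email = email

    def to_dict(self):
        # Only plain, JSON-compatible fields survive serialization.
        return {"name": self.name, "email": self.email}

    @classmethod
    def from_dict(cls, data):
        # Rebuilding the instance is an explicit step you have to write yourself.
        return cls(data["name"], data["email"])


user = User("rick", "rick@example.com")
payload = json.dumps(user.to_dict())

restored = json.loads(payload)      # a plain dict, not a User
rebuilt = User.from_dict(restored)  # a User again, via the explicit constructor

print(type(restored).__name__, type(rebuilt).__name__)
```

pickle restores the instance for you, at the cost of a Python-only format; with JSON the reconstruction step has to be explicit, as in `from_dict` here.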
A problem I was just forced to figure out today: if you care about latency for large dictionaries (on the order of 500 MB serialized), I found that simplejson was about twice as fast as cPickle. I also found that lists are 5 times faster to serialize than sets with either method.
Guido Bartolucci
For larger data sizes (10-100 GB) I found marshal (http://docs.python.org/library/marshal.html) to be significantly faster than cPickle. However, as Berk points out, for memcached you probably care more about size than encode/decode time. I'd give all of the built-in methods (cPickle, marshal, and json) a good look to see whether they're good enough; you might as well avoid an external dependency if you can.
Parand Tony Darugar
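The timing claims in the last two answers depend on the data and the Python version, so it is worth measuring the built-in options yourself. Below is a rough sketch under stated assumptions: it targets Python 3 (where `pickle` stands in for cPickle and the standard `json` module for simplejson), and the `ENCODERS` table, the `benchmark` helper, and the `data` object are hypothetical names chosen for illustration.

```python
import json
import marshal
import pickle
import timeit

# Each entry maps a name to a (dumps, loads) pair that works on bytes.
ENCODERS = {
    "pickle": (lambda o: pickle.dumps(o, protocol=pickle.HIGHEST_PROTOCOL), pickle.loads),
    "marshal": (marshal.dumps, marshal.loads),
    "json": (lambda o: json.dumps(o).encode("utf-8"), json.loads),
}


def benchmark(obj, number=20):
    """Return {name: (encode_seconds, decode_seconds, size_bytes)} for obj."""
    results = {}
    for name, (dumps, loads) in ENCODERS.items():
        payload = dumps(obj)
        # timeit runs each callable `number` times and returns total seconds.
        encode_time = timeit.timeit(lambda: dumps(obj), number=number)
        decode_time = timeit.timeit(lambda: loads(payload), number=number)
        results[name] = (encode_time, decode_time, len(payload))
    return results


# Hypothetical workload: a dictionary of small integer lists.
data = {"key%d" % i: list(range(10)) for i in range(5000)}

for name, (enc, dec, size) in benchmark(data).items():
    print("%-8s encode %.4fs  decode %.4fs  %d bytes" % (name, enc, dec, size))
```

One design caveat: the marshal format is not guaranteed to stay compatible across Python versions, so it is a risky choice for cached values that may outlive an interpreter upgrade.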