Twitter API: What is the best data storage mechanism and client library for analysing tweets using Python?
-
This is my first Quora question, so please have patience ;). I am currently doing some (minor) research on Twitter, so I need to produce statistics for a set of Twitter accounts and also record their tweets over a given time period. Right now I am using Python and Tweepy to get information about a specific account, but I haven't yet figured out how to record the tweets. Provided I stick with Python:
- Is the database that ships with Python (SQLite) the right tool to store all this data?
- Which Twitter API library/wrapper would you recommend for my plans? Twython? Tweepy?
-
Answer:
Tiny scale: SQLite3.
Small scale: MySQL, or whatever fancy NoSQL solution you'd like to use, such as Tokyo Cabinet or Redis.
Medium scale: more MySQL servers, multiple Redis servers, Cassandra.
Large scale: HDFS + Hadoop with Pig jobs on many EC2 servers.
The last option is one of the many ways that we analyze tweets.
John Adams at Quora
Other answers
It depends on how much data you intend to collect, and how you intend to then share that data. SQLite should be fine if you are collecting thousands of tweets; if you are collecting millions it will probably still be OK, but you might be better off with something else. If you plan to put a web interface in front of the app which other people get to use, you should go for a database designed for concurrent access (MySQL, MongoDB, PostgreSQL, etc.) instead of SQLite. From your question it sounds like SQLite will be fine, though. I haven't used Tweepy myself, but if it works for you then stick with it. The Twitter HTTP APIs are pretty painless to talk to directly provided you have a good OAuth library (I use python-oauth2), and that might give you more flexibility, but don't bother with that unless the library you are already using starts to get in your way.
Simon Willison
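For reference, a minimal sketch of the SQLite route described above, assuming Tweepy's classic OAuthHandler/API interface; the credentials are placeholders and the table layout is only an illustration:

```python
import sqlite3
import tweepy

# Placeholder credentials -- substitute your own app's keys.
auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")
api = tweepy.API(auth)

# One small table is enough for a minor research dataset.
conn = sqlite3.connect("tweets.db")
conn.execute(
    """CREATE TABLE IF NOT EXISTS tweets (
           id INTEGER PRIMARY KEY,
           screen_name TEXT,
           created_at TEXT,
           text TEXT
       )"""
)

# Fetch the most recent tweets for one account and persist them.
for status in api.user_timeline(screen_name="twitter", count=200):
    conn.execute(
        "INSERT OR IGNORE INTO tweets VALUES (?, ?, ?, ?)",
        (status.id, status.user.screen_name, str(status.created_at), status.text),
    )
conn.commit()
conn.close()
```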
With regard to storage of tweets, several options are available. It can be as simple as storing the JSON to (gzipped) text files, one tweet per line. I would actually strongly recommend this, since you can always import the data into a SQL or NoSQL type of database later. And if the chosen DB doesn't cut it anymore, you can import the data into a more suitable database by simply going back to the source: your text-based files. Having said this, I would also recommend looking into ElasticSearch as a storage system and search index in one. ElasticSearch can provide you with some statistics out of the box (look into facets) and will offer you a simple and very scalable search index and storage system. There are also some great interfaces that will let you browse your ElasticSearch data almost out of the box. Take a look at https://github.com/elasticsearch/kibana and https://github.com/okfn/facetview/.
Erik-Jan van Baaren
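A small sketch of the line-per-tweet approach described above, assuming Python 3; the file name and helper names are purely illustrative:

```python
import gzip
import json

def append_tweets(tweets, path="tweets.jsonl.gz"):
    """Append raw tweet dicts to a gzipped, one-JSON-object-per-line archive."""
    with gzip.open(path, "at", encoding="utf-8") as fh:
        for tweet in tweets:
            fh.write(json.dumps(tweet) + "\n")

def read_tweets(path="tweets.jsonl.gz"):
    """Yield tweets back out of the archive, one dict at a time."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            yield json.loads(line)
```

Because the archive is plain JSON lines, the same reader can later feed a bulk import into SQLite, PostgreSQL, or ElasticSearch.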
Disclaimer: I am one of the contributors to Twython. You can use Twython; I have recently rewritten Twython for Python 2.x using Requests. It is dead simple to create your own Twitter library if you use Requests. I would also advise using async libraries when working with the Twitter search option. TweetDeck was written in Python + Twisted.
Kr Ace Kumar Ramaraju
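A hedged example of the Twython route mentioned above, using Twython's documented search and timeline calls; all credentials are placeholders:

```python
from twython import Twython

# Placeholder application credentials.
twitter = Twython("APP_KEY", "APP_SECRET", "OAUTH_TOKEN", "OAUTH_TOKEN_SECRET")

# Search for recent tweets mentioning a term.
results = twitter.search(q="python", count=100)
for tweet in results["statuses"]:
    print(tweet["user"]["screen_name"], tweet["text"])

# Or pull a specific account's timeline.
timeline = twitter.get_user_timeline(screen_name="twitter", count=200)
```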
My experience: we used Tweepy at Adly to download and analyze information for millions of users and tweets. We built our own custom async client to deal with our streaming needs, and stored aggregate and user information in Redis. Storage-wise, I'd have to agree with Simon that SQLite will do it for you if you have a single process dealing with a few thousand up to a couple million tweets (and have no remote access requirements). Beyond that, I'd recommend PostgreSQL (or MySQL if you have experience with it) if you want a relational interface, or Riak if you want a document store (Riak has secondary indexes now, which should simplify some of the querying). If you are looking to aggregate data, perform analysis, etc., I would generally recommend Redis.
Josiah Carlson
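A rough sketch of the Redis aggregation idea, assuming a local Redis server and the redis-py client; the key names are made up for illustration:

```python
import redis

r = redis.Redis()  # assumes a local Redis server on the default port

def record(tweet):
    """Bump simple per-user and per-hashtag counters for later analysis."""
    r.hincrby("tweets_by_user", tweet["user"]["screen_name"], 1)
    for tag in tweet.get("entities", {}).get("hashtags", []):
        r.hincrby("tweets_by_hashtag", tag["text"].lower(), 1)

# Top ten hashtags seen so far, highest count first.
counts = r.hgetall("tweets_by_hashtag")
top = sorted(counts.items(), key=lambda kv: int(kv[1]), reverse=True)[:10]
```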
I will make it simple:
1. SQL for storage: millions of tweets won't be a problem, and you don't need to go fancy on storage for minor research unless you are trying to build big systems.
2. Tweepy: simple and sleek.
Harshit Sharma
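To address the original question of recording tweets for a set of accounts over a time period, here is a sketch using Tweepy's classic (3.x) streaming interface; the credentials are placeholders, and 783214 happens to be @twitter's numeric user ID:

```python
import json
import tweepy

# Placeholder credentials -- substitute your own app's keys.
auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")

class RecordingListener(tweepy.StreamListener):
    """Append each incoming tweet's raw JSON to a line-per-tweet file."""

    def on_status(self, status):
        with open("stream.jsonl", "a", encoding="utf-8") as fh:
            fh.write(json.dumps(status._json) + "\n")

    def on_error(self, status_code):
        return False  # stop streaming on errors such as rate limiting

# filter(follow=...) takes the numeric user IDs of the accounts under study.
stream = tweepy.Stream(auth=auth, listener=RecordingListener())
stream.filter(follow=["783214"])
```

Run this for the study period, then load the resulting file into SQLite or any of the other stores discussed above.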