Multiple domains for similar content?

How do social networking sites (Facebook, Twitter) block domains or URLs in status updates and posts to control spam or porn? Is the blocking based on specific criteria or patterns, or simply on the content of the blocked site?

  • What products or software provide feeds for this kind of filtering? Do these products crawl content to build such feeds, or are sites blacklisted because of the content they host? Is there any open-source or free feed available for controlling spam in social networks?

  • Answer:

    Most companies' anti-spam efforts are proprietary -- and we know the bad guys are watching -- so we can't give you their exact "secret sauce," but hopefully the following gives you a good indication of where to begin (full disclosure: I am the co-founder of Impermium (http://www.impermium.com), a social spam protection service).

    There also isn't a one-size-fits-all solution, because:

      • Each social site has a different architecture and unique features
      • The structure of content -- from status updates to wall posts to tweets -- is highly variable
      • Attacks/campaigns are often launched from the inside, from fraudulent and/or compromised user accounts
      • The overlap with traditional e-mail spam is nebulous

    Given these complexities, here are some tactical approaches that should prove effective for you, whether you build them yourself or employ a service like ours:

      • To answer your specific question about URLs: yes, feeds do exist from the e-mail anti-spam world, including http://Spamhaus.org and http://URIBL.com (and others). Unfortunately, the overlap between e-mail spam and social spam is fairly weak, so coverage will vary with your specific use case. These lists are built mostly from e-mail spamtrap hits (e-mail accounts that receive only spam), which are then crawled or otherwise analyzed to determine blacklist-worthy content. (A lookup sketch follows this answer.)
      • Beyond feeds of known-bad URLs, your service must proactively analyze new URLs across many dimensions -- not just the target and whether it's been seen before, but age, geography, inlinks/outlinks, frequency, and commonality with previous examples -- to judge reputation and perniciousness. (A feature-extraction sketch follows.)
      • The content surrounding URLs must also be analyzed closely and correlated with the user posting it: while the link is certainly important, so is what else the poster said ("Buy Cheap ___" versus "Here's a viral video: ___"), and so is what else we know about that user. Every post must be compared with a realtime database of emerging attacks along a number of pivots, and weighed against a global user-reputation database to detect deviations that might indicate a novel attack pattern. (A similarity/reputation sketch follows.)
      • Behavioral circumstances and metadata provide invaluable clues: many attributes and details of each post point to its type, so analyzing the who/what/when/where/why and how of a post's creation is essential. The methodology the bad guys use -- botnets, scripts, and other forms of automation -- leaves "fingerprints" that can be detected in the content. (A timing-analysis sketch follows.)

    Hopefully this provides some guidance. Please let me know if you have any follow-up questions. Good luck!
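
To make the feed lookup concrete, here is a minimal Python sketch of a DNSBL-style domain query against the Spamhaus DBL and URIBL zones. It assumes the standard DNSBL convention (a listed domain resolves to an address in 127.0.0.0/8, with the exact code encoding the listing reason); note that the public zones shown are rate-limited, so high-volume production use requires the operators' data-feed services.

```python
import socket

# DNS-based domain blocklists (DNSBLs). A listed domain resolves to an
# address in 127.0.0.0/8; an unlisted one returns NXDOMAIN. Public
# mirrors rate-limit queries, so volume users need paid data feeds.
BLOCKLIST_ZONES = [
    "dbl.spamhaus.org",   # Spamhaus Domain Block List
    "multi.uribl.com",    # URIBL multi zone
]

def lookup_domain(domain: str) -> dict:
    """Check one domain against each blocklist zone.

    Returns a mapping of zone -> return code string (None = not listed).
    """
    results = {}
    for zone in BLOCKLIST_ZONES:
        query = f"{domain}.{zone}"
        try:
            # A listed domain resolves; NXDOMAIN raises socket.gaierror.
            results[zone] = socket.gethostbyname(query)
        except socket.gaierror:
            results[zone] = None
    return results

if __name__ == "__main__":
    # dbltest.com is Spamhaus's documented DBL test record.
    print(lookup_domain("dbltest.com"))
```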
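
The URL-analysis bullet can be sketched as feature extraction. The features below are illustrative examples, not Impermium's actual signals; domain age and geography are stubbed out because they require external WHOIS/GeoIP sources, and the frequency counter stands in for a real time-windowed store.

```python
from collections import Counter
from urllib.parse import urlparse

# Rolling count of how often each domain has been posted recently;
# in production this would be a time-windowed store, not a Counter.
domain_frequency = Counter()

def url_features(url: str) -> dict:
    """Extract simple reputation features from a URL.

    Lexical features are computed directly; age/geography/link-graph
    features are left as None because they need external data sources.
    """
    parsed = urlparse(url)
    host = parsed.hostname or ""
    domain_frequency[host] += 1
    return {
        "host": host,
        "url_length": len(url),
        "subdomain_depth": host.count("."),
        "digit_ratio": sum(c.isdigit() for c in host) / max(len(host), 1),
        "path_depth": parsed.path.count("/"),
        "recent_frequency": domain_frequency[host],  # burstiness signal
        "domain_age_days": None,    # would come from a WHOIS lookup
        "hosting_country": None,    # would come from a GeoIP database
    }

print(url_features("http://login-paypa1.example.ru/verify/account"))
```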
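
For the content/user correlation step, a minimal sketch: word-shingle similarity against known campaign text, blended with a toy user-reputation table. The campaign list, reputation scores, and 0.7/0.3 weights are all invented for illustration; a real system would use the kind of realtime attack and reputation databases the answer describes.

```python
def shingles(text: str, n: int = 3) -> set:
    """Word n-grams ("shingles") for near-duplicate detection."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def jaccard(a: set, b: set) -> float:
    """Set overlap: 1.0 = identical shingle sets, 0.0 = disjoint."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

# Stand-ins for the "realtime database of emerging attacks" and the
# "global user reputation database" described above.
KNOWN_CAMPAIGNS = [
    "buy cheap meds online no prescription click here",
]
user_reputation = {"alice": 0.9, "spambot42": 0.1}  # 1.0 = trusted

def post_risk(author: str, text: str) -> float:
    """Blend campaign similarity with (inverse) author reputation."""
    post = shingles(text)
    campaign_score = max(
        (jaccard(post, shingles(c)) for c in KNOWN_CAMPAIGNS), default=0.0
    )
    distrust = 1.0 - user_reputation.get(author, 0.5)  # unknown = neutral
    return 0.7 * campaign_score + 0.3 * distrust

print(post_risk("spambot42", "buy cheap meds online no prescription now"))
```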
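
Finally, for behavioral fingerprints, one commonly cited automation signal is posting cadence: scripts and botnets tend to fire on timers, so unusually regular gaps between posts are suspicious. The scoring below is an illustrative heuristic, not a production rule.

```python
import statistics

def automation_score(post_timestamps: list) -> float:
    """Flag suspiciously regular posting cadence.

    Humans post at irregular intervals; automation tends to fire on
    timers, so a very low coefficient of variation in the gaps between
    posts is one (of many) machine "fingerprints".
    """
    if len(post_timestamps) < 3:
        return 0.0  # not enough history to judge
    gaps = [b - a for a, b in zip(post_timestamps, post_timestamps[1:])]
    mean_gap = statistics.mean(gaps)
    if mean_gap == 0:
        return 1.0  # simultaneous posts: almost certainly scripted
    cv = statistics.stdev(gaps) / mean_gap  # coefficient of variation
    return max(0.0, 1.0 - cv)  # 1.0 = perfectly regular, bot-like

# A script posting every 60 seconds scores near 1.0;
# a human's ragged timeline scores much lower.
print(automation_score([0, 60, 120, 180, 240]))
print(automation_score([0, 35, 400, 460, 3000]))
```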

Mark Risher at Quora

