Freely Accessible Etymology Database? Or tools to help create one?
-
I have an idea for a project that would require the ability to search a dictionary of words and find the year of each word's known introduction (as close as possible). I am aware of etymology-online (love that site), but as far as I'm aware it's just a site, and the compilers don't have a publicly accessible database. Does anybody know of a site that actually WOULD have a freely available database (either queryable via a web API, or downloadable to self-host)? If there isn't one, does anybody have an idea who I might be able to contact? Would it be prudent to approach the etymology online folks? I feel like scraping their pages for the data would be reckless and kind of jerky in terms of bandwidth, but then again, it's mostly text and these days bandwidth is fairly cheap. I suppose it'd be possible to do that as a last resort and script something to pull the data into a database. Any ideas on what would be good tools for that?
-
Answer:
Ask before you scrape. I recommend Python for the scraping. Maybe you can also ask in the open data communities on Reddit and Stack Exchange. Regards and good luck.
symbioid at Ask.Metafilter.Com
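If permission is granted and scraping does become the last resort, the "pull the data into a database" step could look roughly like the sketch below. It is only an illustration under stated assumptions: the bot name, the table layout, and the five-second delay are all made up here, and the actual page fetching and HTML parsing are left out because they depend on the site. It uses only the Python standard library (urllib.robotparser, time, sqlite3).

```python
import sqlite3
import time
import urllib.robotparser
from typing import Iterable, Optional, Tuple

USER_AGENT = "etym-hobby-bot"  # hypothetical bot name, not a real crawler


def allowed_by_robots(robots_txt: str, url: str, agent: str = USER_AGENT) -> bool:
    """Return True if the given robots.txt text permits `agent` to fetch `url`."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


class Throttle:
    """Enforce a minimum delay between requests to keep the load on the site low."""

    def __init__(self, min_interval: float = 5.0) -> None:
        self.min_interval = min_interval  # seconds; an arbitrary, conservative guess
        self._last: Optional[float] = None

    def wait(self) -> None:
        now = time.monotonic()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()


def save_entries(conn: sqlite3.Connection,
                 rows: Iterable[Tuple[str, Optional[int], str]]) -> None:
    """Store (word, first_year, source_url) rows; the schema is an assumption."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS etymology ("
        "word TEXT PRIMARY KEY, first_year INTEGER, source_url TEXT)"
    )
    conn.executemany("INSERT OR REPLACE INTO etymology VALUES (?, ?, ?)", rows)
    conn.commit()
```

The idea would be to call allowed_by_robots() once per URL, call Throttle.wait() before every request, and hand the parsed results to save_entries(). None of this replaces asking the site owner first.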
Other answers
Oh, actually that doesn't have years, so it might NOT be what you want. But it's still awesome.
aubilenon
The Wordnik API at http://developer.wordnik.com/ (docs: http://developer.wordnik.com/docs.html) has etymologies, though they don't (as far as I can tell) generally have an easily extractable year of introduction. For relatively recent words, you could use the Google Ngram data at http://storage.googleapis.com/books/ngrams/books/datasetsv2.html to find the first year a given word appeared in published books.
aparrish
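The Ngram suggestion above could be sketched as follows. This assumes the downloaded 1-gram files are tab-separated lines of the form ngram, year, match_count, volume_count, which should be checked against the dataset page. The function name and the min_count threshold are made up for this example, and entries may carry part-of-speech suffixes (such as _NOUN) that this sketch does not strip.

```python
from typing import Dict, Iterable


def first_years(lines: Iterable[str], min_count: int = 1) -> Dict[str, int]:
    """Map each lowercased word to the earliest year it appears in the data.

    Assumes each line is: ngram <TAB> year <TAB> match_count <TAB> volume_count.
    Rows with fewer than `min_count` matches are ignored, which can filter
    out some one-off OCR or dating errors.
    """
    earliest: Dict[str, int] = {}
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 4:
            continue  # skip malformed lines rather than guess at their meaning
        ngram, year, match_count, _volume_count = parts
        if int(match_count) < min_count:
            continue
        word = ngram.lower()
        y = int(year)
        if word not in earliest or y < earliest[word]:
            earliest[word] = y
    return earliest
```

Because any iterable of lines works, an open file handle can be passed in directly, so a large file never has to be loaded into memory at once. The result is only the first year in the scanned books, not a true date of introduction.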
Etymological Wordnet (http://www1.icsi.berkeley.edu/~demelo/etymwn/) is probably what you want. (Via https://ask.metafilter.com/241101/List-of-simple-word-roots)
aubilenon
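For anyone who downloads the data linked above, a loader might look something like this sketch. It assumes the file is a three-column TSV with lines such as "eng: doghouse<TAB>rel:etymology<TAB>eng: dog"; that layout and the rel:etymology relation name are assumptions to verify against the actual download, and the function names are invented for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def parse_term(field: str) -> Tuple[str, str]:
    """Split a field like 'eng: dog' into ('eng', 'dog')."""
    lang, _, word = field.partition(": ")
    return lang, word


def load_origins(lines: Iterable[str],
                 lang: str = "eng") -> Dict[str, List[Tuple[str, str]]]:
    """Collect, for each word in `lang`, the (language, word) pairs it derives from.

    Only rows whose relation is 'rel:etymology' are kept (assumed relation name).
    """
    origins: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 3:
            continue  # skip anything that is not source, relation, target
        source, relation, target = parts
        if relation != "rel:etymology":
            continue
        src_lang, src_word = parse_term(source)
        if src_lang != lang:
            continue
        origins[src_word].append(parse_term(target))
    return dict(origins)
```

This gives word origins only; there are no dates to extract from it.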
http://ask.metafilter.com/270788/Freely-Accessible-Etymology-Database-Or-tools-to-help-create-one#3931441: "For relatively recent words, you could use Google Ngram data to find the first year a given word appeared in published books." Bad idea; Google's metadata is notoriously unreliable.
languagehat
I think my best bet is to ask the etymonline guy if he has data he's willing to share or that I could pay for, or, if not, whether he minds if I scrape his pages. It looks like most etymology resources don't really have dates. And it doesn't have to be super accurate, just close enough. I wonder how he ended up getting his dates. Perhaps he used Google's ngram stuff and it's just as unreliable?
symbioid