Looking for mostly text .tif files
I'm looking to download a large number (at least 500) of mostly text .tif files to use in developing a piece of software. A WikiLeaks email data set would be ideal, but I haven't been able to find one as .tif files.
Answer:
I have over 6000 pages of evidence from the investigation into Lisa McPherson's death at http://www.lisafiles.com/ (formerly http://projects.metafilter.com/3202/The-Lisa-McPherson-Files), all in TIF format. Most are typed, but some (especially the Scientology-produced documents) are handwritten. You'd find mostly typed materials at http://www.lisafiles.com/police/index.html. For example, the page http://www.lisafiles.com/0708.html links to http://www.lisafiles.com/lf.cgi?http://www.kristi-wachter.com/lisafiles/07/0708.tif. If these would be useful to you, feel free to MeMail me if I can make them easier for you to download. If you have Dropbox or an FTP directory, I'd be happy to send you the whole collection of TIFs, or the subset that'd be most useful to you.
rakish_yet_centered at Ask.Metafilter.Com
Other answers
Scanned? I meant no...
rakish_yet_centered
Call your county recorder. Most have gone digital, and since these are public records available to anyone, there are no privacy issues about disclosure, so they may be able to just dump a bunch onto a thumb drive or CD for you.
AzraelBrown
Also, the free version of Bullzip's PDF printer (http://www.bullzip.com/products/pdf/info.php) will let you print almost anything to a TIFF, with the option to save as Group4, 1-bit images, like a fax, which is what most OCR software prefers. So, load a large WikiLeaks file, print it to a TIFF using Bullzip, and you'll have your TIFF version.
AzraelBrown
What are the parameters of what you need? Does it need to be scanned? If not, there are tons of utilities that will export a PDF as a series of TIFFs, and PDFs abound.
supercres
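As a rough illustration of the "export a PDF as a series of TIFFs" route, here is a minimal Python sketch that shells out to ImageMagick. It assumes ImageMagick (with Ghostscript for PDF support) is installed and on the PATH; the function names, the 300 dpi default, and the output naming pattern are hypothetical choices for this example, not something from the thread.

```python
import shutil
import subprocess
from pathlib import Path


def pdf_to_tiff_command(pdf_path, out_dir, dpi=300, exe="magick"):
    """Build an ImageMagick command that writes one 1-bit Group 4 TIFF per PDF page."""
    pdf_path = Path(pdf_path)
    # %04d is replaced by ImageMagick with the page (scene) number.
    out_pattern = Path(out_dir) / f"{pdf_path.stem}-%04d.tif"
    return [
        exe,
        "-density", str(dpi),    # rasterization resolution; set before reading the PDF
        str(pdf_path),
        "-monochrome",           # reduce each page to black and white
        "-compress", "Group4",   # fax-style compression for bilevel images
        "+adjoin",               # write one file per page, not a multi-page TIFF
        str(out_pattern),
    ]


def convert_pdf(pdf_path, out_dir, dpi=300):
    """Run the conversion; assumes an ImageMagick binary is on the PATH."""
    # ImageMagick 7 ships "magick"; older installs use "convert".
    exe = shutil.which("magick") or shutil.which("convert")
    if exe is None:
        raise RuntimeError("ImageMagick not found on PATH")
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(pdf_to_tiff_command(pdf_path, out_dir, dpi, exe), check=True)
```

Calling convert_pdf("some.pdf", "tiffs") would then leave numbered .tif files in the tiffs directory. Note that pages produced this way are rendered rather than scanned, so they will be cleaner than real document-review scans.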
Getting PDFs first? Could do that, but it isn't optimal. When I poked around WikiLeaks I only found emails embedded in HTML pages. I like the county recorder idea, though. Scanned? Yes. It's a document review program that converts tif to txt, creates a thumbnail, that sort of thing.
rakish_yet_centered
If you drop the "TIFF" requirement up front, you might get more sources. ImageMagick will convert a batch of images to whatever format you want.
cmiller
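A batch conversion like the one described can be sketched in Python around ImageMagick's mogrify tool. This assumes ImageMagick 7, where mogrify is invoked as "magick mogrify" (on ImageMagick 6 the command is plain "mogrify"); the function names and the "*.png" default are hypothetical and only for illustration.

```python
import subprocess
from pathlib import Path


def mogrify_command(sources, out_dir, fmt="tif", exe="magick"):
    """Build a command that converts every file in sources to fmt, writing into out_dir."""
    sources = [str(s) for s in sources]
    if not sources:
        raise ValueError("no input files given")
    return [
        exe, "mogrify",
        "-format", fmt,         # target format, taken from the file extension name
        "-path", str(out_dir),  # write results here instead of next to the originals
        *sources,
    ]


def convert_directory(src_dir, out_dir, pattern="*.png", fmt="tif"):
    """Convert all files in src_dir matching pattern; assumes ImageMagick 7 on the PATH."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    sources = sorted(Path(src_dir).glob(pattern))
    subprocess.run(mogrify_command(sources, out_dir, fmt), check=True)
```

Writing to a separate output directory with -path keeps the downloaded originals untouched, which is handy if the conversion settings need to change later.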
The .tif is not really a requirement. I use ImageMagick or GraphicsMagick (I forget which) to convert from image to text and image. But TIFF is better, because that's what the people I know actually used for document review projects. You're right though; in the end I'll probably be using PDFs.
rakish_yet_centered
http://archive.org/details/opensource_English They are in PDF, but it should not be difficult to separate the PDFs into individual images.
demiurge
Internet Archive is a good idea. It's not exactly what I was looking for, but it might have to do.
rakish_yet_centered