How to copy/paste Cyrillic characters from PDF?

  • For years, I've accepted as fact that Russian-language PDFs don't play nice with other programs. That if you tried to paste copied text into Word or Notepad, you'd get gibberish characters. (I work on two laptops, one with Windows Vista and Acrobat, the other Windows 7 and Reader.) Are there any clever workarounds or programs I should know about? Googling this issue only confused me more. Someone on one forum mentioned installing "freeware" Cyrillic fonts, but searching for that led me to some sketchy sites with (admittedly cool-looking) skateboarder-style graphic fonts, but I can't imagine that would help...

  • Answer:

    Just as a general answer to your question, a clever program for working with Unicode is BabelPad (http://www.babelstone.co.uk/Software/BabelPad.html). If you install the latest version, open it, and go to Tools → Font Analysis..., the top right of the dialog has a dropdown under "List All Unicode Blocks Covered by this Font" where you can pick any font installed on your system. If the font you pick covers Cyrillic, "Cyrillic" will show up in the list beneath.

lily_bart at Ask.Metafilter.Com

Other answers

It means that the PDFs you're having trouble with are probably in a pre-Unicode 8-bit Cyrillic encoding like Windows-1251 (http://en.wikipedia.org/wiki/Windows-1251). So you'd just need a tool to convert from that to Unicode; I'm noticing that at the end of that Wikipedia article there's a link to an online decoder at http://2cyr.com/decode/.
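As a rough sketch of what such a conversion does, assume the gibberish comes from Windows-1251 bytes that were displayed as Windows-1252 (Western European) text. Under that assumption, the repair is to turn the pasted text back into bytes and decode those bytes as Windows-1251. The function name `fix_cp1251_mojibake` is made up for illustration, and if the garbling has a different cause this will not help.

```python
def fix_cp1251_mojibake(garbled: str) -> str:
    """Undo Windows-1251 text that was wrongly decoded as Windows-1252."""
    # Recover the original bytes by reversing the wrong decoding step.
    raw = garbled.encode("cp1252")
    # Decode those bytes with the encoding they were actually written in.
    return raw.decode("cp1251")


# "Привет" in cp1251 is the bytes CF F0 E8 E2 E5 F2,
# which show up as "Ïðèâåò" when read as cp1252.
print(fix_cp1251_mojibake("Ïðèâåò"))
```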

XMLicious

If you go to a Russian-language web site like http://ru.wikipedia.org/ in Internet Explorer and cut and paste into Word or Notepad, do you get gibberish?

XMLicious

Can you link to a PDF that is giving you trouble? Most commonly used fonts contain the full Cyrillic alphabet. You might just be having encoding problems.

hyperbovine

XMLicious, I don't have any problem using Cyrillic fonts elsewhere, it's just trying to copy/paste from PDF. hyperbovine, they're client files so unfortunately I can't share them, but your question made me look for other PDFs to try. I found http://www.minfin.ru/common/img/uploaded/library/2012/12/struktura_dolga_1-01-13.pdf at random, and I can successfully paste it into Word! So does this mean it's an encoding issue on their end?

lily_bart

Something I've done when trying to brute-force encoding problems is to cut-and-paste into a plain old text file (i.e. use Notepad, not Word), and then open that file up in a browser. Some browsers will auto-detect some of the funky encodings, and if not, they have a menu option somewhere that lets you select from a list of encodings. Then just keep trying different encodings until you find the right one. You should then probably be able to cut-and-paste from the browser into Word and have it work.
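The same trial and error can be scripted. The sketch below assumes the pasted gibberish is Cyrillic text that was decoded with the wrong single-byte code page. It tries every pairing of a "wrong" and a "right" encoding and prints each result so you can eyeball which one reads as real Russian. The candidate lists and the name `candidate_decodings` are illustrative choices, not a standard tool.

```python
from itertools import product

# Encodings the text may have been wrongly read as, and ones it may really be in.
# Both lists are assumptions; extend them as needed.
WRONG = ["cp1252", "latin-1", "cp437"]
RIGHT = ["cp1251", "koi8-r", "cp866", "iso8859-5"]


def candidate_decodings(garbled: str):
    """Return (wrong, right, repaired) for every pairing that decodes cleanly."""
    results = []
    for wrong, right in product(WRONG, RIGHT):
        try:
            repaired = garbled.encode(wrong).decode(right)
        except (UnicodeEncodeError, UnicodeDecodeError):
            continue  # this pairing cannot have produced the garbled text
        results.append((wrong, right, repaired))
    return results


for wrong, right, repaired in candidate_decodings("Ïðèâåò"):
    print(f"{wrong} -> {right}: {repaired}")
```

Several pairings will usually decode without error and even produce Cyrillic letters, so the script only narrows the search; someone who reads Russian still has to pick the line that makes sense.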

zengargoyle

http://www.artlebedev.ru/tools/decoder/ automatically converts weird old encodings (cp1251, koi8-r) to UTF-8. Also, tell your client they should upgrade their software and stop using old encodings.
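If the text is already saved in a file and the legacy encoding is known, the conversion to UTF-8 can be sketched in a few lines. This assumes the file really is in the encoding you name; `reencode_to_utf8` and the file names in the usage note are hypothetical.

```python
from pathlib import Path


def reencode_to_utf8(src: str, dst: str, source_encoding: str = "cp1251") -> None:
    """Read a text file in a legacy encoding and write it back out as UTF-8."""
    # Decode the file using the legacy code page it was saved in.
    text = Path(src).read_text(encoding=source_encoding)
    # Write the same text out again, this time encoded as UTF-8.
    Path(dst).write_text(text, encoding="utf-8")
```

For a file saved as KOI8-R, a call might look like `reencode_to_utf8("pasted.txt", "pasted_utf8.txt", "koi8-r")`.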

floatboth
