How to extract characters of a particular language?

  • How can I extract only the characters of a particular language from a file that contains those characters mixed with alphanumeric characters and English letters?

  • Answer:

    This depends on a few factors. Is the string encoded as UTF-8? Do you want all non-English characters, including symbols and punctuation marks, or only the letters of written languages? Do you want to capture characters that are non-English or characters that are non-Latin? In other words, would you want characters like é and ç, or only characters outside the alphabets of the Romance and Germanic languages? And finally, what programming language do you want to do this in?

    Assuming that you are using UTF-8, that you don't want basic punctuation but are okay with other symbols, and that you don't want any standard Latin characters but would be okay with accented characters and the like, you could use a regular expression replace function, in whatever language you are using, that strips every ASCII character and keeps the non-ASCII ones. This would eliminate most of what you are probably trying to weed out. In PHP it would be: $string2 = preg_replace('/[\x00-\x7F]+/', '', $string1); However, this also removes line endings, because they are ASCII characters too, which you may or may not want.
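For readers not working in PHP, the same idea can be sketched in Python. This is only an illustration under the assumptions stated above (the text is already decoded from UTF-8, and everything in the ASCII range should go); the function name strip_ascii and the keep_newlines option are hypothetical names introduced here, not part of the original answer.

```python
import re

# Runs of ASCII characters (code points U+0000 to U+007F).
ASCII_RUN = re.compile(r"[\x00-\x7F]+")

# Same range, but with line feed (\x0A) and carriage return (\x0D) left out,
# so line endings survive the substitution.
ASCII_RUN_KEEP_NEWLINES = re.compile(r"[\x00-\x09\x0B\x0C\x0E-\x7F]+")


def strip_ascii(text: str, keep_newlines: bool = False) -> str:
    """Return text with ASCII characters removed (hypothetical helper)."""
    pattern = ASCII_RUN_KEEP_NEWLINES if keep_newlines else ASCII_RUN
    return pattern.sub("", text)


if __name__ == "__main__":
    # Assumed input file name; the file is decoded as UTF-8 on read.
    sample = "Hello \u0928\u092e\u0938\u094d\u0924\u0947 123\ncaf\u00e9!"
    print(strip_ascii(sample))
    print(strip_ascii(sample, keep_newlines=True))
```

With keep_newlines=True the line structure of the input is preserved, which addresses the line-ending caveat above. Accented Latin letters such as é are kept in both cases, since they fall outside the ASCII range; restricting the result to a single script would need a narrower character range than this sketch uses.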

user160002 at Stack Overflow
