How do I use WordNet in Python?

How do I use NumPy in my Python UDFs for Pig? Assume that I don't have it installed on Hadoop and won't get permission to install it. I'm not looking for pip install, but for some workaround such as adding NumPy as Python classes.

  • Answer:

    NumPy is largely written in C. Python UDFs in Pig have limited functionality, and one limitation is that almost no C library can be used: Pig runs Python UDFs through Jython, which cannot load external C extension modules. So, for now, there is no good way to use NumPy in a Jython-based Python UDF for Pig. Reference: http://blog.mortardata.com/post/62334142398/hadoop-python-pig-trunk

    However, if your problem can be solved with Hadoop streaming (https://www.google.com/search?q=hadoop+streaming+python), life is much easier. A typical case is a single large dataset on HDFS plus some small files that can be shipped along with the streaming job. I use it every day. Streaming runs an ordinary Python installation, so NumPy can be installed and used there, and NLTK works well too. Shipping large files to Hadoop streaming can be tricky, but Hadoop soft links (symlinks) can serve as a workaround.

    CPython UDFs are another option. Support for them in Pig was published recently; please refer to the link above.

    In short, Jython-based Python UDFs in Pig have difficulty using NumPy, but Hadoop streaming works like a charm, and CPython UDFs in Pig are another choice.
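To make the Hadoop streaming route concrete, here is a minimal sketch of a streaming mapper that uses NumPy. It assumes NumPy is importable by the Python interpreter on the task nodes, and it assumes a hypothetical input layout of `key<TAB>v1,v2,...` per line; the function names `summarize` and `main` are illustrative, not part of any Hadoop or Pig API.

```python
#!/usr/bin/env python
"""Hadoop streaming mapper sketch: per-key mean and standard deviation."""
import sys

import numpy as np  # assumption: available to Python on the task nodes


def summarize(line):
    """Turn 'key<TAB>v1,v2,...' into 'key<TAB>mean<TAB>std'.

    The record layout is a hypothetical example, not a Hadoop requirement.
    """
    key, raw_values = line.rstrip("\n").split("\t", 1)
    values = np.array([float(v) for v in raw_values.split(",")])
    # values.std() is the population standard deviation (ddof=0).
    return "%s\t%.4f\t%.4f" % (key, values.mean(), values.std())


def main(stream=sys.stdin):
    # Streaming feeds input records on stdin and collects results from stdout.
    for line in stream:
        if line.strip():  # skip blank lines
            print(summarize(line))


if __name__ == "__main__":
    main()
```

Because a streaming mapper only reads stdin and writes stdout, the script can be checked locally by piping a few sample lines into it before submitting it as the mapper of a streaming job.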

Hongliang Liu at Quora