python - importing a user-defined library in a Redshift UDF
I am trying to import a library inside a user-defined Python function in Redshift. I created a library called nltk as follows:

create or replace library nltk language plpythonu from 's3://nltk.zip' credentials 'aws_access_key_id=*****;aws_secret_access_key=****';

Once it was created, I tried to import it in a function:

create or replace function f_function (sentence varchar)
returns varchar
stable
as $$
    from nltk import tokenize
    token = nltk.word_tokenize(sentence)
    return token
$$ language plpythonu;

(tokenize is a sub-directory inside the nltk library.)
But when I try to run the function against a table, as in:

select f_function(text) from table_txt;

I get an error such as:

Amazon Invalid operation: ImportError: No module named nltk. Please look at svl_udf_log for more information
details:
-----------------------------------------------
error: ImportError: No module named nltk. Please look at svl_udf_log for more information
code: 10000
context: udf
query: 69145
location: udf_client.cpp:298
process: query0_21 [pid=3165]
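The error message points at svl_udf_log. A query along these lines pulls the logged messages for the failing query (column names are my recollection of the Redshift system-table docs; verify them against your cluster):

```sql
-- Inspect recent UDF log messages for the failing query id
-- (69145 is the query id from the error details above).
select query, message, created
from svl_udf_log
where query = 69145
order by created desc;
```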
Can anyone tell me what I am doing wrong?
First, there is an obvious problem in your Python code: you never import nltk, yet you call nltk.word_tokenize.
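A minimal fix, assuming the library itself loads, is to import nltk explicitly in the function body. Note also that word_tokenize returns a list, while the function is declared to return varchar, so joining the tokens back into a string keeps the return type valid (the join is my addition, not part of the original question):

```sql
create or replace function f_function (sentence varchar)
returns varchar
stable
as $$
    import nltk
    token = nltk.word_tokenize(sentence)
    # word_tokenize returns a list; join it so the UDF returns a varchar
    return ' '.join(token)
$$ language plpythonu;
```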
Second, after downloading the nltk package, you need to zip the module folder inside the package and upload that zip to S3 for Redshift.
nltk-x.y.zip
├─ setup.py
├─ requirements.txt
├─ nltk            <- this folder is what should be zipped and uploaded to S3
│  ├─ __init__.py
│  ├─ tokenize.py
│  ...

For Redshift to be able to load the modules, the root folder of the zip should have an __init__.py file. See http://docs.aws.amazon.com/redshift/latest/dg/udf-python-language-support.html
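The repackaging step can be sketched in Python with the standard-library zipfile module (paths and the helper name are illustrative, not part of any Redshift API):

```python
import os
import zipfile

def zip_module_folder(package_dir, module_name, out_zip):
    """Zip only the inner module folder of an unpacked package,
    so the archive root contains e.g. nltk/__init__.py."""
    module_dir = os.path.join(package_dir, module_name)
    with zipfile.ZipFile(out_zip, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(module_dir):
            for name in files:
                path = os.path.join(root, name)
                # archive name relative to package_dir -> "nltk/..."
                zf.write(path, os.path.relpath(path, package_dir))

# Usage (after unpacking the nltk source distribution):
# zip_module_folder("nltk-x.y", "nltk", "nltk.zip")
# then upload nltk.zip to S3 and point CREATE LIBRARY at it.
```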