UDF
Sometimes the available Spark transformation operations are not sufficient to fulfil a requirement, and you need to run more complex logic on the data to get the desired value. UDFs (user-defined functions) cover this case: they let you write custom logic that is executed in a Pipeline on the input data. The Flex83 platform lets you write UDFs in several programming languages, currently Python, Nodejs, and JavaScript, with more to come.
Let's understand this with an example: converting temperature from Celsius to Fahrenheit using a Python UDF. To create a UDF, go to BigData > UDF and click Add New.
Enter the Basic Information first: Name (the name must be the same as the Python function name), Description, and Interpreter (the programming language), which is Python in our case.
Now create a function in the code editor with the same name as in Basic Information, and write your transformation logic in the function body.
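A minimal sketch of what such a function could look like for the Celsius to Fahrenheit example. The function name `celsius_to_fahrenheit` and the parameter name `celsius` are illustrative assumptions; whatever name you choose must match the Name entered in Basic Information. The null handling and the `float` conversion are also assumptions about how input values might arrive, not documented platform behaviour.

```python
def celsius_to_fahrenheit(celsius):
    """Convert a temperature from Celsius to Fahrenheit."""
    # Assumption: missing readings may arrive as None; pass them through.
    if celsius is None:
        return None
    # Assumption: the value may be a number or a numeric string.
    return float(celsius) * 9.0 / 5.0 + 32.0
```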
To check the validity of the function, click the Debug button; it checks whether the function has any syntax errors.
You can also check whether the function produces the correct result: add a print statement at the end that calls the function with some input value, and remove the print statement before saving the UDF.
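The check described above might look like the following sketch. The function name `celsius_to_fahrenheit` and the sample input `25` are hypothetical choices for illustration; the last line is the temporary print statement that should be removed before saving.

```python
def celsius_to_fahrenheit(celsius):
    # Hypothetical UDF body: Fahrenheit = Celsius * 9/5 + 32.
    return float(celsius) * 9.0 / 5.0 + 32.0


# Temporary check only: remove this line before saving the UDF.
print(celsius_to_fahrenheit(25))  # prints 77.0
```

If the printed value matches what you expect for the sample input, the logic is behaving as intended.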
Once you are satisfied with the results, save the UDF and use it in a Pipeline.
Let's use this UDF in the pipeline created in the previous section (Pipeline Section).
To use this UDF in the pipeline, first stop the pipeline and then edit it, using the actions available for it on the pipeline page.
Add the UDF operation in the Select Definition of the pipeline, as shown in the screenshot below.
Now Debug the pipeline and check the output of this UDF. The last column, named fahrenheit, contains the temperature in Fahrenheit.
Syntax for using UDFs in a pipeline:
Python:
execute_python('<python_function_name>', <function_param>) as <output_column_name>
JavaScript:
execute_js('<js_function_name>', <function_param>) as <output_column_name>
Nodejs:
execute_node('<nodejs_function_name>', <function_param>) as <output_column_name>
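To make the templates concrete, the sketch below fills them in for a sample UDF. The helper `build_udf_expression` is purely illustrative and not part of the platform, and the function name `celsius_to_fahrenheit` and input column `temperature` are assumptions; only the `execute_python`, `execute_js`, and `execute_node` call names and the `fahrenheit` output column come from this page.

```python
# Call names taken from the syntax templates on this page.
UDF_CALLS = {
    "python": "execute_python",
    "javascript": "execute_js",
    "nodejs": "execute_node",
}


def build_udf_expression(interpreter, function_name, param, output_column):
    """Render a Select Definition entry (hypothetical helper, not a platform API)."""
    call = UDF_CALLS[interpreter.lower()]
    return f"{call}('{function_name}', {param}) as {output_column}"


expression = build_udf_expression(
    "python", "celsius_to_fahrenheit", "temperature", "fahrenheit"
)
print(expression)
# prints: execute_python('celsius_to_fahrenheit', temperature) as fahrenheit
```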
Note: UDFs accept only one parameter. To pass more than one value, pass a nested object (a struct or JSON object).
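As a rough illustration of the single-parameter rule, the sketch below takes one nested object and reads two fields from it. The function name `convert_temperature` and the field names `value` and `unit` are hypothetical, and it is an assumption that the object reaches the function either as a dict-like value or as a JSON string; check how your pipeline actually delivers structs before relying on either form.

```python
import json


def convert_temperature(reading):
    """Return the reading in Fahrenheit, given one object with 'value' and 'unit'."""
    # Assumption: the nested object may arrive as a JSON string or as a dict.
    if isinstance(reading, str):
        reading = json.loads(reading)
    value = float(reading["value"])
    unit = str(reading["unit"]).upper()
    if unit == "C":
        return value * 9.0 / 5.0 + 32.0
    if unit == "F":
        return value
    raise ValueError("Unsupported unit: " + unit)
```

Packing both fields into one object keeps the call within the one-parameter limit while still giving the function everything it needs.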