Creating custom SQL UDFs on Spark (Hive) and Databricks with Gradle and Java


Recently, I was looking for a good working example of how to create a custom Spark SQL User-Defined Function (UDF) in Java and build it with a modern tool like Gradle.

Usually, my go-to choice for UDFs is Python, but this time I only needed SQL-based functions.

I didn’t manage to find a quick-start guide on Google, so I’ll share one here.

Gradle config

Tested with Gradle version 6.8.

plugins {
    // to bundle a jar with all dependencies we will be using the shadow plugin
    id 'com.github.johnrengelman.shadow' version '6.1.0'
    id 'java'
}

group ''
version '1.1-SNAPSHOT'

repositories {
    mavenCentral()
}

dependencies {
    // these dependencies are (or should be) available on the cluster itself,
    // so no need to bundle them
    compileOnly 'org.apache.hadoop:hadoop-client:3.3.0'
    compileOnly 'org.apache.hive:hive-exec:3.1.2'

    // an external lib that we need to include in the JAR
    implementation group: 'com.neovisionaries', name: 'nv-i18n', version: '1.28'

    testImplementation 'org.junit.jupiter:junit-jupiter-api:5.7.0'
    // hive-exec is needed to make the tests run
    testImplementation 'org.apache.hive:hive-exec:3.1.2'
    testRuntimeOnly 'org.junit.jupiter:junit-jupiter-engine:5.7.0'
}

test {
    // run tests with the JUnit 5 platform
    useJUnitPlatform()
}

Java UDF

A simple UDF example that works with a single data type:


import com.neovisionaries.i18n.CountryCode;
import org.apache.hadoop.hive.ql.exec.UDF;

public class StarterUdf extends UDF {
    // convert an ISO 3166-1 alpha-2 country code (e.g. "CA") to alpha-3 ("CAN")
    public String evaluate(String code) {
        return CountryCode.getByAlpha2Code(code).getAlpha3();
    }
}
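As a quick sanity check of the expected values, the same alpha-2 to alpha-3 conversion can be reproduced with the JDK alone (no Hive or nv-i18n on the classpath). This is just an illustration; the `CountryCodes` class and `toAlpha3` method are made-up names, not part of the UDF above:

```java
import java.util.Locale;
import java.util.MissingResourceException;

// JDK-only sketch of the alpha-2 -> alpha-3 mapping that the UDF wraps
public class CountryCodes {
    static String toAlpha3(String alpha2) {
        if (alpha2 == null) {
            return null;
        }
        try {
            // Locale resolves ISO 3166-1 alpha-2 codes to their alpha-3 form
            return new Locale("", alpha2).getISO3Country();
        } catch (MissingResourceException e) {
            // unknown country code
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(toAlpha3("CA")); // CAN
    }
}
```

Handy for double-checking which outputs the tests below should assert on.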

Java Generic UDF

Using GenericUDF, it’s possible to make the UDF handle multiple different types, process complex (nested) types, and so on.

Showing a full example is outside the scope of this post, but here is a copy-pastable skeleton of the generic version.

import com.neovisionaries.i18n.CountryCode;
import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;

public class StarterGenericUdf extends GenericUDF {
    @Override
    public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException {
        // declare that the function returns a string; a full implementation
        // would also validate the argument count and types here
        return PrimitiveObjectInspectorFactory.javaStringObjectInspector;
    }

    @Override
    public Object evaluate(DeferredObject[] arguments) throws HiveException {
        return CountryCode.getByAlpha2Code(arguments[0].get().toString()).getAlpha3();
    }

    @Override
    public String getDisplayString(String[] children) {
        // shown in EXPLAIN output
        return "StarterGenericUdf(" + String.join(", ", children) + ")";
    }
}

A few simple tests are in order too:

import org.junit.jupiter.api.Test;

import static org.junit.jupiter.api.Assertions.*;

class SampleTest {
    StarterUdf fun = new StarterUdf();

    @Test
    public void two_letter_code_should_be_converted_to_iso_code() {
        String outcome = fun.evaluate("CA");
        assertEquals("CAN", outcome);
    }
}

Using in SQL

To build the final JAR we will be using the Shadow plugin:

./gradlew shadowJar
ls build/libs

Finally, we can upload the JAR to the FileStore (on DBFS) and use the following on Spark SQL (Databricks) to create a function:

CREATE OR REPLACE FUNCTION convert_country_code AS ''  -- fully qualified UDF class name goes in the quotes
    USING JAR 'dbfs:/FileStore/spark_sql_udf_starter_1_5_SNAPSHOT_all.jar';
-- REFRESH FUNCTION convert_country_code;

SELECT convert_country_code("CA") AS iso;
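Once registered, the function behaves like any built-in and can be applied to a column. A sketch of that usage, where the `customers` table and `country` column are hypothetical names:

```sql
-- apply the UDF to a column; `customers` and `country` are made-up names
SELECT country, convert_country_code(country) AS iso3
FROM customers
LIMIT 10;
```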


The full example can be found on GitHub.