HIVE-29419: Provide a Hive-specific docker image for Tez AM #6435

Open
abstractdog wants to merge 1 commit into apache:master from abstractdog:HIVE-29419-tez-am-image

@abstractdog (Contributor) commented Apr 15, 2026

What changes were proposed in this pull request?

Make the Hive image able to start a Tez AM in LLAP mode that can assign tasks to the LLAP daemons.

Why are the changes needed?

Because it's the next step toward a fully distributed, Dockerized environment for Hive.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Manually, steps are below.

Needs Hive 4.3.0-SNAPSHOT jars that contain recent changes (specifically HIVE-29477):

# assuming you are on Hive master, or rather on the code of this PR:
mvn clean install -DskipTests -Pdist # this will take a long time, sorry; we need a snapshot jar
cp packaging/target/apache-hive-4.3.0-SNAPSHOT-bin.tar.gz packaging/cache

start cluster:

export HIVE_VERSION=4.3.0-SNAPSHOT
export POSTGRES_LOCAL_PATH=...your_postgres_driver_path...

# hive: 4.3.0-SNAPSHOT is mandatory
# tez: 1.0.0-SNAPSHOT is mandatory to override released tez jars with the currently unreleased 1.0.0 tez jars that can talk to unmanaged tez sessions
./build.sh -hive 4.3.0-SNAPSHOT -hadoop 3.4.1 -tez 0.10.5 -tez-snapshot 1.0.0-SNAPSHOT

./start-hive.sh --llap
docker compose --profile llap logs -f

test:

 beeline -u 'jdbc:hive2://localhost:10000/' -e "DROP table IF EXISTS iceberg_table; CREATE TABLE iceberg_table (id BIGINT) STORED BY iceberg; INSERT INTO iceberg_table VALUES(1);"

see logs that queries go through tezam and daemons:

tezam         | 2026-04-15T16:08:05,906 INFO  DAGAppMaster - Starting DAG submitted via RPC: INSERT INTO iceberg_table VALUES(1) (Stage-1)

...

llapdaemon-1  | 2026-04-15T16:08:06,368  INFO [Task-Executor-0 (1776269274583_0000_1_00_000000_0)] impl.LlapTaskReporter: Registered counters for fragment: 1776269274583_0000_1_00_000000_0 vertexName: Map 1

...

hiveserver2   | 2026-04-15T16:08:07,267  INFO [HiveServer2-Background-Pool: Thread-117] SessionState: Status: DAG finished successfully in 1.23 seconds
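
The three log lines above can also be checked mechanically. The following is a minimal sketch, not part of this PR: `saw_full_path` is a hypothetical helper that reads compose logs on stdin and succeeds only if the tezam, llapdaemon and hiveserver2 markers quoted above all appear. The matched substrings are copied from the sample output and may differ between versions.

```shell
#!/bin/sh
# Hypothetical helper (not part of the PR): succeed only if the log text on
# stdin shows the query passing through tezam, an LLAP daemon and hiveserver2.
# The marker strings are taken from the sample log lines quoted above.
saw_full_path() {
  logs=$(cat)   # read all of stdin once so it can be searched three times
  printf '%s\n' "$logs" | grep -q 'tezam .*DAGAppMaster - Starting DAG submitted via RPC' &&
  printf '%s\n' "$logs" | grep -q 'llapdaemon-.*LlapTaskReporter: Registered counters for fragment' &&
  printf '%s\n' "$logs" | grep -q 'hiveserver2 .*DAG finished successfully'
}
```

Assuming the cluster from the steps above is running, it could be used as `docker compose --profile llap logs | saw_full_path && echo "query went through tezam and LLAP"`.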

A very important test case is that the container layout implemented here in docker-compose stays compatible with the existing, working hs2+llapdaemon setup (no tezam), which can be confirmed as follows:

./stop-hive.sh --cleanup

export POSTGRES_LOCAL_PATH=~/.m2/repository/org/postgresql/postgresql/42.7.3/postgresql-42.7.3.jar
./build.sh -hive 4.2.0 -hadoop 3.4.1 -tez 0.10.5
./start-hive.sh --llap
docker compose --profile llap logs -f

beeline -u 'jdbc:hive2://localhost:10000/' -e "DROP table IF EXISTS iceberg_table; CREATE TABLE iceberg_table (id BIGINT) STORED BY iceberg; INSERT INTO iceberg_table VALUES(1);"

In this case tezam will simply fail to start, and the cluster works exactly as it did after HIVE-29411.
Be aware of the difference: in the case of

./build.sh -hive 4.2.0 -hadoop 3.4.1 -tez 0.10.5

the Tez ZooKeeper-based registry and external sessions code are simply not present, which is why this setup works regardless of the config.

@deniskuzZ (Member)

FYI, there is a Docker profile to create a new Hive image: -Pdocker

</property>
<property>
<name>hive.server2.tez.initialize.default.sessions</name>
<value>false</value>

Member

why not initialize?

Contributor Author

Initializing a few sessions on HS2 startup is a YARN-specific behavior.
In standalone mode, HS2 has no control over the sessions; it just discovers them via ZK.

<!--
A registry namespace prefix is a hardcoded prefix for Tez external sessions.
The actual tez.am.registry.namespace value is appended to this prefix.
Once hive can use the registry client that Tez provides (ZkAMRegistryClient), this property will be removed.

Member

is there a ticket for that? needs Tez upgrade?

@abstractdog (Contributor Author), Apr 17, 2026

@@ -0,0 +1,5 @@
log4j.rootLogger=INFO, console

Member

maybe, simply [tez-log4j.properties]

</property>
<property>
<name>hive.llap.daemon.umbilical.port</name>
<value>33333</value>

@deniskuzZ (Member), Apr 17, 2026

default is 0, do we even need to touch it?

Contributor Author

right, it works but needs a minor fix in LlapTaskCommunicator to handle the default properly:

-          conf.get(HiveConf.ConfVars.LLAP_TASK_UMBILICAL_SERVER_PORT.varname)
+          HiveConf.getVar(conf, HiveConf.ConfVars.LLAP_TASK_UMBILICAL_SERVER_PORT)

otherwise I get:

tezam         | 2026-04-17T12:23:28,727 INFO  AbstractService - Service org.apache.hadoop.hive.llap.tezplugins.LlapTaskCommunicator failed in state STARTED
tezam         | java.lang.NullPointerException: Cannot invoke "String.split(String)" because the return value of "org.apache.hadoop.conf.Configuration.get(String)" is null
tezam         | 	at org.apache.hadoop.hive.llap.tezplugins.LlapTaskCommunicator.startRpcServer(LlapTaskCommunicator.java:261)
tezam         | 	at org.apache.tez.dag.app.TezTaskCommunicatorImpl.start(TezTaskCommunicatorImpl.java:140)
tezam         | 	at org.apache.hadoop.hive.llap.tezplugins.LlapTaskCommunicator.start(LlapTaskCommunicator.java:237)
tezam         | 	at org.apache.tez.dag.app.ServicePluginLifecycleAbstractService.serviceStart(ServicePluginLifecycleAbstractService.java:41)
tezam         | 	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:195)
tezam         | 	at org.apache.tez.dag.app.TaskCommunicatorManager.serviceStart(TaskCommunicatorManager.java:165)
tezam         | 	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:195)
tezam         | 	at org.apache.tez.dag.app.DAGAppMaster$ServiceWithDependency.start(DAGAppMaster.java:1857)
tezam         | 	at org.apache.tez.dag.app.DAGAppMaster$ServiceThread.run(DAGAppMaster.java:1878)

good catch, fixing it now

  tezam:
    profiles:
      - llap
    image: apache/hive:${HIVE_VERSION}

@deniskuzZ (Member), Apr 17, 2026

is it OK to reuse the Hive image for TezAM? IDK, just asking. Is it the same downstream?

@abstractdog (Contributor Author), Apr 17, 2026

I think it's OK to reuse. Actually, we build both (a Hive-specific TezAM image + a Tez AM image in the Tez project); see the motivation on HIVE-29419.
Downstream we have only a Hive image for TezAM, but upstream the Tez project needs its own (TEZ-4682), which will most probably mimic the Tez container mode (DAGAppMaster + TezChild), not LLAP.


# LLAP daemon discovery
HIVE_ZOOKEEPER_QUORUM: zookeeper:2181
LLAP_SERVICE_HOSTS: '@llap0'

@deniskuzZ (Member), Apr 17, 2026

Contributor Author

good catch, it's not correct, the only reason why it works is that I take care of it one level deeper in entrypoint:
https://github.com/apache/hive/pull/6435/changes#diff-b7d5fbeab2c6af92616ab371fa3f237867d60d7c81072045342cb44a8981bf90R45

fixing this

ARG TEZ_SNAPSHOT_REPO_URL=https://repository.apache.org/content/repositories/snapshots

# When snapshot jars are included, client version must match the snapshot version.
ENV TEZ_CLIENT_VERSION=${TEZ_SNAPSHOT_VERSION:-$TEZ_VERSION}

@deniskuzZ (Member), Apr 17, 2026

is this used somewhere? can't find any usages here

Contributor Author

either this is needed, or another config that disables the client version check in TezAM; otherwise we get:

tezam         | 2026-04-17T12:50:58,952 INFO  DAGAppMaster - Comparing client version with AM version, clientVersion=Unknown, AMVersion=1.0.0-SNAPSHOT
tezam         | 2026-04-17T12:50:58,953 ERROR DAGAppMaster - Incompatible versions found, clientVersion=Unknown, AMVersion=1.0.0-SNAPSHOT

I'm fine with disabling the check:
https://github.com/apache/tez/blob/2ca44b4ba3839ff4c2c2ab2ec95e34d687f61c09/tez-dag/src/main/java/org/apache/tez/dag/app/DAGAppMaster.java#L455-L476

in which case tez-site.xml has to contain:

    <property>
        <name>tez.am.disable.client-version-check</name>
        <value>true</value>
    </property>

it's fine to keep TEZ_CLIENT_VERSION, but the comment should be changed then to:

Client version check is enabled by default in Tez AM, which is picked up from TEZ_CLIENT_VERSION env var.

to reflect that this is not because of the optional snapshot override

wdyt?
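
For readers less familiar with the `${VAR:-default}` form in the ENV line under review, here is a minimal sketch of the same fallback in POSIX shell terms. It is an illustration only: `resolve_tez_client_version` is a hypothetical name, and Dockerfile variable substitution is assumed to behave analogously. The point is that the client version follows the snapshot version when one is given (so it can match an AMVersion such as 1.0.0-SNAPSHOT from the log above) and otherwise falls back to the release version.

```shell
#!/bin/sh
# Sketch of the fallback used for TEZ_CLIENT_VERSION, in POSIX shell terms:
# prefer the snapshot version when it is set and non-empty, else the release.
# resolve_tez_client_version is a hypothetical name used only for illustration.
resolve_tez_client_version() {
  tez_version=$1
  tez_snapshot_version=${2:-}   # optional second argument
  printf '%s\n' "${tez_snapshot_version:-$tez_version}"
}

resolve_tez_client_version 0.10.5 1.0.0-SNAPSHOT   # prints 1.0.0-SNAPSHOT
resolve_tez_client_version 0.10.5                  # prints 0.10.5
```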

Member

if version check makes sense, let's keep the env var

HIVE_VERSION=
HADOOP_VERSION=
TEZ_VERSION=
TEZ_SNAPSHOT_VERSION=

Member

could we keep one required input in [build.sh]: -tez
If it ends with -SNAPSHOT, treat it as snapshot channel automatically - otherwise release (for example 0.10.5)
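
As a rough illustration of the suffix check being suggested (a sketch only; `tez_channel` is a hypothetical function and not something build.sh is known to contain), the channel could be derived from the single `-tez` value like this. It covers only the detection step, not where a snapshot build would get its dependencies.

```shell
#!/bin/sh
# Hypothetical sketch of the suggestion: derive the channel from the -tez value.
tez_channel() {
  case "$1" in
    *-SNAPSHOT) echo snapshot ;;   # version ends with -SNAPSHOT
    *)          echo release ;;    # anything else is treated as a release
  esac
}

tez_channel 0.10.5           # prints release
tez_channel 1.0.0-SNAPSHOT   # prints snapshot
```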

Contributor Author

a release tarball must always be fetched (as the structural foundation, mostly because it contains all the dependencies for Tez), and snapshot jars are an optional overlay, placed on top to override the release-version tez jars while leaving lib/ intact
a snapshot version alone cannot form a self-contained, runnable Tez installation, as we don't have SNAPSHOT tez tarballs downloadable
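
The layering described here can be sketched roughly as follows. This is an illustration under assumptions, not the PR's build.sh or Dockerfile logic: `overlay_snapshot_jars`, the flat directory layout, and the `tez-<module>-<version>.jar` naming are all hypothetical. The release directory provides the foundation, snapshot jars replace the matching release-version Tez jars, and `lib/` is never touched.

```shell
#!/bin/sh
# Hypothetical helper: overlay snapshot Tez jars on an unpacked release layout.
#   $1 = release dir (unpacked release tarball, contains lib/ with dependencies)
#   $2 = dir holding the downloaded snapshot jars
#   $3 = release version, e.g. 0.10.5
#   $4 = snapshot version, e.g. 1.0.0-SNAPSHOT
overlay_snapshot_jars() {
  release_dir=$1; snapshot_dir=$2; release_version=$3; snapshot_version=$4
  for jar in "$snapshot_dir"/tez-*-"$snapshot_version".jar; do
    [ -e "$jar" ] || continue                  # glob matched nothing: skip
    name=${jar##*/}                            # strip the directory part
    name=${name%-"$snapshot_version".jar}      # e.g. tez-dag
    rm -f "$release_dir/$name-$release_version.jar"   # drop the released jar
    cp "$jar" "$release_dir/"                          # add the snapshot jar
  done
}
```

Only top-level `tez-*` jars are replaced in this sketch, which is how it leaves the release dependencies under `lib/` intact.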

Member

Do we need to support SNAPSHOT overlays? Currently, the Docker image only allows released versions.

@abstractdog (Contributor Author), Apr 17, 2026

short term, yes, otherwise the whole TezAM initiative cannot be tried out as here
Tez 1.0.0 could be months (or more) away, and I would like to be able to proceed with cloud-native initiative in the meantime
released/published Docker images must not contain SNAPSHOT jars, I agree, but a temporary Hive image built with Tez 1.0.0-SNAPSHOT can be utilized for distributed/perf test even in the next 1-2 months

@deniskuzZ (Member), Apr 17, 2026

would it make sense to split this PR into 2 parts so we could revert the SNAPSHOT workaround once Tez 1.0.0 is released?

@deniskuzZ (Member), Apr 17, 2026

maybe a cleaner approach could be to build a custom Tez Docker image on top of the nightly Tez image, extending it with the required Hive jars.
