Clean Room Usability Guide

Snowflake customers who want to activate on prospects in the ZMP, but do not want to share their PII directly with Zeta, need a way to match against those prospects without their data ever leaving their own Snowflake environment.
The Clean Room environment allows Zeta clients to run Zeta’s secure matching UDF inside the client’s Snowflake environment. The essential elements of this solution are:
- Waterfall matching logic is encapsulated in a Secure UDTF, "matcher", which encodes references to the Encrypted Zeta Universe View and the Encrypted Staging View Share.
- The UDTF joins the views using masked fields.
- The UDTF generates a table of matched users and other metadata that can be queried on the client side (e.g. Zeta ID, match tier, etc.).
- Client PII data never leaves the client’s Snowflake environment.
- The client does not have direct read access to Zeta Global’s proprietary identity graph data.
- The client does not have read access to the "matcher" UDF, but does have execute permissions on it.
Architecture

Encrypted Staging Table Schema
1. Compile all consumer data to be matched into a single Snowflake table in AWS US East 1 (N. Virginia).
2. The schema of the client’s encrypted staging table/view that the "matcher" UDF runs against must include all of the following columns, although not every column needs to be populated. All columns must be normalized.
Required schema:
Columns | Description |
---|---|
cuid | Unique client specific id for each row |
email_md5 | md5 hash of normalized raw email addresses |
postal_firstname | md5 hash of normalized raw postal firstname |
postal_lastname | md5 hash of normalized raw postal lastname |
postal_addressline1 | md5 hash of normalized raw postal address line 1 |
postal_addressline2 | md5 hash of normalized raw postal address line 2 |
postal_city | md5 hash of normalized raw postal city |
postal_state | md5 hash of normalized raw postal state |
postal_zip | md5 hash of normalized raw postal zip |
postal_zipplus4 | md5 hash of normalized raw postal zipplus4 |
Normalization steps that are performed before MD5 hashing the column data are as follows:
- Remove white space and special characters at the beginning and end of the column value.
- Coalesce null column values to an empty string before appending the random salt seed.
- Lowercase all input column values.
- Make sure the client performs the same normalization and hashing on their end, because these exact normalization steps are performed on Zeta’s Snowflake account.
Sample encrypted and normalized client view:
create or replace secure view STAGING_SALTED copy grants as
select a.cuid as cuid
-- email is hashed without the client-specific salt seed
,md5(lower(coalesce(trim(a.email_email, ' -/\!@#$%^&*()_-=+'), ''))) as email_md5
-- postal fields are salted with the client-specific seed before hashing
,md5(lower(coalesce(trim(a.postal_firstname, ' -/\!@#$%^&*()_-=+'), '')||c.salt_seed)) as postal_firstname
,md5(lower(coalesce(trim(a.postal_lastname, ' -/\!@#$%^&*()_-=+'), '')||c.salt_seed)) as postal_lastname
,md5(lower(coalesce(trim(a.postal_addressline1, ' -/\!@#$%^&*()_-=+'), '')||c.salt_seed)) as postal_addressline1
,md5(lower(coalesce(trim(a.postal_addressline2, ' -/\!@#$%^&*()_-=+'), '')||c.salt_seed)) as postal_addressline2
,md5(lower(coalesce(trim(a.postal_city, ' -/\!@#$%^&*()_-=+'), '')||c.salt_seed)) as postal_city
,md5(lower(coalesce(trim(a.postal_state, ' -/\!@#$%^&*()_-=+'), '')||c.salt_seed)) as postal_state
,md5(lower(coalesce(trim(a.postal_zip, ' -/\!@#$%^&*()_-=+'), '')||c.salt_seed)) as postal_zip
,md5(lower(coalesce(trim(a.postal_zipplus4, ' -/\!@#$%^&*()_-=+'), '')||c.salt_seed)) as postal_zipplus4
from CUSTOMER_UNIVERSE a join (select '<random_client_specific_seed>' as salt_seed) c;
The client-specific seed to use in the above step will be provided by Zeta.
Reach out to David Konshuh (dkonshuh@zetaglobal.com), the Clean Room Product Manager, and Rameses Alfonzo (RAlfonzo@zetaglobal.com), DevOps, to coordinate on the client-specific seed.
Setup Instructions (Client-Side)
Part A: Creating a Secure SHARE on the Client’s Snowflake Account
The setup example assumes the following:
- The secure view STAGING_SALTED was created in the schema US on the database SALES.
- The view will be shared with the account zetaglobal.
- The outbound Share CLEANROOM_SHARE will contain the shared objects.
- The role ACCOUNTADMIN is used, or an alternate role that can create a Share on the account and grant privileges on the objects to be shared.
Step | Notes |
---|---|
1. Create a secure Share on the client’s Snowflake account. | Creates the container for the shared objects. |
2. Grant access to the database and schema. | The Share requires USAGE on the database and schema that contain the secure view. |
3. Share the encrypted view. | Grant SELECT on the secure view to the Share. |
4. Grant access to the Share to Zeta Global’s Snowflake account. | See the SQL sketch below. |
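A minimal SQL sketch of these four steps, assuming the example names above (database SALES, schema US, view STAGING_SALTED, Share CLEANROOM_SHARE, and the zetaglobal account):

```sql
use role ACCOUNTADMIN;

-- 1. Create the Share (the container for the shared objects)
create share CLEANROOM_SHARE;

-- 2. Grant usage on the database and schema that contain the secure view
grant usage on database SALES to share CLEANROOM_SHARE;
grant usage on schema SALES.US to share CLEANROOM_SHARE;

-- 3. Share the encrypted view
grant select on view SALES.US.STAGING_SALTED to share CLEANROOM_SHARE;

-- 4. Grant Zeta Global's Snowflake account access to the Share
alter share CLEANROOM_SHARE add accounts = zetaglobal;
```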
Part B: Create a Database from a Share Imported from Zeta Global
The example assumes the following:
- Zeta Global will provide a Secure Share named ZETA_CLEANROOM to the client.
- The Share includes the database ZETA, the schema CLEANROOM, and the function MATCHER.
- The client will create an imported database named ZETA_CLEANROOM.
- The role ACCOUNTADMIN, or an alternate role, is used to execute the statements below.
Step | Description |
---|---|
1. | List incoming and outgoing Shares in the account. An inbound Share called ZETA_CLEANROOM should appear in the list. |
2. | Create a database using the inbound Share. |
3. | By default the role that creates the database from the Share is the only role with access to it; grant IMPORTED PRIVILEGES to any other roles that need to use it. |
4. | Verify that the function is available on the imported database. |
5. | Test the MATCHER function (see the SQL sketch below). |
6. | The resulting table should be shared back with Zeta Global for further activation or insights. |
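A minimal SQL sketch of these steps, assuming the example names above; `<zeta_account_locator>` is a placeholder for the provider account identifier that Zeta will supply:

```sql
use role ACCOUNTADMIN;

-- 1. List Shares; the inbound Share ZETA_CLEANROOM should appear
show shares;

-- 2. Create a database from the inbound Share
create database ZETA_CLEANROOM from share <zeta_account_locator>.ZETA_CLEANROOM;

-- 3. Optionally allow another role (SYSADMIN here, as an example) to use the imported database
grant imported privileges on database ZETA_CLEANROOM to role SYSADMIN;

-- 4. Verify that the matcher function is visible
show user functions in database ZETA_CLEANROOM;

-- 5. Test the matcher
select * from table(ZETA_CLEANROOM.CLEANROOM.MATCHER()) limit 10;
```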
Matcher UDF Execution
Sample matcher UDF execution SQL on the client-side:
create or replace table sales.us.cleanroom_matched as select * from table(ZETA_CLEANROOM.CLEANROOM.MATCHER());
Schema of the matched output that the "matcher" UDF is expected to generate:
Columns | Data type | Description |
---|---|---|
cuid | VARCHAR | Unique client specific id for each row |
postal_firstname | VARCHAR | md5 hash of normalized raw postal firstname |
postal_lastname | VARCHAR | md5 hash of normalized raw postal lastname |
postal_addressline1 | VARCHAR | md5 hash of normalized raw postal address line 1 |
postal_addressline2 | VARCHAR | md5 hash of normalized raw postal address line 2 |
postal_city | VARCHAR | md5 hash of normalized raw postal city |
postal_state | VARCHAR | md5 hash of normalized raw postal state |
postal_zip | VARCHAR | md5 hash of normalized raw postal zip |
postal_zipplus4 | VARCHAR | md5 hash of normalized raw postal zipplus4 |
email_md5 | VARCHAR | md5 hash of normalized raw email addresses |
zeta_cid | NUMBER | zeta customer id if there is a match in Zeta’s Identity Graph otherwise returns NULL |
tier_id | VARCHAR | 1 – match found on full name + address; 2 – match found on email_md5; 3 – match found on last name + address; 4 – match found on address only; NA – no match found |
dt | TIMESTAMP_NTZ | date the connection between the linkages was established on Zeta’s side, NULL if no match found |
rank | NUMBER | rank of the connection, derived from recency, in the event that an input user identifier has multiple zeta_cid linked to it |
tier_id, dt, and rank are metadata fields for the matching run on the Zeta Identity Graph.
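For example, the tier distribution of the matched output can be inspected with a query like the following (assuming the sales.us.cleanroom_matched table created in the execution step above):

```sql
-- Breakdown of matched output by match tier
select tier_id,
       count(*)                 as row_count,
       count(distinct zeta_cid) as matched_zeta_ids
from sales.us.cleanroom_matched
group by tier_id
order by tier_id;
```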
Match rate calculation:
select COUNT(distinct zeta_cid)*100/COUNT(*) as match_rate from sales.us.cleanroom_matched;
Snowflake Compute Cost Estimates
Clients will incur Snowflake compute costs to use the cleanroom. Cost estimate assumptions include:
- Snowflake has a list price of $4/credit.
- A Large (L) Snowflake warehouse consumes 8 credits/hour.
- A 500K-row input table may require up to 15 minutes to resolve on an L warehouse (per Zeta’s testing).

Therefore, ~1M input records may take ~30 minutes of an L warehouse’s compute, at an hourly rate of 8 credits/hour * $4/credit = $32/hour, i.e. roughly $16 for such a run.
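As a back-of-envelope illustration only (the 1M-row input size is assumed, and throughput is scaled linearly from the testing numbers above):

```sql
-- Hypothetical cost estimate based on ~15 minutes per 500K rows on an L warehouse
select 1000000                        as input_rows,           -- assumed input size
       (input_rows / 500000.0) * 0.25 as est_warehouse_hours,  -- 0.25 h per 500K rows
       est_warehouse_hours * 8 * 4    as est_cost_usd;         -- 8 credits/hour * $4/credit
```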