
Data Flow | Action Nodes


Action Nodes are snippets of ETL configuration that can be applied to Standard Data Flows to help transform the data. They are the backbone of how Standard Data Flows work.

To enable an Action Node in your account, reach out to Support.

Here’s the list of Action Nodes available in your account:

Each entry below lists the Action Node name, a description of what it does, its inputs, and its outputs.

Append Unique Id Property

Appends a UUID as an additional property.

  1. Appends a new column with unique values in each row, using the name specified in the property field.

Input:

image-20240119-105009.png

 

For example, an input file with

CODE
"Email", "first_name", "last_name"

will have an extra column appended at the end in the output file:

CODE
"Email", "first_name", "last_name", "zetauniqueid"

Output:

image-20240119-105031.png
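
For reference, the transformation is equivalent to this minimal Python sketch (the file names are illustrative; the new column name comes from the property field):

CODE
import csv
import uuid

# Append a column of UUIDs to every row of a CSV file.
with open("input.csv", newline="") as src, open("output.csv", "w", newline="") as dst:
    reader = csv.reader(src)
    writer = csv.writer(dst)
    writer.writerow(next(reader) + ["zetauniqueid"])  # name from the property field
    for row in reader:
        writer.writerow(row + [str(uuid.uuid4())])  # unique value per row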

 

Current Timestamp Property

Appends the current timestamp as an attribute in a column, in the format YYYY-mm-dd HH:MM:SS.

  1. Timestamp Column Name: Enter the column name to which the timestamp should be added.

Input:

image-20240119-103921.png

The current timestamp will be added as an extra column in the output file, using the column name entered in the action node.

Output:

image-20240119-103928.png
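
A minimal Python sketch of the behavior (the file names and the column name load_timestamp are illustrative):

CODE
import csv
from datetime import datetime

# Add the current timestamp (YYYY-mm-dd HH:MM:SS) as an extra column.
stamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
with open("input.csv", newline="") as src, open("output.csv", "w", newline="") as dst:
    reader = csv.reader(src)
    writer = csv.writer(dst)
    writer.writerow(next(reader) + ["load_timestamp"])  # Timestamp Column Name input
    for row in reader:
        writer.writerow(row + [stamp])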

 

 

Source File Name Pattern

Define or alter the source file name pattern to control which input files are chosen by the data flow.

  1. Source File Name Pattern: The file name pattern that determines which input files are chosen.

Input:

image-20240119-105815.png

 

In the screenshot below, only the files in the source folder that match the pattern are copied to the destination.

Output:

image-20240119-105831.png

 

Rehash Column

Replaces a column's values with their rehashed equivalents, using the algorithm selected in the Rehash Algorithm dropdown.

  1. Column Name: The column name in the input file whose values need to be rehashed.

  2. Rehash Algorithm: The rehash algorithm to use. Currently supported algorithms are md5 and sha256. To add a new algorithm, please contact Zeta Support.

Input:

image-20240118-092308.png

 

The output file will have the values of the Column Name column rehashed using the algorithm chosen in the Rehash Algorithm field.

Output:

image-20240118-092315.png
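
The rehashing is equivalent to this minimal Python sketch (the column name and file names are illustrative):

CODE
import csv
import hashlib

COLUMN = "Email"      # Column Name input
ALGORITHM = "sha256"  # Rehash Algorithm input (md5 or sha256)

with open("input.csv", newline="") as src, open("output.csv", "w", newline="") as dst:
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        # Replace the value with its hex digest under the chosen algorithm.
        row[COLUMN] = hashlib.new(ALGORITHM, row[COLUMN].encode("utf-8")).hexdigest()
        writer.writerow(row)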

 

Trim Spaces

This function trims spaces from the property values.

  1. Input Property: Choose the column name whose row values you want to trim spaces from, both before and after.

Input:

image-20240119-105957.png

 

Removes spaces before and after the values of a specific column in the file.

Output:

image-20240119-110015.png
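
A minimal Python sketch of the trimming (the column and file names are illustrative):

CODE
import csv

COLUMN = "first_name"  # Input Property: the column whose values are trimmed

with open("input.csv", newline="") as src, open("output.csv", "w", newline="") as dst:
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        row[COLUMN] = row[COLUMN].strip()  # drop leading and trailing spaces
        writer.writerow(row)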

 

Input Filename Pattern

Regular expression to match the input file names that need to be picked up by the Data Flow.

Filename Match Expression: Enter the regex pattern so that only matching files are picked up by the Data Flow.

The matched files are picked up by the Data Flow.
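
A minimal Python sketch of the matching logic (the pattern and file names are hypothetical):

CODE
import re

PATTERN = re.compile(r"daily_export_\d{8}\.csv")  # Filename Match Expression

filenames = ["daily_export_20240119.csv", "notes.txt", "daily_export_2024.csv"]
matched = [name for name in filenames if PATTERN.fullmatch(name)]
print(matched)  # ['daily_export_20240119.csv'] -- only these files are picked up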

Output File Extension

Changes the extension of an output file without changing the actual file contents.
For example, rename "file.json" to "file.jsonl".

  1. Files with this extension ...: The extension that needs to be changed.

  2. ... are given this extension: The replacement value

Input:

image-20240119-103439.png

 

A file such as filename.json will be renamed to filename.jsonl when .json and .jsonl are provided in the Files with this extension and are given this extension fields, respectively.

Output:

image-20240119-103448.png

 

Output File Format (advanced)

Sets the output file's header behavior, file type, column separator, and quote style.

  1. File Needs Header Row?: Sets the first row as a header if enabled.

  2. File Type: Sets the type of the file.

  3. Column Separator: Sets the column separator between data entries.

  4. Quote Style: None, All, Minimal, Non Numeric.

Concatenate a Fixed String With Property

Select a property to append a fixed string to and define the new property name

  1. Input Property Name: The input property whose values you want to read before appending a string.

  2. Output Property Name: The output property column name that contains the resulting concatenated string.

  3. Text to Append: The text to append to each row's value from the input property (1); the result is written to the output property (2).

Remove JSON Attributes

Removes JSON attributes from your JSON or JSONL file.

  1. Attributes to be Removed: List the names of the keys to be removed from the input file.

  2. Maximum Search Depth: An optimization. If you know the attributes reside within the first N levels of your file, enter N here; the data flow will not search beyond level N.

  3. Multiline JSON?: A toggle that allows the ETL to read pretty-printed JSON that spans multiple lines.

All key-value pairs whose key matches a name entered in the attributes field will be excluded from the output file.
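
For illustration, a minimal Python sketch of the removal logic for a JSONL file (the attribute names, depth, and file names are hypothetical):

CODE
import json

ATTRIBUTES = {"ssn", "internal_id"}  # Attributes to be Removed
MAX_DEPTH = 2                        # Maximum Search Depth

def strip_keys(value, depth=0):
    # Recursively drop matching keys, but stop descending past MAX_DEPTH.
    if depth >= MAX_DEPTH or not isinstance(value, (dict, list)):
        return value
    if isinstance(value, list):
        return [strip_keys(item, depth + 1) for item in value]
    return {k: strip_keys(v, depth + 1) for k, v in value.items() if k not in ATTRIBUTES}

# JSONL: one JSON object per line (use the Multiline JSON? toggle for pretty-printed files).
with open("input.jsonl") as src, open("output.jsonl", "w") as dst:
    for line in src:
        dst.write(json.dumps(strip_keys(json.loads(line))) + "\n")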

Optional Global Property

Adds a new column with the same static value for each row

  1. Global Property Name: The name of the new column.

  2. Global Property Value (Optional): The static value for every row; the default value is "".

Adds a new column with the name from Input 1, where each row's value is populated from the Input 2 field.
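
A minimal Python sketch of the behavior (the property name and value are hypothetical):

CODE
import csv

NAME = "campaign"   # Global Property Name
VALUE = "spring24"  # Global Property Value (defaults to "" when omitted)

with open("input.csv", newline="") as src, open("output.csv", "w", newline="") as dst:
    reader = csv.reader(src)
    writer = csv.writer(dst)
    writer.writerow(next(reader) + [NAME])
    for row in reader:
        writer.writerow(row + [VALUE])  # same static value on every row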

Recursive Source Search

If enabled? is set to true, the data flow will recursively search for files in the folder selected within the Source card.

  1. enabled? toggle: Allows the data flow to recursively search within the folder selected in the Source card.

The data flow recursively searches the selected source folder for input files.

Define/Rename/Omit/Copy Columns

Define, rename, omit, and copy columns in the output file. This action works for both CSV and JSONL files.

Define all columns with one comma delimited string: column_name1:action:new_name,...
"action", "new_name" are optional. Enter ALL column names even if these columns are not being edited.
Example - omit column from output: column1:omit
Example - rename column: column1:rename:user_id
Example - copy column: column1:copy:user_id

Example String: user_id,firstname:rename:first_name,middle_name:omit,last_name:copy:family_name

The output file will contain the columns as determined by the input string.
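
A minimal Python sketch of how the example string above is interpreted, applied to one record:

CODE
SPEC = "user_id,firstname:rename:first_name,middle_name:omit,last_name:copy:family_name"

row = {"user_id": "u1", "firstname": "Ada", "middle_name": "X", "last_name": "Lovelace"}
out = {}
for entry in SPEC.split(","):
    parts = entry.strip().split(":")
    column = parts[0]
    action = parts[1] if len(parts) > 1 else None
    if action == "omit":
        continue                     # drop the column
    elif action == "rename":
        out[parts[2]] = row[column]  # new name only
    elif action == "copy":
        out[column] = row[column]    # keep the original...
        out[parts[2]] = row[column]  # ...and add the copy
    else:
        out[column] = row[column]    # unchanged
print(out)
# {'user_id': 'u1', 'first_name': 'Ada', 'last_name': 'Lovelace', 'family_name': 'Lovelace'}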

Define 2 File Columns, Define 3 File Columns, …, Define 26 File Columns

Use for CSV files that contain exactly N columns. Columns must be defined in the correct order.

Here is how that column definition node works:

  • If a column needs to be copied to a new name

    • then set Output File Column Name to the new name

    • if the column doesn’t need to be copied, then you can set Output File Column Name to nil

  • If a column needs to be removed

    • then set Omit From Output File to true

  • If a column needs to be renamed

    • then set Output File Column Name to the new name

    • and set Omit From Output File to true

    • this will copy the column and remove the original

image-20240119-210714.png
  1. Column n: Name: Case-sensitive

  2. Column n: Data Type: Choose from the dropdown

  3. Column n: Maximum Length: Zero for no limit

  4. Column n: Nullable? toggle: On/Off

  5. Column n: Omit from Output File toggle: On/Off

  6. Column n: Output File Column Name: Enter the input file's column name, or a new name if you want to rename the column.

  7. Column n: Validation Pattern

The output file will be based on the inputs entered for each column.

Multi-file Snowflake Export

Additional Snowflake Properties.

  1. Export to Single File?: Turn the toggle on or off based on your desired output.

  2. Maximum File Size?: The maximum size per file when the Snowflake export is split into multiple files.

Input:

image-20240119-105405.png

 

This action node splits the data exported from a Snowflake connection into multiple files.

This action only works with Snowflake direct export.

Output:

image-20240119-105444.png

 

GPG Encrypt

GPG Encryption. Key Password is required. Please reach out to Support to set up this action node.

  1. Key Path: Provide the path to the S3 location where the key is stored.

  2. Key Password: The password to encrypt the input file

  3. Recipients: If there is no recipient email address to add, use an 8-digit key ID (the last 8 characters of the 16-character hex key ID), e.g., 75BB47D0.

Once the Key Path, Key Password, and Recipients are provided, the output file will be encrypted. The output file type will be GPG.

GPG Encrypt (passwordless)

GPG Encryption. Key Password is not required. Please reach out to Support to set up this action node.

image-20240119-210404.png
  1. Key Path: Provide the path to the S3 location where the key is stored.

  2. Recipients: If there is no recipient email address to add, use an 8-digit key ID (the last 8 characters of the 16-character hex key ID), e.g., 75BB47D0.

Once the Key Path and Recipients are provided, the output file will be encrypted. The output file type will be GPG.

GPG Decrypt

GPG Decryption. Key Password is required. Please reach out to Support to set up this action node.

image-20240119-210330.png
  1. Key Path: Provide the path to the S3 location where the key is stored.

  2. Key Password: The password to decrypt the input file.

Once the Key Path and Key Password are provided, the output file will be decrypted.

GPG Decrypt (passwordless)

GPG Decryption. Key Password is not required. Please reach out to Support to set up this action node.

  1. Key Path: Provide the path to the S3 location where the key is stored.

Once the Key Path is provided, the output file will be decrypted.

 

Input Compression Type

Select the compression type of your input file so that the Data Flow can correctly uncompress it and read the file contents.

Compression Type: zip or gzip

The output is the uncompressed file.

Concatenate Timestamp With Property

Select a property to append a unique timestamp to and define the new property name.

  1. Input Property Name: The property whose values the unique timestamp will be appended to.

  2. Output Property Name: The name of the new property.

Input:

image-20240119-105909.png

 

The output file will have a new column added with the new property name.

Output:

image-20240119-105928.png

 

Input File Format

Sets input file type, quote character, row separator and column separator values to ensure the data flow correctly reads the input files.

image-20240117-185858.png
  1. File Type: the type of file, such as csv or jsonl.

  2. Quote Character: the character used for quoting, e.g., double quote (") or single quote (').

  3. Row Separator: the character that separates rows, e.g., newline (\n).

  4. Column Separator: the column separator character, such as pipe (|) or comma (,).

Input:

image-20240119-104736.png

 

 

An input file with pipe-delimited values will retain the pipe delimiter in the output file, with double-quote field framing added.

Output:

image-20240118-092207.png
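
The example above corresponds to this minimal Python sketch using the csv module (the file names are illustrative):

CODE
import csv

# Read a pipe-delimited file, then write it back with double-quote field framing.
with open("input.txt", newline="") as src, open("output.txt", "w", newline="") as dst:
    reader = csv.reader(src, delimiter="|", quotechar='"')
    writer = csv.writer(dst, delimiter="|", quotechar='"', quoting=csv.QUOTE_ALL)
    for row in reader:
        writer.writerow(row)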

 

Input File Format (advanced)

Advanced version of the Input File Format node to inform the Data Flow about the content of the input file.

  1. Has Header Row: whether the first row is a header row.

  2. File Type: the type of file, such as csv or jsonl.

  3. Column Separator: the column separator character, such as pipe (|) or comma (,).

  4. Quote Character: the character used for quoting, e.g., double quote (") or single quote (').

  5. Row Separator: the character that separates rows, e.g., newline (\n).

  6. Encoding: the encoding of the file, e.g., UTF-8-SIG.

  7. Required Encoding: the encoding this file should comply with, e.g., UTF-8-SIG.

  8. Extension: the extension of the file (the part after the final dot), e.g., csv or jsonl.

Source Archive Path

Source files will be moved to this path once they have been retrieved.

  1. Source Archive Path: Location and name of the archive folder the source file is to be moved to

This node allows you to move the source file into an archive folder. Please note that the folder should be located inside the source folder selected in the Source card.

Output:

image-20240119-105746.png

Custom Input File Type

A text field to inform the data flow of the input file's content type. By default, the data flow treats all files as CSV.

File type: The input file type such as csv, json, jsonl, txt, pdf, etc.

Input file is passed as is to the next node in the flow.

Input File Type

A dropdown to inform the data flow whether the input file contents are CSV or JSON. By default, the data flow treats all files as CSV.

  1. Input File Format: Select either CSV or JSON based on your input file type.

Input file is passed as is to the next node in the flow.

Output File Type

JSONL support only (CSV is default)

  1. Output File Type: A dropdown to inform the data flow whether the output files should be of type JSONL. Note that by default, all output files are considered of type CSV.

Input:

image-20240118-092239.png

 

The output file type will be based on the value chosen in the Output File Type dropdown.

Output:

image-20240118-092247.png

 

Split by Line Count

Split into multiple output files by line count.

  1. Lines per File: An integer that determines how many rows a file can contain. When the output has more rows, it is split into multiple files.

Input:

image-20240119-103642.png

 

Multiple output files will be produced, each containing at most Lines per File lines. Note that the last file may contain fewer lines.

Output:

image-20240119-103651.png
image-20240119-103659.png
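
A minimal Python sketch of the splitting logic (the file names are illustrative; here Lines per File counts data rows, and the header is repeated in every part):

CODE
import csv
import itertools

LINES_PER_FILE = 1000  # Lines per File input

with open("input.csv", newline="") as src:
    reader = csv.reader(src)
    header = next(reader)
    for part in itertools.count(1):
        chunk = list(itertools.islice(reader, LINES_PER_FILE))
        if not chunk:
            break
        with open(f"output_part{part}.csv", "w", newline="") as dst:
            writer = csv.writer(dst)
            writer.writerow(header)
            writer.writerows(chunk)  # the last part may hold fewer rows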

Split File By Column Value

A file is created for each distinct value in the column.

  1. Column Name: the column name that you want to use to split the file

  2. Max Column Distinct Values: the maximum number of distinct values expected in the Column Name column

Input:

clean_test_records_out.csv

image-20240119-104344.png

 

For each unique value in the Column Name column, a file will be created. Please ensure that the number of distinct values is less than the Max Column Distinct Values value.

Output:

clean_test_records_out.csv_NY_14638922748329.csv

clean_test_records_out.csv_NY_14638922748330.csv

clean_test_records_out.csv_MN_14638922748335.csv

clean_test_records_out.csv_GA_14638922748346.csv

clean_test_records_out.csv_OK_14638922748351.csv
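
A minimal Python sketch of the split (the column name is hypothetical, and the unique numeric suffix seen in the real output names is omitted):

CODE
import csv

COLUMN = "state"    # Column Name input
MAX_DISTINCT = 100  # Max Column Distinct Values input

files, writers = {}, {}
with open("clean_test_records_out.csv", newline="") as src:
    reader = csv.DictReader(src)
    for row in reader:
        value = row[COLUMN]
        if value not in writers:
            if len(writers) >= MAX_DISTINCT:
                raise ValueError("more distinct values than Max Column Distinct Values")
            files[value] = open(f"clean_test_records_out.csv_{value}.csv", "w", newline="")
            writers[value] = csv.DictWriter(files[value], fieldnames=reader.fieldnames)
            writers[value].writeheader()
        writers[value].writerow(row)
for f in files.values():
    f.close()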

De-dupe File

De-duplicate records by unique values of selected properties.

  1. Choose one column from the input file to dedupe on; this action keeps the first row containing each value and deletes all others.

  2. property_names: Email

  3. input_file_type: CSV

  4. output_format: CSV

Input:

image-20240119-104938.png

 

The output file will have no duplicate rows.

Output:

image-20240119-104956.png
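
A minimal Python sketch of the de-duplication, keeping the first row per value (the file names are illustrative):

CODE
import csv

PROPERTY = "Email"  # property_names input

seen = set()
with open("input.csv", newline="") as src, open("output.csv", "w", newline="") as dst:
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        if row[PROPERTY] not in seen:  # keep only the first row for each value
            seen.add(row[PROPERTY])
            writer.writerow(row)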

 

Single Output File

Merge multi-file output to a single file.

  1. Single Output File: a toggle; when switched ON, the output of multiple files is merged into a single file and only a single success email is sent.

Input:

image-20240119-103354.png

 

When the Single Output File toggle is switched on, the output of multiple input files will be merged into one.

Output:

image-20240119-103403.png

 

Non-Standard UTF-8 File Encoding

Uses UTF-8-SIG encoding to reduce likelihood of parsing errors in the input files.

  1. Input Data Flow Name: UTF_8_Test

Input:

image-20240119-104618.png

 

A file encoded as UTF-8-BOM will be converted to UTF-8 when the input passes through the Non-Standard UTF-8 File Encoding node.

Output:

image-20240119-104627.png
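
The conversion is equivalent to this minimal Python sketch (the file names are illustrative):

CODE
# Decode with utf-8-sig so a leading BOM, if present, is consumed
# rather than leaking into the first column name.
with open("input.csv", encoding="utf-8-sig") as src, \
     open("output.csv", "w", encoding="utf-8") as dst:
    dst.write(src.read())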

JSON Columns Re-serializer

Only relevant for BQ-to-JSONL exports with JSON column values (multiple columns).

Here, Big Query refers to querying Snowflake or another external database using SQL to obtain the data used in an audience or data flow. If the data flow you are testing does not use such a query, this node has no impact on your testing.

Add Single Column Header

Set the first column header name.

  1. New Column Name: The column names in the input file whose values need to be distributed into different columns.

Input:

image-20240119-104042.png

 

Every comma-separated unique value in the Column Name column will become an individual column.

Output:

image-20240119-104102.png

 

Use CSV Implicit Header

  1. Enabled: True/False

Input:

image-20240119-103751.png

 

 

This action node adds 'double quote comma' to the output file. The same action applies to input files both with and without headers.

Output:

image-20240119-103804.png

 

 

Enable/disable header validation

Header validation should only be enabled if required.

  1. Enable/disable header validation: a toggle; when switched ON, the headers of the input file are considered when applying the action workflow.

Input:

image-20240119-103814.png

 

 

Whether the Needs header validation? toggle is switched on or off, the action node adds 'double quote comma' to the output file.

Output (when enabled):

image-20240119-103822.png

Output (when disabled):

image-20240119-103830.png

 

Override Start Date

Can be used to override a flow's start date, particularly for file-based data flows such as those using sources like SFTP, Amazon S3, etc. Note that this node is not necessary for Snowflake and BigQuery, as they already support this feature.

  1. Start Date: Use this node to set the start date of a data flow.

You can use Override Start Date to set the start date for file-based flows, just as for database-based flows.

Run Frequency/Schedule

Sets the Flow's run frequency to either Daily or Hourly

  1. Run Schedule: Choose the schedule, either Daily or Hourly.

The Flow will be executed at the chosen frequency.

Cron-based Run Frequency/Schedule

Sets the Flow's run frequency with cron string.

  1. Cron String: Uses cron syntax when scheduling the data flow

 

For example,

0 12 * * * : Fire at 12:00 p.m. (noon) every day

15 10 * * * : Fire at 10:15 a.m. every day

0-5 14 * * * : Fire every minute starting at 2:00 p.m. and ending at 2:05 p.m. every day

Success Email Notifications

Receive Success Notification Emails when the Data Flow execution has succeeded.

  1. Success Notifications: A multi-value list that can accept multiple email addresses that will receive an email when this data flow is successfully executed.

A success email notification is sent to the email addresses entered in the Success Notifications input field.

image-20240118-092340.png

 

Failure Email Notifications

Receive Failure Notification Emails when the Data Flow errors.

  1. Failure Notifications: A multi-value list that can accept multiple email addresses that will receive an email when this data flow fails to execute.

A failure email notification is sent to the email addresses entered in the Failure Notifications input field.

Output:

image-20240122-160822.png

Add Global Property to file

Define a new property with a static value that is the same for all records in the file.

  1. Global Property Name: The property’s name that will be added as a global property.

  2. Global Property Value: The static value that the property should take.

Input:

image-20240119-103537.png

 

A file will be generated with an additional column whose name is the value entered in the Global Property Name field and whose value on each row is the value entered in the Global Property Value field.

Output:

image-20240119-103601.png

 

Rename Output File

Set final file name based on regular expression pattern.

  1. File Name Match Pattern: The regular expression matching the file name that needs to be changed.

  2. File Name Replacement Value: The name you want the output file to have.

Input:

image-20240119-104859.png

 

 

A file matching the pattern Test_Data_flow02142023_7v13_result_Rename_output_file.csv will be renamed to Test.csv when the replacement value entered is Test.

Output:

image-20240119-104850.png
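
A minimal Python sketch of the renaming, assuming the replacement value substitutes the whole matched name (the pattern shown is illustrative):

CODE
import re

MATCH_PATTERN = r"Test_Data_flow.*_result_Rename_output_file\.csv"  # File Name Match Pattern
REPLACEMENT = "Test.csv"                                            # File Name Replacement Value

name = "Test_Data_flow02142023_7v13_result_Rename_output_file.csv"
print(re.sub(MATCH_PATTERN, REPLACEMENT, name))  # Test.csv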

 

Validate Source Row Count (jsonl)

This function validates that the number of rows matches the count value contained within the same JSONL file.

Screen Shot 2024-01-24 at 12.25.21 PM.png
  1. Expected Count Attribute Name: the name of the attribute that holds the expected row count.

  2. Notify on Failure Emails: the distribution list that receives an email when this row verification fails.

  3. Notify on Success Emails: the distribution list that receives an email when this row verification succeeds.

Input:

Screen Shot 2024-01-24 at 12.33.59 PM.png

 

If validation succeeds, the output file will be named validated_<inputFileName>, where <inputFileName> is the input file that validation was performed upon. If validation fails, no output file is generated. In either case, the addresses entered in input fields 2 and 3 receive an email.

Output:

Screen Shot 2024-01-24 at 12.35.03 PM.png

 

Validate Source Row Count (csv)

This function validates that the number of rows matches the count value contained within the same CSV file.

Screen Shot 2024-01-24 at 12.27.08 PM.png
  1. File Contains Headers: toggle to inform the ETL whether the first row in the CSV file is a header.

  2. Include Header in Count: toggle to decide whether you want to count the header row. Note that this toggle has an effect ONLY IF the toggle in point 1 above is switched ON.

  3. Row Count Position: the 0-based position of the row count value in the CSV file. For example, if the count is in column C in Excel, enter 2 (NOT 3).

  4. Notify on Failure Emails: the distribution list that receives an email when this row verification fails.

  5. Notify on Success Emails: the distribution list that receives an email when this row verification succeeds.

Input:

Screen Shot 2024-01-24 at 12.35.38 PM.png

 

If validation succeeds, the output file will be named validated_<inputFileName>, where <inputFileName> is the input file that validation was performed upon. If validation fails, no output file is generated. In either case, the addresses entered in input fields 4 and 5 receive an email.

Output:

Screen Shot 2024-01-24 at 12.35.38 PM.png

Add Row Count Metadata (jsonl)

This function adds a new entry as the last row, representing the number of rows in the file.

Screen Shot 2024-01-24 at 12.27.32 PM.png
  1. Expected Count Attribute Name: the name of the row count property in the JSON file.

Input:

Screen Shot 2024-01-24 at 12.36.16 PM.png

The output file will have the same name as the input file, with a new row added representing the count value.

Output:

Screen Shot 2024-01-24 at 12.39.04 PM.png

Add Row Count Metadata (csv)

This function adds a new entry as the last row, representing the number of rows in the file.

Screen Shot 2024-01-24 at 12.38.35 PM.png
  1. Include Header in Count: toggle to inform the ETL whether to count the header row.

  2. Row Count Position: the 0-based position in the CSV file where the row count should be entered. For example, if you want the count to appear in column C in Excel, enter 2 (NOT 3).

Input:

Screen Shot 2024-01-24 at 12.36.35 PM.png

The output file will have the same name as the input file, with a new row added representing the count value.

Output:

Screen Shot 2024-01-24 at 12.38.03 PM.png
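
A minimal Python sketch of the CSV variant (the file name and toggle values are illustrative; it assumes the file has a header row):

CODE
import csv

INCLUDE_HEADER = False  # Include Header in Count toggle
COUNT_POSITION = 2      # Row Count Position (0-based: column C in Excel)

with open("input.csv", newline="") as src:
    rows = list(csv.reader(src))
count = len(rows) if INCLUDE_HEADER else len(rows) - 1  # assumes a header row exists
trailer = [""] * (COUNT_POSITION + 1)
trailer[COUNT_POSITION] = str(count)  # the count lands at the 0-based position
with open("input.csv", "w", newline="") as dst:  # output keeps the input file name
    csv.writer(dst).writerows(rows + [trailer])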

 
