FAQs (Data)

“It’s not the answer that enlightens, but the question.” - Eugène Ionesco

To help answer your questions more quickly and accurately, we have compiled the most frequently asked questions about Zeta Data.

Why is my Data Flow not generating files anymore?

This can happen if the selected query does not return valid JSON, which causes the data flow to fail. To resolve the issue, fix the Snowflake query so that it returns the expected JSON format.
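
As an illustration, the sketch below (in Python, with a placeholder file path) spot-checks a produced .jsonl file and reports any lines that do not parse as JSON; the fix itself still needs to be made in the Snowflake query.

CODE
import json

def find_invalid_json_lines(path):
    """Return (line number, error) pairs for lines that do not parse as JSON."""
    bad = []
    with open(path, encoding="utf-8") as fh:
        for lineno, line in enumerate(fh, start=1):
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            try:
                json.loads(line)
            except json.JSONDecodeError as exc:
                bad.append((lineno, str(exc)))
    return bad

# Placeholder path -- point this at a file produced by the data flow.
for lineno, error in find_invalid_json_lines("sample_output.jsonl"):
    print(f"line {lineno}: {error}")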

We have had the following data flow alerts:

CODE
Partner: zetaeu
Site Id: zeta-eu
Feed: zetaeucustomersummaryongoing_128
Service Name: FILE_PROCESSOR
Execution Date: 2024-11-27 11:47:23
File Name: 167_zetaeucustomersummaryongoing_128_1732706129.jsonl_3_0_1.jsonl
Source File Name: 167_zetaeucustomersummaryongoing_128_1732706129.jsonl_3_0_1.jsonl
Log URL: SELECT * from SOR.PUBLIC.STORAGE_ERROR where file_metadata:filename = '167_zetaeucustomersummaryongoing_128_1732706129.jsonl_3_0_1.jsonl' and file_metadata:last_modified = '1.732706647e+17' and in_topic = 'ingress' and coalesce(retry_data:retried, false) = false
Error: 
Traceback:

and

CODE
Partner: zetaeu
Site Id: zeta-eu
Feed: zetaeucustomersummaryongoing_128
Service Name: FILE_PROCESSOR
Execution Date: 2024-11-27 11:47:23
File Name: 167_zetaeucustomersummaryongoing_128_1732706129.jsonl_2_6_1.jsonl
Source File Name: 167_zetaeucustomersummaryongoing_128_1732706129.jsonl_2_6_1.jsonl
Log URL: SELECT * from SOR.PUBLIC.STORAGE_ERROR where file_metadata:filename = '167_zetaeucustomersummaryongoing_128_1732706129.jsonl_2_6_1.jsonl' and file_metadata:last_modified = '1.732706618e+17' and in_topic = 'ingress' and coalesce(retry_data:retried, false) = false
Error: 
Traceback: 

Can you please advise if this means data has not been exported by the data flow? At the moment we’re not sure what to do in response to this error.

A STORAGE_ERROR record occurs when a run fails.

The error could be caused by anything that disrupted the data flow run, such as a network failure. However, because retry logic is built into the data flows, the retry allowed the run to load successfully.
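
If you want to confirm that the retry covered a particular file, one option is to re-run the query from the alert's Log URL and look for rows that are still marked as not retried. A minimal sketch, assuming the snowflake-connector-python package and placeholder connection details:

CODE
import snowflake.connector

# Placeholder connection details -- replace with your own account settings.
conn = snowflake.connector.connect(
    account="my_account",
    user="my_user",
    password="my_password",
)

FILENAME = "167_zetaeucustomersummaryongoing_128_1732706129.jsonl_3_0_1.jsonl"

try:
    cur = conn.cursor()
    # Adapted from the Log URL above: count error rows for this file that
    # have not yet been retried. A count of 0 suggests the built-in retry
    # has already reprocessed the file.
    cur.execute(
        """
        SELECT count(*)
        FROM SOR.PUBLIC.STORAGE_ERROR
        WHERE file_metadata:filename = %s
          AND in_topic = 'ingress'
          AND coalesce(retry_data:retried, false) = false
        """,
        (FILENAME,),
    )
    (pending,) = cur.fetchone()
    print(f"Un-retried error rows for {FILENAME}: {pending}")
finally:
    conn.close()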

My data flow was used to carry out a one-time historical load of events, handling an estimated 65 million records. Unfortunately, it failed during execution and didn't produce the expected file name, as shown in the provided error log. What might have led to this problem?

The reason is that the maximum file size for single-file unload mode (5368709120 bytes, i.e. 5 GiB) was exceeded.
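
If the unload is something you run yourself with COPY INTO (rather than through the data flow's own export), one way around the limit is to let Snowflake split the output across multiple files. A minimal sketch with placeholder stage, database, and table names; whether your specific data flow exposes these options is something to confirm with support:

CODE
# Placeholder names throughout; run with an existing Snowflake cursor.
# SINGLE = FALSE lets Snowflake write multiple output files, so no single
# file has to stay under the 5368709120-byte (5 GiB) single-file limit.
UNLOAD_SQL = """
COPY INTO @my_stage/exports/customer_summary_
FROM my_db.my_schema.customer_summary
FILE_FORMAT = (TYPE = 'JSON')
SINGLE = FALSE
MAX_FILE_SIZE = 1073741824  -- roughly 1 GiB per output file
"""

def run_unload(cursor):
    """Execute the multi-file unload using an open Snowflake cursor."""
    cursor.execute(UNLOAD_SQL)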

Why does my file indicate a 'success' status yet fail to appear in the designated destination folder?

If the file isn’t showing up in the destination, we can check on our end whether the sink is active or inactive for you.

Why is my file being skipped by data flows?

If a file is skipped, it is usually for one of the following reasons:

  • the file does not match the file name restrictions set in the data flow

  • the file is not the correct type (e.g. the data flow defaults to "csv" types, but a "jsonl" file was dropped)

  • the file is older than when the flow was last activated and the "catchup" flag has been disabled

  • the file is not in the correct source directory

  • there is a decryption/decompression error

There may be other scenarios not covered in the list above.

To work out why the file was skipped, walk through these checks (a small sketch for the first two follows the list):

  • do the file name restrictions in the flow match the filename?

  • is the file type the same as what the data flow expects?

  • is the file compressed/encrypted, and is the data flow configured to handle that?
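
A minimal sketch for the first two checks; the pattern and expected extensions are placeholders and should be replaced with the restrictions configured in your data flow:

CODE
import fnmatch

FILENAME_PATTERN = "*customersummary*"     # placeholder: the flow's file name restriction
EXPECTED_EXTENSIONS = (".csv", ".csv.gz")  # placeholder: file types the flow accepts

def explain_skip(filename: str) -> str:
    """Return the first likely reason a dropped file would be skipped."""
    if not fnmatch.fnmatch(filename, FILENAME_PATTERN):
        return "filename does not match the data flow's file name restriction"
    if not filename.endswith(EXPECTED_EXTENSIONS):
        return "extension does not match the file type the data flow expects"
    return "no obvious mismatch; check the catch-up flag, source directory, and encryption/compression"

print(explain_skip("customer_events.jsonl"))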

Why did my data flow run a day before the scheduled time?

By default, there are certain situations where a data flow will automatically perform a "catch-up" execution if it has missed the last scheduled run. We have recently added a feature that allows you to disable this catch-up execution. This option can be enabled when setting the schedule for the data flow by configuring an account feature flag.

The flag is called "Strict Schedule Node" and can be used when the flow needs to fetch a file only at a specific time (instead of as soon as the file is dropped).

Can we re-map email MD5?

Yes, but it requires some pre-work to enable the field:

  1. Create a file and load it to ZMP using the email_md5_id column. This should be imported as-is, i.e. do not create a new mapping for this field.

  2. The file should contain email_md5_id and user_id and at least one valid row. The contents can be a sample row which can be deleted later on (see: Delete a Person if you need to delete the record). A sample file sketch follows these steps.

  3. Once the file is loaded, you should be able to select email_md5_id to be remapped.
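
For illustration, a minimal sketch of the kind of file described in steps 1 and 2, written with Python's csv module; the MD5 value, user id, and output path are placeholders:

CODE
import csv

# Placeholder values -- the row only needs to be valid so the field can be
# enabled; it can be deleted afterwards as noted in step 2.
rows = [
    {"email_md5_id": "0123456789abcdef0123456789abcdef", "user_id": "sample-user-1"},
]

with open("email_md5_remap_sample.csv", "w", newline="") as fh:
    writer = csv.DictWriter(fh, fieldnames=["email_md5_id", "user_id"])
    writer.writeheader()
    writer.writerows(rows)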

Do we support single and double quotes as text qualifiers?

  • Double quotes (") SUPPORTED

  • Single quotes (') NOT SUPPORTED
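
For illustration, a small sketch that writes a file using the supported double-quote text qualifier; the column names and values are placeholders:

CODE
import csv

with open("qualified_sample.csv", "w", newline="") as fh:
    # quotechar='"' produces the supported double-quote text qualifier.
    writer = csv.writer(fh, quotechar='"', quoting=csv.QUOTE_ALL)
    writer.writerow(["email", "first_name"])
    writer.writerow(["jane@example.com", 'Jane "JJ" Doe'])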

Are all properties represented?

Yes, implicit and existing people properties will be represented in the drop-down list.

Can we change a data type from this mapping?

No, but the standard rules apply for imports, i.e. we will infer the data type of a new field or abide by existing data types where possible.

Can we map into contacts, e.g. “email” to “contact value”?

Not at this time, but we have added this to the feedback list above.

Can we map to objects?

Yes, from column to individual object property but not from object to object or object to property.

Keep in mind that we would update the whole object even if only one element is included, so there is potential to nullify data (see the illustration below).
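
A small illustration of that behaviour, using plain Python dictionaries as stand-ins for the stored record and the import; the object shape and values are placeholders:

CODE
# The stored object before the import (placeholder values).
existing = {"address": {"city": "Paris", "postal_code": "75001"}}

# The import only maps a value into address.city ...
imported = {"address": {"city": "Lyon"}}

# ... but the whole "address" object is replaced, not merged:
existing.update(imported)
print(existing)  # {'address': {'city': 'Lyon'}} -- postal_code has been nullified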

How should we handle cases where both user_id and email are present in the file?

Currently, we only support mapping into uid, so the recommendation would be to NOT map those fields. In cases where both are present and not named correctly, we unfortunately do not support imports via both at this time.
