What is the process we should be following to ensure we don’t get duplicate events?

Hi all, what is the process we should be following to ensure we don’t get duplicate events? I found one reference to “event.disambiguation_key”; is this the way forward?


Hi @Ion_Todd , There is no user-based mechanism for deduplication of data on the Chronicle SIEM side at this time. Identical batches of logs are automatically deduplicated.
The disambiguation key is used when a single log outputs multiple UDM events: e.g., if a single log outputs two UDM events, they will be tagged with disambiguation key 1 and disambiguation key 2, respectively.
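
As a rough sketch of what that looks like (the field names below are illustrative, not exact UDM paths; check the UDM reference for the real ones):

# Illustrative only: one raw log might parse into two UDM events, e.g. a
# process launch plus a file modification. Both come from the same raw log
# but carry different disambiguation keys so downstream processing can
# tell them apart.
events_from_one_log = [
    {"metadata": {"event_type": "PROCESS_LAUNCH"}, "disambiguation_key": 1},
    {"metadata": {"event_type": "FILE_MODIFICATION"}, "disambiguation_key": 2},
]

for event in events_from_one_log:
    print(event["disambiguation_key"], event["metadata"]["event_type"])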
Just to confirm: when you say “event”, you mean a log parsed into UDM, and not a rule detection alert, correct?

Thanks for the clarification @Gal_Polak1 . Sorry for the incorrect wording; when I said “event” above, I’m actually talking about a raw log hitting the ingest API (and being parsed), so I’m sort of conflating things.

Are you able to share how identical batches are automatically deduplicated? An easy example for me to find right now is a custom log source where we don’t have an event timestamp in the raw log. These are four copies of the same raw log being replayed (accidentally by us, I assume) into the ingest API. The search string I’ve used is the log’s ID.

Adding a timestamp to the original log is possible. Is the deduplication relying on ID + timestamp?
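
In the meantime, here is the kind of client-side guard we’re considering before our code calls the ingest API. A minimal sketch, assuming we key each log on its ID plus an added timestamp (should_send and post_to_ingest_api are illustrative names we’d write ourselves, not Chronicle APIs):

import hashlib

# Sketch of a client-side replay guard, not a Chronicle feature: key each
# raw log on its ID plus the timestamp we add, and skip anything we have
# already submitted in this process.
seen = set()

def dedup_key(raw_log_id, event_timestamp):
    return hashlib.sha256(f"{raw_log_id}|{event_timestamp}".encode()).hexdigest()

def should_send(raw_log_id, event_timestamp):
    key = dedup_key(raw_log_id, event_timestamp)
    if key in seen:
        return False  # identical ID+timestamp already sent; drop the replay
    seen.add(key)
    return True

# Usage:
# if should_send(log["id"], log["ts"]):
#     post_to_ingest_api(log)  # hypothetical wrapper around our ingest client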
