This document describes how to import offline data from your CRM (e.g. sales or lead statuses) into BigQuery using a JSON feed.

Imported offline data can be combined with other website data to improve ROI reporting and attribution, by optimising not only on online transactions but also on final order/lead statuses in the CRM.

Before you begin

Creating JSON feed

SegmentStream can download JSON feeds and import their content into Google BigQuery.

Feed requirements

  • The feed should be generated by the customer’s server and be available at a specific URL, for example, https://example.com/feed.ndjson.
  • The feed should be in NDJSON (new line delimited JSON) format.
  • We highly recommend implementing Basic Auth to protect the feed from being publicly exposed.
  • (Optional, recommended) It should be possible to fetch daily feed data using a date parameter added to the feed URL.
    For example, the feed at https://example.com/feed.ndjson?date=20200101 should return transaction updates for 1 January 2020. This way, only daily updates are imported instead of all transactions. This approach is explained later in this guide.
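Before wiring the feed up, it can help to verify that its output really is valid NDJSON, i.e. that every non-empty line is a standalone JSON object. A minimal sketch in Python (the `validate_ndjson` helper and the sample payload are illustrative, not part of SegmentStream):

```python
import json

def validate_ndjson(feed_text):
    """Parse an NDJSON payload line by line and return the decoded records."""
    records = []
    for lineno, line in enumerate(feed_text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines
        record = json.loads(line)  # raises ValueError on malformed JSON
        if not isinstance(record, dict):
            raise ValueError(f"line {lineno}: expected a JSON object")
        records.append(record)
    return records

sample = (
    '{"orderId":"N1","status":"delivered"}\n'
    '{"orderId":"N2","status":"refunded"}\n'
)
print(len(validate_ndjson(sample)))  # 2
```

Running this against a dump of your feed catches the most common mistake: emitting a single JSON array instead of one object per line.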

CRM data example

Imagine you have the following data in your CRM:

createdAt   updatedAt   orderId  currency  total   status     userId
2020-01-10  2020-01-15  N1       USD       120.13  delivered  U1
2020-01-15  2020-01-20  N2       USD       96.12   refunded   U2
2020-01-20  2020-01-20  N3       USD       299.99  received   U3

Where:

  • createdAt - time of the order creation in ISO 8601 format;
  • updatedAt - time of the latest order status update in ISO 8601 format;
  • orderId - unique identifier of the order in your CRM system or website database. For other businesses it might be leadId or any other unique record identifier;
  • currency - order currency code;
  • total - total cost of the order;
  • status - the most recent order status (e.g. received, shipped, delivered, refunded, etc.);
  • userId - unique identifier of the user that made the order.

Note: this is the minimal schema required for stitching online and offline orders; you can add any other fields you need.

There are two different approaches to how this CRM data can be exported to the NDJSON feed.

Approach 1: The feed contains a snapshot of all orders/leads from the CRM with their current statuses

With this approach, the previously imported data is fully overwritten with new data every day. Your feed may contain either all orders/leads from the CRM, or only orders/leads for the last year.

This approach is simple to implement, but transferring the full dataset every day creates significant overhead, which may be impractical if you have many transactions per day.

Sample feed data

Line  Feed record
1     {"orderId":"N1","createdAt":"2020-01-10","updatedAt":"2020-01-15","currency":"USD","total":120.13,"status":"delivered","userId":"U1"}
2     {"orderId":"N2","createdAt":"2020-01-15","updatedAt":"2020-01-20","currency":"USD","total":96.12,"status":"refunded","userId":"U2"}
3     {"orderId":"N3","createdAt":"2020-01-20","updatedAt":"2020-01-20","currency":"USD","total":299.99,"status":"received","userId":"U3"}

Notice that the feed contains records with different updatedAt values and includes absolutely all records from the example CRM data.
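Generating such a snapshot is just a matter of serialising every CRM record as one JSON object per line. A minimal sketch in Python, where the `orders` list stands in for whatever your CRM query returns (the helper name is illustrative):

```python
import json

# Rows as they might come back from a CRM database query
orders = [
    {"orderId": "N1", "createdAt": "2020-01-10", "updatedAt": "2020-01-15",
     "currency": "USD", "total": 120.13, "status": "delivered", "userId": "U1"},
    {"orderId": "N2", "createdAt": "2020-01-15", "updatedAt": "2020-01-20",
     "currency": "USD", "total": 96.12, "status": "refunded", "userId": "U2"},
    {"orderId": "N3", "createdAt": "2020-01-20", "updatedAt": "2020-01-20",
     "currency": "USD", "total": 299.99, "status": "received", "userId": "U3"},
]

def full_snapshot_feed(records):
    # NDJSON: one JSON object per line, newline-delimited
    return "\n".join(json.dumps(r) for r in records) + "\n"

feed_body = full_snapshot_feed(orders)
```

Your server would return `feed_body` as the HTTP response at the feed URL, regenerating it on every request (or on a schedule).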

Approach 2: The feed contains only orders/leads that were updated during the day

If you choose this option, your server should parse the date query parameter of the feed request and respond only with the records updated on that day. For example, for the request https://example.com/feed.ndjson?date=20200120 you should return only CRM records that were updated on 20 January 2020.

Sample feed data

Line  Feed record
1     {"orderId":"N2","createdAt":"2020-01-15","updatedAt":"2020-01-20","currency":"USD","total":96.12,"status":"refunded","userId":"U2"}
2     {"orderId":"N3","createdAt":"2020-01-20","updatedAt":"2020-01-20","currency":"USD","total":299.99,"status":"received","userId":"U3"}

Note: This approach significantly reduces the amount of transferred data by importing only order updates instead of all CRM orders.

Your order feed should contain only updated records: for each status change of an existing transaction, add a new record to the feed with the new updatedAt value while preserving the original createdAt value.
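The filtering step above can be sketched as follows. Here `date_param` is the raw value of the date query parameter, parsed from the request by whatever web framework you use; the function name and sample records are illustrative:

```python
import json
from datetime import datetime

def daily_feed(records, date_param):
    """Return NDJSON containing only records updated on the requested day.

    `date_param` is the raw YYYYMMDD string from the feed URL, e.g. "20200120".
    """
    day = datetime.strptime(date_param, "%Y%m%d").date()
    updated = [r for r in records
               if datetime.strptime(r["updatedAt"], "%Y-%m-%d").date() == day]
    return "".join(json.dumps(r) + "\n" for r in updated)

orders = [
    {"orderId": "N1", "createdAt": "2020-01-10", "updatedAt": "2020-01-15",
     "status": "delivered"},
    {"orderId": "N2", "createdAt": "2020-01-15", "updatedAt": "2020-01-20",
     "status": "refunded"},
]
print(daily_feed(orders, "20200120"))
```

In production you would push this filter into the CRM query itself (e.g. a WHERE clause on the update timestamp) rather than filtering in application code.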

Enabling “JSON Feed” data source

  1. Go to the SegmentStream admin panel.
  2. Open Data Sources ▸ Add Data Source.
  3. Select JSON Feed.
  4. Fill in Username and Password with your Basic Auth credentials (if you decided not to implement Basic Auth, you can enter any values).
  5. In the Feed URL field, specify the URL where your feed can be found (do not include the date query parameter).
  6. In the Destination table name field specify the BigQuery table name that will contain imported feed data.
  7. Enable Partition table by date option if you decided to use Approach 2 for the feed generation (recommended).

How feed import works

Every day, SegmentStream fetches data from the specified feed URL using the provided credentials.

If you enabled the Partition table by date option, an additional date query parameter with the value YYYYMMDD will be added to the URL, where:

  • YYYY - current year;
  • MM - current month;
  • DD - current day of the month.

Example request as of 20 Jan, 2020 when Partition table by date is disabled:
GET https://example.com/feed.ndjson

Example request as of 20 Jan, 2020 when Partition table by date is enabled:
GET https://example.com/feed.ndjson?date=20200120
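The appended date parameter is simply the current day formatted as YYYYMMDD. If you want to reproduce the request yourself while testing your feed, building the URL might look like this (a sketch; the base URL is a placeholder):

```python
from datetime import date

def feed_url(base_url, day=None, partitioned=True):
    """Build the request URL; append ?date=YYYYMMDD when partitioning is on."""
    if not partitioned:
        return base_url
    day = day or date.today()
    return f"{base_url}?date={day.strftime('%Y%m%d')}"

print(feed_url("https://example.com/feed.ndjson", date(2020, 1, 20)))
# https://example.com/feed.ndjson?date=20200120
print(feed_url("https://example.com/feed.ndjson", partitioned=False))
# https://example.com/feed.ndjson
```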

SegmentStream will automatically detect the JSON schema and create a corresponding table in Google BigQuery. For example, for the sample CRM data provided above, the schema will look like this:

Field      Data Type  Description
createdAt  DATETIME   Time of the order creation in ISO 8601 format.
updatedAt  DATETIME   Time of the latest order status update in ISO 8601 format.
orderId    STRING     Unique identifier of the order in your CRM system or website database. For other businesses it might be leadId or any other unique record identifier.
currency   STRING     Order currency code.
total      FLOAT      Total cost of the order.
status     STRING     The most recent order status (e.g. received, shipped, delivered, refunded, etc.).
userId     STRING     Unique identifier of the user that made the order.

When Partition table by date is disabled, SegmentStream will create a table named project_id.dataset_name.destination_table_name. Every day this table is fully overwritten with new data.

When Partition table by date is enabled, SegmentStream will create a table named project_id.dataset_name.destination_table_name_YYYYMMDD, where the YYYYMMDD suffix represents the date of import. SegmentStream will add a new table for each day.
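The naming scheme above can be illustrated with a one-liner; the project, dataset, and table names here are placeholders:

```python
from datetime import date

def partitioned_table_name(project_id, dataset, table, day):
    # Date-sharded table name in the form project.dataset.table_YYYYMMDD
    return f"{project_id}.{dataset}.{table}_{day.strftime('%Y%m%d')}"

print(partitioned_table_name("my-project", "my_dataset", "crm_orders", date(2020, 1, 20)))
# my-project.my_dataset.crm_orders_20200120
```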

What’s next

Within the next 24 hours, your data will be uploaded to the corresponding BigQuery table.

This document was last updated on May 8, 2020. If you notice any gaps, outdated information or simply want to leave some feedback to help us improve our documentation, please let us know!