Importing offline data using a JSON feed
This document describes how to import offline data from your CRM (e.g. sales or lead statuses) into BigQuery using a JSON feed.
Imported offline data can be combined with other website data to improve ROI reporting and attribution: you can optimise not only based on online transactions but also on final order/lead statuses in the CRM.
Before you begin
- Make sure that BigQuery is connected to SegmentStream.
Creating JSON feed
SegmentStream can download JSON feeds and import their content into Google BigQuery.
Feed requirements
- The feed should be generated by the customer’s server and be available at a specific URL, for example, https://example.com/feed.ndjson.
- The feed should be in NDJSON (newline-delimited JSON) format.
- We highly recommend protecting the feed with Basic Auth so that it is not publicly exposed.
- (Optional, recommended) It should be possible to fetch daily feed data by adding a date parameter to the feed URL. For example, the feed at https://example.com/feed.ndjson?date=20200101 should return transaction updates for the 1st of January 2020. This way, instead of importing all transactions, only daily updates will be imported. This approach is explained later in this guide.
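As a minimal sketch of the expected format, the following helper (hypothetical, not part of SegmentStream) serialises CRM records into NDJSON, one JSON object per line:

```python
import json

def to_ndjson(records):
    """Serialise an iterable of dicts into NDJSON: one JSON object per line."""
    return "\n".join(json.dumps(r, separators=(",", ":")) for r in records) + "\n"

feed = to_ndjson([
    {"orderId": "N1", "createdAt": "2020-01-10", "updatedAt": "2020-01-15",
     "currency": "USD", "total": 120.13, "status": "delivered", "userId": "U1"},
    {"orderId": "N2", "createdAt": "2020-01-15", "updatedAt": "2020-01-20",
     "currency": "USD", "total": 96.12, "status": "refunded", "userId": "U2"},
])
```

Each line of the resulting string is independently parseable JSON, which is exactly what NDJSON requires.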
CRM data example
Imagine you have the following data in your CRM:
createdAt | updatedAt | orderId | currency | total | status | userId |
---|---|---|---|---|---|---|
2020-01-10 | 2020-01-15 | N1 | USD | 120.13 | delivered | U1 |
2020-01-15 | 2020-01-20 | N2 | USD | 96.12 | refunded | U2 |
2020-01-20 | 2020-01-20 | N3 | USD | 299.99 | received | U3 |
Where:
- `createdAt`: time of the order creation in the ISO 8601 standard;
- `updatedAt`: time of the latest order status update in ISO 8601;
- `orderId`: unique identifier of the order in your CRM system or website database. For other businesses it might be `leadId` or any other unique record identifier;
- `currency`: order currency code;
- `total`: total cost of the order;
- `status`: the most recent order status (e.g. `received`, `shipped`, `delivered`, `refunded`, etc.);
- `userId`: unique identifier of the user that made the order.
Note: this is the minimal schema for stitching online and offline orders; you can add any other fields you need.
There are two different approaches to how this CRM data can be exported to the NDJSON feed.
Approach 1: The feed contains a snapshot of all orders/leads from the CRM with their current statuses
With this approach, the previously imported data is fully overwritten every day with the new snapshot. Your feed may contain either all orders/leads from the CRM or only orders/leads for the last year.
This approach is simple to implement, but it transfers a lot of redundant data every day, which might not be suitable if you have many transactions per day.
Sample feed data
Line | Feed record |
---|---|
1 | {"orderId":"N1", "createdAt":"2020-01-10","updatedAt":"2020-01-15","currency":"USD","total":120.13,"status":"delivered","userId":"U1"} |
2 | {"orderId":"N2", "createdAt":"2020-01-15","updatedAt":"2020-01-20", "currency":"USD","total":96.12,"status":"refunded","userId":"U2"} |
3 | {"orderId":"N3", "createdAt":"2020-01-20","updatedAt":"2020-01-20", "currency":"USD","total":299.99,"status":"received","userId":"U3"} |
Notice that the feed contains different `updatedAt` values and every single record from the example CRM data.
Approach 2: The feed contains only orders/leads that were updated during the day
If you choose to implement this option, you should parse the `date` query parameter of the feed request and respond only with the relevant update records. For example, if a request is https://example.com/feed.ndjson?date=20200120, you should return only the CRM records that were updated on the 20th of January 2020.
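A hypothetical sketch of such server-side filtering, assuming each record carries its `updatedAt` date in `YYYY-MM-DD` format:

```python
from datetime import datetime

def records_for_date(records, date_param):
    """Return only records whose updatedAt matches the YYYYMMDD date parameter."""
    target = datetime.strptime(date_param, "%Y%m%d").date()
    return [
        r for r in records
        if datetime.strptime(r["updatedAt"], "%Y-%m-%d").date() == target
    ]

crm = [
    {"orderId": "N1", "createdAt": "2020-01-10", "updatedAt": "2020-01-15", "status": "delivered"},
    {"orderId": "N2", "createdAt": "2020-01-15", "updatedAt": "2020-01-20", "status": "refunded"},
    {"orderId": "N3", "createdAt": "2020-01-20", "updatedAt": "2020-01-20", "status": "received"},
]
daily = records_for_date(crm, "20200120")  # only N2 and N3 were updated that day
```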
Sample feed data
Line | Feed record |
---|---|
1 | {"orderId":"N2", "createdAt":"2020-01-15","updatedAt":"2020-01-20", "currency":"USD","total":96.12,"status":"refunded","userId":"U2"} |
2 | {"orderId":"N3", "createdAt":"2020-01-20","updatedAt":"2020-01-20", "currency":"USD","total":299.99,"status":"received","userId":"U3"} |
Note: This approach significantly reduces the amount of transferred data by importing only order updates instead of all CRM orders.
Your order feed should contain only updated records: for each change in the status of an existing transaction, you create a new record in the feed with a new `updatedAt` value while preserving the original `createdAt` value.
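To illustrate, a hypothetical helper that builds such an update record for a status change, keeping `createdAt` intact:

```python
from datetime import date

def status_update(order, new_status, update_date):
    """Build a feed record for a status change: new status and updatedAt,
    with the original createdAt preserved."""
    updated = dict(order)  # copy, so the original record is untouched
    updated["status"] = new_status
    updated["updatedAt"] = update_date.isoformat()
    return updated

order = {"orderId": "N2", "createdAt": "2020-01-15",
         "updatedAt": "2020-01-15", "status": "received"}
record = status_update(order, "refunded", date(2020, 1, 20))
```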
Common mistakes
- Object instead of newline-delimited JSON:

```
{
  users: [
    {"orderId": "123", "total": 100}
    {"orderId": "124", "total": 250}
    {"orderId": "125", "total": 300}
  ]
}
```

- Pseudo-array instead of newline-delimited JSON:

```
[
  {"orderId": "123", "total": 100}
  {"orderId": "124", "total": 250}
  {"orderId": "125", "total": 300}
]
```

- Pseudo-object instead of newline-delimited JSON:

```
{
  {"orderId": "123", "total": 100}
  {"orderId": "124", "total": 250}
  {"orderId": "125", "total": 300}
}
```

- Objects separated by commas instead of newline-delimited JSON:

```
{"orderId": "123", "total": 100},
{"orderId": "124", "total": 250},
{"orderId": "125", "total": 300}
```

- Different data types in different rows:

```
{"orderId": "123", "total": "100"}
{"orderId": "124", "total": 250}
{"orderId": "125", "total": 300}
```

`total` can’t be a STRING in one row and an INTEGER in another.
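Before exposing the feed, it may help to sanity-check its output. A minimal sketch of a validator that catches the mistakes above (invalid lines, non-object lines, and fields whose type changes between rows):

```python
import json

def field_type(value):
    """Collapse int/float into one 'number' type; keep other types distinct."""
    if isinstance(value, bool):
        return "bool"
    if isinstance(value, (int, float)):
        return "number"
    return type(value).__name__

def validate_ndjson(text):
    """Return a list of problems found in an NDJSON string; empty list means OK."""
    problems = []
    seen_types = {}
    for n, line in enumerate(text.strip().splitlines(), start=1):
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            problems.append(f"line {n}: not valid JSON")
            continue
        if not isinstance(obj, dict):
            problems.append(f"line {n}: not a JSON object")
            continue
        for key, value in obj.items():
            t = field_type(value)
            if seen_types.setdefault(key, t) != t:
                problems.append(f"line {n}: field '{key}' changed type")
    return problems

good = '{"orderId": "123", "total": 100}\n{"orderId": "124", "total": 250.5}'
bad = '{"orderId": "123", "total": "100"}\n{"orderId": "124", "total": 250}'
```

Note that ints and floats are treated as one numeric type here, since mixing them is normally harmless, while mixing strings and numbers is not.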
Enabling “JSON Feed” data source
- Go to the SegmentStream admin panel.
- Open Data Sources ▸ Add Data Source.
- Select JSON Feed.
- Fill in Username and Password with your Basic Auth credentials (if you decided not to implement Basic Auth, you can enter any values).
- In the Feed URL field, specify the URL where your feed can be found (don’t include the `date` query parameter).
- In the Destination table name field, specify the BigQuery table name that will contain the imported feed data.
- Enable the Partition table by date option if you decided to use Approach 2 for the feed generation (recommended).
How feed import works
Every day SegmentStream fetches data from the specified feed URL using the provided credentials.
If you enabled the Partition table by date option, an additional `date` query parameter with a `YYYYMMDD` value will be added to the URL, where:
- `YYYY`: current year;
- `MM`: current month;
- `DD`: current day.
Example request as of 20 Jan, 2020 when Partition table by date is disabled:
GET https://example.com/feed.ndjson
Example request as of 20 Jan, 2020 when Partition table by date is enabled:
GET https://example.com/feed.ndjson?date=20200120
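The URL construction above can be sketched as follows (a hypothetical helper, shown only to make the `YYYYMMDD` formatting explicit):

```python
from datetime import date

def feed_url(base_url, day=None, partition_by_date=False):
    """Build the feed request URL; append date=YYYYMMDD only when the
    'Partition table by date' option is enabled."""
    if not partition_by_date:
        return base_url
    day = day or date.today()
    return f"{base_url}?date={day.strftime('%Y%m%d')}"

url = feed_url("https://example.com/feed.ndjson", date(2020, 1, 20),
               partition_by_date=True)
```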
SegmentStream will automatically detect the JSON schema and create a corresponding table in Google BigQuery. For example, for the sample CRM data provided above, the schema will look like this:
Field | Data Type | Description |
---|---|---|
createdAt | DATETIME | Time of the order creation in the ISO 8601 standard. |
updatedAt | DATETIME | Time of the latest order status update in ISO 8601. |
orderId | STRING | Unique identifier of the order in your CRM system or website database. For other businesses it might be leadId or any other unique record identifier. |
currency | STRING | Order currency code. |
total | FLOAT | Total cost of the order. |
status | STRING | The most recent order status (e.g. received, shipped, delivered, refunded, etc.). |
userId | STRING | Unique identifier of the user that made the order. |
When Partition table by date is disabled, SegmentStream will create a table named `project_id.dataset_name.destination_table_name`. Every day this table will be fully overwritten with the new data.
When Partition table by date is enabled, SegmentStream will create a table named `project_id.dataset_name.destination_table_name_YYYYMMDD`, where the `YYYYMMDD` suffix represents the date of import. SegmentStream will add a new table for each day.
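When daily tables accumulate update records, a common follow-up is to reduce them to the latest status per order. A minimal Python sketch of that reduction (in BigQuery you would typically express the same idea with a window function over the daily tables):

```python
def latest_statuses(records):
    """Keep only the most recent record per orderId, by updatedAt.
    ISO 8601 date strings compare correctly as plain strings."""
    latest = {}
    for r in records:
        prev = latest.get(r["orderId"])
        if prev is None or r["updatedAt"] > prev["updatedAt"]:
            latest[r["orderId"]] = r
    return latest

updates = [
    {"orderId": "N2", "updatedAt": "2020-01-15", "status": "received"},
    {"orderId": "N2", "updatedAt": "2020-01-20", "status": "refunded"},
    {"orderId": "N3", "updatedAt": "2020-01-20", "status": "received"},
]
current = latest_statuses(updates)
```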
What’s next
During the next 24 hours, your data will be uploaded to the corresponding BigQuery table.
This document was last updated on October 19, 2020. If you notice any gaps, outdated information or simply want to leave some feedback to help us improve our documentation, please let us know!