Joining data

The real power of the data warehouse is the ability to combine data from multiple tables in a single query. Joins make this possible: you choose the fields that connect PostHog data to your external sources.

Table joins

You can join external data to existing PostHog schemas and to other external data tables. These joins are saved and applied whenever the joined fields are accessed through the source table.

To define a join, go to the SQL editor, click the three dots next to your source table, and click Add join. Here you define the source table key, joining table, and joining table key as well as how the fields are accessed.

For example, if you import your Stripe data, you can define a join between the distinct_id key of PostHog's events table and the email key of the stripe_customer table. You can then access the stripe_customer table through the events table with a query like SELECT stripe_customer.id FROM events.
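As a rough sketch of what this looks like in practice, the query below reads a joined field alongside regular event fields. It assumes the join described above has already been saved and is accessed as stripe_customer; the alias stripe_customer_id and the LIMIT are only illustrative.

```sql
-- Assumes a saved join from events.distinct_id to stripe_customer.email,
-- exposed on the events table as "stripe_customer".
SELECT
    event,
    distinct_id,
    stripe_customer.id AS stripe_customer_id  -- field from the joined table
FROM events
LIMIT 10
```

Because the join is saved on the source table, the query itself needs no JOIN clause.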

Once joined, source properties can be used in filters, breakdowns, and SQL expressions.

To edit or delete a table join, click the three dots next to your source table, click View table schema, click the three dots next to your joined table, and select Edit or Delete.

Person joins

Person joins are a special type of table join: a join defined on the persons table in PostHog. When you join external data to this table, you can use it like a person filter in insights.

Note: To see extended person properties in filters, define the join with the persons table as the source table. To do that, go to the SQL editor, find the persons table in the left column, click the three dots next to it, and click Add join.

Note: Be sure that your joined keys actually match. For example, persons.id returns a UUID, even if you use an email as the distinct_id when capturing events. You might need to add a person property like email to join on.
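One way to check the keys before defining a person join is to look at a few rows of the persons table. This is a minimal sketch that assumes you have set an email person property; if you use a different property, swap it in.

```sql
-- Inspect candidate join keys on the persons table.
-- "properties.email" assumes an email person property has been set.
SELECT
    id,                -- a UUID, not an email
    properties.email   -- candidate key for matching external data
FROM persons
LIMIT 10
```

If the external table's key is an email address, the values in the second column are the ones that need to line up with it, not the values in id.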

Query joins

If you only want to join data together for a single insight or query, you can use SQL clauses like WHERE IN and JOIN.

For example, to get a count of events for your HubSpot contacts, you can filter events.distinct_id by the email values in hubspot_contacts like this:

SELECT COUNT() AS event_count, distinct_id
FROM events
WHERE distinct_id IN (SELECT email FROM hubspot_contacts)
GROUP BY distinct_id
ORDER BY event_count DESC

You can also use a JOIN such as INNER JOIN or LEFT JOIN to combine data. For example, to get a count of events for your Stripe customers you can INNER JOIN on distinct_id and email like this:

SELECT events.distinct_id, COUNT() AS event_count
FROM events
INNER JOIN prod_stripe_customer ON events.distinct_id = prod_stripe_customer.email
GROUP BY events.distinct_id
ORDER BY event_count DESC
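A LEFT JOIN follows the same pattern but keeps every row from the left table, including events with no matching Stripe customer. The sketch below reuses the table and keys from the example above; the id column on prod_stripe_customer and the alias stripe_customer_id are assumptions for illustration.

```sql
-- Keep all events and attach the Stripe customer where one matches.
-- Assumes prod_stripe_customer has an "id" column.
SELECT
    events.distinct_id,
    events.event,
    prod_stripe_customer.id AS stripe_customer_id
FROM events
LEFT JOIN prod_stripe_customer ON events.distinct_id = prod_stripe_customer.email
LIMIT 100
```

For events without a match, the stripe_customer_id column holds an empty or null value, depending on how the query engine fills unmatched rows.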

To learn more about joining data, see our guide on joining data.
