How to Build a Custom "Cross-Sell" Engine Based on Real Purchase Data in Magento 2
How to Build a Custom "Cross-Sell" Engine Based on Real Purchase Data in Magento 2
Let’s be honest: Magento’s native cross-sell, related and upsell features are useful, but they’re often static and manual. If you want smarter, revenue-driving suggestions that reflect how your customers actually buy, you need a cross-sell engine built on real purchase data. In this post I’ll walk you through a practical, step-by-step approach to build a custom cross-sell engine for Magento 2. I’ll keep the tone casual — think of this as a lunchtime chat with a teammate — and I’ll include concrete code snippets, SQL, and architecture tips you can reuse on magefine.com stores or any Magento 2 installation. You’ll learn how to: - Analyze existing purchase data to identify cross-selling patterns - Build an algorithm using co-purchase counts and simple normalization - Integrate recommendations with Magento 2 native cross-sell links or a custom UI block - Optimize performance for large datasets - Run A/B tests and measure the impact on Average Order Value (AOV) Why build this? Because data-driven cross-sells typically beat manual rules: they adapt to product bundles that naturally occur, and they find niche pairings you wouldn’t guess by hand.
Overview of the design
High level, the engine has these parts: - Data layer: aggregate co-purchase counts from sales_order_item - Scoring layer: convert counts into scores (e.g., normalized frequency or PMI) - Storage layer: store recommendations (custom table or product links) - Delivery layer: display recommendations on product page, cart page, or checkout - Measurement + experimentation: A/B tests and instrumentation to measure AOV lift We’ll build a simple, robust version and highlight places to improve (machine learning, session-based models, real-time updates) as your data and traffic grow.1) Analyze your purchase data to find cross-sell patterns
First step: explore the data. You need to know how often products are bought together. Magento stores order items in sales_order_item — this table is your friend. A quick SQL to get raw co-purchase counts (orders where both items appear):SELECT
oi1.product_id AS product_id,
oi2.product_id AS co_product_id,
COUNT(DISTINCT oi1.order_id) AS orders_together
FROM sales_order_item oi1
JOIN sales_order_item oi2
ON oi1.order_id = oi2.order_id
AND oi1.product_id <> oi2.product_id
WHERE oi1.parent_item_id IS NULL AND oi2.parent_item_id IS NULL
GROUP BY oi1.product_id, oi2.product_id
ORDER BY orders_together DESC
LIMIT 100;
Notes:
- Filter out parent_item_id to ignore bundle/child items duplicates.
- Use COUNT(DISTINCT order_id) to count unique orders.
- This raw table can be huge for stores with many orders: we’ll discuss scaling below.
This gives you quick insight: which products co-occur most often. But counts alone are biased by very popular products — we’ll normalize later.
Exploratory checks
- Distribution: how many pairs have more than N co-purchases? - Lifetime vs Windowed: use last 12 months or last 90 days depending on seasonality - Category cross-sells: are co-purchases mostly in the same category or across categories? Run a few queries and create charts. Export the top pairs and verify if they make commercial sense. If a weird pairing shows up, investigate data quality (returns, test orders).2) Create a recommendation algorithm based on order history
We’ll start with a simple, effective algorithm: use co-purchase counts and normalize by product popularity to get a score. Two common scores: - Normalized co-purchase frequency: count(product A with B) / count(product A) - Pointwise Mutual Information (PMI) approximation (measures association beyond chance) I’ll show a practical implementation for both. The first is easier and often enough for cross-sells.Algorithm 1 — Conditional frequency (easy & interpretable)
score(A→B) = co_count(A,B) / total_orders_with_A This is intuitive: of orders containing A, how many contain B? You can also compute symmetric scores if needed. SQL to compute conditional frequency for the last 12 months:WITH co AS (
SELECT
oi1.product_id AS product_id,
oi2.product_id AS co_product_id,
COUNT(DISTINCT oi1.order_id) AS co_count
FROM sales_order_item oi1
JOIN sales_order_item oi2
ON oi1.order_id = oi2.order_id
AND oi1.product_id <> oi2.product_id
WHERE oi1.parent_item_id IS NULL
AND oi2.parent_item_id IS NULL
AND oi1.created_at >= DATE_SUB(NOW(), INTERVAL 12 MONTH)
GROUP BY oi1.product_id, oi2.product_id
), totals AS (
SELECT product_id, COUNT(DISTINCT order_id) AS total_order_count
FROM sales_order_item
WHERE parent_item_id IS NULL
AND created_at >= DATE_SUB(NOW(), INTERVAL 12 MONTH)
GROUP BY product_id
)
SELECT
c.product_id,
c.co_product_id,
c.co_count,
t.total_order_count,
(c.co_count / t.total_order_count) AS score
FROM co c
JOIN totals t ON t.product_id = c.product_id
ORDER BY score DESC
LIMIT 100;
This yields a score between 0 and 1. You can threshold it to avoid showing weak pairs.
Algorithm 2 — Simple PMI-ish score
PMI = log(P(A,B) / (P(A) * P(B))). In practice, we use counts and logs to avoid tiny probabilities. PMI surfaces surprising associations not explained purely by popularity. Pseudo-SQL outline (simplified):-- compute counts: co_count, count_a, count_b, total_orders
-- then compute pmi = log((co_count / total_orders) / ((count_a/total_orders) * (count_b/total_orders)))
PMI is more computationally heavy, but good if you want unusual pairings.
From scores to recommendations
For each product A, sort candidate B by score descending and keep top N (e.g., 10). You can apply business rules: - Exclude items without stock - Exclude the same SKU - Promote higher-margin items by multiplying the score by margin factor - Exclude items with zero visibility or disabled status3) Implementing the engine: Magento 2 module walkthrough
I’ll show a minimal module structure and a service that builds recommendations nightly. Module files (high-level): - app/code/Magefine/CrossSell/registration.php - app/code/Magefine/CrossSell/etc/module.xml - app/code/Magefine/CrossSell/etc/crontab.xml - app/code/Magefine/CrossSell/Setup/Patch/Schema/CreateRecommendationTable.php - app/code/Magefine/CrossSell/Model/RecommendationBuilder.php - app/code/Magefine/CrossSell/Console/Command/BuildRecommendations.php (optional) Example registration.php:<?php
use Magento\Framework\Component\ComponentRegistrar;
ComponentRegistrar::register(
ComponentRegistrar::MODULE,
'Magefine_CrossSell',
__DIR__
);
module.xml:
<?xml version="1.0"?>
<config xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="urn:magento:framework:Module/etc/module.xsd">
<module name="Magefine_CrossSell" setup_version="1.0.0" />
</config>
crontab.xml (run nightly):
<?xml version="1.0"?>
<config xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="urn:magento:framework:Module/etc/crontab.xsd">
<group id="default">
<job name="magefine_crosssell_build" instance="Magefine\CrossSell\Model\RecommendationBuilder" method="execute">
<schedule>0 3 * * *</schedule>
</job>
</group>
</config>
CreateRecommendationTable patch (simplified):
<?php
namespace Magefine\CrossSell\Setup\Patch\Schema;
use Magento\Framework\Setup\ModuleDataSetupInterface;
use Magento\Framework\Setup\Patch\PatchInterface;
use Magento\Framework\Setup\Patch\SchemaPatchInterface;
class CreateRecommendationTable implements SchemaPatchInterface
{
private $moduleDataSetup;
public function __construct(ModuleDataSetupInterface $moduleDataSetup)
{
$this->moduleDataSetup = $moduleDataSetup;
}
public function apply()
{
$installer = $this->moduleDataSetup;
$installer->getConnection()->startSetup();
if (!$installer->getConnection()->isTableExists($installer->getTable('magefine_cross_sell_recommendations'))) {
$table = $installer->getConnection()
->newTable($installer->getTable('magefine_cross_sell_recommendations'))
->addColumn('id', \Magento\Framework\DB\Ddl\Table::TYPE_INTEGER, null, ['identity' => true, 'unsigned' => true, 'nullable' => false, 'primary' => true], 'ID')
->addColumn('product_id', \Magento\Framework\DB\Ddl\Table::TYPE_INTEGER, null, ['unsigned' => true, 'nullable' => false], 'Product ID')
->addColumn('linked_product_id', \Magento\Framework\DB\Ddl\Table::TYPE_INTEGER, null, ['unsigned' => true, 'nullable' => false], 'Linked Product ID')
->addColumn('score', \Magento\Framework\DB\Ddl\Table::TYPE_DECIMAL, '12,4', ['nullable' => false, 'default' => '0.0000'], 'Score')
->addIndex($installer->getIdxName('magefine_cross_sell_recommendations', ['product_id']), ['product_id'])
->setComment('Magefine Cross-sell Recommendations');
$installer->getConnection()->createTable($table);
}
$installer->getConnection()->endSetup();
}
public static function getDependencies()
{
return [];
}
public function getAliases()
{
return [];
}
}
RecommendationBuilder.php (core logic)
This class runs the SQL aggregation and writes top N recommendations to your table.<?php
namespace Magefine\CrossSell\Model;
use Magento\Framework\App\ResourceConnection;
use Psr\Log\LoggerInterface;
class RecommendationBuilder
{
private $resource;
private $logger;
public function __construct(ResourceConnection $resource, LoggerInterface $logger)
{
$this->resource = $resource;
$this->logger = $logger;
}
public function execute()
{
$connection = $this->resource->getConnection();
$table = $connection->getTableName('sales_order_item');
$recTable = $connection->getTableName('magefine_cross_sell_recommendations');
// 1) compute co-purchase counts in SQL
$sql = "SELECT oi1.product_id as product_id, oi2.product_id as co_product_id, COUNT(DISTINCT oi1.order_id) as co_count
FROM {$table} oi1
JOIN {$table} oi2 ON oi1.order_id = oi2.order_id AND oi1.product_id <> oi2.product_id
WHERE oi1.parent_item_id IS NULL AND oi2.parent_item_id IS NULL
/* and oi1.created_at >= DATE_SUB(NOW(), INTERVAL 12 MONTH) */
GROUP BY oi1.product_id, oi2.product_id";
$rows = $connection->fetchAll($sql);
// 2) compute totals per product
$totalsSql = "SELECT product_id, COUNT(DISTINCT order_id) as total_count FROM {$table} WHERE parent_item_id IS NULL GROUP BY product_id";
$totals = $connection->fetchAll($totalsSql);
$totalMap = [];
foreach ($totals as $t) {
$totalMap[$t['product_id']] = (int)$t['total_count'];
}
// prepare insert batch
$connection->delete($recTable); // replace old recommendations
$insertData = [];
// compute conditional score
foreach ($rows as $r) {
$pid = (int)$r['product_id'];
$cid = (int)$r['co_product_id'];
$coCount = (int)$r['co_count'];
if (!isset($totalMap[$pid]) || $totalMap[$pid] == 0) continue;
$score = $coCount / $totalMap[$pid];
// keep threshold
if ($score < 0.01) continue; // configurable
$insertData[] = ['product_id' => $pid, 'linked_product_id' => $cid, 'score' => $score];
if (count($insertData) >= 1000) {
$connection->insertMultiple($recTable, $insertData);
$insertData = [];
}
}
if (count($insertData)) {
$connection->insertMultiple($recTable, $insertData);
}
$this->logger->info('Magefine CrossSell: recommendations rebuilt.');
}
}
Notes:
- This naive builder reads aggregated rows into memory — it’s fine for mid-size catalogs, but we’ll optimize below for very large datasets.
- You may want to run this as a CLI command first to test before enabling the cron.
4) Integrate with Magento 2 native cross-sell features
There are two main ways to surface recommendations in the storefront: 1. Push recommendations into Magento product links (native cross-sell), so the standard block shows them. 2. Build a custom block that reads magefine_cross_sell_recommendations and renders cards. Pushing into product links is nice because it integrates with existing blocks. I’ll show how to update product links programmatically.Write recommendations into product links
You can use the ProductRepository and ProductLink management to create cross-sell links. Example snippet:// inside a service that has ProductRepositoryInterface $productRepository
// and \Magento\Catalog\Api\Data\ProductLinkInterfaceFactory $productLinkFactory
$product = $this->productRepository->getById($productId);
$links = $product->getProductLinks() ?: [];
$link = $this->productLinkFactory->create();
$link->setSku($product->getSku());
$link->setLinkedProductSku($linkedSku); // linked product SKU
$link->setLinkType('crosssell');
$link->setPosition(0);
$links[] = $link;
$product->setProductLinks($links);
$this->productRepository->save($product);
Important:
- Work in batches and respect indexing and cache. Saving full Product objects frequently can be slow.
- For big catalogs, prefer a direct insert into catalog_product_link table using resource model to avoid loading product objects repeatedly. Use Magento APIs where possible for consistency.
Alternative: custom block reading recommendations
Custom block pros: - Faster for reads (you control queries) - No heavy writes into product tables - Easier to A/B test — show magefine list for variant A or native cross-sell for variant B A simple block template fetches top N entries from magefine_cross_sell_recommendations for current product, joins to catalog_product_entity for SKU, and uses product repository for rendering minimal data (or uses the product collection with minimal attributes).5) Performance: processing large volumes of orders and many SKUs
If you have a big store (100k+ SKUs, millions of orders), the naive SQL + PHP loop will not scale. Here are production tips. 1) Use SQL aggregation as much as possible - The single SQL join we used is powerful: let MySQL/Percona do the heavy lifting. - Add proper indexes: sales_order_item(order_id, product_id, created_at, parent_item_id). 2) Windowed processing and time-based rollups - Compute recommendations on a rolling window (last 90 days) to keep the dataset manageable and relevant. - Use incremental aggregation: process only new orders since last run, update co-purchase counters, and re-score. 3) Batch inserts and avoid loading full product models - Use insertMultiple on the resource connection. - Avoid productRepository->save in the main loop; batch later or use direct DB writes. 4) Precompute and cache results - Store recommendations in a table and cache the UI output in Redis. - Use Varnish for full page caching; set ESI for blocks that need dynamic recommendations per product. 5) Use message queues for heavy work - For very big workloads, split the job into multiple workers via message queue (RabbitMQ). Each worker processes a subset of products and writes partial results. 6) Use approximate counting for very high volume - Use HyperLogLog for distinct counts in streaming scenarios; for cross-sell you usually need exact counts, but approximate counters can be a pragmatic tradeoff. 7) Use read replicas for heavy analytic queries - Route the aggregation reads to a replica to avoid impacting the write DB.6) A/B testing and measuring impact on AOV
You’ll want to validate that the engine actually increases Average Order Value (AOV) and conversion rate. Here’s a simple A/B plan. Test design: - Divide traffic randomly: Control (existing cross-sell) vs Variant (data-driven magefine cross-sell) - 1:1 split or 40:60 if you’re cautious - Run until you reach statistical significance on primary metric: AOV or revenue per session Metrics to track: - Average Order Value (AOV) - Conversion Rate (sessions → orders) - Attach rate: % of orders where a recommended item was added - Revenue uplift and statistical significance (t-test or Bayesian) Implementation approaches: Option A: Client-side experiment - Serve variant decisions in JavaScript (feature flag + API that returns recommendations) - Easy to implement but can cause flicker if not handled well Option B: Server-side experiment in Magento - Use a small middleware that picks variant and renders the appropriate block - Record which variant the user saw in a cookie and persist in order (so you can attribute revenue) Instrumentation: always record exposure and action with a small event table, for example magefine_cross_sell_events with columns (session_id, product_id, shown_variant, clicked_product_id, added_to_cart, order_id, timestamp). Recording exposures & conversions (simple approach): - On product page render, fire a server-side event or enqueue a small JS beacon: {product_id, variant} - On add-to-cart, record whether the added SKU came from recommendation - On order completion, connect the order to the variant cookie and compute AOVMeasuring results and sample size
- Calculate required sample size based on baseline AOV and expected uplift. - Avoid running underpowered tests — small traffic leads to false negatives. - Use daily aggregation and don’t stop early; watch for seasonality.7) Advanced ideas and improvements
Once the simple engine proves value, consider: - Time decay: weight recent co-purchases more than old ones - Personalized recommendations: incorporate user purchase history or segments - Session-based recommendations: use last viewed / last added to cart to rank - Use machine learning (collaborative filtering, matrix factorization, or modern embeddings) for better accuracy - Use Elasticsearch or Redis for fast lookups and similarity queries - Promote higher margin items or items with inventory constraints8) Example: End-to-end walkthrough (practical run)
Here’s a compact, runnable flow you can test:
1. Create the recommendation table (apply the schema patch above).
2. Run RecommendationBuilder execute() from CLI to generate recommendations locally.
3. Inspect top recommendations via SQL:
SELECT r.product_id, r.linked_product_id, r.score, p.sku
FROM magefine_cross_sell_recommendations r
JOIN catalog_product_entity p ON p.entity_id = r.linked_product_id
WHERE r.product_id = 12345
ORDER BY r.score DESC
LIMIT 10;
4. Build a small block that queries the table and renders the SKUs as links with thumbnails. Use catalog/product repository or product image helper to render images.
5. Implement a simple A/B test: variant A shows existing Magento cross-sell, variant B shows magefine recommendations. Store the variant in a cookie and record it with the order.
9) Security, privacy and compliance
- Personalization may use user identifiers; respect GDPR and local privacy rules.
- Keep PII out of analytic exports unless you have consent.
- If you export data to third-party ML services, ensure secure connections and contracts.
10) Operational checklist before going live
- Validate recommendations manually for the top 1,000 products
- Make sure no disabled or out-of-stock items are shown (or mark them)
- Create a cleanup job to remove stale records
- Add monitoring: job success/failure logs, number of recommendations generated, average score
- Monitor business KPIs after deployment: AOV, attach rate, returns
Summary and final tips
- Start simple: conditional frequency is robust and interpretable.
- Use SQL aggregation; push heavy work to the DB where possible.
- Store recommendations in a table for fast reads and simple A/B testing.
- Integrate with Magento natively if you want low-friction display; use custom block for speed and flexibility.
- Scale with batching, message queues, and read replicas when needed.
- Measure with carefully designed A/B tests and track AOV, conversion, and attach rates.
If you’re running Magento on magefine.com hosting or using a Magefine extension, this architecture fits nicely: keep the recommendation build as a nightly job (or asynchronous workers) and use Redis/Varnish to keep storefront latency low.
Appendix: Useful SQL & code snippets
1) Fast co-purchase counts using dates and indexing
SELECT
oi1.product_id,
oi2.product_id AS co_product,
COUNT(DISTINCT oi1.order_id) AS co_count
FROM sales_order_item oi1
JOIN sales_order_item oi2
ON oi1.order_id = oi2.order_id AND oi1.product_id <> oi2.product_id
WHERE oi1.parent_item_id IS NULL
AND oi2.parent_item_id IS NULL
AND oi1.created_at >= '2024-01-01' -- window
GROUP BY oi1.product_id, oi2.product_id
HAVING co_count >= 3; -- threshold
2) Batch update to product link table (direct DB, use carefully)
INSERT INTO catalog_product_link (product_id, linked_product_id, link_type_id)
VALUES (123, 456, (SELECT link_type_id FROM catalog_product_link_type WHERE code = 'crosssell'))
ON DUPLICATE KEY UPDATE product_id = product_id;
3) Minimal JS beacon to record exposures (for experiments)
fetch('/rest/V1/magefine/crosssell/expose', {
method: 'POST',
credentials: 'same-origin',
headers: {'Content-Type': 'application/json'},
body: JSON.stringify({product_id: 12345, variant: 'magefine_v2'})
});
Closing note
Building a custom cross-sell engine is one of the highest-impact engineering investments for an e-commerce store: it combines relatively simple data work with measurable revenue uplift. Start with a conservative rollout, monitor AOV and attach rates, and iterate.
If you want, I can:
- Provide a full ready-to-install Magefine_CrossSell module with unit tests
- Help tune the scoring function for your catalog
- Draft an A/B test plan and sample-size calculator for your traffic levels
Drop me details about your catalog size and traffic and I’ll adapt the code to your scale.
SELECT r.product_id, r.linked_product_id, r.score, p.sku
FROM magefine_cross_sell_recommendations r
JOIN catalog_product_entity p ON p.entity_id = r.linked_product_id
WHERE r.product_id = 12345
ORDER BY r.score DESC
LIMIT 10;
SELECT
oi1.product_id,
oi2.product_id AS co_product,
COUNT(DISTINCT oi1.order_id) AS co_count
FROM sales_order_item oi1
JOIN sales_order_item oi2
ON oi1.order_id = oi2.order_id AND oi1.product_id <> oi2.product_id
WHERE oi1.parent_item_id IS NULL
AND oi2.parent_item_id IS NULL
AND oi1.created_at >= '2024-01-01' -- window
GROUP BY oi1.product_id, oi2.product_id
HAVING co_count >= 3; -- threshold
INSERT INTO catalog_product_link (product_id, linked_product_id, link_type_id)
VALUES (123, 456, (SELECT link_type_id FROM catalog_product_link_type WHERE code = 'crosssell'))
ON DUPLICATE KEY UPDATE product_id = product_id;
fetch('/rest/V1/magefine/crosssell/expose', {
method: 'POST',
credentials: 'same-origin',
headers: {'Content-Type': 'application/json'},
body: JSON.stringify({product_id: 12345, variant: 'magefine_v2'})
});




