Material Data De-Duplication

Product data deduplication has been a core area of work at SoftNis. We identify and remove redundant product records that share similar manufacturer names and part numbers, or that come from different manufacturers but serve the same function.
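The first criterion can be sketched as follows. This assumes each record carries a vendor name and a part number. Two entries are flagged when their part numbers match after punctuation is stripped and their manufacturer names are similar. The prefix rule, the 3-character minimum and the 0.75 similarity threshold are assumptions chosen for illustration. Matching different manufacturers by functionality would need attribute-level comparison and is not covered here.

```python
from difflib import SequenceMatcher


def normalize_name(name: str) -> str:
    """Lowercase and keep only letters and digits."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def names_similar(a: str, b: str, threshold: float = 0.75) -> bool:
    """Heuristic (assumed, not a SoftNis rule): names are similar if one is a
    prefix of the other (e.g. an abbreviation) or their ratio meets threshold."""
    na, nb = normalize_name(a), normalize_name(b)
    if not na or not nb:
        return False
    shorter = min(len(na), len(nb))
    if shorter >= 3 and (na.startswith(nb) or nb.startswith(na)):
        return True
    return SequenceMatcher(None, na, nb).ratio() >= threshold


def likely_duplicate(rec_a: dict, rec_b: dict) -> bool:
    """Flag two records whose normalized part numbers are identical and
    whose manufacturer names are similar."""
    pa = normalize_name(rec_a["part_number"])
    pb = normalize_name(rec_b["part_number"])
    return bool(pa) and pa == pb and names_similar(rec_a["vendor"], rec_b["vendor"])


a = {"vendor": "NORTON", "part_number": "66252940863"}
b = {"vendor": "Nort", "part_number": "66252940863"}
print(likely_duplicate(a, b))  # True
```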

Data Deduplication Solutions

Product data deduplication is carried out both before and after cleansing the product description. In some cases, duplicates can be identified without cleansing the item's description. In others, the description must be cleansed before duplicate entries can be identified.

Duplicate Items before cleansing

Duplicate Items
Field | Item 1 | Item 2
Stock Number | 4B8938A7 | 7V0008D7
Vendor Name | NORTON | Nort
Item Code | 66252940863 | 66252940863
Description 1 | GW 7.0 | 7 X 1/2 X 1 ¼ St Wheel grinding
Bin Location | M 305 | L 206
Stock (Each) | 12 | 15
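Both records above carry the identical item code 66252940863, so a raw grouping pass can surface them without any cleansing. The sketch below assumes records are plain dictionaries with hypothetical `stock_number` and `item_code` keys. It only proposes candidate groups, which would still need a vendor-name check or manual review.

```python
from collections import defaultdict


def candidate_groups(records: list[dict]) -> list[list[str]]:
    """Group stock numbers that share the same raw item code.

    Only groups with two or more members are returned; they are
    candidates for review, not confirmed duplicates.
    """
    by_code = defaultdict(list)
    for rec in records:
        code = rec["item_code"].strip()
        if code:  # blank codes cannot be matched at this stage
            by_code[code].append(rec["stock_number"])
    return [stocks for stocks in by_code.values() if len(stocks) > 1]


records = [
    {"stock_number": "4B8938A7", "vendor": "NORTON", "item_code": "66252940863"},
    {"stock_number": "7V0008D7", "vendor": "Nort", "item_code": "66252940863"},
]
print(candidate_groups(records))  # [['4B8938A7', '7V0008D7']]
```

Grouping on an exact key first (often called blocking) keeps the number of pairwise comparisons small in a large catalog.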


Duplicate Items after cleansing

In some cases, duplicate items can be identified only after cleansing the item's data, such as the manufacturer name, part number and description.
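When the item code is missing, the part number can sometimes be recovered from the free-text description. It can then be normalized so that differently punctuated forms compare equal. In this sketch, two choices are assumptions for illustration: the regular expression (alphanumeric groups joined by at least two hyphens) and the set of placeholder values treated as empty. Real descriptions would need broader patterns.

```python
import re

# Assumed pattern: alphanumeric groups joined by two or more hyphens,
# e.g. "029-3912-2Q". Shorter forms such as "2-Way" are deliberately skipped.
PART_IN_TEXT = re.compile(r"\b[0-9A-Z]+(?:-[0-9A-Z]+){2,}\b", re.IGNORECASE)

# Assumed placeholder values that mean "no item code".
EMPTY_CODES = {"nil", "nill", "n/a", "na"}


def cleanse_part_number(raw: str) -> str:
    """Uppercase and strip separators so '029-3912-2Q' becomes '02939122Q'."""
    return re.sub(r"[^0-9A-Z]", "", raw.upper())


def part_number_for(item_code: str, description: str) -> str:
    """Use the item code when present; otherwise fall back to a
    hyphenated token found in the description."""
    code = item_code.strip()
    if code and code.lower() not in EMPTY_CODES:
        return cleanse_part_number(code)
    match = PART_IN_TEXT.search(description)
    return cleanse_part_number(match.group()) if match else ""


print(part_number_for("nill", "pvc BL VLV sip 029-3912-2Q"))  # 02939122Q
```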

Before Cleansing
Field | Item 1 | Item 2
Stock Number | 6U4981U8 | 2C83679G
Vendor Name | Smith-co | Smith
Item Code | nil | 02939122Q
Description 1 | pvc BL VLV sip 029-3912-2Q | 2y Valv Bal
Bin Location | A 12 | V 06
Stock (Each) | 120 | 42


Cleansed Item Description

For example:

After Cleansing: Duplicate Items
Field | Item 1 | Item 2
Stock Number | 6U4981U8 | 2C83679G
Normalized Vendor Name | Smith-Cooper International | Smith-Cooper International
Product | Ball Valve | Ball Valve
Cleansed Vendor Part Number | 02939122Q | 02939122Q
Cleansed Description | Ball Valve, PVC, 3 Inch, Slip-On, 2-Way | Ball Valve
Bin Location | A 12 | V 06
Stock (Each) | 120 | 42
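The vendor and description normalization in this example can be approximated with lookup tables. The alias and abbreviation dictionaries below are hypothetical. They are built only from the values in this example, and in practice they would be curated per catalog. Once the vendor names are normalized, the two records compare equal on vendor and cleansed part number.

```python
# Hypothetical lookup tables built from this example only.
VENDOR_ALIASES = {
    "smith": "Smith-Cooper International",
    "smithco": "Smith-Cooper International",
}
ABBREVIATIONS = {
    "bl": "Ball",
    "bal": "Ball",
    "vlv": "Valve",
    "valv": "Valve",
    "pvc": "PVC",
}


def normalize_vendor(raw: str) -> str:
    """Map a raw vendor string to its normalized name if the alias is known;
    otherwise return the trimmed input unchanged."""
    key = "".join(ch for ch in raw.lower() if ch.isalnum())
    return VENDOR_ALIASES.get(key, raw.strip())


def expand_abbreviations(text: str) -> str:
    """Replace known abbreviations word by word; unknown words are kept."""
    return " ".join(ABBREVIATIONS.get(word.lower(), word) for word in text.split())


def same_item(a: dict, b: dict) -> bool:
    """After cleansing, treat items as duplicates when the normalized vendor
    and the cleansed part number both agree."""
    return (normalize_vendor(a["vendor"]) == normalize_vendor(b["vendor"])
            and a["part_number"] == b["part_number"])


item_1 = {"vendor": "Smith-co", "part_number": "02939122Q"}
item_2 = {"vendor": "Smith", "part_number": "02939122Q"}
print(same_item(item_1, item_2))  # True
print(expand_abbreviations("pvc BL VLV"))  # PVC Ball Valve
```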