Let's continue introducing our new project: product matching assisted by Machine Learning (ML).
Before kicking off the project, we had to make some key decisions about its scope.
Naturally, we wanted our solution to be as broadly applicable as possible.
Our first decision was to build a language-independent model. A language-specific model would probably be more precise, but it would need separate training for each language. And preparing a training set, as you will see, is very difficult.
Price2Spy has clients from literally all over the world, so we would need to cover at least 15 languages, some of them written in non-Latin scripts.
Our second decision was to build an industry-independent model. Price2Spy works with over 25 different industries, and preparing 25 training sets to build 25 different ML models seemed like a nightmare.
Again, an industry-specific model would probably be more precise, but it would need separate training for each industry, each with its own sufficiently representative training set.
Our third decision concerned accuracy. One thing we have learned in 9 years in this business is that Price2Spy cannot afford a wrong match. Wrong match => wrong pricing decision. Our customers cannot have that => we cannot have that!
Even 99% matching accuracy is not sufficient. If only 1% of matches are wrong, how can the client know which 1% it is?
So, we have our 3 key ML matching decisions. On to the next task: preparing the training set!