Volume 37 (2021): Issue 2 (June 2021)
Special Issue on New Techniques and Technologies for Statistics
Journal Details
Format: Journal
First Published: 01 Oct 2013
Publication timeframe: 4 times per year
Languages: English
Access type: Open Access

Applying Machine Learning for Automatic Product Categorization

Published Online: 22 Jun 2021
Page range: 395 - 410
Received: 01 May 2019
Accepted: 01 Mar 2020
Abstract

Every five years, the U.S. Census Bureau conducts the Economic Census, the official count of U.S. businesses and the most extensive collection of data on business activity. Businesses, policymakers, governments, and communities use Economic Census data for economic development, business decisions, and strategic planning. The Economic Census provides key inputs for economic measures such as the Gross Domestic Product and the Producer Price Index. It requires businesses to fill out a lengthy questionnaire, including an extended section about the goods and services the business provides.

To address the challenges of high respondent burden and low survey response rates, we devised a strategy to automatically classify goods and services based on product information provided by the business. We asked several businesses to provide a spreadsheet containing Universal Product Codes (UPCs) and associated text descriptions for the products they sell. We then used natural language processing to classify the products according to the North American Product Classification System (NAPCS). This novel strategy classified product text with very high accuracy: our best algorithms exceeded 90%.
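The abstract does not say which algorithms were used, so the following is only a hedged illustration of the general idea: a minimal, dependency-free sketch of classifying product descriptions into categories with a multinomial naive Bayes model over bag-of-words tokens, plus an accuracy function (share of items whose predicted category matches the true one). The technique choice, the category labels (`DAIRY`, `APPAREL`), and the training rows are all hypothetical placeholders, not NAPCS codes or data from the study.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical training rows: (product description, category label).
# The labels are placeholders, NOT real NAPCS codes.
TRAIN = [
    ("organic whole milk 1 gallon", "DAIRY"),
    ("greek yogurt vanilla 32 oz", "DAIRY"),
    ("cheddar cheese block 8 oz", "DAIRY"),
    ("mens cotton t-shirt large", "APPAREL"),
    ("womens denim jeans slim fit", "APPAREL"),
    ("kids wool socks 6 pack", "APPAREL"),
]


def tokenize(text):
    """Lowercase a description and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesProductClassifier:
    """Multinomial naive Bayes over bag-of-words tokens with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_counts = Counter()            # descriptions per category
        self.word_counts = defaultdict(Counter)  # token counts per category
        self.vocab = set()

    def fit(self, descriptions, labels):
        for text, label in zip(descriptions, labels):
            tokens = tokenize(text)
            self.class_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        """Return the category with the highest log posterior score."""
        tokens = [t for t in tokenize(text) if t in self.vocab]  # drop unseen tokens
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            total = sum(self.word_counts[label].values())
            score = math.log(doc_count / n_docs)  # log prior
            for tok in tokens:
                count = self.word_counts[label][tok]  # 0 if never seen in this category
                score += math.log((count + self.alpha) / (total + self.alpha * v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(predicted, actual):
    """Share of items whose predicted category equals the true category."""
    if not actual:
        raise ValueError("accuracy is undefined for an empty evaluation set")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


if __name__ == "__main__":
    clf = NaiveBayesProductClassifier().fit(
        [text for text, _ in TRAIN], [label for _, label in TRAIN]
    )
    held_out = [("vanilla yogurt cup", "DAIRY"), ("cotton socks for kids", "APPAREL")]
    preds = [clf.predict(text) for text, _ in held_out]
    print(preds, accuracy(preds, [label for _, label in held_out]))
```

Naive Bayes appears here only because it fits in a few lines without external libraries; a real system classifying into NAPCS would need far more labeled data and would likely compare several models before settling on one.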
