Enhancing Pharmaceutical Quality and Safety: Leveraging Machine Learning and Statistical Techniques for Optimal Threshold Value Determination in Medicinal Batches


Volume: 9 Issue: 2
Year of Publication: 2023
Authors: Dr. Rajat



Abstract

The primary aim of this research was to determine optimal threshold values for vital parameters in different medicinal batches, ensuring high standards of quality and safety. To that end, the study combined machine learning algorithms, linear regression models, and statistical techniques with ETL (extract, transform, load) processing and Airflow. Linear regression is one of the most common algorithms in both statistics and machine learning. By leveraging these data processing methodologies, the research aims to enhance the pharmaceutical industry's ability to assess and maintain the quality and safety of medicines effectively. The ETL process starts by extracting data from Hive, which offers efficient storage and processing capabilities, making it a suitable source for data extraction. The extracted data is then transformed using ML and data analysis techniques. The transformation logic is implemented in Jupyter Notebooks, which provide an interactive environment for developing and executing code, making it easy to apply ML algorithms and data manipulation techniques. After the data has been transformed, it is loaded into PostgreSQL, a powerful and scalable relational database management system that provides robust data storage and querying capabilities, making it a suitable destination for the transformed data. The loaded data is organized within PostgreSQL tables. The transformed data stored in PostgreSQL can then be used by the final product, which could be a web application, a reporting dashboard, or any other system that requires access to the processed and enriched data. These tools enabled the formation of threshold values for the parameters of different medicines with high accuracy through efficient data processing, analysis, and visualization, allowing users to make data-driven decisions and gain insights from the transformed data.
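
The abstract does not spell out how a threshold is derived from the regression, so the following is only a minimal sketch of one plausible reading: fit a least-squares line of a quality parameter against a batch covariate, then place the lower and upper thresholds a fixed number of residual standard deviations around the fitted value. The function names (`fit_line`, `threshold_band`), the multiplier `k`, and the sample numbers are all hypothetical and are not taken from the paper.

```python
from statistics import mean, stdev


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def threshold_band(xs, ys, k=3.0):
    """Return a function mapping x to (lower, upper) thresholds.

    Assumption: thresholds sit k residual standard deviations around
    the fitted line. The spread uses the sample standard deviation
    (n - 1 denominator), a simplification chosen for this sketch.
    """
    slope, intercept = fit_line(xs, ys)
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    sigma = stdev(residuals)

    def band(x):
        centre = slope * x + intercept
        return centre - k * sigma, centre + k * sigma

    return band


# Illustrative, made-up values: a batch covariate and a measured parameter.
batch_size_kg = [10, 20, 30, 40, 50]
assay_pct = [98.1, 98.9, 100.2, 100.8, 102.0]

band = threshold_band(batch_size_kg, assay_pct, k=3.0)
low, high = band(35)
print(f"thresholds at 35 kg: {low:.2f} to {high:.2f}")
```

In the pipeline the abstract describes, a step like this would sit in the transform stage, between the Hive extract and the PostgreSQL load. The choice of `k` trades false alarms against missed deviations and would need to be set per parameter.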

Keywords

Machine Learning, Artificial Intelligence, Linear regression, Pharmaceutical batch process, Threshold value, Data Transformation




© 2025 International Journal of Advanced Trends in Computer Applications
Foundation of Computer Applications (FCA). All rights reserved.