{"id":3880,"date":"2025-12-15T11:21:15","date_gmt":"2025-12-15T11:21:15","guid":{"rendered":"https:\/\/jita-au.com\/?p=3880"},"modified":"2025-12-15T14:02:18","modified_gmt":"2025-12-15T14:02:18","slug":"an-evolutionary-overview-of-large-language-models-from-statistical-methods-to-the-transformer-era","status":"publish","type":"post","link":"https:\/\/jita-au.com\/index.php\/2025\/12\/15\/an-evolutionary-overview-of-large-language-models-from-statistical-methods-to-the-transformer-era\/","title":{"rendered":"An Evolutionary Overview of Large Language Models: From Statistical Methods to the Transformer Era"},"content":{"rendered":"\t\t<div data-elementor-type=\"wp-post\" data-elementor-id=\"3880\" class=\"elementor elementor-3880\" data-elementor-settings=\"{&quot;ha_cmc_init_switcher&quot;:&quot;no&quot;}\">\n\t\t\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-e232701 elementor-section-height-min-height elementor-section-full_width elementor-hidden-tablet elementor-hidden-mobile elementor-section-height-default elementor-section-items-middle wpr-particle-no wpr-jarallax-no wpr-parallax-no wpr-sticky-section-no\" data-id=\"e232701\" data-element_type=\"section\" data-settings=\"{&quot;background_background&quot;:&quot;classic&quot;,&quot;_ha_eqh_enable&quot;:false}\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-cb97594\" data-id=\"cb97594\" data-element_type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-a18c791 elementor-widget__width-inherit ha-has-bg-overlay elementor-widget elementor-widget-heading\" data-id=\"a18c791\" data-element_type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Vol. 15 No. 
2 (2025): JITA - APEIRON<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-b2fff26 elementor-widget elementor-widget-heading\" data-id=\"b2fff26\" data-element_type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\"><b>Boris Damjanovi\u0107, Dragan Kora\u0107, Dejan Simi\u0107, Negovan Stamenkovi\u0107<b><\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-52044bb elementor-widget-divider--view-line elementor-widget elementor-widget-divider\" data-id=\"52044bb\" data-element_type=\"widget\" data-widget_type=\"divider.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t<div class=\"elementor-divider\">\n\t\t\t<span class=\"elementor-divider-separator\">\n\t\t\t\t\t\t<\/span>\n\t\t<\/div>\n\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-1c82606 elementor-widget elementor-widget-heading\" data-id=\"1c82606\" data-element_type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">An Evolutionary Overview of Large Language Models: From Statistical Methods to the Transformer Era<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-c297c9e elementor-widget-divider--view-line elementor-widget elementor-widget-divider\" data-id=\"c297c9e\" data-element_type=\"widget\" data-widget_type=\"divider.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t<div class=\"elementor-divider\">\n\t\t\t<span class=\"elementor-divider-separator\">\n\t\t\t\t\t\t<\/span>\n\t\t<\/div>\n\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-b6ebd13 elementor-widget__width-inherit elementor-widget elementor-widget-heading\" data-id=\"b6ebd13\" data-element_type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Review paper<br>DOI: https:\/\/doi.org\/10.7251\/JIT2502145D<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-2f3b63f elementor-align-center elementor-widget elementor-widget-button\" data-id=\"2f3b63f\" data-element_type=\"widget\" data-widget_type=\"button.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<div class=\"elementor-button-wrapper\">\n\t\t\t\t\t<a class=\"elementor-button elementor-button-link elementor-size-sm\" href=\"https:\/\/jita-au.com\/wp-content\/uploads\/2025\/12\/Pages-from-JITA_Vol-15_Issue-2-WEB-7.pdf\">\n\t\t\t\t\t\t<span class=\"elementor-button-content-wrapper\">\n\t\t\t\t\t\t\t\t\t<span class=\"elementor-button-text\">Download Article PDF<\/span>\n\t\t\t\t\t<\/span>\n\t\t\t\t\t<\/a>\n\t\t\t\t<\/div>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-cb7c4cb elementor-widget__width-inherit elementor-widget elementor-widget-heading\" data-id=\"cb7c4cb\" data-element_type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Abstract<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element 
While the early evolution of large language models (LLMs), including the shift from statistical approaches to the Transformer architecture, illustrates their historical impact on natural language processing, the latest research in neural networks has enabled a faster and more powerful rise of language models grounded in solid theoretical foundations. These advances, driven by progress in computing systems (e.g., ultra-powerful processing and memory capabilities), enable the development of numerous new models based on emerging technologies such as artificial intelligence (AI). We therefore provide an evolutionary overview of LLMs across the shift from statistical to deep-learning approaches, highlighting the key stages of their development, with a particular focus on concepts such as self-attention, the Transformer architecture, BERT, GPT, DeepSeek, and Claude. Finally, our conclusions offer a reference point for future research associated with the emergence of new AI-supported models that are irreversibly transforming the way an increasing number of human activities are performed.

Keywords: Artificial intelligence, large language models, Transformer architecture, self-attention.

Paper received: 7.11.2025.
Paper accepted: 12.11.2025.
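The abstract's central technical concept is self-attention, the mechanism at the heart of the Transformer architecture it surveys. As a purely illustrative aside (not taken from the paper itself), the sketch below shows the standard scaled dot-product form: each token's output is a softmax-weighted mix of every token's value vector, with weights derived from query-key similarity. All names, dimensions, and weights here are hypothetical toy choices.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence of token vectors.

    X          : (seq_len, d_model) input embeddings
    Wq, Wk, Wv : (d_model, d_k) learned projection matrices
    Returns    : (seq_len, d_k) context-mixed representations
    """
    Q = X @ Wq                                   # queries
    K = X @ Wk                                   # keys
    V = X @ Wv                                   # values
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)              # pairwise token affinities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                           # weighted mix of all values

# Toy usage: 4 tokens, 8-dimensional embeddings, 4-dimensional projections.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 4)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 4)
```

In practice the Transformer runs several such heads in parallel (multi-head attention) and, in decoder-style models like GPT, masks the score matrix so each token attends only to earlier positions; this sketch omits both for brevity.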
class=\"post-views content-post post-3880 entry-meta load-static\">\r\n\t\t\t\t<span class=\"post-views-icon dashicons dashicons-chart-bar\"><\/span> <span class=\"post-views-label\">Post Views:<\/span> <span class=\"post-views-count\">473<\/span>\r\n\t\t\t<\/div><\/div>\n\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-20a3f75 elementor-widget elementor-widget-shortcode\" data-id=\"20a3f75\" data-element_type=\"widget\" data-widget_type=\"shortcode.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t<div class=\"elementor-shortcode\">Downloaded Article PDF: <span class=\"snr-download-count-num\" data-post-id=\"0\">0<\/span> times\n<\/div>\n\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-511d0df elementor-section-height-min-height elementor-section-full_width elementor-hidden-mobile elementor-hidden-desktop elementor-hidden-laptop elementor-section-height-default elementor-section-items-middle wpr-particle-no wpr-jarallax-no wpr-parallax-no wpr-sticky-section-no\" data-id=\"511d0df\" data-element_type=\"section\" data-settings=\"{&quot;background_background&quot;:&quot;classic&quot;,&quot;_ha_eqh_enable&quot;:false}\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-0acee9c\" data-id=\"0acee9c\" data-element_type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-c0e98b3 elementor-widget__width-inherit ha-has-bg-overlay elementor-widget elementor-widget-heading\" data-id=\"c0e98b3\" data-element_type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Vol. 15 No. 
2 (2025): JITA - APEIRON<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-a0fca35 elementor-widget elementor-widget-heading\" data-id=\"a0fca35\" data-element_type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\"><b>Boris Damjanovi\u0107, Dragan Kora\u0107, Dejan Simi\u0107, Negovan Stamenkovi\u0107<b><\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-68f1849 elementor-widget-divider--view-line elementor-widget elementor-widget-divider\" data-id=\"68f1849\" data-element_type=\"widget\" data-widget_type=\"divider.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t<div class=\"elementor-divider\">\n\t\t\t<span class=\"elementor-divider-separator\">\n\t\t\t\t\t\t<\/span>\n\t\t<\/div>\n\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-e7cb7ee elementor-widget elementor-widget-heading\" data-id=\"e7cb7ee\" data-element_type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">An Evolutionary Overview of Large Language Models: From Statistical Methods to the Transformer Era<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-7b2f501 elementor-widget-divider--view-line elementor-widget elementor-widget-divider\" data-id=\"7b2f501\" data-element_type=\"widget\" data-widget_type=\"divider.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t<div class=\"elementor-divider\">\n\t\t\t<span class=\"elementor-divider-separator\">\n\t\t\t\t\t\t<\/span>\n\t\t<\/div>\n\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-5027126 elementor-widget__width-initial elementor-widget elementor-widget-heading\" data-id=\"5027126\" data-element_type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Review paper<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-4a5312c elementor-widget__width-initial elementor-widget elementor-widget-heading\" data-id=\"4a5312c\" data-element_type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">DOI: https:\/\/doi.org\/10.7251\/JIT2502145D<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-c535933 elementor-align-center elementor-widget elementor-widget-button\" data-id=\"c535933\" data-element_type=\"widget\" data-widget_type=\"button.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<div class=\"elementor-button-wrapper\">\n\t\t\t\t\t<a class=\"elementor-button elementor-button-link elementor-size-sm\" href=\"https:\/\/jita-au.com\/wp-content\/uploads\/2025\/12\/Pages-from-JITA_Vol-15_Issue-2-WEB-7.pdf\">\n\t\t\t\t\t\t<span class=\"elementor-button-content-wrapper\">\n\t\t\t\t\t\t\t\t\t<span class=\"elementor-button-text\">Download Article PDF<\/span>\n\t\t\t\t\t<\/span>\n\t\t\t\t\t<\/a>\n\t\t\t\t<\/div>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element 
elementor-element-580ffd3 elementor-widget__width-inherit elementor-widget elementor-widget-heading\" data-id=\"580ffd3\" data-element_type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Abstract<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-211717e elementor-widget__width-inherit elementor-widget elementor-widget-text-editor\" data-id=\"211717e\" data-element_type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>While the early evolution of large language models (LLMs), including shift from statistical approaches to the Transformer architecture, illustrates their historical impact on the processing of natural language; however, the latest research in neural networks has enabled the faster and more powerful rise of language models grounded in solid theoretical foundations. These advantages, driven by advances in computing systems (e.g., ultra-powerful processing and memory capabilities), enable the development of numerous new models based on new emerging technologies such as artificial intelligence (AI). Thus, we provide an evolutionary overview of LLMs involved in the shift from the statistical to deep learning approach, highlighting their key stages of development, with a particular focused on concepts such as self-attention, the Transformer architecture, BERT, GPT, DeepSeek, and Claude. Finally, our conclusions present a reference point for future research associated with the emergence of new AI-supported models that are irreversibly transforming the way an increasing number of human activities are performed.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-62b72c3 elementor-widget elementor-widget-heading\" data-id=\"62b72c3\" data-element_type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Keywords: Artificial intelligence, large language models, Transformer architecture, self-attention.<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-a2636dd elementor-widget elementor-widget-heading\" data-id=\"a2636dd\" data-element_type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\"><b>Paper received:<\/b> 7.11.2025.<br><b>Paper accepted:<\/b> 12.11.2025.<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-f4dcd16 wpr-logo-position-center elementor-widget elementor-widget-wpr-logo\" data-id=\"f4dcd16\" data-element_type=\"widget\" data-widget_type=\"wpr-logo.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\n\t\t\t<div class=\"wpr-logo elementor-clearfix\">\n\n\t\t\t\t\t\t\t\t<picture class=\"wpr-logo-image\">\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\n\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/jita-au.com\/wp-content\/uploads\/2024\/03\/cc-by-1.png\" alt=\"\">\n\n\t\t\t\t\t\t\t\t\t\t\t<a class=\"wpr-logo-url\" rel=\"home\" aria-label=\"\" href=\"https:\/\/jita-au.com\/\"><\/a>\n\t\t\t\t\t\t\t\t\t<\/picture>\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\t\t\t\t\t<a class=\"wpr-logo-url\" rel=\"home\" aria-label=\"\" 
href=\"https:\/\/jita-au.com\/\"><\/a>\n\t\t\t\t\n\t\t\t<\/div>\n\t\t\t\t\n\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-0f73093 elementor-widget elementor-widget-shortcode\" data-id=\"0f73093\" data-element_type=\"widget\" data-widget_type=\"shortcode.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t<div class=\"elementor-shortcode\"><div class=\"post-views content-post post-3880 entry-meta load-static\">\r\n\t\t\t\t<span class=\"post-views-icon dashicons dashicons-chart-bar\"><\/span> <span class=\"post-views-label\">Post Views:<\/span> <span class=\"post-views-count\">473<\/span>\r\n\t\t\t<\/div><\/div>\n\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-975233a elementor-widget elementor-widget-shortcode\" data-id=\"975233a\" data-element_type=\"widget\" data-widget_type=\"shortcode.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t<div class=\"elementor-shortcode\">Downloaded Article PDF: <span class=\"snr-download-count-num\" data-post-id=\"0\">0<\/span> times\n<\/div>\n\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-2d980b8 elementor-section-height-min-height elementor-section-full_width elementor-hidden-tablet elementor-hidden-desktop elementor-hidden-laptop elementor-section-height-default elementor-section-items-middle wpr-particle-no wpr-jarallax-no wpr-parallax-no wpr-sticky-section-no\" data-id=\"2d980b8\" data-element_type=\"section\" data-settings=\"{&quot;background_background&quot;:&quot;classic&quot;,&quot;_ha_eqh_enable&quot;:false}\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-d772afa\" data-id=\"d772afa\" data-element_type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-5597455 elementor-widget__width-inherit elementor-widget-mobile__width-inherit ha-has-bg-overlay elementor-widget elementor-widget-heading\" data-id=\"5597455\" data-element_type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Vol. 15 No. 
2 (2025): JITA - APEIRON<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-2a0463c elementor-widget elementor-widget-heading\" data-id=\"2a0463c\" data-element_type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\"><b>Boris Damjanovi\u0107, Dragan Kora\u0107, Dejan Simi\u0107, Negovan Stamenkovi\u0107<b><\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-b0a72e2 elementor-widget-divider--view-line elementor-widget elementor-widget-divider\" data-id=\"b0a72e2\" data-element_type=\"widget\" data-widget_type=\"divider.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t<div class=\"elementor-divider\">\n\t\t\t<span class=\"elementor-divider-separator\">\n\t\t\t\t\t\t<\/span>\n\t\t<\/div>\n\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-50c7989 elementor-widget elementor-widget-heading\" data-id=\"50c7989\" data-element_type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">An Evolutionary Overview of Large Language Models: From Statistical Methods to the Transformer Era<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-96936f4 elementor-widget-divider--view-line elementor-widget elementor-widget-divider\" data-id=\"96936f4\" data-element_type=\"widget\" data-widget_type=\"divider.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t<div class=\"elementor-divider\">\n\t\t\t<span class=\"elementor-divider-separator\">\n\t\t\t\t\t\t<\/span>\n\t\t<\/div>\n\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-f04febc elementor-widget__width-initial elementor-widget-mobile__width-inherit elementor-widget elementor-widget-heading\" data-id=\"f04febc\" data-element_type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Review paper<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-389228b elementor-widget__width-initial elementor-widget-mobile__width-inherit elementor-widget elementor-widget-heading\" data-id=\"389228b\" data-element_type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">DOI: https:\/\/doi.org\/10.7251\/JIT2502145D<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-e55dfc6 elementor-align-center elementor-widget elementor-widget-button\" data-id=\"e55dfc6\" data-element_type=\"widget\" data-widget_type=\"button.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<div class=\"elementor-button-wrapper\">\n\t\t\t\t\t<a class=\"elementor-button elementor-button-link elementor-size-sm\" href=\"https:\/\/jita-au.com\/wp-content\/uploads\/2025\/12\/Pages-from-JITA_Vol-15_Issue-2-WEB-7.pdf\">\n\t\t\t\t\t\t<span class=\"elementor-button-content-wrapper\">\n\t\t\t\t\t\t\t\t\t<span class=\"elementor-button-text\">Download Article 
PDF<\/span>\n\t\t\t\t\t<\/span>\n\t\t\t\t\t<\/a>\n\t\t\t\t<\/div>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-80faf79 elementor-widget__width-inherit elementor-widget elementor-widget-heading\" data-id=\"80faf79\" data-element_type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Abstract<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-8cefffd elementor-widget__width-inherit elementor-widget elementor-widget-text-editor\" data-id=\"8cefffd\" data-element_type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>While the early evolution of large language models (LLMs), including shift from statistical approaches to the Transformer architecture, illustrates their historical impact on the processing of natural language; however, the latest research in neural networks has enabled the faster and more powerful rise of language models grounded in solid theoretical foundations. These advantages, driven by advances in computing systems (e.g., ultra-powerful processing and memory capabilities), enable the development of numerous new models based on new emerging technologies such as artificial intelligence (AI). Thus, we provide an evolutionary overview of LLMs involved in the shift from the statistical to deep learning approach, highlighting their key stages of development, with a particular focused on concepts such as self-attention, the Transformer architecture, BERT, GPT, DeepSeek, and Claude. Finally, our conclusions present a reference point for future research associated with the emergence of new AI-supported models that are irreversibly transforming the way an increasing number of human activities are performed.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-0c5e44b elementor-widget elementor-widget-heading\" data-id=\"0c5e44b\" data-element_type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Keywords: Artificial intelligence, large language models, Transformer architecture, self-attention.<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-8fcf84a elementor-widget elementor-widget-heading\" data-id=\"8fcf84a\" data-element_type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\"><b>Paper received:<\/b> 7.11.2025.<br><b>Paper accepted:<\/b> 12.11.2025.<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-dee52ab wpr-logo-position-center elementor-widget elementor-widget-wpr-logo\" data-id=\"dee52ab\" data-element_type=\"widget\" data-widget_type=\"wpr-logo.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\n\t\t\t<div class=\"wpr-logo elementor-clearfix\">\n\n\t\t\t\t\t\t\t\t<picture class=\"wpr-logo-image\">\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\n\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/jita-au.com\/wp-content\/uploads\/2024\/03\/cc-by-1.png\" alt=\"\">\n\n\t\t\t\t\t\t\t\t\t\t\t<a class=\"wpr-logo-url\" rel=\"home\" aria-label=\"\" 
href=\"https:\/\/jita-au.com\/\"><\/a>\n\t\t\t\t\t\t\t\t\t<\/picture>\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\t\t\t\t\t<a class=\"wpr-logo-url\" rel=\"home\" aria-label=\"\" href=\"https:\/\/jita-au.com\/\"><\/a>\n\t\t\t\t\n\t\t\t<\/div>\n\t\t\t\t\n\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-5cb9871 elementor-widget elementor-widget-shortcode\" data-id=\"5cb9871\" data-element_type=\"widget\" data-widget_type=\"shortcode.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t<div class=\"elementor-shortcode\"><div class=\"post-views content-post post-3880 entry-meta load-static\">\r\n\t\t\t\t<span class=\"post-views-icon dashicons dashicons-chart-bar\"><\/span> <span class=\"post-views-label\">Post Views:<\/span> <span class=\"post-views-count\">473<\/span>\r\n\t\t\t<\/div><\/div>\n\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-87a5ee0 elementor-widget elementor-widget-shortcode\" data-id=\"87a5ee0\" data-element_type=\"widget\" data-widget_type=\"shortcode.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t<div class=\"elementor-shortcode\">Downloaded Article PDF: <span class=\"snr-download-count-num\" data-post-id=\"0\">0<\/span> times\n<\/div>\n\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<\/div>\n\t\t","protected":false},"excerpt":{"rendered":"<p>Vol. 15 No. 2 (2025): JITA &#8211; APEIRON Boris Damjanovi\u0107, Dragan Kora\u0107, Dejan Simi\u0107, Negovan Stamenkovi\u0107 An Evolutionary Overview of Large Language Models: From Statistical Methods to the Transformer Era Review paperDOI: https:\/\/doi.org\/10.7251\/JIT2502145D Download Article PDF Abstract While the early evolution of large language models (LLMs), including shift from statistical approaches to the Transformer architecture, illustrates their historical impact on the processing of natural language; however, the latest research in neural networks has enabled the faster and more powerful rise of language models grounded in solid theoretical foundations. These advantages, driven by advances in computing systems (e.g., ultra-powerful processing and memory capabilities), enable the development of numerous new models based on new emerging technologies such as artificial intelligence (AI). Thus, we provide an evolutionary overview of LLMs involved in the shift from the statistical to deep learning approach, highlighting their key stages of development, with a particular focused on concepts such as self-attention, the Transformer architecture, BERT, GPT, DeepSeek, and Claude. Finally, our conclusions present a reference point for future research associated with the emergence of new AI-supported models that are irreversibly transforming the way an increasing number of human activities are performed. Keywords: Artificial intelligence, large language models, Transformer architecture, self-attention. Paper received: 7.11.2025.Paper accepted: 12.11.2025. Vol. 15 No. 
2 (2025): JITA &#8211; APEIRON Boris Damjanovi\u0107, Dragan Kora\u0107, Dejan Simi\u0107, Negovan Stamenkovi\u0107 An Evolutionary Overview of Large Language Models: From Statistical Methods to the Transformer Era Review paper DOI: https:\/\/doi.org\/10.7251\/JIT2502145D Download Article PDF Abstract While the early evolution of large language models (LLMs), including shift from statistical approaches to the Transformer architecture, illustrates their historical impact on the processing of natural language; however, the latest research in neural networks has enabled the faster and more powerful rise of language models grounded in solid theoretical foundations. These advantages, driven by advances in computing systems (e.g., ultra-powerful processing and memory capabilities), enable the development of numerous new models based on new emerging technologies such as artificial intelligence (AI). Thus, we provide an evolutionary overview of LLMs involved in the shift from the statistical to deep learning approach, highlighting their key stages of development, with a particular focused on concepts such as self-attention, the Transformer architecture, BERT, GPT, DeepSeek, and Claude. Finally, our conclusions present a reference point for future research associated with the emergence of new AI-supported models that are irreversibly transforming the way an increasing number of human activities are performed. Keywords: Artificial intelligence, large language models, Transformer architecture, self-attention. Paper received: 7.11.2025.Paper accepted: 12.11.2025. Vol. 15 No. 2 (2025): JITA &#8211; APEIRON Boris Damjanovi\u0107, Dragan Kora\u0107, Dejan Simi\u0107, Negovan Stamenkovi\u0107 An Evolutionary Overview of Large Language Models: From Statistical Methods to the Transformer Era Review paper DOI: https:\/\/doi.org\/10.7251\/JIT2502145D Download Article PDF Abstract While the early evolution of large language models (LLMs), including shift from statistical approaches to the Transformer architecture, illustrates their historical impact on the processing of natural language; however, the latest research in neural networks has enabled the faster and more powerful rise of language models grounded in solid theoretical foundations. These advantages, driven by advances in computing systems (e.g., ultra-powerful processing and memory capabilities), enable the development of numerous new models based on new emerging technologies such as artificial intelligence (AI). Thus, we provide an evolutionary overview of LLMs involved in the shift from the statistical to deep learning approach, highlighting their key stages of development, with a particular focused on concepts such as self-attention, the Transformer architecture, BERT, GPT, DeepSeek, and Claude. Finally, our conclusions present a reference point for future research associated with the emergence of new AI-supported models that are irreversibly transforming the way an increasing number of human activities are performed. Keywords: Artificial intelligence, large language models, Transformer architecture, self-attention. 
Paper received: 7.11.2025.Paper accepted: 12.11.2025.<\/p>\n","protected":false},"author":1,"featured_media":3827,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"elementor_header_footer","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-3880","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-uncategorized"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v26.8 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>An Evolutionary Overview of Large Language Models: From Statistical Methods to the Transformer Era - JITA -Journal of Information Technology and Application<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/jita-au.com\/index.php\/2025\/12\/15\/an-evolutionary-overview-of-large-language-models-from-statistical-methods-to-the-transformer-era\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"An Evolutionary Overview of Large Language Models: From Statistical Methods to the Transformer Era - JITA -Journal of Information Technology and Application\" \/>\n<meta property=\"og:description\" content=\"Vol. 15 No. 2 (2025): JITA &#8211; APEIRON Boris Damjanovi\u0107, Dragan Kora\u0107, Dejan Simi\u0107, Negovan Stamenkovi\u0107 An Evolutionary Overview of Large Language Models: From Statistical Methods to the Transformer Era Review paperDOI: https:\/\/doi.org\/10.7251\/JIT2502145D Download Article PDF Abstract While the early evolution of large language models (LLMs), including shift from statistical approaches to the Transformer architecture, illustrates their historical impact on the processing of natural language; however, the latest research in neural networks has enabled the faster and more powerful rise of language models grounded in solid theoretical foundations. These advantages, driven by advances in computing systems (e.g., ultra-powerful processing and memory capabilities), enable the development of numerous new models based on new emerging technologies such as artificial intelligence (AI). Thus, we provide an evolutionary overview of LLMs involved in the shift from the statistical to deep learning approach, highlighting their key stages of development, with a particular focused on concepts such as self-attention, the Transformer architecture, BERT, GPT, DeepSeek, and Claude. Finally, our conclusions present a reference point for future research associated with the emergence of new AI-supported models that are irreversibly transforming the way an increasing number of human activities are performed. Keywords: Artificial intelligence, large language models, Transformer architecture, self-attention. Paper received: 7.11.2025.Paper accepted: 12.11.2025. Vol. 15 No. 
2 (2025): JITA &#8211; APEIRON Boris Damjanovi\u0107, Dragan Kora\u0107, Dejan Simi\u0107, Negovan Stamenkovi\u0107 An Evolutionary Overview of Large Language Models: From Statistical Methods to the Transformer Era Review paper DOI: https:\/\/doi.org\/10.7251\/JIT2502145D Download Article PDF Abstract While the early evolution of large language models (LLMs), including shift from statistical approaches to the Transformer architecture, illustrates their historical impact on the processing of natural language; however, the latest research in neural networks has enabled the faster and more powerful rise of language models grounded in solid theoretical foundations. These advantages, driven by advances in computing systems (e.g., ultra-powerful processing and memory capabilities), enable the development of numerous new models based on new emerging technologies such as artificial intelligence (AI). Thus, we provide an evolutionary overview of LLMs involved in the shift from the statistical to deep learning approach, highlighting their key stages of development, with a particular focused on concepts such as self-attention, the Transformer architecture, BERT, GPT, DeepSeek, and Claude. Finally, our conclusions present a reference point for future research associated with the emergence of new AI-supported models that are irreversibly transforming the way an increasing number of human activities are performed. Keywords: Artificial intelligence, large language models, Transformer architecture, self-attention. Paper received: 7.11.2025.Paper accepted: 12.11.2025. Vol. 15 No. 2 (2025): JITA &#8211; APEIRON Boris Damjanovi\u0107, Dragan Kora\u0107, Dejan Simi\u0107, Negovan Stamenkovi\u0107 An Evolutionary Overview of Large Language Models: From Statistical Methods to the Transformer Era Review paper DOI: https:\/\/doi.org\/10.7251\/JIT2502145D Download Article PDF Abstract While the early evolution of large language models (LLMs), including shift from statistical approaches to the Transformer architecture, illustrates their historical impact on the processing of natural language; however, the latest research in neural networks has enabled the faster and more powerful rise of language models grounded in solid theoretical foundations. These advantages, driven by advances in computing systems (e.g., ultra-powerful processing and memory capabilities), enable the development of numerous new models based on new emerging technologies such as artificial intelligence (AI). Thus, we provide an evolutionary overview of LLMs involved in the shift from the statistical to deep learning approach, highlighting their key stages of development, with a particular focused on concepts such as self-attention, the Transformer architecture, BERT, GPT, DeepSeek, and Claude. Finally, our conclusions present a reference point for future research associated with the emergence of new AI-supported models that are irreversibly transforming the way an increasing number of human activities are performed. Keywords: Artificial intelligence, large language models, Transformer architecture, self-attention. 
Paper received: 7.11.2025.Paper accepted: 12.11.2025.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/jita-au.com\/index.php\/2025\/12\/15\/an-evolutionary-overview-of-large-language-models-from-statistical-methods-to-the-transformer-era\/\" \/>\n<meta property=\"og:site_name\" content=\"JITA -Journal of Information Technology and Application\" \/>\n<meta property=\"article:published_time\" content=\"2025-12-15T11:21:15+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-12-15T14:02:18+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/jita-au.com\/wp-content\/uploads\/2025\/12\/Pages-from-JITA_Vol-15_Issue-2-WEB.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"612\" \/>\n\t<meta property=\"og:image:height\" content=\"805\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"admin\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"admin\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"4 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/jita-au.com\/index.php\/2025\/12\/15\/an-evolutionary-overview-of-large-language-models-from-statistical-methods-to-the-transformer-era\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/jita-au.com\/index.php\/2025\/12\/15\/an-evolutionary-overview-of-large-language-models-from-statistical-methods-to-the-transformer-era\/\"},\"author\":{\"name\":\"admin\",\"@id\":\"https:\/\/jita-au.com\/#\/schema\/person\/d4becda53cfcbc99c449927eabf3877f\"},\"headline\":\"An Evolutionary Overview of Large Language Models: From Statistical Methods to the Transformer Era\",\"datePublished\":\"2025-12-15T11:21:15+00:00\",\"dateModified\":\"2025-12-15T14:02:18+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/jita-au.com\/index.php\/2025\/12\/15\/an-evolutionary-overview-of-large-language-models-from-statistical-methods-to-the-transformer-era\/\"},\"wordCount\":667,\"publisher\":{\"@id\":\"https:\/\/jita-au.com\/#organization\"},\"image\":{\"@id\":\"https:\/\/jita-au.com\/index.php\/2025\/12\/15\/an-evolutionary-overview-of-large-language-models-from-statistical-methods-to-the-transformer-era\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/jita-au.com\/wp-content\/uploads\/2025\/12\/Pages-from-JITA_Vol-15_Issue-2-WEB.jpg\",\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/jita-au.com\/index.php\/2025\/12\/15\/an-evolutionary-overview-of-large-language-models-from-statistical-methods-to-the-transformer-era\/\",\"url\":\"https:\/\/jita-au.com\/index.php\/2025\/12\/15\/an-evolutionary-overview-of-large-language-models-from-statistical-methods-to-the-transformer-era\/\",\"name\":\"An Evolutionary Overview of Large Language Models: From Statistical Methods to the Transformer Era - JITA -Journal of Information Technology and 
Application\",\"isPartOf\":{\"@id\":\"https:\/\/jita-au.com\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/jita-au.com\/index.php\/2025\/12\/15\/an-evolutionary-overview-of-large-language-models-from-statistical-methods-to-the-transformer-era\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/jita-au.com\/index.php\/2025\/12\/15\/an-evolutionary-overview-of-large-language-models-from-statistical-methods-to-the-transformer-era\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/jita-au.com\/wp-content\/uploads\/2025\/12\/Pages-from-JITA_Vol-15_Issue-2-WEB.jpg\",\"datePublished\":\"2025-12-15T11:21:15+00:00\",\"dateModified\":\"2025-12-15T14:02:18+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/jita-au.com\/index.php\/2025\/12\/15\/an-evolutionary-overview-of-large-language-models-from-statistical-methods-to-the-transformer-era\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/jita-au.com\/index.php\/2025\/12\/15\/an-evolutionary-overview-of-large-language-models-from-statistical-methods-to-the-transformer-era\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/jita-au.com\/index.php\/2025\/12\/15\/an-evolutionary-overview-of-large-language-models-from-statistical-methods-to-the-transformer-era\/#primaryimage\",\"url\":\"https:\/\/jita-au.com\/wp-content\/uploads\/2025\/12\/Pages-from-JITA_Vol-15_Issue-2-WEB.jpg\",\"contentUrl\":\"https:\/\/jita-au.com\/wp-content\/uploads\/2025\/12\/Pages-from-JITA_Vol-15_Issue-2-WEB.jpg\",\"width\":612,\"height\":805},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/jita-au.com\/index.php\/2025\/12\/15\/an-evolutionary-overview-of-large-language-models-from-statistical-methods-to-the-transformer-era\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/jita-au.com\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"An Evolutionary Overview of Large Language Models: From Statistical Methods to the Transformer Era\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/jita-au.com\/#website\",\"url\":\"https:\/\/jita-au.com\/\",\"name\":\"JITA -Journal of Information Technology and Application\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\/\/jita-au.com\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/jita-au.com\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/jita-au.com\/#organization\",\"name\":\"JITA -Journal of Information Technology and Application\",\"url\":\"https:\/\/jita-au.com\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/jita-au.com\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/jita-au.com\/wp-content\/uploads\/2024\/03\/cropped-JITA-logo-300px-1-1.jpg\",\"contentUrl\":\"https:\/\/jita-au.com\/wp-content\/uploads\/2024\/03\/cropped-JITA-logo-300px-1-1.jpg\",\"width\":300,\"height\":164,\"caption\":\"JITA -Journal of Information Technology and 
Application\"},\"image\":{\"@id\":\"https:\/\/jita-au.com\/#\/schema\/logo\/image\/\"}},{\"@type\":\"Person\",\"@id\":\"https:\/\/jita-au.com\/#\/schema\/person\/d4becda53cfcbc99c449927eabf3877f\",\"name\":\"admin\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/jita-au.com\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/fb1767673e75e9127846ff73b2b9e96214fba2d4675dc6799cec11e9b4380ca2?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/fb1767673e75e9127846ff73b2b9e96214fba2d4675dc6799cec11e9b4380ca2?s=96&d=mm&r=g\",\"caption\":\"admin\"},\"sameAs\":[\"https:\/\/jita-au.com\"],\"url\":\"https:\/\/jita-au.com\/index.php\/author\/jita-au-com\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"An Evolutionary Overview of Large Language Models: From Statistical Methods to the Transformer Era - JITA -Journal of Information Technology and Application","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/jita-au.com\/index.php\/2025\/12\/15\/an-evolutionary-overview-of-large-language-models-from-statistical-methods-to-the-transformer-era\/","og_locale":"en_US","og_type":"article","og_title":"An Evolutionary Overview of Large Language Models: From Statistical Methods to the Transformer Era - JITA -Journal of Information Technology and Application","og_description":"Vol. 15 No. 2 (2025): JITA &#8211; APEIRON Boris Damjanovi\u0107, Dragan Kora\u0107, Dejan Simi\u0107, Negovan Stamenkovi\u0107 An Evolutionary Overview of Large Language Models: From Statistical Methods to the Transformer Era Review paperDOI: https:\/\/doi.org\/10.7251\/JIT2502145D Download Article PDF Abstract While the early evolution of large language models (LLMs), including shift from statistical approaches to the Transformer architecture, illustrates their historical impact on the processing of natural language; however, the latest research in neural networks has enabled the faster and more powerful rise of language models grounded in solid theoretical foundations. These advantages, driven by advances in computing systems (e.g., ultra-powerful processing and memory capabilities), enable the development of numerous new models based on new emerging technologies such as artificial intelligence (AI). Thus, we provide an evolutionary overview of LLMs involved in the shift from the statistical to deep learning approach, highlighting their key stages of development, with a particular focused on concepts such as self-attention, the Transformer architecture, BERT, GPT, DeepSeek, and Claude. Finally, our conclusions present a reference point for future research associated with the emergence of new AI-supported models that are irreversibly transforming the way an increasing number of human activities are performed. Keywords: Artificial intelligence, large language models, Transformer architecture, self-attention. Paper received: 7.11.2025.Paper accepted: 12.11.2025. Vol. 15 No. 
2 (2025): JITA &#8211; APEIRON Boris Damjanovi\u0107, Dragan Kora\u0107, Dejan Simi\u0107, Negovan Stamenkovi\u0107 An Evolutionary Overview of Large Language Models: From Statistical Methods to the Transformer Era Review paper DOI: https:\/\/doi.org\/10.7251\/JIT2502145D Download Article PDF Abstract While the early evolution of large language models (LLMs), including shift from statistical approaches to the Transformer architecture, illustrates their historical impact on the processing of natural language; however, the latest research in neural networks has enabled the faster and more powerful rise of language models grounded in solid theoretical foundations. These advantages, driven by advances in computing systems (e.g., ultra-powerful processing and memory capabilities), enable the development of numerous new models based on new emerging technologies such as artificial intelligence (AI). Thus, we provide an evolutionary overview of LLMs involved in the shift from the statistical to deep learning approach, highlighting their key stages of development, with a particular focused on concepts such as self-attention, the Transformer architecture, BERT, GPT, DeepSeek, and Claude. Finally, our conclusions present a reference point for future research associated with the emergence of new AI-supported models that are irreversibly transforming the way an increasing number of human activities are performed. Keywords: Artificial intelligence, large language models, Transformer architecture, self-attention. Paper received: 7.11.2025.Paper accepted: 12.11.2025. Vol. 15 No. 2 (2025): JITA &#8211; APEIRON Boris Damjanovi\u0107, Dragan Kora\u0107, Dejan Simi\u0107, Negovan Stamenkovi\u0107 An Evolutionary Overview of Large Language Models: From Statistical Methods to the Transformer Era Review paper DOI: https:\/\/doi.org\/10.7251\/JIT2502145D Download Article PDF Abstract While the early evolution of large language models (LLMs), including shift from statistical approaches to the Transformer architecture, illustrates their historical impact on the processing of natural language; however, the latest research in neural networks has enabled the faster and more powerful rise of language models grounded in solid theoretical foundations. These advantages, driven by advances in computing systems (e.g., ultra-powerful processing and memory capabilities), enable the development of numerous new models based on new emerging technologies such as artificial intelligence (AI). Thus, we provide an evolutionary overview of LLMs involved in the shift from the statistical to deep learning approach, highlighting their key stages of development, with a particular focused on concepts such as self-attention, the Transformer architecture, BERT, GPT, DeepSeek, and Claude. Finally, our conclusions present a reference point for future research associated with the emergence of new AI-supported models that are irreversibly transforming the way an increasing number of human activities are performed. Keywords: Artificial intelligence, large language models, Transformer architecture, self-attention. 
Paper received: 7.11.2025.Paper accepted: 12.11.2025.","og_url":"https:\/\/jita-au.com\/index.php\/2025\/12\/15\/an-evolutionary-overview-of-large-language-models-from-statistical-methods-to-the-transformer-era\/","og_site_name":"JITA -Journal of Information Technology and Application","article_published_time":"2025-12-15T11:21:15+00:00","article_modified_time":"2025-12-15T14:02:18+00:00","og_image":[{"width":612,"height":805,"url":"https:\/\/jita-au.com\/wp-content\/uploads\/2025\/12\/Pages-from-JITA_Vol-15_Issue-2-WEB.jpg","type":"image\/jpeg"}],"author":"admin","twitter_card":"summary_large_image","twitter_misc":{"Written by":"admin","Est. reading time":"4 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/jita-au.com\/index.php\/2025\/12\/15\/an-evolutionary-overview-of-large-language-models-from-statistical-methods-to-the-transformer-era\/#article","isPartOf":{"@id":"https:\/\/jita-au.com\/index.php\/2025\/12\/15\/an-evolutionary-overview-of-large-language-models-from-statistical-methods-to-the-transformer-era\/"},"author":{"name":"admin","@id":"https:\/\/jita-au.com\/#\/schema\/person\/d4becda53cfcbc99c449927eabf3877f"},"headline":"An Evolutionary Overview of Large Language Models: From Statistical Methods to the Transformer Era","datePublished":"2025-12-15T11:21:15+00:00","dateModified":"2025-12-15T14:02:18+00:00","mainEntityOfPage":{"@id":"https:\/\/jita-au.com\/index.php\/2025\/12\/15\/an-evolutionary-overview-of-large-language-models-from-statistical-methods-to-the-transformer-era\/"},"wordCount":667,"publisher":{"@id":"https:\/\/jita-au.com\/#organization"},"image":{"@id":"https:\/\/jita-au.com\/index.php\/2025\/12\/15\/an-evolutionary-overview-of-large-language-models-from-statistical-methods-to-the-transformer-era\/#primaryimage"},"thumbnailUrl":"https:\/\/jita-au.com\/wp-content\/uploads\/2025\/12\/Pages-from-JITA_Vol-15_Issue-2-WEB.jpg","inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/jita-au.com\/index.php\/2025\/12\/15\/an-evolutionary-overview-of-large-language-models-from-statistical-methods-to-the-transformer-era\/","url":"https:\/\/jita-au.com\/index.php\/2025\/12\/15\/an-evolutionary-overview-of-large-language-models-from-statistical-methods-to-the-transformer-era\/","name":"An Evolutionary Overview of Large Language Models: From Statistical Methods to the Transformer Era - JITA -Journal of Information Technology and 
Application","isPartOf":{"@id":"https:\/\/jita-au.com\/#website"},"primaryImageOfPage":{"@id":"https:\/\/jita-au.com\/index.php\/2025\/12\/15\/an-evolutionary-overview-of-large-language-models-from-statistical-methods-to-the-transformer-era\/#primaryimage"},"image":{"@id":"https:\/\/jita-au.com\/index.php\/2025\/12\/15\/an-evolutionary-overview-of-large-language-models-from-statistical-methods-to-the-transformer-era\/#primaryimage"},"thumbnailUrl":"https:\/\/jita-au.com\/wp-content\/uploads\/2025\/12\/Pages-from-JITA_Vol-15_Issue-2-WEB.jpg","datePublished":"2025-12-15T11:21:15+00:00","dateModified":"2025-12-15T14:02:18+00:00","breadcrumb":{"@id":"https:\/\/jita-au.com\/index.php\/2025\/12\/15\/an-evolutionary-overview-of-large-language-models-from-statistical-methods-to-the-transformer-era\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/jita-au.com\/index.php\/2025\/12\/15\/an-evolutionary-overview-of-large-language-models-from-statistical-methods-to-the-transformer-era\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/jita-au.com\/index.php\/2025\/12\/15\/an-evolutionary-overview-of-large-language-models-from-statistical-methods-to-the-transformer-era\/#primaryimage","url":"https:\/\/jita-au.com\/wp-content\/uploads\/2025\/12\/Pages-from-JITA_Vol-15_Issue-2-WEB.jpg","contentUrl":"https:\/\/jita-au.com\/wp-content\/uploads\/2025\/12\/Pages-from-JITA_Vol-15_Issue-2-WEB.jpg","width":612,"height":805},{"@type":"BreadcrumbList","@id":"https:\/\/jita-au.com\/index.php\/2025\/12\/15\/an-evolutionary-overview-of-large-language-models-from-statistical-methods-to-the-transformer-era\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/jita-au.com\/"},{"@type":"ListItem","position":2,"name":"An Evolutionary Overview of Large Language Models: From Statistical Methods to the Transformer Era"}]},{"@type":"WebSite","@id":"https:\/\/jita-au.com\/#website","url":"https:\/\/jita-au.com\/","name":"JITA -Journal of Information Technology and Application","description":"","publisher":{"@id":"https:\/\/jita-au.com\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/jita-au.com\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/jita-au.com\/#organization","name":"JITA -Journal of Information Technology and Application","url":"https:\/\/jita-au.com\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/jita-au.com\/#\/schema\/logo\/image\/","url":"https:\/\/jita-au.com\/wp-content\/uploads\/2024\/03\/cropped-JITA-logo-300px-1-1.jpg","contentUrl":"https:\/\/jita-au.com\/wp-content\/uploads\/2024\/03\/cropped-JITA-logo-300px-1-1.jpg","width":300,"height":164,"caption":"JITA -Journal of Information Technology and 
Application"},"image":{"@id":"https:\/\/jita-au.com\/#\/schema\/logo\/image\/"}},{"@type":"Person","@id":"https:\/\/jita-au.com\/#\/schema\/person\/d4becda53cfcbc99c449927eabf3877f","name":"admin","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/jita-au.com\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/fb1767673e75e9127846ff73b2b9e96214fba2d4675dc6799cec11e9b4380ca2?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/fb1767673e75e9127846ff73b2b9e96214fba2d4675dc6799cec11e9b4380ca2?s=96&d=mm&r=g","caption":"admin"},"sameAs":["https:\/\/jita-au.com"],"url":"https:\/\/jita-au.com\/index.php\/author\/jita-au-com\/"}]}},"_links":{"self":[{"href":"https:\/\/jita-au.com\/index.php\/wp-json\/wp\/v2\/posts\/3880","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/jita-au.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/jita-au.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/jita-au.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/jita-au.com\/index.php\/wp-json\/wp\/v2\/comments?post=3880"}],"version-history":[{"count":7,"href":"https:\/\/jita-au.com\/index.php\/wp-json\/wp\/v2\/posts\/3880\/revisions"}],"predecessor-version":[{"id":3958,"href":"https:\/\/jita-au.com\/index.php\/wp-json\/wp\/v2\/posts\/3880\/revisions\/3958"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/jita-au.com\/index.php\/wp-json\/wp\/v2\/media\/3827"}],"wp:attachment":[{"href":"https:\/\/jita-au.com\/index.php\/wp-json\/wp\/v2\/media?parent=3880"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/jita-au.com\/index.php\/wp-json\/wp\/v2\/categories?post=3880"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/jita-au.com\/index.php\/wp-json\/wp\/v2\/tags?post=3880"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}