Releases: WillTheFarmer/apache-logs-to-mysql
MySQL & MariaDB compatible
Minimized, cleaned and formatted (spaces, no tabs) each of the 27 development scripts, and removed the schema name from all database object names. Added MySQL- and MariaDB-specific code and version improvements. Processing is twice as fast in MariaDB as in MySQL on both Ubuntu and Windows. MySQL screams on MacOS with Apple silicon ARM64 processors. Did not install MariaDB on MacOS.
- [3.3.1] stripped schema name from all qualified database object names in `apache_logs_schema.sql` to minimize code and make a transition to a different schema name easier if required.
- [3.3.1] MariaDB and MySQL version-specific code implemented using `/*M!100500` and `/*!50700` for index creation and for adding system variables MySQL `@@server_uuid` and MariaDB `@@server_uid`.
- [3.3.1] increased column widths for LOAD TABLES - `first_line_request`, `req_uri`, `req_query` and decreased `useragent` to handle LimitRequestLine 8190 - https://httpd.apache.org/docs/2.2/mod/core.html#limitrequestline
- [3.3.1] increased column width from 2000 to 5000 for TABLE `access_log_reqquery` and modified FUNCTION `access_reqQueryID`, increasing in_ReqQuery from VARCHAR(2000) to VARCHAR(5000).
v3.3.0
This was missed during the major revamp of the two import processes. It does not affect data processed by the last version. One of the main reasons for the revamp was to limit calls to importFileCheck to one per file. This should have been changed in version 3.0.0, when the process changed completely to incorporate an INNER JOIN with the import_file TABLE. Version 3.2.9 called importFileCheck for each record in a file instead of once for each file.
- [3.3.0] modify `process_access_import`: changed `l.importfileid` to `DISTINCT(l.importfileid)` for cursors csv2mysqlStatusFile, csv2mysqlLoadIDFile, vhostStatusFile, vhostLoadIDFile, combinedStatusFile, combinedLoadIDFile
- [3.3.0] modify `process_error_import`: changed `l.importfileid` to `DISTINCT(l.importfileid)` for cursors defaultByLoadIDFile and defaultByStatusFile
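
The effect of the `DISTINCT` change can be sketched in Python. This is only an illustration of the idea, not the stored procedure: the list of file IDs and the `check_file` callback are hypothetical stand-ins for the cursor result set and the call to `importFileCheck`.

```python
from typing import Callable, Iterable


def check_each_file_once(import_file_ids: Iterable[int],
                         check_file: Callable[[int], None]) -> int:
    """Call check_file once per distinct file ID; return the number of calls."""
    calls = 0
    seen = set()
    for file_id in import_file_ids:
        if file_id in seen:
            continue  # same file as an earlier record: skip the repeat call
        seen.add(file_id)
        check_file(file_id)
        calls += 1
    return calls


# Five log records that belong to only two files (made-up IDs).
record_file_ids = [7, 7, 7, 9, 9]
checked = []
calls = check_each_file_once(record_file_ids, checked.append)
print(calls, checked)
```

Without the de-duplication step the callback would run five times here, once per record, which is the behavior the 3.3.0 change removes.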
v3.2.9
- [3.2.9] fix mistake made in last version of `process_access_import` and `process_error_import`. importfileid was incorrectly changed, causing records not to be related to files.
- [3.2.9] add views - `access_client_city_list`, `access_client_country_code_list`, `access_client_country_list`, `access_client_subdivision_list`, `access_client_organization_list`, `access_client_network_list`
v3.2.8
- [3.2.8] @@server_uuid and UUID() - these 2 are not the same - changed in version 3.2.0 on 02/01/2025 release - since then records are added to import_server TABLE as different servers each execution
- [3.2.8] add comments to `importProcessID` to explain changes - scrapped using server_uuid, UUID() and server_uid.
- [3.2.8] alter TABLE `import_server`: rename COLUMN `serveruuid` to `dbcomment`.
- [3.2.8] modify `process_access_parse`, `process_error_parse`, `process_access_import` and `process_error_import`: add file LOOP for CALL to `importFileCheck`. Fewer CALLS, cleaner code for changes made in version 3.0.
- [3.2.8] add `FOR UPDATE` clause to SELECTS for `process_access_parse`, `process_error_parse`, `process_access_import` and `process_error_import` to LOCK RECORDS.
v3.2.7
- [3.2.7] add `except Exception as e:` to all previous `except:` statements. `e` is printed with error message to console and logged to `import_error` TABLE.
- [3.2.7] fixed MacOS and Linux platforms issue with double separators in paths stored in `import_file` TABLE. This was a result of fixing the double separator on Windows platform. Issue now fixed on all 3 platforms.
- [3.2.7] modify `apache_logs_schema.sql` generation script to comment out DROP statements and add comment to start of each merged file.
- [3.2.7] add two indexes for companion Web Interface - mysql-to-apache-echarts which is due to be released mid-March.
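
A minimal sketch of the two Python changes in 3.2.7. It is not the code in `logs2mysql.py`: the `log_failure` helper and the `errors` list (standing in for the `import_error` TABLE) are hypothetical, and using `normpath` to collapse double separators is an assumption about one way to do it, not a statement of how the fix was made.

```python
import ntpath
import posixpath


def log_failure(action, errors):
    """Run action; on failure record the exception text instead of hiding it."""
    try:
        return action()
    except Exception as e:  # a bare `except:` would give no access to `e`
        message = f"{type(e).__name__}: {e}"
        print("Error:", message)  # console output
        errors.append(message)    # stand-in for an INSERT into import_error
        return None


errors = []
result = log_failure(lambda: int("not a number"), errors)

# normpath collapses repeated separators for each platform's path flavor.
posix_clean = posixpath.normpath("/var/log/apache2//access.log")
windows_clean = ntpath.normpath("C:\\logs\\\\access.log")
print(posix_clean)
print(windows_clean)
```

`posixpath` and `ntpath` are used explicitly so the example behaves the same on every platform; application code would normally call `os.path.normpath`.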
Raw & Refined schema script
The first I heard about this problem was by email at 4PM today. Clearly no one else has attempted to install on MariaDB or fixed the issue themselves. This is a much better approach since I now have a python script to consolidate the 25 source code scripts into a much smaller SQL file without any of the TABLE LOCKS and SET environment variables. The file is now just raw, bare bones script. Please email or create issue for any problems. I am working on the Web interface for this and will have initial release first week of March.
- [3.2.6] created new Python script to generate `apache_logs_schema.sql` from the 25 refined development source code scripts. Prior to this version the file was generated by MySQL Workbench.
- [3.2.6] repository had MySQL Workbench generated script that added CHARSET and COLLATE. The raw source code scripts did not specify CHARSET or COLLATE. Raw source code scripts worked fine on MySQL and MariaDB testing.
- [3.2.6] MySQL 9.1 default is CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci. MariaDB 11.6 default is CHARSET=utf8mb4 COLLATE=utf8mb4_uca1400_ai_ci. COLLATE=utf8mb4_0900_ai_ci does not exist in MariaDB causing script errors.
- [3.2.6] The solution that works for both MySQL and MariaDB is to specify only CHARSET=utf8mb4. If CHARACTER SET charset_name is specified without COLLATE, character set charset_name and its default collation are used.
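
The collation problem can be illustrated with a small Python sketch that removes `COLLATE` options from a DDL string, so each server falls back to the default collation of the character set. This is an assumption-laden example, not the generation script from the repository: `strip_collation`, the regular expression, and the `demo_table` statement are all hypothetical.

```python
import re

# Matches " COLLATE=name" or " COLLATE name" (assumes simple, unquoted names).
_COLLATE = re.compile(r"\s+COLLATE\s*=?\s*\w+", re.IGNORECASE)


def strip_collation(ddl: str) -> str:
    """Remove COLLATE options so the server applies the charset's default."""
    return _COLLATE.sub("", ddl)


# Hypothetical statement in the style a GUI export might produce.
workbench_ddl = (
    "CREATE TABLE demo_table (id INT) "
    "ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;"
)
portable_ddl = strip_collation(workbench_ddl)
print(portable_ddl)
```

The resulting statement keeps `CHARSET=utf8mb4` and names no collation, which is the form described above as working on both servers.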
Stress Test Improvements
The Log Generator created data with many values outside those found in "real-life" logs. All prior testing was done with "real" logs. Some of these out-of-range values caused previously unseen issues that required changing some table columns from TINYINT to INT. The data also put the error routines through a true test. Ran 5 million Combined Access records and 1 million Error records through MySQL and MariaDB. Used environment variables ERROR_SERVER, ERROR_SERVERPORT, COMBINED_SERVER, COMBINED_SERVERPORT to assign domains. This should be the last of the database structure changes. I did think I was done last version but the Log Generator shook things up!
MariaDB 11.6 is consistently twice as fast as MySQL 9.1 in every benchmark on Windows 11. I have not run benchmarks on Ubuntu or MacOS.
- [3.2.5] modify `process_access_import` to correct `remoteLogName` and `remoteUser` column processing. Values were switched in the tables where they were stored. Running log generator stress tests flushed this out.
- [3.2.5] modify `logs2mysql.py` to add timing variables for all child processes to display in logs and store to `import_load` TABLE.
- [3.2.5] modify TABLE `import_load` added six columns for process execution durations in seconds.
- [3.2.5] modify `logs2mysql.py` reworked all process message logging verbiage to provide child process summary information at each phase. Running log generator stress tests flushed this out.
- [3.2.5] modify `logs2mysql.py` reduced number of cursor objects created by reusing only two cursor objects. Reduced all variables for import_file TABLE processing using same variables for all.
- [3.2.5] modify TABLES `access_log_useragent` and `log_client` added indexes for use in `logs2mysql.py` processing.
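
One way to collect per-process durations in seconds is sketched below. It is not taken from `logs2mysql.py`: the `timed` context manager, the `durations` dictionary, and the process names are hypothetical, and the placeholder workloads stand in for the real child processes.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(durations: dict, name: str):
    """Store the elapsed seconds of the wrapped block under durations[name]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        durations[name] = time.perf_counter() - start


durations = {}
with timed(durations, "access_parse"):
    sum(range(100_000))  # placeholder for a child process
with timed(durations, "access_import"):
    time.sleep(0.01)     # placeholder for a child process

for name, seconds in durations.items():
    print(f"{name}: {seconds:.3f} s")
```

Keeping the values in one dictionary makes it straightforward to print them in the log and pass them on to a single database write at the end of a load.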
MariaDB compatible
On 1/30/2025 I attended a MariaDB webinar and was very impressed with the technical knowledge and passion the four presenters conveyed about MariaDB. I spent that night installing the database on MariaDB. Last night I ran 11,600 Apache Access and Error log files with 763,560 Access and 86,480 Error records through both MySQL and MariaDB. MariaDB processes execute in about half the time MySQL takes. I am liking MariaDB over MySQL. The Log Rotation functionality is very cool as well! Summary of changes below.
- [3.2.0] Database function and procedure modifications required for compatibility with MariaDB. Application processes have been tested with version 11.6. MariaDB tests twice as fast as MySQL.
- [3.2.0] Major reworking of logs2mysql.py logging process messaging and incorporate Log File Rotation functionality with environment variables - BACKUP_DAYS and BACKUP_PATH
- [3.2.0] modify Stored Function `apache_logs.importFileExists` for Log File Rotation functionality.
- [3.2.0] modify Stored Function `apache_logs.importProcessID` for compatibility with MariaDB.
- [3.2.0] add Python function `def copy_backup_file(log_path_file, log_days)` to reuse log file copy and delete functionality.
- [3.2.0] add log summary to end of Python `processLogs` in `logs2mysql.py` to provide more process information to PM2 logs.
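
A rough sketch of what a copy-then-delete rotation helper might look like. It is not the repository's `copy_backup_file`: the function name `backup_log_file`, its parameters, and the rule "delete the original once it is older than the given number of days" are all assumptions made for illustration. In the application the values would presumably come from the BACKUP_PATH and BACKUP_DAYS environment variables.

```python
import os
import shutil
import tempfile
import time


def backup_log_file(log_path_file: str, backup_path: str, backup_days: int) -> bool:
    """Copy a log file to backup_path, then delete the original if it is
    older than backup_days. Returns True when the original was deleted."""
    os.makedirs(backup_path, exist_ok=True)
    shutil.copy2(log_path_file, backup_path)  # copy2 keeps the modification time
    age_days = (time.time() - os.path.getmtime(log_path_file)) / 86400
    if age_days > backup_days:
        os.remove(log_path_file)
        return True
    return False


# Demonstration in a throwaway directory with a file aged to ten days.
work = tempfile.mkdtemp()
old_log = os.path.join(work, "access.log.1")
with open(old_log, "w") as handle:
    handle.write("example line\n")
ten_days_ago = time.time() - 10 * 86400
os.utime(old_log, (ten_days_ago, ten_days_ago))

backup_dir = os.path.join(work, "backup")
deleted = backup_log_file(old_log, backup_dir, backup_days=7)
print(deleted)
```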
IP Geolocation integration
- [3.0.0] This version is NOT backward compatible to previous versions due to many database and process changes. These are final major changes required for Web interface in development.
- [3.0.0] Integration with MaxMind GeoIP2 Python API to enhance Client IP geolocation data for Log Data Visualization in charts, reports & data analysis interfaces.
- [3.0.0] modify `logs2mysql.py` to integrate IP data retrieval process and reorganize encapsulation of all processes within the same "Import Load Process".
- [3.0.0] add TABLES `log_client_city`, `log_client_coordinate`, `log_client_country`, `log_client_network`, `log_client_organization` and `log_client_subdivision` for IP geolocation data.
- [3.0.0] add `normalize_client` STORED PROCEDURE to normalize IP Address geolocation data into 6 tables.
- [3.0.0] rename TABLES `log_clientname` to `log_client`, `log_servername` to `log_server`
- [3.0.0] rename COLUMNS `clientnameid` to `clientid`, `servernameid` to `serverid` throughout application tables and processes.
- [3.0.0] modify `process_access_parse` and `process_error_parse` WHERE CLAUSES for server_name UPDATE commands.
- [3.0.0] add 16 stored functions for primary attribute tables to return names for slice-and-dice data analysis in the drill-down Web interface.
- [3.0.0] modify and reword all console log messages in `logs2mysql.py` to standardize messages for each process. Added COLORS to coordinate message types for better readability.
- [3.0.0] modify all database INDEX NAMES for standardization and consolidation.
- [3.0.0] tested simultaneously uploading logs from 10 VPS with multiple VirtualHosts on each Server processing thousands of files in different formats and millions of log records.
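
For readers unfamiliar with the MaxMind GeoIP2 Python API, the sketch below shows the general shape of a city lookup. It is not the lookup code in `logs2mysql.py`: `client_geo` and the returned keys are hypothetical, and `FakeReader` with its made-up values exists only so the example runs without the `geoip2` package or a database file.

```python
from types import SimpleNamespace


def client_geo(reader, ip_address: str) -> dict:
    """Look up one client IP and flatten a few fields of the response.

    `reader` is expected to behave like geoip2.database.Reader opened on a
    City database (it must provide a .city(ip) method).
    """
    response = reader.city(ip_address)
    return {
        "country_code": response.country.iso_code,
        "country": response.country.name,
        "subdivision": response.subdivisions.most_specific.name,
        "city": response.city.name,
        "latitude": response.location.latitude,
        "longitude": response.location.longitude,
    }


# Real use needs the geoip2 package and a database file (path is hypothetical):
#   import geoip2.database
#   with geoip2.database.Reader("/path/to/GeoLite2-City.mmdb") as reader:
#       print(client_geo(reader, some_client_ip))


class FakeReader:
    """Stand-in reader returning made-up values, for demonstration only."""

    def city(self, ip_address):
        return SimpleNamespace(
            country=SimpleNamespace(iso_code="US", name="United States"),
            subdivisions=SimpleNamespace(
                most_specific=SimpleNamespace(name="Minnesota")),
            city=SimpleNamespace(name="Minneapolis"),
            location=SimpleNamespace(latitude=44.98, longitude=-93.26),
        )


row = client_geo(FakeReader(), "203.0.113.10")
print(row)
```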
v2.1.6
- [2.1.6] repository name change - ApacheLogs2MySQL to apache-logs-to-mysql
- [2.1.6] rename files - apachelogs2MySQL.py to logs2mysql.py, apachelogs2MySQL.sql to apache_logs_schema.sql
- [2.1.6] modify `logs2mysql.py` line `if useragent_process == 1:` to `if useragent_process >= 1:`
- [2.1.6] modify all files that refer to repository name. Changed `ApacheLogs2MySQL` to `apache-logs-to-mysql`
- [2.1.6] "application name" is still referred to as `ApacheLogs2MySQL` in `README.md`, `CITATION.md`, `logs2mysql.py`, `watch4logs.py`, `apache_logs_schema.sql`, `INSTALL.md` and `settings.env`