-[1.0.1] remove whitespace and commented out old code on all stored programs
@@ -33,12 +34,12 @@
 -[2.0.0] add ServerName & ServerPort on import Combined & Error logs stage tables. Option allow adding domains to logs.
 -[2.0.0] add ERROR_SERVERNAME,ERROR_SERVERPORT,COMBINED_SERVERNAME & COMBINED_SERVERPORT variables to settings.env.
 -[2.0.0] add SET servername & serverport COLUMN values to LOAD DATA statements.
--[2.0.0] create log_referer, log_remotehost, log_servername, log_serverport TABLES to assoicate Access and Error logs.
+-[2.0.0] create log_referer, log_remotehost, log_servername, log_serverport TABLES to associate Access and Error logs.
 -[2.0.0] add server_name & server_port COLUMNS to import_file TABLE. Provides second option to update Apache logs without %v.
 -[2.0.0] add compound indexes ACCESS_LOG and ERROR_LOG for ServerName and Serverport.
 -[2.0.0] modify process_access_import & process_error_import to populate empty server_name & server_port with ServerName & ServerPort from import_file TABLE.
 -[2.0.0] add WATCH_LOG to setting Log Level in watch4logs.py. 0=no messages, 1=message when files found, 2=message when checking for files & files found
--[2.0.0] add class bcolors to place RED BACKGROUND on all ERROR - messsages
+-[2.0.0] add class bcolors to place RED BACKGROUND on all ERROR - messages
 -[2.0.0] add file - mysql_user_and_grants.sql - MySQL USER and GRANTS file for CREATE USER apache_upload for Python module
 -[2.0.0] add Start and End DATETIME to processLogs Function. Already had duration times.
 -[2.0.0] add file - call_processes.sql - description and CALL command lines for 5 Stored Procedures
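
The `WATCH_LOG` levels and the `bcolors` class described in the 2.0.0 entries above could look roughly like the sketch below. This is not the code from `watch4logs.py`: the function names `watch_message` and `error_message`, the event labels and the attribute names on `bcolors` are assumptions made for illustration. Only the three level meanings and the red background on `ERROR -` messages come from the changelog.

```python
class bcolors:
    """ANSI escape codes. Attribute names are illustrative, not the project's."""
    ERROR = "\033[41m"  # red background
    ENDC = "\033[0m"    # reset all attributes


def watch_message(level, event, text):
    """Return the text to print for a watcher event, or None when suppressed.

    level: 0 = no messages, 1 = message when files found,
           2 = message when checking for files and when files found.
    event: "checking" or "found" (hypothetical labels).
    """
    if event == "found" and level >= 1:
        return text
    if event == "checking" and level >= 2:
        return text
    return None


def error_message(text):
    """Wrap an error message so terminals render it on a red background."""
    return f"{bcolors.ERROR}ERROR - {text}{bcolors.ENDC}"
```

With `WATCH_LOG=1` a call such as `watch_message(1, "checking", "scanning folder")` returns `None`, so only the "files found" message would be printed.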
@@ -47,7 +48,7 @@
 -[2.0.0] This version is the application baseline
 -[2.1.0] add request_log_id to access and error formats functionality. Enables easier association with access and error records.
 -[2.1.0] add columns to load_error_default & load_access_csv2mysql TABLES
--[2.1.0] modify process_error_parse - replace POSITION function with LOCATE function, removed unrequired brackets, add parsing logic for %v and %L String Formats.
+-[2.1.0] modify process_error_parse - replace POSITION function with LOCATE function, removed not required brackets, add parsing logic for %v and %L String Formats.
 -[2.1.0] modify process_error_import - add normalization for request_log_id, replace POSITION function with LOCATE function
 -[2.1.0] modify process_access_parse - add parsing for request_log_id, replace POSITION function with LOCATE function
 -[2.1.0] modify process_access_import - add normalization for request_log_id, replace POSITION function with LOCATE function
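
To illustrate why a shared `request_log_id` makes access and error records easier to associate, here is a minimal in-memory sketch. In the application this association happens in MySQL; the dictionary layout and the function name `correlate_by_request_log_id` are assumptions for illustration only.

```python
from collections import defaultdict


def correlate_by_request_log_id(access_rows, error_rows):
    """Group access and error rows that share a request_log_id.

    Rows are plain dicts with a "request_log_id" key (hypothetical layout).
    Only ids that appear in both inputs are returned.
    """
    grouped = defaultdict(lambda: {"access": [], "error": []})
    for row in access_rows:
        if row.get("request_log_id"):
            grouped[row["request_log_id"]]["access"].append(row)
    for row in error_rows:
        if row.get("request_log_id"):
            grouped[row["request_log_id"]]["error"].append(row)
    return {
        log_id: rows
        for log_id, rows in grouped.items()
        if rows["access"] and rows["error"]
    }
```

Rows without a log ID are skipped, which mirrors the situation where a log format does not include `%L`.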
@@ -88,4 +89,16 @@
 -[2.1.6] rename files - apachelogs2MySQL.py to logs2mysql.py, apachelogs2MySQL.sql to apache_logs_schema.sql
 -[2.1.6] modify `logs2mysql.py` line `if useragent_process == 1:` to `if useragent_process >= 1:`
 -[2.1.6] modify all files with refers to repository name. Changed `ApacheLogs2MySQL` to `apache-logs-to-mysql`
--[2.1.6] "application name" is still referred to as `ApacheLogs2MySQL` in `README.md`, `CITATION.md`, `logs2mysql.py`, `watch4logs.py`, `apache_logs_schema.sql`, `INSTALL.md` and `settings.env`
+-[2.1.6] "application name" is still referred to as `ApacheLogs2MySQL` in `README.md`, `CITATION.md`, `logs2mysql.py`, `watch4logs.py`, `apache_logs_schema.sql`, `INSTALL.md` and `settings.env`
+-[3.0.0] This version is NOT backward compatible to previous versions due to many database and process changes. These are final major changes required for Web interface in development.
+-[3.0.0] Integration with MaxMind GeoIP2 Python API to enhance Client IP geolocation data for Log Data Visualization in charts, reports & data analysis interfaces.
+-[3.0.0] modify `logs2mysql.py` to integrate IP data retrieval process and reorganizing encapsulation of all processes within the same "Import Load Process".
+-[3.0.0] add TABLES `log_client_city`, `log_client_coordinate`, `log_client_country`, `log_client_network`, `log_client_organization` and `log_client_subdivision` for IP geolocation data.
+-[3.0.0] add `normalize_client` STORED PROCEDURE to normalize IP Address geolocation data into 6 tables.
+-[3.0.0] rename TABLES `log_clientname` to `log_client`, `log_servername` to `log_server`
+-[3.0.0] rename COLUMNS `clientnameid` to `clientid`, `servernameid` to `serverid` throughout application tables and processes.
+-[3.0.0] modify `process_access_parse` and `process_error_parse` WHERE CLAUSES for server_name UPDATE commands.
+-[3.0.0] add 16 stored functions for log attribute tables to return names for Slice and dice is a data analysis in drill-down Web interface.
+-[3.0.0] modify and reworded all console log messages in `logs2mysql.py` to standardize messages for each process. Added COLORS to coordinate message types for better readability.
+-[3.0.0] modify all database INDEX NAMES for standardization and consolidation.
+-[3.0.0] tested simultaneously uploading logs from 10 VPS with multiple VirtualHosts on each Server processing thousands of files in different formats and millions of log records.
.github/CONTRIBUTING.md
Lines changed: 5 additions & 5 deletions
@@ -10,14 +10,14 @@ I volunteer for a nonprofit organization that wanted to import their Apache webs

 First I installed the Apache log_sql_mysql modules which did create a single MySQL mostly empty table of the access log with no control or customization and many other issues. Next I experimented with several simple log file parsers but none normalized the parsed log data into a MySQL database. Finally I reviewed other available Apache logging solutions that didn't use MySQL including GoAccess, Logstash, Apache Viewer, DataDog and others as well as CrowdStrike and Solarwinds Loggly.

-Mid-September 2024 after all my research I decided to write a simple solution which snowballed into a complete application. All October I worked long hours around the clock. November I spent incorporating the application into VPS websites and applications I oversee while making improvements along the way. Version 2.0.0 fixed the major issues encountered and is the application baseline. December I spent refining the major changes made in Version 2.0.0. Version 2.1.5 was last code change to fix client identification issue when OS version changes by adding `import_device` TABLE. The first week of January 2025 I spent processing millions of records from 10 VPS simultaneously to single MySQL Server.
+Mid-September 2024 after all my research I decided to write a simple solution which snowballed into a complete application. All October I worked long hours around the clock. November I spent incorporating the application into VPS websites and applications I oversee while making improvements along the way. Version 2.0.0 fixed the major issues encountered and is the application baseline. December I spent refining the major changes made in Version 2.0.0. Version 2.1.5 was last code change to fix client identification issue when OS version changes by adding `import_device` TABLE.

-Version 2.1.6 renames the repository, the 2 Python modules files and the MySQL schema creation script file. This version of application is production ready.
+First 2 weeks of January 2025 I spent processing millions of records from 10 VPS simultaneously to single MySQL Server. Version 3.0.0 is last major change with IP Address geoLocation and a final pass through to fine tune processes and rename some tables and columns. This version of application is production ready.
+
+The final version is less Python and more SQL and much faster processing millions of records. At this point, I have over 1050 hours of research, design, iteration & development into application. It is much more time then I intended to invest into this project but it did produce my first open-source software.

 That's how volunteering, lack of a viable MySQL solution and a flexible schedule came together just right to allow me to dive deep into this project.

 ### “Timing, degree and conviction are the three wise men in this life.” — Robert I. Fitzhenry

-The final version is less Python and more SQL and much faster processing millions of records. At this point, I have over 950 hours of research, design, iteration & development into application. It is much more time then I intended to invest into this project but it did produce my first open-source software.
-
-Monetary contributions made will be reflected in development of Web Interface with Drill Down Capability and [apache/echarts](https://github.com/apache/echarts) Log Visualization integration for this MySQL `apache_logs` schema. Web Interface will be released in separate repository.
+Monetary contributions made will be reflected in development of [Web Interface](https://github.com/WillTheFarmer/mysql-to-apache-echarts) for this MySQL `apache_logs` schema.
If MySQL steps completed successfully, successfully installed Python modules, renamed file `settings.env` to `.env`, and updated MySQL server connection and log folder variables it is time to run application.
.github/README.md
Lines changed: 12 additions & 17 deletions
@@ -3,31 +3,25 @@
 ApacheLogs2MySQL consists of two Python Modules & one MySQL Schema ***apache_logs*** to automate importing Access & Error files
 and normalizing data into database designed for reports & data analysis.

-Runs on Windows, Linux and MacOS & tested with MySQL versions 8.0.39, 8.4.3, 9.0.0 & 9.1.0.
-
 Imports Access Logs in LogFormats - ***common***, ***combined*** and ***vhost_combined*** & additional ***csv2mysql***
 LogFormat defined :point_down:

-Imports Error Logs in ***default*** ErrorLogFormat & ***additional*** ErrorLogFormat defined below performing data harmonization
-on Apache Codes & Messages, System Codes & Messages, and Log Messages to create a unified, standardized dataset.
-Error Log view images :point_down:
-
-Three options to associate ServerName & ServerPort with Access and Error logs missing `%v - canonical ServerName`
-and `%p - canonical ServerPort` Format Strings described :point_down:
+Imports Error Logs in ***default*** ErrorLogFormat & ***additional*** ErrorLogFormat defined below performing data harmonization on Apache Codes & Messages, System Codes & Messages, and Log Messages to create a unified, standardized dataset. Error Log view images :point_down:

-4 LogFormats & 2 ErrorLogFormats can be loaded and 6 MySQL Stored Procedures can be processed in a single Python `ProcessLogs function` execution.
+All processing stages are encapsulated within one "Import Load" that captures process metrics, notifications and errors into MySQL import tables. Every log data record is traceable back to the computer, folder, file, load process, parse process and import process it came from.

-Database Schema ***apache_logs*** designed to accommodate unlimited servers & domains. Step-by-step guide for easy installation :point_down:
+Multiple Access and Error logs and formats can be loaded, parsed and imported along with User Agent parsing and IP Address geoLocation retrieval in a single execution. A single execution can also be configured to only load logs to Server.
+### Console Process Messages - 4 LogFormats, 2 ErrorLogFormats & 6 MySQL Stored Procedures
+New version has [MaxMind GeoIP2](https://github.com/maxmind/GeoIP2-python) Python API integration with 5 additional MySQL tables for IP geoLocation data. Two DB-IP Lite databases are required - `IP to City` and `IP to ASN`. Free DB-IP Lite databases can be found at [DB-IP](https://db-ip.com/db/lite.php)

-The accompanying visualization tool for the MySQL Schema ***apache_logs*** is [MySQL2ApacheECharts](https://github.com/willthefarmer/mysql-to-apache-echarts)
-created in a separate repository. The Web interface consists of Express.js web application frameworks with Drill Down Capability & Apache ECharts frameworks for Data Visualization.
+A visualization tool for the MySQL Schema ***apache_logs*** is [MySQL2ApacheECharts](https://github.com/willthefarmer/mysql-to-apache-echarts) and currently under development. The Web interface consists of Express.js web application frameworks with Drill Down Capability & [Apache ECharts](https://github.com/apache/echarts) frameworks for Data Visualization.

-New version with [MaxMind GeoIP2](https://github.com/maxmind/GeoIP2-python) Python API integration will be released end of January
-with 5 additional tables for IP geolocation data. Tables are shown in updated diagram :point_down:
+Database Schema ***apache_logs*** designed to accommodate unlimited servers & domains. Step-by-step guide for easy installation :point_down:
 ## Entity Relationship Diagram of apache_logs schema tables
 Diagram created with open-source database diagrams editor [chartdb/chartdb](https://github.com/chartdb/chartdb)
-## Application Description
+## Application runs on Windows, Linux and MacOS
 This is a fast, reliable processing application with detailed logging and two stages of data parsing.
 First stage is performed in `LOAD DATA LOCAL INFILE` statements.
 Second stage is performed in `process_access_parse` and `process_error_parse` Stored Procedures.
@@ -49,7 +43,7 @@ All folder paths, filename patterns, logging, processing, MySQL connection setti

 Two Python Client modules can run in PM2 daemon process manager for 24/7 online processing on multiple web servers feeding a single Server module simultaneous.

-Application is developed with Python 3.12, MySQL and 4 Python modules. Modules are listed with Python Package Index link,
+Application is developed with Python 3.12, MySQL and 5 Python modules. Modules are listed with Python Package Index link,
 install command for each platform & GitHub Repository link.
 ## Four Supported Access Log Formats
 Apache uses same Standard Access LogFormats (***common***, ***combined***, ***vhost_combined***) on all 3 platforms. Each LogFormat adds 2 Format Strings to the prior.
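
The "adds 2 Format Strings to the prior" relationship can be sketched with one regular expression in which the two extra fields of each richer format are optional. This is an illustration, not how the application parses logs (it parses in `LOAD DATA LOCAL INFILE` statements and stored procedures). It assumes the stock Apache definitions: ***common*** as `%h %l %u %t "%r" %>s %O`, ***combined*** appending `"%{Referer}i" "%{User-Agent}i"`, and ***vhost_combined*** prepending `%v:%p`. The names `ACCESS_RE`, `parse_access_line` and `detect_format` are hypothetical, and requests containing escaped quotes are not handled.

```python
import re

# Optional "%v:%p " prefix (vhost_combined) and optional trailing
# "Referer" "User-Agent" pair (combined) around the common core.
ACCESS_RE = re.compile(
    r'^(?:(?P<server_name>\S+):(?P<server_port>\d+) )?'
    r'(?P<remote_host>\S+) (?P<logname>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\S+)'
    r'(?: "(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)")?$'
)


def parse_access_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = ACCESS_RE.match(line.rstrip("\n"))
    return match.groupdict() if match else None


def detect_format(fields):
    """Name the standard LogFormat implied by which optional fields are set."""
    if fields["server_name"] is not None:
        return "vhost_combined"
    if fields["referer"] is not None:
        return "combined"
    return "common"
```

Fields that a format does not carry come back as `None`, which is the case the ServerName and ServerPort options described later in the README are meant to cover.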
@@ -144,7 +138,7 @@ To use this format place `ErrorLogFormat` before `ErrorLog` in `apache2.conf` to
 |%v|The canonical ServerName of the server serving the request.|
 |%L|Log ID of the request. A %L format string is also available in `mod_log_config` to allow to correlate access log entries with error log lines. If [mod_unique_id](https://httpd.apache.org/docs/current/mod/mod_unique_id.html) is loaded, its unique id will be used as log ID for requests.|

-## Three options to attach ServerName & ServerPort to Access & Error logs
+## Three options to associate ServerName & ServerPort to Access & Error logs
 Apache LogFormats - ***common***, ***combined*** and Apache ErrorLogFormat - ***default*** do not contain `%v - canonical ServerName` and `%p - canonical ServerPort`.

 In order to consolidate logs from multiple domains `%v - canonical ServerName` is required and `%p - canonical ServerPort` is optional.
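
One way to picture the fallback for logs that lack `%v` and `%p` is the sketch below, which fills empty values from per-file defaults. The changelog describes this being done inside the `process_access_import` and `process_error_import` stored procedures using the `import_file` table, so the Python function, its name `apply_server_defaults` and the dictionary layout are assumptions for illustration only.

```python
def apply_server_defaults(record, import_file):
    """Fill an empty server_name / server_port from the file-level values.

    record:      dict for one parsed log row (hypothetical layout)
    import_file: dict holding server_name and server_port for the source file
    Values already present on the record are never overwritten.
    """
    filled = dict(record)  # copy so the caller's row is left untouched
    if not filled.get("server_name"):
        filled["server_name"] = import_file.get("server_name")
    if not filled.get("server_port"):
        filled["server_port"] = import_file.get("server_port")
    return filled
```

A row parsed from a ***vhost_combined*** log keeps its own values, while a row from a ***common*** log picks up whatever was recorded for its file.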
@@ -187,6 +181,7 @@ command line under '2. Python Steps' below. If that works you are all set.