.github/README.md
+42 -22 (42 additions & 22 deletions)
@@ -1,27 +1,33 @@
1
1
# Apache Log Parser and Data Normalization Application
2
2
### Python handles File Processing & MySQL handles Data Processing
3
-
ApacheLogs2MySQL consists of two Python Modules & one MySQL Schema ***apache_logs*** to automate importing Access & Error files and normalizing data into database designed for reports & data analysis.
3
+
ApacheLogs2MySQL consists of two Python Modules & one MySQL Schema ***apache_logs*** to automate importing Access & Error files
4
+
and normalizing data into database designed for reports & data analysis.
4
5
5
6
Runs on Windows, Linux and macOS & tested with MySQL versions 8.0.39, 8.4.3, 9.0.0 & 9.1.0.
6
7
7
-
Imports Access Logs in LogFormats - ***common***, ***combined*** and ***vhost_combined*** & additional ***csv2mysql*** LogFormat defined :point_down:
8
+
Imports Access Logs in LogFormats - ***common***, ***combined*** and ***vhost_combined*** & additional ***csv2mysql***
9
+
LogFormat defined :point_down:
8
10
9
-
Imports Error Logs in ***default*** ErrorLogFormat & ***additional*** ErrorLogFormat defined below performing data harmonization on Apache Codes & Messages,
10
-
System Codes & Messages, and Log Messages to create a unified, standardized dataset. Error Log view images :point_down:
11
+
Imports Error Logs in ***default*** ErrorLogFormat & ***additional*** ErrorLogFormat defined below performing data harmonization
12
+
on Apache Codes & Messages, System Codes & Messages, and Log Messages to create a unified, standardized dataset.
13
+
Error Log view images :point_down:
11
14
12
-
Three options to associate ServerName & ServerPort with Access and Error logs missing `%v - canonical ServerName` and `%p - canonical ServerPort` Format Strings described :point_down:
15
+
Three options to associate ServerName & ServerPort with Access and Error logs missing `%v - canonical ServerName`
16
+
and `%p - canonical ServerPort` Format Strings described :point_down:
13
17
14
18
4 LogFormats & 2 ErrorLogFormats can be loaded and 5 MySQL Stored Procedures can be processed in a single Python `ProcessLogs function` execution.
15
19
16
20
Database Schema ***apache_logs*** designed to accommodate unlimited servers & domains. Step-by-step guide for easy installation :point_down:
17
21
18
-
The accompanying visualization tool for the MySQL Schema ***apache_logs*** is [MySQL2ApacheECharts](https://github.com/willthefarmer/mysql-to-apache-echarts) created is a separate repository.
22
+
The accompanying visualization tool for the MySQL Schema ***apache_logs*** is [MySQL2ApacheECharts](https://github.com/willthefarmer/mysql-to-apache-echarts)
23
+
created in a separate repository.
19
24
The Web interface consists of the Express.js web application framework with Drill Down Capability & the Apache ECharts framework for Data Visualization.
20
25
## Entity Relationship Diagram of apache_logs schema tables
Diagram created with open-source database diagrams editor [chartdb/chartdb](https://github.com/chartdb/chartdb)
23
28
## Application Description
24
-
This is a fast, reliable processing application with detailed logging and two stages of data parsing. First stage is performed in `LOAD DATA LOCAL INFILE` statements.
29
+
This is a fast, reliable processing application with detailed logging and two stages of data parsing.
30
+
First stage is performed in `LOAD DATA LOCAL INFILE` statements.
25
31
Second stage is performed in `process_access_parse` and `process_error_parse` Stored Procedures.
26
32
27
33
Python handles polling of log file folders and executing MySQL Database LOAD DATA, Stored Procedures, Stored Functions and SQL Statements.
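
As an illustration of the polling step only, here is a minimal sketch using the Python standard library. The function name `find_log_files`, its arguments and the `*.log` pattern are hypothetical and are not taken from `logs2mysql.py`; the application reads its own folder paths and filename patterns from its settings file.

```python
from pathlib import Path


def find_log_files(folder, pattern="*.log", recursive=False):
    """Return a sorted list of files in `folder` matching `pattern`.

    Hypothetical helper for illustration; a missing folder yields an empty list.
    """
    base = Path(folder)
    if not base.is_dir():
        return []
    # rglob walks subfolders, glob only looks at the top level
    matches = base.rglob(pattern) if recursive else base.glob(pattern)
    # Skip directories that happen to match the pattern
    return sorted(p for p in matches if p.is_file())
```

Each path returned by a helper like this could then be handed to the `LOAD DATA LOCAL INFILE` stage described above.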
@@ -41,7 +47,8 @@ All folder paths, filename patterns, logging, processing, MySQL connection setti
41
47
42
48
Two Python Client modules can run in PM2 daemon process manager for 24/7 online processing on multiple web servers feeding a single Server module simultaneously.
43
49
44
-
Application is developed with Python 3.12, MySQL and 4 Python modules. Modules are listed with Python Package Index link, install command for each platform & GitHub Repository link.
50
+
Application is developed with Python 3.12, MySQL and 4 Python modules. Modules are listed with Python Package Index link,
51
+
install command for each platform & GitHub Repository link.
45
52
## Four Supported Access Log Formats
46
53
Apache uses same Standard Access LogFormats (***common***, ***combined***, ***vhost_combined***) on all 3 platforms. Each LogFormat adds 2 Format Strings to the prior.
47
54
Format String descriptions are listed below each LogFormat. Information from: https://httpd.apache.org/docs/2.4/mod/mod_log_config.html#logformat
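
To make the "each LogFormat adds 2 Format Strings" idea concrete, here is a minimal Python sketch that parses a ***common*** line and a ***combined*** line with regular expressions. This is illustration only: the application itself parses logs in MySQL (`LOAD DATA` and Stored Procedures), and the names `COMMON_RE`, `COMBINED_RE` and `parse_line` are hypothetical. The sketch assumes no escaped double quotes inside the request, Referer or User-Agent fields.

```python
import re

# Apache "common" LogFormat: %h %l %u %t "%r" %>s %b
COMMON_RE = re.compile(
    r'(?P<host>\S+) (?P<logname>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
)

# "combined" appends two Format Strings: "%{Referer}i" "%{User-agent}i"
COMBINED_RE = re.compile(
    COMMON_RE.pattern + r' "(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_line(line):
    """Try the combined pattern first, then common. Returns a dict or None."""
    for pattern in (COMBINED_RE, COMMON_RE):
        match = pattern.fullmatch(line.strip())
        if match:
            return match.groupdict()
    return None
```

A ***vhost_combined*** pattern could be built the same way by prefixing the `%v` and `%p` fields to the combined pattern.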
@@ -142,17 +149,21 @@ In order to consolidate logs from multiple domains `%v - canonical ServerName` i
142
149
143
150
Listed are different methods to associate ServerName and ServerPort to all Access and Error logs.
144
151
145
-
1) Set `ERRORLOG_SERVERNAME`, `ERRORLOG_SERVERPORT`, `COMBINED_SERVERNAME`, `COMBINED_SERVERPORT` variables in .env file and uncomment `os.getenv` lines at top of `logs2mysql.py`.
146
-
By default, variables are defined and set to an empty string.
147
-
Below is screenshot of `logs2mysql.py` with commented `os.getenv` code. `server_name` and `server_port` COLUMNS of `load_error_default` and `load_access_combined` TABLES will be SET during Python `LOAD DATA LOCAL INFILE` execution.
152
+
1) Set `ERRORLOG_SERVERNAME`, `ERRORLOG_SERVERPORT`, `COMBINED_SERVERNAME`, `COMBINED_SERVERPORT` variables in .env file and uncomment `os.getenv`
153
+
lines at top of `logs2mysql.py`. By default, variables are defined and set to an empty string.
154
+
Below is a screenshot of `logs2mysql.py` with commented `os.getenv` code. `server_name` and `server_port` COLUMNS of `load_error_default` and `load_access_combined`
155
+
TABLES will be SET during Python `LOAD DATA LOCAL INFILE` execution.
2) Manually ***UPDATE***`server_name` and `server_port` COLUMNS of `load_error_default` and `load_access_combined` TABLES after STORED PROCEDURES `process_access_parse` and `process_error_parse` and before `process_access_import` and `process_error_import`.
152
-
If `%v` or `%p` Format Strings exist parsing into `server_name` and `server_port` COLUMNS is performed in parse processes. Data Normalization is performed in import processes.
159
+
2) Manually ***UPDATE***`server_name` and `server_port` COLUMNS of `load_error_default` and `load_access_combined` TABLES after STORED PROCEDURES `process_access_parse`
160
+
and `process_error_parse` and before `process_access_import` and `process_error_import`.
161
+
If `%v` or `%p` Format Strings exist parsing into `server_name` and `server_port` COLUMNS is performed in parse processes.
162
+
Data Normalization is performed in import processes.
153
163
154
164
3) Populate `server_name` and `server_port` COLUMNS in `import_file` TABLE before import processes. This will populate all records associated with file.
155
-
This option only updates records with NULL values in ***load_tables***`server_name` and `server_port` COLUMNS while executing STORED PROCEDURES `process_access_import` and `process_error_import`.
165
+
This option only updates records with NULL values in ***load_tables***`server_name` and `server_port` COLUMNS while executing
166
+
STORED PROCEDURES `process_access_import` and `process_error_import`.
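
Option 1 above can be sketched roughly as follows. This is an assumption about how the `os.getenv` lines might look, not a copy of `logs2mysql.py`; the helper `server_defaults` is hypothetical. The four variable names come from the text above, and the empty-string default mirrors the default behavior it describes.

```python
import os


def server_defaults(prefix):
    """Return (server_name, server_port) for a log type.

    Hypothetical helper: reads <prefix>_SERVERNAME and <prefix>_SERVERPORT
    and falls back to empty strings when a variable is not set.
    """
    name = os.getenv(f"{prefix}_SERVERNAME", "")
    port = os.getenv(f"{prefix}_SERVERPORT", "")
    return name, port


# Values that would be passed along to the LOAD DATA step
errorlog_servername, errorlog_serverport = server_defaults("ERRORLOG")
combined_servername, combined_serverport = server_defaults("COMBINED")
```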
156
167
157
168
UPDATE commands to populate both Access and Error Logs if ***"Log File Names"*** are related to VirtualHost similar to:
158
169
```
@@ -166,7 +177,8 @@ UPDATE apache_logs.import_file SET server_name='farmwork.app', server_port=443 W
166
177
UPDATE apache_logs.import_file SET server_name='ip255-255-255-255.us-east.com', server_port=443 WHERE server_name IS NULL AND name LIKE '%error%';
167
178
```
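
If these UPDATE commands need to run from Python rather than from a MySQL client, a parameterized version might look like the sketch below. It assumes a DB-API cursor that uses the `%s` placeholder style (as MySQL drivers such as PyMySQL and mysql-connector-python do); the function name `set_server_for_files` is hypothetical and is not part of the repository.

```python
# Same statement as the UPDATE commands above, with placeholders for the values
UPDATE_SQL = (
    "UPDATE apache_logs.import_file "
    "SET server_name=%s, server_port=%s "
    "WHERE server_name IS NULL AND name LIKE %s"
)


def set_server_for_files(cursor, server_name, server_port, name_pattern):
    """Populate server_name and server_port for files whose name matches.

    Hypothetical helper: `cursor` is any DB-API cursor using %s placeholders.
    Returns the number of rows the driver reports as updated.
    """
    cursor.execute(UPDATE_SQL, (server_name, server_port, name_pattern))
    return cursor.rowcount
```

For example, `set_server_for_files(cursor, 'farmwork.app', 443, '%error%')` followed by a commit on the connection. Passing values as parameters leaves quoting to the driver.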
168
179
## Required Python Modules
169
-
Python module links & install command lines for each platform. Single quotes around module name are required on macOS. The simplest installation option is run the command line under '2. Python Steps' below. If that works you are all set.
180
+
Python module links & install command lines for each platform. Single quotes around module name are required on macOS. The simplest installation option is to run the
181
+
command line under '2. Python Steps' below. If that works you are all set.
170
182
|Python Package|Windows 10 & 11|Ubuntu 24.04|macOS 15.0.1 Darwin 24.0.0|GitHub Repository|
@@ -178,7 +190,8 @@ Python module links & install command lines for each platform. Single quotes aro
178
190
Steps make installation quick and straightforward. Application will be ready to import Apache logs on completion.
179
191
180
192
### 1. MySQL Steps
181
-
Before running `apache_logs_schema.sql` if User Account `root`@`localhost` does not exist on installation server open file and perform a ***Find and Replace*** using a User Account with DBA Role on installation server. Copy below:
193
+
Before running `apache_logs_schema.sql` if User Account `root`@`localhost` does not exist on installation server open
194
+
file and perform a ***Find and Replace*** using a User Account with DBA Role on installation server. Copy below:
If issues with ***pip install*** occur use individual install commands included above.
209
222
210
223
### 3. Create MySQL USER and GRANTS
211
-
To minimize data exposure and breach risks create a MySQL USER for Python module with GRANTS to only schema objects and privileges required to execute import processes. (`mysql_user_and_grants.sql` in repository)
224
+
To minimize data exposure and breach risks create a MySQL USER for Python module with GRANTS to only schema objects and privileges
225
+
required to execute import processes. (`mysql_user_and_grants.sql` in repository)
212
226

213
227
### 4. Settings.env Variables
214
228
settings.env with default settings for Windows. Make sure correct logFormats are in correct logFormat folders. Application does not
@@ -220,9 +234,11 @@ By default, load_dotenv() looks for standard setting file name `.env`. The file
220
234
load_dotenv() # Loads variables from .env into the environment
221
235
```
222
236
### 6. Run Application
223
-
If MySQL steps are complete, Python modules are installed, MySQL server connection and log folder variables are updated, and file `settings.env` is renamed to `.env` application is ready to go.
237
+
If MySQL steps are complete, Python modules are installed, MySQL server connection and log folder variables are updated,
238
+
and file `settings.env` is renamed to `.env` application is ready to go.
224
239
225
-
If log files exist in folders run `logs2mysql.py` and all files in all folders will be processed. Run `watch4logs.py` and drop a file or files into folder and `logs2mysql.py` will be executed.
240
+
If log files exist in folders run `logs2mysql.py` and all files in all folders will be processed. Run `watch4logs.py` and
241
+
drop a file or files into folder and `logs2mysql.py` will be executed.
226
242
Whether folders are empty or already contain files, when a file is dropped into a folder any unprocessed files in the folders will be processed.
227
243
228
244
Run import process directly:
@@ -249,7 +265,8 @@ Set environment variables `ERROR_PROCESS`,`COMBINED_PROCESS`, `VHOST_PROCESS`, `
249
265
ONLY files & records processed by current `processLogs function` execution.
250
266
251
267
MySQL Stored Procedures can be run from Command Line Client or GUI Database Tool separately.
252
-
Execute Stored Procedures with second parameter 'ALL' processes files & records based on `process_status` value. Files & records can contain multiple `importloadid` values.
268
+
Executing Stored Procedures with second parameter 'ALL' processes files & records based on `process_status` value. Files & records
269
+
can contain multiple `importloadid` values.
253
270
```
254
271
COLUMN process_status in LOAD DATA tables - load_access_combined, load_access_csv2mysql, load_access_vhost, load_error_default
255
272
process_status=0 - LOAD DATA tables loaded with raw log data
@@ -275,7 +292,9 @@ Normalization ensures that data is organized in a way that makes sense for the d
275
292
MySQL `apache_logs` schema currently has 49 Tables, 853 Columns, 168 Indexes, 66 Views, 7 Stored Procedures and 43 Functions to process Apache Access log in 4 formats
276
293
& Apache Error log in 2 formats. Database normalization at work!
277
294
## MySQL Access Log View by Browser - 1 of 66 schema views
278
-
Current schema views are Access and Error Attribute Primary tables created in normalization process with simple aggregate values. These are primitive data presentations of the log data warehouse. ApacheLogs2MySQL is the 'EL' of the 'ELK' Stack. The Web interface with Drill Down Capability and [apache/echarts](https://github.com/apache/echarts) Log Visualization integration in development is the 'K' of the 'ELK' Stack.
295
+
Current schema views are Access and Error Attribute Primary tables created in normalization process with simple aggregate values.
296
+
These are primitive data presentations of the log data warehouse. ApacheLogs2MySQL is the 'EL' of the 'ELK' Stack. The Web interface with
297
+
Drill Down Capability and [apache/echarts](https://github.com/apache/echarts) Log Visualization integration in development is the 'K' of the 'ELK' Stack.
279
298
280
299
MySQL View - apache_logs.access_ua_browser_family_list - data from LogFormat: combined & csv2mysql
## MySQL Schema Objects - Tables, Stored Procedures, Functions and Views
299
-
Images of the `apache_logs` schema objects. Access and Error log attributes are normalized into separate entity tables. Each table is populated with unique values of the attribute.
318
+
Images of the `apache_logs` schema objects. Access and Error log attributes are normalized into separate entity tables.
319
+
Each table is populated with unique values of the attribute.
300
320
301
321
Database normalization is a critical process in database design with objectives of optimizing data storage, improving data integrity, and reducing data anomalies.
302
322
Organizing data into normalized tables greatly enhances efficiency and maintainability of a database system.