Commit 8d69d01

committed
Update README.md
1 parent 18f00f0 commit 8d69d01

File tree

1 file changed

+42
-22
lines changed

.github/README.md

Lines changed: 42 additions & 22 deletions
Original file line number | Diff line number | Diff line change
@@ -1,27 +1,33 @@
11
# Apache Log Parser and Data Normalization Application
22
### Python handles File Processing & MySQL handles Data Processing
3-
ApacheLogs2MySQL consists of two Python Modules & one MySQL Schema ***apache_logs*** to automate importing Access & Error files and normalizing data into database designed for reports & data analysis.
3+
ApacheLogs2MySQL consists of two Python Modules & one MySQL Schema ***apache_logs*** to automate importing Access & Error files
4+
and normalizing data into database designed for reports & data analysis.
45

56
Runs on Windows, Linux and MacOS & tested with MySQL versions 8.0.39, 8.4.3, 9.0.0 & 9.1.0.
67

7-
Imports Access Logs in LogFormats - ***common***, ***combined*** and ***vhost_combined*** & additional ***csv2mysql*** LogFormat defined :point_down:
8+
Imports Access Logs in LogFormats - ***common***, ***combined*** and ***vhost_combined*** & additional ***csv2mysql***
9+
LogFormat defined :point_down:
810

9-
Imports Error Logs in ***default*** ErrorLogFormat & ***additional*** ErrorLogFormat defined below performing data harmonization on Apache Codes & Messages,
10-
System Codes & Messages, and Log Messages to create a unified, standardized dataset. Error Log view images :point_down:
11+
Imports Error Logs in ***default*** ErrorLogFormat & ***additional*** ErrorLogFormat defined below performing data harmonization
12+
on Apache Codes & Messages, System Codes & Messages, and Log Messages to create a unified, standardized dataset.
13+
Error Log view images :point_down:
1114

12-
Three options to associate ServerName & ServerPort with Access and Error logs missing `%v - canonical ServerName` and `%p - canonical ServerPort` Format Strings described :point_down:
15+
Three options to associate ServerName & ServerPort with Access and Error logs missing `%v - canonical ServerName`
16+
and `%p - canonical ServerPort` Format Strings described :point_down:
1317

1418
4 LogFormats & 2 ErrorLogFormats can be loaded and 5 MySQL Stored Procedures can be processed in a single Python `ProcessLogs function` execution.
1519

1620
Database Schema ***apache_logs*** designed to accommodate unlimited servers & domains. Step-by-step guide for easy installation :point_down:
1721

18-
The accompanying visualization tool for the MySQL Schema ***apache_logs*** is [MySQL2ApacheECharts](https://github.com/willthefarmer/mysql-to-apache-echarts) created in a separate repository.
22+
The accompanying visualization tool for the MySQL Schema ***apache_logs*** is [MySQL2ApacheECharts](https://github.com/willthefarmer/mysql-to-apache-echarts)
23+
created in a separate repository.
1924
The Web interface consists of the Express.js web application framework with Drill Down Capability & the Apache ECharts framework for Data Visualization.
2025
## Entity Relationship Diagram of apache_logs schema tables
2126
![Entity Relationship Diagram](./assets/entity_relationship_diagram.png)
2227
Diagram created with open-source database diagrams editor [chartdb/chartdb](https://github.com/chartdb/chartdb)
2328
## Application Description
24-
This is a fast, reliable processing application with detailed logging and two stages of data parsing. First stage is performed in `LOAD DATA LOCAL INFILE` statements.
29+
This is a fast, reliable processing application with detailed logging and two stages of data parsing.
30+
First stage is performed in `LOAD DATA LOCAL INFILE` statements.
2531
Second stage is performed in `process_access_parse` and `process_error_parse` Stored Procedures.
2632

2733
Python handles polling of log file folders and executing MySQL Database LOAD DATA, Stored Procedures, Stored Functions and SQL Statements.
@@ -41,7 +47,8 @@ All folder paths, filename patterns, logging, processing, MySQL connection setti
4147

4248
Two Python Client modules can run in PM2 daemon process manager for 24/7 online processing on multiple web servers feeding a single Server module simultaneously.
4349

44-
Application is developed with Python 3.12, MySQL and 4 Python modules. Modules are listed with Python Package Index link, install command for each platform & GitHub Repository link.
50+
Application is developed with Python 3.12, MySQL and 4 Python modules. Modules are listed with Python Package Index link,
51+
install command for each platform & GitHub Repository link.
4552
## Four Supported Access Log Formats
4653
Apache uses same Standard Access LogFormats (***common***, ***combined***, ***vhost_combined***) on all 3 platforms. Each LogFormat adds 2 Format Strings to the prior.
4754
Format String descriptions are listed below each LogFormat. Information from: https://httpd.apache.org/docs/2.4/mod/mod_log_config.html#logformat
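
As an illustration of how the three standard formats nest, here is a minimal Python sketch that parses ***common***, ***combined*** and ***vhost_combined*** lines with one regular expression. It is not the application's own parser (the README states that parsing is done in MySQL Stored Procedures); the function name `parse_access_line` and the dictionary keys are hypothetical, and the pattern does not handle escaped quotes inside the request or header fields.

```python
import re

# Pattern for the "common" fields, with the two "combined" header fields optional:
#   %h %l %u %t "%r" %>s <bytes> ["%{Referer}i" "%{User-Agent}i"]
ACCESS_RE = re.compile(
    r'(?P<host>\S+) (?P<logname>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
    r'(?: "(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)")?$'
)


def parse_access_line(line, vhost=False):
    """Return a dict of fields, or None when the line does not match.

    With vhost=True the line is expected to start with "%v:%p ",
    the two Format Strings that vhost_combined adds to combined.
    """
    line = line.rstrip("\n")
    record = {}
    if vhost:
        prefix, _, line = line.partition(" ")
        name, _, port = prefix.rpartition(":")
        if not name or not port.isdigit():
            return None
        record["server_name"] = name
        record["server_port"] = int(port)
    match = ACCESS_RE.match(line)
    if match is None:
        return None
    record.update(match.groupdict())
    return record
```

For a ***common*** line the `referer` and `user_agent` keys are present but set to `None`, which mirrors the idea that each format extends the previous one by two fields.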
@@ -142,17 +149,21 @@ In order to consolidate logs from multiple domains `%v - canonical ServerName` i
142149

143150
Listed are different methods to associate ServerName and ServerPort to all Access and Error logs.
144151

145-
1) Set `ERRORLOG_SERVERNAME`, `ERRORLOG_SERVERPORT`, `COMBINED_SERVERNAME`, `COMBINED_SERVERPORT` variables in .env file and uncomment `os.getenv` lines at top of `logs2mysql.py`.
146-
By default, variables are defined and set to an empty string.
147-
Below is screenshot of `logs2mysql.py` with commented `os.getenv` code. `server_name` and `server_port` COLUMNS of `load_error_default` and `load_access_combined` TABLES will be SET during Python `LOAD DATA LOCAL INFILE` execution.
152+
1) Set `ERRORLOG_SERVERNAME`, `ERRORLOG_SERVERPORT`, `COMBINED_SERVERNAME`, `COMBINED_SERVERPORT` variables in .env file and uncomment `os.getenv`
153+
lines at top of `logs2mysql.py`. By default, variables are defined and set to an empty string.
154+
Below is screenshot of `logs2mysql.py` with commented `os.getenv` code. `server_name` and `server_port` COLUMNS of `load_error_default` and `load_access_combined`
155+
TABLES will be SET during Python `LOAD DATA LOCAL INFILE` execution.
148156

149157
![load_settings_variables.png](./assets/load_settings_variables.png)
150158

151-
2) Manually ***UPDATE*** `server_name` and `server_port` COLUMNS of `load_error_default` and `load_access_combined` TABLES after STORED PROCEDURES `process_access_parse` and `process_error_parse` and before `process_access_import` and `process_error_import`.
152-
If `%v` or `%p` Format Strings exist parsing into `server_name` and `server_port` COLUMNS is performed in parse processes. Data Normalization is performed in import processes.
159+
2) Manually ***UPDATE*** `server_name` and `server_port` COLUMNS of `load_error_default` and `load_access_combined` TABLES after STORED PROCEDURES `process_access_parse`
160+
and `process_error_parse` and before `process_access_import` and `process_error_import`.
161+
If `%v` or `%p` Format Strings exist parsing into `server_name` and `server_port` COLUMNS is performed in parse processes.
162+
Data Normalization is performed in import processes.
153163

154164
3) Populate `server_name` and `server_port` COLUMNS in `import_file` TABLE before import processes. This will populate all records associated with file.
155-
This option only updates records with NULL values in ***load_tables*** `server_name` and `server_port` COLUMNS while executing STORED PROCEDURES `process_access_import` and `process_error_import`.
165+
This option only updates records with NULL values in ***load_tables*** `server_name` and `server_port` COLUMNS while executing
166+
STORED PROCEDURES `process_access_import` and `process_error_import`.
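
Option 1 can be sketched roughly as follows. This is a hedged illustration, not the code in `logs2mysql.py`: the helper names, the pairing of the `COMBINED_*` variables with `load_access_combined` and the `ERRORLOG_*` variables with `load_error_default`, the `raw_line` column, and the choice to treat an empty string as NULL are all assumptions made for the example.

```python
import os

# Assumed pairing of load tables with the .env variables named in option 1.
SERVER_VARS = {
    "load_access_combined": ("COMBINED_SERVERNAME", "COMBINED_SERVERPORT"),
    "load_error_default": ("ERRORLOG_SERVERNAME", "ERRORLOG_SERVERPORT"),
}


def env_or_none(name):
    """Return the environment value, or None when it is unset or empty."""
    value = os.getenv(name, "").strip()
    return value or None


def build_load_statement(table, file_path):
    """Build a parameterized LOAD DATA statement that SETs server columns.

    Returns (sql, params). The column list "(raw_line)" is hypothetical;
    the real load tables may use different columns and FIELDS/LINES clauses.
    """
    if table not in SERVER_VARS:
        raise ValueError(f"unsupported load table: {table}")
    name_var, port_var = SERVER_VARS[table]
    server_name = env_or_none(name_var)
    port_text = env_or_none(port_var)
    server_port = int(port_text) if port_text else None
    sql = (
        f"LOAD DATA LOCAL INFILE %s INTO TABLE {table} "
        "(raw_line) "
        "SET server_name = %s, server_port = %s"
    )
    return sql, (file_path, server_name, server_port)
```

The returned pair could then be passed to a PyMySQL cursor as `cursor.execute(sql, params)` on a connection opened with `local_infile=True`. Leaving the values as NULL when the variables are empty keeps option 3 usable, since that option only fills records whose server columns are NULL.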
156167

157168
UPDATE commands to populate both Access and Error Logs if ***"Log File Names"*** are related to VirtualHost similar to:
158169
```
@@ -166,7 +177,8 @@ UPDATE apache_logs.import_file SET server_name='farmwork.app', server_port=443 W
166177
UPDATE apache_logs.import_file SET server_name='ip255-255-255-255.us-east.com', server_port=443 WHERE server_name IS NULL AND name LIKE '%error%';
167178
```
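
When several VirtualHosts are involved, UPDATE statements like the ones above can be generated from a small mapping instead of being typed by hand. The sketch below is an assumption-laden example: the function name and the sample patterns are hypothetical, and only the table and column names (`apache_logs.import_file`, `server_name`, `server_port`, `name`) come from the statements shown above.

```python
def build_import_file_update(server_name, server_port, name_pattern):
    """Return (sql, params) for one parameterized import_file UPDATE.

    Only rows whose server_name is still NULL are touched, matching
    the WHERE clause used in the README examples.
    """
    sql = (
        "UPDATE apache_logs.import_file "
        "SET server_name = %s, server_port = %s "
        "WHERE server_name IS NULL AND name LIKE %s"
    )
    return sql, (server_name, int(server_port), name_pattern)


def build_updates(vhosts):
    """Build one UPDATE per (server_name, server_port, name_pattern) tuple."""
    return [build_import_file_update(*vhost) for vhost in vhosts]
```

Each `(sql, params)` pair can be executed with a DB-API cursor, which keeps the server name and LIKE pattern out of the SQL text itself.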
168179
## Required Python Modules
169-
Python module links & install command lines for each platform. Single quotes around module name are required on macOS. The simplest installation option is to run the command line under '2. Python Steps' below. If that works you are all set.
180+
Python module links & install command lines for each platform. Single quotes around module name are required on macOS. The simplest installation option is to run the
181+
command line under '2. Python Steps' below. If that works you are all set.
170182
|Python Package|Windows 10 & 11|Ubuntu 24.04|macOS 15.0.1 Darwin 24.0.0|GitHub Repository|
171183
|--------------|---------------|------------|--------------------------|-----------------|
172184
|[PyMySQL](https://pypi.org/project/PyMySQL/)|python -m pip install PyMySQL[rsa]|sudo apt-get install python3-pymysql|python3 -m pip install 'PyMySQL[rsa]'|[PyMySQL/PyMySQL](https://github.com/PyMySQL/PyMySQL)|
@@ -178,7 +190,8 @@ Python module links & install command lines for each platform. Single quotes aro
178190
Steps make installation quick and straightforward. Application will be ready to import Apache logs on completion.
179191

180192
### 1. MySQL Steps
181-
Before running `apache_logs_schema.sql` if User Account `root`@`localhost` does not exist on installation server open file and perform a ***Find and Replace*** using a User Account with DBA Role on installation server. Copy below:
193+
Before running `apache_logs_schema.sql` if User Account `root`@`localhost` does not exist on installation server open
194+
file and perform a ***Find and Replace*** using a User Account with DBA Role on installation server. Copy below:
182195
```
183196
root`@`localhost`
184197
```
@@ -208,7 +221,8 @@ python3 -m ensurepip --upgrade
208221
If issues with ***pip install*** occur use individual install commands included above.
209222

210223
### 3. Create MySQL USER and GRANTS
211-
To minimize data exposure and breach risks create a MySQL USER for Python module with GRANTS to only schema objects and privileges required to execute import processes. (`mysql_user_and_grants.sql` in repository)
224+
To minimize data exposure and breach risks create a MySQL USER for Python module with GRANTS to only schema objects and privileges
225+
required to execute import processes. (`mysql_user_and_grants.sql` in repository)
212226
![mysql_user_and_grants.sql in repository](./assets/mysql_user_and_grants.png)
213227
### 4. Settings.env Variables
214228
settings.env with default settings for Windows. Make sure correct logFormats are in correct logFormat folders. Application does not
@@ -220,9 +234,11 @@ By default, load_dotenv() looks for standard setting file name `.env`. The file
220234
load_dotenv() # Loads variables from .env into the environment
221235
```
222236
### 6. Run Application
223-
If MySQL steps are complete, Python modules are installed, MySQL server connection and log folder variables are updated, and file `settings.env` is renamed to `.env` application is ready to go.
237+
If MySQL steps are complete, Python modules are installed, MySQL server connection and log folder variables are updated,
238+
and file `settings.env` is renamed to `.env` application is ready to go.
224239

225-
If log files exist in folders run `logs2mysql.py` and all files in all folders will be processed. Run `watch4logs.py` and drop a file or files into folder and `logs2mysql.py` will be executed.
240+
If log files exist in folders run `logs2mysql.py` and all files in all folders will be processed. Run `watch4logs.py` and
241+
drop a file or files into folder and `logs2mysql.py` will be executed.
226242
If folders are empty or contain files when a file is dropped into a folder, any unprocessed files in folders will be processed.
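
The behavior described here (every unprocessed file in every folder is picked up, whichever file triggered the run) can be illustrated with a small standard-library polling sketch. This is not how `watch4logs.py` is implemented; the function names are hypothetical, and tracking processed files in an in-memory set is an assumption made to keep the example self-contained.

```python
import os


def find_unprocessed(folder, processed):
    """Return sorted paths of regular files in folder not yet processed."""
    pending = []
    for entry in sorted(os.listdir(folder)):
        path = os.path.join(folder, entry)
        if os.path.isfile(path) and path not in processed:
            pending.append(path)
    return pending


def poll_once(folders, processed, handler):
    """Call handler on each unprocessed file in every folder.

    Adds handled paths to the processed set and returns how many
    files were handled in this pass.
    """
    count = 0
    for folder in folders:
        for path in find_unprocessed(folder, processed):
            handler(path)
            processed.add(path)
            count += 1
    return count
```

Calling `poll_once` in a loop with a short sleep gives a simple watcher: a second pass over unchanged folders handles nothing, and a newly dropped file is handled on the next pass.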
227243

228244
Run import process directly:
@@ -249,7 +265,8 @@ Set environment variables `ERROR_PROCESS`,`COMBINED_PROCESS`, `VHOST_PROCESS`, `
249265
ONLY files & records processed by current `processLogs function` execution.
250266

251267
MySQL Stored Procedures can be run from Command Line Client or GUI Database Tool separately.
252-
Executing Stored Procedures with second parameter 'ALL' processes files & records based on `process_status` value. Files & records can contain multiple `importloadid` values.
268+
Executing Stored Procedures with second parameter 'ALL' processes files & records based on `process_status` value. Files & records
269+
can contain multiple `importloadid` values.
253270
```
254271
COLUMN process_status in LOAD DATA tables - load_access_combined, load_access_csv2mysql, load_access_vhost, load_error_default
255272
process_status=0 - LOAD DATA tables loaded with raw log data
@@ -275,7 +292,9 @@ Normalization ensures that data is organized in a way that makes sense for the d
275292
MySQL `apache_logs` schema currently has 49 Tables, 853 Columns, 168 Indexes, 66 Views, 7 Stored Procedures and 43 Functions to process Apache Access log in 4 formats
276293
& Apache Error log in 2 formats. Database normalization at work!
277294
## MySQL Access Log View by Browser - 1 of 66 schema views
278-
Current schema views are Access and Error Attribute Primary tables created in normalization process with simple aggregate values. These are primitive data presentations of the log data warehouse. ApacheLogs2MySQL is the 'EL' of the 'ELK' Stack. The Web interface with Drill Down Capability and [apache/echarts](https://github.com/apache/echarts) Log Visualization integration in development is the 'K' of the 'ELK' Stack.
295+
Current schema views are Access and Error Attribute Primary tables created in normalization process with simple aggregate values.
296+
These are primitive data presentations of the log data warehouse. ApacheLogs2MySQL is the 'EL' of the 'ELK' Stack. The Web interface with
297+
Drill Down Capability and [apache/echarts](https://github.com/apache/echarts) Log Visualization integration in development is the 'K' of the 'ELK' Stack.
279298

280299
MySQL View - apache_logs.access_ua_browser_family_list - data from LogFormat: combined & csv2mysql
281300
![view-access_ua_browser_family_list.png](./assets/access_ua_browser_list.png)
@@ -296,7 +315,8 @@ Each attribute has an associated table in ***apache_logs*** schema. Using these
296315
![error_log_level_list](./assets/error_log_level_list.png)
297316

298317
## MySQL Schema Objects - Tables, Stored Procedures, Functions and Views
299-
Images of the `apache_logs` schema objects. Access and Error log attributes are normalized into separate entity tables. Each table is populated with unique values of the attribute.
318+
Images of the `apache_logs` schema objects. Access and Error log attributes are normalized into separate entity tables.
319+
Each table is populated with unique values of the attribute.
300320

301321
Database normalization is a critical process in database design with objectives of optimizing data storage, improving data integrity, and reducing data anomalies.
302322
Organizing data into normalized tables greatly enhances efficiency and maintainability of a database system.
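
The idea of one entity table per attribute, populated with unique values, can be shown with a tiny self-contained sketch. It uses Python's built-in `sqlite3` instead of MySQL so it runs anywhere, and the table names `access_log_browser` and `access_log` are hypothetical, not objects of the ***apache_logs*** schema.

```python
import sqlite3


def normalize_attribute(conn, values):
    """Store each distinct value once and reference it by id from log rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS access_log_browser ("
        "id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS access_log ("
        "id INTEGER PRIMARY KEY, "
        "browser_id INTEGER NOT NULL REFERENCES access_log_browser(id))"
    )
    for value in values:
        # The UNIQUE constraint makes repeated values a no-op here.
        conn.execute(
            "INSERT OR IGNORE INTO access_log_browser (name) VALUES (?)",
            (value,),
        )
        (browser_id,) = conn.execute(
            "SELECT id FROM access_log_browser WHERE name = ?", (value,)
        ).fetchone()
        conn.execute(
            "INSERT INTO access_log (browser_id) VALUES (?)", (browser_id,)
        )
    conn.commit()
```

Four log rows with two distinct browser names end up as two rows in the attribute table and four rows in the log table, and an aggregate view is then a simple JOIN with GROUP BY on the attribute table.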
