Commit e814ff6

Merge pull request #17 from aai-institute/feature/god-function-anti-pattern
Add a monolithic/god function anti-pattern
2 parents (2bbd309 + f92dbfd), commit e814ff6

File tree

3 files changed (+171 lines, -1 line)


CHANGELOG.md

Lines changed: 2 additions & 0 deletions

@@ -1,6 +1,8 @@
  # Unreleased

  * add devcontainer setup
+ * add section *Single Level of Abstraction principle* to principles guide
+ * add monolithic/god function anti-pattern

  # v0.1.0 (2024-03-19)

anti-patterns/monolithic-function/README.md

Lines changed: 164 additions & 0 deletions

@@ -0,0 +1,164 @@
# One monolithic function

Meet the One Monolithic Function, also known as the Swiss Army Knife Function or the God Function: the jack of all trades and master of, well, none.
These functions usually come with deceptively simple and generic names like *run*, *train*, or *main*. They may be the result of converting a Jupyter notebook, similar to what we've done in our [refactoring journey](../../refactoring-journey/step01-python-script/run_classifier_evaluation.py).

This function is like that one person who insists on doing everything themselves, from cooking dinner to fixing the plumbing, except in the coding world. But just like our multitasking friend who forgets to turn off the stove while unclogging the sink, this approach can quickly become a recipe for disaster.

By trying to do everything in one place, a monolithic function ends up being an unmaintainable tangle of responsibilities. It mixes high-level decisions like "What file format am I dealing with?" with low-level tasks like "Let's calculate the mean to fill in missing values," all in the same breath. It's a classic case of not knowing when to delegate, resulting in code that's harder to read, harder to debug, and much harder to extend.

Take a look at the following code and try to understand what it does without reading it line by line:
```python
import pandas as pd
import numpy as np
import json
import logging
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score


def main(file_path: str) -> float:

    if file_path.endswith('.csv'):
        data = pd.read_csv(file_path)
    elif file_path.endswith('.json'):
        with open(file_path, 'r') as file:
            data_dict = json.load(file)
        data = pd.DataFrame(data_dict)
    else:
        raise ValueError("Unsupported file format!")

    logging.info(f"Data loaded from {file_path} with {len(data)} rows and {len(data.columns)} columns.")

    if 'target' not in data.columns:
        raise ValueError("Target column is missing in the dataset!")

    for column in data.columns:
        if data[column].isnull().sum() > 0:
            mean_value = data[column].mean()
            data[column].fillna(mean_value, inplace=True)

    for column in data.select_dtypes(include=[np.number]).columns:
        max_value = data[column].max()
        min_value = data[column].min()
        data[column] = (data[column] - min_value) / (max_value - min_value)

    data['feature_interaction'] = data['feature1'] * data['feature2'] * np.log1p(data['feature3'])

    X = data.drop('target', axis=1)
    y = data['target']

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    model = DecisionTreeClassifier()
    model.fit(X_train, y_train)

    y_pred = model.predict(X_test)

    accuracy = accuracy_score(y_test, y_pred)
    return accuracy
```
Sure, if you painstakingly read it line by line, you'll eventually arrive at this thrilling revelation about what the function does:

1. Loads the data by handling different file formats.
2. Cleans the data by filling in missing values.
3. Normalizes the data through scaling.
4. Performs feature engineering by creating interaction terms.
5. Trains a machine learning model.
6. Evaluates model performance.
While this function does manage to accomplish several tasks, it suffers from poor readability. The mixture of different responsibilities within a single function makes it difficult to follow what's happening at a glance.

Additionally, the function is hard to test, modify, and extend. Because it handles everything from data loading to model evaluation, testing individual parts of the process in isolation is nearly impossible.
Any change to one part of the process can potentially impact the others, making the function fragile and prone to errors when updates are needed.

A first entry point for refactoring this function is to apply the [Single Level of Abstraction Principle (SLAP)](../../oop-essentials/03-general-principles/README.md/#slap-single-level-of-abstraction-principle). By ensuring that each function operates at a single level of abstraction, you can begin to separate the high-level orchestration from the low-level details. The result could look like this:
```python
import logging
import json
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score


def main(path: str) -> float:
    data = load_data(path)
    data = fill_missing_values(data)
    data = normalize_features(data)
    data = engineer_features(data)
    X_train, X_test, y_train, y_test = split_data(data)
    model = train_model(X_train, y_train)
    accuracy = evaluate_model(model, X_test, y_test)
    return accuracy


def load_data(file_path: str) -> pd.DataFrame:
    if file_path.endswith('.csv'):
        data = pd.read_csv(file_path)
    elif file_path.endswith('.json'):
        with open(file_path, 'r') as file:
            data_dict = json.load(file)
        data = pd.DataFrame(data_dict)
    else:
        raise ValueError("Unsupported file format!")
    logging.info(f"Data loaded from {file_path} with {len(data)} rows and {len(data.columns)} columns.")
    return data


def fill_missing_values(data: pd.DataFrame) -> pd.DataFrame:
    for column in data.columns:
        if data[column].isnull().sum() > 0:
            mean_value = data[column].mean()
            data[column].fillna(mean_value, inplace=True)
    return data


def normalize_features(data: pd.DataFrame) -> pd.DataFrame:
    for column in data.select_dtypes(include=[np.number]).columns:
        max_value = data[column].max()
        min_value = data[column].min()
        data[column] = (data[column] - min_value) / (max_value - min_value)
    return data


def engineer_features(data: pd.DataFrame) -> pd.DataFrame:
    data['feature_interaction'] = data['feature1'] * data['feature2'] * np.log1p(data['feature3'])
    return data


def split_data(data: pd.DataFrame) -> tuple[pd.DataFrame, ...]:
    y = data['target']
    X = data.drop('target', axis=1)
    return train_test_split(X, y, test_size=0.2, random_state=42)


def train_model(X_train: pd.DataFrame, y_train: pd.Series) -> DecisionTreeClassifier:
    model = DecisionTreeClassifier()
    model.fit(X_train, y_train)
    return model


def evaluate_model(model: DecisionTreeClassifier, X_test: pd.DataFrame, y_test: pd.Series) -> float:
    y_pred = model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    return accuracy
```
By simply extracting low-level functions, each with a single, isolated task, and calling them from the high-level function *main*, we have already gained:

1. Improved Readability: The main function now reads like a summary of the overall process, with each low-level function clearly named to describe its specific task. This makes it easier for developers to understand the code at a glance.

2. Enhanced Testability: Isolated functions are easier to unit test.
You can test each low-level function individually to ensure it performs its task correctly, leading to more reliable and robust code.

3. Increased Reusability: Low-level functions that perform specific tasks can often be reused in different parts of the codebase or in future projects, reducing the need to write redundant code.
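To make the testability point concrete, here is a minimal sketch of a unit check for the extracted `fill_missing_values` function, assuming only that pandas is installed; no files, models, or other pipeline stages are involved:

```python
import pandas as pd


# Same logic as the extracted function above; plain assignment replaces the
# inplace fillna call to stay compatible with recent pandas versions.
def fill_missing_values(data: pd.DataFrame) -> pd.DataFrame:
    for column in data.columns:
        if data[column].isnull().sum() > 0:
            mean_value = data[column].mean()
            data[column] = data[column].fillna(mean_value)
    return data


# A three-row DataFrame is the entire test fixture.
df = pd.DataFrame({"a": [1.0, None, 3.0], "b": [4.0, 5.0, 6.0]})
result = fill_missing_values(df)
print(result["a"].tolist())  # the gap is filled with the mean of 1.0 and 3.0, i.e. 2.0
```

The monolithic version offers no comparable seam: exercising the same three lines of imputation logic would require a file on disk and a full training run.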
Nevertheless, this should only be considered a first step toward improving the code. While extracting low-level functions provides more clarity about what's happening, the code still lacks a coherent software design and remains fragile and inflexible. To achieve a truly robust and maintainable solution, further refactoring is necessary.

We highly encourage you to follow our [refactoring journey](../../refactoring-journey/README.md) to explore a more structured and well-designed approach. This will not only enhance the code's flexibility but also ensure it's better suited to handle future changes and extensions.
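To make the reusability gain concrete as well: an extracted helper such as `normalize_features` can be lifted into a shared module and applied to a completely unrelated dataset. A minimal sketch, assuming pandas and numpy are installed:

```python
import numpy as np
import pandas as pd


# Copied from the refactored code above: min-max scaling of numeric columns.
def normalize_features(data: pd.DataFrame) -> pd.DataFrame:
    for column in data.select_dtypes(include=[np.number]).columns:
        max_value = data[column].max()
        min_value = data[column].min()
        data[column] = (data[column] - min_value) / (max_value - min_value)
    return data


# Reused on a dataset that has nothing to do with the classifier example.
df = pd.DataFrame({"height_cm": [150.0, 175.0, 200.0]})
normalized = normalize_features(df)
print(normalized["height_cm"].tolist())  # [0.0, 0.5, 1.0]
```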

oop-essentials/03-general-principles/README.md

Lines changed: 5 additions & 1 deletion
@@ -86,10 +86,14 @@ If possible, the design should prefer idiomatic language constructs over exotic,
# SLAP (Single Level of Abstraction Principle)

- The **Single Level of Abstraction Principle** (SLAP) states that a function should operate at a single level of abstraction. Specifically, a function should either perform a high-level operation by calling other functions or handle low-level operations directly (e.g., loops, conditionals, or simple calculations), but not mix these levels within the same function.
+ The **Single Level of Abstraction Principle** (SLAP) states that a function should operate at a single level of abstraction. Specifically, a function should either perform a high-level operation by orchestrating and calling other functions, or handle low-level operations directly, but not mix these levels within the same function. Low-level functions should have a single, clear purpose, such as performing a specific calculation, iterating over a collection, or executing a conditional check.
Adhering to SLAP ensures that functions are cohesive and focused, making them easier to understand and maintain. When a function strictly adheres to SLAP, it either orchestrates higher-level processes by delegating tasks to other functions or directly handles low-level details. This separation allows developers to grasp the function's purpose quickly, without being distracted by unrelated details, which greatly improves readability.
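The principle can be seen in miniature in the following hypothetical sketch (all names are illustrative): `report_mixed` interleaves low-level accumulation with high-level report formatting, while the SLAP-conforming pair separates the two levels:

```python
# Mixed levels: manual accumulation and report formatting in one function.
def report_mixed(values: list[float]) -> str:
    total = 0.0
    for v in values:  # low-level detail
        total += v
    mean = total / len(values)
    return f"mean={mean:.2f}"  # high-level concern


# SLAP-conforming: the high-level function only delegates.
def compute_mean(values: list[float]) -> float:
    return sum(values) / len(values)


def report(values: list[float]) -> str:
    return f"mean={compute_mean(values):.2f}"


print(report([1.0, 2.0, 3.0]))  # mean=2.00
```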

+ Applying SLAP can often be the first step in refactoring a [complicated code snippet](../../anti-patterns/monolithic-function/README.md).
# Extreme Programming (XP)
YAGNI is one of the core principles of extreme programming (aka XP), which addresses not only principles pertaining to code but also collaboration.

0 commit comments