# Core Machine Learning Concepts in the Titanic Practical

This document explains the primary machine learning concepts demonstrated in the Titanic practical notebook. The practical covers the end-to-end workflow from data preparation to model evaluation.

## 1. Data Cleaning and Imputation

- **Handling Missing Values:**
  - Filling missing values in the **Embarked** column with its mode.
  - Replacing missing values in the **Age** column with its median.
- **Dropping Columns:**
  - Removing columns (e.g., **Cabin**) with excessive missing data.

## 2. Feature Engineering

- **Creating New Features:**
  - **FamilySize:** Calculated by summing `SibSp` and `Parch` and adding 1 (to count the passenger themselves).
  - **IsAlone:** A binary feature indicating whether a passenger is travelling alone (i.e., when **FamilySize** equals 1).

## 3. Encoding Categorical Variables

- **Label Encoding:**
  - Converting categorical text data (such as **Sex**) into numerical values.
- **One-Hot Encoding:**
  - Transforming the **Embarked** column into dummy variables for numerical representation.

## 4. Feature Scaling

- **Standardization:**
  - Using `StandardScaler` to scale numerical features (e.g., **Age** and **Fare**) so that all features contribute on a comparable scale during training.

## 5. Train/Test Split

- **Data Partitioning:**
  - Splitting the dataset into training and testing sets with a fixed random state. Evaluating on the held-out test set measures performance on unseen data and helps detect overfitting.

## 6. Model Training and Evaluation

- **Logistic Regression:**
  - Training a logistic regression model on the processed data for binary classification (predicting survival).
- **Evaluation Techniques:**
  - Assessing model performance using metrics such as accuracy, precision, recall, and F1 score.
  - Visual tools such as the confusion matrix and ROC curve provide further insight into model behaviour.

This practical notebook serves as a comprehensive example of applying these core machine learning concepts to a real-world dataset to build, train, and evaluate a predictive model.
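## Illustrative Code Sketches

The sketches below show how each of the steps above might look in code. They are minimal illustrations, not the notebook's exact implementation: they assume the Kaggle Titanic column names, a DataFrame called `df`, and a hypothetical file name `titanic.csv`.

The first sketch covers the cleaning step: imputing **Embarked** with its mode, **Age** with its median, and dropping **Cabin**.

```python
import pandas as pd

# Hypothetical file name; the notebook may load the data differently.
df = pd.read_csv("titanic.csv")

# Fill missing Embarked values with the most frequent port (the mode).
df["Embarked"] = df["Embarked"].fillna(df["Embarked"].mode()[0])

# Fill missing Age values with the median age.
df["Age"] = df["Age"].fillna(df["Age"].median())

# Drop Cabin, which is missing for most passengers.
df = df.drop(columns=["Cabin"])
```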
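The feature engineering step derives **FamilySize** and **IsAlone** directly from the existing `SibSp` and `Parch` columns:

```python
# FamilySize counts siblings/spouses, parents/children, plus the passenger themselves.
df["FamilySize"] = df["SibSp"] + df["Parch"] + 1

# IsAlone is 1 when the passenger has no family aboard, 0 otherwise.
df["IsAlone"] = (df["FamilySize"] == 1).astype(int)
```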
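For the encoding step, one common pattern (assumed here, not necessarily the notebook's exact code) is `LabelEncoder` for the binary **Sex** column and `pandas.get_dummies` for **Embarked**:

```python
from sklearn.preprocessing import LabelEncoder

# Label-encode Sex (e.g., female -> 0, male -> 1).
df["Sex"] = LabelEncoder().fit_transform(df["Sex"])

# One-hot encode Embarked into dummy columns (Embarked_C, Embarked_Q, Embarked_S).
df = pd.get_dummies(df, columns=["Embarked"], prefix="Embarked")
```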
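Standardization rescales **Age** and **Fare** to zero mean and unit variance so neither dominates simply because of its units:

```python
from sklearn.preprocessing import StandardScaler

# Fit the scaler and transform the numeric columns in place.
scaler = StandardScaler()
df[["Age", "Fare"]] = scaler.fit_transform(df[["Age", "Fare"]])
```

In a stricter workflow the scaler would be fitted on the training split only and then applied to the test split, to avoid information leaking from test data; it is applied to the full frame here only to keep the sketch short.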
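The split-and-train step can then use `train_test_split` with a fixed random state and fit a logistic regression classifier. The feature list, the 80/20 split, and `random_state=42` are assumptions for illustration (and assume all three embarkation ports appear in the data so the dummy columns exist):

```python
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Assumed feature set after the preprocessing above; Survived is the target.
feature_cols = ["Pclass", "Sex", "Age", "Fare", "FamilySize", "IsAlone",
                "Embarked_C", "Embarked_Q", "Embarked_S"]
X = df[feature_cols]
y = df["Survived"]

# Hold out 20% for evaluation; the fixed random_state makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit a logistic regression classifier on the training portion.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)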
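Finally, the evaluation metrics listed above can all be computed with `sklearn.metrics`. This sketch prints the scores and the raw confusion matrix and computes the ROC curve points (the notebook likely plots these instead):

```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix, roc_curve, roc_auc_score)

# Predicted classes and predicted probabilities for the positive class (survived).
y_pred = model.predict(X_test)
y_proba = model.predict_proba(X_test)[:, 1]

print("Accuracy :", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall   :", recall_score(y_test, y_pred))
print("F1 score :", f1_score(y_test, y_pred))

# Confusion matrix: rows are actual classes, columns are predicted classes.
print(confusion_matrix(y_test, y_pred))

# ROC curve points and the area under the curve.
fpr, tpr, thresholds = roc_curve(y_test, y_proba)
print("ROC AUC  :", roc_auc_score(y_test, y_proba))
```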