# Core Machine Learning Concepts in the Titanic Practical

This document explains the primary machine learning concepts demonstrated in the Titanic practical notebook. The practical covers the end-to-end workflow from data preparation to model evaluation.

## 1. Data Cleaning and Imputation

- **Handling Missing Values:**
  - Filling missing values in the **Embarked** column with its mode.
  - Replacing missing values in the **Age** column with its median.
- **Dropping Columns:**
  - Removing columns (e.g., **Cabin**) with excessive missing data.

## 2. Feature Engineering

- **Creating New Features:**
  - **FamilySize:** Calculated by summing `SibSp` and `Parch` and adding 1 (to count the passenger themselves).
  - **IsAlone:** A binary feature indicating whether a passenger is travelling alone (i.e., when **FamilySize** equals 1).

## 3. Encoding Categorical Variables

- **Label Encoding:**
  - Converting categorical text data (such as **Sex**) into numerical values.
- **One-Hot Encoding:**
  - Transforming the **Embarked** column into dummy variables for numerical representation.

## 4. Feature Scaling

- **Standardization:**
  - Using `StandardScaler` to scale numerical features (e.g., **Age** and **Fare**) so that all features contribute on a comparable scale during training.

## 5. Train/Test Split

- **Data Partitioning:**
  - Splitting the dataset into training and testing sets with a fixed random state. Evaluating on the held-out test set measures performance on unseen data and helps detect overfitting.

## 6. Model Training and Evaluation

- **Logistic Regression:**
  - Training a logistic regression model on the processed data for binary classification (predicting survival).
- **Evaluation Techniques:**
  - Assessing model performance using metrics such as accuracy, precision, recall, and F1 score.
  - Visual tools such as the confusion matrix and ROC curve provide further insight into model behaviour.

This practical notebook serves as a comprehensive example of applying these core machine learning concepts to a real-world dataset to build, train, and evaluate a predictive model.
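## Illustrative Code Sketches

The sketches below show how each of the steps above might look in code. They are minimal illustrations, not the notebook's exact implementation: they assume the Kaggle Titanic column names, a DataFrame called `df`, and a hypothetical file name `titanic.csv`.

The first sketch covers the cleaning step: imputing **Embarked** with its mode, **Age** with its median, and dropping **Cabin**.

```python
import pandas as pd

# Hypothetical file name; the notebook may load the data differently.
df = pd.read_csv("titanic.csv")

# Fill missing Embarked values with the most frequent port (the mode).
df["Embarked"] = df["Embarked"].fillna(df["Embarked"].mode()[0])

# Fill missing Age values with the median age.
df["Age"] = df["Age"].fillna(df["Age"].median())

# Drop Cabin, which is missing for most passengers.
df = df.drop(columns=["Cabin"])
```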
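The feature engineering step derives **FamilySize** and **IsAlone** directly from the existing `SibSp` and `Parch` columns:

```python
# FamilySize counts siblings/spouses, parents/children, plus the passenger themselves.
df["FamilySize"] = df["SibSp"] + df["Parch"] + 1

# IsAlone is 1 when the passenger has no family aboard, 0 otherwise.
df["IsAlone"] = (df["FamilySize"] == 1).astype(int)
```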
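For the encoding step, one common pattern (assumed here, not necessarily the notebook's exact code) is `LabelEncoder` for the binary **Sex** column and `pandas.get_dummies` for **Embarked**:

```python
from sklearn.preprocessing import LabelEncoder

# Label-encode Sex (e.g., female -> 0, male -> 1).
df["Sex"] = LabelEncoder().fit_transform(df["Sex"])

# One-hot encode Embarked into dummy columns (Embarked_C, Embarked_Q, Embarked_S).
df = pd.get_dummies(df, columns=["Embarked"], prefix="Embarked")
```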
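Standardization rescales **Age** and **Fare** to zero mean and unit variance so neither dominates simply because of its units:

```python
from sklearn.preprocessing import StandardScaler

# Fit the scaler and transform the numeric columns in place.
scaler = StandardScaler()
df[["Age", "Fare"]] = scaler.fit_transform(df[["Age", "Fare"]])
```

In a stricter workflow the scaler would be fitted on the training split only and then applied to the test split, to avoid information leaking from test data; it is applied to the full frame here only to keep the sketch short.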
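The split-and-train step can then use `train_test_split` with a fixed random state and fit a logistic regression classifier. The feature list, the 80/20 split, and `random_state=42` are assumptions for illustration (and assume all three embarkation ports appear in the data so the dummy columns exist):

```python
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Assumed feature set after the preprocessing above; Survived is the target.
feature_cols = ["Pclass", "Sex", "Age", "Fare", "FamilySize", "IsAlone",
                "Embarked_C", "Embarked_Q", "Embarked_S"]
X = df[feature_cols]
y = df["Survived"]

# Hold out 20% for evaluation; the fixed random_state makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit a logistic regression classifier on the training portion.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)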
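Finally, the evaluation metrics listed above can all be computed with `sklearn.metrics`. This sketch prints the scores and the raw confusion matrix and computes the ROC curve points (the notebook likely plots these instead):

```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix, roc_curve, roc_auc_score)

# Predicted classes and predicted probabilities for the positive class (survived).
y_pred = model.predict(X_test)
y_proba = model.predict_proba(X_test)[:, 1]

print("Accuracy :", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall   :", recall_score(y_test, y_pred))
print("F1 score :", f1_score(y_test, y_pred))

# Confusion matrix: rows are actual classes, columns are predicted classes.
print(confusion_matrix(y_test, y_pred))

# ROC curve points and the area under the curve.
fpr, tpr, thresholds = roc_curve(y_test, y_proba)
print("ROC AUC  :", roc_auc_score(y_test, y_proba))
```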