SBanditaDas/Lung-Cancer-Survival-Prediction-Using-ML: Predicting lung cancer survival outcomes using clinical and lifestyle data. Includes preprocessing, feature engineering, and Random Forest classification with ~95% test accuracy. Built for healthcare risk stratification and early prognosis.

Lung Cancer Survival Prediction

Overview

This project builds a machine learning system that predicts whether a patient diagnosed with lung cancer will survive, using clinical and lifestyle data. It loads a dataset of patient records, derives new features (such as treatment duration), one-hot encodes the categorical columns, and trains a Random Forest classifier on the binary survival label.

Dataset Description

The notebook reads the dataset from the Kaggle input path /kaggle/input/lung-cancer-dataset/dataset_med.csv; a copy, dataset_med.csv, is also included in this repository. Each row is one patient record.

Key columns:

id: Unique patient identifier
age, gender, country: Demographics
diagnosis_date, end_treatment_date: Cancer timeline
cancer_stage: Stage I–IV
family_history, smoking_status: Lifestyle and genetic risk
bmi, cholesterol_level: Clinical metrics
hypertension, asthma, cirrhosis, other_cancer: Comorbidities
treatment_type: Surgery, radiation, chemotherapy, or combined
survived: Target label (yes/no)
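Before modelling, it can help to confirm that a loaded copy of the data has the columns listed above and to look at the class balance of `survived`, which matters when reading any accuracy figure. This is a minimal sketch: the helper `check_schema` and the tiny demo frame are hypothetical, not part of the project notebook.

```python
import pandas as pd

# Columns listed in the dataset description above
EXPECTED_COLUMNS = [
    "id", "age", "gender", "country", "diagnosis_date", "end_treatment_date",
    "cancer_stage", "family_history", "smoking_status", "bmi", "cholesterol_level",
    "hypertension", "asthma", "cirrhosis", "other_cancer", "treatment_type", "survived",
]


def check_schema(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: raise if a listed column is missing, else return class shares."""
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    # Fraction of patients in each outcome class of the target
    return df["survived"].value_counts(normalize=True)


# Made-up frame containing every expected column, for illustration only
demo = pd.DataFrame({c: [0, 0, 0, 1] for c in EXPECTED_COLUMNS})
print(check_schema(demo))
```

On the real data, a strongly skewed result here would mean a high accuracy can be reached partly by favouring the majority class.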

Workflow Summary

1. Data Loading

import pandas as pd

df = pd.read_csv('/kaggle/input/lung-cancer-dataset/dataset_med.csv')

2. Preprocessing

# Convert dates and calculate treatment duration
df['diagnosis_date'] = pd.to_datetime(df['diagnosis_date'])
df['end_treatment_date'] = pd.to_datetime(df['end_treatment_date'])
df['treatment_duration'] = (df['end_treatment_date'] - df['diagnosis_date']).dt.days

# Encode categorical features
df_encoded = pd.get_dummies(df.drop(['id', 'diagnosis_date', 'end_treatment_date'], axis=1), drop_first=True)
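The effect of these two steps can be seen on a toy frame. When `columns` is not given, `pd.get_dummies` only encodes object, string and category columns, so the name of the encoded target depends on how `survived` is stored: an integer 0/1 column passes through unchanged as `survived`, while a categorical one becomes `survived_1` under `drop_first=True`. This is a sketch with made-up rows; values such as `'Male'` or `'Stage II'` are illustrative assumptions, not taken from the dataset.

```python
import pandas as pd

toy = pd.DataFrame({
    "id": [1, 2, 3],
    "gender": ["Male", "Female", "Male"],              # illustrative values
    "cancer_stage": ["Stage I", "Stage II", "Stage I"],
    "diagnosis_date": ["2016-04-05", "2017-01-10", "2018-06-01"],
    "end_treatment_date": ["2016-09-10", "2017-03-01", "2019-01-15"],
    "survived": [0, 1, 0],                              # stored as integers
})

# Treatment duration in days, as in the preprocessing step
toy["diagnosis_date"] = pd.to_datetime(toy["diagnosis_date"])
toy["end_treatment_date"] = pd.to_datetime(toy["end_treatment_date"])
toy["treatment_duration"] = (toy["end_treatment_date"] - toy["diagnosis_date"]).dt.days

features = toy.drop(["id", "diagnosis_date", "end_treatment_date"], axis=1)

# Integer target: get_dummies leaves the 'survived' column untouched
int_encoded = pd.get_dummies(features, drop_first=True)

# Categorical target: get_dummies encodes it and drop_first keeps only 'survived_1'
cat_encoded = pd.get_dummies(features.astype({"survived": "category"}), drop_first=True)

print(list(int_encoded.columns))
print(list(cat_encoded.columns))
```

With `drop_first=True` the alphabetically first level of each category (here `Female` and `Stage I`) becomes the implicit baseline, so a patient in that level has all of that feature's dummy columns set to 0.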

3. Model Training

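from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

y = df['survived']  # target taken from the raw frame
X = df_encoded.drop(columns=[c for c in df_encoded.columns if c.startswith('survived')])  # drop the encoded target, whatever get_dummies named it
# Stratified hold-out split (test_size=0.2 is an assumed value; the ratio is not stated here)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)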
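A single train/test split gives one accuracy number that can be lucky or unlucky. Repeating the split with stratified k-fold cross-validation gives a spread of scores for the same model settings. A minimal sketch with scikit-learn's `cross_val_score`; the synthetic data from `make_classification` stands in for the encoded patient features and labels, and the choice of 5 folds is an assumption, not the project's setup.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for the encoded patient features and survival labels
X_demo, y_demo = make_classification(n_samples=500, n_features=10, random_state=0)

model = RandomForestClassifier(n_estimators=100, random_state=42)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# One accuracy score per held-out fold
scores = cross_val_score(model, X_demo, y_demo, cv=cv, scoring="accuracy")
print(f"accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Swapping `X_demo, y_demo` for the project's `X, y` would show whether the ~95% test figure is stable across folds.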

4. Evaluation

from sklearn.metrics import accuracy_score, classification_report

test_preds = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, test_preds))
print("Classification Report:\n", classification_report(y_test, test_preds))
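When survival outcomes are imbalanced, accuracy alone can hide weak recall on the smaller class. A confusion matrix and balanced accuracy (the mean of per-class recall) make that visible. This sketch uses small hand-written label arrays, hypothetical and for illustration only, in place of `y_test` and `test_preds`.

```python
from sklearn.metrics import accuracy_score, balanced_accuracy_score, confusion_matrix

# Hypothetical labels: 8 non-survivors (0) and 2 survivors (1)
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]  # one survivor predicted wrongly

print(accuracy_score(y_true, y_pred))           # share of all labels predicted correctly
print(balanced_accuracy_score(y_true, y_pred))  # mean of per-class recall
print(confusion_matrix(y_true, y_pred))         # rows = true class, columns = predicted class
```

Here accuracy is 0.9 while balanced accuracy is only 0.75, because recall on the survivor class is 1 out of 2.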

5. Prediction on New Patient

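# Build a one-row frame with exactly the training columns, in training order
input_df = pd.DataFrame(0.0, index=[0], columns=X_train.columns)
input_df.loc[0, ['age', 'bmi']] = [62, 27.5]  # illustrative values; set every feature, incl. the patient's one-hot dummies
prediction = model.predict(input_df)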
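Setting one-hot columns by hand is error-prone. A common pandas pattern is to encode the new patient's raw values with `pd.get_dummies` and then `reindex` to the training columns, filling absent dummies with 0. A sketch: the helper name `encode_patient`, the column names in `train_cols`, and the patient values are hypothetical.

```python
import pandas as pd


def encode_patient(raw: dict, train_columns: pd.Index) -> pd.DataFrame:
    """Hypothetical helper: encode one patient's raw values to the training feature columns."""
    row = pd.DataFrame([raw])
    # No drop_first here: with a single row it would drop every category
    encoded = pd.get_dummies(row)
    # Match training column order; dummies absent for this patient (and any numeric
    # feature missing from raw) become 0, and baseline or unseen levels are dropped
    return encoded.reindex(columns=train_columns, fill_value=0)


# Hypothetical training columns and patient record, for illustration only
train_cols = pd.Index(["age", "bmi", "gender_Male", "smoking_status_Current Smoker"])
patient = {"age": 62, "bmi": 27.5, "gender": "Male", "smoking_status": "Never Smoked"}
x = encode_patient(patient, train_cols)
print(x)
```

In the project, `X_train.columns` would take the place of `train_cols`, and the result could be passed straight to `model.predict`.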

Performance Metrics

Training Accuracy: ~98%
Test Accuracy: ~95%
Precision & Recall: reported as balanced across both survival classes (per the classification report from step 4)
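Training and test accuracy come from scoring the fitted model on its own training split and on the held-out split; the difference between them is the generalisation gap. A sketch of that computation on synthetic data, assuming a stratified 80/20 split (the data, sizes, and `flip_y` label noise are stand-ins, not the project's setup):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; flip_y adds label noise so the gap is visible
X, y = make_classification(n_samples=1000, n_features=10, flip_y=0.05, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)

model = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_train, y_train)

train_acc = model.score(X_train, y_train)  # mean accuracy on the training split
test_acc = model.score(X_test, y_test)     # mean accuracy on the held-out split
print(f"train {train_acc:.3f}, test {test_acc:.3f}, gap {train_acc - test_acc:.3f}")
```

Fully grown Random Forest trees tend to fit their training data almost perfectly, so the test figure is the one to rely on.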

Dependencies

numpy
pandas
scikit-learn

Repository files

Detect Lung Cancer using patient diagnosis data.pdf
LICENCE
README.md
dataset_med.csv
lung-cancer.ipynb

Author: Sushree Bandita Das
