Titanic Kaggle Competition

The main idea here is to go through the whole process of data analysis and machine learning model building, from data cleaning to model evaluation. The final result is at this link: My Titanic Kaggle. Still, I'm going to go step by step through the code.
Main idea
The competition is simple: use machine learning to create a model that predicts which passengers survived the Titanic shipwreck. I will get the dataset, create a couple of DataFrames to analyze the data, clean it, create a model and evaluate it, and finally predict the results.
Loading the data
First of all, I load the Kaggle data from the URLs below, using pandas to read the CSVs into a train and a test DataFrame to work with.
import pandas as pd
import sys
import numpy as np
from sklearn import tree
from sklearn.model_selection import train_test_split
# Load the train and test datasets to create two DataFrames
train_url = "http://s3.amazonaws.com/assets.datacamp.com/course/Kaggle/train.csv"
train = pd.read_csv(train_url)
test_url = "http://s3.amazonaws.com/assets.datacamp.com/course/Kaggle/test.csv"
test = pd.read_csv(test_url)
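Before cleaning, it helps to see which columns actually have gaps. The snippet below is a minimal sketch on a tiny made-up frame (the `sample` values are invented for illustration, not taken from the Kaggle files); on the real data the same call would be `train.isna().sum()`.

```python
import numpy as np
import pandas as pd

# Tiny stand-in for the training data; the values are made up for illustration.
sample = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, 35.0],
    "Fare": [7.25, 71.28, np.nan, 53.10],
    "Embarked": ["S", "C", None, "S"],
})

# Count missing values per column to see which ones need imputing.
missing = sample.isna().sum()
print(missing)
```

Columns with a non-zero count are the ones that need a fill value before they go into a model.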
Then I label the dataset, assigning integers to the different categories and classes, and create new features to improve the model.
# Setting up the dataset
# Convert the male and female groups to integer form
# (.map replaces the chained assignment, which pandas warns about and which
# may not update the DataFrame at all in newer pandas versions)
train["Sex"] = train["Sex"].map({"male": 0, "female": 1})
# Impute the Embarked variable
train["Embarked"] = train["Embarked"].fillna("S")
# Convert the Embarked classes to integer form
train["Embarked"] = train["Embarked"].map({"S": 0, "C": 1, "Q": 2})
train["Age"] = train["Age"].fillna(train["Age"].median())
train["Fare"] = train["Fare"].fillna(train["Fare"].median())
# Creating new features
# Child is 1 for passengers under 18 and 0 otherwise
train["Child"] = (train["Age"] < 18).astype(int)
train["Family_Size"] = train["SibSp"].values + train["Parch"].values + 1
Train and test with my custom data
training:
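# Feature matrix and target from the prepared training data
X = train[["Pclass", "Age", "Sex", "Fare", "SibSp", "Parch", "Embarked", "Child", "Family_Size"]].values
y = train["Survived"].values
# Hold out 30% of the rows to evaluate the model
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=35)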
test:
test["Age"] = test["Age"].fillna(test["Age"].median())
test["Fare"] = test["Fare"].fillna(test["Fare"].median())
# Convert the male and female groups to integer form
test["Sex"] = test["Sex"].map({"male": 0, "female": 1})
# Impute the Embarked variable
test["Embarked"] = test["Embarked"].fillna("S")
# Convert the Embarked classes to integer form
test["Embarked"] = test["Embarked"].map({"S": 0, "C": 1, "Q": 2})
# Child is 1 for passengers under 18 and 0 otherwise
test["Child"] = (test["Age"] < 18).astype(int)
test["Family_Size"] = test["SibSp"].values + test["Parch"].values + 1
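The train and test steps above repeat the same transformations, so one option is to put them in a single helper. This is only a sketch: the function name `prepare` and the `demo` rows are hypothetical, and, like the code above, it fills missing values with the median of whichever frame it is given.

```python
import numpy as np
import pandas as pd

def prepare(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with the encodings and features used in this post."""
    out = df.copy()
    # Encode the categorical columns as integers
    out["Sex"] = out["Sex"].map({"male": 0, "female": 1})
    out["Embarked"] = out["Embarked"].fillna("S").map({"S": 0, "C": 1, "Q": 2})
    # Fill numeric gaps with the median of this frame
    out["Age"] = out["Age"].fillna(out["Age"].median())
    out["Fare"] = out["Fare"].fillna(out["Fare"].median())
    # Engineered features
    out["Child"] = (out["Age"] < 18).astype(int)
    out["Family_Size"] = out["SibSp"] + out["Parch"] + 1
    return out

# Made-up rows, only to show the helper running.
demo = pd.DataFrame({
    "Sex": ["male", "female", "female"],
    "Embarked": ["S", None, "Q"],
    "Age": [30.0, np.nan, 4.0],
    "Fare": [7.25, 80.0, np.nan],
    "SibSp": [1, 0, 3],
    "Parch": [0, 0, 1],
})
prepared = prepare(demo)
print(prepared)
```

With a helper like this, the two frames would be prepared with `train = prepare(train)` and `test = prepare(test)`, which keeps the feature columns identical on both sides.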
Making the classifier
Now the fun part: I use MLPClassifier from sklearn.neural_network to build a classifier.
--> MLPClassifier official docs
from sklearn.neural_network import MLPClassifier
X = X_train
y = y_train
# Five hidden layers with 9, 27, 22, 20 and 9 units, trained with the L-BFGS solver
clf = MLPClassifier(solver='lbfgs', alpha=1e-5, hidden_layer_sizes=(9, 27, 22, 20, 9), max_iter=2000, random_state=1)
clf.fit(X, y)
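The scikit-learn documentation notes that multi-layer perceptrons are sensitive to feature scaling, and columns like Age and Fare sit on very different ranges. The sketch below is a possible variation, not what this post submitted: it wraps the same MLPClassifier settings in a pipeline with StandardScaler, and runs on synthetic data from `make_classification` as a stand-in for the Titanic features (`X_demo`, `y_demo` and `scaled_clf` are names made up for this example).

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in with 9 features, the same width as the feature matrix above.
X_demo, y_demo = make_classification(n_samples=300, n_features=9, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X_demo, y_demo, test_size=0.3, random_state=35)

# The scaler is fitted on the training split only, then applied before the network.
scaled_clf = make_pipeline(
    StandardScaler(),
    MLPClassifier(solver="lbfgs", alpha=1e-5,
                  hidden_layer_sizes=(9, 27, 22, 20, 9),
                  max_iter=2000, random_state=1),
)
scaled_clf.fit(Xtr, ytr)
demo_score = scaled_clf.score(Xte, yte)
print(demo_score)
```

Because the scaler lives inside the pipeline, calling `predict` on new rows applies the same scaling automatically.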
And the predictions:
test_features_clf = X_test
test_target_clf = y_test
# Predict on the hold-out split and print the mean accuracy
my_clf = clf.predict(test_features_clf)
print(clf.score(test_features_clf, test_target_clf))
# Predict on the Kaggle test set using the same feature columns
test_clf_to_submit = test[["Pclass", "Age", "Sex", "Fare", "SibSp", "Parch", "Embarked", "Child", "Family_Size"]].values
pred_clf_to_submit = clf.predict(test_clf_to_submit)
# Build the submission file: PassengerId as the index, Survived as the only column
PassengerId = np.array(test["PassengerId"]).astype(int)
my_solution_clf = pd.DataFrame(pred_clf_to_submit, PassengerId, columns=["Survived"])
my_solution_clf.to_csv("ClfSolution2.csv", index_label=["PassengerId"])
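The number printed by `clf.score` on the hold-out split is the mean accuracy. To see which kind of mistakes the model makes, a confusion matrix can sit next to it. This is a minimal sketch with made-up labels: `y_true` and `y_pred` stand in for `y_test` and `clf.predict(X_test)` and are not real results.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Made-up labels and predictions, standing in for y_test and clf.predict(X_test).
y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_pred = np.array([0, 1, 1, 0, 1, 0, 1, 0])

# Fraction of correct predictions, the same quantity clf.score reports.
acc = accuracy_score(y_true, y_pred)
# Rows are the true classes, columns the predicted classes.
cm = confusion_matrix(y_true, y_pred)
print(acc)
print(cm)
```

In this toy case 6 of 8 predictions are right, and the off-diagonal entries show one passenger wrongly predicted as a survivor and one wrongly predicted as not surviving.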