Monday, June 22, 2020

COVID Decision tree Classification

MACHINE LEARNING FOR DATA ANALYSIS

VARIABLES:
SEXO (SEX): {F:1, M:2}
ESTADO (STATUS): {LEVE:1, ASINTOMATICO:2, GRAVE:3, FALLECIDO:4, MODERADO:5}
ATENCIÓN (ATTENTION): {RECUPERADO:1, FALLECIDO:2}
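
The coding above can be reproduced with pandas `Series.map`. This is a minimal sketch: the raw column names `Sexo`, `Estado` and `Atencion` and the three sample rows are assumptions made for illustration, not taken from the INS file, and the output names follow the `_cat` convention used in the code of this post.

```python
import pandas as pd

# Category codes exactly as listed in the VARIABLES section.
SEXO_CODES = {"F": 1, "M": 2}
ESTADO_CODES = {"LEVE": 1, "ASINTOMATICO": 2, "GRAVE": 3, "FALLECIDO": 4, "MODERADO": 5}
ATENCION_CODES = {"RECUPERADO": 1, "FALLECIDO": 2}

# Made-up sample rows; the raw column names are hypothetical.
raw = pd.DataFrame({
    "Sexo": ["F", "M", "M"],
    "Estado": ["LEVE", "FALLECIDO", "ASINTOMATICO"],
    "Atencion": ["RECUPERADO", "FALLECIDO", "RECUPERADO"],
})

# Map each label column to its integer code.
encoded = pd.DataFrame({
    "Sexo_cat": raw["Sexo"].map(SEXO_CODES),
    "Estado_cat": raw["Estado"].map(ESTADO_CODES),
    "Atencion": raw["Atencion"].map(ATENCION_CODES),
})
print(encoded)
```

Labels missing from a dictionary become NaN under `map`, which makes spelling variants in the raw data easy to spot before fitting.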

The data were retrieved from the INS (Instituto Nacional de Salud) of Colombia.

  • predictors = arbol[['Sexo_cat','Atencion']]
  • targets = arbol.Estado_cat
  • sklearn.metrics.accuracy_score(tar_test, predictions): 0.8841027080117861
Code: 

# Data handling and plotting
from pandas import Series, DataFrame
import pandas as pd
import numpy as np
import os
import matplotlib.pylab as plt
# Train/test split, model and evaluation metrics
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report
import sklearn.metrics

# Load the INS dataset (semicolon-delimited, Latin-1 encoded)
arbol = pd.read_csv('COVID/COVID_coursera_v2.csv', encoding='latin1', delimiter=';')
arbol.head()

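# Explanatory variables and response, already coded as integers
predictors = arbol[['Sexo_cat', 'Atencion']]
targets = arbol.Estado_cat
# Hold out 40% of the rows for testing
pred_train, pred_test, tar_train, tar_test = train_test_split(predictors, targets, test_size=.4)
# Print the shapes explicitly so that all four are shown
print(pred_train.shape, pred_test.shape)
print(tar_train.shape, tar_test.shape)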

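# Fit the tree on the training split (gini is the default criterion)
classifier = DecisionTreeClassifier()
classifier = classifier.fit(pred_train, tar_train)
# Predict on the held-out split and evaluate
predictions = classifier.predict(pred_test)
print(sklearn.metrics.confusion_matrix(tar_test, predictions))
print(sklearn.metrics.accuracy_score(tar_test, predictions))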

# Render the fitted tree as an image (requires Graphviz and pydotplus)
from sklearn import tree
from io import StringIO
from IPython.display import Image
import pydotplus
out = StringIO()
tree.export_graphviz(classifier, out_file=out)
graph = pydotplus.graph_from_dot_data(out.getvalue())
Image(graph.create_png())
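
If Graphviz or pydotplus is not installed, scikit-learn can also print a fitted tree as plain text with `sklearn.tree.export_text`. The sketch below uses a made-up toy sample with the same column names as the code in this post. The rows are illustrative only and are not the INS data.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy rows for illustration only (not the INS dataset).
toy = pd.DataFrame({
    "Sexo_cat":   [1, 1, 2, 2, 1, 2, 1, 2],
    "Atencion":   [1, 1, 1, 1, 2, 2, 1, 2],
    "Estado_cat": [1, 1, 1, 2, 4, 4, 2, 4],
})
features = ["Sexo_cat", "Atencion"]

# Fit a small tree and dump its split rules as indented text.
toy_tree = DecisionTreeClassifier(random_state=0).fit(toy[features], toy["Estado_cat"])
tree_text = export_text(toy_tree, feature_names=features)
print(tree_text)
```

Each indented line shows a split rule (feature and threshold) or the class predicted at a terminal node, so the structure can be read without rendering an image.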

Decision Tree






Decision tree analysis was performed to test nonlinear relationships among a series of explanatory variables and a categorical response variable with five levels. All possible separations (categorical) or cut points (quantitative) are tested. For the present analyses, the gini criterion was used to grow the tree.
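
For reference, the Gini impurity of a node is 1 minus the sum of the squared class proportions: 0 for a pure node, larger for a more mixed one. The helper below is a hand-written sketch of that formula (`gini_impurity` is a hypothetical name, not a scikit-learn function), applied to two of the class-count vectors reported in this post.

```python
def gini_impurity(class_counts):
    """Gini impurity of a node given the number of cases in each class."""
    total = sum(class_counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((count / total) ** 2 for count in class_counts)

# Counts in the order (Leve, Asintomatico, Grave, Fallecido, Moderado),
# as reported in the text for two of the four groups.
recovered_females = [4172, 327, 0, 0, 17]
deceased_females = [0, 0, 0, 308, 0]

print(gini_impurity(recovered_females))
print(gini_impurity(deceased_females))
```

At every node the tree picks the split that lowers the weighted impurity of the two child nodes the most.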
SEXO (SEX) and ATENCION (ATTENTION) were included as explanatory variables in a classification tree model evaluating ESTADO (STATUS).
The resulting tree has 7 nodes: 3 internal nodes and 4 terminal nodes.
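
The node counts can be read from a fitted classifier instead of being counted on the picture. The sketch below rebuilds a stand-in sample from the class counts this post reports for the four groups. This is an assumption, since the original CSV is not reproduced here. It then queries `tree_.node_count` and `get_n_leaves()`.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# (Atencion, Sexo_cat) -> counts for Estado codes 1..5
# (Leve, Asintomatico, Grave, Fallecido, Moderado), as reported in the text.
group_counts = {
    (1, 1): [4172, 327, 0, 0, 17],
    (1, 2): [4578, 767, 1, 0, 24],
    (2, 1): [0, 0, 0, 308, 0],
    (2, 2): [0, 3, 0, 493, 0],
}

# Expand the counts into one row per case (stand-in for the real sample).
rows = []
for (atencion, sexo), counts in group_counts.items():
    for estado, n in enumerate(counts, start=1):
        rows.extend([(sexo, atencion, estado)] * n)
rebuilt = pd.DataFrame(rows, columns=["Sexo_cat", "Atencion", "Estado_cat"])

clf = DecisionTreeClassifier(criterion="gini", random_state=0)
clf.fit(rebuilt[["Sexo_cat", "Atencion"]], rebuilt["Estado_cat"])

n_nodes = clf.tree_.node_count     # all nodes, internal and terminal
n_leaves = clf.get_n_leaves()      # terminal nodes only
print(n_nodes, n_leaves, n_nodes - n_leaves)
```

The three printed numbers are total, terminal and internal nodes, and can be compared with the figures stated above.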
The first variable to separate the sample into two subgroups is ATTENTION, split at a threshold of 1.5 on its coded value (1 = recovered, 2 = deceased). Each subgroup is then split on SEX, again at a threshold of 1.5 (1 = female, 2 = male). Among the recovered (ATTENTION below 1.5), females have the following status (Leve: 4172, Asintomatico: 327, Grave: 0, Fallecido: 0, Moderado: 17) and males have the following status (Leve: 4578, Asintomatico: 767, Grave: 1, Fallecido: 0, Moderado: 24).
Among the deceased (ATTENTION above 1.5), females have the following status (Leve: 0, Asintomatico: 0, Grave: 0, Fallecido: 308, Moderado: 0) and males, on the other side of the second split, have the following status (Leve: 0, Asintomatico: 3, Grave: 0, Fallecido: 493, Moderado: 0).
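
The counts quoted above can be laid out as a table, and the share of cases that belong to the majority class of their group shows how well the tree fits those same cases. The sketch takes the four count vectors from the text as given. They presumably describe the training split, since `export_graphviz` reports the samples the tree was fitted on, so the result is not the test-set accuracy reported earlier.

```python
import pandas as pd

estados = ["Leve", "Asintomatico", "Grave", "Fallecido", "Moderado"]
counts = pd.DataFrame(
    [
        [4172, 327, 0, 0, 17],
        [4578, 767, 1, 0, 24],
        [0, 0, 0, 308, 0],
        [0, 3, 0, 493, 0],
    ],
    index=["Recovered, female", "Recovered, male", "Deceased, female", "Deceased, male"],
    columns=estados,
)

majority = counts.idxmax(axis=1)       # class each terminal node predicts
correct = counts.max(axis=1).sum()     # cases that fall in their node's majority class
total = counts.to_numpy().sum()        # all cases across the four nodes
leaf_accuracy = correct / total

print(counts)
print(majority)
print(leaf_accuracy)
```

Because each terminal node predicts a single class, the minority classes inside a node (for example the asymptomatic cases among the recovered) are always misclassified by this tree.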





Covid 19 Rmarkdown Practice

covid covid Julian Uribe 2023-12-05 ## ── Attaching core tidyverse...