{ "cells": [ { "cell_type": "markdown", "id": "eafb1be2", "metadata": {}, "source": [ "# Variable selection using Shannon entropy\n", "## 1. Introduction\n", "

It is not always obvious how much information an attribute contributes about the target variable. Here we look at a relatively simple way to measure it.

\n", "

The method measures how strongly each independent variable is related to the target variable. It applies the concept of entropy, borrowed from thermodynamics, to information (Shannon, 1948). The result is a ranking of the variables ordered by how much information each one contributes about the target variable. It can be used as an alternative to computing statistical correlations.
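
As a rough preview of that ranking, the sketch below scores each column by its information gain (the entropy of the target minus the entropy of the target conditioned on the column) and sorts the columns by that score. The helper names `entropy` and `information_gain` and the tiny `toy` table are hypothetical and used only for illustration; this is not the Titanic data and not necessarily the exact code used later in this notebook.

```python
import numpy as np
import pandas as pd

def entropy(series):
    # Shannon entropy (in bits) of the empirical distribution of a column
    p = series.value_counts(normalize=True)
    return float(-(p * np.log2(p)).sum())

def information_gain(df, feature, target):
    # H(target) minus the weighted entropy of target within each feature value
    conditional = sum(
        len(group) / len(df) * entropy(group[target])
        for _, group in df.groupby(feature)
    )
    return entropy(df[target]) - conditional

# Hypothetical toy data (assumption for illustration only)
toy = pd.DataFrame({
    'sex': ['m', 'm', 'm', 'm', 'f', 'f', 'f', 'f'],
    'age': ['adult', 'child', 'adult', 'child', 'adult', 'child', 'adult', 'child'],
    'survived': [0, 0, 0, 1, 1, 1, 1, 0],
})

# Rank the candidate columns from most to least informative
ranking = sorted(
    ((information_gain(toy, col, 'survived'), col) for col in ['sex', 'age']),
    reverse=True,
)
for gain, col in ranking:
    print(col, round(gain, 3))
```

In this toy table `sex` ranks above `age`, because survival is split evenly inside each age group while it is skewed inside each sex group.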

\n", "

This notebook shows an example of using Shannon entropy for variable selection on the public dataset of the sinking of the Titanic, titanic.csv, based on the material published on El Laberinto de Falken: https://www.ellaberintodefalken.com/2018/09/seleccion-atributos-relevantes-entropia-shannon.html

" ] }, { "cell_type": "markdown", "id": "6494f268", "metadata": {}, "source": [ "### Carga de Librerias y Datos" ] }, { "cell_type": "code", "execution_count": 51, "id": "5d70187f", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tamaño del dataset: (2201, 5)\n", "['passenger_number', 'Class', 'Sex', 'Age', 'Survived']\n", "Eliminacion de columna 'passenger_number':\n", " Class Sex Age Survived\n", "0 2nd Male Adult No\n", "1 3rd Male Child No\n", "2 3rd Male Adult No\n", "3 Crew Male Adult No\n", "4 Crew Male Adult No\n", "... ... ... ... ...\n", "2196 Crew Male Adult No\n", "2197 Crew Male Adult Yes\n", "2198 3rd Male Adult No\n", "2199 Crew Male Adult No\n", "2200 1st Male Adult No\n", "\n", "[2201 rows x 4 columns]\n" ] } ], "source": [ "# Librerias\n", "import pandas as pd\n", "import requests\n", "import numpy as np\n", "import math\n", "import os\n", "\n", "# Datos\n", "link=\"https://rudeboybert.github.io/SDS220/static/PS/titanic.csv\"\n", "data=pd.read_csv(link)\n", "\n", "# Acerca del Dataset:\n", "print(\"tamaño del dataset:\",data.shape)\n", "\n", "# Eliminaremos la variable 'passengerId' y 'Name'\n", "columnas=data.columns.tolist()\n", "print(columnas)\n", "columnasAeliminar=[\"passenger_number\"]\n", "df=data.drop(columnasAeliminar, axis=1)\n", "print(\"Eliminacion de columna 'passenger_number':\")\n", "print(df)" ] }, { "cell_type": "markdown", "id": "77a262d4", "metadata": {}, "source": [ "## 2. Las variables\n", "### 2.1. 
Variables" ] }, { "cell_type": "code", "execution_count": 52, "id": "010024c8", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Variables= ['Class', 'Sex', 'Age', 'Survived']\n", "values of Class = {'3rd', 'Crew', '1st', '2nd'}\n", "values of Sex = {'Male', 'Female'}\n", "values of Age = {'Adult', 'Child'}\n", "values of Survived = {'No', 'Yes'}\n" ] } ], "source": [ "# The variables\n", "print(\"Variables=\",df.columns.tolist())\n", "# Distinct values of each variable:\n", "for col in df.columns:\n", "    print(\"values of\",col,\"=\",set(df[col]))" ] }, { "cell_type": "code", "execution_count": 53, "id": "e997a20d", "metadata": {}, "outputs": [], "source": [ "# Binarize the target variable \"Survived\" (replace 'Yes' / 'No' with 1 / 0):\n", "df[\"Survived\"] = df[\"Survived\"].replace({ \"Yes\": 1, \"No\": 0 })" ] }, { "cell_type": "markdown", "id": "e1fe8960", "metadata": {}, "source": [ "### 2.2. Dependent and independent variables" ] }, { "cell_type": "code", "execution_count": 54, "id": "044c31da", "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "vars_indep= ['Class', 'Sex', 'Age']\n", "vars_target= Survived\n" ] } ], "source": [ "cols=df.columns.tolist()\n", "# The last column is the target; all the others are independent variables\n", "vars_indep=cols[:-1]\n", "vars_target=cols[-1]\n", "# Independent variables:\n", "print(\"vars_indep=\",vars_indep)\n", "# Target variable:\n", "print(\"vars_target=\",vars_target)" ] }, { "attachments": { "entropia01.png": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAVEAAABTCAIAAACpq6OqAAAAA3NCSVQICAjb4U/gAAAAGXRFWHRTb2Z0d2FyZQBnbm9tZS1zY3JlZW5zaG907wO/PgAAIABJREFUeJztnXVcFF0Xx8/MLA0ijT6ACIIuII2EiYGBgtjd4oOChd0dj4EKio2tj/ooJjZiIi0IIiUh3SwLy+7O3PcPEImlFF4R5vuPH5nZmTt37u/GOeeewRBCQEND027Af3cBaGho/q/QmqehaV8wfncBaJoVKvvlwY2evnFsjel79jhgDw8fuRdbWpaXi+vP2rJpfHfR310+mt8PRq/n2xCI9WLjkvCJ7taPx027hLT1uw36e90cq85CCScnjfYUX33fa+pf9MSu3UM3gbZE8Vufb71sdfDsrDxEEQaO252sOosCYELCQhi/ML+Y7t5paM23LYT0526a1A19DgorIYymzLeSwQAAgP3pYxwprqmtQvzm8tG0BmjNtyVEO/fQ6EglBQZngLqpiWLFy2UH+QWWiJn1MxX/vaWjaR3Qmm9roLzgwDhKzthMo2JUZ7194FcgYTmsnwwqSErMI39v8Wh+O7Tm2xqlYYERPAkjc11hAABABa8evGZJ9RvZv2NZ4LFN/ybQmm/v0JpvY/CjAkNLhPQsjSXL/0/lpqaVMvQtTBkxl71SbSYaCf/e8tH8dmjNty1QSXY2R76/XT/5cvMdEOq2c+y7JZ13mne4aOrmSeq0Ga/dQ/vnaWjaF/Q4T0PTvqA1T0PTvqA1T0PTvqA1T0PTvqA1T0PTvqA1T0PTvqA1T0PTvqBzZvwBkHEnp048GsWlENQMp6g3vKKpsReE6rSzd9aZ0Yk12ja05v8AcCUDfWUU8ZX8LmG8g8n0ZWO7iwg6GSEEgBCFyqEQAkSRJEnyeVxOSVFhfl52ZnpaSmJ8YnpRGVWtUyBT7199udB0WEesxR+J5vdBx+H9GZSEHZo251Q0p+JlYcLas85cXmki8QuX5BWlRAW+fvHkyeOXwSms8v4EEzVbe9drmgq95GvD0Jr/U+B+OT1nmlsI+7vqCdXxR//d3K8ZxmSKFfPsgqfHhafxLAoRmo5Xby3VE/rlq9K0VogtW7b87jLQNAZC3sBUOvz+6+SKsR4VRQekqY8YoiXxq6rHROQ0ew0bO4LJi3z/MT0jmTQYP7ALvf2uzfJ/nMRR+VG+L78UNfO0ApUkvHnxKbc97Aon1MZvXz9EofKVkVmPd2y8mdJMjy78l7Xr6Qsb+svmPrn2JIee/P1fQMVxr5+G51DNeMWCz76vYlj1vb+6Nd+8c34y8/nWGYsuJ2PCzWwfwoTg2/XF09c/TG0HsseVRmzcOlqF+F6HVMGb/evOx3Cb6/qimhP3/DNJ4cM176/toDZ/N6gw4NDceR5RpGgzagITJqNPOM7a8ya3zo6kjvU8x2/94EXeBRjBYJQ3MIQoiqIQhcSHH/LbN1iUjDzsMPn0V4whxGAwGDgGAIjk8Xh8PmG+8cnJCfJVHwMV+O+fs8TXYP/lTX1lBDwfGX5g1LRzqQwREaGK5owokqIoTHXG2f+W9iQA2M9XD3V9WiYiIizEIHAMEEWRvDL5SWdvL+tJoEL/3dNd3prsu7hpgGybNzkjVsA/UxwvxHO/L+xFmY7nLi3RF2uu6xf5bRy7R2L37bWmf7LTjox2HzvxTLqMsoIEycotNtn84uDw5qqiZqH006kFjlfk1l3ZP7JT8yY1oLKerJmyMXGk59klxpIC9FDHeh7roKJj2nfIQH0U+fpzLo9E0qaz17rOHmNnP364uUoHIcBEFbsZ9x48xFIq4VnIt9IyHkPbYcWKeePs7Mba9u0mK1LlVijn6eYFB9NG7D84tZuIQEliwnLqhhb9LHt2zAwJSSws43L5ksyRs+fPnmpr1lmSAABcQqGTRHbYuy9ZJaVcEdVeQ+1Hj7Z3GDfUrJMkDpioimkP1q29xwLlhgzXlW7jNmdM5C9TffLtvaA
sfvkf+LnhIWyjUX1UBFdu06+vPmDqRKtOwjj+J/efmIhCVyZTgxH/+l1sHge62swdrtV6DJOI9WGf06YQ460eiwwFyfKXwCQ0zNSSzu/2itcaMUhDvPblUX3wAncO0mMymbpW63xLBZ3ATzwzqSeTydQxcryZQwk6g8p9vKKfXu/lPnkCD9e4X8RBWz0mk8nU7bvOt7jGsQSvqUaGw1yvRRYJuBJV5Lt+gJ7FwlsZZMO3+fPhxnnN6KXL/I6O3sA1TwXXfzun1GeJkQ6Tabj4QcnvLkoV2P47h/U0mX0lpcUaK5l5a6G5nvV634LaraLeQZFKi/iURQJgwj1N9AUGgJR8CoslAYDQMjUW6DXiRV089jiv26T5gwVN6mvCYNrb6whhAFTuM++XBT9WHSj/7d7F7mn9t57aO1FHSsCVMKm+c6fplrxy93zHbvg+fzxCmtN3ruon+/3tITL93tbNd9LoRXhNcILR6mYrVPJ/R29+U7SbP7rl4iBwRdv5Y1Wz77qdi+TVOlbfD9kRoTEkABCaxobSghUdEsFBAEQnYxNVAYsSVPDszL9xuIG9nXbjAv4I9VEOJiIYACp+7f0kq8IMwflyYcXqB7JOHrvs6v4sA9Fl5Bgz4awHFx5kNqMZtNVCqDhs3ThCqbI6qNyXuzdcTeD/zjK1RlrfCoUTeP5cKK+L7RizFrUvCPd0sOsO8Tcu+NV0ldWn+UpFyxsaCVI0kEmhYbkUACZpYMoUIGqU53vvdSGhO3hQozs0XMnGobckBoA4gXd8vpEAZLrPJhePnOH/uM3VqbeScMUBNoZCJR/u+KS2B9EDrmCzfvs4tcpxjGJ9cFt3OorzWwvV6sBa2zayEv+7TzJBbeBgQYppToiugwdr4fm+3r4F1UVfT4WQiSHlihY3MtMRZP5A+SHBCSQAJqxnaiDAxotyfB/5l+Dq5uadG1/vmKy1g7UMDoC44d53Y/KDj7hsDuyxzmN1nwZN8picmaU2wY987pvWLkQPmHSfFbtmalfa7lBpxPG17sF/7mfpyOTLjg4rvRPrdT5y4m4sc1h0vZHvGGtlwzzb3+dlPsj1suze4hmICXVzi054ScCzt4XVWkQ9/vmCj6EJJAAmpGNiIHCA5XwMiuAiAELDxEjQap0d+j6sDCSYPbs26fEkrUbbKBIAQMbfWDNzyTXhWUf2OKg2plMkVHR1ZDBeVGBYe1jTAwCAhLHzbieDStss4sZdWLfvdUGLqR4VfzrvMryXsbn1+GUnP+RRQBV+urnLaXRfY32jPg6L99/5XG84SAMQqiPmWCXumbviRkKZ4DM4MVeWzz+UPWT+8E6NHEdal+h5ke8Ci5BQd73uAp0IZQm31zhYGpv1s3c65JtGAiqJ8zm0dLy1ib6B5cgFO6+GNCn4jKGlyxSBkrDA6mv6uiuuLDzoEw8B4GqGhvKCTuNHB39kIwBC3shIUNp0fvzHyBJEqGt1baKPRNRktK0qAQBkdkKx1fbDf/ds7IfWCA1tDQJxo8Kj28+6VqTH3F2uVtI/zHkptzZt82kZmwZivTuw4jwx+9yVxdqJT44sWe3m6Tpp0ZUCI5fTDx9dXtMz4dL6GfM9P/388gKTsXA9cWBI6v65rtdia12GE3NlmaN7wRj3Y06Gggy5gi9Z1wFOiv/No5udxlqbGZsPdJi5eMPh/0Kyaxm8ABA7/vGxdbPtrS1NjM2tR05bcdg7PD3Jz3PFpCHmJuYjll6PF/CjOiDTPkXmkLhSN80OAspV9unESresEUf/3WJR8PqU6/I9JzZOnXM0TmPWkbtP/tvZv8B755yZe981IZJVTFNLFacKP0UkVe0q6hw++V+CwlgUAJBxJ8b1PFH3dTExA1NdAapG7K8JmRTGUOqk2NRZjJDuqJFa5z2i+YA4FDQhSgmXUVYSxcjcpBQWMmvQUcBjZeeweE0dljDhDgrykq1pDzKhPnH7+g8T1zzJLhc6mfVkx6abPY9NEGiE+QWoLJ9zT7s
4/TdOR/ZDZ3GMyn7rdd5k0YnzfxtIYQBg6zzt1t1tH84cvO1wenJjh+Fa4LK9V508gOYvn7+MOuk2Rfv7opETc2X5Ao/CMe6ezsaCBFMHAs9EJdH/rl/6z0vSbKbTEreV3VVFc8OfXfXcP/Pq9cm73FYP7PzDNJrzcsec5f8myQ9asuXsYA2RrKDL+/ZsmHVBUtFy5pLFMsfXXX5++UHc2MXMxlU1lRifSAKu1FmpdvUglt/5/xjTzs4xVo952xFD2eFXTnKnH76wqo8sDgB/LZzb7+ayR9f2Xxp7Y2EjFwa4YidFAmK+JX7jg1blT+pqu1R6aGg6CYCJmc3fNVP/h6ZRedweIvNeHd12M5YEBtPUQNA4TOVm5ZIIk5OTa3IoBD8tMjoPYQCIKnx153n+kNGNDa/DZeVlcfiWmZZFgUz99YJy7q8avsGvpD7NY+UTw+r/iPff7uPe6BL9X8CVbTdt8Y9afPtb+ZZYquDN/vUXjc/M6tacYShUxtP74bqj9yriZEp0dD4FuNzwbe4VggcArKO8HANDpWFvAliT7H84eigulxIWbkIvicv1WX3qIMxf7rgMymXPibm8zNGjwMH9uIupQA9SPdQ6nZ9803Xejlc8i/U3j06pWDQqWc/YZqbXYcqsc64L4OiltVbSGACgHJ+dG67HcWVGrd41p78kAKiprtuWFjLuWBwlw7SwUaNieFHdhjd67YpY2dmlCCNk5WVqaR4VvXrwVnnIQnUCFX35nEIBJma56liF4AEARGTlJHHEin/zPu3v7qqVv+dzebiwkOAeFpOQkxfDUElGegEFld1MHW8CFYUGfeEDAENn8PghA/+qfcmy137bEAAQakaGioLuiFhFLASYqLhY03p8VPB+n8vOT90HGrCfhbER+/2dxxl2jR02MDFxMQyguKjhVSUm7+AZ5NCkorViMNkBa3ZOD59/Pq48JhcVBx9Ze8L0orNe8wXQUjl8RYcxvTtiiBUWHEMCJm48wLJKVAYqysvnI0D8nOw8CqTLhcD56D5t3tnSSWfvuBr9nOzR4QXIc6lHwWj3E4ubLHgAqNESqKx7ew++zgPpUVOqW4kwScNpEwwv7Qq+uuPEsFurjEUBFby+75dPAaHYRbXSokWoqKsw0JfUu6fuOF6c5brFtglFoYpZLApATFy0di9B5ZZKDx4/UI0AbmRwOBdhDN0Bfasqi5+fz6IAIDcrh4JyzVOZ91dMWv9Od8s9dwcFQTWDi4mL4cAuZhUjUPr+xzpeA/dT4McyBECoGhspC5IbGRscWkgB4NKGJt0Ed3M8Lg8AhEWatCuT8+X88hX3pJ3OHJqQtiV4yb08qizkzsOkCXMb2ZcKiwgDIA6H83tt17wP223mXm3KohojlKeceby+108PzJik2eLdjsEzjkaUlqueE3V63RGLq6tMfyWxRlUY+rP26wMAlEUEh5chjKHTy7jqtXmxn+P4AIBLdehQaV4oy8lkiahZGak2ea5fLntytsvUaQRz9rHTS8x+QvC1topRaY9uvSmigKHO1KoZZobLa2hK40E5yXdvvF9kbC0BiM+najUkHMMBA0RlZ2R/116j4XF5AJiwsABNEBrjtm0DACCjg8IKKcA1zUyqrgDIxOiYUgSASUlX2jJQQWYWKadn2qPOmhEWFgZAnNLSKn8TrHkyNii0oD5FUxmhwSkkACasb6xXh6gJBgEAiGp8wyczH2928cgZcejcHKYY1tXBRunhtXSSF3n3fswsl8atmFD5O/rdxloh0+XXfWY1yZaFicl1/sWZuKiu4+5lgZP3fGCV1zkhxAAK1WPG+jnI+ODQfApwTVOTquMQ+bXct0soaGpWtkFMetCOx4N+8kaI/S0yJldIVhbLjI5O5/SSboYgFn5sZAwfAYCIgMEIExISwgDIwsiIZNKaSUib99UXexdUlvE1qRS0y7/0S2ZlZJIICPleFlrVmiSVF3LlsOed91FJbAkVbZOhsxbN6q9So1vBCQI
ayFNIZYaGJJOAKxqbala5PpUTFpJEAWDiGpqV5gai+9xLr+bW97gUQgCAVWsCArspKj0kqFzReiZ6AmNu2WFB0XwAILSMDeswqGCioiIAwOVyGzfkIlbQEeeNAT3WH61wxYuaOth2IQCATLjvHdbI/aJcLhcBJizc3Ft2mwohoaDaNFTkxX89eoShMXnnWms5HAAjOg3f5r7EtNH27UZDZYWGJFGAyxqbalRrlCFBXykAXMaij24zmDhRcfgp54Ve/InH7t7xGFvsuWDppS8/4xCooS+Kyy03s/N4gnw7FWeXz1GBUJu8xcVEHBW+OOnx/GsRl8tKeX921/lIUlRr0vbl/avWLTvksNO2z702XPB5/uCEEzP78RHnqa63aiQ3wETFRDFAZWX1aIIdFvSZD5iYgVm1wZQdFhDFQ4CJm/YxbnzXh3hlXASYULX+TdDLQcVhgRWKNhEcc8uNCAwrRQCEUh0RegCAdZCRxgEVsxoVIlIWf3XFsmtCs05XccUzdEbZap9z/8wnUx/d9l9i0q9hlx0qLioGwDrKNDwNRDm3Fw7f4Mdust1eYsCOx0cdWpUNrwq4qJycFI7zjJZ67LDt3BJxH+ywoM98hIkZmFZtlCjn9ctwHsIYGmMmWogCACoI8Fy7/1ESS7L3Cre1gwQYqusBFUeccl54ljfRw9PFVBqDxcfdMSeXBUuxk4emajfJQlFzTBVS11TDIYZEpSW1zbdUSUkpAsCEumiWR46i0qTYtE5D55gV+250uFTIpUBYVrvPjJ3Of9v3qKp4Ku2O+03GgMPmKmIEIabnsGVPUsTkUy8PHnk++B+bH4MiLtVRmsCAW1zMBRA8PeZFBoaVImD0MDWoum4qDngRwEZAKA2fOFAWA+Al3t26wSssh1SfsmvfDL26egGqmMVGgHeUrbrbVJDmueEBYZxyRRsLjrmNCwzOowAwcQOTHnV16bh8506iWHBxTk4pQP2TVjLz6bZFB5IH7L9Y3RVPaI600z8RHcKlcp7d8nXta9ug+42fm5NPYYRiJ4FmxWpg8rZ77pkWNd63+h1haeXWKnjgfDnvuupmutrEI25zmC2z/Z0XFRhWgoDQNNCt8q7IBO8b/hxgqI5dNUtXCIBK/W/rGWrBqTOxK4ZsP3l3yoD5jQ/MQuxPp12cvLgTKgQPAJi0qctxD+xvlwXL8JNuk7Ua/2hUhZ/pu/SJbsNtdU7HRvC+xX4tBS3JqueSqbFxbApw6b4j+5U3Nf7H5y9ZzG3Omwa5biJLCwq5otLSAuxvwI+JiCoIDVi0s8czt2FSAELMoYM1zhyPf/sqjGfT74e6RTt1lsdRZm52HgWSAu1kCUGhuRQQf+n3rNJLUpk+/77Ip3AFG9eFvSUBSoMObXtnsffy6Av2s49e8p+4x1rgbBxQYU4OF2FiSkpVx0AB9yVjAoILKABMXF+woqmMoKBkEgATYhoLjtADAACGuqYaAWR6anq9sUOcZJ/tC9Y8lJ5/aKN1zdgfXGWEg4U4BoBYr24+Sm/QMkBmpaZxEd5JS7MxU1phaeUmzr9VVVVVVZU6tJ592NUgM59scTkYJNR/nceavi3VLZEJQaE5FAD17XNM5QyOn3TjH69IvrjBwv2r+3bEALjh1+53nD7PAA8NiCLFOqvINXqUR+xPZ53/9uJNPlrDLYd1MHE+7u5QeMTR9Vpc4yf5FfYkqnK4JzSnrp3NFIVivwvXY6uuGFFx8LXbUSQu23uZ63C57/5HgJLXF868T87OKyoDAuOVlnC4tdszo1svS2VJOR2DrhV6wCQ7SOCAOMXsao2WUNPswsDI9G91aILKCQlOpAConOgvlRmzqKynB469LxHWnLJnq60yDijL52K89aIRyqlBQZnQSaVznSspMj0lnQRGV22Nqk32x+lkzmf/T+nskoK4hw+TSQDARUtinz9hdBCT1TIz/EsUUGF8QFhSUWlRqu/NCD4AgBA/+e2T57LiHboYmXatpTJCTV9PBo9
Kjokrhe6S1Y9RebEBH+PSkr8EPb//KDiNA4RarP/rSNn+ugqVJSrLCH/z7kNoPgMDQKgs6Nw+rw52WtKi0l0M9FUFZAIAAF78l68kLtnTSKslJrWtGsQK8Vi87n4ec/7JvePUW6xXqli1Y6ISvBe71p2XWWuvyY287bbd/b1Y35WH9s6scA0SKnbrnTVE8u/f9SuSHjiqb2ODaDhfLi118qKme3o6GtTutrEOJi6e7uC02HGV0JkDY+uN70TFicEhX/NSfWP4AMCPe3njgUgXua7G5t2kxQ2dPY+Qy9Z4uc2dl7Fojq2xujQUfA24c/LYzTRl61Vue8arfW8+DAObgQp3/vOYO8yjSjEIUZku+pb9BttNHtdXVRQAAFdxOPS8quOXmxifQgKhoa1ZrZCYjF5PVfxtUlxcDmUuwP9cEhYYxUMYQ5Lx8fBqj05bpukx4p8c27Hfh2foeNTN2bK8ExQ3W7heVoWKOnQ/BtNZPKIOvxkAYsXHZlCEkoFBtQ0vlQojP19a5Xw2GeE4jmPCYmIAiPXh9Ka3fL6Q1dbHx8eKotQ7W532fyRxHMcxXFRMDICMur599WUe9Fh865qjVK07C+lZmkneePzp42eerVm1Ryc/X1g2/9RXCsNxDADHcUSlPPpnHVJ+dnCoVMU5Zf7ufy/xLkQYhhEEAQBpT91WP0FIyGDFnYuzuwhec4R9YoOEdT/jPzmr08/AS7y5Zunpz7Ij9x9xFpgPqbmoaJSiVmsuTkg6sHPh8P1logrMvmN3Xps6TOdHFBQhr6UFVPpV73elCvb2VpL1XbIKVGFylprTieWTdetK5otJm7qc8Ozo5ptYQHVVqGf2QKXc27n8fCrBICSkOwKV//rY5hd8uQmnHqwxYQCh0Gf5+bs29y9d9L64YfaO/DJMQrmHxdCFx3aP7qNe/dYKPXTkiIysqsMyIjl5CQEPEgIe3rgz79jZpWa1eieU//K2bz7IDZ43roYgCU0Lc6VT8dGhEaVTO9VyovIiA0NLEDB6Op1ehk5vXWl/soQh083Sbu35mXbGit8lhEmqdpeEsg93HiQJGa0ZISjuvZyyyNAoHiZn1Uen+kygpRJ1IIQQVfRombmO3shDkfxah0h+s6d14cefnNBT12L5I0GJdNowVK7fVltDXcvZXtECkxk1I1z/7QP0mEz9yWeTGkzxwk84PVFfb9g/oVyKnRqfyv7zXkppxLGJZrrGY7Y/ji/iVRSfInmlhelf3l3fPa2Pno6O/ljPmJqNmyp6t31oT6NR21/nCnjmMv/tA/V0+29+w6l1iB/tMVqPydQbvv9jLcHUpPjF2j56pn/fyqb4uQnxuYLO5wXvtdHTG7jdv8adWnRzMSbVb/yITpD45FFUTb8IhhPNPRqRCY8ffUaKNg59mt8/1YopjTy1ZOX1NLUp+w7M7N7C8xsyISg4hwSik5FRg9ujydj79yKpbiPteuKJ1zYd8v/ztjoW+509F1GqaL9yhY2G1Pc0BRjOEO2grG05fs3x3WOVcV5cQHBuNfs/lfV0x6Z7YuMPnForcPu3sMk4B008x9fnQ0mNIxXrJlzKwFi7gbUpKvS7+6JAsq/9IFn2a7dNt5IEmLrKQh49TcO6jhptUjNIoIHn/kXEek2bqi+UfO/G++KWvREAJ+SmdwzDcNZcy+aKO/sDIFPvr3dxDxMftNFjVe/GZB/7Jaic4KCvFGAS+sbdG/TAk+mp6dDZyqpLus9xf51ZwwXGhrZmELeIVQYgJSMt2G4gqqwsgwHi88kqmi/9dGr5vjT7o14brJXq0C1De/zMvlJ5z64/q/EVAXZoQCQPYUJ6dSSiqwKVl5ZWQuj3NYdwr6vsMVMNahURFfrd9MmQGuA4vVawREsnESE0Jq+Yqp53/8TNls2YTqV6H7+V2XXysnFq7cZ8hwo/HHDe9IStv/DI7sYlGGgcvIyIN+FptTew87L8/CJ4CBiaTK2GA6qFe02aa0G83LtoT3i/dYuaEEf
SWsA69hlm0QEl/HfM+6uAykh56HnjCylpPLRfpV+YzPDZvPGd1b5ji0w7YgBk/Lkle9/VDibDleyWz9fnvTp1IaLKdcl8/5fBJQjwv7prNThRJdSGz7Tvlnhh6corovM3jq5tDeTHXD75pLTnHOehcrWv1dC6oRlgh7qNNbZ0vJ7S4CLlZyEz7rr0MRrtFtqakpu2MJyYC3OtdA1GbHqR3azZU8n0644Woz2iq70s1uMVFj/S7DKZTCZTf+r5lkvb2kogc/095gzQ1+01crHbvy/C4r9lZGSkJkS8unlk+WhzvZ59px94lfm9nihWyJF58z2CK41JVO5t56nHaq32y+HGnZ9lbjLpRHQZQogbUJ5eugp6tm4RPy8XXvy5Gb16TfeK4wk6+v/5RiUv6Zbr7AOsGWePzxL8AeVfght/edGsM2JLzx4Y23JOqtYFlflkw4wV90p6bzh/eKJGM35bjsp84DphQ/r0W5fnVQ2i4WWE+8cW4QSBY+UftiZJopNB7+6tNjip+UAlKQGP79x98CIwOjmrkEMSIlJyKtom1iPs7IdbdZWqGGH5ybeWzdgagP0lL4FjgCiSz+OwcotMtvu52wr2WlCZzzfPXf950DGvpYbc6HeRuRijvHYpkuSTmLyOlZ7Sz83d2B+PznV62G3b2a2DBa8vfrovaSKlcdeXDLXd5CvIlPkrUEXvdtrZLLzc4ibr1gNVFHxogomuyYRDwc3roSCz/XaMNtYxnHHxW1sfwZuZsucrzXSYNdEb+k+YwJG2Av63RxvsbFy8m7O2yUyfFUNtV99PrvvG/89vUZPsnDwkoyDZvAtukp2TR8ko1A4PaKPwvl5fMnP7G7GR+y/ssKnLTtR0yNygc5vXe/imcEV6b/Y5MaFp8fE0PwkqzcvmSSk2Y2gnn5VVSCjICQ5aA4B6cmO1AISEvMKfctVWCpX7cqfzrlek+WqPLc0leIoV9+LqqRMXHkblkQgwCUvbwQ3vVqBpHjAxWcXmNW8ypBTlGjijWe9H06KUfDq5ZPXN9K4zPA9Mq5XwoUlukaojAAACDklEQVRQnIL0xC+R4cHv/Xxfvv+cxfmeGgLv0M9uYIs7/Wh+J7Tm/xT4Kd7rXI6GFjM6S2X8t3PNrbp1ib5vJaNQpQ2ZIkmS5HLYbFZhfm5OTk5+Mbd2BhgAXHaQXaOD42n+TGjN/xGg/Lf/OG97mkkiIFODHqW20G0IZZtRFo1NLE7zh0Jr/g8A5T3asfZaAhfDsAYSK9UFVjVZWJ3/IbqMsKsZqEnT5vh/2u1paGh+P7SBloamfUFrnoamfUFrnoamfUFrnoamfUFrnoamfUFrvq2DeFw+7Zqh+QGt+TYNmXBpTn+zQRtf1szDRNN+oTXfpiFzM3NxZWMTTTrShuY7dEwODU37gh7naWjaF7Tm2yglkZeXTRptO2zsmtsJTf8mH00bhtZ8WwTlP999IHHM0cvL9VMeHK/40hgNDQDQmm+TkF9vXSsZs7CP1OcPYaWMzioCPoxG036h99K2QXBZ6xUr/+pY6u/9OF2k14IhyrTmaX5Aa74NgnXU6A6o6LH38zyJvvaD5Om8NzRVoEeANgrKfe79urjjQLv+0ryshOQi2iNLUwGt+bYJlf74zgeO4lB7K7GcBzt2Ps0R8BFDmvYJrfm2CZWZms6TMOttUPzuxEOZaWO6tpf0/zQNQsfhtVG4sf+uWu6VKt+VOXTx6olMCXpNT1MBrXkamvYFPbenoWlf0JqnoWlf/A/WPRhWrIQzwAAAAABJRU5ErkJggg==" } }, "cell_type": "markdown", "id": "4fda7e65", "metadata": {}, "source": [ "## 3. La entropia de Shannon\n", "### 3.1. 
Definition of entropy\n", "Entropy is a measure of the disorder of a dataset, and is defined as:\n", "![entropia01.png](attachment:entropia01.png)\n", "Here $p_i$ is the relative probability of property i appearing in the dataset. For example, $p_1$ could be the probability of having lung cancer and $p_2$ the probability of not having it." ] }, { "cell_type": "markdown", "id": "10dfb9ed", "metadata": {}, "source": [ "### 3.2. Computing the entropy\n", "This value measures the disorder of the target variable. For a binary variable such as Survived it ranges from 0 to 1. A value of 0 means total order: either everyone survived or everyone died, so all the values are the same. The closer it gets to 1, the greater the disorder. A value of 1 would mean that 50% of the passengers died and the other 50% survived (keep in mind that entropy is not a linear function).
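
These limiting cases can be checked numerically. The sketch below is only an illustration: the helper `shannon_entropy` is a hypothetical name, and the probability lists are made-up distributions rather than values taken from the Titanic data.

```python
import numpy as np

def shannon_entropy(probabilities):
    # Shannon entropy in bits; zero-probability terms contribute nothing
    p = np.asarray(probabilities, dtype=float)
    p = p[p > 0]
    entropy = -(p * np.log2(p)).sum()
    return float(entropy) + 0.0  # adding 0.0 turns a possible -0.0 into 0.0

h_certain = shannon_entropy([1.0, 0.0])  # everyone shares the same outcome
h_even = shannon_entropy([0.5, 0.5])     # outcomes split 50/50
h_skewed = shannon_entropy([0.9, 0.1])   # mostly one outcome
h_four = shannon_entropy([0.25] * 4)     # four equally likely values

print(h_certain, h_even, round(h_skewed, 3), h_four)
```

The two-outcome cases stay between 0 and 1, and the 90/10 split lands well below the midpoint between them, which illustrates that entropy is not linear in the probabilities. With four equally likely values the entropy reaches 2 bits, so the 0 to 1 range applies to binary variables only.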
\n", "Fuente: https://www.ellaberintodefalken.com/2018/09/seleccion-atributos-relevantes-entropia-shannon.html" ] }, { "cell_type": "code", "execution_count": 55, "id": "36296b85", "metadata": {}, "outputs": [], "source": [ "def calcularEntropia(data,variable):\n", " #serie_numerica de la variable = pd.Series(lista_numerica)\n", " serie_Variable = data.iloc[:,variable]\n", " # Calcular frecuencia de cada valor\n", " frecuencia = serie_Variable.value_counts(normalize=True)\n", " # Calcular entropía de Shannon\n", " entropia = -sum(frecuencia * np.log2(frecuencia))\n", " return entropia" ] }, { "cell_type": "code", "execution_count": 56, "id": "3a0e5796", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "entropia(Survived)= 0.9076514058796559\n" ] } ], "source": [ "print(\"entropia(Survived)=\",calcularEntropia(df,3))" ] }, { "attachments": { "entropia02.png": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAZMAAABUCAIAAADI5E2OAAAAA3NCSVQICAjb4U/gAAAAGXRFWHRTb2Z0d2FyZQBnbm9tZS1zY3JlZW5zaG907wO/PgAAIABJREFUeJztnWdUFMnXxm93D0MWJYqKCoo45GjOEhSUYFjjGtasmFkT5rS6Kuia1uy6+jevCUUxsJhQchAlKkFBkDzMMMxMd70fBhSkhziswNu/czwemKarpur2U1W3bt3GEELAwMDA0KLAf3QFGBgYGOoNo1wMDAwtD0a5GBgYWh6McjEwMLQ8WD+6AgyyBxWHHd/g6/c2T9tty95ftIKP+FyO4QoLv4gMxq/bNsNaFfvRFWRgaCwYs7fY6iiL3Dv/Vq/9M1MWuPrkdDM36D1jrae9vlL+1bnDt+b8cuWfJRziR1eRgaGRMKvFVocw6l5UD5c+yrmfcyggu0ze4uWgr4wBxmbLYaiosIj60RVkYGg8jHK1Ogj9cRvmWBPp4RE5YOAxd4SuZIYlfBP1TixnYKTPTLgYWgGMcrU6CG0jI028ICw0mdKwtutWLlSiuGcvc3HjQf20mC5naAUwZtw6KY0MjREpWdqZsiU/CyP9AzJZ5k72HTHupw85ZT+2dgwMjYVRrlaJOC40gs8y6WWjLPlZEHrvYbac9SinDijx7/XHo4U/tnoMDI2FUa7WCJkWEfEF69bLTlPSv4iX+akAM+xj1/bTrT8jbKYMUv3BFWRgaCSMcrVGqJzs/DZWro4G5U4uTH349CnmxbdWzdzy1mHjPFP5H1s9BoZGw8RzMTAwtDyYORcDA0PLg1EuBgaGlgejXAwMDC0PRrkYGBhaHoxyMTAwtDwY5WJgYGh5MPm5Wgroy7X5LttfCUgKwXehLDVGttQ37AVXd/G597tjGyaJF0NzhlGulgKmZmypjz+PFVYIEabQc+zyn+nzBCKEANBXKAoBokiSpEixUMDnFhXk5+ZkfcpI/fAhs0BAVpE2quDxRb8s+8kdmOk4QzOGiURtQYhS/pozeU8ItzzDFkZ0dD9wefsw9UZMj8iSrITIF4EBAQ8ev0opFEtsgT
Ccf+X6Eg4zqjE0X4jNmzf/6Dow1BFC3cxW/e3doNRSicIgbmJIht5IJyOVBmsXzlbV6mLca9joiZNczNR4afFJX0oRVZRBWf40tLOczGrOwCBjGOVqUeBtevbqlOb/MJFXrl381JCktvbO5m0bu7bD5NS6WA0fM9qWnRIclpaTxus6ZmRPJcbZxdBM+a+cGVTB28B/E4qbZGWKuCkR8fkyTVJM5UQ9DE4VyPKWMgLXcvDeNq4Lq0JTqOJg37Wn42WUcIvQ6jX38IXfR3cqfX75Vhopm5v+P6Is43VAWGbTtJvoc2xkRnM0yaYBFb4LfJrIlaoYUpRLts4vMvvxlmmLLqRj7CYZw8ueH5y3/U6mLKULY/NDd8+Y+2eU9Jb7YWBqA7x2TO8hX9GWiB9zdO2hCJ6s7s/qMGKzzzyjpGtXY1p5Gi8Z+3hL355dOG17MF+hSaYDZMrVjcuOh4ma4t7NEYxNxh+bO2PX8zz6B5vOCysI8rZfdLMQI1gsAgMAQIiiKApRSGnk/qA99gpk3AGPSSc/YCw5FovFwjEAQKRIJBKLid4bAo7/pFlZoFDhK58F60Jt9174uYcCTWlkzL7RU89+YsnLy0lKA0SRFEVhetNOX19mRgDwHq92WvmwTF6eLccicAwQRZGiMs2Jp28sN5OkcUHl/yQ/FPuvcFz9r1hRQV6OwHHJH1CkSCDqsfjGXzP0KtkVGX943KRjqbiCgiKbhQESC/ml+PA9gXsc2vZbtX/+oqmeSxVPHZ1u1Nyywihbe/62IHzqgSg+AgBAZYl/rdvb+9LGAW1lMzQomMxaNylg6cWgeZYOrTU8gkw7PdV1XwyFswiJCQOiKIqiEMJ7Lr5+aV53vPjuMvs1TwQEi8WSk9gmosRikUhE6Uw+7b/WtsqzI/pw7ddFx6mZf68dRLdjIni63n7R7RI5eXm58pRpiKJIiiJMl14/M10PLw96CQN5tlyFlZNioaj7wusVJouqzifId4fGTjqRzlZUYLMIHMcAIQqRYqGA7fTH420DKvkoEff+CodVgUJ5eQV5NgtDYqGgVKQ/7/Kl+YbN+Z0ESqbzfNd9mLxygeLR00utq7ly6fxcWJtOxrYDHYaZo7hn7/JEJFKznbl25cwxrm7jR/bu1EYOMAXt7tb97R36qr5/FPGxtEzE6uHh5TV7nKvrWJeB3dXlKxWCch9umueT6bzXZ0p3edpnAGNrdLXsM6ivWdvsiIjUojKhUKzCGTVzzswpLnYdVAgAwJW1dJW/RL1MyOGXCuX1ejm5ubu7eYxzstNVwQEAxEn3z8TqTvCwVMMAADCkpNPTtt9AOwPRuxexn7h8vgAMRi9dMvMnt6Gm2oqVK4HJtdPVJZNfRGVwBaBpNXLchJ8mTnUb0KWtHAZ4W2PbdpF//n690HpUv470df9xsDQtbJSj77zIKJNYM1UUH5Jl4GzfXVk2FWW17/PTlKH6igTezL64zMDYmgZWfYc7DNLJfh6WXiqm8K4uK9csnOjm5uE2xFRLEcMwNT3TXoMdh3TnhT6PzxcISfX+c9csnuru6jbWwaa9SuWnvuzdiUUr7+kuP7pukDrtjAtT1Da07juoD4ed9jrmE69MKCLVrX+aPWf6REdzbSUMACNUtXWIjNDQlHy+QKzSvb+zm7u7+7gxw021JCaLcl9f8eP3mz6ss6RgjNW2s2Wfwf3M1LJCXyfn8filwra9f1k2b/KEkX301CqrKoYra3dqkxsZnJBbKlTQH+w+YcKEiRMcLXWUmnngC6ZsYNc57a/fzqQYOg83+N7piqQiCt0x3JTD4Zj0WxdYSneBOPXURDMOh2NsNfdaLkV3BZX3wGuQaf8V/vm0H39XXqyPiymHw+GYDFwXWPLdZ+/PTLGyHLHyUlxx9Tvx7y6xnXI2naxWevbl2VbGHA7H2HziqVQxfamCd0d+srJ1876RyKt+Z/H705OtzEftiaD9/j8aMuvW4oEmnK+Y9Jt/JV3K12SQAvXl8m
wLYw6HYybNRkr/XdfPhMPhmDrsChfR3kP47uh4K4sxh+LpP64C7+n6wSYcDodjOnJPlLDqZ/xoHzczW4/Nd9/T2Jv47QG3geufCat9IAzb6SB5bvqvC+TTl0rl3vcabD541pHg7BZmIGT2Pwt7mw71Diz87vGUrrpUZuybHBIAY5vZmNMul/hvopJIACAMba1plymit38feZDffeIc+3Z1GLlZHDc3YzkMgMp7dPPfwm8zY1TwYveSg5mDt5zYPcG4Hi9oxrSGOtkqYABI/PZx4Cea5TLKf7Zr5YkCxx3Ht7ob0mykEfrjZo9QS/2fz5X0ZuitxtuP3rzJtQPx1Vlf8HSP999J/288ITKhLC76rRgBELrWVh3pFk9kSmRMMQWAq1r3og1xQ3kPjp57Jz901uQedYiAU+ozxqUTAQBkxt2boZUc7uTnextXnBeP3Xtig7M+nVsFAb1XTs7caXgHAgCowmePQktprhAmX1i19WmnRYcPLeij3ZwXiDTg2i5zxup9ue17Nq6qYUtXLl5sZCIJAEQ36/Jl2HeI3kbECiR9bqNH0x6o8NGpy8m4hZtrXXoUAIiuoz1s5DEAVPLsZkBOudIIEs55rb6rvuDQTtdOUlpdmqMV0xwyoly63jx88vF76RK9v7R27b128303O2pLawfVfh4jdEVRFy5ENcddHUx92Nodk/TlvjrrS8L/WPtnLJ31MtBCvo+KKaYAMGUL6550ZkrlxkRnUAAY28zOgtZPm3jpVGCxhv2YoXUZngHkzN1cDVkAQH4JuPmMK/kl4kYcXLI5tOfaw+sGa0izRWnbCXJmjuXSlf80oJp0ocKXe5f7Zgzc5jPbhK7+zR62mYerEaRcPRdUJTRBqnJ91SVNSys6XQIyLTIqjwLAVCxs6cYilB9451kRYWI/vFNdl9O4jqNHfxUMAAlCb/l/JAHILP+Niw/ljvzdd5axorQ/k75BhGkMHtFLUSJdAY8zKs+bUNGrvcv3ve+/dV/NHSpv6ThEBz753whtjtIFmGrv5TtncRS+apfg3cm1vq+b4ZZos4TKjYrKoAAwORMbM1ozEMREvBUhAKK7jRXdAC5+c+duMtVugKOdVPv8DqL7aDdzNgZAFf5783E+AhC+v/zrsous6X/sHtulIdG/cmaO9p0k0hUUEFJFukSpV9asvtVmjs8m6aNzc4fQt7c3xAsCbwZWWohJVS4yNUKiS0pWdsZ0rYkKIsLfkwAY29SWbixCuYH3X/Hxrr171+MAHKY+1GNoOxwACWNu3k4sCP9j8abQnusOrR7QwBMumPpgp95KGAASxz149E26xGlX16y6qfSL72YnnVqqJ2fa104V8p89CmuW0gWgaD7/t8Xf9l6Q6MPF9bsC81uJdpFvDk2asPVJdk2rdcSNOTnPfW1A/cMFBdHluqRvY6VBZ2Lid2GRfARA6FjZdKYZwMUx9wMyKCWbfpZ1n8/gnUa691bEABAv+Kb/p5zA7Z5704fs+mOBeUO3V1imTvZ6BABQBf8GvOZX/BoVv/ZZtie57xafuaZ11dXmCNG1dx9dnB/y6EXRtx6WFs9VGB35ngTA5IxtLGi/tCA6LFaIAAgDGyu6aTIvMjiqDJQ5ZvV7G7xKP3dHbQIAyJSra6YvvcSe8ccuD72Gn6DD2g2qkK63AY8k3ipU/Npn2e8Jdht959WlQ9kc0x4ElR8RltIMfV0AAMA2nLbTa2C7ir5E5Kdbm7fc+SzT2NwqlL2/scajr7XdILcF+wMzSUD8ZP/9y8YPtTG36Dtq3o6LEXkyayqCM26m4eu1szYFZNHfE3Ejjy5ccB6f8MugegdwiOMjYvgIANewtqG1UzI9PCKHBMCULGkXFuT7V68/k4S+CUepHsViWg7uA1VxAFQWftZz1ppAvZWH1g/VrHWEly7MLGMnhy4SX1fQg3LpEqddW/PrdYWZPptH1DY6/5cgbvzVzdNG9O/Vz2nqhusJVRe3ZPaL0zu3HwnKqmq7LE
MTjjzwo0Ir+bqkfKWymLA3IgSAd7a0pG1RcXx4NA8BEJpWVl3pxqKU6Dg+Iroa6tdz+qtg4+6iRwAA+eV9Sb9tB+ab1cEmahhrsXYDnfooYwBI/O7BwzRS0qHX2DN8t7q0r5Oo4hrdu6njVEZsXPOdxxB647ZuGKHz9ftQuY93el/8IG6SwsreHPvVN8f58OXNfQqfnVi5YtexDVN+OZxsMOOP2wHXdwwuvLnjl+m7X8rqvATR3nHL8c2m4d6/rL/38fsvhLiRRxd4XlZacMx3kmG9vThkRkTkZxIAUzC3pl9YFEWFp5QP4OZ0CwtuXOwHElM26N6hXuMzpjbY3V4dB0BkZgpy27tnogG71j+qMWyWxXG0L5eup/7BJYCKX+9bujvedpPvfPP6iGpTQ2ZcX73wXNnoned2OaHof7auPZ1QaUASBB9Z53P+4jn/+O+2mRS7GerhVNGb2G/nOuhnM+KEsCguBQBk8rFxZsekVwRTtLA1oelzxPvwPpvCWDq69d7LkDMZPcrwr0PxYkACChTqMIzW/IhgbQeO6Kvy5CEXiRMe3I8wLt6x653NlgsLLJTrWiNcu702jnI/pn0kQau26R8SFGQXlFL1fGwxXLGdTru6fFlp4NpO67e9frPwSrok4QNV/MrX+7TNmbk9ZRxGi7hBf11nTT39i3XXxBdtMfQl5n/HhT8fOLdqgDoOAB0Xzhp0bfn9S3vPj7260Eg2G1lEx5HbTyDv2RtnrUOndrp0YlXUJPLoQs9LSvOO759q1ADnMyqOCk8iAQDxH/7a2/jXGiqgb2tFN4CTaSmpYoRr6bav76pAqY+rk+6tC59IBGWAySBakOjp6KB/6lgySRU99w+KzfX79Rp72oltzroN6gIR90suV1TfsQdjt9HSVKmhKaiMy5sO5I8/7jueg7++K0JInJr0XghG5SsfcXzw63wKWEZmnO80BdfW1SYg8WPqRzGUh8/SFkNlRUZmkQCYot2cndPNv90FSXQfkflPD2+9lkQCi2NrQSfpVF5OHokwDQ2NenscxZlx8fkIA0BU0dNbjwsc3BuTxQUAMLUBTn1VHgVwkTjh1IJlmMG0E9tc6jNG4hpa6jhQXzKzRbUmNBNHHZw07Ww6WVOnYxj27b/yH4jOM/66tdKiMYllsLYDvHZOj5p1OkESnYr40UfXHra9sMJaloMuKn5690V7h4VdCVSc8C6DAkyx76oj5bIFACCvrqGCI27K8+DM+UZ6slqmsDo57zgJ3rM3zloHp3a6dGIhbsSRBYsvK807fmBaz4Y5ccpiwyUOj86ua70c23+raoWZi+KvbDn6vIjC1S2t6ReTeTl5FOAamvU2UUFqXHIJAgAgs+7ffrWs1+DGdhJh5ORgcDI5iaSKHm+a+Vh14Nbziywb5jhDuX6rRq4P4tfZiCv+Uxq8zf+g9OdVHHv5QqLd4v095UEc/Sw4lwJC36j719kmmR4SmkkC0cXSQus7u8GUNTQVMcT/nFVIgWTtSxugUhwZliAGAJax/XiHYR2rW1/Zs6CtCACIzlaWtFsWiFvMRYApKCnWz3RRYfCexTveGA2z4D2K4iFe8K0Hn10n6dZyk1qOn2FtBozop/rwQTFCQtUBO3w9reqXFQaTV1LEMUSVFPMRKNb8pyyrX+/H1jB6Ny3K1p67FkRMPRDJqzgVdNZ7X+/L6/vL7ggPlVeqZj9+WGcChHHhMUKEsUyGDKxsA+KCAi4FAHk5uRTITLmginghn7Gpvy2/ojjvWINlC4BMCossogDwdr1Hj7PvX325RmVknOFTAJiCuQ3dwgJQWXGJEAGmWE8zJ7MfbFp8OMdmiNHzwAQhlf/41rNVg5xq6yNp8VwVEIaOjt2PJyWQSIj05+zbNoo2PK0uYJoeR8M8GvjH0kHFedBz2uRBbTAQRt19kEFiLLMRThXvYQdUEB6WTALe1sK6e7Wa44pKijjwSrglCHQkv6EpQfgmNLoMARB61lbt6S4gk8Ilfa
5maVO9EAAAEAlFAMCWr335XglBwl8rvO6oLTi8f/usYe1wAFQWcete41MWYAoqbdgYABAmU5e51L9DMba8HAAqFQiaraOrAnmjWTtX9lf76qwXp1/Z+NsTGTroCINxW7eOMyCATA6LKqIA72pnU9kBTKbGJ5YiAExVrR5Bw3WE1cl5x8kt5pHrJsz+n9ysPw9M5zR8y4zKjIj4SAJg8ua2prS6xI0qH8CNbCzoxzqJlWP1MnPEjTi4eMPrHmuP+Gyc0lsRA6CKn918nNf4LsJUVFVwAMDbOi2aX/2g3w8HUx/mtW+urRKAIOzuwywSkzMfOeKbk7w0MiSmDGFsUxszGgcHm80GQILSrx59usV7UlhkYU26RH2ODM8gATC2ubWplD4jWAQAIKru+1tk9oNNiw/lOv/u+wtHUbW/h6MOAYBEcbf9EhsrXeSHqOgCCoBob2UlLZq1RhAFAIA1O1ugg+jy07b1Dt+m27gcgdXX7VYHqOzIiHQScA1r226VmpTKjYpIowAwJYNu9fNa17HY4rTYlCJl9bbCj3FJuY04LoC4UaHxEl2ytaTVWNGbsBgBAiA6WFlJmfXjLBwAgKp7+wrfX1m17CIx/cDuMV3YX7cY+a9u3m98rhN+bGSCGABjm0o59NJcEIQHBH6hMDkzR/tvCzpxfHgUDwFhaEsb+E4hBAAYfP2oeo9QWRFhEl0ytTGlbQBeVFi8GAAIQ2tLKVNcTEFBHgCEQmHd+hRxw/7w3BDS0/tweeiWgq2HSxcCAMj3fjejGpdtBeVHR6VRAJiypQ1tnHRtUEKhCABjs5smS4+swXWcN27x6ERgALh6/1WHNgyrfcO93vCiwt6JAVO0sKsyePGiQt6KEGBKtgOsZR5CROW/3Dtn+T1NzzP3ru0ZkLxzzro7GQ0VL2FsqESXdK0saXWJTA6LLKAAcFULGyN6q8EUFBQwACQUltXJzKncwO2ee9IG//bHAgtlDABTGzzGQQMHQMLIW34fGjlAi+MjYktRDYdemgviuKcv8iggjIYM+tbyZHp4ZDYJRHv6uDkkKhMiwOS+zW6rdQkqKR+LCEMb+gYQxoZGlSIAQkdKdD0AYG3aqeGASrgldenSspSLXssvyc04WSl0i2U82qXH2YPvxOSn+zdeLbUZ1HAPZmlMWJwQASZvYksfm1YLiF9cQiGMUGtX+/pHHLlnVG0eehowosvM83e8GuWhr4xcW001OSzXcLrP3sndm2L8FcWFRpUiYPW0rbJHWxLyJISHgNAZOaEiPT6ZHXzu6IWnSZ/zhB1Gea0eknly/53UAr7CwJW7F9jVw/9G5b/cO3e5n4anxCXfc/3JXTvmrJntjZ/c6VL/gD8yKSyigALAVSxt6U/9ZEeEp5Hl0fXSWpCl1lYFh+ISLg9BbQmGEC/66JI1gXpef20Y9nVGrNTbzanDzfMfSXH87Vtxs1aYN9wAyPSIyBwSgNCysqZ79usByr2xcOT6IF699xaVh2x/cNijlu0KKvvt2xwSCD0b628LIFQUGZZEAtbG0o72cChVwuUhwNuqf3WEVLtKGBMSJZDokjX9qZ/k0PB8CgBTspA+g8E1O+gqYOElubmlADVvL5LZD7cu2pc+ZO/fVUO3iG6jXM2PxUcIqdxH/wSuHOhSt2Nh1RG/C4ksQQAsAzurBm1TUrlf8ijA2+tq125XLItF5/0nltZ7dYYpaXaUlWyRGbe8l5+Mb+e069CKXk2TXot8HxaZRwHR0dyskpOLyva//KSAwrUcVy7srwIAgIqe717zT/e1v5/uIZ/850/jFo35Z+C6E97m+z23nfwr6Ge70ap1K4/KD943b7mf+qJvO4mE9tD1J3fvmLN6jjd2codzp3q1HpUVLllYyNF7VQB4kaHvxABA6FtbSrU8XLejLg6F+V/yEHSssUBR6vXVS88KJxzdO6FK6Bbb0tW5y8Xj70ky/c6NkIXm/Rp6tBAVhIcmkwCYkpUtbWxaPcA0XXbdsS2u/3
SWrda+9icM8Ut4CADX0qm0gyiIDo0VIkxeyuFQVJSbK0SYoo7O18nU9/1NJoaEF1IAmJI5vS5Rn8PC0kkATI5jXcMMhtW1W2cC4rM+ZZHQRvoIIEj337Vk/T21OZc2VIshxjs5e/Q5FPmUh7hPr93PGjmpYe/RIj+EhGaTAISmlTVdzGztiLM+faYwOYMedTkOgCtpdvqRoX+oMHjP4q2P+ZZLz2yvV+xHfaByI8JTKQDIjU/IpSwl2zhUzsN9R4L57G5Tdm1xkfyKTLxwqmD8vvE9lDAgKZKiyrRGzHLRSF//+qO8+aS6xp2jwte+85fd1VhywneKUWWTw7WGeJ/YtX32mjnrsZM7RtZ96wUVR9SiS2XRIZGlCADXsKzBaghdg66KWFzOx8wykB7wiXgJV9cv2RlquPb6Utvvp+0sjqu78RnfWBH55cG1f5f1HdHAhR4/4nWMEAHGNm7YwqIqbLX2emqNvgs9eFsNdQIDcVFhRbAy4sWcPv64iAKiu60NbXeQWRlZJLD0exh8FeVycSJz3716k8XjFybfu5dOAgCuwE96HMBqo6huaGfZUQFQUUpIVFpxafGnwGuxYgAAOXH6i4DH6kptuljZ6ldbRhGdzU3b4W/TE5NLwUil6mdUflJIdHJmekLYY7/74ZkCIDonvXoWpz7Y5FucZ9nnmOcvX0cWsDAAhMrCzu4508bVUE1BrYuFuV5dXuxA5ieGvcspKytKunMziQQAkCtK/PepUEleq6etkUY9HmkyMym5hCI4VmbNb7vme8oSz6/0Op/aYfwfB2ZLP6LeaPhRoW9FCGOpsKIPrD6ku3mqKSsl4Mj2vf4iy7mHfT37Vjx9mJbDGq+u6hgAoPzoyFSk7t6rB0vJdOeTMEpOnlWX5kTciIPzl/lprzi5d0L36sMxrjXU+8SuHXPWzt3EOr3FQafGfkW89MjwlPzSkuxXF14LEAAGkB0Z8OijknJH8z492mFQlhUbGp/D5+fFXHucTwEAxi548+AhX1VRm9PXpHpUtbyJRU+WX0RKYho5glP1U1HO25DYD5mpb18F3Hn0JlcECkpxL18kOQw0/PpsIn5G+LPg1+9KWQAioIoCj+35HwzvrKKkYWhprFOXRT7iZ8REp3HL+JnB557xEAAmJ/4YFvRMTV61i4WFnozSTMoWTNNh6uhTIddT75697bC8j3z6i4u+PpdieAgIXVtb2oUu4qYkfaYIHQuLb7MXyfsWyTf73X8+nY5wHMcxDMMkCQcpsVgs12/Lgz/HaqKP52a47o0mKy4ov4IUiaDnkn8uze1WvTxugJfDigfKk0/fX2dXZfZKxu13m3jiA4XhkjzQCCGEQMXJ55GPU8XioSzIe+iim0UIKy+s4jI5C69bf8/sUrU0np/nkEt2/5ybXil6iIw/NHb80STq2w0qClJ1PvBkj33dp0Wo6M7i4WuCdOdcurnMpFknN6KyA9ZP87rDH7Dh3IGf6nvqqj6IXm93nP2/bNzq1/PL0cktxwI/8Fntuvd1nTpruqu1Nn3BpY9XDV36b989j31H1nGFKIH8cGPXFdbkZaP1a3iSqdyXxw6+sfl1Tq+aBhdUdGeJo3eQACe+M2Ix1eHnc3dXWbGEzzc5LLpRiFV7Dkjlkfsf/z68utWQiUfGjzv8wW7jg5MTtCuXLXqxaei8awWA4ZjEyimEANeedDJgfe/yFkKFNxcN8w4qA6z8oorLFAbveHj4+4BO8s1+94V5q6smaub5Lx/sFVBa5Q4IIQQEZ/E/l+dLCVn68QgzAk/4HrsdkpwjUOhgNnxEx7iTN96Sbd0PPt45lGa1KHi+wWH+DWLcsfub+3/9uMmSGVLF95f3NjYdtT+uWhZGihTXIUdqnSm5s4g2J6ps4D1Z3c/E1P1QfPNOJklxIw78ZGtiM843jCZtrEwRxx9yN+VwTEfuja69TSiSRKg8wa7FjAufJb0kzk9NzW/SRiGxAAAGuUlEQVTeDVpHxPGH3E2NbR
f7FX3f6hRJyrIjxLG+o2hzorZ4qLzbnr2MOSZ9lvtXa0SEEEKi8N2OpqbDtr0SVPpl0x0ix1QHjXfWhdSA+2+/PyeL4URznMbSgbjP7z4tYlt6jG624xdAuQP4xLu2ztsPLraRfQRoFajciLAPFOCqFtY9am4T6uONJcOsB3kFFIsTngdnQ8XxfVT85Hfva+lNl8viP4QwdBtnI1/66t6Tgu+2ZcrXFAyVKU24vGbCyFFzT8Z+zRlFZQfcfFGCCC2nsYNp95PKIu4/zMT0R7vbVJ52N2X6C8VeU6eYy6XfuRpc0oSlNClUtv/VIK6O67wxjdxobkpQ/rPfPHcEiWyXH9w6om75LxoDLzIkToQwudrjHctig57nqfW06Mz798KjbDmcrSAPAILUO79dUpg1w7x1vEIb7+g601mr9OW1OxnNNQtSswHl+u3bfSc27f2rgODyt1Ii7stjp0MEWNsBnnP70nlmUVHQNf/PqkPm/mxSZcewSRP3EAaTvKZ0zfc7dq2xUXY/iNKw06deswcsnNtPpfaLfxClcSeWel3J7Dx5n8+MnrJL10vlxb8MT6se0kMWvPo3nI8A72hkWNvkTqHfpBm92/MebFvz0GTX9UOz2vktnTR1yuy9ScM3rx1Km8ivJYKpDly8ZJBC9NkTL5hUtDVD5WXniBDgalbOQzoTAEB+frhz2/VPuJ7Lxo3utLED4sQLxwNKzX7xdPreYJp6EcuL9B1r3XfulYym82o0lZ+rNNrX3WqA562sJvKgyQDxx9vLh5qaDll6Q7Yv/aEKH60aYr/1VZV30AhDJC+DqoSpi29svQqmmtgH94MQZ95eOsjSZVdISe3XNrSI1uDnKnm+ebhpryn7g5I/52UnPDnm6WRp0stj093UMvrrRSlnp/Xq9fOZ5OpvVZJV9KNUlCwX7fN+P3Pfhr/N/5zR7F65Kh1U8HyP92XW5P3bRtOeOm8GoKLX+zw3BfAsFp/a6S7tNEODblz88sCeBwqOZ62rzOEIA4/NhwdhLALHEEVSJCkmMU3jnvXLqddaZlpVIXRHbdmTOGfpxt29zm9pPdNJmaPc79ejO+T3nt448Wwxrta+m+XwVScnj+nTkV4XeNHH1h3Lddhx+ududC+6+C8oTb6y1MllY2Bek4y4/Acr+s3832dZzozEH/43y37sjqCc5jvdEiSem9XPxMJ5UyD9uy4bClUcfmhKbxPT0Qeqbwoz1ASZ92LPBPvpZxKbZF4kTjg81n7bqzq80rF1QGb7ezm5rPZLl/KNJfFc/wEkLzcftdNSaQoHMhIKhCwFeVlOjcoKs/kKOu2a62ueyOwA72led0sHbPhLlqFbiBt3dee6PbeTeITxkuYcD9RsKSvM5ivqtGuStQUpECAFhSZfJTUXxNycIkJLQ1rY+X/WDoSyplZT3RuTbFrJEvm2dYph/iEgbsTBxev88jlzj/0+XkayhUozXlw9dezMzYhsIQKMbTbKuX6vPmEAgKa1G0KhuQ6kTQJLVVujpo//s4owyAbh+yurlp6K13Dd94dn49LHUUJuTlrSuzfhwU8DA5/HZvIqMlxg8tajRjQokRkDw38Eo1wtCio3cIfnzqf5oNmR93Df2kfShavcC4AQhb65wShSLBaW8UuKiwoL8nJz87kCunQ8mFJfV8fm9KIrBoZqMMrVguDFHlu65nqqCAF8iXnoF9M0peBtBrkOa+RbSxgYmhhGuVoM/GAfrz9jeIBhGDRoXwWrHJIg9QdMY7jrwKbJ6sXAIDP+s71FBgYGBpnBeDMYGBhaHoxyMTAwtDwY5WJgYGh5MMrFwMDQ8mCUi4GBoeXBKNf/D8RCUatIQcrAIIFRrtYPle23wsFu4JIbX5gAGIbWAqNcrR9UmJ1Dapja9mzW72xnYKgPTCQqAwNDy4OZczEwMLQ8GOVqzYhSb6+f6jFqhKvnuTelP7oyDAwyhFGu1ktp2P6tL/vsvrB5cGnQ4fOvyn50fRgYZAejXK0VlO
P/d8rQRc7tP4WFZYNupw5MWhCGVgSjXK0WJbuF3mM6UW/9/BIxYxdnJqc8Q2uCUa7WCqaiZ6SnLAy/dTdNzsrVuSsjXAytCUa5WjW8l7ceZCv0dnPqgPI/vM9vmW8aZ2CoDqNcrRhUFHT7SaHKQLfh6rxnvhv/SWMOADG0FhjlasVQ+ZmZfMJ8YG+IOXORN2aKhcxezMjA8INhYuhbM+Sne5uXHnij3NVwwJy1s23bMad/GFoLjHIxMDC0PJjVIgMDQ8uDUS4GBoaWx/8B+kEUisNZWA0AAAAASUVORK5CYII=" } }, "cell_type": "markdown", "id": "c631139d", "metadata": {}, "source": [ "### 3.3. Entropia condicionada\n", "También vamos a definir el concepto de entropía condicionada a la entropía generada fijando, a priori, el valor de una segunda variable. Por ejemplo, H(survived | Sex) es la entropía de la variable survived condicionada al valor de la variable sexo. Se calcula con:\n", "![entropia02.png](attachment:entropia02.png)\n", "Px es la probabilidad relativa de aparición de la propiedad x en el conjunto de los datos en los que Y=y y H(X|Y=y) es la entropía de la propiedad y en el conjunto de los datos en los que Y=y." ] }, { "cell_type": "code", "execution_count": 57, "id": "843c1436", "metadata": {}, "outputs": [], "source": [ "def calcularEntropiaCondicionada(df,target,condicion):\n", " entropiaCondicionada = 0\n", " cols=df.columns.tolist()\n", " for valor in df[cols[condicion]].unique():\n", " p = df[cols[condicion]].value_counts()[valor] / len(df[cols[condicion]])\n", " counts = df[df[cols[condicion]] == valor][cols[target]].value_counts()\n", " entropia = sum([-count/counts.sum() * np.log2(count/counts.sum()) for count in counts])\n", " entropiaCondicionada += p * entropia\n", " return entropiaCondicionada" ] }, { "cell_type": "code", "execution_count": 58, "id": "465ead2e", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "entropiaCondicionada(Survived,Class)= 0.8483634692722222\n", "entropiaCondicionada(Survived,Sex)= 0.7652602113304224\n", "entropiaCondicionada(Survived,Age)= 0.9012406875470709\n" ] } ], "source": [ "entropiaCondicionadaClass=calcularEntropiaCondicionada(df,3,0)\n", 
"entropiaCondicionadaSex=calcularEntropiaCondicionada(df,3,1)\n", "entropiaCondicionadaAge=calcularEntropiaCondicionada(df,3,2)\n", "print('entropiaCondicionada(Survived,Class)=',entropiaCondicionadaClass)\n", "print('entropiaCondicionada(Survived,Sex)=',entropiaCondicionadaSex)\n", "print('entropiaCondicionada(Survived,Age)=',entropiaCondicionadaAge)" ] }, { "cell_type": "markdown", "id": "c9bab41e", "metadata": {}, "source": [ "### 3.4. Ganancia de Informacion" ] }, { "attachments": { "entropia03.png": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAW4AAAApCAIAAACa6lS4AAAAA3NCSVQICAjb4U/gAAAAGXRFWHRTb2Z0d2FyZQBnbm9tZS1zY3JlZW5zaG907wO/PgAAHO1JREFUeJztXWdYVEcXPnPvNroUKSoWFHXpCyiWKKiAGpViTaxRozGixoYxmsSW+NlLjDFqjNFUE1Q0FiAQJBJFKUtTkab0IiB1+73z/Vg0AnfZxaCuefb94/O41yln3nnnzJkzI8IYgw466KDDvwPxqhuggw46/BegkxIddNChE6CTEh100KEToJMSHXTQoROgkxIddNChE6CTEh100KEToHVSIi26FZVUSr2QsuXlGcIiSWeWiEX58X9mVr+Y5mop6Md3Y6/dr+/kHAJ58e2o5DJ55xaqlXi9GA649l7sX9kN6odbu6REfPe7pXM/uynivZBmUXm/fbryWFJn0hWxofjXFXM2XinRWjXp3LwhqiJmy9yQHwsRB3VmsQAkuyZ6w+yVv+R07kT41+jkvKvXjuGAOFTW0cXv7Iivptv/kKVZeVhSkng1/Oq1hOT0nOLqRinmmPYW+ExeuGz2MGs2AP0o9rP3wwYe+XJ612cZJk/cPm7Bz49INptFkgSBnv5Gy3njv4je+gb72TrkD8JCQ47R87//aKQZA09xTfiysZ/EKzgcDks5DpimaZqi2SO2RB6YZIKAyjs2c9qRXBaXy2aRJIEA05RcSg36JOqLAGMEALjlvKLufTnl7eOFHD0eh0USBAKMaUwpZBLO2C9itj3bONwQsdpvXayMy+VxOSyEFTKJWN7nvTO/LLG3m7Fze/6c5Ys/0//+Ux+mdr9SSOI2+oaE1yKSxSIRAADGNE3TmMb64w/E7fblUXcOBr/9zQPEYrNYLBaBAABTcrlcoSC9Pok6Nt3i2R7h2oR9729I9Nzz45z+vLaVUQUn5wTtu4O4XLayMsA0RdM0bTTpi6gtb3AAFCk7Jyw4U8PlsNnNFqcVMinHf1/UthFW47bue7hgwXsf6Z/eE9CDfAnGUQ+q4NvZAXvTaYJFKm2j5ByNMTFw+dlf3utH1F9e6bv+TwnJYrHYyl5jWqGQy+W01cxvr37k2WJ+vXSGgyxh69jF5+r19HhsklAanKYpuUxmOfNU+CqnZ6xMl/60cNIOIeby9LhsEmG5TCxWuG+IODa9q9N7+zc8mLnmfb0j337gbqiS4uTmzZvbN6c4P+rI5tDQz769mlKMuwnGTAwOnjwl0Ne9m+zu2YOHIiVOPr3v73537Vn56PcXDLVsIbaIZ9HXbYjPSM+ulbeSHjZI5ZS196KQeYHjx00I8h9sa/KskkjvHQ9Zc8Vm1ZENI80YFRsRRt0Gegwf4Wknz755r0IslckJm+EzFy2cP9XH3pyLABC7i3VXWe6t1MI6kQSbOvpMDAoKCn4rcISdGRcBAK669esl0bB5o3sqTYhYXXq6DfEe5mxSln
grt7pJJJZ18Vqw8r2ZM8YPsTVhtajbwLKHcZXw5v0qsYzXxztoxowZb83wd7PSJwDxengObDi386tEc7/xjiba5eYh4x4OniP8RrvgO9fvVcspbOI5/6M18ycHBE4b79XDmA2IZ9nPfbiv31Cj/OiUYrFUzuofvHbtu1MDAqZMGNFPabdm4Ko/Nr23r/TNPftm9eMyEQqxuvR0HjxiuMCmPv12To1EJlNw+oyZt2jhO0HDe5mwAQBxza1N6+/cyCxrlMhIa4HvpMDAoOC3Jgy1NWEBsCzdXFkx+/f9gUZM9LDQcJF7oUAcCzvB0DF+I60q4pMKxQqa6D1hzfqlbwUGBgf6OHXVQwiZ2DoN9vb36deUGJ9VI5FRZsMXr18+OyggcIqfh7Xhs4r4ChgOwDHvKxg6cshAdv6NlII6kUgisxm9fOW70yf7etgYtmgFaWzdnV18Oym/VkwZDfSdMmPGjFlTRg005yFABnaDehac+t/JPPs3x9jpqxIT3A4UFTePhvgLHBychk7/5IeEEnGr38VZ3y0cPnj4MFcHvuMbn1yXqCiGrg5bLHDg8/nOU7/OVTB+Irt3ZJrAdfKXWfL22vOkuAshgx34fL6Dy1snHrQsj669tnGM89BZ++JKZW27c/dg4IiPr7f9QZa03c+Jz+fzHYdviBWpqLUqYq23i/fCr25WtO0CXR+70cdpyNJz5ZT69r8CyBM/H+PE5/Mdh22IbT2IGGOMFQ9PvOXM5/MdBIvDqmimL+jqyLUjnYavvlrD+HMLUAUnZ7k68Pl8B88l5x61/J4qPx8y1MVn8bGkaiYqSFL3THAWzDyhgievBvSjM++6OvD5fOe3TjxkbJj42oZhjnw+38lvRzIzhV8tw3FDVOhQRz6fz3catztVRRMUhT+/O9jN/4Pv02rb0piqOLfUy2nUxthaVeOvehGV5J0NnbPk4J8lbOd5h87+sHWWV7fWTi1vwOyPF9g11sow4rkNduaoKEmakZIpwwCktcegXkyeK66OPHL6HnfUwpn9NViLkNnoKaPNCAAsz7zwe/YzQQpZ/s+hG6PMF355ZOVIG3bbf4mBed/Ldhk7phsJAHTt9ehEMcMXstwf1239q0fI4S/fH2LZtgvIaMTC2Y6ivw4dudGkvgMvHXRpRmYlBYA4zh4uXKYvRJmpORQAkPae7l2YVh353e+/iqzp99YiX1P1mzjCdkLwED0EgEUJ4RGlz+yxRelHVv4vlb/m64OLPMyYqMB1mbtgOJl2/ODVau25Gya9k3ZXgQFIG3dBd6ZWU3nC9HoagDByH8xnovCrZjgYDhs7woQAAKo4NuaeguGLprSv1u697/HR0T2zXRh8a8JywqIpto8u7v/ujopQjAopkeb++MG8zZFFcj3nd786FurD1GwAANJ2tO8AEhDbcYiHkQqOUTnJqfU0AGEoYLYzlf3Lidh6c9/JozSgKQCA4bCgcdYkAFD5v4cLZcq/pKvj/rdiX6H39kPL3VU1RRU72c7+zVpS81dUGy3BtTf2rNpfNGLbvncdGUIEAABA9po4eRCn8vLpyxVqolOvAE0ZwmwKAMi+7m4mzEKRkiFRzhUPW4a5gmujT5zJJVwDAzSZCQDIwi/Y24QAwDJh+KX85rlAFV/cuPI76dS9++YMUGVHQBZ+wT4mDXGnz+ZpSySbyk9Nr6cBkIGr+0Cm/tNV6WlFNADiOA9yZerZq2c4GAwZN6ILAQBUYcwfbbSErojYvPo76dTdO6b1VjXCHOfggAGQ99vpOObTOyYpwfUJe1fuiq+hUJc3Qncv9zBup/ukTc8ebET2HeplqUKV6HJhSjEFgNgq7KzI/P1yLm36hv8gPdX1tATXI3BiLxIAqNKI8AQRADRlHF3xYXT3NYc3+apqSHtgO/v79lBqSVzU7RZaIn/46/oPLxgv2rfJv72SCUsffze26NaFqyXapiVPhcLCTcAkFEAVCFOraQBk6OrJJPa4Jvb363Wko++YHhraFh
mPDPbrSgBgxf2LF+8oAPDjv3cu25bpueXwai9GOXv6T42G+w0zVNy7cPEu0+L58kFXpaYW0QCI7ejhzCiBkvSUu3IMQPbzEDD1TRsYDgZe40Y2a0l0ZEvTijOPrd6UYL9u/2qvdud6H19fe+JxbHhsLZOWMDSq8fahzT8/kGOk57Z4/dSeamLpiMUiiW5DhvRW9V1TavJ9BQCQ/Zh9Z0V6RFQRre8xzE3lStUWLH5AAJ+NAOjq6PPXqovCN6w4KZ2+f89bfVXtstQV6DTW15YEAPrxtahboid/jetv7Vu5O3foln2LndTQAJkPGtqfVNyJiS3VLi2hHqYohUJfMMiBybvEj1OS8ykAxHHyZBJ7XBUbkSAient5ddOcw3pegeO7kwBAFV4OT6y9f2p16GXTkC+3vmmj9nBG33OoK5cu/DPmvlb4JZK0ZqHo4yEwZ5ppintJQhEGIK0EHkzTRTsYDvpeY31MCQCgimIi7zzVEroyasvq4/WBO3dM76Ni7/EEZG+vITaE6Hb033UMWtKGG4qsUzt/K6IwkJYTl0zvo3bcFQrSyG70SMadCwCA/G5yuhgDkJYCdyY7U/kJt8opso8jX19dVc+C7D0xUMBFAHRD3KEFiz+/O2jLlx94qnL7nkL1/pvlMNavlzJeEhfZrCWKgrD1oWd58/dtHmelfhKRPRwdTJH8bmKqVsVLcG2aMJ8CQGwHD1dGOZSkJWXIMABp5yFgcsCbhDdTpWDAd1bPhmfAdg2Y1JcEAKriyufzlh6umbhr3zzVG5tngEwcnGwJqig5pVwLRFmRlZIuwgCEubsHowGowuSUSgoA6bsx+nTawnDQGzzWx1wZL4mJylBGPCR3v1n9SXyv1QdDhzHGyFqCZe/I54IoNZEpXtJ6hohu/hSWLcMAZK9JU4do0Hd9v+0xF9Z5MQbzAIB6kCxULokqfOeGOxkPKGRg169bx1IJCJtxgUP0EQCWFBZ1WfjF1vHWagtoN9uIxff3bdaSv67ebARcf2vvBzuzPDftX+KiGQdIu/52JJbdTc/SDs9cCWl6UqYcAxA93dwsmARRkZWc1oQBSAuBgMm3VOSl3RFhsre9ukWrFUj7SQFOHARA1z981G/toXXDNQwUANnT3o6LFNlpd6UdqvFFgCpKEZZTAIjn4s7s09WlJuc1S7ULk0+nNQwHvUFjvZVaUhIdlSEHuvKPLauO1k74386ZdpqNrV5fe1uCrsvMKGjrMLaa3ZJbV6If0QBAdhvpo9LT0By4JjXlodLOnoxLIlWQ91CBia421h2tDJmNCfLe/deVWhorpDTZMZozgRzo79fnxNFciq6LvxqXUXUpNIwz9/g2DTzyZhCm1lY8RFUXFDXgQWqnjbzhUVWDvKPHFIhj3NXCsAO2UtxPSm2gAYDKPTrV+Wg7Jeu5ejoyHQo0PcivoBHLyobh6KpdkLZvBnp8mX5TgrFEAdwObPA5ljbmBC4rKqigQOXO+aUA16cm51AAgEV/hHo5hKr+kuzjKWCSai1iOPA8x47qeu7XCpoqjb56wy/x+Md/9Vh1av0IjVMrCUsbSxKyix8WK8C+1cC07J4iK0nYQAMAYezp5dAJSUKS9CTlOXBvd0Y7A1VdWU0DYW7R4URR3Jid+VCKAQAUOZcv3V282vlfNpgcMNbP7pvcHIqui9k0P8ZoxNYfQtwMOtAuwszCjIDiitJKGkzbnwG46tK68R/HidqTEqRMD275h773tquHgjS2Fl0mFJZRAEhv0KLt81z+oSNWLmCYqvnr8NawHApYfE9XJu+Lrq6spjAyNzfvKJfpmnt3lHdNsDT5wpWiqfMZUwEYgMwtzBCUVJSW069YSqQZycrNX8+Aj9b6W/9D4Sf2k2f9uuVIfB1NmLm5M+9/tIjhwPMc69P17JkKiio9u2YRtgw4+MOsfh0IviADcws9hEXlZbU0tNr0t2ybvKykkgYAIPvy7RlqwE0FKcIiKas5CxcAANMURdGExcBB/d
vm8CnuKX1nwszNw44xo0Ra3yjDgPT09ToWlZbl/xq6+gwMf6Nn7PVCiiq8ciE5xFnlNutJbe3tJAGAtPf373cs5z6FZbjPor3bJjImEagG0tPXQwCN9eovPyGL4CNJwR0q/XmA64VJ9xUAwHLwneY3untbI0uvx23FAED2FLgxngzghvoGDIjX0RESZR5buT7WctTgxuhb1bQs4+Kl3LkhAzQzKKFnoEcAbmxofMXJJVROkrCOBiBMvSZN9R3edkrQRUUnRTQA4rl4MPl0WsZw4LqPG9017OdyCsvYLqEH1jNm8LcDQk9fj4CmxoZGDFYtf2ohJVgiFtMYABDXpjuTEyGK2784NErcurmItJpxPPLTIa0tTZWkppQp95meTsxrmlwmBwDE4XYkLk1XX9u+fPfDkTtOb7ILn5VwMFNOlUdeuLHKa5RBB0phADI0MiQAKKLL2JAl7Vw3UAUOlwOAJRKJtqRXyTIT06QYgLR1F1gzeoU5ycq5YuLm0Y95niuHqGMjRBVd2LD8hHja4ZPLWUcCkk4VUVTupYsZi0PdNHNtOBwOAKbEEjlAu/U2RKzyXROpwb3Vp0DsgSG//rZUI1GjS1OUiQxcFQTGDanNUj3Aw5WZL9rFcGAZGukDAJA9Jn8wq78aZWKAcmQk4raJnC2kBBmYmnIRyDAg5llk8OaB5DeBVkjqUw/NnX8ylwKWU8hPJxY7GLEZeIobhEnZFLRnZyCUF5doWnMyiDKPfbA+psfq7z71tWTTEwPdv75zS0LXxFy4VuszQYMwdHtFZwjvKwAQx0lFVqgaYGU3VFjv5YPKSRLWticUdLkwuYgCQBwXdycVXCdZJABgWuPTFFx7Y9fyz+54bjm9arAxSQUF9Pvh8H2KKr5yIXG52zCNDkSbtw/qDWnku/VixGqZpk0DACAMumrobeKG1MQspVB4ujGencgzk9IlGIDsJhDYMHsd2sVwuiotrZgGIAxdPQY8z3aJxhgAELRtRavtTr+B9iwEgKVVj1Q/SEGweNzaygoaAMhew3z4jDoCALLM5HQZBiBt3FTZGfF4PASAZTKpZpZWFJ3/aPm30un797zdlwMAhM34oGEGCAA3Xg+Pqvx3h4eKrJQMMW4nK1QdZDIZBsThdPYN/OcEXZaSpBQKJw8nRmlsSk3KUgAAae/upiI7CfF4XGjumiaQ3D+1KvSS6dJDzQFrst+kQBcOAqAe/REe36BRGVgmlWFAiMNWy3aWkZVtx9DdTNP0DllGolIobARujASmcpOEj2kAwkjlxNQuhoMkXXhXjgGxnRiPm9QCy6UyDIjN4GO17D/RzXsU/4AwXSbPiL9VN02lAiruC5Vn7SZunircYqWd65oFUGW+NcukiyEB9Y0NTRjU6i2uid8Z8vndQdu+/+eAHZmODvA2uXa5lpYkhl8pmqJxaI+hwYUpwkoKgOzKnAKjFrixvhEAdTFVr0O46vzS8R/HNXX4BMfA57PIw8Ea7XBxY/OiStp7MEujLCMxVYwBSCsVebAAgIxNTTSOW1BlVz5ddrhmwoHv3hn4hKlkj/GBnl+k3pDQj6+dj6keE8SY59Wy5U0NjRiILmZdXmXMlcpJSnlMAxCGbp7MCfMVKckFVHMerCovVpsYDoqs5DQRBmCpSCFSC7qxoQkD0cWs7TWdVhYie09bMvGH5ecqGuOOHLk9cr0Xo1tHPbx1u0zpFnuocouBrkhR+s7tCiBh092GgNqaR9UYurffC/Gdb1aGXjZfdrLlATsyGhHoa3E1rJKWZYRfyJ67gv+clsaPkxNzKQCkL/BkzCBQC0V11WMakZY26jObkcWEHb971nf8kRqOibWmkTJZ+u1UiVIo3JkT5nMTk2uUST8ejHMFAICw6GbDQ8mNVVVigHbNgutu7QvZlMjfcLplCglh6Rs4fO/NmAbcdPP81ZKA2WrT73HVo2oMyNLa8hW6d3RZctITAjMLRZMw8Z4CAMg+7m4qJ6YWMRyogsTkcgqA7O
rh8VwnY7iuqkqGkZ6VVdulqfWgImPvdTvmDOCB/MFPa5YevF7emux0Q9b5zaHfZmEEQNoLXFTeK2pIud1sZ4FqOwNpY9dbD9GVxaXtZiNRtSnfrHj/UOGIzXvbZkzqDQ6eYKu8+3QxLOm5X+ESpdxKl2HVKTBqQVWWlMowYWPfV21OIgBwTKw76Jnb2tra2loZa6pyVPbt5FoaAOm7MAsFXZ6UVEgBIDbfvZ0es3r37UkCVVZS1l4eu6Iift/7H3wvCd65Pci2VW3IbNRk5U1XqfDshRy16fC4qbSkliaM+tl3MKurM4HrU9QIhTTttlCMAQhzN3fVE1OLGI6rk27nNuflPt9iSZUVlVHA6tOfIaWtLcWQsdfaE9+YfvzhV3HJx5aMv+Ayepy3oH9vawP547IHGTeiY1LpIcu+Pm9zblno386urVc7qjorIbNMKm0qu3E6QYwBEEBlalR0iT7X3H6wS7c26s51dB3IupSSl11AjWuttuKStKQ7hSUPMuIjLsVl11LIuDT1r9v9fQf3fhrExfV5N64n3H5AIwSAqfJLX+znzxtqbaBnyffsz3iNvRWwqCg9raBBKiq9efp6EwZAbEVxUtx1E65RL1dX246klcjz7j+gCENnQevknZcKqupeQmZZk6g298qVQgoACJ4oJyaKZaxnZj/IrTsPcF3e7dSCenF9SWxYhgIAgK0o/DsqxkzfuJfAs08bGSR7ujiZEncLs3PFMMCwxU+44WFy8v3ikhxh3JUrNwqaaMKyIOl6avdRgu5PZwNVnXU9PuFmufKFNkXOmT2Hu89wtdA37O7k3oc5PkPlZ+fJgTPIjfF49cUCNxUKk/NqxI0VCT/ekigJXCGMii7WN+juMqS/KQJpWUZiVqVIVJ0eFlNDAwDiPM6M/ENkpGfJH+rYNo/vVTMccH1+SmapSNpYGPOLUIYBCK4oP/5avAGvSz+Bk3UHzhdwQ15OOU1auboy3MZCWFWurbTs9sUzF2NuJt0rfFQrAr0uFj36uw4e7hsQOGagKQkgzzvzc9GEuT4tyEXdOzR15vF8miAIAiGEkPJtJVqhoHjen0ceCmgr71T2V9OmHn4w6NPIb2a09GcbLi33+fBPCULKpxwxpjEGVv+lv4Y9zU+gCk/OnrgnnULNlT35DJkGH47+zLululOZB4KWVn/Y8q3FpqurvNdGieFJLU8egwKSv/zcmSUqI0FtQWXuD3z7m0ejdkUe/Jdh9n8DKvNA0JxvC/GTEXg6AAr2sC2RX0+xwMWn3wnYk0a1GiJKLoeBK879srhv2x43RK31Wx1pMPPbiA2Dnp3cdMnpeW/uTFEgZZJRs+G4Xh9f+ebtJ0fPirQ9k2afLKT/GaHmh3J6Lfjh4hpXRm+p6NTcibsyXT68dGqupleROwu47vcV/hvjJATZyjoKutuc05fXCViy+E1+IedrURsDUwbjD8TsGtM2ze8VM5wu/3HBuO2J8hYlKEdq2KaIY9M7sImUxH/it+Q8OfVoxObhbWMWKp5EeolQZH0Z5OTgufxSXZv3mSgFpf7NLs0rytg/kfmNqU4pPe/YdGfHIasj6juxydoBuj5ilZeD08QDd1o/IUZTik5/N46uClsscBDM/6lMO5+k6zj+IwzH8uSd/k5Oo7clML6XqAVPkZL2gVM9uOKEK38+bu0gESShHceq6kHlR0bcw5b+wW9oEih5vYCMRk570wYeRkW0fkIEEWRnM4iujL6SKDN+I9hfg9vYrwf+GwwHaUrEH6Woz6QgD8Y9kTaMFtE9YP6bXcU3wn4v0ooHKp4HkpSw8GyW2zsLh/7bdESthN7g2bNc2IW//3az8QXXROWdD0tS2M1YPFb9mfFrg/8Cw3FdXNjVciOfxXMcmQ/7tEFKABmNWL5iJC/tu+N/dyQFWntAl4R/fa6iz9ur1L4U9bqCtHt77azeNZeOhj14kZMBP44++mOWxcRV8xyf94UfrcRrz3BQZP94LErsvGCZSonXCi
kBIKwCPt4whr6066skrXo3SCPQFZd3Hhb2eGfrMvcOPW7zeoEneH/bfLt7x3acK35RYoLr4g/sjdYL3rTO57nyp7QZrzXDQZH/4/ZTRS7Ltr3TX+WhmpZICQBpM3HL7tn64Z/ujNWit8Y1gCzv5493ZQo27g9xe65slNcH+m4hezd65hz45Pv7L+JJIqrsytYtMd2X718/8tUdgL1AvLYMh6a0oxuOVvl9vntO33YuMmiNlAAgk8Grvt47Jnv7mlM5L+b/juVwuFwuuzNpihtu7ll9mvP+kV1Ter/8HIiXDnavybuOrzT5Zc32azWdPBukmUdX7q+a8sUXCxz/s67d68hwoCsjNn94tffGrzf7WrW7e1edV/KqIK2tEOlZmT7PxVy1oCQSzON15v/7RjVV1dCmXY3+oyESRlBNVTXYtKth5/ZZUlMhNbQy+U+FSJjxejEcFA2VdWRXc5X/Kd8TaJ+U6KCDDq8htGiDo4MOOry+0EmJDjro0AnQSYkOOujQCfg/iFRMJynofhwAAAAASUVORK5CYII=" } }, "cell_type": "markdown", "id": "92120b5c", "metadata": {}, "source": [ "La ganancia de información muestra cómo se reduce la entropía cuando se añade una nueva variable. Una mayor reducción implica una correlación mayor con la variable explicada.\n", "![entropia03.png](attachment:entropia03.png)" ] }, { "cell_type": "code", "execution_count": 59, "id": "c51e2414", "metadata": {}, "outputs": [], "source": [ "def obtenerGananciaDeInformacion(df,target,condicional):\n", " ganancia=calcularEntropia(df,target)-calcularEntropiaCondicionada(df,target,condicional)\n", " return ganancia" ] }, { "cell_type": "code", "execution_count": 60, "id": "a4768afa", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "GI(Survived,Class)= 0.0592879366074337\n", "GI(Survived,Sex)= 0.14239119454923344\n", "GI(Survived,Age)= 0.006410718332584997\n" ] } ], "source": [ "gi00=obtenerGananciaDeInformacion(df,3,0)\n", "gi01=obtenerGananciaDeInformacion(df,3,1)\n", "gi02=obtenerGananciaDeInformacion(df,3,2)\n", "print('GI(Survived,Class)=',gi00)\n", "print('GI(Survived,Sex)=',gi01)\n", "print('GI(Survived,Age)=',gi02)" ] }, { "cell_type": "markdown", "id": "a6bdf074", "metadata": {}, "source": [ "## 4. 
Building the Information Gain Ranking" ] }, { "cell_type": "markdown", "id": "b9fae6df", "metadata": {}, "source": [ "Given the dataset, and once the sets of independent variables and target variables are defined, the independent variables are ranked in decreasing order of the information gain they provide about the target variable." ] }, { "cell_type": "code", "execution_count": 61, "id": "c40b3392", "metadata": {}, "outputs": [], "source": [ "# Definitions:\n", "# - Dataset (working copy of df)\n", "data=df.copy()\n", "# - Variables (column positions, given as strings)\n", "vars_indep=['0','1','2']\n", "vars_target=['3']" ] }, { "cell_type": "code", "execution_count": 62, "id": "3c32de10", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Ganancia de Información de la variable  Survived\n", "   id  nombre  Ganancia  entropia global\n", "1   1     Sex  0.142391         0.907651\n", "0   0   Class  0.059288         0.907651\n", "2   2     Age  0.006411         0.907651\n" ] } ], "source": [ "def armarRankings(df, vars_indep, vars_target):\n", "    # Use the DataFrame passed as argument, not the global 'data'\n", "    nombreVars=df.columns.tolist()\n", "    for i in range(len(vars_target)):\n", "        # Results table for the current target variable\n", "        gi=pd.DataFrame()\n", "        target=int(vars_target[i])\n", "        entropia=calcularEntropia(df,target)\n", "        print('Ganancia de Información de la variable ',str(nombreVars[target]))\n", "        var_id=[]\n", "        var_nom=[]\n", "        var_gi=[]\n", "        list_entropia=[]\n", "        for j in range(len(vars_indep)):\n", "            # Column position of the independent variable (not the loop counter)\n", "            var=int(vars_indep[j])\n", "            var_id.append(var)\n", "            var_nom.append(nombreVars[var])\n", "            var_gi.append(obtenerGananciaDeInformacion(df,target,var))\n", "            list_entropia.append(entropia)\n", "        gi['id']=var_id\n", "        gi['nombre']=var_nom\n", "        gi['Ganancia']=var_gi\n", "        gi['entropia global']=list_entropia\n", "        # Sort by information gain, highest first\n", "        gi_sorted=gi.sort_values(by=[\"Ganancia\"],ascending=False)\n", "        nombreFile=str(nombreVars[target])+\".csv\"\n", "        #gi_sorted.to_csv(nombreFile,index = False)\n", "        print(gi_sorted)\n", "armarRankings(data,vars_indep,vars_target)" ] }, {
"cell_type": "code", "execution_count": null, "id": "e3e57bd2", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.3" } }, "nbformat": 4, "nbformat_minor": 5 }