This audio was created with Woord's Text to Speech service by content creators from all around the world.
<speak> The describe method gives the summary statistics of all the numerical variables. <break strength="strong"/>For example, the GRE score has a mean of 316, a minimum of 290, and a maximum of 340. <break strength="weak"/>Comparing the mean with the minimum, the maximum, and the median gives a first hint of whether the distribution is skewed. <break strength="strong"/>The 25%, 50%, and 75% values are the quartiles, and the 50% value is the median. <break strength="strong"/>Quartiles give a hint of whether outliers are present, for example when the maximum lies far above the 75% value.<break strength="strong"/> By using describe, you get a basic understanding of the data.<break strength="x-strong"/><break strength="x-strong"/> Suppose a variable has the same value in every row. Then it is not useful, and its standard deviation is zero.<break strength="strong"/> The count is 500, so we have 500 entries. <break strength="strong"/>That means we have data from 500 students. If all the students had the same CGPA, there would be no variation in CGPA, so it could not help us predict the chance of admission.<break strength="strong"/> If a variable has little or no variation, it is not useful. <break strength="strong"/>If you find any such variable, remove that column. <break strength="strong"/>The describe function provides first-hand information about the data. <break strength="strong"/>It reports the count, mean, standard deviation, minimum, quartiles, and maximum, and from these you can work out the variance, which is the square of the standard deviation, and the range, which is the maximum minus the minimum. <break strength="strong"/>It gives this kind of information for every numerical variable.<break strength="x-strong"/> The next step is checking the data types.<break strength="strong"/> In pandas, a DataFrame has an attribute called dtypes. 
<break strength="strong"/>The expression data.dtypes gives the data type of each column. The serial number, for example, is an integer; int stands for integer. The GRE score is also an integer.<break strength="x-strong"/> The TOEFL score is an integer too. SOP, LOR, and CGPA are floating-point numbers. <break strength="strong"/>All the values are numerical, so there are no categorical or text variables. <break strength="strong"/>This makes our work somewhat simpler. <break strength="strong"/>If you have categorical or text variables, you have to convert them into numerical values. <break strength="strong"/>Here no conversion is needed, because all the values are numerical. <break strength="strong"/>There is no need for encoding.<break strength="x-strong"/> </speak>
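The describe step discussed in the audio text can be sketched in pandas as follows. This is a minimal illustration, assuming the data sits in a DataFrame named `data`. The toy values and the column names `GRE Score`, `CGPA`, and `Constant` are hypothetical and do not come from the real admission dataset. `Constant` repeats one value on purpose so that its standard deviation is zero. The variance, the range, and the outlier fences are derived from the `describe()` output rather than reported by it. The 1.5 * IQR fence is Tukey's common rule of thumb and is not a method named in the text.

```python
import pandas as pd

# Hypothetical toy data standing in for the admission dataset;
# "Constant" has the same value in every row on purpose.
data = pd.DataFrame({
    "GRE Score": [290, 305, 316, 324, 340],
    "CGPA": [7.2, 8.0, 8.6, 9.1, 9.8],
    "Constant": [1, 1, 1, 1, 1],
})

# describe() reports count, mean, std, min, 25%, 50%, 75% and max
# for every numerical column.
summary = data.describe()
print(summary)

# Variance and range are not rows of describe(), but follow from it.
variance = summary.loc["std"] ** 2
value_range = summary.loc["max"] - summary.loc["min"]

# Quartile-based outlier hint using Tukey's 1.5 * IQR fences
# (a common rule of thumb, not something describe() computes).
q1, q3 = summary.loc["25%"], summary.loc["75%"]
iqr = q3 - q1
outside = (data < q1 - 1.5 * iqr) | (data > q3 + 1.5 * iqr)
print("Possible outliers per column:")
print(outside.sum())

# A zero standard deviation means no variation, so the column
# cannot help with prediction; drop it.
std = summary.loc["std"]
constant_cols = std[std == 0].index.tolist()
data = data.drop(columns=constant_cols)
print("Dropped:", constant_cols)
```

Reading the zero-variation check from `summary.loc["std"]` reuses the table that `describe()` has already computed. This avoids a second pass over the data.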
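A similar sketch covers the data-type check. It uses another hypothetical toy DataFrame. The column names only mirror the ones discussed, and the values are made up. `dtypes` is an attribute, so it takes no parentheses. `select_dtypes(exclude="number")` lists any columns that are not numerical. The `Remarks` column is a hypothetical text column. It is added only to show what one-hot encoding with `pd.get_dummies` would look like if encoding were needed.

```python
import pandas as pd

# Hypothetical toy data with integer and float columns only,
# mirroring the column types described in the text.
data = pd.DataFrame({
    "Serial No.": [1, 2, 3],
    "GRE Score": [320, 310, 300],
    "TOEFL Score": [110, 105, 100],
    "SOP": [4.0, 3.5, 3.0],
    "LOR": [4.5, 3.0, 2.5],
    "CGPA": [9.0, 8.5, 8.0],
})

# dtypes is an attribute (no parentheses): one dtype per column.
print(data.dtypes)

# Non-numerical (text or categorical) columns would need encoding
# before modelling; for this data the list is empty.
non_numeric = data.select_dtypes(exclude="number").columns.tolist()
print("Columns needing encoding:", non_numeric)

# For comparison only: a hypothetical text column is picked up by the
# same check and can be one-hot encoded with get_dummies.
with_text = data.assign(Remarks=["good", "fair", "good"])
needs_encoding = with_text.select_dtypes(exclude="number").columns.tolist()
encoded = pd.get_dummies(with_text, columns=needs_encoding)
print(encoded.columns.tolist())
```

For the purely numerical toy data, `non_numeric` comes out empty. This matches the point that no encoding is needed.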