This is called a positive correlation: the greater the height value, the greater the expected weight value, too. (Of course, this is a generalization of the data set.)
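If you want to put a number on "positive correlation", here is a minimal sketch. The height and weight values (and the column names `height_cm` and `weight_kg`) are made up for illustration; they are not the data set behind the chart in this article.

```python
import pandas as pd

# Hypothetical height/weight values, invented for this example only
df = pd.DataFrame({
    "height_cm": [155, 160, 165, 169, 175, 180, 185],
    "weight_kg": [52.0, 58.5, 61.0, 66.5, 72.0, 78.5, 84.0],
})

# Pearson correlation coefficient between the two columns:
# values close to +1 indicate a strong positive correlation
r = df["height_cm"].corr(df["weight_kg"])
print(round(r, 3))
```

A coefficient near +1 says the same thing the scatter plot shows visually: taller people in this toy sample tend to weigh more.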
Pandas plot scatter code
So, for instance, the weight and height of this person (highlighted in red) are 66.5 kg and 169 cm. Scatter plots play an important role in data science, especially in building and prototyping machine learning models. Looking at the chart above, you can immediately tell that there's a strong correlation between weight and height, right? As we discussed in my linear regression article, you can even fit a trend line (a.k.a. regression line) to this data set and try to describe this relationship with a mathematical formula. Note: this article is not about regression machine learning models, but if you want to get started with that, go here: Linear Regression in Python using numpy + polyfit (with code base).
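To give a rough idea of what fitting a trend line looks like, here is a short sketch using `np.polyfit` with a degree of 1 (a straight line). The sample values and column names are hypothetical, so the resulting slope and intercept only describe this toy data.

```python
import numpy as np
import pandas as pd

# Hypothetical height/weight values, invented for this example only
df = pd.DataFrame({
    "height_cm": [155, 160, 165, 169, 175, 180, 185],
    "weight_kg": [52.0, 58.5, 61.0, 66.5, 72.0, 78.5, 84.0],
})

# Fit a straight line: weight = slope * height + intercept
x = df["height_cm"].to_numpy()
y = df["weight_kg"].to_numpy()
slope, intercept = np.polyfit(x, y, deg=1)

# Use the fitted formula to estimate the weight for a height of 169 cm
predicted = slope * 169 + intercept
print(f"weight = {slope:.2f} * height + {intercept:.2f}")
print(f"estimated weight at 169 cm: {predicted:.1f} kg")
```

The positive slope is the "mathematical formula" version of the positive correlation: each extra centimeter adds roughly `slope` kilograms to the expected weight in this sample.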
Very informative plots can be created with just one line of code.
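Here is what that one line can look like in practice. This is a minimal sketch that assumes a dataframe with two numeric columns; the data and the column names `height_cm` and `weight_kg` are invented for the example.

```python
import pandas as pd

# Hypothetical height/weight values, invented for this example only
df = pd.DataFrame({
    "height_cm": [155, 160, 165, 169, 175, 180, 185],
    "weight_kg": [52.0, 58.5, 61.0, 66.5, 72.0, 78.5, 84.0],
})

# The one-liner: a scatter plot of weight against height
ax = df.plot.scatter(x="height_cm", y="weight_kg")
```

In a Jupyter notebook the chart shows up right under the cell. In a plain script, you would typically import `matplotlib.pyplot` and call `plt.show()` afterwards to display it.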
Pandas plot scatter series
Pandas is the go-to Python library for data analysis and manipulation. It provides numerous functions and methods that expedite the data analysis process. When it comes to data visualization, pandas is not the most prominent choice, because there are great visualization libraries such as matplotlib, seaborn, and plotly. That being said, we cannot just ignore the plotting tools of pandas. They help to discover relations within dataframes or series, and the syntax is pretty simple.