Appendix - Plotting in Python
"Matplotlib Plot A Line (Detailed Guide)" on Python Guides is the best guide I have found on drawing lines in matplotlib. As an addition, for drawing smooth lines, see "Plot smooth line with PyPlot" on Stack Overflow.
From the Matplotlib docs, the color guide is "Specifying Colors" (Matplotlib 3.4.3 documentation). To find the arguments of a specific function, say a method on an axes, search for its full path, like so: "matplotlib.axes.Axes.axhline". In this example, we're looking for the axhline arguments on the axes. For all functions available on an axes object such as ax[0], see the main page, "matplotlib.axes" (Matplotlib 3.4.3 documentation).
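As a quick illustration of the two references above, here is a minimal sketch that calls axhline on ax[0] and uses three of the color formats from the color guide (a "C0" cycle color, a "tab:" name, and a hex string). The data values are made up for the example.

```python
import matplotlib.pyplot as plt

# Two side-by-side axes, so ax[0] and ax[1] are individual Axes objects.
fig, ax = plt.subplots(1, 2, figsize=(8, 3))

# Made-up data (assumption), only here to have something to draw on.
ax[0].plot([0, 1, 2, 3], [0.2, 0.9, 0.4, 0.7], color="C0")

# Horizontal reference line across the whole axes; see
# matplotlib.axes.Axes.axhline in the docs for the full argument list.
line = ax[0].axhline(y=0.5, color="tab:red", linestyle="--", linewidth=1)

ax[1].plot([0, 1, 2, 3], [3, 1, 2, 0], color="#2ca02c")
fig.tight_layout()
```

Calling help(ax[0].axhline) in an interactive session prints the same argument list without leaving Python.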
Plotting in a loop
Example code - Plotting in a loop: plotting errors (actual - predicted) after I've already sorted the dataframe by actual value. Sorting is an important step for seeing whether the error, and thus the model, is biased towards high or low values of the target. Another option is to skip the sort and draw a scatter plot of actual vs. predicted values, to check visually for homoscedasticity.
In the code below, I divided the range of the actual target variable into 7 intervals, and I'm plotting 7 figures of lines of predicted and actual values. The buckets are those intervals, and they are based on the histogram bins of the target variable's distribution, so they are case-dependent. You can use the target's maximum instead of float("inf") as the upper edge of the last bucket.
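A minimal sketch of the loop described above, under stated assumptions: the dataframe, its column names "actual" and "predicted", and the bucket edges are all made up for illustration, and the 7 plots are drawn as 7 stacked subplots in one figure rather than 7 separate figures.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-in data (assumption): replace with your own columns.
rng = np.random.default_rng(0)
actual = rng.gamma(shape=2.0, scale=50.0, size=500)
df = pd.DataFrame({"actual": actual,
                   "predicted": actual + rng.normal(0, 15, size=500)})

# Sort by actual value so a bias towards high or low targets is visible.
df = df.sort_values("actual").reset_index(drop=True)

# 8 edges define 7 intervals. These edges are illustrative only;
# in practice, read them off the histogram of the target variable.
edges = [0, 25, 50, 75, 100, 150, 250, float("inf")]
n_buckets = len(edges) - 1

fig, axes = plt.subplots(n_buckets, 1, figsize=(8, 3 * n_buckets))
bucket_sizes = []
for ax, low, high in zip(axes, edges[:-1], edges[1:]):
    # Rows whose actual value falls in the half-open interval [low, high).
    bucket = df[(df["actual"] >= low) & (df["actual"] < high)]
    bucket_sizes.append(len(bucket))
    x = np.arange(len(bucket))
    ax.plot(x, bucket["actual"].to_numpy(), label="actual")
    ax.plot(x, bucket["predicted"].to_numpy(), label="predicted")
    ax.set_title(f"actual in [{low}, {high})")
    ax.legend()
fig.tight_layout()
```

To plot the errors instead, replace the two ax.plot calls with a single line of bucket["actual"] - bucket["predicted"].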
Density (probability) plot
Seaborn's distribution plots give the histogram, the density curve (kdeplot), or both, using either the deprecated distplot or its replacement displot, introduced in version 0.11. A better solution, with rescaling on the fly, is to use NumPy.
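A minimal sketch of the NumPy approach, assuming "rescaling" means turning bin counts into probabilities that sum to 1. The data is synthetic and the bin count of 30 is an arbitrary choice for the example.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-in data (assumption).
rng = np.random.default_rng(42)
values = rng.normal(loc=0.0, scale=1.0, size=1000)

# Raw counts per bin, plus the bin edges.
counts, bin_edges = np.histogram(values, bins=30)

# Rescale on the fly: bar heights become probabilities that sum to 1.
probabilities = counts / counts.sum()
widths = np.diff(bin_edges)

fig, ax = plt.subplots()
# align="edge" places each bar's left side on its bin's left edge.
ax.bar(bin_edges[:-1], probabilities, width=widths, align="edge",
       edgecolor="black")
ax.set_xlabel("value")
ax.set_ylabel("probability")
```

If you want a density whose area integrates to 1 instead, np.histogram(values, bins=30, density=True) does that rescaling for you.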
Wonders of NumPy in plotting: how this code works. I really need to study NumPy in detail; it has magnificent applications and properties you can use. One of them is useful for plotting with matplotlib: check out np.histogram(), which returns two arrays, 1) the data count in each bin of the histogram, and 2) the bin edges. The latter has one more element than the former, of course, because it includes the leftmost and rightmost edges of all bins.
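A tiny example of the two return values, using made-up data small enough to check by hand:

```python
import numpy as np

data = np.array([1, 2, 2, 3, 3, 3, 4, 4, 5])

# 4 equal-width bins over [1, 5]; the last bin is closed on the right,
# so the value 5 is counted in it.
counts, bin_edges = np.histogram(data, bins=4)

print(counts)     # [1 2 3 3]
print(bin_edges)  # [1. 2. 3. 4. 5.]
```

Four counts come with five edges, which is why plotting code usually pairs the counts with bin_edges[:-1] (the left edges).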
Grouped bar plot: two series on the same graph, bars side by side
This code example is repeated in the "Create Categories/Buckets Manually & KS Test" note. In the example below, I also run a Kolmogorov-Smirnov test on the two arrays I want to bar plot, and I show the results on the plot itself. I also format the result to three decimal places.
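A minimal sketch of such a plot, not the code from the other note. The bucket labels and counts are hypothetical, and the test is assumed to be SciPy's two-sample ks_2samp, run on the same two arrays that are plotted.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import ks_2samp

# Hypothetical data (assumption): two series over the same five buckets.
buckets = ["0-10", "10-20", "20-30", "30-40", "40+"]
first = np.array([12, 30, 45, 25, 8])
second = np.array([10, 28, 40, 30, 14])

# Two-sample Kolmogorov-Smirnov test on the two plotted arrays.
result = ks_2samp(first, second)

x = np.arange(len(buckets))
width = 0.4  # each bar takes half of the 0.8 slot around a tick

fig, ax = plt.subplots()
# Shift the two series left and right of each tick so bars sit side by side.
ax.bar(x - width / 2, first, width, label="first")
ax.bar(x + width / 2, second, width, label="second")
ax.set_xticks(x)
ax.set_xticklabels(buckets)
# Show the test result on the plot, formatted to three decimal places.
ax.set_title(f"KS statistic = {result.statistic:.3f}, "
             f"p-value = {result.pvalue:.3f}")
ax.legend()
```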
VERY IMPORTANT NOTE: The two arrays must be of the same length to start with, so that each bar of the first array has a matching bar from the second and the KS test compares the two over the same buckets. (A two-sample KS routine such as SciPy's ks_2samp accepts samples of different sizes; the requirement here comes from the shared buckets.) If the two arrays are two columns with different values, i.e. two different distributions, then we need to create a new array for each, putting their values into the same "bucket" values, BEFORE proceeding to plot them as per the plotting function above. How to do that is covered in the "Create Categories/Buckets Manually & KS Test" note.
Resources for this example: numpy set ufuncs; numpy combining and reshaping (scroll down to the NumPy concatenate section in the middle of the page); grouped charts example.
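One possible way to put two series onto the same bucket values, sketched with NumPy's set routines. This is an assumption about the approach, not the code from the other note; the category values, the counts, and the helper name align_to are all hypothetical.

```python
import numpy as np

# Hypothetical observed buckets and their counts for two columns (assumption).
cats_a = np.array([1, 2, 4])
counts_a = np.array([5, 3, 2])
cats_b = np.array([2, 3, 4, 5])
counts_b = np.array([4, 6, 1, 2])

# Sorted union of all bucket values seen in either column.
all_cats = np.union1d(cats_a, cats_b)

def align_to(all_cats, cats, counts):
    """Return counts laid out over all_cats, with 0 for missing buckets."""
    aligned = np.zeros(len(all_cats), dtype=counts.dtype)
    # all_cats is sorted and contains every value in cats, so searchsorted
    # gives the exact position of each bucket.
    aligned[np.searchsorted(all_cats, cats)] = counts
    return aligned

aligned_a = align_to(all_cats, cats_a, counts_a)
aligned_b = align_to(all_cats, cats_b, counts_b)

print(all_cats)   # [1 2 3 4 5]
print(aligned_a)  # [5 3 0 2 0]
print(aligned_b)  # [0 4 6 1 2]
```

Both aligned arrays now have one entry per shared bucket, so they can be plotted as side-by-side bars.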