In my first four posts about Stata and Python, I showed how to configure Stata to use Python, three ways to use Python in Stata, how to install Python packages, and how to use Python packages. It may be helpful to read those posts before you continue with this one if you are not familiar with Python. Now I'd like to shift our focus to some practical uses of Python within Stata. This post will demonstrate how to use Stata to estimate marginal predictions from a logistic regression model and how to use Python to create a three-dimensional surface plot of those predictions.
We will be using the NumPy, pandas, and Matplotlib packages, so you should make sure they are installed before you begin.
Predicted probabilities for continuous by continuous interactions
Several years ago, I wrote a Stata News article entitled Visualizing continuous-by-continuous interactions with margins and twoway contour. In that article, I fit a logistic regression model using data from the National Health and Nutrition Examination Survey (NHANES). The dependent variable, highbp, is an indicator for high blood pressure, and the model includes the main effects and the interaction of the continuous variables age and weight.
. webuse nhanes2, clear

. svy: logistic highbp c.age##c.weight
(running logistic on estimation sample)

Survey: Logistic regression

Number of strata = 31                            Number of obs   =      10,351
Number of PSUs   = 62                            Population size = 117,157,513
                                                 Design df       =          31
                                                 F(3, 29)        =      418.97
                                                 Prob > F        =      0.0000

--------------------------------------------------------------------------------
               |             Linearized
        highbp | Odds ratio   std. err.      t    P>|t|     [95% conf. interval]
---------------+----------------------------------------------------------------
           age |   1.100678   .0088786    11.89   0.000     1.082718    1.118935
        weight |    1.07534   .0063892    12.23   0.000     1.062388     1.08845
               |
c.age#c.weight |   .9993975   .0001138    -5.29   0.000     .9991655    .9996296
               |
         _cons |   .0002925   .0001194   -19.94   0.000     .0001273    .0006724
--------------------------------------------------------------------------------
Note: _cons estimates baseline odds.
The estimated odds ratio for the interaction of age and weight is 0.9994, and the p-value for the estimate is 0.000. Interpreting this result is challenging because the odds ratio is essentially equal to the null value of one, but the p-value is essentially zero. Is the interaction effect meaningful? It is often easier to judge this by looking at the predicted probability of the outcome at different levels of the covariates rather than looking only at the odds ratio.
The code block below uses margins to estimate the predicted probability of hypertension for all combinations of age and weight, for values of age from 20 to 80 years in increments of 5 and for values of weight from 40 to 180 kilograms in increments of 5. The option saving(predictions, replace) saves the predictions to a dataset named predictions.dta. Then I use the dataset predictions, rename three variables in the dataset, and save it again.
webuse nhanes2, clear
svy: logistic highbp age weight c.age#c.weight
quietly margins, at(age=(20(5)80) weight=(40(5)180))    ///
    vce(unconditional) saving(predictions, replace)
use predictions, clear
rename _at1 age
rename _at2 weight
rename _margin pr_highbp
save predictions, replace
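For readers who think more easily in Python, the grid of covariate values requested from margins above can be sketched with NumPy and pandas. This only illustrates the grid itself, not what margins computes, and the names ages, weights, and grid are hypothetical. A Stata numlist such as 20(5)80 includes its endpoint, so the matching np.arange() call needs a stop value just past 80.

```python
import numpy as np
import pandas as pd

# Stata numlists include the endpoint; np.arange() excludes its stop value
ages = np.arange(20, 85, 5)       # 20, 25, ..., 80
weights = np.arange(40, 185, 5)   # 40, 45, ..., 180

# One row per (age, weight) combination:
# age varies slowest and weight varies fastest
grid = pd.DataFrame(
    [(a, w) for a in ages for w in weights],
    columns=["age", "weight"],
)

print(len(ages), len(weights), len(grid))  # 13 29 377
```

The 377 rows correspond to the 13 values of age crossed with the 29 values of weight, which is the number of predictions that margins saves.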
In my previous Stata News article, I used twoway contour to create a contour plot of the predicted probabilities of hypertension.
In this post, I'll use Python to create a three-dimensional surface plot of predicted probabilities.
Using pandas to read marginal predictions in Python
Python must have access to the data stored in predictions.dta to create our three-dimensional surface plot. Let's begin by importing the pandas package into Python using the alias pd. Then we can use the read_stata() method in the pandas package to read predictions.dta into a pandas data frame named data.
. python:
----------------------------------------------- python (type end to exit) -----
>>> import pandas as pd
>>> data = pd.read_stata("predictions.dta")
>>> data
     pr_highbp  age  weight
0     0.020091   20      40
1     0.027450   20      45
2     0.037401   20      50
3     0.050771   20      55
4     0.068580   20      60
..         ...  ...     ...
372   0.954326   80     160
..         ...  ...     ...

[377 rows x 3 columns]
>>> end
-------------------------------------------------------------------------------
We can then refer to a variable within the data frame by typing dataframe['varname']. For example, we can refer to the variable age in the data frame data by typing data['age'].
. python:
----------------------------------------------- python (type end to exit) -----
>>> import pandas as pd
>>> data = pd.read_stata("predictions.dta")
>>> data['age']
0      20
1      20
2      20
3      20
4      20
       ..
372    80
373    80
374    80
375    80
376    80
Name: age, Length: 377, dtype: int8
>>> end
-------------------------------------------------------------------------------
We can refer to multiple variables within a data frame by typing dataframe[['varname1', 'varname2']]. For example, we can refer to the variables age and weight in the data frame data by typing data[['age', 'weight']].
. python:
----------------------------------------------- python (type end to exit) -----
>>> import pandas as pd
>>> data = pd.read_stata("predictions.dta")
>>> data[['age', 'weight']]
     age  weight
0     20      40
1     20      45
2     20      50
3     20      55
4     20      60
..   ...     ...
372   80     160
373   80     165
374   80     170
375   80     175
376   80     180

[377 rows x 2 columns]
>>> end
-------------------------------------------------------------------------------
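If you want to try these pandas statements without running the Stata commands first, the following sketch writes a tiny stand-in dataset to a temporary folder and reads it back. The file name toy_predictions.dta and the three rows of values are made up for illustration; only the variable names match the post.

```python
import os
import tempfile

import pandas as pd

# A tiny stand-in for predictions.dta (values are made up for illustration)
toy = pd.DataFrame({
    "pr_highbp": [0.10, 0.20, 0.30],
    "age": [20, 20, 25],
    "weight": [40, 45, 40],
})

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "toy_predictions.dta")
    toy.to_stata(path, write_index=False)   # write a Stata dataset
    data = pd.read_stata(path)              # read it back into a data frame

one_var = data["age"]                # one name returns a pandas Series
two_vars = data[["age", "weight"]]   # a list of names returns a DataFrame

print(type(one_var).__name__, type(two_vars).__name__)  # Series DataFrame
print(two_vars.shape)                                   # (3, 2)
```

Note the double square brackets when selecting more than one variable: the inner pair is an ordinary Python list of names.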
Using NumPy to create lists of numbers
We will also need to create lists of numbers to place ticks on the axes of our graph. Let's import the NumPy package into Python using the alias np. Then we can use the arange() method in the NumPy package to create lists. The following example creates a list of numbers named mylist that starts at 20 and ends at 90 in increments of 10.
. python:
----------------------------------------------- python (type end to exit) -----
>>> import numpy as np
>>> mylist = np.arange(20,90, step=10)
>>> mylist
array([20, 30, 40, 50, 60, 70, 80])
>>> end
-------------------------------------------------------------------------------
You may be surprised to see that the resulting list does not include the number 90. This is not a bug or an error. It is a feature of the arange() method in NumPy, which excludes the stop value. You can type np.arange(20,100, step=10) if you wish to include 90 in your list.
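The two calls can be compared side by side in a short sketch. The names without_90 and with_90 are hypothetical and used only for illustration.

```python
import numpy as np

# The stop value is excluded, so 90 is not part of the result
without_90 = np.arange(20, 90, step=10)

# Moving the stop value past 90 includes it
with_90 = np.arange(20, 100, step=10)

print(without_90.tolist())  # [20, 30, 40, 50, 60, 70, 80]
print(with_90.tolist())     # [20, 30, 40, 50, 60, 70, 80, 90]

# np.arange() returns a NumPy array rather than a built-in Python list
print(type(with_90).__name__)  # ndarray
```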
Using Matplotlib to create three-dimensional surface plots
Now we are ready to create our plot. Let's begin by importing the NumPy, pandas, and Matplotlib Python packages.
python:
import numpy as np
import pandas as pd
import matplotlib
end
We will also import the pyplot module from the Matplotlib package using the alias plt. The new statement is the last line of code in the block below.
python:
import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
end
I added the statement matplotlib.use('TkAgg') to the code block below. I need this statement for Matplotlib to run correctly with Python in my Windows 10 environment. Matplotlib uses different rendering engines for different purposes and for different platforms. You may need to use a different rendering engine in your computing environment, and you can find more information in the Stata FAQ titled "How do I use Matplotlib with Stata?".
python:
import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
matplotlib.use('TkAgg')
end
Next, we can use the read_stata() method in the pandas package to read predictions.dta into a pandas data frame named data.
python:
import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
matplotlib.use('TkAgg')
data = pd.read_stata("predictions.dta")
end
Next, we can use the axes() method in the pyplot module to define a three-dimensional set of axes named ax.
python:
import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
matplotlib.use('TkAgg')
data = pd.read_stata("predictions.dta")
ax = plt.axes(projection='3d')
end
Then we can use the plot_trisurf() method of the axes object ax to render our three-dimensional surface plot.
python:
import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
matplotlib.use('TkAgg')
data = pd.read_stata("predictions.dta")
ax = plt.axes(projection='3d')
ax.plot_trisurf(data['age'], data['weight'], data['pr_highbp'])
end
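The same kind of plot can be tried outside Stata with synthetic data. The sketch below is an assumption-laden stand-in: the coefficients are made up rather than taken from the fitted model, and it uses the non-interactive Agg rendering engine so that it runs without opening a window.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in for the predictions: a made-up logistic surface
age, weight = np.meshgrid(np.arange(20, 85, 5), np.arange(40, 185, 5))
age, weight = age.ravel(), weight.ravel()
xb = -8 + 0.06 * age + 0.04 * weight        # hypothetical coefficients
prob = 1 / (1 + np.exp(-xb))

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
surface = ax.plot_trisurf(age, weight, prob)  # triangulates the (x, y) points

print(type(surface).__name__)
```

plot_trisurf() is a convenient fit for data saved by margins because it accepts three one-dimensional arrays, one value per row of the dataset. Matplotlib's plot_surface() method would instead require the values to be reshaped into two-dimensional grids.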
The surface plot is rendered in solid blue by default. Let's use the option cmap=plt.cm.Spectral_r to add color shading to our plot. The color map Spectral_r displays lower probabilities of hypertension in blue and higher probabilities in red.
python:
import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
matplotlib.use('TkAgg')
data = pd.read_stata("predictions.dta")
ax = plt.axes(projection='3d')
ax.plot_trisurf(data['age'], data['weight'], data['pr_highbp'],
                cmap=plt.cm.Spectral_r)
end
The default axis ticks look reasonable, but we may wish to customize them. The y axis looks somewhat cluttered, and we can change the increment between ticks with the set_yticks() method of the axes object. The arange() method in the NumPy module defines a list of numbers that starts at 40 and increases in steps of 40 up to, but not including, 200. We can use similar statements to place custom ticks on the x and z axes.
python:
import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
matplotlib.use('TkAgg')
data = pd.read_stata("predictions.dta")
ax = plt.axes(projection='3d')
ax.plot_trisurf(data['age'], data['weight'], data['pr_highbp'],
                cmap=plt.cm.Spectral_r)
ax.set_xticks(np.arange(20, 90, step=10))
ax.set_yticks(np.arange(40, 200, step=40))
ax.set_zticks(np.arange( 0, 1.2, step=0.2))
end
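One way to check which ticks a statement like these produces is to read them back from the axes with get_yticks(). The sketch below uses an empty set of axes and the non-interactive Agg rendering engine, both of which are my choices for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import numpy as np

fig = plt.figure()
ax = fig.add_subplot(projection="3d")

# Request ticks at 40, 80, 120, and 160 (the stop value 200 is excluded)
ax.set_yticks(np.arange(40, 200, step=40))

yticks = [float(t) for t in ax.get_yticks()]
print(yticks)
```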
Next, we can add a title to our graph with the set_title() method of the axes object. And we can add labels to the x, y, and z axes with the set_xlabel(), set_ylabel(), and set_zlabel() methods, respectively.
python:
import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
matplotlib.use('TkAgg')
data = pd.read_stata("predictions.dta")
ax = plt.axes(projection='3d')
ax.plot_trisurf(data['age'], data['weight'], data['pr_highbp'],
                cmap=plt.cm.Spectral_r)
ax.set_xticks(np.arange(20, 90, step=10))
ax.set_yticks(np.arange(40, 200, step=40))
ax.set_zticks(np.arange( 0, 1.2, step=0.2))
ax.set_title("Probability of hypertension by age and weight")
ax.set_xlabel("Age (years)")
ax.set_ylabel("Weight (kg)")
ax.set_zlabel("Probability of hypertension")
end
You can adjust the view angle with the view_init() method. The elev option adjusts the elevation, and the azim option adjusts the azimuth. Both are specified in degrees.
python:
import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
matplotlib.use('TkAgg')
data = pd.read_stata("predictions.dta")
ax = plt.axes(projection='3d')
ax.plot_trisurf(data['age'], data['weight'], data['pr_highbp'],
                cmap=plt.cm.Spectral_r)
ax.set_xticks(np.arange(20, 90, step=10))
ax.set_yticks(np.arange(40, 200, step=40))
ax.set_zticks(np.arange( 0, 1.2, step=0.2))
ax.set_title("Probability of hypertension by age and weight")
ax.set_xlabel("Age (years)")
ax.set_ylabel("Weight (kg)")
ax.set_zlabel("Probability of hypertension")
ax.view_init(elev=30, azim=240)
end
I prefer to read the z-axis title from bottom to top rather than from top to bottom. I can change this using a combination of the set_rotate_label(False) method and the rotation=90 option in the set_zlabel() method.
python:
import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
matplotlib.use('TkAgg')
data = pd.read_stata("predictions.dta")
ax = plt.axes(projection='3d')
ax.plot_trisurf(data['age'], data['weight'], data['pr_highbp'],
                cmap=plt.cm.Spectral_r)
ax.set_xticks(np.arange(20, 90, step=10))
ax.set_yticks(np.arange(40, 200, step=40))
ax.set_zticks(np.arange( 0, 1.2, step=0.2))
ax.set_title("Probability of hypertension by age and weight")
ax.set_xlabel("Age (years)")
ax.set_ylabel("Weight (kg)")
ax.zaxis.set_rotate_label(False)
ax.set_zlabel("Probability of hypertension", rotation=90)
ax.view_init(elev=30, azim=240)
end
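The view angle and the label rotation are stored on the axes, so they can be inspected after they are set. This is a minimal sketch on an empty set of axes, again using the Agg rendering engine as an assumption so that no window is required.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt

fig = plt.figure()
ax = fig.add_subplot(projection="3d")

# Stop Matplotlib from choosing the z-label angle automatically,
# then ask for a fixed 90-degree rotation
ax.zaxis.set_rotate_label(False)
ax.set_zlabel("Probability of hypertension", rotation=90)

# Elevation and azimuth are stored on the axes, in degrees
ax.view_init(elev=30, azim=240)

print(ax.elev, ax.azim)
print(ax.zaxis.label.get_rotation())
```

Without set_rotate_label(False), Matplotlib recomputes the z-label angle each time the plot is drawn, so the rotation=90 option alone would not be enough.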
Finally, we can use the savefig() method to save our graph as a Portable Network Graphics (.png) file with a resolution of 1200 dots per inch.
python:
import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
matplotlib.use('TkAgg')
data = pd.read_stata("predictions.dta")
ax = plt.axes(projection='3d')
ax.plot_trisurf(data['age'], data['weight'], data['pr_highbp'],
                cmap=plt.cm.Spectral_r)
ax.set_xticks(np.arange(20, 90, step=10))
ax.set_yticks(np.arange(40, 200, step=40))
ax.set_zticks(np.arange( 0, 1.2, step=0.2))
ax.set_title("Probability of hypertension by age and weight")
ax.set_xlabel("Age (years)")
ax.set_ylabel("Weight (kg)")
ax.zaxis.set_rotate_label(False)
ax.set_zlabel("Probability of hypertension", rotation=90)
ax.view_init(elev=30, azim=240)
plt.savefig("Margins3d.png", dpi=1200)
end
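The dpi option works together with the figure size, which is measured in inches, to determine the pixel dimensions of the saved file. The sketch below uses a small figure, a low resolution, and a temporary file named sketch3d.png, all of which are hypothetical choices that keep the example quick to run.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(4, 3))      # figure size in inches
ax = fig.add_subplot(projection="3d")

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "sketch3d.png")
    fig.savefig(path, dpi=100)         # 4 in x 100 dpi = 400 pixels wide
    # Read the image back; its array shape is (height, width, channels)
    width, height = plt.imread(path).shape[1::-1]

print(width, height)  # 400 300
```

By the same arithmetic, Matplotlib's default figure size of 6.4 by 4.8 inches saved with dpi=1200 produces an image of 7680 by 5760 pixels, so expect a large file.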
Conclusion
The resulting three-dimensional surface plot shows the predicted probability of hypertension for values of age and weight. The probabilities are indicated by the height of the surface on the z axis and by the color of the surface. Blue indicates lower probabilities of hypertension, and red indicates higher probabilities of hypertension. This can be a useful way to interpret the interaction of two continuous covariates in a regression model.
I've compiled the Stata commands and the Python code block into a single file below. And I've included comments to remind you of the purpose of each collection of commands and statements. Note that Stata comments start with "//" and Python comments start with "#".
example.do
// Fit the model and estimate the predicted marginal probabilities with Stata
webuse nhanes2, clear
logistic highbp c.age##c.weight
quietly margins, at(age=(20(5)80) weight=(40(5)180))    ///
    saving(predictions, replace)
use predictions, clear
rename _at1 age
rename _at2 weight
rename _margin pr_highbp
save predictions, replace
// Create the three-dimensional surface graph with Python
python:
# Import the necessary Python packages
import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
matplotlib.use('TkAgg')
# Read (import) the Stata dataset "predictions.dta"
# into a pandas dataframe named "data"
data = pd.read_stata("predictions.dta")
# Define a 3-D graph named "ax"
ax = plt.axes(projection='3d')
# Render the graph
ax.plot_trisurf(data['age'], data['weight'], data['pr_highbp'],
                cmap=plt.cm.Spectral_r)
# Specify the axis ticks
ax.set_xticks(np.arange(20, 90, step=10))
ax.set_yticks(np.arange(40, 200, step=40))
ax.set_zticks(np.arange( 0, 1.2, step=0.2))
# Specify the title and axis titles
ax.set_title("Probability of hypertension by age and weight")
ax.set_xlabel("Age (years)")
ax.set_ylabel("Weight (kg)")
ax.zaxis.set_rotate_label(False)
ax.set_zlabel("Probability of hypertension", rotation=90)
# Specify the view angle of the graph
ax.view_init(elev=30, azim=240)
# Save the graph
plt.savefig("Margins3d.png", dpi=1200)
end