`vignettes/piecewiseSEM.Rmd`

`piecewiseSEM.Rmd`

Structural equation modeling (SEM) is among the fastest growing statistical techniques in ecology and evolution, and provides a new way to explore and quantify ecological systems. SEM unites multiple variables in a single causal network, thereby allowing simultaneous tests of multiple hypotheses. The idea of causality is central to SEM as the technique implicitly assumes that the relationships among variables represent causal links. Because variables can be both predictors and responses, SEM is also a useful tool for quantifying both direct and indirect (cascading) effects.

Piecewise SEM (or confirmatory path analysis) expands upon traditional SEM by introducing a flexible mathematical framework that can incorporate a wide variety of model structures, distributions, and assumptions. These include: interactions and non-normal responses, random effects and hierarchical models, and alternate correlation structures (including phylogenetic, spatial, and temporal).

This release is version 2.0 of the package and contains substantial updates to both the syntax and the underlying calculations. All functions have been replaced and rewritten from the ground up.

The first part of this vignette will briefly introduce the concepts behind piecewise SEM. The second part will introduce the new syntax using a worked example. The final part will briefly compare the old and new versions of the package.

Broadly, structural equation modeling (SEM) unites a suite of variables in a single network. They are generally presented using box-and-arrow diagrams denoting directed (causal) relationships among variables:

Those variables that exist only as predictors in the network are referred to as exogenous, and those that are predicted (at any point) as endogenous. Exogenous variables therefore only ever have arrows coming out of them, while endogenous arrows have arrows coming into them (which does not preclude them from having arrows come out of them as well). This vocabulary is important when considering some special cases later.

In traditional SEM, the relationships among variables (i.e., their linear coefficients) are estimated simultaneously in a single variance-covariance matrix. This approach is well developed but can be computationally intensive (depending on the sizes of the v-cov matrix) and additionally assumes independence and normality of errors, two assumptions that are generally violated in ecological research.

Piecewise structural equation modeling (SEM), also called confirmatory path analysis, was proposed in the early 2000s by Bill Shipley as an alternate approach to traditional variance-covariance based SEM. In piecewise SEM, each set of relationships is estimated independently (or locally). This proces decomposes the network into the corresponding simple or multiple linear regressions for each response, each of which are evaluated separately, and then combined later to generate inferences about the entire SEM. This approach has two consequences: 1. Increasingly large networks can be estimated with ease compared to a single vcov matrix (because the approach is modularized), and 2. Specific assumptions about the distribution and covariance of the responses can be addressed using typical extensions of linear regression, such as fixed covariance structures, random effects, and other sophisticated modeling techniques.

Unlike traditional SEM, which uses a \(\chi^2\) test to compare the observed and predicted covariance matrices, the goodness-of-fit of a piecewise structural equation model is obtained using ‘tests of directed separation.’ These tests evaluate the assumption that the specific causal structure reflects the data. This is accomplished by deriving the ‘basis set,’ which is the smallest set of independence claims obtained from the SEM. These claims are relationships that are *un*specified in the model, in other words paths that could have been included but were omitted because they were deemed to be biologically or mechanistically insigificant. The tests ask whether these relationships can truly be considered independent (i.e., their association is not statistically significant within some threshold of acceptable error, typically \(\alpha\)=0.05) or whether some causal relationship may exist as indicated by the data.

For instance, the preceding example SEM contains 4 specified paths (solid, black) and 2 unspecified paths (dashed, red), the latter of which constitute the basis set:

In this case, there are two relationships that need to be evaluated: `y3`

and `x1`

, and `y3`

and `y2`

. However, there are additional influences on `y3`

, specifically the directed path from `y2`

. Thus, the claims need to be evaluated for ‘conditional independence,’ i.e. that the two variables are independent *conditional* on the already specified influences on both of them. This also pertains to the predictors of `y2`

, including the potential contributions of `x1`

. So the full claim would be: `y2 | y3 (y1, x1)`

, with the claim of interest separated by the `|`

bar and the conditioning variable(s) following in parentheses.

As the network grows more complex, however, the independence claims only consider variables that are *immediately ancestral* to the primary claim (i.e., the parent nodes). For example, if there was another variable predicting `x1`

, it would not be considered in the independence claim between `y3`

and `y2`

since it is >1 node away in the network.

The independence claims are evaluated by fitting a regression between the two variables of interest with any conditioning variables included as covariates. Thus, the claim above `y2 | y3 (y1, x1)`

would be modeled as `y3 ~ y2 + y1 + x1`

. These regressions are constructed using the same assumptions about `y3`

as specified in the actual structural equation model. So, for instance, if `y3`

is a hierarchically sampled variable predicted by `y1`

, then same hierarchical structure would carry over to the test of directed separation of `y3`

predicted by `y2`

.

The P-values of the conditional independence tests are then combined in a single Fisher’s C statistic using the following equation:

\[C = -2\sum_{i=1}^{k}ln(p_{i})\]

This statistic is \(\chi^2\)-distributed with 2k degrees of freedom, with k being the number of independence claims in the basis set.

Shipley (2013) also showed that the the C statistic can be used to compute an AIC score for the SEM, so that nested comparisons can be made in a model selection framework:

\[AIC = C + 2K\]

where K is the likelihood degrees of freedom. A further variant, \(AIC_c\), can be obtained by adding an additional penalty based on sample size:

\[AIC_c = C + 2K\frac{n}{(n - K - 1)}\]

The `piecewiseSEM`

package automates the derivation of the basis set and the tests of directed separation, as well as extraction of path coefficients based on the user-specified input.

Let’s make up some fake data corresponding to the path diagram above:

And we will use `piecewiseSEM`

to fit the model. The primary function is `psem`

and we supply the regressions corresponding to the relationships specified in the path diagram, separated by commas as in a `list`

:

```
##
## This is piecewiseSEM version 2.0.
##
## If you have used the package before, it is strongly recommended you read Section 3 of the vignette('piecewiseSEM') to familiarize yourself with the new syntax.
##
## Questions or bugs can be addressed to <jlefcheck@bigelow.org>
```

`## Error: Duplicate responses detected in the model list. Collapse into single multiple regression!`

You’ll note that this formulation produces an error because we have incorrectly broken down the component regressions. A common mistake is to list each path separately, but the proper specification is to collapse multiple pathways into a single multiple regression if the response is the same. Thus, the properly specified SEM becomes:

To evaluate the model, we call `summary`

on the `psem`

object.

```
## $name
## [1] "model"
##
## $call
## [1] "y1 ~ x1 + y2\n y2 ~ x1\n y3 ~ y1"
##
## $dTable
## Independ.Claim Estimate Std.Error DF Crit.Value P.Value
## 1 y3 ~ x1 + ... 0.0220 0.1414 47 0.1557 0.8769
## 2 y3 ~ y2 + ... -0.2119 0.1431 46 -1.4807 0.1455
##
## $Cstat
## Fisher.C df P.Value
## 1 4.118 4 0.39
##
## $IC
## AIC AICc BIC K n
## 1 24.118 30.921 43.238 10 50
##
## $coefficients
## Response Predictor Estimate Std.Error DF Crit.Value P.Value Std.Estimate
## 1 y1 x1 -0.0061 0.1297 47 -0.0473 0.9625 -0.0064
## 2 y1 y2 0.3511 0.1226 47 2.8635 0.0062 0.3854
## 3 y2 x1 -0.0088 0.1527 48 -0.0575 0.9544 -0.0083
## 4 y3 y1 0.2665 0.1452 48 1.8350 0.0727 0.2560
##
## 1
## 2 **
## 3
## 4
##
## $R2
## Response family link method R.squared
## 1 y1 gaussian identity none 0.15
## 2 y2 gaussian identity none 0.00
## 3 y3 gaussian identity none 0.07
##
## attr(,"class")
## [1] "summary.psem"
```

The output should be familiar to anyone who has evaluated a linear model in R previously.

It shows the call of the model (the component equations), the AIC and BIC scores (derived from that C statistic), and then the tests of directed separation. The last column of the table reports the P-values, which are summarized using the above equation to yield the global goodness-of-fit below. The next table reports the path coefficients, including the standardized values (scaled by standardized deviations). Finally, the individual R-squared values of each regression is given, to aid in evaluation of the model fit.

Standardization of model coefficients is useful for making comparisons about the relative strengths of predictors in a multiple regression. In a network approach, it allows effects to also be compared across multiple responses. Standardized coefficients are also necessary for the calculation of indirect and total effects (because predictors may occur on wildly different scales). Because of their utility, a variety of approaches to coefficient standardization have been proposed, many of which are implemented here.

Let’s first create an example dataset:

```
# Create fake data.frame
coefs.data <- data.frame(
y = runif(100),
x1 = runif(100),
x2 = runif(100)
)
# Evaluate linear model
model <- lm(y ~ x1, coefs.data)
```

The function for returning coefficients in *piecewiseSEM* is `coefs`

.

While standardization is useful, it may be unnecessary or unwanted in certain circumstances. In these cases, one can specify `standardize = "none"`

as an argument to `coefs`

, which will return the raw (unstandardized) coefficients.

`coefs(model, standardize = "none")`

```
## Response Predictor Estimate Std.Error DF Crit.Value P.Value
## 1 y x1 9e-04 0.1054 98 0.009 0.9928
```

```
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.4876552766 0.05572766 8.750686922 6.19140e-14
## x1 0.0009478366 0.10542595 0.008990544 9.92845e-01
```

To return the intercepts using `coefs`

, specify the `intercepts = TRUE`

argument:

`coefs(model, standardize = "none", intercepts = TRUE)`

```
## Response Predictor Estimate Std.Error DF Crit.Value P.Value
## 1 y (Intercept) 0.4877 0.0557 98 8.7507 0.0000 ***
## 2 y x1 0.0009 0.1054 98 0.0090 0.9928
```

The most typical implementation of standardization is placing the coefficients in units of standard deviations of the mean. This is accomplished by scaling the coefficients \(\beta\) by the ratio of the standard deviation of x over the standard deviation of y:

\[\beta_{std} = \beta*\left( \frac{sd_x}{sd_y} \right)\] We can do this manually for our example dataset:

```
# Obtain the raw coefficient from the coefficient table
B <- summary(model)$coefficients[2, 1]
# Compute the standard deviation of the independent variable
sd.x <- sd(coefs.data$x1)
# Compute the standard deviation of the dependent variable
sd.y <- sd(coefs.data$y)
# Scale Beta
B.sdscaled <- B * sd.x/sd.y
```

The argument for standardization based on standard deviations is `standardize = "scale"`

:

`coefs(model, standardize = "scale")`

```
## Response Predictor Estimate Std.Error DF Crit.Value P.Value Std.Estimate
## 1 y x1 9e-04 0.1054 98 0.009 0.9928 9e-04
##
## 1
```

`## [1] 0.0009081817`

Note that the `coefs`

function will return both standardized and unstandardized coefficients when specifying any argument other than `standardize = "none"`

.

Instead of scaling by the ratio of the standard deviations, one can scale by the ‘relevant’ range of the data.

The default range standardization considers the full range of variation exhibited by the data. We can again compute the range standardization by hand:

```
# Calculate range for the independent variable
range.x <- diff(range(coefs.data$x1))
# Calculate range for the independent variable
range.y <- diff(range(coefs.data$y))
# Scale Beta
B.range <- B * range.x/range.y
```

The argument for standardization based on ranges is `standardize = "range"`

:

`coefs(model, standardize = "range")`

```
## Response Predictor Estimate Std.Error DF Crit.Value P.Value Std.Estimate
## 1 y x1 9e-04 0.1054 98 0.009 0.9928 9e-04
##
## 1
```

`## [1] 0.0009250381`

If one does not wish to use the full range of the data, and instead restrict the range to a more ‘relevant’ subset of the data, this can be accomplished by providing a named `list`

to the `standardize =`

argument. The names should be `x`

and `y`

and each entry should be a `vector`

of length 2 with the first entry being the minimum and the second entry being the maximum values of the relevant range.

```
# Consider the range 0.1 - 0.8 for x
relrange.x <- 0.8-0.1
# Consider the range 0.3-0.6 for y
relrange.y <- 0.6-0.3
# Scale Beta
B.relrange <- B * relrange.x/relrange.y
# Compare to automated calculation
coefs(model, standardize = list(x1 = c(0.1, 0.8), y = c(0.3, 0.6)))
```

```
## Response Predictor Estimate Std.Error DF Crit.Value P.Value Std.Estimate
## 1 y x1 9e-04 0.1054 98 0.009 0.9928 0.0022
##
## 1
```

`## [1] 0.002211619`

For generalized linear models, the response variables are inherently non-linear. The common solution is to apply a link function to linearize the response, and derive the parameter estimate \(\beta\) on this new linear (link) scale. Common link functions include logit and probit for binary responses, and log for Poisson, although many others exist.

This transformation, however, means that the modeled response is not actually observed, and thus the true error is not known. Obtaining the standard deviation of the response used in the calculation of standardized coefficients requires further assumptions about the distribution-specific variance.

We implement two solutions for obtaining the error variance to produce standardized coefficients for GLM, focusing for the moment on binary response models: the latent theoretic (`standardize.type = "latent.linear"`

) and the observation error approach (`standardize.type = "Menard.OE"`

).

In the latent theoretic approach, we assume a fixed error variance for the binomial distribution based on the link function: for logit, it is \(\pi\)^2/3 and for the probit, it is 1. This value is added to the predicted fits on the linear (link) scale and then the square-root of the variance of these values is taken to obtain the standard deviation of y.

In the observation error approach, a rough estimate of the error variance is obtained through the calculation of a pseudo-R^2, which is simply the correlation between the observed and predicted fits (in this case, in the original units). The error variance is again added to the variance of the predicted fits and the square-root is taken to obtain the standard deviation of y.

Let’s work through a simple example where we first compute the standardized coefficient by hand, then compare to the output of `piecewiseSEM`

:

```
# Create fake binomial response
coefs.data$y.binom <- rbinom(100, 1, 0.5)
# Fit using GLM
glm.model <- glm(y ~ x1, "binomial", coefs.data)
# Extract linear beta
Beta.glm <- summary(glm.model)$coefficients[2, 1]
```

And apply the latent theoretic (`standardize.type = "latent.linear"`

) approach:

```
# Extract predicted values on the link scale
preds <- predict(glm.model, type = "link")
# Compute sd of error variance using theoretical variances
sd.y.LT <- sqrt(var(preds) + pi^2/3)
# Compute sd of x
sd.x <- sd(coefs.data$x1)
# Compare to automated output
coefs(glm.model, standardize.type = "latent.linear")
```

```
## Response Predictor Estimate Std.Error DF Crit.Value P.Value Std.Estimate
## 1 y x1 0.0038 0.7345 98 0.0052 0.9959 6e-04
##
## 1
```

`## [1] 0.0005725077`

A problematic case arises when intermediate endogenous variables are non-normally distributed. Consider the following SEM:

In this SEM, there are two independence claims:

y3 | x1 (y1, y2)

y2 | y1 (x1)

Considering the second independence claim, in a Gaussian world, the significance value is the same whether the test is conducted as y2 | y1 (x1) or y1 | y2 (x1). This is NOT true, however, when one or both of the variables are modeled using a generalized linear model (GLM) fit to a non-normal distribution. This is because the response is now transformed via the link function (see Section 2.2). This transformations means the P-value obtained by regressing y1 against y2 is NOT the same as the one obtained by regressing y2 against y1.

The following example will show this is the case:

```
# Generate fake data
glmdat <- data.frame(x1 = runif(50), y1 = rpois(50, 10), y2 = rpois(50, 50), y3 = runif(50))
# Extract P-values
summary(lm(y1 ~ y2 + x1, glmdat))$coefficients[2, 4]
```

`## [1] 0.1737639`

`## [1] 0.1737639`

```
# Repeat but model y1 and y2 and Poisson-distributed
summary(glm(y1 ~ y2 + x1, "poisson", glmdat))$coefficients[2, 4]
```

`## [1] 0.134332`

`## [1] 0.1767188`

This effect is problematic because the d-sep tests are wholly dependent on the significance value. If the P-value is biased based on the direction of the test, then the goodness-of-fit of the model can be over- or underestimated.

`piecewiseSEM`

version 2.0 solves this by providing three options to the user.

One can specify the directionality of the test if, for instance, it makes greater biological sense to test

`y1`

against`y2`

instead of the reverse.One can remove that path from the basis set and instead specify it as a correlated error using

`%~~%`

.One can conduct both tests and choose the most conservative (i.e., lowest) P-value.

These options are returned by `summary`

in the event the above scenario is identified in the SEM:

```
# Generate fake data
glmdat <- data.frame(x1 = runif(50), y1 = rpois(50, 10), y2 = rpois(50, 50), y3 = runif(50))
# Construct SEM
glmsem <- psem(
glm(y1 ~ x1, "poisson", glmdat),
glm(y2 ~ x1, "poisson", glmdat),
lm(y3 ~ y1 + y2, glmdat)
)
summary(glmsem)
```

```
## Error:
## Non-linearities detected in the basis set where P-values are not symmetrical.
## This can bias the outcome of the tests of directed separation.
##
## Offending independence claims:
## y2 <- y1 *OR* y2 -> y1
##
## Option 1: Specify directionality using argument 'direction = c()' in 'summary'.
##
## Option 2: Remove path from the basis set by specifying as a correlated error using '%~~%' in 'psem'.
##
## Option 3 (recommended): Use argument 'conserve = TRUE' in 'summary' to compute both tests, and return the most conservative P-value.
```

In option 1, the directionality can be specified using `direction = c()`

as an additional argument.

```
## Independ.Claim Estimate Std.Error DF Crit.Value P.Value
## 1 y3 ~ x1 + ... -0.0055 0.1479 46 -0.0370 0.9707
## 2 y2 ~ y1 + ... 0.0119 0.0067 47 1.7728 0.0763
```

In option 2, the SEM can be updated to remove that test by specifying it as a correlated error (see Section 2.4).

```
## $name
## [1] "update(glmsem, y1 %~~% y2)"
##
## $call
## [1] "y1 ~ x1\n y2 ~ x1\n y3 ~ y1 + y2\n y1 ~~ y2"
##
## $dTable
## Independ.Claim Estimate Std.Error DF Crit.Value P.Value
## 1 y3 ~ x1 + ... -0.0055 0.1479 46 -0.037 0.9707
##
## $Cstat
## Fisher.C df P.Value
## 1 0.06 2 0.971
##
## $IC
## AIC AICc BIC K n
## 1 16.06 19.585 31.356 8 50
##
## $coefficients
## Response Predictor Estimate Std.Error DF Crit.Value P.Value Std.Estimate
## 1 y1 x1 -0.2505 0.1538 48 -1.6285 0.1034 NA
## 2 y2 x1 -0.0071 0.0700 48 -0.1009 0.9196 NA
## 3 y3 y1 -0.0238 0.0138 47 -1.7304 0.0901 -0.2482
## 4 y3 y2 -0.0054 0.0061 47 -0.8883 0.3789 -0.1274
## 5 ~~y1 ~~y2 0.2714 NA 47 1.9331 0.0296 0.2714
##
## 1
## 2
## 3
## 4
## 5 *
##
## $R2
## Response family link method R.squared
## 1 y1 poisson log nagelkerke 0.09
## 2 y2 poisson log nagelkerke 0.00
## 3 y3 gaussian identity none 0.09
##
## attr(,"class")
## [1] "summary.psem"
```

Note that the claim no longer appears in the section for the tests of directed separation.

Finally, option 3 can be invoked by specifying `conserve = T`

as an additional argument

```
## Independ.Claim Estimate Std.Error DF Crit.Value P.Value
## 1 y3 ~ x1 + ... -0.0055 0.1479 46 -0.0370 0.9707
## 2 y2 ~ y1 + ... 0.0119 0.0067 47 1.7728 0.0763
```

The user should be vigilant for these kinds of situations and ensure that both the specified paths AND the independence claims all make biological sense. In the case where the underlying assumptions of the d-sep tests can bias the goodness-of-fit statistic, `piecewiseSEM`

should automatically alert the user.

As noted, Shipley (2013) used the Fisher’s C statistic to construct an AIC score to facilitate model comparison and selection. This can be accomplished in the `piecewiseSEM`

package with one important distinction.

Let’s consider comparing the following models for the mediating role of `y1`

:

One might think that the models could be coded like this, and then compared:

```
AICdat <- data.frame(x1 = runif(50), y1 = runif(50), y2 = runif(50), y3 = runif(50))
sem1 <- psem(
lm(y1 ~ x1, AICdat),
lm(y2 ~ y1, AICdat),
lm(y3 ~ y2, AICdat)
)
sem2 <- psem(
lm(y1 ~ x1, AICdat),
lm(y2 ~ y1, AICdat)
)
AIC(sem1, sem2)
```

However, this does not account for the potential missing relationships with `y3`

to be in the model, which is critical as AIC incorporates Fisher’s C, which is determined by the d-sep tests. `y3`

must be present in the d-sep tests to make the comparison fair (i.e., the models must be nested).

To do so, we can use the following syntax:

```
## df AIC
## x 9 25.459
## y 6 24.450
```

Now the comparison is fair and the model selection procedure is robust. In this case, the models have equivalent support (i.e., \(\delta\)AIC < 2).

Comparison of saturated models–ones that have *no* missing paths–and unsaturated ones using AIC is currently not possible, although we are looking into possible likelihood formulations in the absence of a C statistic.

The new version 2.0 of `piecewiseSEM`

replaces all of the old functions from version 1.x. This section will walk the user through the same worked example included in version 1.x from Shipley (2009), emphasizing the new syntax.

Shipley included an example dataset in his 2009 paper ‘Confirmatory path analysis in a generalized multilevel context.’ The data are included in this version of the package and alternately hosted in Ecological Archives E090-028-S1 (DOI: 10.1890/08-1034.1). While not actual observations (the data are randomly generated), the hypotheses correspond to actual ecological phenomenon.

Briefly, the data concern a forest survey where 5 individual trees of a particular species are followed at 20 sites every year beginning in 1970. For each individual, there is a measurement of: *the latitude of the site ( lat) *the degree days until break (

`DD`

) `Date`

) `Growth`

) *a binary variable indicating whether the tree is alive (1) or dead (0) (`Survival`

)Shipley notes that this model would be difficult to test using traditional SEM because it represents variation occurring at multiple levels (between sites, between individuals within sites, and between years within individuals within sites), and one response, survival, is not normally distributed.

Shipley hypothesizes that the data adhere to the following hypothesized causal structure:

This example was included as the primary worked dataset in version 1.x of `piecewiseSEM`

.

In the previous version 1.x of the package, the model is constructed using a `list`

and then supplied to various additional functions to measure fit, extract coefficients, and so on. Version 2.0 of the package uses the new function `psem`

and uses `summary`

to extract all that information at once.

Next, we will walk through the analysis of the Shipley data using version 1.x and version 2.0, and compare the output.

```
# Load required packages
library(nlme)
library(lme4)
# Load Shipley data
data(shipley)
# Create list of structural equations
shipley.list <- list(
lme(DD ~ lat, random = ~ 1 | site / tree, na.action = na.omit,
data = shipley),
lme(Date ~ DD, random = ~ 1 | site / tree, na.action = na.omit,
data = shipley),
lme(Growth ~ Date, random = ~ 1 | site / tree, na.action = na.omit,
data = shipley),
glmer(Live ~ Growth + (1 | site) + (1 | tree),
family = binomial(link = "logit"), data = shipley)
)
```

Note the use of mixed effects models to account for the nested non-independence of replicates as well as the use of a generalized linear mixed effects model to account for the binomial distribution of survival.

Next, we extract the goodness-of-fit and path coefficients.

`(old.fit <- sem.fit(shipley.list, shipley, .progressBar = F))`

```
## Warning: `sem.fit` has been replaced. Use `psem` instead of `list`, and
## then call `summary` on that object
```

`## Conditional variables have been omitted from output table for clarity (or use argument conditional = T)`

```
## $missing.paths
## missing.path estimate std.error df crit.value p.value
## 1 Date ~ lat + ... -0.0091 0.1135 18 -0.0798 0.9373
## 2 Growth ~ lat + ... -0.0989 0.1107 18 -0.8929 0.3837
## 3 Live ~ lat + ... 0.0305 0.0297 NA 1.0281 0.3039
## 4 Growth ~ DD + ... -0.0106 0.0358 1329 -0.2967 0.7667
## 5 Live ~ DD + ... 0.0272 0.0271 NA 1.0042 0.3153
## 6 Live ~ Date + ... -0.0466 0.0298 NA -1.5613 0.1184
##
## $Fisher.C
## fisher.c df p.value
## 1 11.53 12 0.484
##
## $AIC
## AIC AICc K n
## 1 49.53 50.069 19 1431
```

`(old.coefs <- sem.coefs(shipley.list))`

```
## Warning: `sem.coefs` has been replaced. Use `psem` instead of `list`, and
## then call `summary` or `coefs` on that object
```

```
## response predictor estimate std.error p.value
## 1 DD lat -0.8354736 0.119422385 0 ***
## 2 Date DD -0.4976475 0.004933274 0 ***
## 3 Growth Date 0.3007147 0.026631405 0 ***
## 4 Live Growth 0.3478536 0.058407891 0 ***
```

Note the warnings produced by calling the old functions.

Now, we repeat the exercise using the new functions. `as.psem`

converts the list to `psem`

object, but you could also run the code using `psem`

instead of `list`

.

```
shipley.psem <- as.psem(shipley.list)
### NOT RUN
# shipley.psem <- psem(
#
# lme(DD ~ lat, random = ~ 1 | site / tree, na.action = na.omit,
# data = shipley),
#
# lme(Date ~ DD, random = ~ 1 | site / tree, na.action = na.omit,
# data = shipley),
#
# lme(Growth ~ Date, random = ~ 1 | site / tree, na.action = na.omit,
# data = shipley),
#
# glmer(Live ~ Growth + (1 | site) + (1 | tree),
# family = binomial(link = "logit"), data = shipley)
#
# )
```

Now we extract a summary of the object.

```
## $name
## [1] "shipley.psem"
##
## $call
## [1] "DD ~ lat\n Date ~ DD\n Growth ~ Date\n Live ~ Growth"
##
## $dTable
## Independ.Claim Estimate Std.Error DF Crit.Value P.Value
## 1 Date ~ lat + ... -0.0091 0.1135 18 -0.0798 0.9373
## 2 Growth ~ lat + ... -0.0989 0.1107 18 -0.8929 0.3837
## 3 Live ~ lat + ... 0.0305 0.0297 NA 1.0281 0.3039
## 4 Growth ~ DD + ... -0.0106 0.0358 1329 -0.2967 0.7667
## 5 Live ~ DD + ... 0.0272 0.0271 NA 1.0035 0.3156
## 6 Live ~ Date + ... -0.0466 0.0298 NA -1.5622 0.1183
##
## $Cstat
## Fisher.C df P.Value
## 1 11.535 12 0.484
##
## $IC
## AIC AICc BIC K n
## 1 49.535 50.237 149.591 19 1431
##
## $coefficients
## Response Predictor Estimate Std.Error DF Crit.Value P.Value
## 1 DD (Intercept) 196.6524 7.6605 1331 25.6708 0.0000
## 2 DD lat -0.8355 0.1194 18 -6.9960 0.0000
## 3 Date (Intercept) 198.3382 1.1842 1330 167.4825 0.0000
## 4 Date DD -0.4976 0.0049 1330 -100.8757 0.0000
## 5 Growth (Intercept) 10.7892 3.5189 1330 3.0661 0.0022
## 6 Growth Date 0.3007 0.0266 1330 11.2917 0.0000
## 7 Live Growth 0.3479 0.0584 1431 5.9556 0.0000
## Std.Estimate
## 1 161.8746 ***
## 2 -0.6877 ***
## 3 250.3448 ***
## 4 -0.6281 ***
## 5 13.7208 **
## 6 0.3824 ***
## 7 NA ***
##
## $R2
## Response family link method Marginal Conditional
## 1 DD gaussian identity none 0.49 0.70
## 2 Date gaussian identity none 0.41 0.98
## 3 Growth gaussian identity none 0.11 0.84
## 4 Live binomial logit delta 0.16 0.18
##
## attr(,"class")
## [1] "summary.psem"
```

The output will look familiar to anyone who has run a regression in R.

First, we have the call, which represents the structural questions.

Next are the AIC and new BIC values, formerly accessible using `sem.aic`

. The two produce identical values.

```
# Old function
sem.aic(shipley.list, shipley, .progressBar = F)$AIC
```

`## [1] 49.53`

`## [1] 49.535`

Note that because the values are stored in the summary output, they are much quicker to access than having to recompute the d-sep tests using `sem.aic`

.

Following the information criterion values, we have the tests of directed separation. As in the previous version, the independence claims are truncated to remove the conditioning variables (they can be shown using the argument `conditional = TRUE`

).

The d-sep tests are, once again, identical between the two versions, but are faster to retrieve from `summary`

than recomputing them from scratch.

```
# Old function
sem.missing.paths(shipley.list, shipley, .progressBar = F)
```

`## Conditional variables have been omitted from output table for clarity (or use argument conditional = T)`

```
## missing.path estimate std.error df crit.value p.value
## 1 Date ~ lat + ... -0.009051378 0.11347661 18 -0.07976426 0.9373049
## 2 Growth ~ lat + ... -0.098862826 0.11072020 18 -0.89290690 0.3836896
## 3 Live ~ lat + ... 0.030496786 0.02966350 NA 1.02809131 0.3039069
## 4 Growth ~ DD + ... -0.010613707 0.03576722 1329 -0.29674399 0.7667083
## 5 Live ~ DD + ... 0.027190372 0.02707564 NA 1.00423751 0.3152641
## 6 Live ~ Date + ... -0.046565402 0.02982419 NA -1.56133000 0.1184459
```

```
## Independ.Claim Estimate Std.Error DF Crit.Value P.Value
## 1 Date ~ lat + ... -0.0091 0.1135 18 -0.0798 0.9373
## 2 Growth ~ lat + ... -0.0989 0.1107 18 -0.8929 0.3837
## 3 Live ~ lat + ... 0.0305 0.0297 NA 1.0281 0.3039
## 4 Growth ~ DD + ... -0.0106 0.0358 1329 -0.2967 0.7667
## 5 Live ~ DD + ... 0.0272 0.0271 NA 1.0035 0.3156
## 6 Live ~ Date + ... -0.0466 0.0298 NA -1.5622 0.1183
```

After the d-sep table is the Fisher’s C statistic, and the results from the \(\chi^2\) test. This information was formerly obtained from `sem.fit`

.

```
# Old function
sem.fit(shipley.list, shipley, .progressBar = F)$Fisher.C
```

```
## Warning: `sem.fit` has been replaced. Use `psem` instead of `list`, and
## then call `summary` on that object
```

`## Conditional variables have been omitted from output table for clarity (or use argument conditional = T)`

```
## fisher.c df p.value
## 1 11.53 12 0.484
```

```
## Fisher.C df P.Value
## 1 11.535 12 0.484
```

As with all other functions, the values are exactly the same although the underlying functions have been rewritten to be more efficient.

Next, we have the path coefficients and the standardized estimates.

```
# Old function
sem.coefs(shipley.list, shipley)
```

```
## Warning: `sem.coefs` has been replaced. Use `psem` instead of `list`, and
## then call `summary` or `coefs` on that object
```

```
## response predictor estimate std.error p.value
## 1 DD lat -0.8354736 0.119422385 0 ***
## 2 Date DD -0.4976475 0.004933274 0 ***
## 3 Growth Date 0.3007147 0.026631405 0 ***
## 4 Live Growth 0.3478536 0.058407891 0 ***
```

`sem.coefs(shipley.list, shipley, standardize = "scale")`

```
## Warning: `sem.coefs` has been replaced. Use `psem` instead of `list`, and
## then call `summary` or `coefs` on that object
```

```
## Warning in get.scaled.data(modelList, data, standardize): One or more
## responses not modeled to a normal distribution: keeping response(s) on
## original scale!
```

```
## response predictor estimate std.error p.value
## 1 DD lat -0.7014051 0.100258794 0 ***
## 2 Date DD -0.6281367 0.006226838 0 ***
## 3 Growth Date 0.3824224 0.033867469 0 ***
## 4 Live Growth 2.2268596 0.373932623 0 ***
```

```
## Response Predictor Estimate Std.Error DF Crit.Value P.Value
## 1 DD (Intercept) 196.6524 7.6605 1331 25.6708 0.0000
## 2 DD lat -0.8355 0.1194 18 -6.9960 0.0000
## 3 Date (Intercept) 198.3382 1.1842 1330 167.4825 0.0000
## 4 Date DD -0.4976 0.0049 1330 -100.8757 0.0000
## 5 Growth (Intercept) 10.7892 3.5189 1330 3.0661 0.0022
## 6 Growth Date 0.3007 0.0266 1330 11.2917 0.0000
## 7 Live Growth 0.3479 0.0584 1431 5.9556 0.0000
## Std.Estimate
## 1 161.8746 ***
## 2 -0.6877 ***
## 3 250.3448 ***
## 4 -0.6281 ***
## 5 13.7208 **
## 6 0.3824 ***
## 7 NA ***
```

```
# The new coefs() function is much faster too
coefs(shipley.psem)
```

```
## Response Predictor Estimate Std.Error DF Crit.Value P.Value
## 1 DD (Intercept) 196.6524 7.6605 1331 25.6708 0.0000
## 2 DD lat -0.8355 0.1194 18 -6.9960 0.0000
## 3 Date (Intercept) 198.3382 1.1842 1330 167.4825 0.0000
## 4 Date DD -0.4976 0.0049 1330 -100.8757 0.0000
## 5 Growth (Intercept) 10.7892 3.5189 1330 3.0661 0.0022
## 6 Growth Date 0.3007 0.0266 1330 11.2917 0.0000
## 7 Live Growth 0.3479 0.0584 1431 5.9556 0.0000
## Std.Estimate
## 1 161.8746 ***
## 2 -0.6877 ***
## 3 250.3448 ***
## 4 -0.6281 ***
## 5 13.7208 **
## 6 0.3824 ***
## 7 NA ***
```

Note two items: *the standardized estimates and their standard errors are now reported by default at the end of the coefficients table (column Std.Estimate). *the standardized estimate for the logistic regression is different, based on new calculations (see Section 2.2)

For the moment, version 2.0 only reports the scaled estimates (based on the standard deviations of the response and predictor). Future versions will include a range standardization.

Finally, `summary`

reports the individual model R^{2} values. These were formerly obtained using `sem.model.fits`

which was confusing. I have not included the old calculations in the package as the new `rsquared`

function includes new calculations. However, this information is useful and therefore is now reported alongside the d-sep tests, Fisher’s C, and path coefficients.

The function `partial.resid`

has been replaced with `partialResid`

although the syntax is the same. The plotting output, however, has been eliminated.

`partialResid`

extracts the partial effects of `y ~ x`

in a multiple regression `y ~ x + Z`

. For example:

```
# Generate data
dat <- data.frame(y = rnorm(100), x1 = rnorm(100), x2 = rnorm(100))
# Build model
model <- lm(y ~ x1 + x2, dat)
# Compute partial residuals of y ~ x1
yresid <- resid(lm(y ~ x2, dat))
xresid <- resid(lm(x1 ~ x2, dat))
# Use partialResid
presid <- partialResid(y ~ x1, model)
par(mfrow = c(1, 2))
plot(yresid, xresid)
plot(presid) # identical plot!
```

`sem.lavaan`

has not yet been ported to version 2.0, and it may not, as there was some confusion how multi-level models were translated to the variance-covariance framework (hint: they weren’t, only the formulae were transferred).

`sem.plot`

has also not yet been ported to version 2.0, since it was of limited utility. It may make an appearance in the future.

Shipley, Bill. “A new inferential test for path models based on directed acyclic graphs.” Structural Equation Modeling 7.2 (2000): 206-218.

Shipley, Bill. Cause and correlation in biology: a user’s guide to path analysis, structural equations and causal inference. Cambridge University Press, 2002.

Shipley, Bill. “Confirmatory path analysis in a generalized multilevel context.” Ecology 90.2 (2009): 363-368.

Shipley, Bill. “The AIC model selection method applied to path analytic models compared using a d-separation test.” Ecology 94.3 (2013): 560-564.