Out of the field and into the fRying pan

Blog Notes: ABI 150 Section 001

We are rapidly nearing the end of the quarter and are one week out from our paper due dates. We ended data collection last Thursday with a… drumroll… 1333 observations of stilt behavior. Each of us knows what a tremendous, hard-fought dataset this was. It turns out, birds are pretty good at moving around and they made us really work for that number. Just as an example, we recorded more than 5000 foraging beak dips. Wow!

To get the most out of our dataset, Heuijae asked us to come prepared for today’s session with portions of our manuscript completed so that we could focus on data analysis. Our goals for today were:

A clear/refined research question
Confirmed x and y (predictor and response variables)
Graph type and possible analyses
Some progress on R

We have a canonical class dataset, aptly named “dataset,” on the Google Drive. In addition to our tremendous effort in the field, this dataset represents a stunning example of work in the background by many members of the class. Sheets were paired effectively. Sheets were not lost. Rows were not skipped. Typos were fixed. We have really come together in the past 3-4 weeks, and I think the result, a clean dataset so soon after data collection, speaks for itself.

Heuijae reminded us again of our convenient matrix for choosing an appropriate analysis. The first step is to decide two things about your data:

Response vs. predictor variables: in “Whether distance affects foraging rate,” distance is the predictor and foraging rate is the response
Continuous vs. categorical vs. binary

These are two prerequisites to deciding on analysis type and visualization. On that topic, what if you are trying to predict a large number of categories simultaneously? Perhaps the composition of the species near stilts or the time budget of stilts in different situations?

Making Sense of Multivariate Statistics: PCA, NMDS, and Wildlife Communities

Sometimes, tackling complex datasets can feel overwhelming. Thankfully, Ryan recently worried about this problem for us, kicking things off with a digestible introduction to the world of multivariate statistics.

To set the stage, Ryan clarified some foundational terminology. While univariate analyses deal with a single response variable (in the simplest case, one input predicting one output), multivariate analyses come into play when you are measuring multiple response variables simultaneously within the same dataset. (He also noted that while this sounds similar to multivariable analysis, the two are mirror images of each other: multivariable means several predictors for one response, while multivariate means several responses.)
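To make the terminology concrete, here is a minimal sketch in Python with NumPy (not the R code from class). All numbers are randomly generated for illustration, and the variable names are my own. The only point is where the extra columns live: on the predictor side for multivariable, on the response side for multivariate.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20  # number of observations (made up for illustration)

# Multivariable: several predictors (columns of X), ONE response (y).
X = rng.normal(size=(n, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=n)
coef_multivariable, *_ = np.linalg.lstsq(X, y, rcond=None)

# Multivariate: one predictor (x), SEVERAL responses (columns of Y).
x = rng.normal(size=(n, 1))
Y = x @ np.array([[2.0, -1.0]]) + rng.normal(scale=0.1, size=(n, 2))
coef_multivariate, *_ = np.linalg.lstsq(x, Y, rcond=None)

# One coefficient per predictor vs. one coefficient per response.
print("multivariable coefficients shape:", coef_multivariable.shape)
print("multivariate coefficients shape:", coef_multivariate.shape)
```

In our stilt data, a time budget (proportion of time spent on each behavior) would sit on the `Y` side of this picture.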

Grounding the Math in Ecology

To ground these abstract concepts in reality, we looked at a dataset featuring counts of Indian wildlife captured via trail cameras. The working hypothesis for this case study was straightforward: there is a distinct difference in the composition of animal communities between the "lower" and "upper" regions of the study area.

Visualizing High-Dimensional Data: PCA vs. NMDS

When dealing with higher-dimensional ecological data, visualization is essential. Ryan contrasted two popular techniques used to make sense of this data: Principal Component Analysis (PCA) and Non-metric Multidimensional Scaling (NMDS).

While both help us visualize complex datasets, they differ fundamentally in their mechanics. PCA is usually presented as dimensionality reduction on the variables themselves, while NMDS is an ordination built from distances between samples:

PCA relies on creating synthetic variables to reduce dimensions. It is typically visualized by plotting the first two principal components (often resulting in a familiar biplot of clustered dots and directional arrows).
NMDS, on the other hand, operates using a distance matrix. It applies a chosen dissimilarity metric (Bray-Curtis is a common choice for count data) to calculate the distances between data points. Crucially, visualization in NMDS relies entirely on the ranking of these distances. Because it is rank-based, the specific numerical value of the distance metric itself isn't intrinsically meaningful; it's the relative ordering of distances between points that matters.
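The two starting points above can be sketched in a few lines. This is a minimal illustration in Python with NumPy, not the R workflow from class: the count matrix is made up (it is not the trail camera dataset), the function names `pca_scores` and `bray_curtis` are my own, and Bray-Curtis is an assumed choice of metric. The sketch stops at the ranked distances and does not run the iterative NMDS fit itself.

```python
import numpy as np

# Made-up counts for illustration: rows are camera sites, columns are species.
counts = np.array([
    [12, 0, 3, 1],
    [10, 1, 4, 0],
    [ 9, 0, 5, 2],
    [ 1, 8, 0, 6],
    [ 0, 9, 1, 7],
    [ 2, 7, 0, 5],
], dtype=float)

def pca_scores(data, n_components=2):
    """Project rows onto the first principal components using SVD."""
    centered = data - data.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]  # the synthetic variables
    explained = s**2 / np.sum(s**2)                  # share of variance per PC
    return scores, explained[:n_components]

def bray_curtis(data):
    """Pairwise Bray-Curtis dissimilarity between rows (assumes no all-zero rows)."""
    n = data.shape[0]
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            shared = np.abs(data[i] - data[j]).sum() / (data[i] + data[j]).sum()
            dist[i, j] = dist[j, i] = shared
    return dist

# PCA side: coordinates you would plot as PC1 vs. PC2.
scores, explained = pca_scores(counts)

# NMDS side: a distance matrix, of which only the rank order is used.
dist = bray_curtis(counts)
upper = np.triu_indices(len(counts), k=1)  # each pair of sites once
pair_ranks = np.argsort(np.argsort(dist[upper], kind="stable"), kind="stable") + 1

# Squaring every distance changes the values but not their ranking,
# which is why the raw numbers are not meaningful to NMDS.
ranks_after_squaring = np.argsort(
    np.argsort(dist[upper] ** 2, kind="stable"), kind="stable") + 1

print("PC scores:\n", scores.round(2))
print("variance explained:", explained.round(2))
print("ranks unchanged by squaring:", np.array_equal(pair_ranks, ranks_after_squaring))
```

Ties in the distances are broken arbitrarily by position here; a real NMDS implementation handles ties more carefully.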

Testing the Hypothesis with PERMANOVA

Visualization is only part of the battle. The third step in this analytical workflow is to test for actual statistical significance. For this, Ryan introduced PERMANOVA (permutational multivariate analysis of variance), a permutation-based test used to evaluate the differences in community composition between groups of samples.

For those following along and analyzing the data in R, Ryan provided the necessary code to run a PERMANOVA. (A quick reminder: don't forget to load your required packages, particularly the vegan package!)
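Ryan's R code is not reproduced here. In vegan, the function usually used for this test is `adonis2`. To show what the test is doing under the hood, here is a from-scratch sketch in Python with NumPy. It is an illustration under assumptions, not the class code: the counts are made up, the "lower" and "upper" labels are borrowed from the case study, Bray-Curtis is an assumed metric, and the helper names (`bray_curtis`, `pseudo_f`, `permanova`) are my own.

```python
import numpy as np

def bray_curtis(data):
    """Pairwise Bray-Curtis dissimilarity between rows (assumes no all-zero rows)."""
    n = data.shape[0]
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = (
                np.abs(data[i] - data[j]).sum() / (data[i] + data[j]).sum())
    return dist

def pseudo_f(dist, groups):
    """Pseudo-F: between-group vs. within-group sums of squared distances."""
    n = len(groups)
    labels = np.unique(groups)
    d2 = dist ** 2
    ss_total = d2[np.triu_indices(n, k=1)].sum() / n
    ss_within = 0.0
    for label in labels:
        idx = np.flatnonzero(groups == label)
        sub = d2[np.ix_(idx, idx)]
        ss_within += sub[np.triu_indices(len(idx), k=1)].sum() / len(idx)
    ss_between = ss_total - ss_within
    a = len(labels)
    return (ss_between / (a - 1)) / (ss_within / (n - a))

def permanova(dist, groups, n_perm=999, seed=0):
    """Compare the observed pseudo-F to pseudo-F values under shuffled labels."""
    rng = np.random.default_rng(seed)
    groups = np.asarray(groups)
    observed = pseudo_f(dist, groups)
    permuted = np.array(
        [pseudo_f(dist, rng.permutation(groups)) for _ in range(n_perm)])
    # Small tolerance so relabelings identical to the original count as ties.
    p_value = (np.sum(permuted >= observed - 1e-12) + 1) / (n_perm + 1)
    return observed, p_value

# Made-up counts: 4 "lower" sites and 4 "upper" sites, 4 species.
counts = np.array([
    [12, 0, 3, 1], [10, 1, 4, 0], [9, 0, 5, 2], [11, 1, 2, 1],
    [1, 8, 0, 6], [0, 9, 1, 7], [2, 7, 0, 5], [1, 10, 1, 6],
], dtype=float)
region = np.array(["lower"] * 4 + ["upper"] * 4)

f_stat, p_value = permanova(bray_curtis(counts), region)
print(f"pseudo-F = {f_stat:.2f}, p = {p_value:.3f}")
```

One design note: with very few sites per group there are only a handful of distinct ways to shuffle the labels, which puts a floor on how small the p-value can get. That is worth keeping in mind when subsetting our own data.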

Looking Ahead

The introduction provided a great toolkit for analyzing community data, but it also left us with some compelling questions to explore moving forward:

How can we predict the species composition of neighboring animals throughout the course of a day?
How do we model and predict an animal's behavioral budget as a complex, multivariate response?

With these tools now in hand, tackling those questions feels a lot more achievable.

The class sat in stunned silence after the many truth bombs Ryan dropped on us, perhaps reflecting on all the cool things we could use multivariate statistics for in our dataset. Then Heuijae nearly brought us to tears with a stunningly well-organized set of R notes from the entire quarter.

For the last hour and change of class, it was open season on R bugs. Groups formed around the room based on the types of analyses people were doing, which made for more effective peer-to-peer troubleshooting. Marshall and Ryan floated around, occasionally writing on whiteboards and gesticulating wildly to convey enthusiasm for different pairings of analyses and visualizations.

I don’t want to steal anyone’s thunder, but we saw quite a few clever ways to cut up, summarize, visualize, and analyze our dataset. The progress you have all made has been inspirational to see.

As a counterpoint, R is not without its difficulties. As a reminder of how hard-won these successes (binomial model pun intended) are, I’ll add this quote from one concerned student:

“[R has been]… A test of my patience, unmatched by anything else I’ve experienced during my time at UC Davis”
— Anonymous student, experiencing versioning issues with Mac OS. ☠

We have one more long session. Here is the schedule:

Time	Activity
10:00	Turn in notebooks
10:10 - 10:20	Closing remarks
10:20 - 11:30	Group by analysis again, individual R worktime with instructor support
11:30 - 11:50	Any takeaways, any stunning findings to share?
11:50 - 12:30	Lunch
12:30	Hand back notebooks (Note: attendance is voluntary after this and is basically R office hours)
12:40 - 1:40	Work session
1:40 - 1:50	Break
1:50 - 2:50	Work session
2:50 - 3:00	Break
3:00 - 3:50	Work session
