class: center, middle, inverse, title-slide # Social Media for Population Research ### Monica Alexander
University of Toronto ### CAnD3 Workshop, 28 April 2021 --- # Overview -- ### Motivation -- ### Types and uses of social media data -- ### How do we get it -- ### Issues and what don't we know -- ### Workshop content --- # Intros -- ### Monica Alexander - Assistant Professor, Statistics and Sociology - Demographer by training - Not Canadian - Spends too much time on Twitter -- ### Michael Chong - PhD Candidate, Statistics - Slowly being convinced to become a population researcher - Can answer any questions about R (and Taylor Swift) that I can't - Spends an appropriate amount of time on Twitter --- class: center, middle, inverse # Motivation --- # Motivation -- - With the rise of social media comes the rise of data -- - Data about people + characteristics, movements, connections, interests, views... -- - Potentially rich data sources for population research -- - Strengths that traditional data sources often don't have + timely + large sample sizes + granular information -- - But potential drawbacks as well (more later) --- class: center, middle, inverse # Types and uses of social media data --- # Types of social media data -- ### By platform -- - Twitter -- - Facebook -- - LinkedIn -- - Instagram, TikTok... -- - Springer, Google Scholar, arXiv --- # Types of social media data -- ### By data collected -- - Population data from advertising platforms + aggregate level population estimates by subgroup + useful in demographic estimation + e.g. [Estimating out-migration from Puerto Rico](https://onlinelibrary.wiley.com/doi/abs/10.1111/padr.12289) -- - Individual digital trace data + Individual profiles, friends and behavior + Text / media data, networks + Use to study networks, information spread, opinions, etc + e.g. [Assessing the Russian Internet Research Agency’s impact on the political attitudes and behaviors of American Twitter users in late 2017](https://www.pnas.org/content/pnas/117/1/243.full.pdf) + e.g. [Geography of Twitter networks](https://www.sciencedirect.com/science/article/pii/S0378873311000359?casa_token=_ihyPFLiN3wAAAAA:wlhmucP0-sdp_vwynl23QWEPWjNdj0w1lVAgzA8UjOQzcArTguH2BFt48MVXQc3Vkc-K3cui7Ws) --- # Types of social media data ### By data collected -- - Survey data + Use social media platforms to reach and survey people + Timely, hard-to-reach populations, cheap? + e.g. [Behaviours and attitudes in response to the COVID-19 pandemic: insights from a cross-national Facebook survey](https://epjdatascience.springeropen.com/articles/10.1140/epjds/s13688-021-00270-1) -- - Detailed user data + Know everything about your userbase + Probably only when you work at these places + e.g. [Measuring Long-Term Displacement Using Facebook Data](https://research.fb.com/publications/measuring-long-term-displacement-using-facebook-data/) + e.g. [Country Differences in Social Comparison on Social Media](https://research.fb.com/publications/country-differences-in-social-comparison-on-social-media/) --- class: center, middle, inverse # How do we get it --- # How do we get it -- ### Manually? - I guess you could copy paste info you view on a social media webpage - Slow, hard, not reproducible -- ### Webscraping - Write code that visit a webpage and extracts data - More automated and reproducible, but often not best practice - Legality and ethical considerations depend on the specifics of what you're doing - Some good tips and principles are [here](https://www.tellingstorieswithdata.com/gather-data.html#scraping). --- # How do we get it -- ### Application Programming Interfaces (APIs) -- - A tool that makes it easier for a computer to query a website's data - Most social media websites have APIs, that let you programmatically query and extract data - e.g. [Facebook marketing API](https://developers.facebook.com/docs/marketing-apis/) - [Twitter API](https://developer.twitter.com/en/docs) - e.g. [arXiv API](https://arxiv.org/help/api/) -- ### R packages that help to query APIs -- - Many pacakges exist to help R users query APIs - We will be using two: `rtweet` and `tidygeocoder` - Others e.g. `rfacebookstat`, `Rlinkedin`, `scholar`... --- class: center, middle, inverse # Issues and what we don't know --- # Non-representativeness -- - As population researchers, we are generally interested in studying phenomena in the broader population (or specific subgroups of the broader population) - But in general, people who use social media are not at all representative of the broader population -- - How do we adjust for this? + some examples in the demographic estimation realm. Can use a 'gold standard' data source to estimate and adjust for biases + e.g. see [here](https://link.springer.com/article/10.1007%2Fs11113-020-09599-3) and [here](https://www.demogr.mpg.de/papers/working/wp-2020-019.pdf) + but harder to think about in other realms. How do we generalize networks, or opinions/behavior seen online? - If we can only infer based on the population for which we have data for, how useful is it? --- # Self-reporting and other biases -- - A lot of information is self-reported - e.g. self-reported locations, public bio information - No demographic information is available on Twitter + past efforts have imputed based on e.g. profile pictures, names + error prone, may be inflating bias - How do we check for biases? - How do we make sure analyses are robust to assumptions? -- Some good references: - [Demography in the Digital Era: New Data Sources for Population Research](https://osf.io/preprints/socarxiv/24jp7/) - [Promises and Pitfalls of Using Digital Traces for Demographic Research](https://link.springer.com/content/pdf/10.1007/s13524-018-0715-2.pdf) --- class: center, middle, inverse # This workshop --- # What it is and what it isn't -- ### What we will be doing - A brief introduction to how get extract and work with Twitter data - Introduction to not only querying APIs, but also **geocoding**, **mapping**, and **text analysis** - Useful skills and code to take with you and think about how to apply in your own research -- ### But think about - How can I draw inferences about the population I'm interested in? - How can I check the robustness of my analysis? - How can I make my research open and reproducible? --- # Plan for the rest of the workshop -- - Run through the three modules -- - Assuming you've installed everything and gone through the `set_up` R code, which checks to see if everything works -- - I will share my screen and go through the code in the R Markdown files -- - Any questions / comments put in chat --- # Let's get started! ### Contact info <a href="mailto:monica.alexander@utoronto.ca"><i class="fa fa-paper-plane fa-fw"></i> monica.alexander@utoronto.ca</a><br> <a href="monicaalexander.com"><i class="fa fa-link fa-fw"></i> monicaalexander.com</a><br> <a href="http://twitter.com/monjalexander"><i class="fa fa-twitter fa-fw"></i> @monjalexander</a><br> <a href="http://github.com/MJAlexander"><i class="fa fa-github fa-fw"></i> @MJAlexander</a><br>