🔍 Discourse Analysis Research Pipeline

Track the discursive normalisation of terms across YouTube, Reddit and Bluesky

Platform Setup
Choose the platform you want to collect data from, then enter your credentials.
1 — Select a Platform
▶️
YouTube
Video comments & metadata. Best for tracking mainstream political commentary channels.
API key required
🟠
Reddit
Posts & comments across subreddits. Excellent for tracking fringe-to-mainstream migration.
API key required
🦋
Bluesky
Posts & replies via AT Protocol. Growing platform with strong political discourse communities.
App password required
📋 How to get a YouTube API Key
  1. Go to console.cloud.google.com
  2. Create a new project (or select an existing one)
  3. Click APIs & Services → Enable APIs
  4. Search for and enable YouTube Data API v3
  5. Go to APIs & Services → Credentials
  6. Click Create Credentials → API Key
  7. Copy the key and paste it below

⚠️ Quota note: The free tier allows 10,000 units per day. Each search request costs 100 units, so searching alone allows about 100 requests per day.
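
To see how quickly searches use up the quota, here is a rough budgeting sketch. The two constants come from the quota note above. The helper names and the idea of a fixed number of searches per term are assumptions for illustration only, and the estimate ignores the (much smaller) cost of fetching comments.

```python
# Rough YouTube Data API quota estimate.
# The two constants come from the quota note; the helpers are hypothetical.
DAILY_QUOTA_UNITS = 10_000
SEARCH_COST_UNITS = 100


def searches_per_day(daily_quota=DAILY_QUOTA_UNITS, cost=SEARCH_COST_UNITS):
    """Number of search requests that fit in one day's quota."""
    return daily_quota // cost


def days_needed(num_terms, searches_per_term):
    """Days of quota needed if every term triggers `searches_per_term` searches."""
    total_units = num_terms * searches_per_term * SEARCH_COST_UNITS
    # Ceiling division: a partly used day still counts as a full day.
    return -(-total_units // DAILY_QUOTA_UNITS)


print(searches_per_day())   # 100
print(days_needed(12, 10))  # 12 * 10 * 100 = 12,000 units -> 2
```

With 12 terms and 10 searches each, the collection would spill into a second day, which is why reducing terms or years helps when the quota runs out.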

Your key is stored locally for this session only and is sent only to your own backend server.
📋 How to get Reddit API credentials
  1. Log in to Reddit, then go to reddit.com/prefs/apps
  2. Scroll down and click Create App (or Create Another App)
  3. Give it a name (e.g. discourse_research)
  4. Select script as the app type
  5. Set the redirect URI to http://localhost:8080
  6. Click Create App
  7. Your Client ID is the string under the app name. The Client Secret is labelled "secret".
Start broad. Add or remove subreddits to focus your search. Examples: ukpolitics, conspiracy, casualuk
📋 How to get Bluesky credentials
  1. Log in to Bluesky at bsky.app
  2. Go to Settings → Privacy and Security → App Passwords
  3. Click Add App Password
  4. Give it a name (e.g. discourse_research) and click Create App Password
  5. Copy the password shown (it will only be displayed once) and paste it below

ℹ️ App passwords are separate from your main Bluesky password. They can be revoked at any time from the same settings page without affecting your account.

Your full Bluesky handle including .bsky.social
The app password you created above — not your main account password
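
As an illustration of the handle format described above, the sketch below tidies a typed handle before it is used to log in. `normalise_handle` is a hypothetical helper, not part of the tool, and it assumes that a handle without any dot is a short name on .bsky.social. Handles that already contain a dot are left unchanged.

```python
def normalise_handle(handle, default_suffix=".bsky.social"):
    """Trim a typed handle and add the default suffix when none is present."""
    handle = handle.strip().lstrip("@").lower()
    # Assumption: a handle with no dot is a short name on the default domain.
    if handle and "." not in handle:
        handle += default_suffix
    return handle


print(normalise_handle("@Alice"))             # alice.bsky.social
print(normalise_handle("alice.bsky.social"))  # alice.bsky.social
```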
2 — Session Title & Time Window
A short descriptive title. A folder named [title]_[date]_[time] will be created automatically to store all data and results from this session.
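
A minimal sketch of how a [title]_[date]_[time] folder name could be built. The exact date and time formats and the way unusual characters in the title are handled are assumptions here. The tool's own naming may differ, and `session_folder_name` is a hypothetical helper.

```python
import re
from datetime import datetime


def session_folder_name(title, now=None):
    """Build a folder name of the form [title]_[date]_[time]."""
    now = now or datetime.now()
    # Assumption: anything other than letters, digits, or hyphens becomes "_".
    safe_title = re.sub(r"[^A-Za-z0-9-]+", "_", title.strip()).strip("_")
    return f"{safe_title}_{now:%Y-%m-%d}_{now:%H%M%S}"


print(session_folder_name("UK policing terms", datetime(2024, 3, 1, 14, 30, 0)))
# UK_policing_terms_2024-03-01_143000
```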
Recommended: 2014 for a full 10-year window
Higher = richer data but slower collection and more API quota used
Search Terms
Enter the terms you want to track. The system will search for these across the selected platform and time window.
Enter Search Terms
💡 Tip: Enter one term per line. You can include variant spellings or related phrases — the system will search for each separately. For this research, terms like great replacement, real british people and two-tier policing are good starting points.
Each term is searched independently. Variants (e.g. "great replacement" and "replacement theory") give richer longitudinal coverage.
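
The one-term-per-line convention can be sketched as follows. `parse_terms` is a hypothetical helper. It assumes that blank lines are ignored and that terms differing only in letter case count as duplicates, which may not match the tool's actual behaviour.

```python
def parse_terms(raw_text):
    """Split a multi-line text box into a clean list of search terms."""
    terms = []
    seen = set()
    for line in raw_text.splitlines():
        term = " ".join(line.split())  # trim and collapse inner whitespace
        key = term.lower()             # assumption: case-insensitive duplicates
        if term and key not in seen:
            seen.add(key)
            terms.append(term)
    return terms


raw = "great replacement\n  replacement theory \n\nGreat Replacement\ntwo-tier policing"
print(parse_terms(raw))
# ['great replacement', 'replacement theory', 'two-tier policing']
```

Each entry in the returned list would then be searched independently, as described above.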
⚠️ A note on sensitive terms: This tool is designed for academic research into discursive normalisation. The terms you enter may surface disturbing content. Results are stored locally on your own machine and are not sent anywhere beyond the platform APIs.
Suggested Terms for This Research
🌍 Transnational (Global)
great replacement
replacement theory
demographic replacement
white replacement
🇬🇧 Local (UK)
real british people
ordinary british people
genuine british
legacy british
⚖️ Procedural
two-tier policing
two tier justice
one rule for
unequal treatment
Data Collection
Once configured, start the collection process. This runs in the background — you can monitor progress below.
Collection Summary
Complete the Setup and Terms steps first, then return here to begin collecting.

Live Log

Analysis
Run NLP analysis on collected data. This generates sentiment scores, framing classification (critical / neutral / affirmative) and pattern detection.
What the analysis produces
📉 Sentiment Distribution
Proportion of positive / neutral / negative comments over time
🔖 Framing Classification
Whether terms appear with critical distancing, neutrally, or with affirmative endorsement — the core normalisation indicator
📈 Normalisation Over Time
Framing ratio tracked month by month — shows the marked-to-unmarked transition
☁️ Word Cloud
Most frequent terms in the collected corpus
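
To make the month-by-month framing ratio concrete, the sketch below turns a list of classified comments into per-month shares of each framing. The record layout (a "YYYY-MM" month string paired with a label) and the function name are assumptions, and the sample records are invented for illustration only.

```python
from collections import Counter, defaultdict

FRAMES = ("critical", "neutral", "affirmative")


def monthly_framing_shares(records):
    """records: iterable of (month as 'YYYY-MM', framing label) pairs."""
    counts = defaultdict(Counter)
    for month, frame in records:
        if frame in FRAMES:  # ignore labels outside the three framings
            counts[month][frame] += 1
    shares = {}
    for month in sorted(counts):
        total = sum(counts[month].values())
        shares[month] = {f: counts[month][f] / total for f in FRAMES}
    return shares


# Invented sample data, not real results.
records = [
    ("2020-01", "critical"), ("2020-01", "critical"),
    ("2020-01", "neutral"), ("2020-01", "affirmative"),
    ("2021-01", "critical"), ("2021-01", "neutral"),
    ("2021-01", "affirmative"), ("2021-01", "affirmative"),
]
for month, share in monthly_framing_shares(records).items():
    print(month, share)
# 2020-01 {'critical': 0.5, 'neutral': 0.25, 'affirmative': 0.25}
# 2021-01 {'critical': 0.25, 'neutral': 0.25, 'affirmative': 0.5}
```

In this invented sample the critical share falls while the affirmative share rises, which is the pattern the Normalisation Over Time output is meant to reveal.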

Live Log

Results
Download your data files and view visualisations generated by the analysis.

Click Refresh Results to load available outputs.

Help & Guide
Everything you need to get started.
Getting Started — Step by Step

Step 1 — First-time setup (one time only)

You need Python 3 installed. Check by opening Spotlight (⌘ Space), typing Terminal, and running:

python3 --version

If you see a version number (e.g. Python 3.11.x) you are ready. If not, download and install Python 3 from python.org/downloads/macos. This check is the only time you will need to use Terminal yourself.


Step 2 — Start the tool

In Finder, open the folder containing your downloaded files. You will see a file called Launch Discourse Tool.command.

First time only: Right-click (or Control-click) the .command file and choose Open, then click Open in the security dialog. After that, you can double-click it normally every time.

A black terminal window will appear. It installs any missing packages automatically and then opens this interface in your browser. Keep that window open while you use the tool, because closing it stops the server.


Step 3 — Use the tool

The browser interface opens automatically. Follow the tabs from left to right: Setup → Search Terms → Collect → Analyse → Results.


Platform Credentials

  • YouTube: Requires a free Google Cloud account. Full step-by-step instructions appear when you select YouTube in the Setup tab.
  • Reddit: Requires a free Reddit account. Full instructions appear when you select Reddit in the Setup tab.
  • Bluesky: Requires a Bluesky account and an app password (not your main password). Go to Settings → Privacy and Security → App Passwords in Bluesky to create one. Full instructions appear in the Setup tab.

Understanding the Results

  • Framing Over Time chart: This is the most important output for normalisation research. A rising affirmative line together with a falling critical line indicates that the term is moving from marked (used with distancing) to unmarked (used without distancing). This is the core signal of discursive normalisation.
  • Toxicity Over Time chart: Shows mean negativity score per month. Spikes often correspond to real-world events.
  • CSV files: The full dataset with all scores and classifications, for further analysis in Excel or R.
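
For further analysis outside Excel or R, a monthly mean like the one behind the Toxicity Over Time chart can be recomputed from a CSV with the Python standard library alone. The column names `date` and `negativity` are assumptions, so check the header row of your own file. The sample rows are invented for illustration.

```python
import csv
import io
from collections import defaultdict


def monthly_mean_negativity(csv_file):
    """Return {'YYYY-MM': mean negativity} from an open CSV file object."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in csv.DictReader(csv_file):
        month = row["date"][:7]  # 'YYYY-MM-DD' -> 'YYYY-MM' (assumed format)
        sums[month] += float(row["negativity"])
        counts[month] += 1
    return {m: sums[m] / counts[m] for m in sorted(sums)}


# Invented sample data standing in for a results CSV.
sample = io.StringIO(
    "date,negativity\n"
    "2023-07-02,0.25\n"
    "2023-07-19,0.75\n"
    "2023-08-05,0.5\n"
)
print(monthly_mean_negativity(sample))
# {'2023-07': 0.5, '2023-08': 0.5}
```

To use it on a real file, pass `open("your_file.csv", newline="")` in place of the sample object.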

Common Problems

  • "Server offline" in sidebar: The launcher window has been closed. Double-click Launch Discourse Tool.command again to restart.
  • Security warning on first launch: Right-click the .command file and choose Open, then confirm. This is a one-time step — after that, double-clicking works normally.
  • YouTube quota exceeded: You have used your 10,000 daily API units. Wait for the quota to reset (daily, at midnight Pacific Time), or reduce the number of terms or years.
  • Reddit "prawcore.exceptions": Check that your Client ID and Client Secret are entered correctly.
  • Bluesky "AtProtocolError": Check your handle includes .bsky.social and that you are using an app password, not your main account password.
  • No results after analysis: Make sure collection completed successfully before running analysis.