Merged
75 commits
58b2ea5
initial boilerplate for data dumps (mirror Metasmoke)
Oaphi Jan 20, 2026
88fd12d
dumps should have a required title & optional comment
Oaphi Jan 20, 2026
5997829
initial boilerplate for data dump model
Oaphi Jan 20, 2026
8fca596
first data dump fixtures (fixing tests)
Oaphi Jan 20, 2026
efa5fa5
initial boilerplate for data dumps (mirror Metasmoke)
Oaphi Jan 20, 2026
540e07c
dumps should have a required title & optional comment
Oaphi Jan 20, 2026
dec449a
initial boilerplate for data dump model
Oaphi Jan 20, 2026
bb6d5f8
first data dump fixtures (fixing tests)
Oaphi Jan 20, 2026
5aba30c
Create scaffolding for pulling data
ArtOfCode- May 12, 2026
96a1b4d
Skeleton processing
ArtOfCode- May 12, 2026
b35e5d1
Merge branch 'develop' into 0valt/1918/data-dump
ArtOfCode- May 12, 2026
47be0f2
Successfully copying data to dump database
ArtOfCode- May 13, 2026
25aed98
Export data and create Dump record
ArtOfCode- May 13, 2026
46b2430
Dump management
ArtOfCode- May 13, 2026
9052b90
Rubocop
ArtOfCode- May 13, 2026
a2be721
Add data page
ArtOfCode- May 13, 2026
2bece2e
Merge branch '0valt/1918/data-dump' into art/data-dumps
ArtOfCode- May 13, 2026
d15877c
Add link to footer
ArtOfCode- May 14, 2026
a5f8276
Test data dump job
ArtOfCode- May 14, 2026
73aa095
Forgot the host
ArtOfCode- May 14, 2026
bf8b7a7
Shooting in the dark now
ArtOfCode- May 14, 2026
eb35540
Not sure where you're getting 2 jobs from, minitest
ArtOfCode- May 14, 2026
4724e15
Look as long as it's _doing_ it...
ArtOfCode- May 14, 2026
1740d3c
If this worked the whole time I'm going to be annoyed
ArtOfCode- May 14, 2026
ff95085
Thanks rubocop
ArtOfCode- May 14, 2026
dd1df8e
Access denied error handling
ArtOfCode- May 15, 2026
f64ea30
Handle no-dumps-yet case
ArtOfCode- May 15, 2026
a7318dc
Apply suggested patch
Oaphi May 18, 2026
10c7ec5
Correct error class
ArtOfCode- May 18, 2026
05bc65d
Add port and SSL state to commands
ArtOfCode- May 18, 2026
3a2d349
Apparently that option's deprecated
ArtOfCode- May 18, 2026
5fe6017
Missed a flag
ArtOfCode- May 18, 2026
8fcb578
Fix 'unknown variable' error
Oaphi May 18, 2026
7f7b8cb
Rubocop cleanup
Oaphi May 18, 2026
443eca5
Update ci-cd's workflow MySQL image to sync with our Docker setup
Oaphi May 18, 2026
f309d1f
Remove outdated build comment from Dockerfile.db
Oaphi May 19, 2026
2c1d070
Always report mysql & mysqldump versions during ci-cd runs
Oaphi May 19, 2026
7b8f9bf
Synchronize mysql & mysqldump client versions
Oaphi May 19, 2026
e807b13
Reformat table and add checksum
ArtOfCode- May 19, 2026
e62d1d5
Rubocop fixes
Oaphi May 19, 2026
2088c7f
Move 'execute' wrapper to ApplicationJob (QoL)
Oaphi May 19, 2026
4015f06
Remove '$VERBOSE' silencing as it's no longer needed
Oaphi May 19, 2026
4c9389e
Switch data dumps job from perform_later to perform_now (queue issues)
Oaphi May 19, 2026
43af8a3
Trim permitted columns from review
ArtOfCode- May 20, 2026
677ce95
Add caution to exec helper
ArtOfCode- May 20, 2026
34ebc36
Helps if you name the columns right
ArtOfCode- May 20, 2026
ee45a34
Missed an alias
ArtOfCode- May 20, 2026
f9704c1
More Problems (tm)
ArtOfCode- May 20, 2026
e0cf419
Correct MariaDB check
ArtOfCode- May 20, 2026
0cdb646
Exclude filter names
ArtOfCode- May 20, 2026
b4ac29d
Add default value for filter names
ArtOfCode- May 20, 2026
1b45890
Merge branch 'develop' into art/data-dumps
ArtOfCode- May 20, 2026
28ce0f4
Drop post_id from flags
ArtOfCode- May 20, 2026
f6758eb
Add schema change checker
ArtOfCode- May 20, 2026
b0831d8
First attempt at a workflow
ArtOfCode- May 20, 2026
aa6bfe3
Helps if you checkout the repo
ArtOfCode- May 20, 2026
3296db0
Bundle it?
ArtOfCode- May 20, 2026
218faf9
Could've sworn I added that
ArtOfCode- May 20, 2026
62cbd4f
Try refs instead
ArtOfCode- May 20, 2026
b8be1ee
Disambiguate
ArtOfCode- May 20, 2026
eb71790
Maybe it doesn't exist
ArtOfCode- May 20, 2026
5c76d74
I'm guessing at this point
ArtOfCode- May 20, 2026
c69acc7
Specify language
ArtOfCode- May 20, 2026
7dcbc30
Remove user create/update timestamps
ArtOfCode- May 20, 2026
3d2a29d
actions/checkout should be enough to get repo contents
Oaphi May 21, 2026
aea3617
Let's not tie to GitHub-specific naming conventions
Oaphi May 21, 2026
8d45d08
Commit schema changes
Oaphi May 21, 2026
e38549d
Only install octokit in development & test - production doesn't need …
Oaphi May 21, 2026
6f5f956
Fix registrations controller test due to changed created_at default f…
Oaphi May 21, 2026
55cab38
Exclude MySQL Docker data files from Rubocop inspection
Oaphi May 21, 2026
112c696
Make add_user_creation_defaults migration reversible
Oaphi May 21, 2026
b5e38ce
Make add_more_default_values migration reversible
Oaphi May 21, 2026
d561a80
Only apply timestamp defaults to the data_dump database
Oaphi May 21, 2026
3ac248e
Ensure registrations controller test checks for timestamp defaults
Oaphi May 21, 2026
cfde135
Rubocop fixes
Oaphi May 21, 2026
33 changes: 29 additions & 4 deletions .github/workflows/ci-cd.yml
@@ -15,8 +15,8 @@ env:
permissions:
actions: write
contents: read
- issues: read
- pull-requests: read
+ issues: write
+ pull-requests: write

jobs:
rubocop:
@@ -62,7 +62,7 @@ jobs:

services:
mysql: &db-service
- image: mysql:8.0
+ image: mysql:8.4.2 # please keep the image in sync with Dockerfile.db
env:
MYSQL_ROOT_HOST: '%'
MYSQL_ROOT_PASSWORD: 'root'
@@ -80,7 +80,13 @@ jobs:
- name: Setup dependencies
run: |
sudo apt-get -qq update
sudo apt-get -yqq install libmariadb-dev libmagickwand-dev
sudo apt-get -yqq install libmariadb-dev \
libmagickwand-dev \
mariadb-client
- name: Report database client versions
run: |
mysql --version
mysqldump --version
- name: Setup Ruby
uses: ruby/setup-ruby@v1
with:
@@ -181,3 +187,22 @@ jobs:
-i ~/.ssh/deploy.key \
"$SSH_USER"@"$SSH_IP" \
"sudo su -l qpixel /var/apps/deploy-dev"

db_changes:
name: Check for data dump changes
runs-on: ubuntu-latest
if: ${{ github.event_name == 'pull_request' && github.actor != 'dependabot[bot]' }}

steps:
- name: Checkout repo
uses: actions/checkout@v3
- name: Setup Ruby
uses: ruby/setup-ruby@v1
with:
ruby-version: 3.4
bundler-cache: true
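- name: Check for changes
# Pass PR-controlled values through env so they are never interpolated into the shell script
env:
BASE_REF: ${{ github.event.pull_request.base.ref }}
HEAD_REF: ${{ github.event.pull_request.head.ref }}
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
PR_NUMBER: ${{ github.event.pull_request.number }}
run: |
bundle exec ruby lib/database_changes_checker.rb "origin/$BASE_REF" \
"origin/$HEAD_REF" "$GITHUB_TOKEN" "$PR_NUMBER"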
1 change: 1 addition & 0 deletions .rubocop.yml
@@ -8,6 +8,7 @@ AllCops:
Exclude:
- 'config/**/*'
- 'db/schema.rb'
- 'docker/mysql/**/*'
- 'scripts/**/*'
- 'bin/**/*'
- 'lib/namespaced_env_cache.rb'
1 change: 1 addition & 0 deletions Gemfile
@@ -100,6 +100,7 @@ end

group :development, :test do
gem 'byebug', '~> 11.1'
gem 'octokit', '~> 10.0'
end

group :development do
16 changes: 16 additions & 0 deletions Gemfile.lock
@@ -164,6 +164,12 @@ GEM
erb (6.0.3)
erubi (1.13.1)
execjs (2.10.1)
faraday (2.14.2)
faraday-net_http (>= 2.0, < 3.5)
json
logger
faraday-net_http (3.4.3)
net-http (~> 0.5)
fastimage (2.4.1)
ffi (1.17.4-x86_64-linux-gnu)
flamegraph (0.9.5)
@@ -250,6 +256,8 @@ GEM
mutex_m (0.3.0)
mysql2 (0.5.7)
bigdecimal
net-http (0.9.1)
uri (>= 0.11.1)
net-imap (0.6.3)
date
net-protocol
@@ -263,6 +271,9 @@ GEM
nokogiri (1.19.2-x86_64-linux-gnu)
racc (~> 1.4)
observer (0.1.2)
octokit (10.0.0)
faraday (>= 1, < 3)
sawyer (~> 0.9)
omniauth (2.1.4)
hashie (>= 3.4.6)
logger
@@ -411,6 +422,9 @@ GEM
sprockets (> 3.0)
sprockets-rails
tilt
sawyer (0.9.3)
addressable (>= 2.3.5)
faraday (>= 0.17.3, < 3)
securerandom (0.4.1)
selenium-webdriver (4.10.0)
rexml (~> 3.2, >= 3.2.5)
@@ -456,6 +470,7 @@ GEM
unicode-display_width (3.2.0)
unicode-emoji (~> 4.1)
unicode-emoji (4.2.0)
uri (1.1.1)
useragent (0.16.11)
warden (1.2.9)
rack (>= 2.0.9)
@@ -525,6 +540,7 @@ DEPENDENCIES
mutex_m (~> 0.3)
mysql2 (~> 0.5.4)
net-smtp (~> 0.3)
octokit (~> 10.0)
omniauth (~> 2.1)
premailer-rails (~> 1.11)
puma (~> 5.6)
1 change: 1 addition & 0 deletions INSTALLATION.md
@@ -131,6 +131,7 @@ the MySQL server with `sudo mysql -u root` and create a new database user for QP
CREATE USER qpixel@localhost IDENTIFIED BY 'choose_a_password_here';
GRANT ALL ON qpixel_dev.* TO qpixel@localhost;
GRANT ALL ON qpixel_test.* TO qpixel@localhost;
GRANT ALL ON qpixel_dump.* TO qpixel@localhost;
GRANT ALL ON qpixel.* TO qpixel@localhost;
```

4 changes: 4 additions & 0 deletions app/assets/stylesheets/utilities.scss
@@ -381,6 +381,10 @@ span.spoiler {
white-space: nowrap;
}

.fit-content {
min-width: fit-content;
}

@each $side in $sides {
.border-#{$side}-none {
border-#{$side}: none;
8 changes: 8 additions & 0 deletions app/controllers/dumps_controller.rb
@@ -0,0 +1,8 @@
class DumpsController < ApplicationController
before_action :authenticate_user!

def index
@latest = Dump.automatic.last
@others = Dump.manual
end
end
2 changes: 2 additions & 0 deletions app/helpers/dumps_helper.rb
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
module DumpsHelper
end
8 changes: 8 additions & 0 deletions app/jobs/application_job.rb
@@ -10,6 +10,14 @@ def initialize(*args, **opts)
super
end

# Executes a given SQL statement in the context of the current connection
# @note CAUTION: This method does NOT parametrize or escape the SQL statement in any way. YOU are responsible
# for ensuring +sql+ is safe.
# @param [String] sql SQL statement to execute
def exec(sql)
ApplicationRecord.connection.execute(sql)
end

def logger
Rails.job_logger.tagged(self.class.name, @job_id)
end
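Review note: the caution on the `exec` helper above leaves escaping entirely to the caller. As a minimal sketch of what that could look like for identifiers, the snippet below quotes a MySQL identifier by wrapping it in backticks and doubling any embedded backtick. `quote_identifier` is a hypothetical helper and is not part of this PR; inside Rails, the connection's own quoting methods would be the more natural choice.

```ruby
# Hypothetical helper (not part of this PR): quote a MySQL identifier by
# wrapping it in backticks and doubling any backtick inside the name.
def quote_identifier(name)
  "`#{name.to_s.gsub('`', '``')}`"
end

# Example: build an ALTER statement of the same shape the dump job passes to exec.
table = quote_identifier(:users)
column = quote_identifier(:created_at)
sql = "ALTER TABLE qpixel_dump.#{table} ALTER COLUMN #{column} SET DEFAULT '1970-01-01 00:00:00'"
puts sql
```

This only covers identifiers such as table and column names. Values would still need separate quoting or bound parameters.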
117 changes: 117 additions & 0 deletions app/jobs/data_dump_job.rb
@@ -0,0 +1,117 @@
class DataDumpJob < ApplicationJob
queue_as :default

DEFAULT_TIMESTAMP = '1970-01-01 00:00:00'.freeze

def perform(drop_db_after: true)
permitted = YAML.safe_load_file(Rails.root.join('db/scripts/dump_permitted_columns.yml'))
logger.info "Found #{permitted&.size} tables to dump."

begin
exec('SET FOREIGN_KEY_CHECKS = 0;')
exec('DROP DATABASE IF EXISTS qpixel_dump;')
exec('CREATE DATABASE qpixel_dump;')

@db_creds = Rails.configuration.database_configuration[Rails.env]
@username = @db_creds['username']
@password = @db_creds['password']
@database = @db_creds['database']
@port = @db_creds['port']
@host = @db_creds['host']

mysqldump_command = build_command('mysqldump', '-h', @host, '-u', @username, "-p#{@password}", '-d', @database,
'--no-tablespaces', "--port=#{@port}", ssl_state)
mysql_command = build_command('mysql', '-h', @host, '-u', @username, "-p#{@password}", "--port=#{@port}",
'-D', 'qpixel_dump', ssl_state)
logger.debug 'Running system command:'
logger.debug "#{mysqldump_command} | #{mysql_command}"
copy_success = system("#{mysqldump_command} | #{mysql_command}")

unless copy_success
logger.fatal "Couldn't replicate database: nonzero exit code"
return
end

logger.info 'Copied database structure.'

initialize_defaults

logger.info 'Initialized defaults.'

permitted&.each do |table, data|
migrate_table(table, data)
end

logger.info 'Migrated data.'

file_path = Rails.root.join('tmp/qpixel_export.sql')
export_cmd = build_command('mysqldump', '-h', @host, '-u', @username, "-p#{@password}", "--port=#{@port}",
'qpixel_dump', '--no-tablespaces', ssl_state, '>', file_path)
logger.debug 'Running system command:'
logger.debug export_cmd
export_success = system(export_cmd)

unless export_success
logger.fatal "Couldn't export database: nonzero exit code"
return
end

logger.info 'Exported database.'

raw_checksum = `sha256sum #{file_path}`.split[0]
logger.debug "Export checksum: #{raw_checksum}"

dump = Dump.create(title: "Data Dump #{Time.now.strftime('%Y-%m-%d')}",
comment: "Automatically generated data dump as of #{Time.now.strftime('%Y-%m-%d %H:%M:%S')}.",
file: File.open(file_path),
automatic: true,
checksum: "SHA256:#{raw_checksum.chars.in_groups_of(8).map(&:join).join('-')}")
Dump.where(automatic: true).where.not(id: dump.id).destroy_all
rescue ActiveRecord::ConnectionFailed
logger.fatal "Couldn't connect to database. Have you run `GRANT ALL ON qpixel_dump.*` for your DB user?"
ensure
exec('SET FOREIGN_KEY_CHECKS = 1;')
if drop_db_after
# IF EXISTS keeps cleanup from raising when the dump database was never created
exec('DROP DATABASE IF EXISTS qpixel_dump;')
end
end
end

private

def initialize_defaults
[:community_users, :votes, :users].each do |table|
[:created_at, :updated_at].each do |column|
change_column_default(table, column, "'#{DEFAULT_TIMESTAMP}'")
end
end
end

def change_column_default(table, column, value)
query = "ALTER TABLE qpixel_dump.`#{table}` ALTER COLUMN `#{column}` SET DEFAULT #{value}"
exec(query)
end

def migrate_table(table, data)
columns = data['columns']
query = data['query']
select = "(SELECT #{columns.map { |c| "`#{table}`.`#{c}`" }.join(', ')} FROM #{@database}.#{table} #{query})"
full_query = "INSERT INTO qpixel_dump.`#{table}` (#{columns.map { |c| "`#{c}`" }.join(', ')}) #{select}"
logger.debug full_query
exec(full_query)
end

def build_command(cmd, *args)
"#{cmd} #{args.compact_blank.join(' ')}"
end

def ssl_state
command = mariadb? ? '--skip-ssl' : '--ssl-mode=DISABLED'
command if Rails.env.development? || Rails.env.test?
end

def mariadb?
result = `mysql --version`
result.downcase.include? 'mariadb'
end
end
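Review note: `migrate_table` reads `data['columns']` and `data['query']` for each table in `db/scripts/dump_permitted_columns.yml`, but that file is not shown in this diff. The sketch below assumes a YAML shape inferred from those lookups and rebuilds the same `INSERT ... SELECT` string without executing it. The sample tables, columns, `WHERE` clause, and the `qpixel_dev` source database name are illustrative assumptions, and `build_insert` is a hypothetical stand-in for the job's private method.

```ruby
require 'yaml'

# Hypothetical excerpt in the shape migrate_table appears to read:
# table name => { 'columns' => [...], 'query' => optional SQL suffix }.
SAMPLE_CONFIG = <<~YAML
  posts:
    columns:
      - id
      - title
      - body
    query: "WHERE deleted = 0"
  tags:
    columns:
      - id
      - name
YAML

# Mirrors the string building in migrate_table, without running any SQL.
def build_insert(source_db, table, data)
  columns = data['columns']
  qualified = columns.map { |c| "`#{table}`.`#{c}`" }.join(', ')
  targets = columns.map { |c| "`#{c}`" }.join(', ')
  select = "(SELECT #{qualified} FROM #{source_db}.#{table} #{data['query']})"
  "INSERT INTO qpixel_dump.`#{table}` (#{targets}) #{select}"
end

permitted = YAML.safe_load(SAMPLE_CONFIG)
permitted.each do |table, data|
  puts build_insert('qpixel_dev', table, data)
end
```

Only the listed columns are copied, so anything absent from the YAML file stays at its column default in the dump database. That appears to be why the job sets timestamp defaults before migrating data.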
14 changes: 14 additions & 0 deletions app/models/dump.rb
@@ -0,0 +1,14 @@
class Dump < ApplicationRecord
has_one_attached :file

before_destroy :delete_file

scope :automatic, -> { where(automatic: true) }
scope :manual, -> { where(automatic: false) }

private

def delete_file
file.purge
end
end
72 changes: 72 additions & 0 deletions app/views/dumps/index.html.erb
@@ -0,0 +1,72 @@
<h1>Data Dumps</h1>

<p>
Data from all <%= SiteSetting['NetworkName'] %> communities is made available in database format for download here.
This data is a weekly export of the entire database, minus any personally identifiable information, moderation data,
and some other sensitive information such as who cast votes.
</p>

<div class="notice is-info has-color-tertiary-900">
<p>
<i class="fas fa-balance-scale"></i>
<strong>Licensing</strong>
</p>
<p>
This data is provided free of charge as part of our contribution to the commons. If you use post content, you must
still abide by the terms of the licenses set by the author of each post.
</p>
</div>

<h2>Latest data dump</h2>
<% if @latest.nil? %>
<p>No data dumps available yet. Check back next week.</p>
<% else %>
<table class="table is-with-hover is-full-width">
<tbody>
<tr>
<td><strong>Name</strong></td>
<td><%= @latest.title %></td>
</tr>
<tr>
<td><strong>Download</strong></td>
<td><%= link_to 'Download', rails_blob_path(@latest.file, disposition: 'attachment') %></td>
</tr>
<tr>
<td><strong>Created</strong></td>
<td><%= @latest.created_at.strftime('%Y-%m-%d') %></td>
</tr>
<tr>
<td class="fit-content wrap-word"><strong>Checksum</strong></td>
<td><%= @latest.checksum %></td>
</tr>
</tbody>
</table>
<% end %>

<% if @others.any? %>
<h2>Other data dumps</h2>
<table class="table is-with-hover is-full-width">
<thead>
<tr>
<th></th>
<th>Created</th>
<th>Download</th>
</tr>
</thead>
<tbody>
<% @others.each do |dump| %>
<tr>
<td><%= dump.title %></td>
<td><%= dump.created_at.strftime('%Y-%m-%d') %></td>
<td>
<% if dump.file.attached? %>
<%= link_to 'Download', rails_blob_path(dump.file, disposition: 'attachment') %>
<% elsif dump.link.present? %>
<%= link_to 'View', dump.link %>
<% end %>
</td>
</tr>
<% end %>
</tbody>
</table>
<% end %>
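Review note: the Checksum row in this view shows the value built in `DataDumpJob`, a SHA-256 hex digest split into groups of eight characters, joined with hyphens and prefixed with `SHA256:`. As a rough sketch of how someone downloading a dump might verify it, the snippet below reproduces that format in plain Ruby and compares it with a local file. `format_checksum` and `checksum_matches?` are hypothetical names, and `each_slice` stands in for ActiveSupport's `in_groups_of` so the example runs without Rails.

```ruby
require 'digest'
require 'tempfile'

# Formats a hex digest the way the dump job does: 8-character groups joined
# with hyphens and prefixed with "SHA256:".
def format_checksum(hex)
  "SHA256:#{hex.chars.each_slice(8).map(&:join).join('-')}"
end

# Hypothetical verifier: hashes a local file and compares it with the
# checksum string displayed on the data dumps page.
def checksum_matches?(path, displayed)
  format_checksum(Digest::SHA256.file(path).hexdigest) == displayed
end

# Demonstration with a temporary file standing in for a downloaded dump.
Tempfile.create('dump') do |f|
  f.write('example dump contents')
  f.flush
  displayed = format_checksum(Digest::SHA256.hexdigest('example dump contents'))
  puts checksum_matches?(f.path, displayed)
end
```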
1 change: 1 addition & 0 deletions app/views/layouts/_footer.html.erb
@@ -7,6 +7,7 @@
<li><%= link_to 'About Us', '/policy/network-faq' %></li>
<li><%= link_to 'Privacy & Safety Center', safety_center_url %></li>
<li><%= link_to 'Report harmful content', new_complaint_path %></li>
<li><%= link_to 'Data dumps', data_dumps_path %></li>
</ul>
</div>
<div class="grid--cell is-6 is-12-md is-12-sm">