Add data dumps #2047
Merged
75 commits
| Commit | Message | Author |
|---|---|---|
| 58b2ea5 | initial boilerplate for data dumps (mirror Metasmoke) | Oaphi |
| 88fd12d | dups should have a required title & optional comment | Oaphi |
| 5997829 | initial boilerplate for data dump model | Oaphi |
| 8fca596 | first data dump fixtures (fixing tests) | Oaphi |
| efa5fa5 | initial boilerplate for data dumps (mirror Metasmoke) | Oaphi |
| 540e07c | dups should have a required title & optional comment | Oaphi |
| dec449a | initial boilerplate for data dump model | Oaphi |
| bb6d5f8 | first data dump fixtures (fixing tests) | Oaphi |
| 5aba30c | Create scaffolding for pulling data | ArtOfCode- |
| 96a1b4d | Skeleton processing | ArtOfCode- |
| b35e5d1 | Merge branch 'develop' into 0valt/1918/data-dump | ArtOfCode- |
| 47be0f2 | Successfully copying data to dump database | ArtOfCode- |
| 25aed98 | Export data and create Dump record | ArtOfCode- |
| 46b2430 | Dump management | ArtOfCode- |
| 9052b90 | Rubocop | ArtOfCode- |
| a2be721 | Add data page | ArtOfCode- |
| 2bece2e | Merge branch '0valt/1918/data-dump' into art/data-dumps | ArtOfCode- |
| d15877c | Add link to footer | ArtOfCode- |
| a5f8276 | Test data dump job | ArtOfCode- |
| 73aa095 | Forgot the host | ArtOfCode- |
| bf8b7a7 | Shooting in the dark now | ArtOfCode- |
| eb35540 | Not sure where you're getting 2 jobs from, minitest | ArtOfCode- |
| 4724e15 | Look as long as it's _doing_ it... | ArtOfCode- |
| 1740d3c | If this worked the whole time I'm going to be annoyed | ArtOfCode- |
| ff95085 | Thanks rubocop | ArtOfCode- |
| dd1df8e | Access denied error handling | ArtOfCode- |
| f64ea30 | Handle no-dumps-yet case | ArtOfCode- |
| a7318dc | Apply suggested patch | Oaphi |
| 10c7ec5 | Correct error class | ArtOfCode- |
| 05bc65d | Add port and SSL state to commands | ArtOfCode- |
| 3a2d349 | Apparently that option's deprecated | ArtOfCode- |
| 5fe6017 | Missed a flag | ArtOfCode- |
| 8fcb578 | Fix 'unknown variable' error | Oaphi |
| 7f7b8cb | Rubocop cleanup | Oaphi |
| 443eca5 | Update ci-cd's workflow MySQL image to sync with our Docker setup | Oaphi |
| f309d1f | Remove outdated build comment from Dockerfile.db | Oaphi |
| 2c1d070 | Always report mysql & mysqldump versions during ci-cd runs | Oaphi |
| 7b8f9bf | Synchornize mysql & mysqldump client versions | Oaphi |
| e807b13 | Reformat table and add checksum | ArtOfCode- |
| e62d1d5 | Rubocop fixes | Oaphi |
| 2088c7f | Move 'execute' wrapper to ApplicationJob (QoL) | Oaphi |
| 4015f06 | Remove '$VERBOSE' silencing as it's no longer needed | Oaphi |
| 4c9389e | Switch data dumps job from perform_later to perform_now (queue issues) | Oaphi |
| 43af8a3 | Trim permitted columns from review | ArtOfCode- |
| 677ce95 | Add caution to exec helper | ArtOfCode- |
| 34ebc36 | Helps if you name the columns right | ArtOfCode- |
| ee45a34 | Missed an alias | ArtOfCode- |
| f9704c1 | More Problems (tm) | ArtOfCode- |
| e0cf419 | Correct MariaDB check | ArtOfCode- |
| 0cdb646 | Exclude filter names | ArtOfCode- |
| b4ac29d | Add default value for filter names | ArtOfCode- |
| 1b45890 | Merge branch 'develop' into art/data-dumps | ArtOfCode- |
| 28ce0f4 | Drop post_id from flags | ArtOfCode- |
| f6758eb | Add schema change checker | ArtOfCode- |
| b0831d8 | First attempt at a workflow | ArtOfCode- |
| aa6bfe3 | Helps if you checkout the repo | ArtOfCode- |
| 3296db0 | Bundle it? | ArtOfCode- |
| 218faf9 | Could've sworn I added that | ArtOfCode- |
| 62cbd4f | Try refs instead | ArtOfCode- |
| b8be1ee | Disambiguate | ArtOfCode- |
| eb71790 | Maybe it doesn't exist | ArtOfCode- |
| 5c76d74 | I'm guessing at this point | ArtOfCode- |
| c69acc7 | Specify language | ArtOfCode- |
| 7dcbc30 | Remove user create/update timestamps | ArtOfCode- |
| 3d2a29d | actions/checkout should be enough to get repo contents | Oaphi |
| aea3617 | Let's not tie to GitHub-specific naming conventions | Oaphi |
| 8d45d08 | Commit schema changes | Oaphi |
| e38549d | Only install octokit in development & test - production doesn't need … | Oaphi |
| 6f5f956 | Fix registrations controller test due to changed created_at default f… | Oaphi |
| 55cab38 | Exclude MySQL Docker data files from Rubocop inspection | Oaphi |
| 112c696 | Make add_user_creation_defaults migration reversible | Oaphi |
| b5e38ce | Make add_more_default_values migration reversible | Oaphi |
| d561a80 | Only apply timestamp defaults to the data_dump database | Oaphi |
| 3ac248e | Ensure registrations controller test checks for timestamp defaults | Oaphi |
| cfde135 | Rubocop fixes | Oaphi |
@@ -0,0 +1,8 @@

```ruby
class DumpsController < ApplicationController
  before_action :authenticate_user!

  def index
    @latest = Dump.automatic.last
    @others = Dump.manual
  end
end
```
@@ -0,0 +1,2 @@

```ruby
module DumpsHelper
end
```
@@ -0,0 +1,117 @@

```ruby
class DataDumpJob < ApplicationJob
  queue_as :default

  DEFAULT_TIMESTAMP = '1970-01-01 00:00:00'.freeze

  def perform(drop_db_after: true)
    permitted = YAML.safe_load_file(Rails.root.join('db/scripts/dump_permitted_columns.yml'))
    logger.info "Found #{permitted&.size} tables to dump."

    begin
      exec('SET FOREIGN_KEY_CHECKS = 0;')
      exec('DROP DATABASE IF EXISTS qpixel_dump;')
      exec('CREATE DATABASE qpixel_dump;')

      @db_creds = Rails.configuration.database_configuration[Rails.env]
      @username = @db_creds['username']
      @password = @db_creds['password']
      @database = @db_creds['database']
      @port = @db_creds['port']
      @host = @db_creds['host']

      mysqldump_command = build_command('mysqldump', '-h', @host, '-u', @username, "-p#{@password}", '-d', @database,
                                        '--no-tablespaces', "--port=#{@port}", ssl_state)
      mysql_command = build_command('mysql', '-h', @host, '-u', @username, "-p#{@password}", "--port=#{@port}",
                                    '-D', 'qpixel_dump', ssl_state)
      logger.debug 'Running system command:'
      logger.debug "#{mysqldump_command} | #{mysql_command}".gsub(/(?<=\s)-p\S*/, '-p[REDACTED]')
      copy_success = system("#{mysqldump_command} | #{mysql_command}")

      unless copy_success
        logger.fatal "Couldn't replicate database: nonzero exit code"
        return
      end

      logger.info 'Copied database structure.'

      initialize_defaults

      logger.info 'Initialized defaults.'

      permitted&.each do |table, data|
        migrate_table(table, data)
      end

      logger.info 'Migrated data.'

      file_path = Rails.root.join('tmp/qpixel_export.sql')
      export_cmd = build_command('mysqldump', '-h', @host, '-u', @username, "-p#{@password}", "--port=#{@port}",
                                 'qpixel_dump', '--no-tablespaces', ssl_state, '>', file_path)
      logger.debug 'Running system command:'
      logger.debug export_cmd.gsub(/(?<=\s)-p\S*/, '-p[REDACTED]')
      export_success = system(export_cmd)

      unless export_success
        logger.fatal "Couldn't export database: nonzero exit code"
        return
      end

      logger.info 'Exported database.'

      raw_checksum = `sha256sum #{file_path}`.split[0]
      logger.debug "Export checksum: #{raw_checksum}"

      dump = Dump.create!(title: "Data Dump #{Time.now.strftime('%Y-%m-%d')}",
                          comment: "Automatically generated data dump as of #{Time.now.strftime('%Y-%m-%d %H:%M:%S')}.",
                          file: File.open(file_path),
                          automatic: true,
                          checksum: "SHA256:#{raw_checksum.chars.in_groups_of(8).map(&:join).join('-')}")
      Dump.where(automatic: true).where.not(id: dump.id).destroy_all
    rescue ActiveRecord::ConnectionFailed
      logger.fatal "Couldn't connect to database. Have you run `GRANT ALL ON qpixel_dump.*` for your DB user?"
    ensure
      exec('SET FOREIGN_KEY_CHECKS = 1;')
      if drop_db_after
        exec('DROP DATABASE IF EXISTS qpixel_dump;')
      end
    end
  end

  private

  def initialize_defaults
    [:community_users, :votes, :users].each do |table|
      [:created_at, :updated_at].each do |column|
        change_column_default(table, column, "'#{DEFAULT_TIMESTAMP}'")
      end
    end
  end

  def change_column_default(table, column, value)
    query = "ALTER TABLE qpixel_dump.`#{table}` ALTER COLUMN `#{column}` SET DEFAULT #{value}"
    exec(query)
  end

  def migrate_table(table, data)
    columns = data['columns']
    query = data['query']
    select = "(SELECT #{columns.map { |c| "`#{table}`.`#{c}`" }.join(', ')} FROM #{@database}.#{table} #{query})"
    full_query = "INSERT INTO qpixel_dump.`#{table}` (#{columns.map { |c| "`#{c}`" }.join(', ')}) #{select}"
    logger.debug full_query
    exec(full_query)
  end

  def build_command(cmd, *args)
    "#{cmd} #{args.compact_blank.join(' ')}"
  end

  def ssl_state
    command = mariadb? ? '--skip-ssl' : '--ssl-mode=DISABLED'
    command if Rails.env.development? || Rails.env.test?
  end

  def mariadb?
    result = `mysql --version`
    result.downcase.include? 'mariadb'
  end
end
```
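`DataDumpJob` assembles every SQL statement and shell command as an interpolated string, so it can help to look at those strings in isolation. The following is a minimal plain-Ruby sketch (no Rails, no database) that mirrors `migrate_table`, `change_column_default` and `build_command`. It assumes each entry in `dump_permitted_columns.yml` is a hash with a `columns` list and an optional `query` suffix, as the job's `data['columns']` and `data['query']` lookups suggest; the `posts` table, its columns, the query and the credentials are hypothetical. ActiveSupport's `compact_blank` is approximated with a `reject`, and `redact` is a hypothetical helper showing one way to keep the inline `-p` password out of log lines.

```ruby
# Sketch only: plain-Ruby re-creation of the strings DataDumpJob builds, runnable without Rails.
# Assumption: each permitted-columns entry is a Hash with a 'columns' Array and an optional 'query' String.

DUMP_DB = 'qpixel_dump'

# Same shape as DataDumpJob#migrate_table: copy only the permitted columns into the dump database.
def migrate_table_sql(source_db, table, data)
  columns = data['columns']
  selected = columns.map { |c| "`#{table}`.`#{c}`" }.join(', ')
  select_sql = "(SELECT #{selected} FROM #{source_db}.#{table} #{data['query']})"
  "INSERT INTO #{DUMP_DB}.`#{table}` (#{columns.map { |c| "`#{c}`" }.join(', ')}) #{select_sql}"
end

# Same shape as DataDumpJob#change_column_default; +value+ must already be SQL-quoted.
def change_column_default_sql(table, column, value)
  "ALTER TABLE #{DUMP_DB}.`#{table}` ALTER COLUMN `#{column}` SET DEFAULT #{value}"
end

# Like DataDumpJob#build_command, with ActiveSupport's compact_blank approximated by dropping
# nil and empty/whitespace-only arguments (e.g. a nil SSL flag outside development/test).
def build_command(cmd, *args)
  "#{cmd} #{args.reject { |a| a.nil? || a.to_s.strip.empty? }.join(' ')}"
end

# Hypothetical helper: mask an inline -p<password> argument before a command is logged.
# The lookbehind requires whitespace before "-p", so long options such as --port are left alone.
def redact(command)
  command.gsub(/(?<=\s)-p\S*/, '-p[REDACTED]')
end

# Example usage with hypothetical table, column, and credential values.
entry = { 'columns' => %w[id title], 'query' => 'WHERE deleted = FALSE' }
puts migrate_table_sql('qpixel_dev', 'posts', entry)
puts change_column_default_sql('users', 'created_at', "'1970-01-01 00:00:00'")
puts redact(build_command('mysqldump', '-h', 'localhost', '-u', 'qpixel', '-psecret', nil, '--port=3306'))
```

Because table and column names are interpolated directly into SQL, this approach is only safe while they come from the trusted YAML file and never from user input.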
@@ -0,0 +1,14 @@

```ruby
class Dump < ApplicationRecord
  has_one_attached :file

  before_destroy :delete_file

  scope :automatic, -> { where(automatic: true) }
  scope :manual, -> { where(automatic: false) }

  private

  def delete_file
    file.purge
  end
end
```
@@ -0,0 +1,72 @@

```erb
<h1>Data Dumps</h1>

<p>
  Data from all <%= SiteSetting['NetworkName'] %> communities is available for download here as a database export.
  This data is a weekly export of the entire database, minus any personally identifiable information, moderation data,
  and some other sensitive information such as who cast votes.
</p>

<div class="notice is-info has-color-tertiary-900">
  <p>
    <i class="fas fa-balance-scale"></i>
    <strong>Licensing</strong>
  </p>
  <p>
    This data is provided free of charge as part of our contribution to the commons. If you use post content, you must
    still abide by the terms of the licenses set by the author of each post.
  </p>
</div>

<h2>Latest data dump</h2>
<% if @latest.nil? %>
  <p>No data dumps available yet. Check back next week.</p>
<% else %>
  <table class="table is-with-hover is-full-width">
    <tbody>
      <tr>
        <td><strong>Name</strong></td>
        <td><%= @latest.title %></td>
      </tr>
      <tr>
        <td><strong>Download</strong></td>
        <td><%= link_to 'Download', rails_blob_path(@latest.file, disposition: 'attachment') %></td>
      </tr>
      <tr>
        <td><strong>Created</strong></td>
        <td><%= @latest.created_at.strftime('%Y-%m-%d') %></td>
      </tr>
      <tr>
        <td class="fit-content wrap-word"><strong>Checksum</strong></td>
        <td><%= @latest.checksum %></td>
      </tr>
    </tbody>
  </table>
<% end %>

<% if @others.any? %>
  <h2>Other data dumps</h2>
  <table class="table is-with-hover is-full-width">
    <thead>
      <tr>
        <th>Name</th>
        <th>Created</th>
        <th>Download</th>
      </tr>
    </thead>
    <tbody>
      <% @others.each do |dump| %>
        <tr>
          <td><%= dump.title %></td>
          <td><%= dump.created_at.strftime('%Y-%m-%d') %></td>
          <td>
            <% if dump.file.attached? %>
              <%= link_to 'Download', rails_blob_path(dump.file, disposition: 'attachment') %>
            <% elsif dump.link.present? %>
              <%= link_to 'View', dump.link %>
            <% end %>
          </td>
        </tr>
      <% end %>
    </tbody>
  </table>
<% end %>
```
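The checksum shown for the latest dump is the SHA-256 digest of the exported SQL file in the form `DataDumpJob` builds: `SHA256:` followed by the 64-character hex digest split into 8-character groups joined with hyphens. A minimal sketch of how a downloader might verify a file against that string, assuming only Ruby's standard `digest` library; the script name and file path in the usage line are hypothetical. `each_slice(8)` stands in for ActiveSupport's `in_groups_of(8)`, which yields the same groups for a 64-character digest.

```ruby
require 'digest'

# Format a hex digest the way DataDumpJob stores it: "SHA256:" plus 8-char groups joined by '-'.
def format_checksum(hex_digest)
  "SHA256:#{hex_digest.chars.each_slice(8).map(&:join).join('-')}"
end

# Return true if the file at +path+ matches the checksum string shown on the Data Dumps page.
def dump_matches?(path, displayed)
  format_checksum(Digest::SHA256.file(path).hexdigest) == displayed.strip
end

# Hypothetical CLI usage: ruby verify_dump.rb qpixel_export.sql 'SHA256:...'
if ARGV.length == 2
  puts dump_matches?(ARGV[0], ARGV[1]) ? 'Checksum OK' : 'Checksum MISMATCH'
end
```

Comparing the formatted string, rather than parsing the displayed value back into a raw digest, keeps the check tied to exactly what the page shows.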