Conversation
- I think this is a good idea. It's possible that this doesn't actually break any packages that depend on loo. I can check by running reverse dependency checks once the errors in the code are fixed (see individual comments).
- The tests need to be updated, but we can wait to do that until we check that it doesn't break any reverse dependencies.
- We should put some information about this in the main loo_compare documentation, not only in the print method.
I actually think it would be good to print these probabilities by default. What was the reason you turned printing off by default? (Right now the printing errors by default, but once fixed it will be off by default.) I don't think the printing should affect backwards compatibility (or am I missing something?). It would be that adding more columns could potentially affect something, right? Or was there a different reason you turned off printing the probabilities by default?
Co-authored-by: Jonah Gabry <jgabry@gmail.com>
Ok, let's change them to be printed by default. I had the column name as p_worse so that it will also tell the direction. Then I considered switching to pnorm, as it tells that it's based on the normal approximation, and then ended up somewhere in between.
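To make the naming discussion concrete, here is a minimal sketch of the quantity being discussed, assuming the normal approximation mentioned above. The function name and the guard for the self-comparison row are illustrative, not loo's actual implementation:

```r
# Hypothetical sketch, not loo's implementation: probability that a model is
# worse than the best model, using the normal approximation N(elpd_diff, se_diff^2).
p_worse <- function(elpd_diff, se_diff) {
  p <- pnorm(0, mean = elpd_diff, sd = se_diff)  # P(true elpd difference < 0)
  p[se_diff == 0] <- NA  # the best model's row compares it to itself
  p
}

p_worse(elpd_diff = c(0, -3.2), se_diff = c(0, 1.5))  # NA, ~0.98
```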
Setting

I would love to get rid of
    diag_pnorm[khat_diff > 0.5] <- paste0("khat_diff > 0.5")
    }
    rownames(comp) <- rnms
    comp <- cbind(data.frame(elpd_diff = elpd_diff, se_diff = se_diff,
This also changes the returned object to a data frame when it used to be a matrix. That could potentially cause reverse dependency issues, but we'll see when I run the checks. This is a necessary change, though, since we're mixing numeric columns with text columns for the diagnostic.
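A small illustration (with made-up numbers) of why mixing numeric and text columns forces the move away from a matrix:

```r
# A matrix can hold only one type: adding a character diagnostic column
# silently coerces the numeric columns to strings.
m <- cbind(elpd_diff = c(0, -3.2), se_diff = c(0, 1.5))
typeof(m)   # "double"
m2 <- cbind(m, diag = c("", "khat_diff > 0.5"))
typeof(m2)  # "character" -- elpd_diff is now the string "-3.2"

# A data frame keeps per-column types, so numeric and text can coexist.
d <- data.frame(elpd_diff = c(0, -3.2), se_diff = c(0, 1.5),
                diag = c("", "khat_diff > 0.5"))
sapply(d, class)  # elpd_diff/se_diff stay "numeric", diag is "character"
```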
Codecov Report ❌ Patch coverage is

Additional details and impacted files:

    @@            Coverage Diff             @@
    ##           master     #300      +/-   ##
    ==========================================
    - Coverage   92.80%   92.70%   -0.11%
    ==========================================
      Files          31       31
      Lines        3004     3029      +25
    ==========================================
    + Hits         2788     2808      +20
    - Misses        216      221       +5
I fixed the tests so this is passing now. I will run reverse dependency checks over the weekend hopefully. I just put one question for you @avehtari in the review comments about how to label the diagnostic.
Starting a comment to keep track of reverse dependency issues. The reverse dependency checks are still running (they will take a while). I will update here as I discover new issues.
We could consider
Hi, thanks for entertaining my input. Just as context: I teach epidemiologists with differing degrees of statistical background, and because of this I am maybe a little too cautious. I think this package should also be used by these people, and I think your package is fantastic.

Diagnostics

For the diagnostics, I think an option to show all warnings would be nice, even if the value of the warnings is limited. In infectious disease surveillance you want your datasets to be small, because if you have large datasets you failed in containing the disease. A small-sample-size warning is therefore often not helpful, since your data often still contains useful information. And this data has to be published to be used in a meta-analysis at a later time. If you consider adding an option, maybe a

p_worse

When thinking about this a little bit longer, I realized my worries with p_worse are maybe more about the interpretation of the probability. While I think

My argument with the AUC was way simpler. When looking at the examples, I was unsure how to interpret
The main tradeoff is often between elpd and parsimony. And since p_worse is often above 0.9, I would want p_worse to be close to 1 to consider a more complex model superior (everything else being equal). This is what is unclear to me. Did I get the right idea?

Other thought in this regard:

I hope you find my thoughts helpful, and sorry for the rambling. Greetings,
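As a rough numeric illustration of this interpretation question (the values are made up; this is just pnorm under the normal approximation, not package output):

```r
# Small elpd difference relative to its SE: the "worse" probability is
# high-ish but not decisive.
pnorm(0, mean = -2, sd = 2)    # ~0.84
# Large difference relative to its SE: the probability is essentially 1.
pnorm(0, mean = -10, sd = 2)   # ~0.9999997
```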
@ndevln, thanks for the additional comments! I've been busy with a grant proposal, but now that the deadline has passed I will soon modify one of the case studies to provide more information about the diagnostics and the interpretation of the results. I'll comment here and tag you when it's ready; we'll see if it helps to clarify some things, and your feedback will be useful for seeing whether we are able to explain things better.
@ndevln, and don't be sorry. It's great that you rambled and wrote what you think even when uncertain (not everyone has the courage), as that also helps us see where we need to improve the documentation to reduce such uncertainty.
We discussed this PR in our group meeting and I think at this point we should drop
I have updated a case study using this PR: https://users.aalto.fi/~ave/casestudies/LOO_uncertainty/loo_uncertainty_diag.html There are now more explanations of how to interpret the diagnostics.
After a discussion with @avehtari, we agreed on the following next steps for this PR:
I added additional sections in the
avehtari left a comment:
Excellent loo-glossary text. Just one comment about the Bengio and Grandvalet reference.
@florence-bockting @avehtari Thank you in advance for doing the new round of reverse dependency checks.
Re-running reverse dependency checks. Below is a list of packages where issues occur:
@avehtari: I have now worked out fixes for all affected packages; the fixes are currently still local. If I understand the process correctly, we can now merge this PR into master and I will then make the PRs in the respective packages public. Or do you have other thoughts on this?
Description
The corresponding issue is #299.
In this PR we are making changes to the output structure returned by loo_compare(). The main changes are a transition from a matrix to a data frame and the addition of several new columns. Here is an example of the updated output structure:

More context on the motivation behind these changes can be found in the updated case study "Uncertainty in Bayesian LOO-CV Model Comparison".
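Since the example output was not captured above, here is a hedged sketch (illustrative column values only, not real loo_compare() output) of the kind of downstream code the matrix-to-data-frame transition can affect:

```r
# Illustrative stand-in for the new return value; the real column set may differ.
comp <- data.frame(elpd_diff = c(0, -3.2), se_diff = c(0, 1.5),
                   row.names = c("model2", "model1"))

# Name-based indexing behaves the same for matrices and data frames:
comp["model1", "elpd_diff"]   # -3.2

# But single-row extraction differs: a matrix row drops to a named numeric
# vector, while a data frame row stays a one-row data frame. Reverse
# dependencies relying on the old matrix behavior may need adjustments.
is.data.frame(comp[1, ])      # TRUE (was FALSE for the old matrix return)
```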
TODOs