Training on Paratext projects whose USFM structure and content do not match their versification results in misaligned training data. We've already observed that many projects are translating across versifications: #953. These projects may be particularly susceptible to this problem. We need to experiment to determine how much having some misaligned training data (at the verse level) affects translation quality in order to prioritize solutions for this problem.
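One possible way to set up such an experiment is to start from a corpus that is known to be well aligned and corrupt a controlled fraction of verse pairs, then compare translation quality across fractions. The sketch below is only an illustration under assumptions: the function name `inject_misalignment`, the off-by-one shift (pairing a source verse with the following target verse, roughly what a versification mismatch might produce), and the list-of-strings corpus format are all hypothetical and are not taken from the existing codebase.

```python
import random
from typing import List, Tuple


def inject_misalignment(
    source: List[str], target: List[str], fraction: float, seed: int = 0
) -> Tuple[List[str], List[str], List[int]]:
    """Return a copy of the corpus in which a fraction of target verses are off by one.

    Hypothetical helper: for each selected index i, the source verse i is paired
    with target verse i + 1 (wrapping around at the end of the corpus).
    """
    if len(source) != len(target):
        raise ValueError("source and target must have the same number of verses")
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")

    n = len(target)
    rng = random.Random(seed)  # seeded so that runs are reproducible
    shifted = set(rng.sample(range(n), round(n * fraction)))

    noisy_target = [
        target[(i + 1) % n] if i in shifted else target[i] for i in range(n)
    ]
    return list(source), noisy_target, sorted(shifted)
```

Training one model per fraction (for example 0.0, 0.05, 0.1, 0.2) on the output and scoring all of them on the same clean test set would give a rough picture of how quickly quality degrades. Real versification mismatches are likely to affect contiguous runs of verses rather than random ones, so this is a simplification.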