reject non-ascii digits when lexing xsd integer types #32

Open
aizu-m wants to merge 2 commits into apache:trunk from aizu-m:ascii-digits-xsd-int

aizu-m commented May 31, 2026

Noticed while feeding random Unicode into lexInt: parseIntXsdNumber uses Character.digit(c, 10), which maps any Unicode decimal digit (fullwidth U+FF11, Arabic-Indic U+0661, Devanagari, ...) to a value, so a string of fullwidth digits such as "１２３" (U+FF11 U+FF12 U+FF13) parses to 123 and is accepted as a valid xsd:int/short/byte. Those code points are outside the xsd lexical space, and lexLong/lexInteger already reject them via Long.parseLong/BigInteger. This change restricts the digit check to ASCII 0-9.
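For illustration, here is a minimal standalone sketch of the difference between a Character.digit-based check and an ASCII-only check. It is not the project's parseIntXsdNumber: the class and method names (AsciiDigitSketch, parseLenient, parseStrict) are hypothetical, and sign handling and overflow checks are left out to keep it short.

```java
// Hypothetical sketch, not the actual parseIntXsdNumber implementation.
// Sign handling and overflow checks are intentionally omitted.
final class AsciiDigitSketch {

    // Accepts any Unicode decimal digit, because Character.digit maps
    // code points such as U+FF11 (fullwidth one) or U+0661 (Arabic-Indic one)
    // to their numeric value.
    static int parseLenient(String s) {
        if (s.isEmpty()) {
            throw new NumberFormatException("empty input");
        }
        int result = 0;
        for (int i = 0; i < s.length(); i++) {
            int d = Character.digit(s.charAt(i), 10);
            if (d < 0) {
                throw new NumberFormatException("not a digit in: " + s);
            }
            result = result * 10 + d;
        }
        return result;
    }

    // Accepts only ASCII '0'..'9'; every other code point is rejected.
    static int parseStrict(String s) {
        if (s.isEmpty()) {
            throw new NumberFormatException("empty input");
        }
        int result = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < '0' || c > '9') {
                throw new NumberFormatException("non-ASCII digit in: " + s);
            }
            result = result * 10 + (c - '0');
        }
        return result;
    }

    public static void main(String[] args) {
        String fullwidth = "\uFF11\uFF12\uFF13"; // fullwidth 1, 2, 3
        System.out.println(parseLenient(fullwidth)); // prints 123
        System.out.println(parseStrict("123"));      // prints 123
        try {
            parseStrict(fullwidth);
        } catch (NumberFormatException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The explicit range comparison is used here instead of Character.isDigit, since isDigit also returns true for non-ASCII decimal digits.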

@pjfanning
Member

can you add tests?

Author

aizu-m commented Jun 1, 2026

Added XsTypeConverterTest covering lexInt/lexShort/lexByte. It checks ascii still parses and that fullwidth, arabic-indic and devanagari digits now throw NumberFormatException.
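The shape of such a check could look roughly like the sketch below. It is not the XsTypeConverterTest added in this PR: it uses plain Java rather than a test framework, and strictInt is a hypothetical stand-in for the lexer under test, so the class only shows the pattern of one ASCII case that must parse and one case per script that must throw.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the test pattern; strictInt stands in for the
// real lexer and is NOT the project's lexInt.
final class NonAsciiDigitRejectionSketch {

    // Stand-in parser that only accepts ASCII digits (no sign, no overflow check).
    static int strictInt(String s) {
        if (s.isEmpty()) {
            throw new NumberFormatException("empty input");
        }
        int result = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < '0' || c > '9') {
                throw new NumberFormatException("non-ASCII digit in: " + s);
            }
            result = result * 10 + (c - '0');
        }
        return result;
    }

    // Returns true when parsing the input throws NumberFormatException.
    static boolean rejects(String s) {
        try {
            strictInt(s);
            return false;
        } catch (NumberFormatException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Each entry spells "123" in a different digit script.
        Map<String, String> cases = new LinkedHashMap<>();
        cases.put("fullwidth", "\uFF11\uFF12\uFF13");
        cases.put("arabic-indic", "\u0661\u0662\u0663");
        cases.put("devanagari", "\u0967\u0968\u0969");

        if (strictInt("123") != 123) {
            throw new AssertionError("ascii digits should still parse");
        }
        for (Map.Entry<String, String> e : cases.entrySet()) {
            if (!rejects(e.getValue())) {
                throw new AssertionError(e.getKey() + " digits should be rejected");
            }
        }
        System.out.println("all checks passed");
    }
}
```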
