Skip to content

Discard Microsoft.Data.SqlCLient.SqlMetaData.xml in favor of storing metadata internally#4362

Open
tetolv wants to merge 8 commits into
dotnet:mainfrom
tetolv:getschema-rewrite
Open

Discard Microsoft.Data.SqlCLient.SqlMetaData.xml in favor of storing metadata internally#4362
tetolv wants to merge 8 commits into
dotnet:mainfrom
tetolv:getschema-rewrite

Conversation

@tetolv

@tetolv tetolv commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

Description

Continuation of @paulmedynski effort on #3372 to pull inside one by one all metadata collections from Microsoft.Data.SqlCLient.SqlMetaData.xml. Here it led me to complete rewrite of the SqlMetaDataFactory. Main reasons were:

  • Replacing DataTable as cached representation of metadata, with dedicated classes, because they require less checks for data consistency, they should consume less memory and we don't require DataTable querying capabilities anyway
  • More streamlined design. Different metadata type collection are encapsulated in their own classes and it's possible to more easily extend it with new kinds of metadata, although I don't think is it's necessary.
  • All the actual static metadata (in opposite to metadata read from server system views) is kept in a single structure s_metaDataCollection , which in most cases should be the only place, where new metadata additions and tweaks should be performed, without need to dig deep into the code, which handles this data structure. There I have diverted from the code style guidelines to use longer lines, because that structure by nature has rather tabular representation and it would never fit into 100 characters, but I think that maintaining it's tabular representation is more important to code readability.

Fixes

  • sys.table_types, used in query for StructuredTypeMembers metadata collection was supported since SQL2008(v10.0), not since SQL2005(v9.0). I have updated minimumVersion accordingly.

Localization

MDF_DataTableDoesNotExist need to be rephrased and translated. Original text was The collection '{0}' is missing from the metadata XML., but obviously it is not valid anymore.

Testing

I have compared metadata returned from rewritten version of SqlMetaDataFactory with metadata returned from MDS v7.0.1. I have attached comparison results in SchemaComparison.zip.
SchemaComparison.zip It has the following structure:

---
config:
    treeView:
        rowIndent: 10
        lineThickness: 1
    themeVariables:
        treeView:
            labelFontSize: '15px'
            labelColor: '#808080'
            lineColor: '#606060'
---
treeView-beta
    "09.00.5000  (MSSQL 2005)"
        "legacy"
             "multiple .csv files for each collection."
        "new"
             "multiple .csv files for each collection"
    "12.00.9114 (Azure database)"
        "legacy"
             "multiple .csv files for each collection"
        "new"
             "multiple .csv files for each collection"
    "17.00.4015 (MSSQL 2025)"
        "legacy"
             "multiple .csv files for each collection"
        "new"
             "multiple .csv files for each collection"
Loading

Here's a list of all notable differences:

  • In ReservedWords collection in all versions (09.00.5000, 12.00.9114, 17.00.4015) all elements are rearranged, but no new items appears, or were removed. Rearrangement is the same as in #PR4328, because I took a list of reserved words from there.
  • In Restrictions collection in all versions, there exists a column RestrictionDefault. But in https://learn.microsoft.com/en-us/dotnet/framework/data/adonet/common-schema-collections it is marked as Ignored. Previously for all the metadata collections, which were based on query to server system views, this column contained name of the column in system view, which was used by the restriction, but it was not always the case. For now I have decided to do not return anything in RestrictionDefault column, but if it's deemed important, then I can return there the same values as before.
  • In Restrictions collection, filtering by server engine version wasn't working correctly, so on v09.00.5000 restrictions for collections AllColumns, ColumnSetColumns, StructuredTypeMembers were returned, although collections itself are not supported on that engine version.
  • In MetaDataCollections there are two new collections TVPs and UDTs. Those are used by DataTypes collection, where in addition to static data types, there are also returned table-valued parameter types, and user-defined types, that are read from the server system views. Previously there were separate methods for querying each of them and appending to the DataTypes collection, but now they are handled by separate SqlCommandCollection instances added to the s_metaDataCollection. As result are also returned in MetaDataCollections. Those two could either remain as undocumented collections, or I can somehow mark them as internal, so they wouldn't be returned in MetaDataCollections.

No automatic tests were added, because affected functionality is completely private.

Will be happy to hear any feedback.

…d of Microsoft.Data.SqlClient.SqlMetaData.xml
@tetolv tetolv marked this pull request as ready for review June 15, 2026 12:49
@tetolv tetolv requested a review from a team as a code owner June 15, 2026 12:49

@edwardneal edwardneal left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this. I've made a first-pass review for discussion.

Comment thread src/Microsoft.Data.SqlClient/src/Microsoft/Data/SqlClient/SqlConnectionFactory.cs Outdated
Comment thread src/Microsoft.Data.SqlClient/src/Microsoft/Data/SqlClient/SqlMetaDataFactory.cs Outdated
Comment thread src/Microsoft.Data.SqlClient/src/Microsoft/Data/SqlClient/SqlMetaDataFactory.cs Outdated
"DUMMY"
]),
new SqlCommandCollection("Users", 1, 1, null, null, "select uid, name as user_name, createdate, updatedate from sysusers where (name = @Name or (@Name is null))",
[new Restriction(1, "User_Name", "@Name")]),

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to specify the sequence number here? I can't see a reason why we'd ever want to have duplicates.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure, I understand your question. The reason I left explicit restriction numbers there, is that according to documentation https://learn.microsoft.com/en-us/dotnet/framework/data/adonet/common-schema-collections#restrictions , column RestrictionNumber is required and for compatibility with the original ordering of the restrictions, I made those numbers explicit. Of course they could be removed and resulting table would rely on the implicit restrictions order in array for each sql-based collection.

private readonly string _populationString;
public readonly Restriction[] RestrictionParams;

public SqlCommandCollection(string collectionName, int numberOfRestrictions, int numberOfIdentifierParts,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there are no circumstances where numberOfRestrictions is not restrictions.Length then we don't need to specify it explicitly.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's just that there are two brotherly columns NumberOfRestrictions and NumberOfIdentifierParts, which are explicitly required by documentation https://learn.microsoft.com/en-us/dotnet/framework/data/adonet/common-schema-collections#metadatacollections and they have a direct dependency: NumberOfIdentifierParts could not be greater than NumberOfRestrictions. If we would separate them, by replacing explicit definition of NumberOfRestrictions with a length of restriction array, then this dependency would not be so obvious for anyone adding new query-based collections.
Also I wanted for metadata definition in code, to resemble as close as possible, metadata returned to the caller.

// 7. Variable-length string and binary types: varchar, nvarchar, varbinary.
// 8. Miscellaneous fixed-length types: uniqueidentifier, sql_variant, timestamp

AddFixedNumericType(SqlDbType.TinyInt, isBestMatch: true);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've lost a lot of metadata by replacing these with a single list; the goal of doing this was to make it difficult for DataTypes to drift from SqlClient's internal MetaType definitions.

Can we replace the list in SqlMetaDataFactory.cs with multiple constructors, and derive data from MetaType definitions in the same way?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have thought about unification with MetaData definition, and even initially had a second constructor for SqlMetaDataFactory.TypeMetaData which would use them, but very few fields would match directly one-to-one between the two definitions for each type, so I decided to go on with a flat table, where all type metadata are defined explicitly. I bear in mind, that in #3372 there was an additional proposal to make general unification of all datatype info in one place, but right now I thought it to be a task for later.
Right now I have settled on the intermediate solution, I initialize all types in the private static readonly TypeMetaData[] s_types with a constructor, that takes SqlDbType and derive from it all the type values, that have a direct one-to-one relation to database schema datatype values,

        public TypeMetaData(SqlDbType dbType, ...)
        {
            // Shared type properties
            MetaType metaType = MetaType.GetMetaTypeFromSqlDbType(dbType, isMultiValued: false);
            TypeName = alias ?? metaType.TypeName;
            ProviderDbType = (int)metaType.SqlDbType;
            DataType = metaType.ClassType.FullName!;

            // Schema specific type properties
            ...

, and the rest are defined explicitly in s_types as before.
The reason I haven't taken many-constructor approach is, because there were constructors, used by several datatypes, but inside a constructor there were conditional logic based on particular datatype being constructed. Not only this logic was hard to follow to create a complete mental picture of what properties of constructed datatype would be, but also adding a new data type would anyway require either adding a new dedicated constructor, or additionally enhance inner logic of one of the existing constructors.

When datatype metadata unification will eventually be made, then I expect that datatype class will have all the fields required by all datatype representations that we have, and optionally perhaps a set methods: ToMetaType, ToSchemaRow, etc..., to produce particular type representations where needed.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have additionally updated my version, so now almost everything about data types is derived from Microsoft.Data.SqlClient.MetaType, except for MaximumScale property for date, datetime2, datetimeoffset.

Comment on lines 233 to 237
@@ -237,14 +230,8 @@
<EmbeddedResource Include="Resources\ILLink.Substitutions.xml"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The file also needs to be deleted.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you really mean ILLink.Substitutions.xml? This file is not related to my changes is any way. Or you mean Microsoft.Data.SqlClient.SqlMetaData.xml should be physically removed from repo, not just it's inclusion in project file.

Comment thread src/Microsoft.Data.SqlClient/src/Microsoft/Data/SqlClient/SqlMetaDataFactory.cs Outdated
@tetolv

tetolv commented Jun 17, 2026

Copy link
Copy Markdown
Contributor Author

I have several more questions:

  • should vector datatype be added now, or later?
  • I'm usually bad at naming things, nor I understand terminology used in MDS perfectly, so if any of my classes/methods/properties sounds unclear, or confusing, feel free to suggest renames for them.
  • There are bunch of new small classes/interfaces stored internally under SqlMetaDataFactory, they all seems quite messy. How would it better to reorganize them?
  • One error text MDF_DataTableDoesNotExist (The collection '{0}' is missing from the metadata XML.) should be renamed and translated. How to do that?
  • Which automatic tests might be needed?

}
};

DataRow row = table.NewRow();

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like row is unused.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Somehow Visual Studio haven't highlighted that unused code to me, so I haven't noticed.

@benrr101

Copy link
Copy Markdown
Contributor

/azp run

@azure-pipelines

Copy link
Copy Markdown
Azure Pipelines successfully started running 3 pipeline(s).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: To triage

Development

Successfully merging this pull request may close these issues.

4 participants