Mirror of https://github.com/mysql/mysql-server.git, synced 2026-06-03 21:55:37 +00:00.
- C++ 74.7%
- C 10.8%
- Java 3.6%
- PHP 2.4%
- C# 2.2%
- Other 5.9%
Symptom:
A deadlock could happen between a thread performing a typical dive into
the B-tree and a thread trying to physically remove a record, in case the
latter had to merge a page with a sibling. The merge required
deregistering one of the children from its parent, which in turn required
X-latching the parent while holding an X-latch on a child.
Analysis:
The most typical latching order in the B-tree (simplifying a lot) is
top-down. But sometimes we have to navigate upward, as is the case when
a page has too few records and has to be merged with a sibling, which
requires adjusting the parent. One way to do it safely is to take an
X-latch on the whole dict_index_t::lock, but typically we take only an
SX-latch on it and X-latch the internal nodes on the way down if
btr_cur_will_modify_tree(..) predicts that we might need to bubble up
later.
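The top-down rule can be made concrete with a small model. The sketch below is hypothetical (the `LatchTracker` class and its level numbering are illustrative, not InnoDB code): it records the B-tree levels on which one thread holds X-latches, with level 0 being a leaf and higher levels closer to the root, and refuses an upward step, i.e. X-latching a parent while still holding a child, unless the whole index is X-latched. That upward step is the pattern described in the Symptom.

```cpp
#include <cassert>
#include <set>

// Hypothetical model of the X-latches one thread holds on B-tree pages.
// Not an InnoDB API. Level 0 is a leaf; higher levels are closer to the root.
class LatchTracker {
 public:
  // `index_x_latched` models holding dict_index_t::lock in X mode.
  explicit LatchTracker(bool index_x_latched)
      : index_x_latched_(index_x_latched) {}

  // Records an X-latch on a page at `level` and returns true, unless that
  // would be an upward step (a page above one we already hold) taken
  // without the index-wide X-latch; then returns false and records nothing.
  bool try_x_latch(unsigned level) {
    const bool upward = !held_.empty() && level > *held_.begin();
    if (upward && !index_x_latched_) {
      return false;  // could deadlock with a thread diving top-down
    }
    held_.insert(level);
    return true;
  }

  // Releases one latch held at `level`, if any.
  void release(unsigned level) {
    auto it = held_.find(level);
    if (it != held_.end()) held_.erase(it);
  }

 private:
  bool index_x_latched_;
  std::multiset<unsigned> held_;  // levels of currently X-latched pages
};
```

A thread that latches levels 2, 1 and 0 in that order is accepted, while one that holds a leaf and then asks for its parent is rejected unless it holds the index-wide X-latch. Real InnoDB latching also involves sibling order and SX modes, which this model leaves out.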
One of the heuristics checks whether removal of one record could cause
the space used in the page to drop below BTR_CUR_PAGE_COMPRESS_LIMIT.
For this to work correctly, it is crucial to correctly estimate the
maximum size of a record (not necessarily the one we are descending to,
as in the case of a merge we might remove the record for the sibling).
This was done by dict_index_node_ptr_max_size(..) which was recently
fixed to use get_field_max_size(..) as part of fixing
Bug#25579578 "INNODB: FAILING ASSERTION: !FIELD->PREFIX_LEN..."
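A simplified version of this heuristic is sketched below. It is not the body of btr_cur_will_modify_tree(..): the function name is hypothetical, and the compress limit is assumed to be `page_size * merge_threshold / 100` with InnoDB's default MERGE_THRESHOLD of 50 percent. It shows why an underestimated maximum record size is dangerous: the check then answers "no merge possible", so the descent does not X-latch the internal nodes that will be needed later.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: could deleting one record of up to `max_rec_size`
// bytes leave the page below the merge (compress) limit?
// Assumption: limit = page_size * merge_threshold_pct / 100.
bool removal_may_trigger_merge(std::size_t used_bytes,
                               std::size_t max_rec_size,
                               std::size_t page_size,
                               std::size_t merge_threshold_pct = 50) {
  const std::size_t limit = page_size * merge_threshold_pct / 100;
  // Avoid unsigned underflow when the record could empty the page entirely.
  if (max_rec_size >= used_bytes) return true;
  return used_bytes - max_rec_size < limit;
}
```

With a 16 KiB page the assumed limit is 8192 bytes. Removing a real 500-byte record from a page using 8500 bytes leaves 8000 bytes, so a merge is possible; an estimate of 40 bytes predicts 8460 bytes and wrongly rules the merge out.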
Alas, since its introduction in 2008, get_field_max_size(..) had a bug:
it interpreted field->prefix_len == 0 as a reason to consider the field
as stored off-page if the underlying column was longer than 40 bytes.
Further, it assumed that this implies the field never takes more than
40 bytes. This is perhaps a separate bug to fix later, as a value longer
than 40 bytes that does not cause the record to overflow can definitely
be stored inline.
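The faulty branch can be modeled as follows. This is a hypothetical reconstruction from the description above, not the actual get_field_max_size(..) source: `FieldInfo` and its members are illustrative, and 40 is taken as BTR_EXTERN_LOCAL_STORED_MAX_SIZE (assumed to be twice a 20-byte external field reference). It shows a column whose values reach 500 bytes, and which is stored inline, being estimated at only 40 bytes.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: BTR_EXTERN_LOCAL_STORED_MAX_SIZE == 2 * 20-byte field reference.
constexpr std::size_t kLocalStoredMax = 40;

// Illustrative subset of a field's metadata (hypothetical struct).
struct FieldInfo {
  std::size_t fixed_len;    // 0 means "not fixed-length"
  std::size_t prefix_len;   // 0 means the whole column is indexed
  std::size_t col_max_len;  // maximum byte length of the column
};

// Sketch of the pre-fix logic as described in the text (simplified).
std::size_t old_field_max_size(const FieldInfo& f) {
  if (f.fixed_len != 0) {
    return f.fixed_len;  // quick path (simplified; see point 3 of the fix)
  }
  if (f.prefix_len == 0 && f.col_max_len > kLocalStoredMax) {
    // The bug: a whole-column field longer than 40 bytes is assumed to be
    // stored off-page, so only 40 bytes are counted for it.
    return kLocalStoredMax;
  }
  return f.prefix_len != 0 ? f.prefix_len : f.col_max_len;
}
```

For `FieldInfo{0, 0, 500}` the sketch returns 40, although such a value can sit inline in the record at its full 500 bytes.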
What matters for the current bug, though, is that if a column is one of
the first dict_index_get_n_unique_in_tree(index) columns then it is
always stored inline to support binary search during navigation.
This condition is not checked in dict_index_add_col(..) because it is
often called before index->n_uniq is even properly set, as the index is
still under construction. Thus it often blindly sets fixed_len to 0
whenever the field seems longer than DICT_MAX_FIXED_COL_LEN=768.
This is why the bug did not manifest when the key was defined to be
shorter than 768 bytes: fixed_len was set correctly and
get_field_max_size(..) used a quick path.
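The role of the 768-byte cutoff can be sketched as below. Both functions are hypothetical simplifications of what the text attributes to dict_index_add_col(..) and to the old estimate, using DICT_MAX_FIXED_COL_LEN = 768 as stated and an assumed 40-byte BTR_EXTERN_LOCAL_STORED_MAX_SIZE. A fixed-size key column of at most 768 bytes keeps a non-zero fixed_len and takes the quick path; a longer one gets fixed_len = 0 and falls into the 40-byte branch, even though, as one of the first n_unique_in_tree fields, it is always stored inline.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kDictMaxFixedColLen = 768;  // DICT_MAX_FIXED_COL_LEN
constexpr std::size_t kLocalStoredMax = 40;       // assumed BTR_EXTERN_LOCAL_STORED_MAX_SIZE

// Hypothetical: fixed_len as assigned while the index is still being built,
// when n_uniq is not yet known and the "always inline" case cannot be seen.
std::size_t assigned_fixed_len(std::size_t col_fixed_size) {
  return col_fixed_size > kDictMaxFixedColLen ? 0 : col_fixed_size;
}

// Hypothetical pre-fix estimate for a whole-column key on a fixed-size column.
std::size_t old_key_estimate(std::size_t col_fixed_size) {
  const std::size_t fixed_len = assigned_fixed_len(col_fixed_size);
  if (fixed_len != 0) return fixed_len;  // quick path: correct
  return kLocalStoredMax;                // slow path: wrongly assumes off-page
}
```

A 700-byte key is estimated at 700 bytes, while a 1000-byte key is estimated at just 40.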
Solution:
This patch adds an exploratory assert to btr_cur_search_to_nth_level()
that the node_ptr_max_size computed by dict_index_node_ptr_max_size(..)
is at least the length of the record found by bisection.
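In spirit, the new assert checks that an upper bound is never below an observed value. The sketch below is hypothetical: the real check lives in btr_cur_search_to_nth_level() and uses InnoDB's own record-size helpers, whereas here the bound is simply the sum of per-field maxima plus an assumed fixed overhead for the record header and child pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical upper bound on a node-pointer record's size: per-field maxima
// plus an assumed fixed overhead (header, child page number).
std::size_t node_ptr_max_size(const std::vector<std::size_t>& field_max,
                              std::size_t overhead) {
  std::size_t total = overhead;
  for (std::size_t m : field_max) total += m;
  return total;
}

// The invariant: the estimate covers any record actually found on a page.
bool estimate_covers(std::size_t estimate, std::size_t found_rec_size) {
  return estimate >= found_rec_size;
}

// Debug-only check in the style of the exploratory assert (no-op with NDEBUG).
void debug_check_node_ptr(std::size_t estimate, std::size_t found_rec_size) {
  assert(estimate_covers(estimate, found_rec_size));
}
```

With a single 500-byte key field and 14 bytes of assumed overhead, a found record is 514 bytes long. The old 40-byte field estimate yields a bound of 54 and would trip the check; the corrected 500-byte estimate yields 514 and passes.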
With this assert in place, this patch fixes the estimation in several
ways:
1. The field_max_size > BTR_EXTERN_LOCAL_STORED_MAX_SIZE case is
narrowed down to columns which are not part of the prefix used for
navigation, i.e. the first dict_index_get_n_unique_in_tree() fields.
2. The case of a spatial index on GEOMETRY now correctly estimates the
size to be DATA_MBR_LEN + 1 extra byte (for length). Previously,
because (for historical reasons) dict_index_add_col() sets
field->fixed_len=0 in this case, we wrongly estimated the maximum
field length to be ULINT_MAX + 2, which overflowed to the equivalent
of just 1 byte.
3. get_field_max_size(..) now returns field->fixed_len instead of
col->get_fixed_size(..) on the fast path. This makes the estimate
smaller than before, but closer to the truth.
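The three changes above suggest roughly the following decision order. Again this is a hypothetical sketch, not the patched get_field_max_size(..): the `Field` struct and its members are illustrative, prefix indexes are ignored, DATA_MBR_LEN is assumed to be 32 bytes (four doubles bounding a 2-D box), and BTR_EXTERN_LOCAL_STORED_MAX_SIZE is assumed to be 40. The last function reproduces the old spatial overflow, assuming a 64-bit ulint: adding 2 to the maximum unsigned value wraps around to 1.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

constexpr std::size_t kLocalStoredMax = 40;  // assumed BTR_EXTERN_LOCAL_STORED_MAX_SIZE
constexpr std::size_t kMbrLen = 32;          // assumed DATA_MBR_LEN (4 doubles)

// Hypothetical field description; names are illustrative only.
struct Field {
  std::size_t fixed_len;    // 0 if not fixed-length
  std::size_t col_max_len;  // column's maximum byte length
  bool is_spatial_mbr;      // key of a SPATIAL index on GEOMETRY
  std::size_t position;     // 0-based position of the field in the index
};

// Sketch of the corrected estimate following points 1-3.
std::size_t new_field_max_size(const Field& f, std::size_t n_unique_in_tree) {
  if (f.is_spatial_mbr) {
    return kMbrLen + 1;  // point 2: MBR plus one length byte
  }
  if (f.fixed_len != 0) {
    return f.fixed_len;  // point 3: the field's own fixed_len
  }
  const bool used_for_navigation = f.position < n_unique_in_tree;
  if (!used_for_navigation && f.col_max_len > kLocalStoredMax) {
    return kLocalStoredMax;  // point 1: only non-navigation fields may go off-page
  }
  return f.col_max_len;  // navigation fields are always stored inline
}

// Old spatial estimate: ULINT_MAX + 2 wraps around (64-bit unsigned assumed).
std::uint64_t old_spatial_estimate() {
  return std::numeric_limits<std::uint64_t>::max() + 2;
}
```

With n_unique_in_tree = 1, a 500-byte key in position 0 is now estimated at 500 bytes, a long BLOB in position 3 still at 40, and a spatial MBR at 33 instead of the wrapped-around 1.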
This means that some CREATE TABLE and ALTER TABLE statements which used
to be (erroneously) accepted by InnoDB no longer are. The new code is in
line with the old public documentation, and differs in behaviour only
for tables which violated the publicly documented limits.
Change-Id: Ib0b7bd4f358471a21ff8ed2cc8488f68ee9c048e
| Name |
|---|
| client |
| cmake |
| components |
| Docs |
| doxygen_resources |
| extra |
| include |
| libbinlogevents |
| libbinlogstandalone |
| libchangestreams |
| libmysql |
| libservices |
| man |
| mysql-test |
| mysys |
| packaging |
| plugin |
| router |
| scripts |
| share |
| sql |
| sql-common |
| storage |
| strings |
| support-files |
| testclients |
| unittest |
| utilities |
| vio |
| .clang-format |
| .clang-tidy |
| .gitattributes |
| .gitconfig |
| .gitignore |
| CMakeLists.txt |
| config.h.cmake |
| configure.cmake |
| CONTRIBUTING.md |
| Doxyfile-ignored |
| Doxyfile.in |
| INSTALL |
| LICENSE |
| MYSQL_VERSION |
| README |
| run_doxygen.cmake |
| SECURITY.md |
Copyright (c) 2000, 2026, Oracle and/or its affiliates.

This is a release of MySQL, an SQL database server.

License information can be found in the LICENSE file. In test packages where this file is renamed README-test, the license file is renamed LICENSE-test. This distribution may include materials developed by third parties. For license and attribution notices for these materials, please refer to the LICENSE file.

For further information on MySQL or additional documentation, visit http://dev.mysql.com/doc/.

For additional downloads and the source of MySQL, visit http://dev.mysql.com/downloads/.

MySQL is brought to you by the MySQL team at Oracle.