- This is an English translation, produced with DeepL, of an article originally written in Japanese.
FIFA introduced several new initiatives for the 2022 World Cup. One of them was to enhance the data it collects and publishes. The data is released on the FIFA Training Centre website, which also offers videos that explain what the data represents.
After the 2022 Men's World Cup, the U-20 Men's World Cup, the Women's World Cup, and the U-17 Men's World Cup were held in 2023, and the same kind of data as for the Men's World Cup was released for each of them (data for the Club World Cup was also released, but it is not covered here because of the small number of matches). The women's U-20 and U-17 World Cups were held in 2022, but because they took place before the Men's World Cup, no such data is available for them, so they are not included.
Assumption
International tournaments such as the World Cup, the EURO, and the UEFA Champions League have a real influence on the soccer world as a whole, but they are also difficult to analyze with data. These tournaments combine a group stage with a knockout stage, which raises several problems: the strength of the draw differs from group to group, the way teams approach the third group match changes with the points they have already won, knockout matches can include extra time, and, of course, winning teams play more matches, so team totals cannot be compared on a uniform basis. Worrying about all of this would make any analysis impossible, so I will go ahead, while staying cautious about the conclusions.
The host environment matters for any analysis, so let's take a quick look at it. The Men's World Cup was held in Qatar. The U-20 Men's World Cup was held in Argentina in May and June; Argentina is in the southern hemisphere, so it was autumn there, and the climate should not have been a problem. However, because the decision to host came at short notice, the number of stadiums was limited and pitch conditions deteriorated with each match. The Women's World Cup was held in Australia and New Zealand in winter. The U-17 Men's World Cup was held in Indonesia in November and December, and was probably the toughest of the four tournaments: the weather was hot and many matches were interrupted by squalls.
About the Data
The data source is the PDF Post Match Summary Reports published on the FIFA Training Centre.
Men's World Cup : https://www.fifatrainingcentre.com/en/fwc2022/post-match-summaries/post-match-summary-reports.php
U-20 Men's World Cup : https://www.fifatrainingcentre.com/en/game/tournaments/2023-u20-fwc/post-match-summary-reports.php
Women's World Cup : https://www.fifatrainingcentre.com/en/game/tournaments/fifa-womens-world-cup/2023/match-report-hub/post-match-summary-reports.php
U-17 Men's World Cup : https://www.fifatrainingcentre.com/en/game/tournaments/fu17wc/2023/post-match-summary-reports.php
A few reports were corrected after publication, so the data currently posted may differ from the data used here. Uzbekistan vs. Canada in the U-17 World Cup is excluded because the PDF was inaccessible. Because the figures are basically per-match totals, matches that went to extra time are also excluded. The U-17 World Cup has no extra time even in the knockout stage, so its figures may be slightly affected. In such cases, the match result is classified as a draw.
Match classification using Phases of Play
The aim of this article is to compare physical data, but first I want to classify the matches. The purpose of soccer is to put the ball in the opponent's goal and keep it out of your own; how to control the ball, and how to move within the limits of physical capacity, are means to that end. (Of course, in the extreme case, a team full of players who could move at super speed while controlling the ball would be strong, but...) There is a limit to what can be done with the published reports alone, but we can at least categorize what kind of match each one was, and then compare the data on that basis.
Matches are classified using the Phases of Play data, one of the data types FIFA newly collects and publishes. It breaks down play in possession and out of possession into phases, each expressed as a percentage.
An instructional video can be found on the following page of the FIFA Training Centre.
https://www.fifatrainingcentre.com/en/fwc2022/efi-metrics/efi-metric--phases-of-play.php
The formula behind these percentages is unclear: when the phases are summed separately for in possession and out of possession, the former can exceed 100% and the latter can fall short of 100%. As the data shown later suggests, a higher possession rate raises the percentages of the in-possession phases, so the values may be close to time-based shares. The trends are broadly consistent with the impression the matches gave when watched, so the data is usable.
I organized these figures per team per match and excluded low press and recovery, whose percentages hardly vary (the same is true of counter-attack, but I kept it in case readers are curious about it). After preparing the data in this way, I classified the team-match records into 10 categories using a simple k-means method. A proper classification of match content would probably also need time-series information and goal difference, but I settled on 10 categories because this gives clusters with a roughly similar number of matches and keeps the article from getting too long.
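As a rough illustration of the clustering step, the sketch below runs plain k-means (Lloyd's algorithm) on a handful of made-up team-match rows. The feature columns, the sample values, the random initialization, and the helper name `kmeans` are all assumptions for illustration; the article only states that a simple k-means with 10 categories was used, and k=3 is used here just to keep the toy data small.

```python
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k rows picked at random
    labels = None
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
    return labels, centroids


# Hypothetical rows: (possession %, build-up %, mid block %) per team per match.
rows = [
    (62.0, 30.0, 8.0),
    (60.0, 28.0, 9.0),
    (35.0, 10.0, 25.0),
    (33.0, 12.0, 27.0),
    (48.0, 18.0, 15.0),
    (47.0, 19.0, 16.0),
]

labels, centroids = kmeans(rows, k=3)
for row, label in zip(rows, labels):
    print(label, row)
```

With the real data, each row would hold one team's phase percentages for one match and `k` would be 10. Because k-means depends on its starting centroids, a different seed can produce different cluster IDs, so the IDs themselves carry no meaning beyond grouping.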
Clustering resulted in the following.
Scatterplot of possession (in possession or out of possession) against each play phase. Colors indicate cluster IDs.
The chart above provides a quick summary of the characteristics of each cluster ID.
Most data providers calculate ball possession as a two-way split between a team and its opponent, whereas FIFA now divides it three ways: own team, loose ball, and opponent. The possession figures here are therefore not directly comparable with the conventional ones.
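To see why the two kinds of figures differ, here is a small sketch that rescales a three-way split to a two-way share by dropping the loose-ball portion. This rescaling is only one plausible way to relate the two, not a method documented by FIFA or by any data provider, and the function name and sample values are made up.

```python
def two_way_share(own, loose, opponent):
    """Rescale a three-way split (own / loose ball / opponent, in %) to a two-way share."""
    if abs(own + loose + opponent - 100.0) > 1.0:  # allow for rounding in the reports
        raise ValueError("the three shares should add up to about 100%")
    held = own + opponent
    if held == 0:
        raise ValueError("no possession recorded for either team")
    return 100.0 * own / held


# Hypothetical match: 54% own, 10% loose ball, 36% opponent.
print(two_way_share(54.0, 10.0, 36.0))  # 60.0
```

In this example a team shown at 54% in the three-way split would appear at 60% in a conventional two-way figure, which is why the numbers should not be compared directly.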
The classification by tournament suggests the following:
- The Men's World Cup has many matches with close possession, but no loose matches like ID=9.
- The Women's World Cup is the opposite: matches are either of this loose type or one-sided in possession.
- In the U-20 Men's World Cup, matches tend to be close in possession, and there are no loose matches.
- The U-17 Men's World Cup is more polarized in possession, but there are some loose matches (somewhat similar to the Women's World Cup).
The U-17 matches may have been affected by reduced intensity in the hot weather.
Classifying the matches of the major nations shows that even strong teams usually do not play the same type of match throughout a tournament. England in the U-17 World Cup and the U.S. in the U-20 World Cup each had four matches in the same cluster, but every other team had at most three. Naturally, a team needs to be able to win however the match unfolds.
Teams were split by final result into "teams that reached the top 8 or higher" and "other eliminated teams," and wins, draws, and losses were tallied by cluster, covering matches up to and including the round of 16 (a round-of-16 match that went to a penalty shootout is counted as a draw).
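The tally described above can be sketched as follows. The row layout, the group labels, and the sample scores are hypothetical; the only rule taken from the article is that a match decided by penalty shootout counts as a draw, which falls out naturally when the score before the shootout is used.

```python
from collections import Counter, defaultdict

# Hypothetical team-match rows: (cluster_id, team_group, goals_for, goals_against).
# Scores exclude the shootout, so a shootout match is level and counts as a draw.
rows = [
    (2, "top 8", 2, 1),
    (2, "eliminated", 1, 2),
    (9, "top 8", 1, 1),
    (9, "eliminated", 1, 1),
    (3, "top 8", 3, 0),
]


def outcome(goals_for, goals_against):
    """Return 'W', 'D', or 'L' from one team's point of view."""
    if goals_for > goals_against:
        return "W"
    if goals_for < goals_against:
        return "L"
    return "D"


tally = defaultdict(Counter)
for cluster_id, group, goals_for, goals_against in rows:
    tally[(cluster_id, group)][outcome(goals_for, goals_against)] += 1

for key in sorted(tally):
    print(key, dict(tally[key]))
```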

In the Men's World Cup, the gap in strength among the participating teams was not that large, but in the U-20, U-17, and Women's tournaments it is still large, and some teams lose by overwhelming margins. Those teams sit at the low end of the possession range, so results in those clusters are likely to look worse. Personally, what interested me in this table was the clusters where possession is closest. For ID=9, where frequent loose balls destabilize the game, the records of the two groups are a little closer, but in the other clusters the differences are large. This is hardly a conclusion that needs data, but the key to going far seems to be whether a team can win when the match is evenly balanced in both attack and defense.

I also broke the clusters down by match order: 1 to 3 are the group stage, 4 onward are the knockout stage, and 7 combines the finals and third-place matches. There is no clear-cut rule of the form "this round goes like this," but the change in ID=9 is easy to see: it is frequent in the first two matches, settles down, and then increases again in the last match. Incidentally, there were four ID=9 matches in both the finals and the third-place matches. The desire to win is strongest in these matches, so play tends to get frantic, and loose-ball and out-of-play time is likely to be long.
Physical Data Trends
Now let's turn to the other main topic, physical data. The physical data disclosed in the FIFA reports include total distance covered, distance covered in each of zones 1 to 5, the number of high-speed runs (zone 4) and sprints (zone 5), and top speed. Each zone represents a range of speeds, with different thresholds for men and women, as shown in the table below.
First, a brief summary by tournament. The data presented here excludes matches in which a player was sent off, for both the team concerned and its opponent. I know a table full of numbers is hard to read, but I don't like removing information for the sake of clarity, so please bear with me.
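To make the idea of speed zones concrete, the sketch below assigns speed samples to zones and accumulates the distance covered in each. The thresholds in `BOUNDS_KMH`, the one-second sampling interval, and the sample speeds are placeholders chosen for illustration; they are not taken from FIFA's table, and the reports publish only the per-zone totals, not raw samples.

```python
from bisect import bisect_right

# Placeholder upper bounds (km/h) for zones 1 to 4; zone 5 is everything above.
# These are NOT FIFA's thresholds, which also differ between men and women.
BOUNDS_KMH = (7.0, 15.0, 20.0, 25.0)


def zone_of(speed_kmh, bounds):
    """Return the 1-based zone; a speed equal to a bound falls in the higher zone."""
    return bisect_right(bounds, speed_kmh) + 1


def distance_by_zone(speeds_kmh, dt_s, bounds):
    """Sum the metres covered in each zone from evenly spaced speed samples."""
    totals = [0.0] * (len(bounds) + 1)
    for v in speeds_kmh:
        totals[zone_of(v, bounds) - 1] += v / 3.6 * dt_s  # km/h -> m/s, times seconds
    return totals


samples = [3.6, 7.2, 18.0, 21.6, 28.8]  # hypothetical speeds, one per second
totals = distance_by_zone(samples, dt_s=1.0, bounds=BOUNDS_KMH)
walking_share = totals[0] / sum(totals)  # share of distance covered in zone 1
print(totals, walking_share)
```

Passing the bounds as an argument keeps the same code usable for both the men's and the women's thresholds.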

In total distance covered, the order was U-20 Men's World Cup > Men's World Cup > Women's World Cup = U-17 Men's World Cup; the U-17 figure would probably have been different at another venue or in another season. In high-intensity distance, the order among the men's tournaments was Men's World Cup > U-20 Men's World Cup > U-17 Men's World Cup. Total distance covered is commonly described as "distance run," but about one-third of it is recorded at roughly walking speed. Some players now record up to 13 km in a single match, but saying that they "ran 13 km" is not strictly correct.
As for the difference between men and women, even though the speed thresholds are lower for women, both the distance and the share of distance covered at high intensity are smaller for women. If you are used to watching men's soccer, then in women's soccer you will notice scenes where you think a player will reach the ball and she does not. It can therefore be said that skill and positioning in possession matter more in women's soccer than in men's. This fits the Women's World Cup trend described above, in which teams that cannot keep the ball end up fighting in chaotic matches. Even so, the gap in top speed between men and women was smaller than I expected, and I believe that as the environment around women's soccer develops and training and research advance, the physical figures will rise a little further.

(Data trends for winning teams by tournament. The difference is not the difference between the two medians but the median of the per-match differences, own team minus opponent.)
We will look at wins and losses by match classification later; first, the tournaments as a whole. In many of the tournaments, the winning team was higher in both total distance covered and high-intensity running distance, but with differences of this size it is hard to say that "the team that moves or runs more wins." Many teams won matches in which they did not move more than their opponents.
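The distinction in the caption, between the median of per-match differences and the difference of the two medians, is easy to miss, so here is a small worked example. The distances are invented numbers, not values from the FIFA reports.

```python
from statistics import median

# Hypothetical (winner km, loser km) total distance for five matches.
pairs = [(110.0, 104.0), (102.0, 108.0), (115.0, 113.0), (99.0, 100.0), (107.0, 101.0)]

# Median of the per-match differences (own team minus opponent).
diffs = [own - opp for own, opp in pairs]
median_of_diffs = median(diffs)  # 2.0

# Difference of the medians, which is what the table does NOT show.
diff_of_medians = median([own for own, _ in pairs]) - median([opp for _, opp in pairs])  # 3.0

# Share of winners who actually covered more distance than their opponent.
share_outrunning = sum(d > 0 for d in diffs) / len(diffs)

print(median_of_diffs, diff_of_medians, share_outrunning)
```

Even in this toy data the two summaries disagree, and two of the five winners covered less distance than the team they beat, which is the pattern the paragraph above describes.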
Data Trends by Match Order (Men's Tournament)


I also broke the data down by match order and by win/loss. For ease of reading, only the men's tournaments are included. The most interesting item is the difference from the opponent in high-intensity running distance: in the first group match the team with the higher figure seems to win, in the second match the distribution does not look that way, and in the third match the team with the higher figure seems to win again. There is less data from the knockout stage onward, but the difference does not seem to matter there, except in the last match (final and third-place match), where the opposite holds. It looks as though there is a wall that cannot be overcome by physicality alone.
The Women's World Cup figure may show a similar trend.
Data Trends by Match Order (Women's Tournament)
Finally, let's combine the match classification from the first half with the physical data. To begin, here is a summary of the men's competitions, regardless of wins or losses.
Data trends by cluster ID (men's competition)
First, total distance covered seems to increase when possession is lopsided. It is easy to see that the low-speed figures in zones 1 and 2 tend to be larger because play stagnates. However, cluster ID=0, where a team can turn from a mid block into attack even when spells of possession are long, has slightly higher high-intensity values. Cluster IDs 2 and 9, in which possession is close, both have high high-intensity values too, but their total distances are quite different; I assume the shorter distance for ID=9 is due to more out-of-play time.


The chart above shows the difference between winners and losers. In terms of the win/loss gap in the difference from the opponent, ID=2 is probably where the numbers differ most. ID=2 covers matches in which possession is close and loose balls are few, so the outcome seems to be strongly influenced by physicality. Looking at the high-intensity data for ID=3, where one team dominates possession, running more than the opponent matters, but preventing the opponent from running appears to matter as well. Presumably these teams win the ball back or block the play before such a situation can arise. Among the possession-dominant types, what caught my eye in ID=3 and ID=7 was the distance in zones 2 and 3. High-intensity runs are hard to make because play tends to stagnate, but it is still possible to shift position at zone 2-3 speeds. Other FIFA data may shed more light on this area.
Distribution by cluster ID and by wins/losses (women's competition)
For the women's competition, let's look only at the distribution, since there is less data. In fact, the Women's World Cup shows a different trend. It has many ID=9 matches, and in them the team that covers less high-intensity running distance than its opponent tends to win. Conversely, in matches with lopsided possession, the high-intensity difference seems to have a strong bearing on who wins. The gap in strength mentioned earlier probably plays a part here as well.
For this article I wanted to compare the men's and women's results, so I clustered them together. The ID=9 group for the Women's World Cup probably contains a variety of match types, so it may be better to subdivide it further.
So we have looked at the relationship between match classification, physical data, and results. Note that the win/loss analysis relies only on the final score, so a match in which a team takes an early lead and holds on is treated the same as one in which a team wins it late. I would like to add time series and goal difference to the analysis, but that is difficult with the FIFA data currently available, so I will stop here. I hope this has conveyed how difficult the relationship between physical data and results is, and that the relationship between possession and winning is not constant. Possession rate has been around for a long time and is well established, but it is not that decisive a figure, so I don't think there is any need to get hung up on it.
Thank you.