Data Provision Service 

About Pseudo People Flow Data

Pseudo People Flow Data is human flow data that represents the typical daily activities of the entire national population. It is built using only statistical data published as open data and geospatial information that is available at low cost, such as building data. This data is not limited to the scope of person-trip surveys, is available to everyone, has stable accuracy, and is maintained on a nationwide scale.

This dataset contains the following:

  1. Population attribute distribution data: Based on the census, this data represents individual attributes (age, gender, occupation, household composition, etc.) and their distribution.
  2. Activity data: It shows what each individual does throughout the day and includes information such as activity start time, activity duration, activity purpose, and activity location. This data is differentiated based on statistics from multi-metropolitan area person-trip survey data.
  3. Trip data: Based on activity data, the system provides information about an individual’s trips throughout the day. This includes information such as individual ID, time of departure, departure location, destination location, mode of transportation, and purpose of travel.
  4. Trajectory data: Based on the trip data, it shows the travel trajectory of an individual person throughout the day. Attributes such as individual ID, time, longitude, latitude, mode of transportation, purpose of activity, and (relevant) link ID are included.
  5. Dynamic population distribution data and traffic volume data: Based on trajectory data, population distribution data at the 500 m mesh level is provided every 6 minutes, and link traffic volume data on the road network is provided every hour.
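
As a rough illustration of how item 5 can follow from item 4, the sketch below counts how many distinct individuals appear on each road link in each hour. It is not the method used to build the dataset. The function name hourly_link_volume, the CSV column order (individual ID, time, longitude, latitude, mode of transportation, purpose of activity, link ID), the header row, and the HH:MM:SS time format are all assumptions made for this example; consult the data specification for the real layout.

```shell
# Sketch only: hourly_link_volume is a hypothetical helper, and the
# column layout below is assumed, not taken from the specification.
#   $1 = individual ID, $2 = time (HH:MM:SS), $7 = link ID
hourly_link_volume() {
  awk -F',' '
    # Skip the header row and records that carry no link ID.
    NR > 1 && $7 != "" {
      hour = substr($2, 1, 2)          # hour of day, e.g. "08"
      key = $7 "," hour                # one bucket per link and hour
      # Count each individual at most once per link and hour.
      if (!seen[key "," $1]++) count[key]++
    }
    END { for (k in count) print k "," count[k] }
  ' "$1" | sort
}
```

Calling hourly_link_volume on a trajectory file prints one line per link and hour in the form link ID, hour, number of individuals.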

Features of the latest version (ver2.0)

The Pseudo People Flow Data ver. 2.0, provided in 2023, includes several significant improvements. Specifically, the underlying census data was updated from the 2015 Census to the 2020 Census, the accuracy of destination selection was improved by using POI attributes, and buses were added as a mode of transportation, with bus routes considered in route selection. A transportation choice model estimated from travel costs was also introduced to make transportation choices more realistic.

This dataset is the world’s first attempt to reproduce pseudo-human flows for 120 million people, covering 1,724 cities, towns, and villages nationwide, and has achieved a high correlation (0.81) with cell phone data. This provides an easy-to-use human mobility simulation tool that can be used in digital twin scenario analysis in various fields such as public health, urban planning, and location-based services.

For details of the data, please refer to the National Pseudo-Human Flow Data Specification ver 2.0.

Improvements in each version

How to decompress data

The “Pseudo-Human Flow Trajectory Data” is very large. When the data is provided via JoRAS, it is therefore split into multiple files to make downloading and management easier. The split files must be combined and decompressed before use. The instructions below explain how to do this on each operating system.

Windows

On Windows, use a tool such as 7-Zip or WinRAR to recombine and decompress the split files.

  1. Check the split files:
    • Make sure all the split files (e.g., trajectory_foldername.part00, trajectory_foldername.part01, etc.) are present.
  2. Combine and unzip with 7-Zip or WinRAR:
    • Right-click the first of the split files (e.g., trajectory_foldername.part00) and select “Extract here” in 7-Zip or WinRAR.
    • The tool then automatically combines and decompresses all the split files.

macOS

On macOS, use standard terminal commands to merge and extract the split files.

  1. Merge split files

Open a terminal and execute the following command to combine the split files:

cat trajectory_foldername.tar.gz.part* > combined_trajectory_foldername.tar.gz

  2. File unpacking

Extract the combined file with the following command:

tar -xzf combined_trajectory_foldername.tar.gz -C /path/to/extract/folder

Replace /path/to/extract/folder with the directory where the extracted data should be stored. The directory must already exist, because tar does not create it.
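
The two steps can also be wrapped in a small shell function that checks for the part files, creates the destination directory, and verifies the combined archive before extracting it. This is a minimal sketch for sh or bash, not an official tool: the function name merge_and_extract is hypothetical, and it assumes the part files are in the current directory and named as in the commands above.

```shell
# Sketch only: merge_and_extract is a hypothetical helper name.
#   $1 = archive name without the .partNN suffix, e.g. trajectory_foldername.tar.gz
#   $2 = directory to extract into
merge_and_extract() {
  prefix="$1"
  dest="$2"

  # Stop early if no part files match the prefix.
  ls "$prefix".part* >/dev/null 2>&1 || {
    echo "no part files found for $prefix" >&2
    return 1
  }

  # tar -C needs an existing directory, so create it first.
  mkdir -p "$dest" || return 1

  # The shell expands part* in sorted order, so the parts are joined in sequence.
  cat "$prefix".part* > "combined_$prefix" || return 1

  # gzip -t tests the compressed stream without writing any output.
  gzip -t "combined_$prefix" || {
    echo "combined archive is incomplete or corrupted" >&2
    return 1
  }

  tar -xzf "combined_$prefix" -C "$dest"
}
```

For example, merge_and_extract trajectory_foldername.tar.gz /path/to/extract/folder performs both steps, and the integrity check catches a missing or partially downloaded part before any files are extracted.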

Linux

On Linux, use the same terminal commands as on macOS.

  1. Merge split files

Combine the split files with the following command:

cat trajectory_foldername.tar.gz.part* > combined_trajectory_foldername.tar.gz

  2. File unpacking

Extract the combined file using the following command:

tar -xzf combined_trajectory_foldername.tar.gz -C /path/to/extract/folder
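
Because the combined file is as large as all the parts together, disk space can become a concern. One possible alternative, sketched below for sh or bash, pipes the parts straight into tar so that no combined file is written. The function name stream_extract is hypothetical, and the part files are assumed to be in the current directory and named as in the commands above.

```shell
# Sketch only: stream_extract is a hypothetical helper name.
#   $1 = archive name without the .partNN suffix, e.g. trajectory_foldername.tar.gz
#   $2 = directory to extract into
stream_extract() {
  prefix="$1"
  dest="$2"

  # tar -C needs an existing directory, so create it first.
  mkdir -p "$dest" || return 1

  # "-f -" makes tar read the archive from standard input,
  # so the concatenated parts never touch the disk as one file.
  cat "$prefix".part* | tar -xzf - -C "$dest"
}
```

The trade-off is that a failed extraction has to be restarted from the first part, whereas the two-step approach keeps the combined archive for reuse.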

Paper Information

More information on the construction of this dataset can be found in the following paper:

Takehiro Kashiyama, Yanbo Pang, Yuya Shibuya, Takahiro Yabe, Yoshihide Sekimoto. Nationwide synthetic human mobility dataset construction from limited travel surveys and open data. Computer-Aided Civil and Infrastructure Engineering. Available online 10 June 2024. https://doi.org/10.1111/mice.13285

Fig. 1: Image of Pseudo People Flow Data

(Top: National level link traffic visualization; Bottom (left): Mesh population distribution; Bottom (right): Link traffic)
