Using this Data Site
Welcome to the International Relations Data Site, created and maintained by Paul Hensel of the Department of Political Science at the University of North Texas. This site includes seven pages of links to on-line data resources for the serious international relations scholar, as well as the introduction page that you are currently reading. These pages are meant to include the most useful data sources on processes of international conflict and cooperation, as well as international economic, environmental, political, and social data and data on similar topics for the United States.
Data Site Index
The best way to use this data site is to go to the appropriate page for the type of data that you need, using the following index. It is worth noting that some of the resources included on one page may also include variables that are relevant to the other pages; for example, many of the economic development resources included in the economic data page include social and environmental variables as well as strictly economic variables. The bold headings in this index represent separate pages on this site; users may click on these headings to start at the top of the page in question, or if desired may click on the subheadings listed below to go directly to the most relevant portion of the page.
- International Data (General)
- International Conflict and Cooperation Data
- International Economic Data
- International Environmental Data
- International Political Data
- International Social Data
- ISA Compendium companion page (a related page of links associated with the SSIP section of ISA's Compendium of International Relations project, which has been updated more recently than some of these pages)
Once on the appropriate page, users may simply scroll up or down using the scroll bars on their Internet browser, in order to scan each data set on the page. Alternatively, you may use your browser's "Find on Page" command to search the page for specific data sets or specific variables (on most Macintosh browsers this is done with the Find command under the "Edit" menu, or with the keyboard equivalent (command-F); other browsers usually offer an equivalent capability). For example, a user desiring data from the Correlates of War project could search for the phrase "COW" or "correlates," and a user seeking data on a specific variable could search for the variable name (such as "GDP"). Users employing this search option would do well to try different variants of the desired variable name, though; many of the sources included in these data pages include dozens of different variables, so the specific variable names are left out in favor of more general names of variable groups (e.g., "economic wealth" instead of seven or eight specific measures using different GNP formulas or per capita calculations, or "water resources" instead of five different measures of water inflows, outflows, and consumption).
Information about Each Data Set
Beyond simply listing each data set and providing a link by which it can be obtained, these pages attempt to provide important information about each data set to help the user determine which data set(s) would be most useful. Three general types of information are provided for each data set:
(Note that not all of this information is currently available for each data set, particularly for the "data format" section that was first added in December of 1999. This will gradually be corrected as time allows.)
Spatial-Temporal Domain: This section describes the cases that are included in each data set. Some data sets attempt to cover the entire world from 1816-1992, for example, while others are limited to a specific subset of cases (e.g., South America or OECD members) or a more limited temporal subset (e.g., 1990-1995).
Variables Included: This section lists the variables, or groups of variables, that are included in each data set. Where possible, each individual variable is listed, although for some massive data compilations only the general groupings of variables are reported (in order to keep the page as usable as possible).
Data Format: Unfortunately (or perhaps fortunately?), data sources on the Web are available in a wide variety of formats. Some are available in specific word processing, spreadsheet, or database formats, some are pre-packaged as data sets usable in a specific statistical package, some are provided as ASCII text that can be read into any word processor, spreadsheet, or statistical package, and some are provided in Adobe Acrobat's PDF format for ease of printing. The following description should help clear up what each data format means and guide users in using data provided in each format.
- ASCII Text files (.txt, .asc): These files are simply saved as plain text, with no formatting or other file characteristics that are specific to any particular word processor, spreadsheet, database, or statistics package. They may be viewed with or read into any program, although users may experience difficulties in determining where one variable value ends and the next begins while trying to load the file into their computer program of choice.
- ASCII Delimited files (.csv, .tab): These files are saved as plain text, like standard ASCII text files, but a specific character (usually a comma, a tab, or a space) is placed between each variable in the file. This allows statistics packages, spreadsheets, or databases to recognize different variables and read them in correctly. This is definitely the most useful file type for widespread data distribution, as it can be read into any computer program and is easy to use because variables are delimited from each other; this is the format I use for distributing all of my own data (and the format that I wish everybody else would use).
- Word processing files (Microsoft Word, RTF, WordPerfect): These files are saved in a format used by a specific word processing program, such as Word or WordPerfect (although RTF is a common interchange format that can be read by just about any word processor). This is no problem for users of that particular program, as they can read in the file directly and save it in any other format that is convenient for them. For users who do not have access to the particular word processor in question, though, such files are much more difficult to use. Microsoft makes available free Office file viewers for Windows users, but other users may need to rely on file conversion packages such as DataViz' Mac Link Plus or Conversions Plus. There is also the risk that Word or other files may be saved in a newer version of the program than many users possess, in which case file conversion packages (or friends or colleagues with access to the program in question) are the only option.
- Spreadsheet files (Microsoft Excel): These files are saved in a format used by a specific spreadsheet program, such as Excel or Lotus. This is no problem for users of that particular program, as they can read in the file directly and save it in any other format that is convenient for them. For users who do not have access to the particular spreadsheet in question, though, such files are much more difficult to use. Microsoft makes available free Office file viewers for Windows users, and many spreadsheets can import Excel or Lotus files, but other users may need to rely on file conversion packages such as DataViz' Mac Link Plus or Conversions Plus or more specialized software such as Stat/Transfer. There is also the risk that Excel or other files may be saved in a newer version of the program than many users possess, in which case file conversion packages (or friends or colleagues with access to the program in question) are the only option.
- Database files (Microsoft Access, Paradox): These files are saved in a format used by a specific database program, such as Access or Paradox. This is no problem for users of that particular program, as they can read in the file directly and save it in any format that is convenient for them. For users who do not have access to the particular database in question, though, such files are much more difficult to use. Free readers are often not available, meaning that specialized software such as Stat/Transfer is usually needed to access such files.
- Statistical databases (SAS, SPSS, Stata): These files are saved in a format used by a specific statistical package, such as SAS, SPSS, or Stata (a fair number of data sets are also provided in time series package formats such as RATS). This is no problem for users of that particular program, as they can read in the file directly and save it in any format that is convenient for them. For users who do not have access to the particular statistical package in question, though, such files are much more difficult to use. Free readers are often not available, meaning that specialized software such as Stat/Transfer is usually needed to access such files.
- Adobe Acrobat files (.pdf): These files are created in Adobe's Acrobat software using the Portable Document Format (PDF). The goal behind this format is to create a file type that will always look and print the same on any computing platform and any type of printer. Users cannot usually cut and paste or extract text from PDF files, meaning that such files will not save much data entry work, but this format does allow the easy cross-platform distribution of information and it does guarantee that the information will appear correctly (users can always type in the data by hand after printing out the PDF file). Acrobat/PDF files can be viewed and printed using the freeware Adobe Acrobat Reader software for Macintosh, Windows, and many Unix/Linux variants.
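For users who would rather read the plain-text formats described above with a short script than with a spreadsheet, the following is a minimal Python sketch using only the standard library. It is an illustration rather than a recommended procedure: the function names (guess_delimiter, read_delimited, split_fixed_width), the delimiter-guessing rule, and the sample values are all invented for this example, and real data sets may need more careful handling (quoted fields, missing-value codes, and so on).

```python
import csv
import io


def guess_delimiter(header_line, candidates=(",", "\t", ";", " ")):
    """Pick the candidate character that occurs most often in the header line."""
    return max(candidates, key=header_line.count)


def read_delimited(text):
    """Parse ASCII delimited text into a list of rows keyed by variable name."""
    first_line = text.splitlines()[0]
    delimiter = guess_delimiter(first_line)
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return list(reader)


def split_fixed_width(line, widths):
    """Cut one line of undelimited ASCII text into fields of known widths."""
    fields, start = [], 0
    for width in widths:
        fields.append(line[start:start + width].strip())
        start += width
    return fields


# Made-up sample values, used only to show the calls.
sample = "country\tyear\tgdp\nA\t1990\t100\nB\t1990\t250\n"
rows = read_delimited(sample)
print(rows[1]["country"], rows[1]["gdp"])

# Column widths would normally come from the data set's codebook.
print(split_fixed_width("USA 1990  100", [4, 4, 5]))
```

The fixed-width helper shows why undelimited ASCII files are harder to use: the column widths must come from a codebook, whereas a delimited file carries its own variable boundaries.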
An additional complication involves data compression. Many of the data sets used in social science research are quite large, and would require excessive bandwidth (and excessive download times) if made available in their full size. As a result, many on-line data sets are made available in some compressed format, which may reduce their size by 90 percent or more (depending on the nature of the original data set and the compression format used). The following description should help clear up what each compression format means and guide users in extracting the needed data set from the compressed files.
- Zipped archives (.zip): These can be decompressed using software such as PKZip (for DOS, Windows, OS/2, and Unix), WinZip (Windows), ZipIt (Macintosh), or StuffIt Expander for Macintosh or Windows.
- StuffIt archives (.sit): These files are created using StuffIt or DropStuff, and represent the most common compression technique on the Mac platform. They may be decompressed using software such as StuffIt Expander for Macintosh or Windows.
- Binary or binhex archives (.bin, .hqx): These files used to be common on the Mac platform, and were typically used to ensure that files do not get corrupted while being sent or downloaded across the Internet. They may be decompressed using software such as StuffIt Expander for Macintosh or Windows, or WinZip (Windows).
- Other archives (.gz, .tar, .uu, .z, etc.): These files are created using a variety of other compression techniques on a variety of platforms. Many archives of these non-standard types may be decompressed using software such as StuffIt Expander for Macintosh or Windows, or WinZip (Windows).
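As an alternative to the desktop utilities listed above, several of these formats can also be unpacked with a short script. The following Python sketch uses only the standard library modules zipfile, gzip, and tarfile; the unpack function, its choice of format by file extension, and the sample file names are assumptions made for illustration. StuffIt (.sit) archives are not covered here and would need separate tools.

```python
import gzip
import io
import tarfile
import zipfile


def unpack(data, filename):
    """Return {member_name: contents} for a compressed file held in memory.

    The format is chosen from the file extension, which is an assumption;
    a mislabeled file will raise an error from the underlying module.
    """
    name = filename.lower()
    if name.endswith(".zip"):
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            return {
                member: archive.read(member)
                for member in archive.namelist()
                if not member.endswith("/")  # skip directory entries
            }
    if name.endswith((".tar", ".tar.gz", ".tgz")):
        # mode "r:*" lets tarfile detect gzip compression on its own
        with tarfile.open(fileobj=io.BytesIO(data), mode="r:*") as archive:
            return {
                member.name: archive.extractfile(member).read()
                for member in archive.getmembers()
                if member.isfile()
            }
    if name.endswith(".gz"):
        # A plain .gz file holds a single compressed file.
        return {filename[:-3]: gzip.decompress(data)}
    raise ValueError("unsupported archive type: " + filename)


# Build a small zip archive in memory so the example is self-contained.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("data.csv", "country,year\nA,1990\n")

files = unpack(buffer.getvalue(), "example.zip")
print(sorted(files))
```

In practice the bytes would come from a downloaded file (for example, by opening it in "rb" mode) rather than from an archive built in memory.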
Note about Citations to Internet Sources
It is important to remember that the data sets included on this page come from other scholars or data collectors, who should receive the appropriate credit for their work. In most cases, the web pages linked from this site provide proper citation information. In some cases, though, the links send users directly to downloadable data sets, with no information page to offer citations. In these cases the user (you) should cite the Internet source appropriately; my Citations and Plagiarism page discusses the need to cite and various citation styles.
Suggestions or Additions to This Page
Feel free to email me with any suggestions or additions for this page. This page can always benefit from new data sets that are not yet listed here, as well as from corrections or updates for data sets that are already included (since I unfortunately do not have the time to check every single one of these links on a regular basis to make sure that nothing has changed).
Please note, though, that I cannot guarantee a rapid response to requests to add data sets. I do not get paid to maintain this web site, and it probably won't earn any credit toward tenure, so teaching and research usually take priority at any given point in time. For the same reason, if I have not yet posted a link here, you should assume that I do not know of the resource in question. While it is true that I have bookmarked a number of links to be added to these data pages eventually, these links are not easily searchable or accessible until I actually add them to the pages. Rather than emailing me to ask if I know of a certain type of data on the Web, you are probably better off checking some of the data collections listed further down this page, trying some of the data sources on my other data pages that appear to be closest to what you are looking for, or trying Google or another search engine.
http://www.paulhensel.org/data.html
Last updated: 30 July 2018
This site © Copyright 1996-present,
Paul R. Hensel. All rights reserved.