Tag Archives: data

Running sums in Stata

Perhaps it is bad that I didn’t know this before, but the following code for Stata would have saved a week off of my dissertation work.  Suppose that you have data structured like so:

firm_id,date,amount

and you want to create a new variable that is the total amount as of each date for each firm.  In Stata, you simply type:

sort firm_id date
bysort firm_id: gen total_t = sum(amount)

Note the use of ‘gen‘ rather than ‘egen.’ The ‘sum’ command differs by the type of generate command (i.e. gen or egen), so about 500 lines of loops written in Stata code could be condensed in a few lines.  Stata needs to fix the ‘egen’ and ‘gen’ distinction or I need to port more of my projects to R.

Latex regression and summary tables in Stata

A clean, well organized latex table is difficult to build. If you do a lot of analysis in Stata, there are several tools to output latex tables of your regressions or summary statistics. These packages do not always work perfectly with the standard options. Below I present two example code snippets to produce a latex table of a set of regressions that includes a IV estimator and a summary statistics table that compares two groups in a database. Each uses the eststo package.

Regression Table with Multiple Equations and Stages

Here I run a couple of limited dependent variable models and a two-stage bivariate probit with an IV. The output isn’t perfect, but it works for pre-submission distribution.

The output looks like this:

Latex regression output from eststo

Summary statistics with a by variable

Next, consider summarizing the characteristics of two groups in your data. For example, I want to compare the age, number of boards seats and other features of venture capital spinoff founders to everyone else.

The output will look like this:

Found

The full details of the esttab, estpost and eststo have many more options and a lot of examples.

Stata, ODBC and date columns

Stata has a ton of flexilbility for creating and manipulating dates. However, if you want to save Stata data to an external database (e.g. Accesss, PostgreSQL, MySQL, etc.) the numeric date format in Stata will be difficult to interpret outside the program. My use case involved working with Stata to merge and clean some data that was pushed to a MySQL database with odbc and later loaded in R with its odbc functionality. It was in the last step that I learned of Stata’s dating conventions when writing to external databases. So I asked the Stata list.

A very helpful subsciber presented a solution very similar to one that I mocked up. With some of the code posted on the Statalist and some new additions, I present odbc2create. UPDATE: I fixed an issue when dealing with a database with no dates and had to add a loop.

This modified odbc command does the following:

  • searches all your variables for dates (they must be formatted as such or Stata’s ability to detect them is impossible)
  • converts those dates to the YYYY-MM-DD format
  • inserts the dates into your mysql database as strings
  • re-types those date columns in the newly created database as DATEs

The best part: when you load a table created this way back into Stata, it immediately recognized the DATEs as dates.  I hope they build this functionality into Stata in the future.  One caveat (which may explain why they haven’t built it internally) is that the ALTER command in the ado file is specific to MySQL.  Someone should generalize the code to recognize the datasource engine and modify the ALTER command accordingly.