I’m looking to automate the process of converting many .CSV files into .DTA files via Python. .DTA is the file type used by the Stata statistics package.
I have not been able to find a way to go about doing this, however.
R has write.dta (in the foreign package), which writes an R data frame to a .dta file, and Python can call R through RPy, but I can’t figure out how to use RPy to access write.dta.
You need rpy2 for Python and also the foreign package installed in R. You do that by starting R and typing install.packages("foreign"). You can then quit R and go back to Python.
import rpy2.robjects as robjects
robjects.r("require(foreign)")
robjects.r('x=read.csv("test.csv")')
robjects.r('write.dta(x,"test.dta")')
You can construct the string passed to robjects.r from Python variables if you want, something like:
robjects.r('x=read.csv("%s")' % fileName)
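Since the question is about converting many files, the same idea can be wrapped in a loop. The following is only a sketch: the helper names r_commands and convert_all are made up for illustration, it assumes the .dta file should sit next to each .csv with the same base name, and it assumes file names contain no double quotes (they are pasted into the R code unescaped).

```python
from pathlib import Path


def r_commands(csv_path):
    """Build the R statements that convert one CSV file to a .dta file.

    Hypothetical helper: the output path is the input path with the
    suffix swapped for .dta.
    """
    csv_path = Path(csv_path)
    dta_path = csv_path.with_suffix(".dta")
    # as_posix() yields forward slashes, which R accepts on every platform.
    return [
        'x <- read.csv("%s")' % csv_path.as_posix(),
        'write.dta(x, "%s")' % dta_path.as_posix(),
    ]


def convert_all(folder):
    """Convert every *.csv file in `folder` by running the commands in R."""
    # Imported here so r_commands() can be used without rpy2 or R installed.
    import rpy2.robjects as robjects

    robjects.r("library(foreign)")  # provides write.dta
    for csv_path in sorted(Path(folder).glob("*.csv")):
        for command in r_commands(csv_path):
            robjects.r(command)
```

Calling convert_all("some_folder") would then process each CSV file in that folder in turn; printing the result of r_commands for one file first is an easy way to check the generated R code before running it.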
(copypasting from my answer to a previous question)
pandas DataFrame objects now have a “to_stata” method. So you can do, for instance:
import pandas as pd
df = pd.read_stata('my_data_in.dta')
df.to_stata('my_data_out.dta')
DISCLAIMER: the first step is quite slow (in my test, around 1 minute to read a 51 MB .dta file – also see this question), and the second produces a file which can be much larger than the original one (in my test, the size goes from 51 MB to 111 MB). Spacedman’s answer may look less elegant, but it is probably more efficient.
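Since the question starts from CSV files rather than .dta files, the pandas route would use read_csv as the first step. A minimal sketch, assuming each CSV has a header row whose column names are already valid Stata variable names; csv_to_dta and convert_folder are hypothetical helper names, and write_index=False is my choice to keep the pandas row index out of the output:

```python
from pathlib import Path

import pandas as pd


def csv_to_dta(csv_path):
    """Read one CSV file and write it next to the original as a .dta file."""
    csv_path = Path(csv_path)
    dta_path = csv_path.with_suffix(".dta")
    df = pd.read_csv(csv_path)
    # write_index=False avoids adding the row index as an extra Stata variable.
    df.to_stata(dta_path, write_index=False)
    return dta_path


def convert_folder(folder):
    """Convert every *.csv file in `folder`; returns the paths written."""
    return [csv_to_dta(p) for p in sorted(Path(folder).glob("*.csv"))]
```

The timing and file-size figures above were measured on a .dta round trip, so they may not carry over to this CSV case.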