
[Practical notes] Python and STATA time/date conversion problems

Source: CSDN | 2023-04-06 10:00:01

Common Problems and Solutions

Table of contents:

1. Working with time variables
2. Rebuilding and setting indexes
3. Sorting data
4. Converting data types
5. Exception handling: skipping exceptions and continuing
6. Batch conversion between CSV and Excel
7. File operations
8. Reading and writing data
9. Missing values: np.nan, None, NaT
10. Producing regression tables with Python
11. Reshaping data with special structures
12. Joint tests of estimated coefficients (Wald tests via statsmodels)
13. Python and STATA time/date conversion problems
14. Hausman tests for panel data in Python
15. Fama-MacBeth regressions, rolling regressions, and more
16. Event studies




Python, one of today's mainstream programming languages, has enthusiasts all over the world. If you are new to research, your advisor may well ask you to use it for some tasks. Getting started is indeed easy: just pick one of the many textbooks on the market. Using the language flexibly is quite another matter, though, and the flexibility of Python syntax often drives humanities students to despair. The notes below collect the points that students most often ask about in my courses. They are not exhaustive and are still being updated.

1. Working with time variables

The main time-related modules are time, datetime, and calendar. Python represents time in several ways:

- Timestamp: a 10-digit integer part plus some decimal places, e.g. 1551153156.6358607
- Tuple (struct_time): a tuple of 9 elements, e.g. (tm_year=2011, tm_mon=9, tm_mday=28, tm_hour=10, tm_min=0, tm_sec=0, tm_wday=2, tm_yday=271, tm_isdst=-1)
- Formatted string: a formatted time string, e.g. '2019-02-26 12:45:46'

The time module converts between timestamps and formatted strings with the struct_time tuple at its core; the datetime module does the same with datetime class instances at its core.

import time

# 1. timestamp -> struct_time tuple
time1 = time.time()              # the current timestamp
tuple = time.gmtime(time1)       # UTC time, struct_time
print(tuple)
# time.struct_time(tm_year=2021, tm_mon=11, tm_mday=3, tm_hour=16, tm_min=9, tm_sec=20, tm_wday=2, tm_yday=307, tm_isdst=0)
tuple1 = time.localtime(time1)   # UTC+8 time, struct_time
print(tuple1)
# time.struct_time(tm_year=2021, tm_mon=11, tm_mday=4, tm_hour=0, tm_min=9, tm_sec=20, tm_wday=3, tm_yday=308, tm_isdst=0)

# 2. struct_time tuple -> timestamp
tuple2 = time.localtime()
time2 = time.mktime(tuple2)
print(time2)
# 1635955760.0

# 3. struct_time tuple -> string
tuple = time.localtime()
strTime = time.strftime("%Y-%m-%d %H:%M:%S", tuple)
print(strTime)
# 2021-11-04 00:09:20
strTime1 = time.strftime("%Y{y}%m{m}%d{d} %H{h}%M{m1}%S{s}", tuple).format(
    y="年", m="月", d="日", h="時", m1="分", s="秒")
print(strTime1)
# 2021年11月04日 00時11分49秒

# 4. string -> struct_time tuple
tupleTime = time.strptime("2018年01月05日", "%Y{y}%m{m}%d{d}".format(y="年", m="月", d="日"))
print(tupleTime)
# time.struct_time(tm_year=2018, tm_mon=1, tm_mday=5, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=4, tm_yday=5, tm_isdst=-1)

(Two reference tables followed here: the fields of the time.struct_time tuple and the strftime formatting codes.)

References: "Timestamps in Python"; "Time tuples and dates in Python"; "Python3 dates and times"; "Python date and time (in detail)".

In practice, data usually live in a DataFrame, and the date variable may exist either as the index or as an ordinary column, in formats such as 2021-11-10, 20211110, 2021/11/10, or 2021-11-10 00:00:00.

First, at the IPython prompt In [1]:, type df.dtypes (assuming the data object is df) or df.info() to see how many values each variable has and what type it is.

If date is a string it shows as object; if numeric, as int64 or float64; if a true date, as datetime64[ns].
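For instance, with a numeric date column the check and the conversion look like this (a toy frame made up for illustration):

import pandas as pd

df = pd.DataFrame({"date": [20211110, 20211111], "x": [1.5, 2.0]})
print(df.dtypes)   # date: int64, x: float64
df["date2"] = pd.to_datetime(df["date"].astype(str), format="%Y%m%d")
print(df.dtypes)   # date2: datetime64[ns]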

a. If date is the index (typical for data pulled through an API, in which case it is usually already datetime64[ns]), extract year, month, and day directly from the index:

df["year"] = df.index.year
df["month"] = df.index.month
df["day"] = df.index.day

b. If date is a numeric variable (e.g. 20101203), shown as int64 or float64:

df["year"] = df["date"].apply(lambda x: int(str(x)[:4]))   # numeric year
df["year2"] = df["date"].apply(lambda x: str(x)[:4])       # string year
df["month"] = df["date"].apply(lambda x: int(str(x)[4:6]))
df["day"] = df["date"].apply(lambda x: int(str(x)[6:]))

c. If date is a string variable (e.g. "20101203"), shown as object.

Method 1:

import pandas as pd
df["date2"] = pd.to_datetime(df["date"])           # string -> datetime64[ns]
df["year"] = pd.to_datetime(df["date"]).dt.year    # numeric year
df["month"] = pd.to_datetime(df["date"]).dt.month
df["day"] = pd.to_datetime(df["date"]).dt.day

Method 2:

df["year"] = df["date"].apply(lambda x: int(x[:4]))   # numeric year
df["month"] = df["date"].apply(lambda x: int(x[4:6]))
df["day"] = df["date"].apply(lambda x: int(x[6:]))

Method 3: use astype:

df["date"] = df["date"].astype("datetime64[ns]")

Method 4: use datetime (the format string must match how the text is stored; "%Y-%m-%d" fits "2010-12-03", while "20101203" would need "%Y%m%d"):

from datetime import datetime
df["date"] = df["date"].apply(lambda x: datetime.strptime(x, "%Y-%m-%d"))

To convert the dates to months (M) or quarters (Q):

df["period"] = df["date"].dt.to_period("M")

d. If date is already a date, shown as datetime64[ns]:

df["year"] = pd.to_datetime(df["date"]).dt.year    # numeric year
df["month"] = pd.to_datetime(df["date"]).dt.month
df["day"] = pd.to_datetime(df["date"]).dt.day

e. If date is an Excel date that Python read in as a text number (a 5-digit serial value), it needs converting back to a real date.

If instead the value is a long Unix timestamp:

df["gtime"] = pd.to_datetime(df["gtime"], unit="s")   # counted from the default epoch, 1970-01-01

or:

df["time_stamp"] = pd.to_datetime(df["time_stamp"], unit="s", origin=pd.Timestamp("2018-07-01"))   # interpret time_stamp as seconds counted from 2018-07-01, i.e. add that many seconds to 2018-07-01

The example below walks through time data read from Excel (the dates arrive as what look like 5-digit floats, which is rather confusing the first time you see it):

# The column fin46 was a date in Excel but arrives in Python as numbers, with many NaN values.
# The conversion function below cannot handle NaN, so first fill NaN with 0.
df_final["fin462"] = df_final["fin46"].fillna(0)

# conversion function: Excel serial dates count days from 1899-12-30
def date_change(stamp):
    delta = pd.Timedelta(str(stamp) + "D")
    real_time = pd.to_datetime("1899-12-30") + delta
    return real_time

# apply the function; yields datetime64 values such as 2021-11-16 00:00:00
df_final["fin463"] = df_final["fin462"].apply(date_change)

# derive month and quarter periods (do this while fin463 is still datetime64)
df_final["period"] = df_final["fin463"].dt.to_period("M")
df_final["period2"] = df_final["fin463"].dt.to_period("Q")

# strip the time-of-day part
df_final["fin463"] = df_final["fin463"].dt.date
# or, for a single value: df_final["fin463"][0].strftime("%Y-%m-%d")

# set the rows that were originally missing back to missing
df_final.loc[df_final["fin462"] == 0, "fin463"] = None

# string -> datetime64[ns]
df_final["date2"] = pd.to_datetime(df_final["fin463"])

# drop the intermediate columns, which have served their purpose
del df_final["fin462"]
del df_final["fin463"]

# create three columns: year, month, day
df_final["year"] = pd.to_datetime(df_final["date2"]).dt.year
df_final["month"] = pd.to_datetime(df_final["date2"]).dt.month
df_final["day"] = pd.to_datetime(df_final["date2"]).dt.day

Reference articles:
https://blog.csdn.net/weixin_41261833/article/details/104839119
https://vimsky.com/examples/usage/python-pandas-period-strftime.html
https://vimsky.com/zh-tw/examples/usage/python-pandas-datetimeindex-strftime.html
https://www.cnblogs.com/shadow1/p/10951979.html
https://www.cnblogs.com/nxf-rabbit75/p/10660317.html
https://blog.csdn.net/u010591976/article/details/104253489?ivk_sa=1024320u

An example from one of the referenced articles:

a = "44042"
import pandas as pd

def date(stamp):
    delta = pd.Timedelta(str(stamp) + "D")
    real_time = pd.to_datetime("1899-12-30") + delta
    return real_time

print(str(date(a)).split(" ")[0])

Original link: https://blog.csdn.net/qq_57173722/article/details/121030038

io = "中國數據.xlsx"
dt = pd.read_excel(io, sheet_name=0)
dt1 = pd.read_excel(io, sheet_name=1)

def date(para):
    if type(para) == int:
        delta = pd.Timedelta(str(int(para)) + "days")
        time = pd.to_datetime("1899-12-30") + delta
        return time
    else:
        return para

dt["日期"] = dt["日期"].apply(date)
dt

Another case of the same kind: pandas reads an Excel file and the dates come out as numbers. A pandas-only fix:

import pandas as pd

data = pd.read_excel("文件路徑")   # path to the file
data["發貨日期"] = data["發貨日期"].fillna(method="ffill")   # forward-fill because of merged cells

def date(para):
    delta = pd.Timedelta(str(int(para)) + "days")
    time = pd.to_datetime("1899-12-30") + delta
    return time

data["發貨日期"] = data["發貨日期"].apply(date)
data

Computing the interval in days between two dates:

import time

def demo(day1, day2):
    time_array1 = time.strptime(day1, "%Y-%m-%d")
    timestamp_day1 = int(time.mktime(time_array1))
    time_array2 = time.strptime(day2, "%Y-%m-%d")
    timestamp_day2 = int(time.mktime(time_array2))
    result = (timestamp_day2 - timestamp_day1) // 60 // 60 // 24
    return result

day1 = "2018-07-09"
day2 = "2020-09-26"
day_diff = demo(day1, day2)
print("Days between the two dates: {}".format(day_diff))

# or, counting the days by hand:
def is_leap_year(year):
    if (year % 4 == 0 and year % 100 != 0) or year % 400 == 0:
        return 1
    else:
        return 0

def get_days(year, month, day):
    days = 0
    month_days = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
    if is_leap_year(year):
        month_days[1] = 29
    for i in range(1, year):
        year_day = 365
        if is_leap_year(i):
            year_day = 366
        days += year_day
    for m in range(month - 1):
        days += month_days[m]
    days += day
    return days

def get_result(start_time, end_time):
    res = end_time - start_time
    return res

year1, month1, day1 = 2018, 7, 9
year2, month2, day2 = 2020, 9, 26
days1 = get_days(year1, month1, day1)
days2 = get_days(year2, month2, day2)
day_diff = get_result(days1, days2)
print("Days between the two dates: {}".format(day_diff))

2. Rebuilding and setting indexes

Keyword: reset_index

reset_index restores the default integer index.

df.reset_index(level=None, drop=False, inplace=False, col_level=0, col_fill="")

Most commonly: df.reset_index(drop=False, inplace=True)

a. If the DataFrame has a date both as its index and as a column, rebuild the index with df.reset_index(drop=True, inplace=True): the old index is discarded (avoiding a duplicate-column clash), leaving the existing date column as an ordinary variable, ready for building a new index.

b. If the index does not clash with any column:

df.reset_index(inplace=True)             # the index becomes an ordinary column
df.reset_index(drop=True, inplace=True)  # the old index is simply deleted

In general, it is best not to delete the old index when rebuilding; you can always remove it later with df.drop("date", axis=1, inplace=True) or del df["date"].
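A compact demonstration of the two roles date can play (toy data):

import pandas as pd

df = pd.DataFrame({"date": pd.date_range("2021-11-01", periods=3), "x": [1, 2, 3]})
df.set_index("date", inplace=True)   # date now plays the index role
df.reset_index(inplace=True)         # date returns to being an ordinary column
# if a "date" column already existed alongside a "date" index, reset_index()
# would raise a duplicate-column error; drop=True discards the index instead:
# df.reset_index(drop=True, inplace=True)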

Keyword: set_index

With set_index, a DataFrame can take a single column, or several, as its index.

DataFrame.set_index(keys, drop=True, append=False, inplace=False, verify_integrity=False)

Most commonly: df.set_index("code", drop=False, inplace=True)
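set_index also accepts a list of columns, which builds a composite (multi-level) index; a minimal panel-style sketch with made-up codes:

import pandas as pd

df = pd.DataFrame({"code": ["000001", "000001", "000002"],
                   "date": [2020, 2021, 2020],
                   "ret": [0.01, 0.02, 0.03]})
df.set_index(["code", "date"], inplace=True)   # MultiIndex of (code, date)
print(df.loc[("000001", 2021)])                # select one entity-period row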

3. Sorting data

To sort a pandas.DataFrame or pandas.Series, use the sort_values() and sort_index() methods.

a. Sort by values with sort_values() (pass inplace=True to modify the original object):

df_s = df.sort_values("state")                                    # sort by one column
df_s = df.sort_values("state", ascending=False)                   # one column, descending
df_s = df.sort_values(["age", "state"], ascending=[True, False])  # several columns, each order configurable
df_d.sort_values(by=1, axis=1, ascending=False, inplace=True)     # sort columns by the values in a row

b. Sort by index (row/column labels) with sort_index():

As with sort_values(), ascending order is the default; set ascending=False for descending order: df_s = df.sort_index(ascending=False)

Sorting by column names (the axis parameter) is rarely useful: df.sort_index(axis=1, ascending=False, inplace=True)

c. Selecting from a multi-level index

loc indexes by label, iloc by position. iloc is unaffected by the multi-level structure: it selects purely by the positional index.

s.loc["張三"]
s.loc["張三", "期中"]
s.loc[:, "期中"]
s.iloc[0]
df.loc["張三"].loc["期中"]
df.loc[("張三", "期中")]

Sorting a multi-level index: sort a DataFrame by its row index with sort_index(); the level parameter picks the level to sort by (0 for the first level, 1 for the second), e.g. df.sort_index(level=0, ascending=False).
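A runnable version of the snippets above, assuming a small grades table (the names and exam labels mirror the examples):

import pandas as pd

idx = pd.MultiIndex.from_product([["張三", "李四"], ["期中", "期末"]])
s = pd.Series([90, 85, 78, 88], index=idx)
print(s.loc["張三"])          # label-based: the outer level
print(s.loc["張三", "期中"])  # both levels
print(s.iloc[0])              # position-based: ignores the levels entirely
print(s.sort_index(level=0, ascending=False))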

4. Converting data types

In day-to-day data work the common containers are numpy arrays, lists, tuples, dicts, pandas Series, and pandas DataFrames. When reusing someone else's code, a frequent source of errors is feeding it data of a different type than its input expects: different objects have very different attributes and methods, so of course the program fails. Understand the code first, then convert your own data to the type it requires.
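A quick way to see what type an object actually is before handing it to borrowed code:

import numpy as np
import pandas as pd

obj = pd.DataFrame({"a": [1, 2]})
print(type(obj))                       # <class 'pandas.core.frame.DataFrame'>
print(isinstance(obj, pd.DataFrame))   # True
print(type(obj["a"]))                  # a single column is a Series
print(type(obj.values))                # .values gives a numpy.ndarray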

a. DataFrame to list

import pandas as pd
import numpy as np

data_file = "d:/Users/qiguan/Downloads/data.xlsx"   # path to the data file
data = pd.read_excel(data_file, index_col=None)     # read the file
data_array = np.array(data)                         # first convert to a numpy array
data_list = data_array.tolist()                     # then convert to a list

b. Converting a DataFrame to a Series, and changing the type of a column's values

b1. Use pd.Series to turn DataFrame columns into a Series:

import pandas as pd
df_series = pd.Series(df["Value"].values, index=df["Date"])

b2. Use astype to change the type of a column's values (note the np prefix):

import numpy as np
df["colname"] = df["colname"].astype(np.int64)

c. DataFrame to numpy array:

import numpy as np
import pandas as pd
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]})

c1. Use the DataFrame's values attribute:

np1 = df.values   # type(np1) is numpy.ndarray

c2. Use numpy's array function:

np2 = np.array(df)   # type(np2) is numpy.ndarray

The two methods give the same result; both convert a DataFrame to an array.

5. Exception handling: skipping exceptions and continuing

Python normally handles exceptions with try...except..., which lets a program keep running when something unexpected happens. The drawback is that some data are silently lost, along with the chance to correct them; to fix the problem, the program has to report what the error actually was.

Often a single exception stops the whole program, and exceptional situations are too varied to anticipate. So how do you skip an exception inside a loop and keep going? For instance, when moving files, if a target file does not exist the program should skip it and continue. One workable approach:

import pandas as pd

dates = range(20161010, 20161114)
pieces = []
for date in dates:
    try:
        data = pd.read_csv("A_stock/overview-push-%d/stock overview.csv" % date, encoding="gbk")
        pieces.append(data)
    except Exception as e:
        pass  # an exception occurred: skip the rest of this iteration and move on
    continue  # runs whether or not there was an exception; with no statements after it,
              # this pass/continue structure could use continue in place of pass --
              # not so if there were code after the continue
data = pd.concat(pieces)

The following code is somewhat more complete:

try:
    # x = 1/0   # this error would be caught by "except Exception as e"
    print("========>1")
    name
    print("==========>2")
    l = [1, 2, 3]
    # l[100]
    print("==========>3")
    d = {}
    d["name"]
    print("=============>4")
except NameError as e:
    print("----------->", e)
    print(1)
except IndexError as e:
    print("----------->", e)
    print(2)
except KeyError as e:
    print("----------->", e)
    print(3)
except Exception as e:
    # the exception matches none of the types above but is still an exception
    print("one unified handler", 333)
else:
    # no exception occurred at all
    print("runs when the monitored block raises nothing")
finally:
    # runs regardless of what happened above
    print("runs whether or not the monitored block raised")
print("carry on with the rest of the code")

Combining try/except with pass and continue:

(1) To skip one iteration of a for loop: when an iteration raises an error, the except block runs and continue skips to the next iteration:

x = 5
for i in range(x):
    try:
        i += 1
        print(i)
    except:
        pass  # pass could be replaced by continue here
    print("kk:", i)

(2) It can also be written like this: on error the except block runs and pass ignores the error; the slight difference is that execution reaches the continue line whether or not there was an error:

x = 5
for i in range(x):
    try:
        i += 1
        print(i)
    except:
        pass
    continue

With statements after the continue, the situation changes:

x = 5
for i in range(x):
    try:
        i += 1
        print(i)
    except:
        pass
    continue
    print("kk:", i)   # unreachable: continue always skips it

Compare:

x = 5
for i in range(x):
    try:
        i += 1
        print(i)
    except:
        continue   # on error, skip the print below
    print("kk:", i)

(3) Another usage aborts the whole for loop at the first error:

try:
    for i in range(x):
        i += 1
        print(i)
except:
    pass

pass and continue combined with conditionals:

var = 10
while var > 0:
    var = var - 1
    if var == 5:
        pass
    print("current value:", var)
print("Good bye!")

var = 10   # second example
while var > 0:
    var = var - 1
    if var == 5:
        continue   # skips the print when var == 5
    print("current value:", var)
print("Good bye!")

In short, continue still ends the current iteration and pass still does nothing; but when they are combined with conditionals and loops, look carefully at the structure of the program, because different placements behave differently.

6. Batch conversion between CSV and Excel

In everyday work we deal with both CSV and Excel files, but the two do not convert directly: CSV is a plain-text, comma-separated format that can hold any text data but no formatting, formulas, or macros, so converting means manually saving each file in the other format. That is fine for a few files, but with many files the repeated Save As becomes tedious, and edits are easily lost when a file is not saved in time.

6.1 Excel to CSV: (1) find the Excel files in the directory; (2) read each file's contents; (3) write it out as CSV.

import os
import pandas as pd

# convert Excel files to CSV files
def excel_to_csv(in_path, out_path):
    # all Excel files in the input directory
    file_list = [file for file in os.listdir(in_path)
                 if os.path.splitext(file)[1][1:] in ("xlsx", "xls")]
    # loop over the files
    for file in file_list:
        # read the data
        df = pd.read_excel(os.path.join(in_path, file), index_col=0)
        # write as CSV, without the index
        df.to_csv(os.path.join(out_path, os.path.splitext(file)[0] + ".csv"), index=False)
    # report completion
    print("conversion finished")

6.2 CSV to Excel: (1) find the CSV files in the directory; (2) read each file's contents; (3) write it out as Excel.

import os
import pandas as pd

# convert CSV files to Excel files
def csv_to_excel(in_path, out_path):
    # all CSV files in the input directory
    file_list = [file for file in os.listdir(in_path)
                 if os.path.splitext(file)[1][1:] == "csv"]
    # loop over the files
    for file in file_list:
        # read the CSV data
        df = pd.read_csv(os.path.join(in_path, file), index_col=0)
        # write as Excel, without the index
        df.to_excel(os.path.join(out_path, os.path.splitext(file)[0] + ".xlsx"), index=False)
    # report completion
    print("conversion finished")
# run it yourself and inspect the results

7. File operations

References: "Python file operations, this article is all you need"; "Reading, writing and creating files in Python"; "Python file operations, the most complete one (repost)"; "Python file operations".
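Until those are summarized here, a few of the most common idioms (the file and folder names are hypothetical):

import os

# list the files in a directory, keeping one extension
files = [f for f in os.listdir(".") if f.endswith(".csv")]

# write and read a text file; "with" closes the handle automatically
with open("notes.txt", "w", encoding="utf-8") as f:
    f.write("hello\n")
with open("notes.txt", "r", encoding="utf-8") as f:
    content = f.read()

os.makedirs("backup", exist_ok=True)   # create a folder if it does not exist
print(os.path.exists("notes.txt"))     # check that a path exists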

8. Reading and writing data

References: a roundup of third-party libraries for reading and writing Excel files.
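For reference, the everyday pandas entry points (file names hypothetical; read_excel/to_excel need an engine such as openpyxl installed):

import pandas as pd

df = pd.read_excel("data.xlsx", sheet_name=0)              # first sheet
df.to_csv("data.csv", index=False, encoding="utf-8-sig")   # utf-8-sig keeps Excel happy with Chinese text
df2 = pd.read_csv("data.csv", encoding="utf-8-sig")
df2.to_excel("data_out.xlsx", index=False)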

9. Missing values: np.nan, None, NaT

References: Reference 1; Reference 2; https://blog.csdn.net/sinat_26811377/article/details/103216680 https://blog.csdn.net/laicikankna/article/details/106881864
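The three missing markers are built differently but detected uniformly; a small sketch:

import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, None])                    # None is coerced to NaN in a float Series
t = pd.to_datetime(pd.Series(["2021-11-10", None]))   # a missing datetime becomes NaT
print(s.isna())          # isna() detects NaN, None and NaT uniformly
print(np.nan == np.nan)  # False: never test missing values with ==
s_filled = s.fillna(0)   # replace missing values
s_dropped = s.dropna()   # or drop them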

10. Producing regression tables with Python

https://github.com/divinites/stargazer-1 (the Stargazer module)
https://lost-stats.github.io/Presentation/Tables/Regression_Tables.html
https://qiita.com/huda_1629/items/d47d023936003ccf3fd1
https://cloud.tencent.com/developer/news/273282 (producing tidy regression output with Python; sadly the code shared via Baidu netdisk is gone)
https://cloud.tencent.com/developer/article/1739458 (worth a look)
https://github.com/YangShuangjie/summary3 (used to run, but its dependencies have since been upgraded and it no longer works; here's hoping some expert updates it)

from stargazer.stargazer import Stargazer
model1_stargazer = Stargazer([model1])
model1_stargazer

import os
os.chdir("d:\\")   # directory for the HTML output

import pandas as pd
from sklearn import datasets
import statsmodels.api as sm
from stargazer.stargazer import Stargazer
from IPython.core.display import HTML

diabetes = datasets.load_diabetes()
df = pd.DataFrame(diabetes.data)
df.columns = ["Age", "Sex", "BMI", "ABP", "S1", "S2", "S3", "S4", "S5", "S6"]
df["target"] = diabetes.target

est = sm.OLS(endog=df["target"], exog=sm.add_constant(df[df.columns[0:4]])).fit()
est2 = sm.OLS(endog=df["target"], exog=sm.add_constant(df[df.columns[0:6]])).fit()
est3 = sm.OLS(endog=df["target"], exog=sm.add_constant(df[df.columns[[2, 4]]])).fit()
est4 = sm.OLS(endog=df["target"], exog=sm.add_constant(df[df.columns[3:6]])).fit()

stargazer = Stargazer([est, est2, est3, est4])
HTML(stargazer.render_html())   # display the table in the notebook
# stargazer.render_html()       # show the raw HTML instead

# or save the result and polish it in Excel:
stargazer_tab = Stargazer([est, est2, est3, est4])
stargazer_tab
open("regression.html", "w").write(stargazer_tab.render_html())   # for html

"

\n

Dependent variable:target

(1)(2)(3)(4)

ABP416.674***397.583***670.157***

(69.495)(70.870)(71.217)

Age37.24124.704

(64.117)(65.411)

BMI787.179***789.742***921.169*

(65.424)(66.887)(64.409)

S1197.852113.167202.038

(143.812)(64.409)(158.070)

S2-169.251-23.728

(142.744)(156.064)

Sex-106.578-82.862

(62.125)(64.851)

const152.133***152.133***152.133***152.133***

(2.853)(2.853)(2.967)(3.277)

Observations442442442442

R20.4000.4030.3490.207

Adjusted R20.3950.3950.3460.201

Residual Std. Error59.976 (df=437)59.982 (df=435)62.367 (df=439)68.901 (df=438)

F Statistic72.913***(df=4; 437)48.915***(df=6; 435)117.417***(df=2; 439)38.032(df=3; 438)

Note:\n p<0.1;\n p<0.05;\n ***p<0.01\n

The code above runs fine in a notebook; the small glitches in the output as pasted into CSDN are an artifact of CSDN's rendering, not of the code.

https://github.com/YangShuangjie/summary3

# Load the data and fit
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# ols
dat = sm.datasets.get_rdataset("Guerry", "HistData").data
res_ols = smf.ols("Lottery ~ Literacy + np.log(Pop1831)", data=dat).fit()

# glm
data = sm.datasets.scotland.load()
data.exog = sm.add_constant(data.exog)
gamma_model = sm.GLM(data.endog, data.exog, family=sm.families.Gamma())
res_glm = gamma_model.fit()

# gee
data = sm.datasets.get_rdataset("epil", package="MASS").data
fam = sm.families.Poisson()
ind = sm.cov_struct.Exchangeable()
mod = smf.gee("y ~ age + trt + base", "subject", data, cov_struct=ind, family=fam)
res_gee = mod.fit()

# logit
spector_data = sm.datasets.spector.load()
spector_data.exog = sm.add_constant(spector_data.exog)
logit_mod = sm.Logit(spector_data.endog, spector_data.exog)
res_logit = logit_mod.fit()

# load panel data and fit the models
from linearmodels.datasets import wage_panel
data = wage_panel.load()
year = pd.Categorical(data.year)
data = data.set_index(["nr", "year"])
data["year"] = year

from linearmodels.panel import PooledOLS
exog_vars = ["black", "hisp", "exper", "expersq", "married", "educ", "union", "year"]
exog = sm.add_constant(data[exog_vars])
mod = PooledOLS(data.lwage, exog)
res_pooled = mod.fit()

from linearmodels.panel import PanelOLS
exog_vars = ["expersq", "union", "married"]
exog = sm.add_constant(data[exog_vars])
mod = PanelOLS(data.lwage, exog, entity_effects=True, time_effects=True)
res_fe_re = mod.fit()

from linearmodels.panel import FirstDifferenceOLS
exog_vars = ["exper", "expersq", "union", "married"]
exog = data[exog_vars]
mod = FirstDifferenceOLS(data.lwage, exog)
res_fd = mod.fit()

exog_vars = ["black", "hisp", "exper", "expersq", "married", "educ", "union"]
exog = sm.add_constant(data[exog_vars])
mod = PooledOLS(data.lwage, exog)
res_robust = mod.fit(cov_type="robust")
res_clust_entity = mod.fit(cov_type="clustered", cluster_entity=True)
res_clust_entity_time = mod.fit(cov_type="clustered", cluster_entity=True, cluster_time=True)

# Looking at summary3's source, the author added to_excel, to_csv and similar
# helpers. The package no longer works today.
from summary3 import summary_col

summary_col([res_ols])
print(summary_col([res_ols, res_glm, res_gee, res_logit], more_info=["df_model", "scale"]))
print(summary_col([res_fe_re, res_fd, res_robust, res_clust_entity, res_clust_entity_time],
                  regressor_order=["black"], show="se", title="Panel Results Summary Table"))
summary_col([res_glm, res_logit]).to_excel()
summary_col([res_clust_entity, res_fd]).to_csv("your path\\filename.csv")

# -*- coding: utf-8 -*-
"""
Created on Sun Dec  5 22:10:44 2021
@author: Administrator
"""
import pandas as pd
import numpy as np
from statsmodels.datasets import grunfeld

data = grunfeld.load_pandas().data
data.year = data.year.astype(np.int64)
# MultiIndex: entity - time
data = data.set_index(["firm", "year"])

from linearmodels import PanelOLS
mod = PanelOLS(data.invest, data[["value", "capital"]])
fe_res_VS = mod.fit(cov_type="clustered", cluster_entity=True)

pd.set_option("precision", 4)
pd.options.display.float_format = "{:,.4f}".format

Reg_Output_FAmount = pd.DataFrame()

# 1) first specification: coefficients, p-values, significance stars, standard errors
Table1 = pd.DataFrame(fe_res_VS.params)
Table1["id"] = np.arange(len(Table1))               # create a numerical index
Table1 = Table1.reset_index().set_index(keys="id")  # set the numerical index as the new index
Table1 = Table1.rename(columns={"index": "parameter", "parameter": "coefficient 1"})

P1 = pd.DataFrame(fe_res_VS.pvalues)
P1["id"] = np.arange(len(P1))
P1 = P1.reset_index().set_index(keys="id")
P1 = P1.rename(columns={"index": "parameter"})
Table1 = pd.merge(Table1, P1, on="parameter")
Table1["significance 1"] = np.where(Table1["pvalue"] <= 0.01, "***",
                            np.where(Table1["pvalue"] <= 0.05, "**",
                            np.where(Table1["pvalue"] <= 0.1, "*", "")))
Table1.rename(columns={"pvalue": "pvalue 1"}, inplace=True)

SE1 = pd.DataFrame(fe_res_VS.std_errors)
SE1["id"] = np.arange(len(SE1))
SE1 = SE1.reset_index().set_index(keys="id")
SE1 = SE1.rename(columns={"index": "parameter", "std_error": "coefficient 1"})
SE1["parameter"] = SE1["parameter"].astype(str) + "_SE"
SE1["significance 1"] = ""
SE1 = SE1.round(4)
SE1["coefficient 1"] = "(" + SE1["coefficient 1"].astype(str) + ")"
Table1 = Table1.append(SE1)   # note: DataFrame.append was removed in pandas 2.0; use pd.concat there
Table1 = Table1.sort_values("parameter")
Table1.replace(np.nan, "", inplace=True)
del P1
del SE1

fe_res_CVS = mod.fit(cov_type="clustered", cluster_entity=True)

# 2) second specification, built the same way
Table2 = pd.DataFrame(fe_res_CVS.params)
Table2["id"] = np.arange(len(Table2))
Table2 = Table2.reset_index().set_index(keys="id")
Table2 = Table2.rename(columns={"index": "parameter", "parameter": "coefficient 2"})

P2 = pd.DataFrame(fe_res_CVS.pvalues)
P2["id"] = np.arange(len(P2))
P2 = P2.reset_index().set_index(keys="id")
P2 = P2.rename(columns={"index": "parameter"})
Table2 = pd.merge(Table2, P2, on="parameter")
Table2["significance 2"] = np.where(Table2["pvalue"] <= 0.01, "***",
                            np.where(Table2["pvalue"] <= 0.05, "**",
                            np.where(Table2["pvalue"] <= 0.1, "*", "")))
Table2.rename(columns={"pvalue": "pvalue 2"}, inplace=True)

SE2 = pd.DataFrame(fe_res_CVS.std_errors)
SE2["id"] = np.arange(len(SE2))
SE2 = SE2.reset_index().set_index(keys="id")
SE2 = SE2.rename(columns={"index": "parameter", "std_error": "coefficient 2"})
SE2["parameter"] = SE2["parameter"].astype(str) + "_SE"
SE2["significance 2"] = ""
SE2 = SE2.round(4)
SE2["coefficient 2"] = "(" + SE2["coefficient 2"].astype(str) + ")"
Table2 = Table2.append(SE2)
Table2 = Table2.sort_values("parameter")
Table2.replace(np.nan, "", inplace=True)
del P2
del SE2

# merge the two tables and append the summary statistics
Reg_Output_FAmount = pd.merge(Table1, Table2, on="parameter", how="outer")
cols = ["parameter", "pvalue 1", "significance 1", "pvalue 2", "significance 2"]
Reg_Output_FAmount = Reg_Output_FAmount.append(pd.DataFrame(
    np.array([["observ.", fe_res_VS.nobs, "", fe_res_CVS.nobs, ""]]),
    columns=cols), ignore_index=True)
Reg_Output_FAmount = Reg_Output_FAmount.append(pd.DataFrame(
    np.array([["Rsquared", "{:.4f}".format(fe_res_VS.rsquared), "",
               "{:.4f}".format(fe_res_CVS.rsquared), ""]]),
    columns=cols), ignore_index=True)
Reg_Output_FAmount = Reg_Output_FAmount.append(pd.DataFrame(
    np.array([["Model type", fe_res_VS.name, "", fe_res_CVS.name, ""]]),
    columns=cols), ignore_index=True)
Reg_Output_FAmount = Reg_Output_FAmount.append(pd.DataFrame(
    np.array([["DV", fe_res_VS.model.dependent.vars[0], "",
               fe_res_CVS.model.dependent.vars[0], ""]]),
    columns=cols), ignore_index=True)
Reg_Output_FAmount.fillna("", inplace=True)
open("d://ss5.html", "w").write(Reg_Output_FAmount.to_html())   # save as HTML

11. Reshaping data with special structures

https://www.cnpython.com/qa/560110

import numpy as np
from scipy.stats import norm, gaussian_kde
import matplotlib.pyplot as plt
import pandas as pd
from linearmodels.panel.data import PanelData
from linearmodels.panel import PanelOLS, PooledOLS, RandomEffects, compare
from collections import OrderedDict
import wooldridge
from statsmodels.formula.api import ols
import warnings
warnings.filterwarnings("ignore")

wagepan = wooldridge.data("wagepan")
wooldridge.data("wagepan", description=True)

np.random.seed(123)
df = pd.DataFrame(np.random.randint(1, 10, (4, 12)),
                  index=["ID 1", "ID 2", "ID 3", "ID 4"])
df.columns = ["Q1", "Q2", "Q3", "Q4"] * 3
df.columns = [
    df.columns.to_series().groupby(level=0).cumcount().map({0: "X", 1: "Y", 2: "Z"}),
    df.columns]
df2 = df.stack().rename_axis(["ID", "T"]).reset_index()

https://stackoverflow.com/questions/32835498/pandas-python-describe-formatting-output

This is the classic conversion of vertically structured (panel-like) data into a horizontal layout.

# output descriptive statistics and a correlation table (very simple)
df = df_final
temp = df.groupby("firm")["fin0"].describe().reset_index()
aaa = temp.stack(level=-1)   # unstack() / unstack(level=-1); level can be -1, 0
temp = df.corr().reset_index()

# simulated data
d = {"id": [1, 1, 1, 2, 2], "Month": [1, 2, 3, 1, 3],
     "Value": [12, 23, 15, 45, 34], "Cost": [124, 214, 1234, 1324, 234]}
df = pd.DataFrame(d)
df2 = pd.pivot_table(df,
                     values=["Value", "Cost"],
                     index=["id"],
                     columns=["Month"],
                     aggfunc=np.sum,
                     fill_value=0)
df2.columns = [s1 + str(s2) for (s1, s2) in df2.columns.tolist()]
# alternative flattening (use one or the other, not both):
# df2.columns = [" ".join(map(str, col)).strip() for col in df2.columns.values]
df2.reset_index(inplace=True)
df2.index.name = None

12. Joint tests of estimated coefficients (Wald tests via statsmodels)

Wald tests handle joint hypotheses of all kinds: that several coefficients are equal to each other, that they are jointly zero, that the difference between any two coefficients is zero, and so on.
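As a minimal illustration on simulated data (the variable names x1, x2 are made up for the example), statsmodels accepts such constraints as plain strings:

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({"x1": rng.normal(size=n), "x2": rng.normal(size=n)})
df["y"] = 1 + 0.5 * df["x1"] + 0.5 * df["x2"] + rng.normal(size=n)
res = smf.ols("y ~ x1 + x2", data=df).fit()
print(res.wald_test("x1 = x2"))          # H0: the two coefficients are equal
print(res.wald_test("x1 = 0, x2 = 0"))   # H0: both coefficients are jointly zero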

https://andrewpwheeler.com/2021/06/18/wald-tests-via-statsmodels-python/ https://github.com/apwheele/ (the author's homepage). Often interventions are general and may be expected to reduce multiple crime types; e.g., hot spots policing may reduce both violent crimes and property crimes. But we do not know for sure, so it makes sense to fit models to check if that is the case.

For crimes that are more/less prevalent, this is a case in which fitting Poisson/Negative Binomial models makes a lot of sense, since the treatment effect is in terms of rate modifiers. The CrossValidated post shows an example in R. In the past I have shown how to stack models and do these tests in Stata, or use seemingly unrelated regression in Stata for generalized linear models. Here I will show an example in Python, using data from my dissertation, on stacking models and doing Wald tests.

The GitHub link above has the CSV file and metadata to follow along. Here I just do some upfront data prep. The data are crime counts at intersections/street segments in DC, across several different crime types and various aspects of the built environment.

# python code to stack models and estimate wald tests
import pandas as pd
import numpy as np
import statsmodels.formula.api as smf
import patsy
import itertools

# Use dissertation data for multiple crimes
# https://github.com/apwheele/ResearchDesign/tree/master/Week02_PresentingResearch
data = pd.read_csv(r"d://DC_Crime_MicroPlaces.csv", index_col="MarID")

# only keep a few independent variables to make it simpler
crime = ["OffN1", "OffN3", "OffN5", "OffN7", "OffN8", "OffN9"]  # dropping very low crime counts
x = ["CFS1", "CFS2"]  # 311 calls for service
data = data[crime + x].copy()
data.reset_index(inplace=True)

# Stack the data into long format, so each crime is a new row
data_long = pd.wide_to_long(data, "OffN", i="MarID", j="OffCat").reset_index()
data_long.sort_values(by=["MarID", "OffN"], inplace=True)

# Fit a model with clustered standard errors
covp = {"groups": data_long["MarID"], "df_correction": True}
nb_mod = smf.negativebinomial("OffN ~ C(OffCat) + CFS1:C(OffCat)",
                              data_long).fit(cov_type="cluster", cov_kwds=covp)
print(nb_mod.summary())

# Conduct a Wald test for equality of multiple coefficients
x_vars = nb_mod.summary2().tables[1].index
wald_str = " = ".join(list(x_vars[6:-1]))
print(wald_str)
wald_test = nb_mod.wald_test(wald_str)  # joint test
print(wald_test)

# Or can test that all are jointly equal to 0
nb_mod.wald_test_terms()

# To replicate what wald_test_terms is doing yourself
all_zero = [x + "= 0" for x in x_vars[6:-1]]
nb_mod.wald_test(",".join(all_zero))

# Pairwise contrasts of coefficients
# To get the actual difference in coefficients
wald_li = []
for a, b in itertools.combinations(x_vars[6:-1], 2):
    wald_li.append(a + " - " + b + " = 0")
wald_dif = " , ".join(wald_li)
dif = nb_mod.t_test(wald_dif)
print(dif)

# c's correspond to the wald_li list
res_contrast = dif.summary_frame()
res_contrast["Test"] = wald_li
res_contrast.set_index("Test", inplace=True)
print(res_contrast)

# Nicer function to print out the actual tests it interprets as
# ends up being 1 - 3, 3 - 5, etc.
def nice_lab_tests(test_str, mod):
    # Getting exogenous variables
    x_vars = mod.summary2().tables[1].index
    # Patsy getting design matrix and constraint from string
    di = patsy.DesignInfo(x_vars)
    const_mat = di.linear_constraint(test_str)
    r_mat = const_mat.coefs
    c_mat = list(const_mat.constants)
    # Loop over the tests, get non-zero indices
    # Build the interpreted tests
    lab = []
    for i, e in enumerate(c_mat):
        lm = r_mat[i, :]  # single row of R matrix
        nz = np.nonzero(lm)[0].tolist()  # only need non-zero
        c_vals = lm[nz].tolist()
        v_labs = x_vars[nz].tolist()
        fin_str = ""
        in_val = 0
        for c, v in zip(c_vals, v_labs):
            # 1 and -1 drop values and only use +/-
            if c == 1:
                if in_val == 0:
                    fin_str += v
                else:
                    fin_str += " + " + v
            elif c == -1:
                if in_val == 0:
                    fin_str += "-" + v
                else:
                    fin_str += " - " + v
            else:
                if in_val == 0:
                    fin_str += str(c) + "*" + v
                else:
                    if c > 0:
                        sg = " + "
                    else:
                        sg = " - "
                    fin_str += sg + str(np.abs(c)) + "*" + v
            in_val += 1
        fin_str += " = " + str(e[0])  # set equality at end
        lab.append(fin_str)
    return lab

# joint coefficient tests for any combination of variables:
# Wald string for equality across coefficients from earlier
lab_tests = nice_lab_tests(wald_str, nb_mod)
print(lab_tests)
dif4 = nb_mod.t_test(lab_tests)
print(dif4)

# Additional test to show how the nice_lab_tests function works
str2 = "CFS2:C(OffCat)[1] = 3, CFS2:C(OffCat)[7] = CFS2:C(OffCat)[5]"
nice_lab_tests(str2, nb_mod)
dif3 = nb_mod.t_test(nice_lab_tests(str2, nb_mod))
print(dif3)

13. Python and STATA time/date conversion problems

Reference: STATA time formats

A datetime coming from Python often looks like this:

    +---------------------------+
    |  datetime            year |
    |---------------------------|
    |  09sep1943 00:00:00  1943 |
    +---------------------------+

After exporting it to Stata, where it arrives with format %tc double, how do you recover year, month, day and so on? First display it as a calendar date:

gen eventdate = dofc(datetime)   // dofc converts a %tc value to a day count
format eventdate %td

Then extract year, day of year, and so on:

gen year = yofd(dofc(datetime))   // year from the day count
gen day = dofy(dofc(datetime))    // day within the year

gen halfyear = hofd(dofc(datetime))   // half-year from the day count
format %thCCYY!hh halfyear

gen quarter = qofd(dofc(datetime))   // quarter from the day count
format %tqCCYY!qq quarter

gen month = mofd(dofc(datetime))   // month from the day count
format %tmCCYY-NN month
// or: format month %tmCY_N

gen week = wofd(dofc(datetime))   // week from the day count
format %twCCYY!www week
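Going the other direction, pandas can write a .dta file directly and convert datetime columns into Stata's formats; a small sketch (the output file name is hypothetical):

import pandas as pd

df = pd.DataFrame({"datetime": pd.to_datetime(["1943-09-09"])})
df["year"] = df["datetime"].dt.year
# convert_dates maps column names to Stata date formats:
# "tc" (milliseconds), "td" (days), "tm" (months), "tq" (quarters), "ty" (years)
df.to_stata("out.dta", convert_dates={"datetime": "tc"}, write_index=False)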

14. Hausman tests for panel data in Python

Reference: https://py4etrics.github.io/17_Panel.html (a Japanese scholar's teaching site)

# First use the PanelOLS module for entity and time fixed-effects estimation
formula_fe = "lwage ~ married + union + expersq \
              + d81 + d82 + d83 + d84 + d85 + d86 + d87 + EntityEffects"
mod_fe = PanelOLS.from_formula(formula_fe, data=wagepan)
est1 = mod_fe.fit()
print(est1)

# There are many kinds of Wald test; what we need here is the joint test that all coefficients equal 0
restriction = "married = union = expersq = d81 = d82 = d83 = d84 = d85 = d86 = d87 = 0"
est1.wald_test(formula=restriction)

# Next, the Hausman test decides between random effects and fixed effects.
# Note: the null hypothesis of this test favors the random-effects model.

# Step 1: fit the fixed-effects model
formula_fe = "lwage ~ married + union + expersq \
              + d81 + d82 + d83 + d84 + d85 + d86 + d87 + EntityEffects"
mod_fe = PanelOLS.from_formula(formula_fe, data=wagepan)
fe_res = mod_fe.fit()
print(fe_res)

# Step 2: fit the random-effects model
formula_re = "lwage ~ married + union + expersq \
              + d81 + d82 + d83 + d84 + d85 + d86 + d87"
model_re = RandomEffects.from_formula(formula_re, data=wagepan)
re_res = model_re.fit()
print(re_res)

# Step 3: code the textbook formula by hand, since Python has no ready-made module
import numpy.linalg as la
from scipy import stats
import numpy as np

def hausman(fe, re):
    b = fe.params
    B = re.params
    v_b = fe.cov
    v_B = re.cov
    df = b[np.abs(b) < 1e8].size
    chi2 = np.dot((b - B).T, la.inv(v_b - v_B).dot(b - B))
    pval = stats.chi2.sf(chi2, df)
    return chi2, df, pval

# Step 4: feed the results from steps 1-2 into the function
# (fixed-effects results first, matching the function's signature)
hausman_results = hausman(fe_res, re_res)
print("chi-Squared: " + str(hausman_results[0]))
print("degrees of freedom: " + str(hausman_results[1]))
print("p-Value: " + str(hausman_results[2]))

# Step 5: if the p-Value is below 10%, prefer the fixed-effects model.
# (For our homework exercise we report fixed effects regardless of the p-Value.
# Econometrics is a long road: theory and software practice are both indispensable.)

15. Fama-MacBeth regressions, rolling regressions, and more

https://fin-library.readthedocs.io/en/latest/ (BYU FIN 585 Library Documentation; bundles several useful packages)
https://pypi.org/project/finance-byu/#files
Other pages that cover the algorithm:
https://www.cnblogs.com/jungsee/p/8448156.html (my old homepage)
https://www.kevinsheppard.com/teaching/python/notes/notebooks/example-fama-macbeth/ (the data are on this professor's homepage)
https://randlow.github.io/posts/finance-economics/asset-pricing-regression/

Related discussion threads: https://stackoverflow.com/questions/24074481/fama-macbeth-regression-in-python-pandas-or-statsmodels

https://www.e-learn.cn/topic/1620028

from finance_byu.fama_macbeth import fama_macbeth, fama_macbeth_parallel, fm_summary, fama_macbeth_numba
import pandas as pd
import time
import numpy as np

n_jobs = 5
n_firms = 1.0e2
n_periods = 1.0e2

def firm(fid):
    f = np.random.random((int(n_periods), 4))
    f = pd.DataFrame(f)
    f["period"] = f.index
    f["firmid"] = fid
    return f

df = [firm(i) for i in range(int(n_firms))]
df = pd.concat(df).rename(columns={0: "ret", 1: "exmkt", 2: "smb", 3: "hml"})
df.head()

result = fama_macbeth(df, "period", "ret", ["exmkt", "smb", "hml"], intercept=True)
result.head()
fm_summary(result)

%timeit fama_macbeth(df, "period", "ret", ["exmkt", "smb", "hml"], intercept=True)
%timeit fama_macbeth_parallel(df, "period", "ret", ["exmkt", "smb", "hml"], intercept=True)
%timeit fama_macbeth_numba(df, "period", "ret", ["exmkt", "smb", "hml"], intercept=True)

from finance_byu.statistics import GRS
import numpy as np
import pandas as pd

n_periods = 1.0e2
df = pd.DataFrame(np.random.random((int(n_periods), 6)))
df = df.rename(columns={0: "port1", 1: "port2", 2: "port3", 3: "exmkt", 4: "smb", 5: "hml"})
grsstat, pval, tbl = GRS(df, ["port1", "port2", "port3"], ["exmkt", "smb", "hml"])
grsstat
pval

from tabulate import tabulate
print(tabulate(tbl.render(), tablefmt="github", headers=tbl.render().columns))

from finance_byu.summarize import summary
n_periods = 1.0e2
df = pd.DataFrame(np.random.random((100, 3)) / 10., columns=["Port 1", "Port 2", "Port 3"])
summary(df)

# -*- coding: utf-8 -*-
from numpy import mat, cov, mean, hstack, multiply, sqrt, diag, \
    squeeze, ones, array, vstack, kron, zeros, eye, savez_compressed
from numpy.linalg import inv
from scipy.stats import chi2
from pandas import read_csv
import statsmodels.api as sm
import os

os.chdir("d://")
data = read_csv("FamaFrench.csv")

# Split using both named columns and ix for larger blocks
dates = data["date"].values
factors = data[["VWMe", "SMB", "HML"]].values
riskfree = data["RF"].values
portfolios = data.iloc[:, 5:].values

# Use mat for easier linear algebra
factors = mat(factors)
riskfree = mat(riskfree)
portfolios = mat(portfolios)

# Shape information
T, K = factors.shape
T, N = portfolios.shape

# Reshape rf and compute excess returns
riskfree.shape = T, 1
excessReturns = portfolios - riskfree

# Time series regressions
X = sm.add_constant(factors)
ts_res = sm.OLS(excessReturns, X).fit()
alpha = ts_res.params[0]
beta = ts_res.params[1:]
avgExcessReturns = mean(excessReturns, 0)

# Cross-section regression
cs_res = sm.OLS(avgExcessReturns.T, beta.T).fit()
riskPremia = cs_res.params

# Moment conditions
X = sm.add_constant(factors)
p = vstack((alpha, beta))
epsilon = excessReturns - X @ p
moments1 = kron(epsilon, ones((1, K + 1)))
moments1 = multiply(moments1, kron(ones((1, N)), X))
u = excessReturns - riskPremia[None, :] @ beta
moments2 = u * beta.T

# Score covariance
S = mat(cov(hstack((moments1, moments2)).T))

# Jacobian
G = mat(zeros((N * K + N + K, N * K + N + K)))
SigmaX = (X.T @ X) / T
G[:N * K + N, :N * K + N] = kron(eye(N), SigmaX)
G[N * K + N:, N * K + N:] = -beta @ beta.T
for i in range(N):
    temp = zeros((K, K + 1))
    values = mean(u[:, i]) - multiply(beta[:, i], riskPremia)
    temp[:, 1:] = diag(values)
    G[N * K + N:, i * (K + 1):(i + 1) * (K + 1)] = temp

vcv = inv(G.T) * S * inv(G) / T

vcvAlpha = vcv[0:N * K + N:4, 0:N * K + N:4]
J = alpha @ inv(vcvAlpha) @ alpha.T
J = J[0, 0]
Jpval = 1 - chi2(25).cdf(J)

vcvRiskPremia = vcv[N * K + N:, N * K + N:]
annualizedRP = 12 * riskPremia
arp = list(squeeze(annualizedRP))
arpSE = list(sqrt(12 * diag(vcvRiskPremia)))
print("        Annualized Risk Premia")
print("           Market       SMB        HML")
print("--------------------------------------")
print("Premia     {0:0.4f}    {1:0.4f}     {2:0.4f}".format(arp[0], arp[1], arp[2]))
print("Std. Err.  {0:0.4f}    {1:0.4f}     {2:0.4f}".format(arpSE[0], arpSE[1], arpSE[2]))
print("\n\n")
print("J-test:   {:0.4f}".format(J))
print("P-value:   {:0.4f}".format(Jpval))

i = 0
betaSE = []
for j in range(5):
    for k in range(5):
        a = alpha[i]
        b = beta[:, i]
        variances = diag(vcv[(K + 1) * i:(K + 1) * (i + 1), (K + 1) * i:(K + 1) * (i + 1)])
        betaSE.append(sqrt(variances))
        s = sqrt(variances)
        c = hstack((a, b))
        t = c / s
        print("Size: {:}, Value:{:}   Alpha   Beta(VWM)   Beta(SMB)   Beta(HML)".format(j + 1, k + 1))
        print("Coefficients: {:>10,.4f}  {:>10,.4f}  {:>10,.4f}  {:>10,.4f}".format(a, b[0], b[1], b[2]))
        print("Std Err.      {:>10,.4f}  {:>10,.4f}  {:>10,.4f}  {:>10,.4f}".format(s[0], s[1], s[2], s[3]))
        print("T-stat        {:>10,.4f}  {:>10,.4f}  {:>10,.4f}  {:>10,.4f}".format(t[0], t[1], t[2], t[3]))
        print("")
        i += 1

betaSE = array(betaSE)
savez_compressed("fama-macbeth-results", alpha=alpha, beta=beta,
                 betaSE=betaSE, arpSE=arpSE, arp=arp, J=J, Jpval=Jpval)

from numpy import savez
savez("fama-macBeth-results.npz", arp=arp, beta=beta, arpSE=arpSE,
      betaSE=betaSE, J=J, Jpval=Jpval)

16. Event studies

https://www.lianxh.cn/news/90de95e42e8ff.html
http://fmwww.bc.edu/repec/bocode/e/
https://zhuanlan.zhihu.com/p/348125854
Preliminaries: the eventstudy2 command depends on the user-written packages moremata, nearmrg, distinct, _gprod, rmse, and parallel:

ssc install moremata
ssc install nearmrg
ssc install distinct
ssc install _gprod
ssc install rmse
ssc install parallel

Note: moremata sometimes fails to install this way; in that case install it manually. Download moremata.zip, unzip it to a folder of your choice (e.g. E:\moremata), and run:

net install moremata, from(E:\moremata)

eventstudy2 runs an event-study procedure and lets the user choose among several designs established in finance and the related literature, such as raw returns, the market model, multi-factor models, and buy-and-hold abnormal returns (BHAR). The estimation and event windows can be chosen freely, and cumulative (average) abnormal (buy-and-hold) returns can be computed for up to 10 windows. The current output of eventstudy2 provides a battery of test statistics, all implemented in the Mata programming language.

eventstudy2 Security_id Date using Security_returns if Ea > 0.05, ///
    ret(Return) car1LB(-1) car1UB(1) mod(FM) marketfile(Factor_returns) ///
    mar(MKT) idmar(Market_reference)   // market model; this call needs three datasets

import os
import pandas as pd
import tushare as ts
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression  # linear regression for the market model

token = "2f4f20d8b2fb57e033a820d04954b6f8a33f5a44c18b806189a802adsee"  # paste your tushare API token here
pro = ts.pro_api(token)
plt.style.use("fivethirtyeight")   # plotting style

# event list: stock codes and event dates, keeping each firm's last event
df0 = pd.read_excel("eventdates.xlsx").drop_duplicates(["code"], keep="last")
lcode = list(df0["code"])
event = list(df0["date"])

# dates and daily returns of the market index
# (the original mixed two examples: the Shenzhen Composite, 399107.SZ, with stock
#  000333.SZ over 2013-2021, and the pharmaceutical index 000991.SH; the pharma
#  version, which the full pipeline below uses, is kept here)
dfzs = pro.index_basic(market="SSE")   # index list, to look up the code
dfyy = pro.index_daily(ts_code="000991.SH", start_date="20190101", end_date="20210630")
df_index = dfyy[["trade_date", "pct_chg"]]
df_index["trade_date"] = pd.to_datetime(df_index["trade_date"])
df_index = df_index.sort_values("trade_date")
df_index = df_index.reset_index().drop("index", axis=1)   # reset to a 0,1,2,... index
df_index["pct_chg"] = df_index["pct_chg"] / 100           # percentages -> decimals

def getdf_final(cd):
    """Daily returns of one stock merged with the index returns."""
    d2 = pro.daily(ts_code=cd, start_date="20190101", end_date="20210630")
    d0 = d2[["trade_date", "pct_chg"]]
    d0.columns = ["trade_date", "return"]
    d0["trade_date"] = pd.to_datetime(d0["trade_date"])
    d0 = d0.sort_values("trade_date")
    d0 = d0.reset_index().drop("index", axis=1)
    d0["return"] = d0["return"] / 100
    d1 = d0[(d0["trade_date"] >= "2014-01-01") & (d0["trade_date"] <= "2021-06-30")].copy()
    # merge on trade_date with the stock on the left; missing index days become NaN
    df_final0 = d1.merge(df_index, on="trade_date", how="left")
    return df_final0

def get_OLS(X, y, pre_X):
    """
    Expected returns from the market model (CAPM): X and y are the
    estimation-window index and stock daily returns; the fitted regression is
    then applied to pre_X, the event-window index returns.
    """
    linear_m = LinearRegression().fit(X, y)   # regression on the estimation window
    # r_2 = linear_m.score(X, y)   # goodness of fit: the closer to 1 the better
    # print(f"y = {round(linear_m.intercept_, 3)} + {round(linear_m.coef_[0], 3)}Rm + e")
    return linear_m.predict(pre_X)

def get_data(event):
    """Abnormal returns around one event date."""
    q = df_final[df_final["trade_date"] == event].index[0] - 3
    h = df_final[df_final["trade_date"] == event].index[0] + 3
    target = df_final.loc[q:h].copy()            # event window
    estimate_df = df_final.loc[q - 110:q - 11]   # estimation window
    X = estimate_df[["pct_chg"]]                 # estimation-window index returns
    y = estimate_df["return"]                    # estimation-window stock returns
    predict_X = target[["pct_chg"]]              # event-window index returns
    target["E(Rt)"] = get_OLS(X, y, predict_X)   # expected returns
    target["ARt"] = target["return"] - target["E(Rt)"]   # abnormal = actual - expected
    target["CARt"] = target["ARt"].cumsum()      # cumulative abnormal returns over the window
    target = target.reset_index().drop("index", axis=1)
    return target

car = pd.DataFrame()
ar_car = pd.DataFrame()
for i in range(len(lcode)):
    df_final = getdf_final(lcode[i])
    car0 = get_data(event[i])
    car0["firm"] = lcode[i]
    car0["event_date"] = event[i]
    car0["interval_days"] = car0["trade_date"] - car0["event_date"]
    car0["interval"] = pd.Series(range(-3, 4))
    ar_car = ar_car.append(car0)   # note: removed in pandas 2.0; use pd.concat there
    car[lcode[i]] = car0.loc[0:6]["CARt"]
    plt.rcParams["savefig.dpi"] = 600   # image resolution
    plt.rcParams["figure.dpi"] = 600
    car0.set_index("interval")[["ARt", "CARt"]].plot()

================== Continuously updated ==================

