
[Repost] Exploring Multi-Factor Strategies with JoinQuant

Time: 2019-11-04 20:12:11 | Source: IT Technology | Author: seo实验室小编 (SEO Lab editor)
 


[Repost] Exploring Multi-Factor Strategies (1)

Why JoinQuant

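Well-known futures research platforms such as TB (TradeBlazer) and Wenhua use data-driven backtesting: they cannot backtest with compounding (adding to positions out of floating profits), cannot conveniently manage positions in several instruments within a single backtest, and certainly cannot run portfolio strategies over a basket of instruments. JoinQuant's architecture, by contrast, lets you adjust the position in any instrument on the market at any point in time you define, which makes it well suited to researching and trading basket strategies. Here I use the JoinQuant framework to backtest several cross-sectional factors.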

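I recently started researching multi-factor strategies, read a number of broker research reports, and wrote first implementations of some frequently used modules.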

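This article focuses on the code; the underlying stock-selection logic can be changed freely on top of the framework. The selection strategy at the end is adapted from a GF Securities research report published several years ago. Its logic is fairly simple and the strategy no longer works, but the results are still quite helpful for understanding factor layering and strategy exploration, so I am sharing the code.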

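Most of the code in the first part has been posted before, so I will not describe it again. Two new modules are added later: stock-selection preparation and stock-selection methods.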

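The selection-preparation module implements factor layering, factor weight calculation, and stock scoring.

Layering: stocks are sorted by factor value and split into layers; the number of layers is adjustable.

Weight calculation: for each factor, the return of its highest-return layer minus the return of its lowest-return layer measures the factor's effectiveness; a factor's weight is its spread divided by the sum of the spreads of all factors.

Stock scoring: after layering on a factor, stocks in the highest-return layer score 1, stocks in the lowest-return layer score -1, and all others score 0; the score used is the average over a past period.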
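The three preparation steps can be illustrated on a toy cross-section. This is a minimal plain-Python sketch, not the article's JoinQuant code: the helper names (`assign_layers`, `layer_returns`, `factor_weights`, `stock_scores`, `average_scores`) are hypothetical, layers are assumed to be numbered from the smallest factor value upward, and each factor's spread is assumed to have been averaged over the look-back window already.

```python
from statistics import mean


def assign_layers(factor_values, n_layers=5):
    """Sort stocks by factor value and split them into n_layers groups (0 = smallest values)."""
    ranked = sorted(factor_values, key=factor_values.get)
    size = len(ranked)
    return {code: i * n_layers // size for i, code in enumerate(ranked)}


def layer_returns(layers, returns):
    """Average next-period return of each layer."""
    buckets = {}
    for code, layer in layers.items():
        buckets.setdefault(layer, []).append(returns[code])
    return {layer: mean(rs) for layer, rs in buckets.items()}


def factor_weights(spreads):
    """spreads: {factor: best-layer return minus worst-layer return}; weights sum to 1."""
    total = sum(spreads.values())
    return {factor: s / total for factor, s in spreads.items()}


def stock_scores(layers, layer_ret):
    """+1 for stocks in the best-return layer, -1 for the worst, 0 otherwise."""
    best = max(layer_ret, key=layer_ret.get)
    worst = min(layer_ret, key=layer_ret.get)
    return {code: 1 if layer == best else -1 if layer == worst else 0
            for code, layer in layers.items()}


def average_scores(period_scores):
    """Mean score of each stock over several past periods."""
    codes = period_scores[0].keys()
    return {code: mean(s[code] for s in period_scores) for code in codes}
```

With two layers on four stocks, the two lowest factor values land in layer 0 and the two highest in layer 1; whichever layer earned more next period scores 1 and the other scores -1.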

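Stock-selection methods:

1. Based on factor layering, take the factors in order of their best layered return, take the stocks in each factor's best-return layer, and intersect the groups obtained from several factors, keeping at least 10 stocks.

2. Rank stocks by their factor-weighted scores and select the highest-scoring portfolio.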
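Both selection methods reduce to a few set and sort operations. In this plain-Python sketch the function names are hypothetical, and method 1 is read as: start from the factor whose best layer earned the most, then intersect with the next-best factors' best layers for as long as at least `min_stocks` stocks remain. That stopping rule is an interpretation of the description above, not the author's code.

```python
def select_by_intersection(best_groups, best_returns, min_stocks=10):
    """Method 1 (interpretation): intersect best layers, best factor first, keeping >= min_stocks."""
    order = sorted(best_groups, key=best_returns.get, reverse=True)
    selected = set(best_groups[order[0]])
    for factor in order[1:]:
        candidate = selected & set(best_groups[factor])
        if len(candidate) < min_stocks:
            break  # adding this factor would leave too few stocks
        selected = candidate
    return selected


def select_by_score(scores, weights, top_n=10):
    """Method 2: rank stocks by the weight-averaged factor scores and keep the top_n."""
    stocks = next(iter(scores.values())).keys()
    total = {s: sum(weights[f] * scores[f][s] for f in weights) for s in stocks}
    return sorted(total, key=total.get, reverse=True)[:top_n]
```

Here `best_groups` maps each factor to the stocks in its best layer and `scores`/`weights` come from the preparation step; `min_stocks=10` mirrors the "at least 10 stocks" rule.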

Data used: raw data and neutralized data.

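Factor selection: factors are usually judged by the information coefficient (IC), by the information ratio IR (mean IC divided by the standard deviation of IC), which also reflects stability, and by regression coefficients. This article adds machine-learning feature-selection methods for judging factor effectiveness: in the FeatureSelection class, the identify_collinear method first removes highly correlated features; identify_importance_lgbm selects features with the LightGBM algorithm, which by itself limits the interference between factors; and embedded_select can use ridge or lasso regression to deal with collinearity between factors. All of these can serve as additional factor-selection criteria.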
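A sketch of the IC and IR criteria in plain Python. It assumes the rank (Spearman) IC, a common variant, computed on each rebalancing date between factor values and next-period returns; `ranks`, `pearson`, `rank_ic` and `information_ratio` are hypothetical helpers, and the FeatureSelection methods are not reproduced here.

```python
from statistics import mean, stdev


def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def pearson(x, y):
    """Pearson correlation (assumes neither series is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def rank_ic(factor, fwd_return):
    """Rank IC of one cross-section: correlation of the ranks."""
    return pearson(ranks(factor), ranks(fwd_return))


def information_ratio(ics):
    """IR = mean(IC) / std(IC) over the periods."""
    return mean(ics) / stdev(ics)
```

A factor whose ordering matches next-period returns exactly has IC 1; a high IR means the IC is not only large on average but also consistent from period to period.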

Module 1: Data Preparation

import pandas as pd
import numpy as np
import time
import datetime
import statsmodels.api as sm
import pickle
import warnings
from jqdata import *

warnings.filterwarnings('ignore')

start_date = '2013-01-01'
end_date = '2014-01-01'

all_trade_days = get_trade_days(start_date=start_date, end_date=end_date).tolist()  # all trading days
# sample every 20 trading days: fundamentals update slowly, so match the sampling frequency to them
trade_days = all_trade_days[::20]

securities = get_all_securities()
start_data_dt = datetime.datetime.strptime(start_date, '%Y-%m-%d').date()
securities_after_start_date = securities[securities['start_date'] < start_data_dt]  # stocks listed before the start date
all_stocks = list(securities_after_start_date.index)

INDUSTRY_NAME = 'sw_l1'

ttm_factors = []

# Fundamental factor mapping
fac_dict = {
    'MC': valuation.market_cap,  # total market cap (100 million yuan)
    'GP': indicator.gross_profit_margin * income.operating_revenue,  # gross profit
    'OR': income.operating_revenue,  # operating revenue
    'NP': income.net_profit,  # net profit
    # enterprise value; market_cap is quoted in 100 million yuan, so it is scaled to yuan
    # to match the balance-sheet and cash-flow items
    'EV': valuation.market_cap * 1e8 + balance.shortterm_loan + balance.non_current_liability_in_one_year
          + balance.longterm_loan + balance.bonds_payable + balance.longterm_account_payable
          - cash_flow.cash_and_equivalents_at_end,
    'TOE': balance.total_owner_equities,  # total owners' equity (yuan)
    'TOR': income.total_operating_revenue,  # total operating revenue
    'EBIT': income.net_profit + income.financial_expense + income.income_tax_expense,  # net profit + financial expense + income tax
    'TOC': income.total_operating_cost,  # total operating cost
    'NOCF/MC': cash_flow.net_operate_cash_flow / valuation.market_cap,  # net operating cash flow / total market cap
    'OTR': indicator.ocf_to_revenue,  # net operating cash flow / operating revenue (%)

    'GPOA': indicator.gross_profit_margin * income.operating_revenue / balance.total_assets,  # gross profit / total assets = gross margin * operating revenue / total assets
    'GPM': indicator.gross_profit_margin,  # gross profit margin
    'OPM': income.operating_profit / income.operating_revenue,  # operating profit margin
    'NPM': indicator.net_profit_margin,  # net profit margin
    'ROA': indicator.roa,  # ROA
    'ROE': indicator.roe,  # ROE
    'INC': indicator.inc_return,  # ROE excluding non-recurring gains and losses (%)
    'EPS': indicator.eps,  # earnings per share (yuan)
    'AP': indicator.adjusted_profit,  # net profit excluding non-recurring gains and losses (yuan)
    'OP': indicator.operating_profit,  # net income from operating activities (yuan)
    'VCP': indicator.value_change_profit,  # net income from value changes (yuan) = fair-value gains + investment income + exchange gains

    'ETTR': indicator.expense_to_total_revenue,  # total operating cost / total operating revenue (%)
    'OPTTR': indicator.operation_profit_to_total_revenue,  # operating profit / total operating revenue (%)
    'NPTTR': indicator.net_profit_to_total_revenue,  # net profit / total operating revenue (%)
    'OETTR': indicator.operating_expense_to_total_revenue,  # selling expenses / total operating revenue (%)
    'GETTR': indicator.ga_expense_to_total_revenue,  # administrative expenses / total operating revenue (%)
    'FETTR': indicator.financing_expense_to_total_revenue,  # financial expenses / total operating revenue (%)

    'OPTP': indicator.operating_profit_to_profit,  # net income from operating activities / total profit (%)
    'IPTP': indicator.invesment_profit_to_profit,  # net income from value changes / total profit (%)
    'GSASTR': indicator.goods_sale_and_service_to_revenue,  # cash received from sales of goods and services / operating revenue (%)
    'OTOP': indicator.ocf_to_operating_profit,  # net operating cash flow / net income from operating activities (%)

    'ITRYOY': indicator.inc_total_revenue_year_on_year,  # total operating revenue growth, year on year (%)
    'ITRA': indicator.inc_total_revenue_annual,  # total operating revenue growth, quarter on quarter (%)
    'IRYOY': indicator.inc_revenue_year_on_year,  # operating revenue growth, year on year (%)
    'IRA': indicator.inc_revenue_annual,  # operating revenue growth, quarter on quarter (%)
    'IOPYOY': indicator.inc_operation_profit_year_on_year,  # operating profit growth, year on year (%)
    'IOPA': indicator.inc_operation_profit_annual,  # operating profit growth, quarter on quarter (%)
    'INPYOY': indicator.inc_net_profit_year_on_year,  # net profit growth, year on year (%)
    'INPA': indicator.inc_net_profit_annual,  # net profit growth, quarter on quarter (%)
    'INPTSYOY': indicator.inc_net_profit_to_shareholders_year_on_year,  # growth of net profit attributable to parent shareholders, year on year (%)
    'INPTSA': indicator.inc_net_profit_to_shareholders_annual,  # growth of net profit attributable to parent shareholders, quarter on quarter (%)

    # return on invested capital: EBIT / (owners' equity + interest-bearing debt)
    'ROIC': (income.net_profit + income.financial_expense + income.income_tax_expense)
            / (balance.total_owner_equities + balance.shortterm_loan + balance.non_current_liability_in_one_year
               + balance.longterm_loan + balance.bonds_payable + balance.longterm_account_payable),
    'OPTT': income.operating_profit / income.total_profit,  # share of operating profit in total profit
    'TP/TOR': income.total_profit / income.total_operating_revenue,  # total profit / total operating revenue
    'OP/TOR': income.operating_profit / income.total_operating_revenue,  # operating profit / total operating revenue
    'NP/TOR': income.net_profit / income.total_operating_revenue,  # net profit / total operating revenue

    'TA': balance.total_assets,  # total assets

    'DER': balance.total_liability / balance.equities_parent_company_owners,  # total liabilities / equity attributable to owners of the parent
    # free cash flow / non-current liabilities; net_invest_cash_flow is signed (negative for net outflows), so it is added
    'FCFF/TNCL': (cash_flow.net_operate_cash_flow + cash_flow.net_invest_cash_flow) / balance.total_non_current_liability,
    'NOCF/TL': cash_flow.net_operate_cash_flow / balance.total_liability,  # net operating cash flow / total liabilities
    'TCA/TCL': balance.total_current_assets / balance.total_current_liability,  # current ratio

    'PE': valuation.pe_ratio,  # price-to-earnings ratio
    'PB': valuation.pb_ratio,  # price-to-book ratio
    'PR': valuation.pcf_ratio,  # price-to-cash-flow ratio
    'PS': valuation.ps_ratio,  # price-to-sales ratio

    'TOR/TA': income.total_operating_revenue / balance.total_assets,  # total asset turnover
    'TOR/FA': income.total_operating_revenue / balance.fixed_assets,  # fixed asset turnover
    'TOR/TCA': income.total_operating_revenue / balance.total_current_assets,  # current asset turnover
    'LTL/OC': balance.longterm_loan / income.operating_cost,  # long-term loans / operating cost

    'TL/TA': balance.total_liability / balance.total_assets,  # total liabilities / total assets
    'TL/TOE': balance.total_liability / balance.total_owner_equities,  # debt-to-equity ratio
}

adjust_factors = {
    'TOR/TA': income.total_operating_revenue / balance.total_assets,  # total asset turnover
    'TOR/FA': income.total_operating_revenue / balance.fixed_assets,  # fixed asset turnover
    'TOR/TCA': income.total_operating_revenue / balance.total_current_assets,  # current asset turnover
    'LTL/OC': balance.longterm_loan / income.operating_cost,  # long-term loans / operating cost

    'TL/TA': balance.total_liability / balance.total_assets,  # total liabilities / total assets
    'TL/TOE': balance.total_liability / balance.total_owner_equities,  # debt-to-equity ratio

    'DER': balance.total_liability / balance.equities_parent_company_owners,  # total liabilities / equity attributable to owners of the parent
    # free cash flow / non-current liabilities; net_invest_cash_flow is signed (negative for net outflows), so it is added
    'FCFF/TNCL': (cash_flow.net_operate_cash_flow + cash_flow.net_invest_cash_flow) / balance.total_non_current_liability,
    'NOCF/TL': cash_flow.net_operate_cash_flow / balance.total_liability,  # net operating cash flow / total liabilities
    'TCA/TCL': balance.total_current_assets / balance.total_current_liability,  # current ratio

    'ROIC': (income.net_profit + income.financial_expense + income.income_tax_expense)
            / (balance.total_owner_equities + balance.shortterm_loan + balance.non_current_liability_in_one_year
               + balance.longterm_loan + balance.bonds_payable + balance.longterm_account_payable),
    'OPTT': income.operating_profit / income.total_profit,  # share of operating profit in total profit
    'TP/TOR': income.total_profit / income.total_operating_revenue,  # total profit / total operating revenue
    'OP/TOR': income.operating_profit / income.total_operating_revenue,  # operating profit / total operating revenue
    'NP/TOR': income.net_profit / income.total_operating_revenue,  # net profit / total operating revenue

    'NOCF/MC': cash_flow.net_operate_cash_flow / valuation.market_cap,  # net operating cash flow / total market cap
    'GPOA': indicator.gross_profit_margin * income.operating_revenue / balance.total_assets,  # gross profit / total assets
    'OPM': income.operating_profit / income.operating_revenue,  # operating profit margin
    'EBIT': income.net_profit + income.financial_expense + income.income_tax_expense,
}

# List of all factors
factor_list = list(fac_dict.keys())

def get_fundamental_data(securities, factor_list, ttm_factors, date):
    '''
    Fetch cross-sectional fundamental data, determined by the date, the stocks and the factors
    The result may contain NaN values, which are usually filled with industry means
    Input:
        factor_list: list, ordinary factors
        ttm_factors: list, TTM factors (sum of the last four quarterly reports)
        date: str or datetime.date, date of the data
        securities: list, stocks to query
    Output:
        DataFrame combining ordinary and TTM factors; index is the stock code, values are factor values
    '''
    if len(factor_list) == 0:
        raise ValueError('factors list is empty, please input data')

    # query the ordinary factors as of `date`
    q = query(valuation.code)
    for fac in factor_list:
        q = q.add_column(fac_dict[fac])
    q = q.filter(valuation.code.in_(securities))
    fundamental_df = get_fundamentals(q, date).set_index('code')
    fundamental_df.columns = factor_list

    if isinstance(date, datetime.date):  # also covers datetime.datetime
        date = date.strftime('%Y-%m-%d')
    elif not isinstance(date, str):
        raise TypeError('input date ERROR')
    year = int(date[:4])
    month_day = date[5:]

    # the four most recent quarters already published on `date`
    # (annual and Q1 reports by Apr 30, half-year reports by Aug 31, Q3 reports by Oct 31)
    if month_day < '05-01':
        statdate_list = [str(year-2)+'q4', str(year-1)+'q1', str(year-1)+'q2', str(year-1)+'q3']
    elif month_day < '09-01':
        statdate_list = [str(year-1)+'q2', str(year-1)+'q3', str(year-1)+'q4', str(year)+'q1']
    elif month_day < '11-01':
        statdate_list = [str(year-1)+'q3', str(year-1)+'q4', str(year)+'q1', str(year)+'q2']
    else:
        statdate_list = [str(year-1)+'q4', str(year)+'q1', str(year)+'q2', str(year)+'q3']

    # TTM value = sum of the four quarterly values
    ttm_q = query(valuation.code)
    for fac in ttm_factors:
        ttm_q = ttm_q.add_column(fac_dict[fac])
    ttm_q = ttm_q.filter(valuation.code.in_(securities))
    ttm_fundamental_data = None
    for statdate in statdate_list:
        quarter = get_fundamentals(ttm_q, statDate=statdate).set_index('code')
        quarter.columns = ttm_factors
        if ttm_fundamental_data is None:
            ttm_fundamental_data = quarter
        else:
            ttm_fundamental_data = ttm_fundamental_data + quarter  # aligned on stock code

    results = fundamental_df.join(ttm_fundamental_data, how='inner').sort_index()
    # keep float-valued (factor) columns only
    results = results.select_dtypes(include=['float'])
    return results
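The choice of the four TTM quarters depends only on the query date, so it can be pulled out into a pure function and checked without JoinQuant. This sketch assumes the A-share disclosure deadlines (annual and Q1 reports by April 30, half-year reports by August 31, Q3 reports by October 31) and returns the four latest consecutive quarters already published; `ttm_stat_dates` is a hypothetical helper producing `statDate` strings such as `'2012q4'`.

```python
def ttm_stat_dates(date_str):
    """Four consecutive quarters to sum for a TTM value as of date_str ('YYYY-MM-DD')."""
    year, month_day = int(date_str[:4]), date_str[5:]
    if month_day < '05-01':      # latest published report: previous year's Q3
        quarters = [(year - 2, 4), (year - 1, 1), (year - 1, 2), (year - 1, 3)]
    elif month_day < '09-01':    # previous year's annual report and this year's Q1 are out
        quarters = [(year - 1, 2), (year - 1, 3), (year - 1, 4), (year, 1)]
    elif month_day < '11-01':    # half-year report is out
        quarters = [(year - 1, 3), (year - 1, 4), (year, 1), (year, 2)]
    else:                        # Q3 report is out
        quarters = [(year - 1, 4), (year, 1), (year, 2), (year, 3)]
    return ['%dq%d' % q for q in quarters]
```

Each window covers exactly four consecutive quarters, which is what makes the sum a trailing twelve-month figure.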

def get_all_fundamentals(securities, date):
    '''
    Fetch all fundamental fields
    Input:
        securities: list, stock codes to query
        date: str or datetime, query date
    Output:
        fundamentals: DataFrame, index is the stock code, values are factor values
    '''
    q = query(valuation, balance, cash_flow, income, indicator).filter(valuation.code.in_(securities))
    fundamentals = get_fundamentals(q, date)
    fundamentals.index = fundamentals['code']
    # keep float-valued columns only (drops code/date strings and integer ids)
    fundamentals = fundamentals.select_dtypes(include=['float'])
    fundamentals = fundamentals.sort_index()
    return fundamentals

all_fundamentals = get_all_fundamentals(all_stocks, start_date)

def get_stock_industry(industry_name, date, output_csv=False):
    '''
    Get the industry of each stock
    input:
        industry_name: str,
            "sw_l1": Shenwan level-1 industries
            "sw_l2": Shenwan level-2 industries
            "sw_l3": Shenwan level-3 industries
            "jq_l1": JoinQuant level-1 industries
            "jq_l2": JoinQuant level-2 industries
            "zjw": CSRC industries
        date: date
    output: DataFrame, index is the stock code, column 'industry_code' is the industry code
    '''
    industries = list(get_industries(industry_name).index)
    all_securities = get_all_securities(date=date)  # all stocks listed on that day
    all_securities['industry_code'] = 1  # placeholder for stocks not found in any industry
    for ind in industries:
        industry_stocks = get_industry_stocks(ind, date)
        # some industry constituents are not in all_securities
        industry_stocks = list(set(all_securities.index) & set(industry_stocks))
        all_securities.loc[industry_stocks, 'industry_code'] = ind
    stock_industry = all_securities['industry_code'].to_frame()
    if output_csv:
        stock_industry.to_csv('stock_industry.csv')  # write the stock-to-industry mapping to CSV
    return stock_industry

def fillna_with_industry(data, date, industry_name='sw_l1'):
    '''
    Fill NaN values with industry means
    input:
        data: DataFrame, input data, index is the stock code
        date: str, must match the date the values in data refer to
    output:
        DataFrame, missing values filled with the industry mean; where no industry mean is available, the column mean is used
    '''
    industry = get_stock_industry(industry_name, date)['industry_code'].reindex(data.index)
    numeric = data.select_dtypes(include=[np.number])
    # mean of each column within each industry (NaNs are skipped)
    industry_mean = numeric.groupby(industry, sort=False).mean()
    # look up each stock's industry mean, row-aligned with data
    fill_values = industry_mean.reindex(industry.values)
    fill_values.index = numeric.index
    result = numeric.fillna(fill_values)
    # where the industry mean is also NaN, fall back to the overall column mean
    result = result.fillna(result.mean())
    return result
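The filling rule (industry mean first, overall mean as fallback) is easy to check on a few stocks. This plain-Python sketch uses a hypothetical `fillna_by_group` helper with `None` standing in for NaN; like the function above, the fallback mean is taken after the industry fill.

```python
from statistics import mean


def fillna_by_group(values, groups):
    """values: {stock: float or None}; groups: {stock: industry or None}."""
    by_group = {}
    for code, v in values.items():
        g = groups.get(code)
        if v is not None and g is not None:
            by_group.setdefault(g, []).append(v)
    group_mean = {g: mean(vs) for g, vs in by_group.items()}
    # first pass: industry mean (stays None if the stock has no usable industry mean)
    filled = {code: v if v is not None else group_mean.get(groups.get(code))
              for code, v in values.items()}
    # second pass: overall mean of the values available after the first pass
    overall = mean(v for v in filled.values() if v is not None)
    return {code: overall if v is None else v for code, v in filled.items()}
```

A stock with no industry, or whose whole industry is missing the value, ends up with the cross-sectional mean.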
# Build a list of trading dates

def get_tradeday_list(start, end, frequency=None, count=None):
    '''
    input:
        start: str or datetime, start date; either start or count is used
        end: str or datetime, end date
        frequency:
            str: 'day', 'month', 'quarter' or 'halfyear'; defaults to 'day'
            int: interval in trading days
        count: int, alternative to start; start is used by default
    output:
        list of 'YYYY-MM-DD' strings
    '''
    if isinstance(frequency, int):
        all_trade_days = get_trade_days(start, end)
        trade_days = all_trade_days[::frequency]
        return [i.strftime('%Y-%m-%d') for i in trade_days]

    if count is not None:
        df = get_price('000001.XSHG', end_date=end, count=count)
    else:
        df = get_price('000001.XSHG', start_date=start, end_date=end)
    if frequency is None or frequency == 'day':
        days = df.index
    else:
        df['year-month'] = [str(i)[0:7] for i in df.index]
        df['month'] = [str(i)[5:7] for i in df.index]
        if frequency == 'quarter':
            df = df[df['month'].isin(['01', '04', '07', '10'])]
        elif frequency == 'halfyear':
            df = df[df['month'].isin(['01', '06'])]
        elif frequency != 'month':
            raise ValueError("frequency must be 'day', 'month', 'quarter', 'halfyear' or an int")
        # keep the first trading day of each remaining month
        days = df.drop_duplicates('year-month').index
    return [i.strftime('%Y-%m-%d') for i in days]

tl = get_tradeday_list(start_date, end_date, frequency='month')
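`drop_duplicates('year-month')` keeps the first row of each month, so the month, quarter and half-year options all return the first trading day of the selected months. A plain-Python equivalent on a sorted list of date strings, with a hypothetical `first_trading_day_per_month` helper, makes that explicit:

```python
def first_trading_day_per_month(trade_days, months=None):
    """trade_days: sorted 'YYYY-MM-DD' strings; keep the first day of each month,
    optionally only for the given months (e.g. ('01', '04', '07', '10') for quarterly)."""
    seen, result = set(), []
    for day in trade_days:
        year_month, month = day[:7], day[5:7]
        if months is not None and month not in months:
            continue
        if year_month not in seen:
            seen.add(year_month)
            result.append(day)
    return result
```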

def get_date_list(begin_date, end_date):
    '''
    Return every calendar date from begin_date to end_date (inclusive) as 'YYYY-MM-DD' strings
    '''
    dates = []
    dt = datetime.datetime.strptime(begin_date, '%Y-%m-%d')
    date = begin_date[:]
    while date <= end_date:
        dates.append(date)
        dt += datetime.timedelta(days=1)
        date = dt.strftime('%Y-%m-%d')
    return dates

# Outlier handling

# MAD (median absolute deviation) clipping
def filter_extreme_MAD(series, n):  # clip to median ± n * MAD
    median = series.quantile(0.5)
    new_median = ((series - median).abs()).quantile(0.50)
    max_range = median + n * new_median
    min_range = median - n * new_median
    return np.clip(series, min_range, max_range)
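The same MAD rule in plain Python, for checking by hand. pandas `quantile(0.5)` with its default linear interpolation equals the ordinary median (the average of the two middle values for even counts), so `statistics.median` reproduces it. `clip_mad` is a hypothetical name; some practitioners additionally scale the MAD by about 1.4826 so that it matches the standard deviation for normal data, which the function above does not do.

```python
from statistics import median


def clip_mad(values, n=3):
    """Clip values to median ± n * MAD, where MAD = median(|x - median|)."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    lo, hi = med - n * mad, med + n * mad
    return [min(max(v, lo), hi) for v in values]
```

For `[1, 2, 3, 4, 100]` the median is 3 and the MAD is 1, so with `n=3` the bounds are 0 and 6 and only the outlier 100 is pulled in to 6.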

# Winsorize by standard deviation

def winsorize(factor, std=3, have_negative=True):
    '''
    Outlier clipping
    factor: Series indexed by stock code, values are factor values
    std: number of standard deviations; have_negative: bool, whether negative values are kept
    Returns a Series
    '''
    r = factor.dropna().copy()
    if not have_negative:
        r = r[r >= 0]
    # clip values beyond mean ± std * standard deviation
    edge_up = r.mean() + std * r.std()
    edge_low = r.mean() - std * r.std()
    return r.clip(lower=edge_low, upper=edge_up)
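A plain-Python version of the standard-deviation rule, useful for checking bounds by hand. It uses the sample standard deviation (`statistics.stdev`), which matches pandas' default `std()` with `ddof=1`; `clip_sigma` is a hypothetical name.

```python
from statistics import mean, stdev


def clip_sigma(values, n_std=3, have_negative=True):
    """Clip to mean ± n_std * std; when have_negative is False, negative values are dropped first."""
    r = [v for v in values if have_negative or v >= 0]
    mu, sd = mean(r), stdev(r)
    lo, hi = mu - n_std * sd, mu + n_std * sd
    return [min(max(v, lo), hi) for v in r]
```

With `have_negative=False`, `[-1, 0, 0, 0, 10]` becomes `[0, 0, 0, 10]`, whose mean is 2.5 and sample standard deviation is 5, so a one-sigma clip caps 10 at 7.5. Note that the extreme value itself inflates the standard deviation, which is one reason the MAD rule is often preferred.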

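# Standardization

def standardize(s, ty=2):
    '''
    s: Series
    ty: standardization type: 1 min-max, 2 z-score (standard), 3 scaling by a power of ten based on the largest absolute value
    '''
    data = s.dropna().copy()
    if int(ty) == 1:
        re = (data - data.min()) / (data.max() - data.min())
    elif ty == 2:
        re = (data - data.mean()) / data.std()
    elif ty == 3:
        re = data / 10 ** np.ceil(np.log10(data.abs().max()))
    else:
        raise ValueError('ty must be 1, 2 or 3')
    return re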
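The three standardization types side by side in plain Python (`standardize_list` is a hypothetical name). Type 3 is decimal scaling: it divides by the smallest power of ten not below the largest absolute value, which differs from scikit-learn's MaxAbsScaler (division by the largest absolute value itself).

```python
import math
from statistics import mean, stdev


def standardize_list(values, ty=2):
    """1: min-max to [0, 1]; 2: z-score; 3: divide by 10 ** ceil(log10(max |x|))."""
    if ty == 1:
        lo, hi = min(values), max(values)
        return [(v - lo) / (hi - lo) for v in values]
    if ty == 2:
        mu, sd = mean(values), stdev(values)
        return [(v - mu) / sd for v in values]
    if ty == 3:
        scale = 10 ** math.ceil(math.log10(max(abs(v) for v in values)))
        return [v / scale for v in values]
    raise ValueError('ty must be 1, 2 or 3')
```

For example, `[-250, 40]` has a largest absolute value of 250, so type 3 divides by 1000 and returns `[-0.25, 0.04]`.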

# Winsorize by quantiles, then standardize

def winsorize_and_standarlize(data, qrange=[0.05, 0.95], axis=0):
    '''
    input:
        data: DataFrame or Series, input data
        qrange: list, qrange[0] is the lower and qrange[1] the upper quantile; extreme values are replaced by these quantiles
        axis: for a DataFrame, 0 processes each column, any other value processes each row
    '''
    if isinstance(data, pd.DataFrame):
        if axis != 0:
            data = data.T  # process rows by working on the transpose
        q = data.quantile(qrange)  # row 0: lower quantiles, row 1: upper quantiles
        data = data.clip(lower=q.iloc[0], upper=q.iloc[1], axis=1)
        data = (data - data.mean()) / data.std()
        data = data.fillna(0)
        if axis != 0:
            data = data.T
    elif isinstance(data, pd.Series):
        q = data.quantile(qrange)
        data = np.clip(data, q.values[0], q.values[1])
        data = (data - data.mean()) / data.std()
    return data

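def neutralize(data, date, market_cap, industry_name='sw_l1'):
    '''
    Neutralize factors against industry and market cap
    input:
        data: DataFrame, index is the stock code, columns are factors, values are factor values
        date: date used to look up industry membership
        market_cap: Series (or single-column DataFrame) of market cap indexed by stock code
        industry_name: str, industry classification
            "sw_l1": Shenwan level-1 industries
            "sw_l2": Shenwan level-2 industries
            "sw_l3": Shenwan level-3 industries
            "jq_l1": JoinQuant level-1 industries
            "jq_l2": JoinQuant level-2 industries
            "zjw": CSRC industries
    output:
        DataFrame of residuals from regressing each factor on industry dummies and log market cap
    '''
    industry = get_stock_industry(industry_name, date)['industry_code'].reindex(data.index)
    if isinstance(market_cap, pd.DataFrame):
        market_cap = market_cap.iloc[:, 0]
    index = data.index
    # one dummy column per industry; together they play the role of the intercept
    industry_dummy = pd.get_dummies(industry.astype(str)).astype(float)
    log_mc = np.log(market_cap.loc[index]).rename('log_market_cap')
    x = pd.concat([industry_dummy, log_mc], axis=1)
    neu_result = pd.DataFrame(index=index, columns=data.columns, dtype=float)
    for column in data.columns:
        # the residual is the part of the factor not explained by industry and size
        neu_result[column] = sm.OLS(data[column], x).fit().resid
    return neu_result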
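Why the residual is "neutral" can be seen without statsmodels. By the Frisch-Waugh-Lovell theorem, regressing a factor on industry dummies plus log market cap gives the same residuals as demeaning both the factor and log market cap within each industry and then regressing one on the other without an intercept. This plain-Python sketch (`neutralize_list` is a hypothetical name) uses that route:

```python
from statistics import mean


def neutralize_list(y, log_mc, industry):
    """Residuals of OLS y ~ industry dummies + log_mc, via within-industry demeaning."""
    groups = {}
    for i, g in enumerate(industry):
        groups.setdefault(g, []).append(i)
    y_dm, x_dm = [0.0] * len(y), [0.0] * len(y)
    for idx in groups.values():
        my = mean(y[i] for i in idx)
        mx = mean(log_mc[i] for i in idx)
        for i in idx:
            y_dm[i], x_dm[i] = y[i] - my, log_mc[i] - mx
    sxx = sum(v * v for v in x_dm)
    beta = sum(a * b for a, b in zip(x_dm, y_dm)) / sxx if sxx else 0.0
    return [a - beta * b for a, b in zip(y_dm, x_dm)]
```

The residuals sum to zero within every industry and are uncorrelated with log market cap; a factor that is exactly "industry effect + 2 × log market cap" is neutralized to zero.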

def get_month_profit(stocks, start_date, end_date, month_num=1, cal_num=3):
    '''
    Get monthly returns: each month's price relative to the previous month's
    input:
        stocks: list, stock codes
        start_date: str, start date
        end_date: str, end date
        month_num: number of months per return, default 1 (one-month returns)
        cal_num: int, a month's price is the mean close of the last n trading days before the 1st, default 3
    '''
    start_year = int(start_date[:4])
    end_year = int(end_date[:4])
    start_month = int(start_date[5:7])
    end_month = int(end_date[5:7])
    len_month = (end_year - start_year) * 12 + (end_month - start_month)
    price_list = []
    # i = -1 is the month before start_date, needed as the base of the first return
    for i in range(-1, len_month):
        total = start_year * 12 + (start_month - 1) + i
        date = '%d-%02d-01' % (total // 12, total % 12 + 1)
        price = get_price(stocks, fields=['close'], count=cal_num, end_date=date)['close']
        price_mean = price.mean().to_frame()
        price_mean.columns = [date]
        price_list.append(price_mean)
    month_profit = pd.concat(price_list, axis=1)
    # monthly returns
    month_profit_pct = month_profit.pct_change(month_num, axis=1).dropna(axis=1, how='all')
    return month_profit_pct
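Two details of the monthly return are easy to get wrong: the month arithmetic must roll over into the next year, and each month's price is an average of the last few closes. A plain-Python sketch with hypothetical helpers (`month_starts`, `monthly_returns`), counting months from a single year*12 + month index:

```python
from statistics import mean


def month_starts(start_year, start_month, n_months):
    """1st of the month before the start month, then the 1st of n_months months ('YYYY-MM-01')."""
    out = []
    for i in range(-1, n_months):
        total = start_year * 12 + (start_month - 1) + i
        out.append('%d-%02d-01' % (total // 12, total % 12 + 1))
    return out


def monthly_returns(closes_before, cal_num=3):
    """closes_before: one list of closes per month start, each ending just before that 1st."""
    avgs = [mean(c[-cal_num:]) for c in closes_before]
    return [b / a - 1 for a, b in zip(avgs, avgs[1:])]
```

Starting in November 2013 for three months gives `['2013-10-01', '2013-11-01', '2013-12-01', '2014-01-01']`; averaging the last three closes smooths out a single noisy closing price.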

def get_profit_depend_timelist(stocks, timelist, month_num=1, cal_num=3):
    '''
    Get returns between the dates in timelist
    input:
        stocks: list, stock codes
        timelist: list of dates
        month_num: number of periods (steps in timelist) per return, default 1
        cal_num: int, each date's price is the mean close of the last n trading days up to that date, default 3
    '''
    price_list = []
    for date in timelist:
        price = get_price(stocks, fields=['close'], count=cal_num, end_date=date)['close']
        price_mean = price.mean().to_frame()
        price_mean.columns = [date]
        price_list.append(price_mean)
    profit = pd.concat(price_list, axis=1)
    profit_pct = profit.pct_change(month_num, axis=1).dropna(axis=1, how='all')
    return profit_pct

def get_day_profit_forward(stocks, end_date, start_date=None, count=-1, pre_num=20):
    '''
    Get forward returns. pre_num is the horizon: the value stored at date t is the return
    over the following pre_num trading days; e.g. with pre_num=3, the value on a given day
    is the return from that day's close to the close 3 trading days later
    input:
        stocks: list or Series, stock codes
        start_date: start date
        end_date: end date
        count: alternative to start_date, number of trading days to look back
        pre_num: int, number of trading days to look ahead
    output:
        profit: DataFrame, index is the date, columns are stock codes, values are returns
    '''
    if count == -1:
        price = get_price(stocks, start_date, end_date, fields=['close'])['close']
        date_list = get_trade_days(start_date=start_date, end_date=end_date)
    else:
        price = get_price(stocks, end_date=end_date, count=count, fields=['close'])['close']
        date_list = get_trade_days(end_date=end_date, count=count)
    price.index = date_list
    # return from t to t + pre_num, shifted back so that it is stored at t
    profit = price.pct_change(periods=pre_num).shift(-pre_num).dropna()
    return profit
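`pct_change(periods=pre_num).shift(-pre_num)` is what turns a backward-looking return into a forward-looking label. On a single price list the same operation looks like this (a plain-Python sketch; `forward_returns` is a hypothetical name):

```python
def forward_returns(prices, pre_num=20):
    """Value at t is prices[t + pre_num] / prices[t] - 1; the last pre_num days have no label."""
    return [prices[t + pre_num] / prices[t] - 1 for t in range(len(prices) - pre_num)]
```

The last `pre_num` days are dropped because their future prices are not yet known. On a DataFrame, `.dropna()` also drops any date on which at least one stock has no price (for example, a suspended stock), so a large stock universe can lose more rows than that.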

def get_one_day_data(stocks, factor_list, ttm_factors, date, neu=False):
    '''
    Get one day's fundamental data
    input:
        stocks: list, stock list
        factor_list: list, ordinary factors
        ttm_factors: list, TTM factors
        date: str or datetime, date of the data
        neu: bool, whether to neutralize by market cap and industry; default False
    '''
    fund_data = get_fundamental_data(stocks, factor_list, ttm_factors, date)
    fillna_data = fillna_with_industry(fund_data, date)
    if not neu:
        results = winsorize_and_standarlize(fillna_data)
    elif 'MC' in fillna_data.columns:
        neu_data = neutralize(fillna_data, date, fillna_data['MC'])
        results = winsorize_and_standarlize(neu_data)
    elif 'market_cap' in fillna_data.columns:
        neu_data = neutralize(fillna_data, date, fillna_data['market_cap'])
        results = winsorize_and_standarlize(neu_data)
    else:
        print("error: please input 'market_cap' for neutralize")
        return None
    return results

def get_timelist_data(stocks, factor_list, ttm_factors, timelist, neu=False):
    '''
    Run get_one_day_data for every date in timelist; returns {date: DataFrame}
    '''
    dic = {}
    for date in timelist:
        dic[date] = get_one_day_data(stocks, factor_list, ttm_factors, date, neu=neu)
    return dic


Link to the original author's post

Article last published: 2019-02-25 14:02:28
