Analysis of Provider Payment Reforms on Advancing China’s Health (APPROACH): An Evaluation of County Hospitals
June 5, 2019
Policy Collaborators: Governments of Guizhou, China
Funding from the Health Results Innovation Trust Fund, the World Bank Group, is gratefully acknowledged.

Table of Contents
List of Abbreviations and Acronyms
Executive summary
Background
Country Context
Recent Health Reforms
Project Objectives
Description of the APPROACH project design
Project Background
Design of Provider Payment Intervention
Hypothesized Behavioral Response and Consequences
Study Design
Study Sites
Pairwise Randomization
Sample Selection
Data Source
NCMS Inpatient Claims Data
APPROACH County Hospital Survey
World Management Survey (WMS) Hospital Questionnaire
International Comparison of WMS Score
Timeline
Methods
Baseline Balance and Pre-intervention Trend
Model Specifications
Results
Data
Power Calculation
Balance at Baseline
Pre-intervention Trend
Pre-post Comparison of Outcome Variables and Hospital Attributes
Hypothesis 1, 1a-1c: Difference-in-differences Results
Hypothesis 3: Hospital governance and management
Discussions
Challenges
Lessons and Conclusion

List of Abbreviations and Acronyms
CHS      County Hospital Survey
DD       Difference-in-differences
DRG      Diagnosis Related Group
EHR      Electronic Health Records
FFS      Fee-for-service
GNI      Gross national income
GB+P4P   Global budget + Pay-for-performance
HIC      High-Income Countries
LMIC     Low- and Middle-Income Countries
LOS      Length of stay
NCMS     New Rural Cooperative Medical Scheme
OOC      Out-of-county
OOP      Out-of-pocket
PPM      Provider payment method
SOP      Standard operating procedure
THC      Township health center
TCM      Traditional Chinese Medicine
URBMI    Urban Resident Basic Medical Insurance
WMS      World Management Survey

Executive summary
In April 2009, the Chinese government launched a national health care reform program with the goal of providing affordable, equitable, and effective health care for all by 2020. Since then, government spending on health care has more than tripled. By 2012, with substantial government subsidies, 97 percent of the Chinese population was covered by one of China's three basic medical insurance schemes. The expansion of insurance coverage, however, has not kept pace with expenditure escalation. As a result, out-of-pocket (OOP) payments and financial burden have risen. A key underlying reason is the misaligned incentives faced by health care providers.

The Chinese government has made provider payment reform a top priority. It aims to improve efficiency and reduce health expenditure growth, in order to keep health care affordable and reduce the risk of catastrophic medical expenditure. The reform focuses particularly on public hospitals, which account for over 70 percent of total national health spending. To find the best direction, the national government encourages local governments to pilot new provider payment methods.

The primary objective of this project is therefore to test and evaluate an innovative provider payment method intended to reduce health expenditure and, in turn, patients’ OOP payments. The payment method should encourage hospitals to deliver care more efficiently by reducing unnecessary expenditure. It should also reduce admissions at expensive large tertiary hospitals for uncomplicated health conditions. The project was conducted in rural areas of Guizhou province using a pairwise randomization study design. The primary intervention changed how the local rural health insurance scheme, the New Rural Cooperative Medical Scheme (NCMS), pays county hospitals: from fee-for-service (FFS) to a global budget.
This change was intended to give hospitals three incentives:
- to act as gatekeepers;
- to minimize out-of-county (OOC) referrals for patients they can treat; and
- to reduce unnecessary expenditure on county hospital treatment.

The intervention was implemented in two waves, and its impacts varied. For one set of intervention counties/hospitals, the provider payment intervention reduced OOC admissions. This finding is consistent with the government’s objective of building a “tiered delivery” system in China. In such a system, less complicated cases are shifted from tertiary hospitals to secondary hospitals to improve allocative efficiency. There is also some evidence that hospitals facing a global budget shift costs by prescribing more services and drugs that are not eligible for NCMS reimbursement, which increases OOP spending. This effect is weaker for hospitals for which NCMS payments represent a larger share of revenue. We did not find evidence that the payment intervention led hospitals to improve their managerial capability, as measured by the World Management Survey instrument.

In conclusion, prospective payment methods can provide hospitals with important financial incentives for behavior change. However, these external incentives need to be supported by certain hospital organizational structures, such as:
- autonomy over hiring and firing;
- discretion over savings; and
- aligned physician incentives.

Moreover, payment policy should not be designed as a facility-based intervention. It should take a system-wide perspective that accounts for the interrelationships between different levels of facilities.

Background
Country Context
China is an upper-middle-income country with a gross national income (GNI) per capita of US$8,690 in 2017 (World Bank, Atlas method). It is the second-largest economy and the most populous country in the world. Its total population reached 1.39 billion in 2018, of which 41.5% were rural residents.
China has achieved significant reductions in poverty, especially among the rural population. By 2015, the poverty headcount ratio (percentage of the total population) at the international poverty line had declined to 0.7%, from 31.7% in 2002. Illness has historically been a major contributor to impoverishment in China. In 2016, illness-related impoverishment[1] accounted for about 44% of all cases of poverty (NHFPC 2016).

Recent Health Reforms
In April 2009, the Chinese government launched a national health care reform program with the goal of providing affordable, equitable, and effective health care for all by 2020. Since then, government spending on health care has more than tripled. By 2012, with substantial government subsidies, 97 percent of the Chinese population was covered by one of China's three basic medical insurance schemes:
- the Urban Employee Basic Medical Insurance (UEBMI) for urban employees;
- the Urban Resident Basic Medical Insurance (URBMI) for urban residents; and
- the New Rural Cooperative Medical Scheme (NCMS) for rural residents.

The expansion of insurance coverage and the corresponding increases in reimbursement rates, however, have not kept pace with expenditure escalation. As a result, out-of-pocket (OOP) payments have risen (Li et al. 2012; Meng et al. 2012; Yip et al. 2012).

A key underlying reason is the misaligned incentives faced by health care providers. The market reforms of the late 1970s pared down government subsidies for the health care system. Since then, providers have been permitted to earn revenues (and profits) by charging for health services. Providers are paid by fee-for-service (FFS) according to a fee schedule set by the government. For historical reasons, the fee schedule overpays (prices above cost) for drugs and high-tech diagnostic tests. It underpays (prices below cost) for low-tech and, in particular, labor-intensive services.
These incentives have led providers to over-prescribe drugs and high-tech diagnostics and to under-provide unprofitable services, irrespective of clinical efficacy and need. As a result, both over- and under-provision are common in China. Meanwhile, expenditure continues to grow rapidly, leading to unaffordable health care and impoverishment.[2]

[1] In Chinese: 因病致贫因病返贫 (falling into, or falling back into, poverty because of illness).

In the latest Five-Year Plan (2016-2020), the Chinese government made provider payment reform a top priority. The goals are to improve efficiency and reduce health expenditure growth, so as to ensure access to affordable health care and lower the risk of catastrophic medical expenditure. The plan focuses particularly on public hospitals, which account for over 70 percent of total national health spending. Among public hospitals, county hospitals are a top priority because they form the cornerstone of the rural health delivery system, serving 600 million people nationwide. Specifically, the Five-Year Plan calls for changing the provider payment method (PPM) from FFS to prospective PPMs, such as global budgets, capitation, and case-based payment. Health insurance programs often account for 60-80 percent of hospital revenue. They are required to initiate reforms by designing a new PPM within the next few years.

Project Objectives
In China, the national government sets health policy in broad strokes, while the specific design and implementation are left to local governments. This provides a valuable opportunity to work with local governments to design and implement innovative PPMs at scale, and to evaluate their impacts with rigorous evaluation designs. The findings can feed directly into ongoing policy dialogue to inform China’s health care reform decisions. Beyond contributing to China’s reform, this study is motivated by the paucity of knowledge in the literature on effective policies to improve hospital performance.
Knowledge of how provider payment incentives affect hospital performance is equally scarce. In particular, little is known about how provider payment affects outcomes. Another goal of this project is therefore to unpack the “black box” within the hospital and shed light on why provider payment works, or does not work, in changing hospital behavior.

[2] Over the last two decades, health spending has increased at an average annual rate of 16 percent, exceeding annual GDP growth by about 7 percentage points.

Description of the APPROACH project design
Project Background
The Analysis of Provider Payment Reforms on Advancing China’s Health (APPROACH) project was conducted in rural China, where the relevant administrative unit is the county. Within a county, the delivery system typically has three levels:
- county hospitals at the top;
- township health centers (THCs) at the next level; and
- village clinics at the bottom.

Most counties have two county hospitals, the County People’s Hospital (a general hospital) and the County Traditional Chinese Medicine Hospital (a TCM hospital). Both are mostly secondary hospitals. The County People’s Hospital usually has higher clinical competence. The County TCM hospital specializes more in TCM treatments, such as acupuncture and cupping therapy. However, the two hospitals are often direct competitors because they treat the same conditions. In fact, in more than 80% of cases, TCM hospitals treat conditions with standard Western medicine, as general hospitals do. The main difference is that TCM hospitals provide additional TCM therapies, such as acupuncture or herbal medicine. Other inpatient care facilities in a county may include specialized public health facilities, such as the Maternal and Child Health Center, and private hospitals, which tend to be small in rural areas.
For most public county hospitals, government budgetary subsidies typically make up about 10-20% of total revenue. The main revenue source is the rural health insurance scheme, NCMS, which often accounts for 40-60% of total revenue. Patient OOP payments still make up a sizable share of hospital revenue. This is because NCMS is administered at the county level and constrained by each county’s fiscal capacity, so it typically does not offer a generous benefit package. Reimbursement is restricted to items in three NCMS catalogs:
- the drug catalog;
- the diagnosis and treatment catalog; and
- the medical services and facilities catalog.

Drugs, treatments, and services outside these catalogs are ineligible for reimbursement. NCMS also caps total reimbursement.

There is no gatekeeping in China’s health care delivery system: patients can self-refer to any level of health care facility. Rural patients have less confidence in the technical quality of rural providers than in that of large tertiary hospitals in cities. They therefore often bypass county, township, and village facilities and go directly to city or even provincial hospitals, a tendency reinforced by improved road conditions. In many counties, 15-20% of NCMS enrollees used out-of-county (OOC) facilities. Average expenditure per admission at tertiary hospitals in cities is typically 2 to 5 times that at county hospitals. As a result, these 15-20% of admissions often account for more than half of NCMS spending. This has placed major financial pressure on both the NCMS budget and patients. To discourage patients from bypassing lower-level providers, the government has adopted NCMS designs that align incentives on the demand side. For example, the deductible varies by level of care: the higher the level of care, the larger the deductible.
In addition, the NCMS reimbursement rate for admissions at OOC and higher-level providers is set lower than that for admissions at within-county and lower-level providers.

Design of Provider Payment Intervention
In designing the PPM intervention, our goals were to introduce incentives for county hospitals to:
1. Improve efficiency, and thus reduce expenditure and patient OOP payments;
2. Reduce referrals to OOC hospitals for health conditions that county hospitals are capable of treating; and
3. Improve the quality of care.

Our intervention operates through NCMS’s PPM for inpatient care. The primary intervention applies to public county hospitals (i.e., the County People’s Hospital and the County TCM hospital). It changes their payment method from FFS to a mixed method that combines a disease-based global budget with pay-for-performance (hereafter GB+P4P). Under this method, a prospective budget is calculated for each intervention hospital at the beginning of the year. Thirty percent of the budget is withheld and disbursed annually based on a performance assessment.[3]

[3] We proposed 30% because we had piloted this design in one county in Ningxia. There, this amount appeared necessary to incentivize the hospital to make the changes that would eventually affect outcomes. Moreover, research on P4P in US and UK hospitals has found little impact. Part of the reason is that only a very small percentage of a hospital’s total income is “at risk” under performance assessment (2% in the US; 4% in the UK).

How the global budget is estimated determines the incentives it creates. A typical global budget encourages hospitals to reduce unnecessary care. In addition, we wanted to encourage intervention hospitals to strengthen their clinical capabilities, so that they could attract more patients instead of losing them to OOC hospitals. The budget is calculated as follows:
1. Define a set of diseases, by ICD-9 code (when available) or disease name, that the intervention hospitals have the clinical capability to treat.
2. For this set of diseases, analyze the proportion of admissions at county hospitals versus secondary or tertiary hospitals in cities, for all NCMS enrollees in the previous year.
3. For each health condition d, estimate a condition-specific budget:
$$B_d = N_d^{\text{county}} \times \bar{E}_d^{\text{county}} + \left(\delta\, N_d^{\text{OOC}}\right) \times \left(\rho\, \bar{E}_d^{\text{OOC}}\right)$$
The symbols are defined as follows:
- $N_d^{\text{county}}$ and $N_d^{\text{OOC}}$ are the numbers of admissions for condition d at county hospitals and at OOC hospitals in the previous year.
- $\bar{E}_d^{\text{county}}$ and $\bar{E}_d^{\text{OOC}}$ are the corresponding average expenditures per admission.
- $0 < \delta < 1$ is the share of previously OOC admissions that the intervention hospital plans to capture this year.
- $0 < \rho < 1$ is the proportion of the average expenditure at OOC hospitals that is paid per admission (including a profit margin) for the previously OOC admissions the intervention hospital captures.
For the same health condition, average expenditure per admission at OOC hospitals is higher than at intervention county hospitals. The payment for each captured admission therefore provides a profit margin for every patient the county hospital attracts.
4. Estimate the prospective budget by summing the condition-specific budgets from step 3 across all diseases in the set.

At the end of the year, if the intervention hospital’s actual expenditure exceeds the budget, the hospital bears the cost overrun. If there are savings, the hospital can keep them, provided its volume of services does not fall below the volume pre-agreed in the original contract.

In summary, this global budget aims to cover expenditure on local residents’ inpatient care for a set of diseases that intervention hospitals are capable of treating. It includes a built-in incentive for intervention hospitals to act as gatekeepers and minimize referrals to OOC hospitals for cases within their clinical capabilities.
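The budget rules above can be sketched in code. This is a minimal Python illustration, not the implementation used in APPROACH, and it rests on the following assumptions:
- All class, field, and function names are hypothetical.
- A single `delta` (δ) and `rho` (ρ) apply to every condition; the text does not say whether they vary by condition.
- The year-end function forfeits savings when the volume condition is not met. The text does not specify what happens in that case.

```python
from dataclasses import dataclass


@dataclass
class ConditionBaseline:
    """Previous-year NCMS figures for one health condition (hypothetical field names)."""
    n_county: int           # admissions at county hospitals
    avg_exp_county: float   # average expenditure per admission at county hospitals
    n_ooc: int              # admissions at out-of-county (OOC) hospitals
    avg_exp_ooc: float      # average expenditure per admission at OOC hospitals


def condition_budget(c: ConditionBaseline, delta: float, rho: float) -> float:
    """Step 3: condition-specific budget.

    delta: share of previously OOC admissions the hospital plans to capture.
    rho: share of the average OOC expenditure paid per captured admission.
    """
    if not (0 < delta < 1 and 0 < rho < 1):
        raise ValueError("delta and rho must lie strictly between 0 and 1")
    retained = c.n_county * c.avg_exp_county               # current county caseload
    captured = (delta * c.n_ooc) * (rho * c.avg_exp_ooc)   # OOC cases to be pulled back
    return retained + captured


def global_budget(conditions: list[ConditionBaseline], delta: float, rho: float) -> float:
    """Step 4: sum condition budgets over the disease set (same delta/rho assumed)."""
    return sum(condition_budget(c, delta, rho) for c in conditions)


def split_withhold(budget: float, withhold_share: float = 0.30) -> tuple[float, float]:
    """Split a budget into the prospectively paid part and the P4P withhold."""
    withheld = budget * withhold_share
    return budget - withheld, withheld


def year_end_settlement(budget: float, actual_exp: float,
                        volume: int, agreed_volume: int) -> float:
    """Hospital gain (+) or loss (-) at year end.

    Overruns are borne by the hospital. Savings are kept only if service volume did
    not fall below the contracted volume; returning 0 otherwise is an assumption.
    """
    if actual_exp > budget:
        return -(actual_exp - budget)
    if volume >= agreed_volume:
        return budget - actual_exp
    return 0.0


if __name__ == "__main__":
    # Made-up illustrative inputs, not APPROACH data.
    example = ConditionBaseline(n_county=100, avg_exp_county=3000.0,
                                n_ooc=40, avg_exp_ooc=9000.0)
    b = global_budget([example], delta=0.25, rho=0.5)
    print(b)  # 345000.0
    print(split_withhold(b))
```

The example uses made-up inputs:
- 100 county admissions at 3,000 per admission give 300,000.
- Capturing a quarter of 40 OOC admissions, each paid half of 9,000, adds 10 × 4,500 = 45,000.

The budget is therefore 345,000. Each captured admission is paid 4,500, above the county average of 3,000. That gap is the profit margin described in step 3.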
If an intervention hospital refers patients to OOC hospitals for diseases within its clinical capabilities, the expenditures incurred at these higher-level hospitals are paid from the intervention hospital’s budget. Demand-side incentives were aligned in the same direction. OOC admissions already had a lower reimbursement rate. On top of that, the rate for OOC admissions without a referral is cut to half the rate for OOC admissions with a referral.

In the initial design, 30% of the budget was withheld for quality assessment. Appendix A shows the quality indicators used for the assessment and the baseline results for the intervention counties. However, due to budget limitations and other implementation issues, this component had to be dropped. More details are provided below and in the Discussion section.

Hypothesized Behavioral Response and Consequences
Given the PPM design, we developed two main hypotheses.
Hypothesis 1: Compared to FFS, a disease-based global budget leads to:
a. Lower NCMS-eligible health expenditure per admission at county hospitals;
b. Higher non-NCMS-eligible expenditure at county hospitals (cost shifting); and
c. Unclear net effects on patient OOP payments as a result of (a) and (b).

Note that the portion of the budget that rewards hospitals for retaining patients may lead to more prescriptions of advanced drugs and tests, and to fewer referrals of severe cases to OOC hospitals. The resulting higher service intensity and more severe case mix could partly offset the main effect of the global budget in lowering NCMS-eligible health expenditure.

We also examine whether hospitals with different characteristics respond differently to the PPM intervention.
In particular, we hypothesize that:
• Hypothesis 1a: a county hospital with a higher level of clinical capability is less likely to lower health expenditure, because it is more likely to treat patients with more severe health conditions. It is also more likely to shift costs;
• Hypothesis 1b: a county hospital with a higher share of its total revenue derived from NCMS payments is more likely to reduce expenditure but has less room to shift costs; and
• Hypothesis 1c: a county hospital with better management is more likely to reduce expenditure through efficiency improvements. Its effect on cost shifting is unclear and depends on the hospital’s objectives.

Hypothesis 2: Compared to FFS, a disease-based global budget leads to:
a. A lower probability of patients being admitted to OOC facilities;
b. A higher probability of patients being admitted to county hospitals; and
c. Lower average NCMS-eligible health expenditure across admissions at OOC hospitals, county hospitals, and township health centers.

We also examine whether hospitals with different characteristics respond differently to the PPM intervention. In particular, we hypothesize that:
• Hypothesis 2a: a county hospital with a higher level of clinical capability is more likely to reduce OOC admissions and increase county-level admissions;
• Hypothesis 2b: a county hospital with a higher share of its total revenue derived from NCMS payments is more likely to reduce OOC admissions and increase county-level admissions; and
• Hypothesis 2c: a county hospital with better management is more likely to reduce OOC admissions and increase county-level admissions.

Secondary Hypothesis 3: Compared to FFS, hospitals facing a disease-based global budget are more likely to improve their management and other hospital attributes in order to respond to the incentives more effectively.
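Part (c) of Hypothesis 1 can be made concrete with a simple accounting sketch. It assumes, for illustration only, that:
- the deductible and the reimbursement cap can be ignored;
- the reimbursement rate $r$ applied to NCMS-eligible expenditure $E^{\text{elig}}$ stays fixed; and
- non-NCMS-eligible expenditure $E^{\text{non}}$ is paid entirely out of pocket.

$$\text{OOP} = (1 - r)\,E^{\text{elig}} + E^{\text{non}}, \qquad \Delta\text{OOP} = (1 - r)\,\Delta E^{\text{elig}} + \Delta E^{\text{non}}$$

Part (a) implies $\Delta E^{\text{elig}} < 0$ and part (b) implies $\Delta E^{\text{non}} > 0$. OOP payments therefore fall only if $\Delta E^{\text{non}} < (1 - r)\,\lvert\Delta E^{\text{elig}}\rvert$. Each unit of spending shifted outside the catalogs raises OOP one-for-one. Each unit of eligible spending saved lowers OOP by only $1 - r$. When $r$ is high, even modest cost shifting can offset the savings.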
In the original proposal, we planned to examine the policy’s impact on quality of care, prescriptions, diagnostic tests, and expenditures, as set out in the following research questions.
Primary Research Questions:
1. Does GB+P4P improve quality of care?
2. Does GB+P4P reduce unnecessary drug prescription and use of diagnostic tests?
3. Does GB+P4P reduce expenditure and thus out-of-pocket payment per admission?
Secondary Research Questions:
1. Does GB+P4P lead county hospitals to adopt quality improvement and other management changes?
2. Do GB+P4P effects vary with institutional characteristics, such as organizational capacity, governance structure, and management capacity?
3. Does GB+P4P lead to an increase in charges for services not covered by the public insurance scheme (and therefore not under the control of GB+P4P)?

However, quality assessment (the first primary and first secondary research questions) proved challenging. Diagnoses in the claims data were not readily usable, and manual collection of quality data was prohibitively cumbersome and expensive. Specifically, diagnoses in the claims data were recorded as unstandardized disease names (in Chinese) rather than ICD codes. They therefore required thorough review and re-coding by Chinese clinicians familiar with local clinical practices. The quality indicators in Appendix A use information from medical records. These records had to be extracted from paper forms because local hospitals did not have electronic health record (EHR) systems. The process was as follows. First, local hospitals submitted discharge records to the research team. Second, the research team gave local hospitals a list of admissions for medical review. The admissions were randomly selected from those matching the disease names and ICD codes of the four conditions listed in Appendix A. Third, local hospitals extracted the medical records on the list.
Fourth, a company was hired to scan the extracted medical records and provide digitized medical records data to the research team. Finally, the research team created a database and organized a clinical team in Beijing to conduct the medical review. We could not rely on local clinicians because, in our past experience, review by different clinicians can introduce inconsistencies. The quality data collection involves 160 medical records (40 for each of the four conditions) from 63 hospitals in 32 counties (treatment and control) for at least two years. Beyond the sizable funding and human resources required, this data collection also needed approval from 32 county health bureaus. Approval was difficult to obtain because quality of care was not a priority in China. Assessing the use of drugs and tests (the second primary research question) also proved infeasible, because the claims data did not contain information on drugs and diagnostic tests.

Study Design
Study Sites
We selected Guizhou province as the study site. Guizhou is one of the poorest provinces in China and has a large rural population. We initially proposed to conduct the study in two low-income provinces in western China, Ningxia and Guizhou. However, due to budget constraints, we had to exclude Ningxia, which has a much smaller population than Guizhou. The results from Guizhou may not be generalizable to eastern China. They should, however, be representative of China’s western regions and lower-income areas of the central region.

Guizhou has 88 districts and counties. We first excluded districts/counties where NCMS enrollees make up less than 30% of the local population; these were mostly urban districts. We then excluded eight counties where NCMS is administered by the Department of Human Resources and Social Security rather than the Department of Health.
We further excluded eight more counties that were implementing other NGO-funded projects at the time, which could confound the effects of our intervention. Lastly, we excluded the two counties used to pilot the intervention (details in the Timeline section). The remaining 56 counties were eligible for the project.

Pairwise Randomization
To assign the 56 counties to treatment and control groups, we adopted a matched-pair cluster randomization strategy (Bruhn & McKenzie 2009; Imai, King & Nall 2009). Specifically, we calculated each county’s Mahalanobis distance to the other 55 counties. The distance used the most recently available baseline NCMS data:
- expenditure per admission at county hospitals;
- NCMS reimbursement rate at county hospitals; and
- proportion of admissions at county hospitals.
It also used an indicator for the presence of a pre-existing provider payment reform pilot program.

We then formed 28 pairs of counties using nearest-neighbor matching without replacement, and randomly assigned treatment to one county in each pair. We briefly assessed the randomization using the average expenditure per admission at county hospitals. The within-pair variation (551) was smaller than the between-pair variation (588).

Table 1 shows the resulting 28 pairs of treatment and control counties. Of the 28 treatment counties, 16 completed the intervention. Of the 12 that did not, Shuicheng refused to participate from the beginning, and the other 11 withdrew at various stages of the project:
- Jiangkou, Pan, Xiuwen, and Xifeng dropped out because their municipalities initiated other health care reforms. These reforms required the county hospitals to implement different systems, such as diagnosis-related groups (DRGs) and an integrated delivery system.
- Changshun, Wuchuan, Guiding, Luodian, Jinsha, Meitan, and Bijie were removed from the project because of serious delays in data collection and intervention implementation.
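The pairing procedure described above can be sketched in pure Python. This is an illustrative sketch, not the code used in the study, and it makes these assumptions:
- The Mahalanobis distance uses the sample covariance of the baseline covariates across all eligible counties. The report does not say which covariance matrix was used.
- "Nearest-neighbor matching without replacement" is implemented greedily, by repeatedly pairing the two closest still-unmatched counties. This is only one possible ordering; optimal non-bipartite matching is an alternative.
- Function names and the seed parameter are hypothetical.

```python
import itertools
import random


def covariance(rows: list[list[float]]) -> list[list[float]]:
    """Sample covariance matrix of the baseline covariates (one row per county)."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    return [[sum((r[a] - means[a]) * (r[b] - means[b]) for r in rows) / (n - 1)
             for b in range(k)] for a in range(k)]


def invert(m: list[list[float]]) -> list[list[float]]:
    """Gauss-Jordan inverse with partial pivoting; assumes m is non-singular."""
    k = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(m)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def mahalanobis(x: list[float], y: list[float], inv_cov: list[list[float]]) -> float:
    """Mahalanobis distance between two covariate vectors."""
    d = [a - b for a, b in zip(x, y)]
    k = len(d)
    sq = sum(d[i] * inv_cov[i][j] * d[j] for i in range(k) for j in range(k))
    return max(sq, 0.0) ** 0.5  # guard against tiny negative round-off


def match_and_randomize(covariates: dict[str, list[float]],
                        seed: int = 0) -> list[tuple[str, str]]:
    """Pair counties greedily by Mahalanobis distance, then randomize within pairs.

    Returns (treatment, control) tuples. Assumes an even number of counties.
    """
    names = sorted(covariates)
    inv_cov = invert(covariance([covariates[c] for c in names]))
    dist = {(a, b): mahalanobis(covariates[a], covariates[b], inv_cov)
            for a, b in itertools.combinations(names, 2)}
    unmatched, pairs = set(names), []
    # Greedy nearest-neighbour matching without replacement: closest free pair first.
    for (a, b), _ in sorted(dist.items(), key=lambda kv: kv[1]):
        if a in unmatched and b in unmatched:
            pairs.append((a, b))
            unmatched -= {a, b}
    rng = random.Random(seed)
    # Within each pair, a fair coin decides which county is treated.
    return [(a, b) if rng.random() < 0.5 else (b, a) for a, b in pairs]
```

In the study's setting, each entry of `covariates` would hold the three baseline NCMS indicators and the pilot-program indicator for one of the 56 eligible counties, yielding 28 (treatment, control) pairs.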
Table 1 gives the implementation status of each treatment county. For treatment counties that did not complete the intervention, the entire pair (treatment and control) was removed from the evaluation.

The non-compliance rate was higher in wave 2 than in wave 1. The policy environment changed, and support from the provincial government weakened as a result. Specifically, the central government started to promote DRGs, which shifted the provincial government’s attention away from our GB+P4P intervention. The central government also emphasized integrating NCMS with URBMI. URBMI is administered by a different government department (the Department of Human Resources and Social Security) and at a higher level (the city level). Merging the administrative offices and funding pools left considerable uncertainty about whether NCMS offices at the Department of Health would continue to exist, and in what role. As a result, more counties were reluctant to join in wave 2, and the provincial government made less effort to help enforce the intervention. The original wave 2 counties that did join were therefore likely to be more motivated to adopt our intervention.

Table 1. County pairs and implementation status

Treatment county   Control county   Implementation status
Panel A. Wave 1 (intervention implemented since 2016)
Jiangkou           Yanhe            Withdrew in 2017 because an integrated delivery system reform was implemented at the municipality level
Sinan              Zhengan          Implemented between 2016 and 2018
Pingtang           Yuping           Ongoing implementation since 2016
Zhijin             Tongzi           Ongoing implementation since 2016
Xishui             Kaili            Ongoing implementation since 2016
Jianhe             Congjiang        Implemented between 2016 and 2018
Pingba             Chishui          Ongoing implementation since 2016
Taijiang           Nayong           Ongoing implementation since 2016
Suiyang            Weining          Ongoing implementation since 2016
Shuicheng          Renhuai          Refused to participate in the intervention from the beginning
Pan                Qingzhen         Withdrew in 2018 because a DRG reform was implemented at the municipality level
Cengong            Shibing          Ongoing implementation since 2016
Songtao            Longli           Ongoing implementation since 2016
Panel B. Wave 2 (intervention implemented since 2017)
Changshun          Yuqing           Withdrew due to serious implementation delay
Wuchuan            Zhenyuan         Withdrew due to serious implementation delay
Guiding            Qianxi           Withdrew due to serious implementation delay
Huangping          Hezhang          Ongoing implementation since 2017
Xiuwen             Huishui          Withdrew because another reform was implemented at the municipality level
Yinjiang           Shiqian          Ongoing implementation since 2017
Leishan            Tongren          Ongoing implementation since 2017
Xifeng             Jinping          Withdrew because another reform was implemented at the municipality level
Luodian            Rongjiang        Withdrew due to serious implementation delay
Jinsha             Zhenning         Withdrew due to serious implementation delay
Meitan             Daozhen          Ongoing implementation since 2017
Sansui             Libo             Ongoing implementation since 2017
Bijie              Guanling         Withdrew due to serious implementation delay
Ziyun              Tianzhu          Ongoing implementation since 2017
Danzhai            Fenggang         Ongoing implementation since 2017

Figure 1 shows the geographic location in Guizhou province of the 16 pairs that completed the intervention. Most treatment and control pairs were located far from each other, which may mitigate spillover effects.
Figure 1. Map of treatment and control counties. [Map of Guizhou province showing treatment counties (T), control counties (C), and non-study counties.]
Note: T1-T10 and C1-C10 are wave 1 treatment and control counties; T11-T16 and C11-C16 are wave 2 treatment and control counties.

Sample Selection
We briefly assessed how the 16 pairs that completed the intervention compare with the 12 pairs of non-compliers and with the 18 excluded counties. We used the latest publicly available county statistics before the intervention and the most recent NCMS indicators provided to us by local governments. Results are shown in Table 2.

The remaining 16 pairs were largely comparable to the 12 pairs of non-compliers. Differences in GDP per capita, fiscal expenditure per capita, and total expenditure at county hospitals were marginally significant at the 10% level. However, because of the policy environment changes described above, we expect the remaining wave 2 counties to be a selected group that was more supportive of our intervention. The comparison between remaining and ineligible counties showed no significant differences except in the reimbursement rate at county hospitals.

Table 2. Representativeness of remaining counties

Indicator                                             Remaining   Non-compliant   p-value   Ineligible   p-value
                                                      counties    counties                  counties
GDP per capita in 2015                                25151       31966           0.095     25719        0.824
Fiscal expenditure per capita in 2015                 8837        7948            0.098     8152         0.226
Rural income in 2015                                  7424        7981            0.121     7417         0.985
Share of admissions at county hospitals in 2012 (%)   39.8        40.4            0.868     35.5         0.389
Total expenditure at county hospitals in 2012         2530        2914            0.086     2496         0.872
Reimbursement rate at county hospitals in 2012 (%)    72.5        69.6            0.171     66.3         0.020
N                                                     32          24                        18
Note: The first p-value compares remaining with non-compliant counties; the second compares remaining with ineligible counties.

Data Source
For evaluation, we use the following three sources to collect data on hospitals and NCMS enrollees:
1. NCMS inpatient claims data: administrative data collected from the NCMS office for 2014-2018.
2. APPROACH County Hospital Survey (CHS) questionnaire: a facility-based survey designed and implemented by the research team at both baseline and endline. See Appendix B.
3. World Management Survey (WMS) hospital questionnaire: a facility-based survey adopted from the WMS instrument and implemented by the research team at both baseline and endline. See Appendix C.

Table 3. Data sources

Data                         Level        Frequency                  Description of data
NCMS inpatient claims data   Individual   2014-2018                  Health care provider, date of admission, date of discharge, disease name, disease code, total health expenditure, NCMS-eligible expenditure, NCMS reimbursement, OOP payment
CHS data                     Facility     Twice: baseline, endline   Residual claims, decision rights, market exposure, social function, accountability
WMS data                     Facility     Twice: baseline, endline   WMS scores measuring standardization of operations, performance monitoring, targets, and people management

NCMS Inpatient Claims Data
NCMS inpatient claims data is an administrative dataset maintained by the NCMS office. It records inpatient admissions for all NCMS enrollees registered in the local NCMS database, including admissions at OOC hospitals. Recorded information includes:
- patients’ basic demographics (such as age and gender);
- health care facility name (or category);
- date of admission and date of discharge;
- disease name (and disease code); and
- payments (e.g., total expenditure, expenditure eligible for NCMS reimbursement, NCMS reimbursement, and OOP payment).

We obtained from the NCMS office the inpatient claims data for the treatment and control counties from January 2014 to December 2018. This covers two years before implementation in wave 1 counties (three in wave 2 counties) and three years after (two in wave 2 counties).
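The claims fields listed above map onto the study outcomes roughly as in the sketch below. It is an illustration under stated assumptions:
- The record layout and facility-level codes are hypothetical; the actual NCMS schema is not reproduced here.
- LOS is taken as discharge date minus admission date.
- The consistency check assumes total expenditure equals NCMS reimbursement plus OOP payment. This may not hold if other payers are involved.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Claim:
    """One NCMS inpatient claim. Field names and level codes are hypothetical."""
    facility_level: str   # e.g. "county", "ooc", "thc"
    admission: date
    discharge: date
    total_exp: float      # total health expenditure
    eligible_exp: float   # NCMS-eligible expenditure
    reimbursement: float  # NCMS reimbursement
    oop: float            # out-of-pocket payment as recorded


def non_eligible_exp(c: Claim) -> float:
    """Spending outside the NCMS catalogs (the cost-shifting outcome)."""
    return c.total_exp - c.eligible_exp


def length_of_stay(c: Claim) -> int:
    """LOS in days, taken here as discharge date minus admission date."""
    return (c.discharge - c.admission).days


def oop_is_consistent(c: Claim, tol: float = 0.01) -> bool:
    """Check OOP against total minus reimbursement (assumes no other payers)."""
    return abs(c.total_exp - c.reimbursement - c.oop) <= tol


def share_at_level(claims: list[Claim], level: str) -> float:
    """Share of admissions at one facility level (a patient-flow outcome)."""
    if not claims:
        return 0.0
    return sum(c.facility_level == level for c in claims) / len(claims)
```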
This dataset was used to measure the impact of GB+P4P on patient flow, NCMS-eligible expenditure, non-NCMS-eligible expenditure, OOP payment, and length of stay (LOS).

APPROACH County Hospital Survey
For evaluation purposes, we developed a county hospital survey (CHS) to collect information on hospitals' financing, staffing, ownership, governance, and provider payment method, the market competition they face, and related characteristics (see Appendix B for details). To measure governance, we followed the Preker-Harding model. The model measures a hospital's degree of autonomy along the following five dimensions and categorizes hospitals into budgetary units, autonomous units, corporatized units, and privatized units (in increasing degree of autonomy):
1) Decision rights: the extent of the power transferred from the government to the managers of the hospital. Decision rights may include control over inputs, labor, scope of activities, financial management, clinical management and nonclinical administration, strategic management (formulation of institutional objectives), market strategy, sales, and the production process.
2) Residual claims: the hospital's right to decide how to use "savings", including whether any surplus must be turned over to the state treasury or the local government and, if so, in what proportion; whether the surplus can be used autonomously; and whether the hospital has the right to invest.
3) Market exposure: whether the hospital's revenue comes mainly from government budget allocations or from the market, through delivering services.
4) Accountability: the responsibility hospitals bear for achieving goals, complying with rules and regulations, ensuring the quality of medical services, and so on.
5) Social function: the hospital's role in providing care to vulnerable groups and in responding to public health needs during major natural disasters and other calamities.
We administered the CHS to the intervention hospitals in treatment counties and to their counterpart hospitals in control counties at baseline, and repeated the process at endline.

World Management Survey (WMS) Hospital Questionnaire
To measure organizational and management practices, we adopted the World Management Survey Hospital Questionnaire (see Appendix C). Developed by Bloom and Van Reenen (2010), the WMS has been applied to over 1,000 hospitals in 15 middle-income countries (Bloom et al. 2012) and, most recently, to nearly 600 clinical departments in the US (McConnell et al. 2013). It interviews hospital managers (CEOs, department chiefs, head nurses, etc.) using a double-blind technique: managers are not told they are being scored, and interviewers are not told anything about the organization's performance. Scores are obtained on 20 questions grouped into four main dimensions: 1) standardization of operations (clinical processes, patient flows, patient feedback); 2) performance monitoring practices (monitoring errors and adverse events, continuous quality improvement practices, reviewing performance and communicating it to staff); 3) targets (target clarity, balance, and appropriateness); and 4) people management/employee incentives (rewarding high performers, retaining talent).
Figure 2. World Management Survey Dimensions.
We administered the WMS to the same group of treatment and control hospitals surveyed in the CHS at baseline and endline. At each hospital, we surveyed two managers selected from among department chiefs, head nurses, and similar positions. Their management practices were evaluated and scored from one ("worst practice") to five ("best practice") on a predefined scoring grid. Scores from the 20 questions were then aggregated into four dimensional scores and a total management score.
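The aggregation step can be sketched as follows. This is a minimal illustration, not the WMS scoring code: it assumes that each question score is first averaged over the managers interviewed at a hospital, that each dimensional score is the simple mean of its questions, and that the total management score is the mean of all 20 questions. The question-to-dimension mapping in `DIMENSIONS` is hypothetical and would need to be replaced with the grouping in the WMS hospital questionnaire (Appendix C).

```python
from statistics import mean

# Hypothetical assignment of the 20 WMS question numbers to the four dimensions.
# Replace with the grouping used in the actual WMS hospital questionnaire.
DIMENSIONS = {
    "operations": range(1, 5),    # questions 1-4
    "monitoring": range(5, 10),   # questions 5-9
    "targets": range(10, 15),     # questions 10-14
    "people": range(15, 21),      # questions 15-20
}
N_QUESTIONS = 20


def score_hospital(interviews):
    """Aggregate WMS question scores for one hospital.

    `interviews` is a list with one dict per interviewed manager, mapping the
    question number (1-20) to a score on the 1 ("worst practice") to
    5 ("best practice") grid.
    """
    for answers in interviews:
        if set(answers) != set(range(1, N_QUESTIONS + 1)):
            raise ValueError("each interview must score all 20 questions")
        if any(not 1 <= s <= 5 for s in answers.values()):
            raise ValueError("scores must lie between 1 and 5")

    # Average each question over the managers interviewed at the hospital.
    per_question = {
        q: mean(a[q] for a in interviews) for q in range(1, N_QUESTIONS + 1)
    }

    # Dimensional scores: mean of the questions belonging to each dimension.
    scores = {
        name: mean(per_question[q] for q in qs) for name, qs in DIMENSIONS.items()
    }
    # Total management score: mean over all 20 questions (an assumption).
    scores["overall"] = mean(per_question.values())
    return scores
```

For example, if one manager scores every question 2 and the other scores every question 4, the hospital receives 3 on every dimension and overall. Averaging over all 20 questions weights dimensions by their number of questions; averaging the four dimensional scores instead would weight them equally, and the choice should follow the questionnaire's own convention.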
International Comparison of WMS Score
We compared the WMS baseline data for this study with existing international studies to contextualize the current state of hospital management in Guizhou. As shown in Table 4, the overall management score of the Guizhou sample was 2.43 in 2015, significantly lower than that of most HICs (Canada, Italy, Germany, Sweden, the US, and the UK, surveyed in 2012) but higher than that of the two LMICs (India and Brazil, surveyed in 2009). Across the four dimensions of management practice, hospitals in Guizhou performed better in the operation and monitoring dimensions than in the target and personnel dimensions, a pattern similar to that found in other studies. Guizhou surpassed the two LMICs in all four dimensions but lagged behind the average HIC in the operation, monitoring, and target dimensions (Figure 3). In terms of Guizhou's international ranking, the personnel dimension ranked best (4th), while the target dimension, which also had Guizhou's lowest score (2.27), ranked among the worst (8th). More detailed baseline comparisons can be found in Appendix C.

Table 4. International comparisons of WMS.
Country | Overall | Operation | Monitoring | Target | Personnel | Generalized operations (overall excl. personnel)
US | 3.00 | 3.03 | 3.21 | 2.87 | 2.92 | 3.04
UK | 2.69 | 2.91 | 2.99 | 2.55 | 2.37 | 2.81
SW | 2.68 | 2.52 | 2.99 | 2.75 | 2.46 | 2.77
GE | 2.64 | 2.78 | 2.85 | 2.55 | 2.45 | 2.72
CA | 2.52 | 2.78 | 2.82 | 2.44 | 2.17 | 2.67
IT | 2.48 | 2.85 | 2.67 | 2.33 | 2.20 | 2.60
Guizhou (China) | 2.43 (7) | 2.56 (7) | 2.52 (8) | 2.27 (8) | 2.40 (4) | 2.44 (8)
FR | 2.40 | 2.87 | 2.59 | 2.29 | 2.03 | 2.56
BR | 2.19 | 2.38 | 2.47 | 1.99 | 1.98 | 2.27
IN | 1.90 | 2.11 | 2.03 | 1.55 | 1.93 | 1.88
Note: 1) Guizhou data are converted to comparable scores; 2) Guizhou's rank is in parentheses. SW = Sweden; GE = Germany; CA = Canada; IT = Italy; FR = France; BR = Brazil; IN = India.

Figure 3. International comparisons of WMS.
Note: Guizhou data are converted to comparable scores.

Timeline
Because the local implementers found the GB+P4P design complicated, county government officials requested detailed standard operating procedures (SOPs) and training to support implementation.
After extensive discussion between our project team and the local team, we decided to first conduct a pilot in two non-project counties in 2014-15 to develop detailed SOPs; these two counties were excluded from the subsequent evaluation. After the SOPs were developed, the intervention was rolled out in two waves because of implementation capacity constraints. Counties were randomly assigned to wave 1 or wave 2. As shown in Table 1, 13 counties were assigned to wave 1 and 15 counties to wave 2. Of the 10 wave 1 counties that completed the project, seven started implementation in January 2016 and three started a few months later. Of the six wave 2 counties that completed the project, five started in January 2017 and one started a few months later.
During the pilot, the CHS and WMS questionnaires were tested in July 2014 in Zheng An County, Guizhou Province. Pilot results on the WMS were shared and discussed with Dr. Daniela Scur, a core member of Bloom and Van Reenen's WMS team. The Chinese translations were then revised and back-translated for validation. Formal training of the survey team took place in May 2015 and July 2018, before the baseline and endline surveys, respectively. After pilot testing was completed, we collected baseline WMS and CHS data in 2015 and endline WMS and CHS data in 2018. We received the NCMS inpatient claims data in early 2019.

Methods
Baseline Balance and Pre-intervention Trend
The original pairwise randomization achieved good balance in matching variables between treatment and control counties at baseline. However, this balance could have worsened because 12 pairs withdrew from the study at different stages of the project and another three pairs were excluded from the statistical analysis because of data availability or data quality issues. It was therefore necessary to reassess whether treatment and control counties remained balanced on the matching variables.
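Reassessing balance on a single variable amounts to a two-sample comparison of treatment and control counties. The report does not say which t-test variant was used; the sketch below assumes Welch's unequal-variance test and, to stay within the Python standard library, computes the two-sided p-value by integrating the Student t density numerically (Simpson's rule) rather than calling a statistics package. The function names are illustrative.

```python
import math
from statistics import mean, variance


def t_pdf(x, df):
    """Density of Student's t distribution with `df` degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value from Simpson's rule on the t density over [0, |t|]."""
    b = abs(t)
    if b == 0:
        return 1.0
    h = b / steps  # `steps` must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    central_half = total * h / 3  # P(0 <= T <= |t|)
    return max(0.0, 1.0 - 2.0 * central_half)


def welch_t_test(treated, control):
    """Welch's t statistic, Welch-Satterthwaite df, and two-sided p-value."""
    n1, n2 = len(treated), len(control)
    v1, v2 = variance(treated) / n1, variance(control) / n2
    t = (mean(treated) - mean(control)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df, two_sided_p(t, df)
```

Applied to county-level baseline values of one variable in one wave, this returns a p-value of the kind reported in Table 7. With only about a dozen counties per group, using the exact t reference distribution rather than a normal approximation makes a noticeable difference to the p-value.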
It was also important to assess whether treatment and control counties were balanced on relevant variables for which we had no data at the time of randomization. These include inpatient admission characteristics obtained from claims data, such as non-NCMS-eligible expenditure and LOS, and hospital characteristics obtained from the CHS and WMS, such as the number of medical staff and the quality of management practices. We assessed baseline balance using 2014-2015 data for wave 1 counties and 2014-2016 data for wave 2 counties, using t-tests for significant differences between treatment and control counties.
After examining baseline balance on a variety of inpatient admission and hospital characteristics (presented in Table 7 in the Results section), we decided that a difference-in-differences (DD) approach would be more appropriate. DD models require that outcomes in treatment and control counties would have followed parallel trends in the absence of treatment. Although this parallel trends assumption is untestable, previous studies often assess its plausibility by examining pre-intervention trends in the outcome. We followed this practice and tested for a differential pre-intervention time trend in treatment counties using 2014-2015 data for wave 1 and 2014-2016 data for wave 2. The model is specified as follows:

$$Y_{ihct} = \alpha + \beta T_{ihc} + \gamma \mathit{Year}_{ihct} + \delta (T_{ihc} \times \mathit{Year}_{ihct}) + \theta' X_{ihc} + \mu_{ihc} + \varepsilon_{ihct} \qquad (1)$$

where $Y_{ihct}$ is the outcome of interest for admission i at hospital h in county c in year t; $T_{ihc}$ is a dummy for treatment counties; $\mathit{Year}_{ihct}$ is the year in which admission i is completed; $X_{ihc}$ is a vector of baseline characteristics of hospital h, including the number of NCMS enrollees, total revenue, revenue from inpatient services, number of beds, number of medical equipment items worth more than 10,000 RMB, dummies for management style (decentralization to departments, some decentralization, no decentralization), and dummies for self-perceived degree of competition (fierce, some, none); $\mu_{ihc}$ are pair fixed effects; and $\varepsilon_{ihct}$ is the error term. When $Y_{ihct}$ is defined for all admissions from a county, $X_{ihc}$ takes the general hospital's characteristics, because every county has a general hospital and the general hospital is the single largest provider of inpatient care (see the shares of admissions at different providers in Table 7). Standard errors are clustered at the county level. We estimate Equation (1) with and without controlling for $X_{ihc}$; a significant $\delta$ indicates a differential pre-intervention trend in treatment counties. Results are presented in Table 8 in the Results section.

Model Specifications
For Hypotheses 1 and 2, we first used a standard DD model to examine the average treatment effect, and then included additional interaction terms to allow for heterogeneity in the treatment effect. The basic DD model, Model 1, is specified as follows:

Model 1:
$$Y_{ihct} = \alpha + \beta T_{ihc} + \gamma \mathit{Post}_{ihct} + \delta \mathit{TP}_{ihct} + \theta' X_{ihc} + \mu_{ihc} + \varepsilon_{ihct} \qquad (2)$$

where $Y_{ihct}$ is the outcome of interest for admission i at hospital h in county c in time period t; $T_{ihc}$ is a dummy for treatment counties; $\mathit{Post}_{ihct}$ is a dummy for post-intervention periods; $\mathit{TP}_{ihct} = T_{ihc} \times \mathit{Post}_{ihct}$ is the interaction of the treatment and post-intervention dummies; and $X_{ihc}$ is the vector of baseline hospital characteristics specified in Equation (1). Again, when $Y_{ihct}$ is defined for all admissions from a county, $X_{ihc}$ takes the general hospital's characteristics. Standard errors are clustered at the county level. The coefficient of interest is $\delta$, which measures the average treatment effect of the intervention on the treated.
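To make the role of $\delta$ concrete: without controls or fixed effects, OLS estimation of Equation (2) on a two-group, two-period design reproduces the unadjusted difference-in-differences of cell means, i.e., the change in treatment counties minus the change in control counties. The sketch below computes that quantity from (treatment, post, outcome) records; the record layout and function names are illustrative assumptions. The estimates in this report also include hospital controls and pair fixed effects, so they will generally differ from this unadjusted figure.

```python
from statistics import mean


def cell_mean(records, treat, post):
    """Mean outcome in one treatment-by-period cell.

    `records` is a list of (treat, post, outcome) tuples, with treat and
    post coded as 0/1 dummies.
    """
    values = [y for t, p, y in records if t == treat and p == post]
    if not values:
        raise ValueError(f"no observations for treat={treat}, post={post}")
    return mean(values)


def diff_in_diff(records):
    """Unadjusted DD: (treated post - pre) minus (control post - pre)."""
    records = list(records)
    treated_change = cell_mean(records, 1, 1) - cell_mean(records, 1, 0)
    control_change = cell_mean(records, 0, 1) - cell_mean(records, 0, 0)
    return treated_change - control_change


# Cell means of NCMS-eligible expenditure per admission from Table 9 (wave 1).
wave1 = [(1, 0, 2657), (1, 1, 3000), (0, 0, 3022), (0, 1, 3454)]
print(diff_in_diff(wave1))  # -89, the "Dif in changes" entry in Table 9
```

Repeating the calculation with the wave 2 cell means in Table 9 gives 309. In a saturated version of Model 2 below with no controls, the gap between the two waves (309 - (-89) = 398) would correspond to the wave 2 interaction coefficient.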
As the intervention was rolled out in two waves, intervention hospitals in wave 1 were exposed to GB+P4P earlier than those in wave 2. To examine effect heterogeneity by length of exposure, we estimated a second model that adds a wave 2 dummy and its interactions with the DD terms:

Model 2:
$$Y_{ihct} = \alpha + \beta T_{ihc} + \gamma \mathit{Post}_{ihct} + \delta \mathit{TP}_{ihct} + \alpha_2 \mathit{Wave2}_{ihc} + \beta_2 (T_{ihc} \times \mathit{Wave2}_{ihc}) + \gamma_2 (\mathit{Post}_{ihct} \times \mathit{Wave2}_{ihc}) + \delta_2 (\mathit{TP}_{ihct} \times \mathit{Wave2}_{ihc}) + \theta' X_{ihc} + \mu_{ihc} + \varepsilon_{ihct} \qquad (3)$$

where $\mathit{Wave2}_{ihc}$ is a dummy for wave 2 counties. The coefficients of interest are $\delta$ and $\delta_2$, which measure the average treatment effect for wave 1 treatment counties and the differential treatment effect for wave 2 treatment counties, respectively.

For Hypotheses 1a-c and 2a-c, we used a DD model with heterogeneous effects, Model 3:

Model 3:
$$Y_{ihct} = \alpha + \beta T_{ihc} + \gamma \mathit{Post}_{ihct} + \delta \mathit{TP}_{ihct} + \sum_{k \in \{s,n,m\}} \beta^k (\mathit{TP}_{ihct} \times Z^k_{ihc}) + \sum_{k \in \{s,n,m\}} \gamma^k (\mathit{Post}_{ihct} \times Z^k_{ihc}) + \alpha_2 \mathit{Wave2}_{ihc} + \delta_2 (\mathit{TP}_{ihct} \times \mathit{Wave2}_{ihc}) + \sum_{k \in \{s,n,m\}} \tilde{\beta}^k (\mathit{TP}_{ihct} \times \mathit{Wave2}_{ihc} \times Z^k_{ihc}) + \theta' X_{ihc} + \mu_{ihc} + \varepsilon_{ihct} \qquad (4)$$

where $Z^k_{ihc}$ (k = s, n, m) is a set of three characteristics of hospital h: $Z^s_{ihc}$ is the de-meaned total number of medical staff (in units of 100 persons); $Z^n_{ihc}$ is a dummy for having a large share of revenue from NCMS (i.e., in the highest 30%); and $Z^m_{ihc}$ is the de-meaned WMS management score, where scores range from 1 (worst management quality) to 5 (best management quality). The coefficients of interest include $\delta$, which measures the treatment effect for wave 1 intervention hospitals with an average number of medical staff, an NCMS revenue share in the bottom 70% of the distribution, and an average WMS management score; and $\delta_2$, which measures the differential treatment effect for such intervention hospitals in wave 2. We are also interested in the coefficients $\beta^k$ (k = s, n, m), which capture the heterogeneous response of wave 1 intervention hospitals with different characteristics.
Specifically, for log-transformed outcomes, having 100 more medical staff than the mean is associated with an additional $\beta^s \times 100\%$ change in the outcome in response to the intervention; having a large share of revenue from NCMS (in the highest 30%) is associated with an additional $\beta^n \times 100\%$ change; and scoring 1 point above the mean WMS management score is associated with an additional $\beta^m \times 100\%$ change. The last set of coefficients of interest are $\tilde{\beta}^s$, $\tilde{\beta}^n$, and $\tilde{\beta}^m$, which capture how wave 2 intervention hospitals behave differently from wave 1 intervention hospitals with the same characteristics. Based on our hypotheses, the expected signs of some key coefficients are summarized below.

Table 5. Hypothesized treatment effects.
Outcome variable | Main effect (δ) | Moderating effect: clinical capability (β^s) | Moderating effect: NCMS revenue share (β^n) | Moderating effect: managerial capability (β^m)
NCMS-eligible expenditure per admission at county hospitals | < 0 | > 0 | < 0 | > 0 or < 0
Non-NCMS-eligible expenditure per admission at county hospitals | > 0 | > 0 | < 0 | > 0 or < 0
OOP | > 0 or < 0 | > 0 | < 0 | > 0 or < 0
Length of stay (LOS) | < 0 | > 0 | < 0 | > 0 or < 0
Admission at OOC hospitals | < 0 | < 0 | < 0 | < 0
Admission at county hospitals | > 0 | > 0 | > 0 | > 0
Total county-level expenditure | < 0 | < 0 | < 0 | < 0

For Hypothesis 3, the DD model is specified as follows:

$$Y_{hct} = \alpha + \beta T_{hc} + \gamma \mathit{Post}_{hct} + \delta \mathit{TP}_{hct} + \alpha_2 \mathit{Wave2}_{hc} + \beta_2 (T_{hc} \times \mathit{Wave2}_{hc}) + \gamma_2 (\mathit{Post}_{hct} \times \mathit{Wave2}_{hc}) + \delta_2 (\mathit{TP}_{hct} \times \mathit{Wave2}_{hc}) + \theta' X_{hc} + \mu_c + \varepsilon_{hct} \qquad (5)$$

where $Y_{hct}$ is the outcome of interest (e.g., staffing, financial statements, governance, leadership, and provider payment method) for hospital h in county c in time period t, and $\mathit{Wave2}_{hc}$ is a dummy for wave 2 hospitals; the remaining terms are defined analogously to Equation (3), at the hospital level. The coefficients of interest are $\delta$ and $\delta_2$, which measure the average treatment effect for wave 1 treatment hospitals and the differential treatment effect for wave 2 treatment hospitals, respectively.

Results
Data
The NCMS office provided us with NCMS inpatient claims data for 2014-2018 for the 16 pairs of treatment and control counties, as shown in Table 6.
The dataset included local NCMS enrollees' admissions at both within-county and OOC health care facilities. We excluded pair 6 from the statistical analysis because the control county did not provide pre-intervention data. A closer examination revealed that the data were incomplete for pair 7 in 2014 and for pair 11 in 2016, so we also excluded pairs 7 and 11 because of the lack of reliable pre-intervention data. This left us with 13 county pairs (hereafter the "full sample"). After dropping observations missing key information (e.g., medical expenditure and facility name), we obtained a sample of 8,064,519 inpatient admissions.

Table 6. County pairs/years with claims data (X marks each year of 2014-2018 with data).
Pair | Treatment county | Years with data | Control county | Years with data
Panel A. Wave 1 (intervention implemented since 2016)
1 | Pingtang | X X X X X | Yuping | X X X X X
2 | Zhijin | X X X X X | Tongzi | X X X X X
3 | Xishui | X X X X X | Kaili | X X X X X
4 | Jianhe | X X X X X | Congjiang | X X X X X
5 | Pingba | X X X X X | Chishui | X X X X X
6 | Taijiang | X X X X X | Nayong | X X X
7 | Suiyang | X X X X | Weining | X X X X X
8 | Cengong | X X X X X | Shibin | X X X X X
9 | Songtao | X X X X X | Longli | X X X X X
10 | Sinan | X X X X X | Zhengan | X X X X X
Panel B. Wave 2 (intervention implemented since 2017)
11 | Huangping | X X X X X | Hezhang | X X X
12 | Yinjiang | X X X X X | Shiqian | X X X X X
13 | Leishan | X X X X X | Tongren | X X X X X
14 | Sansui | X X X X X | Libo | X X X X X
15 | Ziyun | X X X X X | Tianzhu | X X X X X
16 | Danzhai | X X X X X | Fenggang | X X X X X
Notes: Data were collected and compiled from the provincial and county-level NCMS claims databases. Pair 6 is dropped because of a lack of pre-intervention data in the control county. Pairs 7 and 11 are dropped because of the low quality of pre-intervention data.

As our intervention was implemented at public county hospitals, the main analysis focused on inpatient admissions at County People's Hospitals and County TCM Hospitals (hereafter the general hospital subsample and the TCM hospital subsample).
Another five pairs were dropped from the TCM subsample analysis because the treatment county (and sometimes also the control county) did not have a TCM hospital. The sample included 1,989,738 inpatient admissions from 13 county pairs for the general hospital subsample and 514,737 inpatient admissions from eight county pairs for the TCM hospital subsample. For Hypothesis 3, we included hospitals surveyed in the CHS and WMS at both baseline and endline and matched them with the claims data, yielding a total sample of 63 hospitals (hereafter the "facility sample").

Power Calculation
An ex-ante power calculation was conducted using the log-transformed average total expenditure at county hospitals in the original 28 pairs of treatment and control counties. The parameters were set as follows: a baseline difference between treatment and control counties of 0.01 with a standard deviation of 0.09, an intra-class correlation of 0.3, and power of 0.8. We calculated the detectable difference at significance levels of 0.05 and 0.1 for 23-28 clusters. Results are shown in Figure 4.
Figure 4. Detectable difference in ln(total expenditure).
Because 12 pairs of non-compliers and three pairs of compliers were dropped from our analysis, we conducted an ex-post power calculation using the log-transformed NCMS-eligible expenditure in the full sample of 13 county pairs. At baseline, the difference between treatment and control counties was 0.16, with a standard deviation of 0.08 and an intra-class correlation of 0.01. Figure 5 plots the detectable difference at significance levels of 0.05 and 0.1 for 8-13 clusters. The detectable difference is substantially larger than in Figure 4.
Figure 5. Detectable difference in ln(NCMS-eligible exp).
We further conducted a power calculation for the WMS score at the county level. The baseline difference between treatment and control counties was 0.13, with a standard deviation of 0.09 and an intra-class correlation of 0.23.
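The report does not state the formula behind Figures 4-6. A common closed-form approximation for a two-arm cluster-randomized design gives the minimum detectable difference as (z_{1-alpha/2} + z_{power}) × sd × sqrt(2 × DEFF / (J × m)), where J is the number of clusters per arm, m is the number of observations per cluster, and DEFF = 1 + (m - 1) × ICC is the design effect. The sketch below implements that approximation with the Python standard library. Treating the cluster counts on the figures' x-axes as clusters per arm, and the cluster size of 50 used in the example, are assumptions; the sketch also ignores the baseline-difference parameter, so it should not be expected to reproduce the figures exactly.

```python
import math
from statistics import NormalDist


def detectable_difference(clusters_per_arm, cluster_size, sd, icc,
                          alpha=0.05, power=0.8):
    """Minimum detectable difference for a two-arm cluster-randomized design.

    Normal approximation with equal allocation:
        MDD = (z_{1-alpha/2} + z_{power}) * sd * sqrt(2 * DEFF / (J * m)),
    where DEFF = 1 + (m - 1) * icc, J = clusters per arm, m = cluster size.
    """
    z = NormalDist()
    multiplier = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    design_effect = 1 + (cluster_size - 1) * icc
    se = sd * math.sqrt(2 * design_effect / (clusters_per_arm * cluster_size))
    return multiplier * se


# Illustrative sweep using the ex-ante SD (0.09) and ICC (0.3) from the text;
# the cluster size of 50 is a hypothetical value, not taken from the report.
for j in range(23, 29):
    d05 = detectable_difference(j, 50, 0.09, 0.3, alpha=0.05)
    d10 = detectable_difference(j, 50, 0.09, 0.3, alpha=0.10)
    print(f"{j} clusters: {d05:.4f} (alpha=0.05), {d10:.4f} (alpha=0.10)")
```

When the ICC is large, as in the ex-ante calculation, (1 + (m - 1) × ICC) / m approaches the ICC as m grows, so precision is governed mainly by the number of counties rather than the number of admissions per county. This is why dropping county pairs raises the detectable difference.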
Figure 6 plots the detectable difference at significance levels of 0.05 and 0.1 for 8-13 clusters.
Figure 6. Detectable difference in WMS score.

Balance at Baseline
We used all three data sources to assess baseline balance in a series of outcomes and hospital characteristics between treatment and control groups. Table 7 presents the means and p-values from a test of equality for treatment and control counties by wave. Panel A presents inpatient admission characteristics obtained from NCMS claims data. Recall that pairs were formed by matching on the proportion of admissions, expenditure per admission, and NCMS reimbursement rate at county hospitals. Despite the removal of 12 pairs, these variables remained well balanced, as did expenditure not covered by NCMS and LOS at major county hospitals, in both wave 1 and wave 2. This suggests that the randomization was properly implemented.

Table 7. Baseline balance.
Variable | Wave 1: Treatment (T) | Wave 1: Control (C) | Wave 1: p-value | Wave 2: Treatment (T) | Wave 2: Control (C) | Wave 2: p-value | N
Panel A: Inpatient admission characteristics
Share of admissions at OOC hospitals | 0.14 | 0.18 | 0.06 | 0.11 | 0.13 | 0.39 | 3173324
Share of admissions at public county hospitals | 0.34 | 0.27 | 0.25 | 0.42 | 0.46 | 0.58 | 3173324
  at general hospitals | 0.25 | 0.23 | 0.63 | 0.31 | 0.28 | 0.56 | 3173324
  at TCM hospitals | 0.07 | 0.04 | 0.09 | 0.11 | 0.15 | 0.54 | 3173324
Share of admissions at specialized/private county hospitals | 0.14 | 0.21 | 0.07 | 0.18 | 0.12 | 0.17 | 3173324
Share of admissions at township health centers | 0.39 | 0.34 | 0.51 | 0.30 | 0.29 | 0.99 | 3173324
NCMS-eligible expenditure | 2657 | 3022 | 0.11 | 2588 | 2805 | 0.40 | 3457571
NCMS-eligible expenditure at public county hospitals | 2913 | 3086 | 0.43 | 2831 | 2935 | 0.75 | 1154237
  at general hospitals | 2950 | 3194 | 0.33 | 2930 | 2943 | 0.97 | 848203
  at TCM hospitals | 2738 | 2400 | 0.09 | 2533 | 2485 | 0.85 | 280923
Non-NCMS-eligible expenditure | 164 | 218 | 0.34 | 162 | 212 | 0.15 | 3457571
Non-NCMS-eligible expenditure at public county hospitals | 224 | 241 | 0.74 | 248 | 281 | 0.66 | 1154237
  at general hospitals | 245 | 260 | 0.78 | 261 | 271 | 0.92 | 848203
  at TCM hospitals | 165 | 120 | 0.48 | 207 | 193 | 0.69 | 280923
NCMS reimbursement rate | 0.66 | 0.66 | 0.91 | 0.66 | 0.61 | 0.10 | 3457571
NCMS reimbursement rate at public county hospitals | 0.63 | 0.62 | 0.77 | 0.64 | 0.59 | 0.16 | 1154237
  at general hospitals | 0.62 | 0.61 | 0.61 | 0.63 | 0.61 | 0.52 | 848203
  at TCM hospitals | 0.65 | 0.70 | 0.33 | 0.65 | 0.58 | 0.00 | 280923
OOP payment | 1072 | 1309 | 0.08 | 1009 | 1270 | 0.14 | 3457571
OOP payment at public county hospitals | 1052 | 1160 | 0.35 | 993 | 1276 | 0.26 | 1154237
  at general hospitals | 1091 | 1230 | 0.29 | 1030 | 1166 | 0.52 | 848203
  at TCM hospitals | 923 | 719 | 0.25 | 882 | 1039 | 0.15 | 280923
LOS | 7.2 | 7.8 | 0.24 | 7.2 | 7.9 | 0.01 | 3152254
LOS at public county hospitals | 7.4 | 7.9 | 0.56 | 7.3 | 7.7 | 0.47 | 1108210
  at general hospitals | 7.4 | 7.8 | 0.67 | 7.2 | 7.3 | 0.86 | 813489
  at TCM hospitals | 7.3 | 8.8 | 0.08 | 7.5 | 7.7 | 0.85 | 223527
Panel B: Hospital characteristics
Total revenue at general hospitals | 11020 | 9996 | 0.69 | 7632 | 9770 | 0.42 | 31
Revenue from NCMS at general hospitals | 4565 | 3814 | 0.57 | 3349 | 3298 | 0.96 | 31
% total revenue from NCMS at general hospitals | 0.37 | 0.37 | 0.97 | 0.42 | 0.36 | 0.28 | 31
Revenue from inpatient services at general hospitals | 6812 | 5950 | 0.61 | 4852 | 6415 | 0.37 | 31
# beds in operation at general hospitals | 501 | 418 | 0.26 | 341 | 450 | 0.12 | 31
# ICU beds at general hospitals | 9 | 8 | 0.54 | 8 | 8 | 0.84 | 30
# medical professionals at general hospitals | 442 | 347 | 0.13 | 262 | 349 | 0.22 | 31
# departments at general hospitals | 21 | 20 | 0.66 | 21 | 23 | 0.75 | 31
# medical equipment worth > 10,000 RMB at general hospitals | 243 | 180 | 0.37 | 197 | 324 | 0.38 | 31
% reporting facing a high level of competition at general hospitals | 0.6 | 0.5 | 0.82 | 0.7 | 0.5 | 0.60 | 31
% reporting no decentralization in hospital management at general hospitals | 0.8 | 0.7 | 0.72 | 0.5 | 0.2 | 0.26 | 31
WMS management score at general hospitals | 2.8 | 2.7 | 0.83 | 2.7 | 2.8 | 0.62 | 31
People score at general hospitals | 2.7 | 2.7 | 0.91 | 2.6 | 2.9 | 0.02 | 31
Targets score at general hospitals | 2.6 | 2.7 | 0.69 | 2.7 | 2.5 | 0.48 | 31
Monitoring score at general hospitals | 2.8 | 2.8 | 0.94 | 2.7 | 2.9 | 0.33 | 31
Operations score at general hospitals | 3.0 | 2.8 | 0.20 | 2.9 | 2.9 | 0.78 | 31
Total revenue at TCM hospitals | 4653 | 1697 | 0.02 | 5004 | 3495 | 0.37 | 26
Revenue from NCMS at TCM hospitals | 1331 | 537 | 0.23 | 1852 | 1658 | 0.86 | 24
% total revenue from NCMS at TCM hospitals | 0.3 | 0.3 | 0.70 | 0.4 | 0.4 | 1.00 | 24
Revenue from inpatient services at TCM hospitals | 2425 | 873 | 0.09 | 2789 | 2106 | 0.58 | 26
# beds in operation at TCM hospitals | 257 | 118 | 0.06 | 256 | 223 | 0.73 | 26
# ICU beds at TCM hospitals | 2.1 | 0.0 | 0.17 | 2.0 | 0.0 | 0.34 | 22
# medical professionals at TCM hospitals | 201.4 | 95.4 | 0.03 | 168.5 | 172.0 | 0.95 | 26
# departments at TCM hospitals | 13 | 10 | 0.27 | 15 | 13 | 0.72 | 26
# medical equipment worth > 10,000 RMB at TCM hospitals | 156 | 35 | 0.02 | 75 | 94 | 0.79 | 26
% reporting facing a high level of competition at TCM hospitals | 0.8 | 0.7 | 0.73 | 0.8 | 1.0 | 0.34 | 26
% reporting no decentralization in hospital management at TCM hospitals | 0.6 | 0.6 | 0.79 | 0.5 | 0.4 | 0.80 | 26
WMS management score at TCM hospitals | 2.9 | 2.3 | 0.01 | 2.8 | 2.5 | 0.39 | 25
People score at TCM hospitals | 2.8 | 2.3 | 0.00 | 2.7 | 2.3 | 0.22 | 25
Targets score at TCM hospitals | 2.8 | 2.1 | 0.03 | 2.7 | 2.5 | 0.48 | 25
Monitoring score at TCM hospitals | 2.9 | 2.4 | 0.04 | 2.8 | 2.7 | 0.67 | 25
Operations score at TCM hospitals | 2.9 | 2.5 | 0.02 | 3.0 | 2.7 | 0.47 | 25
OOC = out-of-county; TCM = Traditional Chinese Medicine; NCMS = New Rural Cooperative Medical Scheme; OOP = out-of-pocket payment; LOS = length of stay; WMS = World Management Survey

However, there were some significant differences at both more aggregated and more disaggregated levels. For example, in wave 1, treatment and control counties differed at the 10% level of significance in the proportion of admissions at OOC hospitals, TCM hospitals, and other county hospitals (i.e., specialized public and private county hospitals). Wave 1 treatment and control counties also differed at the 10% level in NCMS-eligible expenditure at TCM hospitals (with a borderline difference at the county level, p = 0.11), in OOP payments at the county level, and in LOS at TCM hospitals.
Compared with TCM hospitals in control counties, those in treatment counties appear to have been stronger, in the sense that they attracted more patients and had higher expenditures, and more efficient, in the sense that they had shorter LOS. In wave 2, general and TCM hospitals were fairly comparable between treatment and control counties, with significant differences only in LOS at the county level (p = 0.01) and in the NCMS reimbursement rate at TCM hospitals (p < 0.01).
Panel B of Table 7 presents hospital characteristics obtained from the CHS and WMS. Consistent with the patterns in Panel A, general hospitals in wave 1 treatment and control counties did not differ significantly, but TCM hospitals did. Compared with TCM hospitals in control counties, those in treatment counties (1) had much higher total revenue and higher revenue from inpatient services; (2) were better equipped, with significantly more beds, more medical staff, and more medical equipment worth over 10,000 RMB; and (3) were rated higher on the overall WMS management score and on all four of its components. A comparison with TCM hospitals in wave 2 counties suggests that TCM hospitals in wave 1 control counties were distinctly smaller in capacity. In wave 2, there was no significant difference between treatment and control counties except in one component of the WMS score, the people domain. We accounted for the differences between public county hospitals in treatment and control counties by including a variety of hospital characteristics in all our models.

Pre-intervention Trend
Table 8 presents the coefficient estimates of the time trend and its interaction with the treatment dummy from Equation (1), which we use to test the parallel trends assumption, for the full sample and for each wave separately.
Table 8. Pre-intervention trend.
Columns (1)-(2) exclude and columns (3)-(4) include hospital controls; pair fixed effects are included in all columns.
Outcome | (1) Year trend | (2) Treat × Year | (3) Year trend | (4) Treat × Year | N
Panel A. Full sample
ln(NCMS-eligible exp) | 0.057* (0.023) | 0.056 (0.050) | 0.061* (0.016) | 0.037 (0.037) | 3373549
Share of admissions at OOC hospitals | 0.012 (0.007) | 0.001 (0.011) | 0.008 (0.006) | 0.007 (0.009) | 3088464
Share of admissions at general hospitals | 0.031* (0.007) | 0.011 (0.014) | 0.029* (0.006) | 0.014 (0.009) | 3088464
Share of admissions at TCM hospitals | -0.006 (0.008) | 0.030* (0.010) | 0.008 (0.006) | 0.002 (0.007) | 3088464
ln(NCMS-eligible exp) at general hospitals | -0.006 (0.017) | -0.009 (0.023) | -0.012 (0.014) | 0.005 (0.022) | 818764
ln(Non-NCMS-eligible exp) at general hospitals | 0.036 (0.106) | -0.104 (0.153) | -0.014 (0.111) | -0.023 (0.144) | 768731
ln(OOP) at general hospitals | 0.047 (0.046) | -0.041 (0.046) | 0.04 (0.051) | -0.028 (0.053) | 816920
LOS at general hospitals | -0.134 (0.118) | -0.176 (0.356) | -0.185 (0.111) | -0.074 (0.296) | 783953
ln(NCMS-eligible exp) at TCM hospitals | 0.021 (0.018) | -0.01 (0.035) | 0.013 (0.019) | -0.001 (0.034) | 225504
ln(Non-NCMS-eligible exp) at TCM hospitals | -0.149* (0.062) | 0.177 (0.103) | -0.145* (0.063) | 0.162 (0.106) | 205522
ln(OOP) at TCM hospitals | 0.035 (0.054) | 0.033 (0.064) | 0.063 (0.066) | -0.032 (0.073) | 225108
LOS at TCM hospitals | 0.136 (0.103) | -0.263 (0.148) | 0.034 (0.082) | -0.09 (0.137) | 213827
Panel B. Wave 1
ln(NCMS-eligible exp) | 0.092* (0.017) | -0.001 (0.053) | 0.084* (0.014) | 0.012 (0.056) | 1937995
Share of admissions at OOC hospitals | 0.014 (0.010) | -0.005 (0.015) | 0.015 (0.010) | -0.007 (0.015) | 1790756
Share of admissions at general hospitals | 0.047* (0.013) | -0.005 (0.018) | 0.049* (0.013) | -0.006 (0.019) | 1790756
Share of admissions at TCM hospitals | -0.005 (0.008) | 0.019 (0.011) | -0.005 (0.008) | 0.02 (0.010) | 1790756
ln(NCMS-eligible exp) at general hospitals | 0.016 (0.025) | -0.032 (0.039) | 0.02 (0.024) | -0.035 (0.038) | 434503
ln(Non-NCMS-eligible exp) at general hospitals | 0.154 (0.119) | -0.26 (0.238) | 0.076 (0.109) | -0.205 (0.229) | 402316
ln(OOP) at general hospitals | 0.130* (0.044) | -0.085 (0.052) | 0.119* (0.048) | -0.075 (0.055) | 433088
LOS at general hospitals | 0.163 (0.264) | 0.074 (0.262) | 0.186 (0.275) | 0.05 (0.289) | 398696
ln(NCMS-eligible exp) at TCM hospitals | 0.082 (0.069) | -0.137 (0.077) | 0.022 (0.051) | -0.076 (0.060) | 100095
ln(Non-NCMS-eligible exp) at TCM hospitals | -0.125 (0.145) | 0.283 (0.194) | -0.246 (0.182) | 0.401 (0.226) | 82448
ln(OOP) at TCM hospitals | 0.536 (0.303) | -0.457 (0.317) | 0.295 (0.233) | -0.212 (0.243) | 99701
LOS at TCM hospitals | 0.876 (0.753) | -1.019 (0.736) | 0.55 (0.876) | -0.666 (0.883) | 88336
Panel C. Wave 2
ln(NCMS-eligible exp) | 0.054* (0.024) | 0.063 (0.055) | 0.053* (0.020) | 0.047 (0.048) | 1435554
Share of admissions at OOC hospitals | 0.006 (0.007) | 0.017 (0.011) | 0.006 (0.007) | 0.015 (0.011) | 1297708
Share of admissions at general hospitals | 0.022* (0.004) | 0.024* (0.008) | 0.022* (0.004) | 0.021* (0.007) | 1297708
Share of admissions at TCM hospitals | 0.014 (0.008) | -0.007 (0.008) | 0.013 (0.007) | -0.006 (0.008) | 1297708
ln(NCMS-eligible exp) at general hospitals | -0.021 (0.016) | 0.015 (0.025) | -0.021 (0.016) | 0.018 (0.026) | 384261
ln(Non-NCMS-eligible exp) at general hospitals | -0.031 (0.145) | 0.013 (0.173) | -0.025 (0.146) | 0.02 (0.176) | 366415
ln(OOP) at general hospitals | 0.021 (0.065) | -0.036 (0.065) | 0.022 (0.065) | -0.032 (0.067) | 383832
LOS at general hospitals | -0.289* (0.095) | -0.196 (0.416) | -0.303* (0.091) | -0.161 (0.413) | 385257
ln(NCMS-eligible exp) at TCM hospitals | 0.012 (0.023) | 0.031 (0.048) | 0.012 (0.023) | 0.031 (0.048) | 125409
ln(Non-NCMS-eligible exp) at TCM hospitals | -0.135 (0.077) | 0.098 (0.139) | -0.134 (0.077) | 0.099 (0.139) | 123074
ln(OOP) at TCM hospitals | 0.037 (0.065) | -0.031 (0.067) | 0.037 (0.065) | -0.03 (0.067) | 125407
LOS at TCM hospitals | 0 (0.068) | -0.041 (0.173) | -0.002 (0.069) | -0.033 (0.170) | 125491
OOC = out-of-county; TCM = Traditional Chinese Medicine; OOP = out-of-pocket payment; LOS = length of stay
Notes: Robust standard errors in parentheses. Controls include number of NCMS enrollees, total revenue, revenue from inpatient services, number of beds, number of medical equipment items worth more than 10,000 RMB, dummies for management style (decentralization to departments, some decentralization, no decentralization), and dummies for self-perceived degree of competition (fierce, some, none). * p < 0.05

For most outcomes, there was no differential pre-intervention time trend in treatment counties, with or without controlling for hospital characteristics. However, results in column 2 show that, without controlling for hospital characteristics, there was a significant upward trend in the proportion of admissions, and a downward trend in LOS, at TCM hospitals in treatment counties.
Column 4 shows that, after controlling for hospital characteristics, the differential time trends disappeared. This suggests that the differential time trends were mostly attributable to treatment and control counties having different TCM hospitals (as measured by CHS and WMS metrics), rather than to similar TCM hospitals behaving differently in treatment and control counties. We further tested the pre-trend for wave 1 and wave 2 counties separately. We found no differences in the pre-trend between treatment and control counties in wave 1, with or without controlling for hospital characteristics. For wave 2, however, there was a small but significant difference in the pre-trend in the share of admissions at general hospitals between treatment and control counties, both with and without controlling for hospital characteristics. Overall, the results suggest that, after conditioning on hospital characteristics, time trends in our outcomes were largely parallel before the intervention.

Pre-post Comparison of Outcome Variables and Hospital Attributes
We first present a simple comparison of the before-after changes in treatment and control counties before showing the regression estimates. Table 9 presents the descriptive statistics for inpatient admission characteristics. The first five rows of both panels show a consistent overall increase in NCMS-eligible expenditure and in the shares of admissions at OOC and public county hospitals (as opposed to lower-level providers such as THCs) in both treatment and control counties. The increase appears to be slightly larger in control counties in wave 1, and in treatment counties in wave 2.
Table 9. Pre-post comparison of admission-related variables.
Variable | Treatment: Post | Treatment: Pre | Treatment: Dif | Control: Post | Control: Pre | Control: Dif | Dif in changes | N
Panel A. Wave 1
NCMS-eligible expenditure | 3000 | 2657 | 343 | 3454 | 3022 | 432 | -89 | 5363829
Share of admissions at OOC hospitals | 0.14 | 0.14 | 0.00 | 0.20 | 0.18 | 0.01 | -0.01 | 4745101
Share of admissions at public county hospitals | 0.39 | 0.34 | 0.05 | 0.33 | 0.27 | 0.06 | 0 | 4745101
  at general hospitals | 0.30 | 0.26 | 0.04 | 0.29 | 0.23 | 0.05 | -0.01 | 4745101
  at TCM hospitals | 0.09 | 0.07 | 0.01 | 0.04 | 0.04 | 0.00 | 0.01 | 4745101
NCMS-eligible expenditure at general hospitals | 3279 | 2950 | 330 | 3374 | 3194 | 180 | 150 | 1293063
Non-NCMS-eligible expenditure at general hospitals | 273 | 245 | 28 | 250 | 260 | -11 | 38 | 1293063
OOP payment at general hospitals | 1110 | 1091 | 18 | 1048 | 1230 | -182 | 201 | 1293063
LOS at general hospitals | 7.7 | 7.4 | 0.2 | 8 | 7.8 | 0.2 | 0 | 1221602
NCMS-eligible expenditure at TCM hospitals | 3089 | 2738 | 351 | 2749 | 2400 | 350 | 1 | 302453
Non-NCMS-eligible expenditure at TCM hospitals | 193 | 165 | 28 | 110 | 120 | -10 | 39 | 302453
OOP payment at TCM hospitals | 933 | 923 | 10 | 722 | 719 | 3 | 7 | 302453
LOS at TCM hospitals | 7.2 | 7.3 | -0.1 | 9.5 | 8.8 | 0.7 | -0.8 | 265275
Panel B. Wave 2
NCMS-eligible expenditure | 3121 | 2588 | 533 | 3029 | 2805 | 224 | 309 | 2700690
Share of admissions at OOC hospitals | 0.14 | 0.11 | 0.03 | 0.13 | 0.13 | 0.00 | 0.03 | 2234757
Share of admissions at public county hospitals | 0.52 | 0.42 | 0.10 | 0.50 | 0.46 | 0.04 | 0.06 | 2234757
  at general hospitals | 0.38 | 0.31 | 0.07 | 0.30 | 0.28 | 0.01 | 0.05 | 2234757
  at TCM hospitals | 0.14 | 0.11 | 0.04 | 0.17 | 0.15 | 0.02 | 0.01 | 2234757
NCMS-eligible expenditure at general hospitals | 3082 | 2930 | 152 | 3041 | 2943 | 98 | 54 | 696675
Non-NCMS-eligible expenditure at general hospitals | 231 | 261 | -30 | 242 | 271 | -28 | -2 | 696675
OOP payment at general hospitals | 903 | 1030 | -127 | 1039 | 1166 | -127 | 0 | 696675
LOS at general hospitals | 7 | 7.2 | -0.2 | 7.2 | 7.3 | -0.2 | 0 | 699046
NCMS-eligible expenditure at TCM hospitals | 2933 | 2533 | 400 | 2963 | 2485 | 478 | -78 | 316310
Non-NCMS-eligible expenditure at TCM hospitals | 213 | 207 | 6 | 177 | 193 | -16 | 22 | 316310
OOP payment at TCM hospitals | 836 | 882 | -46 | 906 | 1039 | -134 | 88 | 316310
LOS at TCM hospitals | 7.5 | 7.5 | 0 | 7.5 | 7.7 | -0.1 | 0.2 | 227013
OOC = out-of-county; TCM = Traditional Chinese Medicine; OOP = out-of-pocket payment; LOS = length of stay
Table 10 compares the descriptive statistics for hospital characteristics. Both panels first present the change in the overall WMS management score and in each of its four components. Panel A shows that both treatment and control hospitals in wave 1 experienced an increase in the WMS management score after the treatment. However, the increase was larger in control hospitals than in treatment hospitals. Wave 2 results in Panel B show a similar pattern in the management score. The rest of both panels present hospital characteristics that could be affected by the introduction of the PPM, including staffing, beds and equipment, revenue, expenses, and surplus. Post-treatment, hospitals overall had more staff (total and medical), beds (total and ICU), medical equipment (value and number), departments, revenue (total, from outpatient and inpatient services, and from NCMS), and expenses. Treatment hospitals seemed to experience faster growth in beds and equipment than control hospitals in both waves, and faster growth in staffing in wave 2. However, in wave 2, the increase in revenue from inpatient services and from NCMS was smaller in treatment hospitals than in control hospitals. These simple comparisons of inpatient admissions and hospital characteristics do not control for any baseline differences between treatment and control counties. In the following sections, we present a more in-depth DD comparison after controlling for key baseline characteristics.
Table 10. Pre-post comparison of WMS and other hospital attributes.
Variable | Treatment: Post | Treatment: Pre | Treatment: Dif | Control: Post | Control: Pre | Control: Dif | Dif in changes | N
Panel A. Wave 1
WMS overall score | 2.91 | 2.80 | 0.11 | 2.79 | 2.53 | 0.26 | -0.15 | 72
WMS people management/employee incentives | 2.74 | 2.75 | -0.01 | 2.69 | 2.51 | 0.18 | -0.20 | 72
WMS target setting | 2.85 | 2.72 | 0.13 | 2.64 | 2.44 | 0.20 | -0.08 | 72
WMS performance monitoring | 3.07 | 2.83 | 0.24 | 3.00 | 2.58 | 0.42 | -0.19 | 72
WMS standardization of operations | 3.05 | 2.94 | 0.11 | 2.84 | 2.62 | 0.22 | -0.12 | 72
Number of staff on duty | 532 | 395 | 137 | 417 | 279 | 138 | -1 | 73
Number of medical professionals | 449 | 329 | 120 | 358 | 228 | 130 | -10 | 73
Number of newly recruited medical practitioners | 18 | 19 | -2 | 15 | 6 | 8 | -10 | 73
Number of newly recruited assistant medical practitioners | 5 | 6 | -1 | 3 | 4 | -1 | 0 | 67
Number of beds in operation | 480 | 386 | 94 | 356 | 276 | 80 | 14 | 73
Number of ICU beds | 8 | 6 | 2 | 6 | 4 | 2 | 1 | 71
Total value of medical equipment with value greater than 10,000 RMB | 6715 | 3016 | 3699 | 4315 | 1551 | 2764 | 935 | 72
Number of pieces of medical equipment with value greater than 10,000 RMB | 443 | 202 | 241 | 212 | 111 | 101 | 140 | 72
Number of departments | 22 | 17 | 5 | 17 | 15 | 2 | 4 | 73
Total revenue | 14502 | 8024 | 6478 | 10553 | 6065 | 4488 | 1990 | 73
Revenue from outpatient services | 3492 | 1946 | 1546 | 2406 | 1440 | 966 | 580 | 73
Revenue from inpatient services | 7849 | 4747 | 3101 | 5552 | 3545 | 2007 | 1094 | 73
Revenue from insurance | 6028 | 3872 | 2156 | 4467 | 2833 | 1634 | 522 | 71
Revenue from NCMS insurance | 4952 | 3150 | 1802 | 3576 | 2262 | 1314 | 488 | 71
Total expense | 14049 | 7250 | 6799 | 9624 | 6867 | 2757 | 4042 | 73
Total surplus as share of total medical service revenue | 6 | 12 | -6 | 15 | 8 | 7 | -13 | 61
Panel B. Wave 2
WMS overall score | 2.75 | 2.74 | 0.01 | 2.82 | 2.67 | 0.14 | -0.14 | 42
WMS people management/employee incentives | 2.64 | 2.64 | 0.01 | 2.76 | 2.65 | 0.11 | -0.10 | 42
WMS target setting | 2.60 | 2.71 | -0.11 | 2.73 | 2.48 | 0.25 | -0.36 | 42
WMS performance monitoring | 2.89 | 2.75 | 0.13 | 2.94 | 2.82 | 0.12 | 0.01 | 42
WMS standardization of operations | 2.92 | 2.93 | -0.01 | 2.87 | 2.77 | 0.09 | -0.10 | 42
Number of staff on duty | 401 | 278 | 123 | 435 | 331 | 104 | 19 | 42
Number of medical professionals | 340 | 225 | 115 | 373 | 269 | 104 | 11 | 42
Number of newly recruited medical practitioners | 22 | 7 | 14 | 10 | 7 | 4 | 11 | 40
Number of newly recruited assistant medical practitioners | 2 | 5 | -3 | 1 | 16 | -15 | 12 | 38
Number of beds in operation | 416 | 307 | 110 | 455 | 347 | 108 | 2 | 42
Number of ICU beds | 8 | 6 | 2 | 4 | 5 | -1 | 2 | 39
Total value of medical equipment with value greater than 10,000 RMB | 10279 | 2353 | 7926 | 3630 | 2578 | 1052 | 6874 | 40
Number of pieces of medical equipment with value greater than 10,000 RMB | 341 | 148 | 193 | 353 | 219 | 134 | 59 | 39
Number of departments | 21 | 19 | 2 | 21 | 18 | 2 | 0 | 42
Total revenue | 16311 | 6581 | 9730 | 10820 | 6918 | 3903 | 5827 | 42
Revenue from outpatient services | 2317 | 1345 | 972 | 2673 | 1499 | 1173 | -201 | 42
Revenue from inpatient services | 5159 | 4027 | 1133 | 6367 | 4457 | 1911 | -778 | 42
Revenue from insurance | 4134 | 6402 | -2268 | 5133 | 3085 | 2048 | -4316 | 41
Revenue from NCMS insurance | 3872 | 2750 | 1122 | 4358 | 2642 | 1716 | -594 | 40
Total expense | 9751 | 5669 | 4082 | 10314 | 6809 | 3505 | 577 | 42
Total surplus as share of total medical service revenue | 19 | 19 | -1 | 7 | 8 | -1 | 0 | 35
WMS = World Management Survey; NCMS = New Rural Cooperative Medical Scheme; OOP = out-of-pocket payment; ICU = Intensive Care Unit

Hypothesis 1, 1a-1c: Difference-in-differences Results
The results for Equations 2-4 (corresponding to Models 1 and 3) for county general and TCM hospitals are presented in Tables 11 and 12, respectively.
Table 11. Difference-in-differences (DD) estimates for general hospital admissions.
Variable | ln(NCMS-eligible exp) | ln(Non-NCMS-eligible exp) | ln(OOP) | LOS
Model 1
DD | 0.062 (0.036) | 0.105 (0.216) | 0.121 (0.076) | 0.03 (0.543)
Model 2
DD | 0.058 (0.039) | 0.16 (0.279) | 0.14 (0.094) | 0.095 (0.662)
DD x Wave2 | 0.011 (0.036) | -0.177 (0.330) | -0.062 (0.097) | -0.206 (0.831)
Model 3
DD | 0.056 (0.045) | 0.309 (0.388) | 0.179 (0.106) | -0.592 (0.663)
DD x staff (in 100) | 0.019 (0.013) | -0.107 (0.128) | 0.007 (0.034) | 0.265 (0.245)
DD x high NCMS share | -0.077* (0.036) | -0.09 (0.471) | -0.189 (0.150) | 1.823 (1.035)
DD x WMS score | -0.063 (0.067) | 0.045 (0.730) | 0.356 (0.201) | -2.277 (1.258)
DD x Wave2 | 0.063 (0.041) | 0.706 (0.482) | 0.385* (0.097) | 3.295* (1.407)
DD x Wave2 x staff (in 100) | 0.049 (0.025) | 0.958* (0.307) | 0.395* (0.036) | 2.625* (1.215)
DD x Wave2 x high NCMS share | 0.094 (0.048) | -1.330* (0.632) | -0.453* (0.151) | -4.374* (2.085)
DD x Wave2 x WMS score | 0.08 (0.106) | 2.269 (1.311) | -0.772* (0.206) | 3.367 (4.894)
N | 1989738 | 1899585 | 1983206 | 1920648
NCMS = New Rural Cooperative Medical Scheme; WMS = World Management Survey; LOS = length of stay; OOP = out-of-pocket payment
Notes: This admission-level analysis uses 13 pairs of general hospitals. DD refers to the interaction term of the treatment dummy and the post dummy; "staff" refers to the de-meaned number of medical staff (in hundreds); "high NCMS share" refers to being in the top 30% of the distribution of revenue from NCMS. All models control for pair fixed effects and baseline hospital characteristics, including the number of NCMS enrollees, total revenue, revenue from inpatient services, number of beds, number of pieces of medical equipment worth more than 10,000 RMB, dummies for management style (decentralization to departments, some decentralization, no decentralization), and dummies for self-perceived degree of competition (fierce, some, none). Standard errors are clustered at the county level and shown in parentheses.
* p < 0.05
In Table 11, the results for Model 1 suggest that the payment intervention led to a noticeable but statistically insignificant increase in NCMS-eligible total expenditure, non-NCMS-eligible expenditure, and OOP payments, and a negligible change in LOS at general hospitals. These changes were not significantly different between wave 1 and wave 2 hospitals (Model 2). Model 3 examined differential impacts of the payment intervention by hospital characteristics (as laid out in hypotheses 1a-1c). To proxy for a hospital's clinical capability, we used its number of medical staff. To measure NCMS revenue share, we used a dummy variable indicating that the hospital's NCMS revenue share was greater than that of 70% of the hospitals in the sample. To measure managerial capability, we used the WMS management score. Both the medical staff and WMS variables were de-meaned. For NCMS revenue share, using different cutoffs (i.e., 60% and 80%) hardly changed the results (see Appendix D for DD estimates). For wave 1 hospitals, the results showed some sign of differential response by hospital characteristics. For example, hospitals with a high NCMS share significantly reduced NCMS-eligible expenditure, and hospitals with higher WMS management scores saw a sizeable, though insignificant, decrease in LOS. In general, however, responses to the intervention did not seem to vary by hospital attributes among wave 1 general hospitals. In contrast, the results for wave 2 hospitals were more often statistically significant and generally consistent with our hypotheses. For example, hospitals with more medical staff increased non-NCMS-eligible expenditure (an indicator of cost shifting) and, as a result, increased OOP payments and LOS more than hospitals with fewer medical staff.
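Because the Model 3 interaction terms are incremental, the effect for a particular hospital is a sum of coefficients. A minimal sketch, assuming the specification is linear in the interactions as the rows of Table 11 suggest: the wave 1 effect is DD plus each moderator value times its DD interaction, and the wave 2 effect adds the corresponding DD x Wave2 terms. It uses only the ln(Non-NCMS-eligible exp) point estimates from Table 11; standard errors would require the full coefficient covariance matrix, which is not reported. The function and dictionary names are hypothetical.

```python
# Table 11, Model 3, ln(Non-NCMS-eligible exp) column (point estimates only).
COEF = {
    "DD": 0.309,
    "DD x staff": -0.107,
    "DD x high NCMS share": -0.09,
    "DD x WMS score": 0.045,
    "DD x Wave2": 0.706,
    "DD x Wave2 x staff": 0.958,
    "DD x Wave2 x high NCMS share": -1.330,
    "DD x Wave2 x WMS score": 2.269,
}


def conditional_dd(coef, wave2, staff_dm=0.0, high_share=0, wms_dm=0.0):
    """Treatment effect implied by Model 3 at given moderator values.

    staff_dm: de-meaned number of medical staff, in hundreds.
    high_share: 1 if the hospital is in the top 30% of NCMS revenue share.
    wms_dm: de-meaned WMS management score.
    """
    effect = (coef["DD"]
              + staff_dm * coef["DD x staff"]
              + high_share * coef["DD x high NCMS share"]
              + wms_dm * coef["DD x WMS score"])
    if wave2:
        # Wave 2 terms are additional to the wave 1 terms.
        effect += (coef["DD x Wave2"]
                   + staff_dm * coef["DD x Wave2 x staff"]
                   + high_share * coef["DD x Wave2 x high NCMS share"]
                   + wms_dm * coef["DD x Wave2 x WMS score"])
    return effect


for wave2 in (False, True):
    for high in (0, 1):
        est = conditional_dd(COEF, wave2, high_share=high)
        print(f"wave {2 if wave2 else 1}, high NCMS share = {high}: {est:+.3f}")
```

At the mean number of medical staff and the mean WMS score (both zero after de-meaning), this gives 0.309 log points for a low-share hospital in wave 1 and 1.015 in wave 2, compared with 0.219 and -0.405 for a high-share hospital, in line with the smaller cost shifting expected of hospitals with a high NCMS revenue share.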
Hospitals with a high NCMS revenue share, as hypothesized, had less room for cost shifting and therefore increased non-NCMS-eligible expenditure by a smaller amount than hospitals with a low NCMS revenue share. As a net effect, their OOP payments and LOS increased less (or decreased more). However, the results for hospitals with different WMS scores were inconsistent with expectations. Figures 7-9 illustrate the interpretation of the DD estimates by hospital characteristics for both waves. The DD estimates were evaluated at the mean number of medical staff (358 persons), the mean WMS score (2.75), and a low share of revenue from NCMS (among the lowest 70%).
Figure 7. Difference-in-Differences (DD) estimates by clinical capability.
[Figure panels: % change in NCMS-eligible expenditure; % change in non-NCMS-eligible expenditure; % change in OOP expenditure; change in LOS (days). Each panel compares a low and a high share of revenue from NCMS, for wave 1 and wave 2.]
Figure 8. Difference-in-Differences (DD) estimates by NCMS revenue share.
Figure 9. Difference-in-Differences (DD) estimates by managerial capability.
Table 12 presents the results for TCM hospitals.
Table 12. Difference-in-differences (DD) estimates for TCM hospital admissions.
Variable | ln(NCMS-eligible exp) | ln(Non-NCMS-eligible exp) | ln(OOP) | LOS
Model 1
DD | 0.062* (0.021) | 0.043 (0.197) | 0.071 (0.156) | -0.309 (0.337)
Model 2
DD | 0.045* (0.020) | -0.051 (0.238) | 0.095 (0.165) | -0.399 (0.400)
DD x Wave2 | 0.046* (0.020) | 0.232 (0.238) | -0.063 (0.125) | 0.233 (0.282)
Model 3
DD | 0.089* (0.019) | -0.496 (0.328) | -0.119 (0.202) | 0.727 (0.409)
DD x staff (in 100) | -0.041* (0.008) | 0.675 (0.730) | 0.381 (0.199) | -1.701* (0.507)
DD x high NCMS share | 0.043* (0.009) | -0.004 (0.714) | -0.296 (0.183) | 2.054* (0.477)
DD x WMS score | -0.156* (0.025) | 0.443 (0.747) | 0.335 (0.402) | -3.761* (0.760)
DD x Wave2 | -0.012 (0.012) | 0.473 (0.952) | 0.302 (0.290) | -2.719* (0.691)
DD x Wave2 x staff (in 100) | 0.041* (0.011) | -0.476 (0.870) | -0.253 (0.262) | 2.452* (0.629)
DD x Wave2 x high NCMS share | - | - | - | -
DD x Wave2 x WMS score | - | - | - | -
N | 514737 | 489640 | 514093 | 492288
NCMS = New Rural Cooperative Medical Scheme; WMS = World Management Survey; LOS = length of stay; OOP = out-of-pocket payment
Notes: This admission-level analysis uses eight pairs of Traditional Chinese Medicine hospitals. DD refers to the interaction term of the treatment dummy and the post dummy; "staff" refers to the de-meaned number of medical staff (in hundreds); "high NCMS share" refers to being in the top 30% of the distribution of revenue from NCMS. All models control for pair fixed effects and baseline hospital characteristics, including the number of NCMS enrollees, total revenue, revenue from inpatient services, number of beds, number of pieces of medical equipment worth more than 10,000 RMB, dummies for management style (decentralization to departments, some decentralization, no decentralization), and dummies for self-perceived degree of competition (fierce, some, none). Standard errors are clustered at the county level and shown in parentheses.
* p < 0.05
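The expenditure outcomes in Tables 11-13 are in logs, while LOS is measured in days. A coefficient b on a log outcome corresponds to an approximate proportional change of exp(b) - 1, which is close to b itself for small coefficients. A minimal sketch using the Table 12 Model 1 point estimates; the helper name is hypothetical, and the conversion ignores sampling uncertainty in b.

```python
import math


def log_coef_to_pct(b):
    """Approximate % change implied by a coefficient b on a log outcome."""
    return 100.0 * (math.exp(b) - 1.0)


# Point estimates from Table 12, Model 1 (TCM hospitals). LOS is in days,
# so it is reported directly and not converted.
table12_model1 = {
    "ln(NCMS-eligible exp)": 0.062,
    "ln(Non-NCMS-eligible exp)": 0.043,
    "ln(OOP)": 0.071,
}
for outcome, b in table12_model1.items():
    print(f"{outcome}: {b:+.3f} log points ~ {log_coef_to_pct(b):+.1f}%")
```

For example, the Model 1 estimate of 0.062 for NCMS-eligible expenditure at TCM hospitals corresponds to an increase of about 6.4% per admission.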
Model 1 shows that the payment intervention led to a significant increase in NCMS-eligible expenditure per admission, a moderate increase in non-NCMS-eligible expenditure and OOP payments, and a reduction in LOS, although only the increase in NCMS-eligible expenditure was statistically significant. Model 2 shows a similar pattern, and that the increase in NCMS-eligible expenditure was significantly larger in wave 2, again without significantly increasing non-NCMS-eligible expenditure, OOP payments, or LOS. Model 3 further examines effect heterogeneity by hospital characteristics. While wave 1 TCM hospitals in general saw an increase in NCMS-eligible total expenditure, the increase was smaller when the hospital had more medical staff or better management. The sizable, though insignificant, increase in non-NCMS-eligible expenditure and OOP payments suggests that TCM hospitals may have increased their use of uncovered drugs and treatments. The changes in expenditures were accompanied by a significant decrease in LOS. In wave 2, the intervention led to an overall moderate (and statistically insignificant) reduction in NCMS-eligible expenditure, but to an increase at hospitals with more medical staff. This is probably because TCM hospitals with more medical staff were able to attract more complicated patients, thus increasing expenditure in general and, correspondingly, LOS. The interaction terms with NCMS revenue share and WMS score were dropped during estimation due to insufficient variation.

Hypothesis 2, 2a-2c: Difference-in-Differences Results
Table 13 presents the results for hypothesis 2 (and 2a-2c). Model 1 shows that, on average, the intervention had no effect on patient flow. Model 2 shows that, in wave 2 counties, the intervention increased the share of admissions at county hospitals, as hypothesized, but also the share at OOC hospitals, which was inconsistent with expectations. Model 3 further examines heterogeneous effects by hospital characteristics.
For wave 1 counties, the treatment effect did not seem to vary with the clinical competency of county hospitals (as proxied by the number of medical staff at general hospitals). Having better-managed hospitals was associated with a bigger increase in the TCM share of admissions. Overall, the results for wave 1 appear mixed. The results for wave 2 counties seem more consistent with expectations. The payment intervention led to a significant reduction in the share of OOC admissions, with an associated reduction in total health expenditure for NCMS. Counties with greater clinical competency further reduced the share of OOC admissions, whereas counties with a high NCMS share responded to the intervention with an increase in the OOC share, compared with counties with a smaller NCMS share. The results for the admission shares at general and TCM hospitals are not all consistent with expectations. One plausible explanation is that one or both county hospitals responded by attracting more OOC admissions while also shifting some cases down to the next level, the township health centers. There may also be risk selection between the general and TCM hospitals: our discussions and interviews with local providers suggest that TCM hospitals tend to refer sicker patients to county general hospitals.
Table 13. Difference-in-differences (DD) estimates for all admissions.
Variable | ln(NCMS-eligible exp) | Share of admissions at OOC hospitals | Share of admissions at general hospitals | Share of admissions at TCM hospitals
Model 1
DD | 0.074 (0.042) | -0.004 (0.015) | 0.001 (0.016) | 0.000 (0.012)
Model 2
DD | 0.053 (0.040) | -0.015 (0.014) | -0.011 (0.017) | -0.002 (0.013)
DD x Wave 2 | 0.087 (0.077) | 0.049* (0.016) | 0.052* (0.020) | 0.007 (0.013)
Model 3
DD | 0.058 (0.054) | -0.011 (0.015) | 0.000 (0.023) | -0.003 (0.014)
DD x staff (in 100) | 0.002 (0.021) | 0.001 (0.002) | 0.003 (0.008) | -0.006 (0.009)
DD x high NCMS share | -0.075 (0.057) | 0.007 (0.010) | -0.012 (0.026) | -0.043* (0.008)
DD x WMS score | -0.144 (0.095) | -0.007 (0.016) | -0.032 (0.030) | 0.050* (0.023)
DD x Wave2 | -0.575* (0.135) | -0.076* (0.027) | -0.035 (0.032) | -0.013 (0.009)
DD x Wave2 x staff (in 100) | -0.592* (0.138) | -0.092* (0.024) | -0.067* (0.025) | -0.013 (0.012)
DD x Wave2 x high NCMS share | 0.858* (0.185) | 0.142* (0.039) | 0.055 (0.046) | 0.083* (0.007)
DD x Wave2 x WMS score | 0.962* (0.334) | 0.004 (0.107) | -0.064 (0.112) | -
N | 8064519 | 6979858 | 6979858 | 6979858
OOC = out-of-county; TCM = Traditional Chinese Medicine; NCMS = New Rural Cooperative Medical Scheme; WMS = World Management Survey
Notes: This admission-level analysis uses 13 pairs of counties. DD refers to the interaction term of the treatment dummy and the post dummy; "staff" refers to the de-meaned number of medical staff (in hundreds); "high NCMS share" refers to being in the top 30% of the distribution of revenue from NCMS. All models control for pair fixed effects and baseline hospital characteristics, including the number of NCMS enrollees, total revenue, revenue from inpatient services, number of beds, number of pieces of medical equipment worth more than 10,000 RMB, dummies for management style (decentralization to departments, some decentralization, no decentralization), and dummies for self-perceived degree of competition (fierce, some, none). Columns (1)-(3) control for the characteristics of general hospitals.
Column (4) controls for the characteristics of the Traditional Chinese Medicine hospitals. Standard errors are clustered at the county level and shown in parentheses.
* p < 0.05

Hypothesis 3: Hospital governance and management
To further explore the potential mechanisms through which the payment intervention affects individual-level outcomes, we examined the policy impact on hospital-level intermediate outcomes using a DD model (Equation 5). Tables 14 and 15 present the results for county general and TCM hospitals, respectively. The first column (DD) shows the treatment effect for hospitals in wave 1, and the fourth column (DD x wave2) shows the additional effect for hospitals in wave 2 relative to wave 1. The treatment effects were similar between waves 1 and 2, even though hospitals in wave 2 were exposed to the policy one year later. The WMS scores did not appear to be influenced by the payment reform, except that treatment TCM hospitals had a poorer score in people management after the intervention. Most hospital characteristics, such as staffing, beds, and equipment, were not influenced either. However, some items in the hospital financial statements, such as expenses and surplus, did change. The total revenue of TCM hospitals did not change, but their total expense increased after the intervention, which reduced their total surplus as a share of total medical service revenue. There was also an increase in revenue from NCMS insurance for treatment TCM hospitals. The total expense of general hospitals also increased after the intervention, but their surplus as a share of total medical service revenue did not decrease, which suggests that the increase in total expense was not driven by the cost of medical services.
Table 14. Difference-in-differences (DD) estimates for general hospital characteristics.
Variable | DD | S.E. | p value | DD x wave2 | S.E. | p value | N
Panel A. World Management Survey
Overall score | 0.18 | 0.18 | 0.33 | -0.08 | 0.29 | 0.80 | 63
People management/employee incentives | 0.12 | 0.16 | 0.45 | 0.11 | 0.25 | 0.68 | 63
Target setting | 0.30 | 0.27 | 0.28 | -0.56 | 0.44 | 0.21 | 63
Performance monitoring | 0.22 | 0.24 | 0.38 | 0.03 | 0.39 | 0.94 | 63
Standardization of operations | 0.07 | 0.23 | 0.77 | 0.13 | 0.38 | 0.74 | 63
Panel B. County Hospital Survey
Number of staff on duty | -18 | 62 | 0.78 | 15 | 100 | 0.88 | 63
Number of medical professionals | -31 | 48 | 0.53 | -20 | 77 | 0.80 | 63
Number of newly recruited medical practitioners | -25 | 13 | 0.08 | 35 | 22 | 0.12 | 62
Number of newly recruited assistant medical practitioners | -3 | 5 | 0.58 | 3 | 8 | 0.74 | 55
Number of beds in operation | 28 | 66 | 0.67 | -65 | 106 | 0.54 | 63
Number of ICU beds | 1 | 1 | 0.57 | 0 | 2 | 0.88 | 62
Total value of medical equipment with value greater than 10,000 RMB | 947 | 4544 | 0.84 | 8377 | 7227 | 0.26 | 62
Number of pieces of medical equipment with value greater than 10,000 RMB | 242 | 160 | 0.14 | -121 | 263 | 0.65 | 61
Number of departments | 5 | 3 | 0.10 | -7 | 4 | 0.15 | 63
Total revenue | 1936 | 2062 | 0.36 | -2363 | 3325 | 0.48 | 63
Revenue from outpatient services | 770 | 597 | 0.21 | -1522 | 963 | 0.12 | 63
Revenue from inpatient services | 1140 | 1574 | 0.47 | -3314 | 2539 | 0.20 | 63
Revenue from insurance | 565 | 1859 | 0.76 | -2958 | 2949 | 0.32 | 62
Revenue from NCMS insurance | 132 | 1815 | 0.94 | -2248 | 2878 | 0.44 | 62
Total expense | 3851 | 1990 | 0.06 | -4297 | 3209 | 0.19 | 63
Total surplus as share of total medical service revenue | -15 | 10 | 0.17 | 21 | 17 | 0.24 | 50
WMS = World Management Survey; CHS = County Hospital Survey; NCMS = New Rural Cooperative Medical Scheme; ICU = Intensive Care Unit
Note: This facility-level analysis uses Model 6, which includes the interaction of the wave 2 dummy with the DD interaction term.
All analyses control for pair fixed effects and baseline hospital characteristics, namely the number of NCMS enrollees, total revenue, revenue from inpatient services, number of beds, number of pieces of medical equipment worth more than 10,000 RMB, dummies for management style (decentralization to departments, some decentralization, no decentralization), and dummies for self-perceived degree of competition (fierce, some, none).
* p < 0.05
Table 15. Difference-in-differences (DD) estimates for TCM hospital characteristics.
Variable | DD | S.E. | p value | DD x wave2 | S.E. | p value | N
WMS
Overall score | -0.43 | 0.27 | 0.12 | -0.05 | 0.45 | 0.91 | 51
People management/employee incentives | -0.52 | 0.29 | 0.09 | -0.03 | 0.48 | 0.96 | 51
Target setting | -0.38 | 0.31 | 0.24 | -0.15 | 0.52 | 0.77 | 51
Performance monitoring | -0.49 | 0.32 | 0.14 | 0.18 | 0.54 | 0.74 | 51
Standardization of operations | -0.30 | 0.29 | 0.31 | -0.24 | 0.48 | 0.62 | 51
CHS
Number of staff on duty | 24 | 29 | 0.42 | 21 | 49 | 0.67 | 52
Number of medical professionals | 22 | 31 | 0.48 | 73 | 53 | 0.18 | 52
Number of newly recruited medical practitioners | 6 | 6 | 0.27 | 4 | 10 | 0.68 | 51
Number of newly recruited assistant medical practitioners | -5 | 12 | 0.65 | 28 | 21 | 0.19 | 50
Number of beds in operation | 17 | 38 | 0.67 | 37 | 66 | 0.58 | 52
Number of ICU beds | 0 | 2 | 0.97 | 5 | 3 | 0.14 | 48
Total value of medical equipment with value greater than 10,000 RMB | 836 | 783 | 0.30 | 1588 | 1459 | 0.29 | 50
Number of pieces of medical equipment with value greater than 10,000 RMB | 0 | 133 | 1.00 | 34 | 247 | 0.89 | 50
Number of departments | 3 | 2 | 0.27 | 0 | 4 | 0.97 | 52
Total revenue | 1961 | 4915 | 0.69 | 12776 | 8378 | 0.14 | 52
Revenue from outpatient services | 320 | 226 | 0.17 | 117 | 385 | 0.76 | 52
Revenue from inpatient services | 1088 | 629 | 0.10 | 6 | 1072 | 1.00 | 52
Revenue from insurance | 854 | 3565 | 0.81 | -8768 | 6142 | 0.17 | 50
Revenue from NCMS insurance | 1108 | 603 | 0.08 | -128 | 1096 | 0.91 | 48
Total expense | 4151 | 1986 | 0.05 | -2456 | 3386 | 0.48 | 52
Total surplus as share of total medical service revenue | -22 | 9 | 0.03 | 15 | 15 | 0.32 | 46
WMS = World Management Survey; CHS = County Hospital Survey; NCMS = New Rural Cooperative Medical Scheme; ICU = Intensive Care Unit
Note: This facility-level analysis uses Model 6, which includes the interaction of the wave 2 dummy with the DD interaction term. All analyses control for pair fixed effects and baseline hospital characteristics, namely the number of NCMS enrollees, total revenue, revenue from inpatient services, number of beds, number of pieces of medical equipment worth more than 10,000 RMB, dummies for management style (decentralization to departments, some decentralization, no decentralization), and dummies for self-perceived degree of competition (fierce, some, none).
* p < 0.05

Discussions
Overall, the payment intervention had limited and variable impacts. First, we found little impact among wave 1 hospitals and counties. The results among wave 2 counties were more consistent with expectations, though not all were statistically significant. In general, the provider payment intervention led to a reduction in OOC admissions and thus an overall reduction in health expenditure for NCMS. This effect is consistent with the objective of the government's "tiered delivery" policy, namely that less complicated cases should be shifted from tertiary hospitals to secondary hospitals to improve allocative efficiency. Our results among wave 2 counties also showed that the effect was stronger in county hospitals with higher technical capacity. However, we do not find that the reduction in OOC admissions was associated with a significant increase in public county hospital admissions. One plausible explanation is that county hospitals responded by attracting more complicated cases that used to go to OOC facilities, as these cases have a higher profit margin. Meanwhile, under capacity constraints, they shifted less complicated cases to THCs, to other smaller-scale private hospitals, or even to the outpatient sector. This is a likely explanation, as public hospitals in China do not have hiring and firing rights.
As a result, they are limited by their number of medical staff and cannot easily expand supply. These results are also consistent with the finding that the intervention led to higher expenditure per admission at general hospitals in wave 2, especially among hospitals with more advanced clinical competency. The results also showed that these hospitals engaged in cost shifting under a fixed global budget by increasing the prescription of drugs and services not reimbursed by NCMS, thus increasing patients' OOP payments. Such cost shifting was, however, more limited as the share of a hospital's revenue derived from NCMS increased. In the context of China, some WMS dimensions do not vary much across hospitals. We also conducted an exploratory analysis using two WMS dimensions, target setting and performance monitoring (Appendix F), as these are the dimensions with the largest variation and the lowest scores. The results are largely consistent with the previous results, and the estimates are more often statistically significant. Taken together, these results suggest that the disease-based global budget may have the largest hypothesized effects on hospitals with higher clinical competency and a high NCMS revenue share. Referring back to the goals of the initial design of the PPM intervention, this evaluation contributes to the literature on the effectiveness of PPMs in improving efficiency and reducing OOC referrals. It was somewhat puzzling to find differential effects between waves 1 and 2, as their assignment was based on a random process. By the end of the intervention (end of 2018), wave 1 had been exposed to the intervention for more than two years, whereas wave 2 had only been exposed for more than one year.
To explore whether the impact differed by duration of exposure to the intervention, we replaced the post-intervention dummy with a set of post-intervention quarter dummies in Models A1 and A2, keeping the rest of the specifications the same as in Models 1 and 3. The equations for Models A1 and A2 can be found in Appendix E. The results showed that the effects were stronger among wave 2 counties. To explore why the wave 2 effects were stronger than the wave 1 effects, we estimated the effects for year 1 and year 2 of the wave 1 counties separately by adding a set of year-2 interaction dummies, as shown in Tables G1 and G2 in Appendix G. The results in Tables G1 and G2 suggest that year 2 effects for wave 1 counties were generally stronger than year 1 effects. This is plausible, as by year 2 the implementation team and local partners had more experience with how to implement the complex payment interventions effectively. We also estimated the effect for wave 1 in year 1 using wave 1 control counties and all wave 2 counties (both treatment and control) as controls to increase power. However, as shown in Tables G3 and G4, most estimates remained statistically insignificant. We next estimated the effects of wave 2 using only pre-intervention data (which include calendar year 2016, corresponding to program year 1 for wave 1) with Model 1 as a falsification test. The results in Table G5 indicate that there was no effect of wave 2 before the intervention. In addition, because of the broader environmental changes mentioned earlier in explaining the differential non-compliance between waves 1 and 2, we acknowledge that wave 2 counties may have supported the intervention more strongly than wave 1 counties. Furthermore, we found that the payment intervention did not lead to any changes in managerial capability. This quantitative finding is consistent with our field observations and discussions with hospital managers.
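Models A1 and A2 replace the single post-intervention dummy with post-intervention quarter dummies. A minimal sketch of how such indicators could be constructed, assuming quarters are indexed consecutively (for example, 4 x year + quarter) and that each observation's first treated quarter is known, with control counties assigned the start quarter of their pair's wave. The function name, the pooling of late quarters into the last indicator, and the example values are hypothetical choices, not the specification in Appendix E.

```python
import numpy as np


def post_quarter_dummies(quarter, start_quarter, max_q):
    """Indicators for the k-th post-intervention quarter, k = 1..max_q.

    quarter, start_quarter: integer quarter indices per observation, where
    start_quarter is the first treated quarter of the observation's wave.
    Pre-intervention quarters get all zeros (the reference period); quarters
    after max_q are pooled into the last indicator.
    """
    quarter = np.asarray(quarter)
    start_quarter = np.asarray(start_quarter)
    k = quarter - start_quarter + 1  # 1 = first post-intervention quarter
    k = np.where(k > max_q, max_q, k)  # pool long exposure into the last bin
    return np.column_stack([(k == j).astype(int) for j in range(1, max_q + 1)])


# Hypothetical example: one treated and one control county in the same pair,
# observed for 6 quarters, with the intervention starting at quarter index 2.
quarter = np.tile(np.arange(6), 2)
start_quarter = np.full(12, 2)
treat = np.repeat([1, 0], 6)

post_q = post_quarter_dummies(quarter, start_quarter, max_q=3)
# Treatment x post-quarter interactions replace the single DD term.
dd_by_quarter = treat[:, None] * post_q
print(dd_by_quarter)
```

Each column of `dd_by_quarter` would receive its own coefficient, so a pattern of growing coefficients across columns would indicate effects that strengthen with exposure, which is the question the wave 1 and wave 2 comparison raises.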
Challenges
There are a number of challenges surrounding the design, implementation and evaluation of this project. First, the APPROACH project had multiple goals: to reduce health expenditure growth, and thus OOP payments, while improving quality. The design of the provider payment intervention was therefore necessarily complex and predicated on several inter-related mechanisms. Reducing health expenditure growth relied on two pathways: 1) improving allocative efficiency, that is, reducing the use of OOC hospitals for health conditions that can be treated by county hospitals at lower cost; and 2) improving technical efficiency, that is, given the site of care, changing the input mix of production, more specifically by reducing average LOS and unnecessary drug prescriptions and diagnostic tests. Improving allocative efficiency in turn works through the following mechanisms: 1) incentives for county hospitals to develop their clinical competency and quality in order to attract patients to choose them as the first point of care and to be satisfied with their care; 2) incentives for OOC hospitals not to treat patients with simple health conditions and to refer them downward to county hospitals; and 3) incentives for patients to use county hospitals if the monetary cost is lower and quality is higher at county hospitals than at OOC hospitals. The GB+P4P design created incentives for county hospitals to improve their clinical competency and service quality in order to attract patients, and to reduce LOS and unnecessary drugs and diagnostic tests. The insurance benefit package was also redesigned so that deductibles and copayment rates were lower at county hospitals than at OOC hospitals. Furthermore, copayment rates at OOC hospitals were higher when patients were not referred by county hospitals. However, there was a major weakness in the design, namely that OOC hospitals had no incentives to refer patients downward.
If patients do not voluntarily select a county hospital as their first point of care, it is beyond that hospital's ability to prevent them from bypassing it. Second, external incentives, defined by how payers pay providers, must be aligned with the hospitals' other organizational structures to have an effect. These structures include, for example, whether the hospitals have autonomy over hiring and firing decisions and over the use of savings from prospective payments; whether the financial interests of physicians and clinical departments are aligned with the GB+P4P design; and whether hospitals are held accountable for performance measures aligned with the goals of the GB+P4P design. As our monitoring team observed, many county hospitals continued to pay physicians a salary plus a bonus, and the bonus was tied to volume and hospital revenue rather than to cost reduction and quality improvement. As physicians were the key decision makers on treatment modalities, they remained motivated to increase volume and profits. Third, the complexity of the design also made evaluation challenging, as the direction of some hypothesized outcomes was ambiguous. For example, if county hospitals responded to the intervention by improving quality and attracting patients with a worse case mix, expenditure per admission at county hospitals would increase. At the same time, if county hospitals responded by improving technical efficiency, cost per admission would decrease. A priori, it is not possible to predict the net effect. Separating the two would require estimating the impact on case mix; unfortunately, the diagnosis data were incomplete and unusable, precluding the isolation of the two mechanisms. Fourth, in terms of implementation, a major challenge was that the relevant unit of the provider payment intervention was the county as a whole, since NCMS is organized at the county level.
Although we tried to convince NCMS officials to randomize hospitals within a county for the new payment method, our proposal was rejected because managing two types of payment methods would be administratively cumbersome. This meant that achieving statistical power required a large number of counties. Although the Department of Health of Guizhou province agreed to conduct the project in the province, the research team had to work with each treatment county to calculate the global budget, conduct the end-of-year reconciliation, and train its staff. In essence, the project team had to work with 16 counties and their respective hospitals, which limited the number of counties that we could feasibly manage. The overall evaluation was therefore compromised by a small sample size: although the study sample covers 8 million NCMS claims records, the unit of analysis was the county or hospital. Similarly, the project team initially developed a monitoring system that included monthly reports from NCMS showing admissions and expenditures by level of facility, and monthly reports from the intervention hospitals on the number of cases and the types of health conditions that had been referred to OOC hospitals. The research team was to follow up with each of the treatment counties and intervention hospitals monthly to collect these reports. However, the team eventually had to abandon such frequent monitoring due to budgetary and human resource constraints. Fifth, another implementation challenge was the low technical capacity of the implementers, including NCMS managers, hospital directors and managers. Ideally, all relevant government officials and hospital administrators should have a good understanding of the new payment method. However, this was not achieved despite numerous training sessions. In various locations, key local contacts were replaced when they were transferred to other posts.
The short duration of the project also left too little time for local personnel and institutions to build strong buy-in to the new payment method. Sixth, data quality was poor, and the information system was fragmented. Several NCMS claims data systems were used in Guizhou. They were developed by different software companies and did not necessarily use the same format or coding. The research team had to spend a significant amount of time cleaning and recoding the data before estimating the global budget. There were also significant lags in data availability. For example, the previous year's data, even routinely collected data such as claims, usually did not become available for budget calculation until April or even June. Finally, perhaps the most significant challenge was that many control counties were implementing other health care reforms to different degrees during the APPROACH project. Although the baseline outcomes were balanced, there were many confounding factors that we were unable to account for in the analysis. For example, as noted earlier, in the last two years the central government advocated for a DRG-based payment system, and many counties started to experiment with a DRG-based reimbursement model for a small group of diseases and conditions. As the scope of the DRG reform varied across facilities, it was unclear whether this reform affected treatment and control counties equally.
Lessons and Conclusion
Public hospitals are complex organizations. Provider payment policy is an important tool for changing hospital behavior. However, changing provider payment alone is not sufficient to achieve the desired behavioral change in public hospitals. Specifically, external financial incentives need to be aligned with changes in organizational structure.
For example, under a prospective payment system, hospitals need the autonomy to hire and fire staff and to use savings flexibly in order to motivate staff to reduce inefficiency and improve quality. Equally important, hospital directors need the knowledge to adapt the hospitals' management: for example, converting financial management practices developed for an FFS payment system into ones suited to a prospective global budget, and changing internal incentives and performance evaluations for clinical departments and physicians so that they are aligned with the new payment system. This implies that when countries adopt new provider payment methods, they also need to pay attention to reforming the organizational structure and governance of hospitals. Hospital directors also need either training in management or a team of managers to support the changes; in most countries, hospital directors are physicians by training without strong management skills. As populations age and the burden of noncommunicable diseases increases, what China and many other countries need is a delivery system that integrates primary, secondary, and tertiary care. Public hospital reform should therefore be designed and viewed in this context. Policies need to move away from facility-based interventions toward approaches that incorporate the inter-relationships among different levels of facilities. APPROACH is innovative because the GB+P4P design directly tackles the inter-relationship between county hospitals and higher-level OOC hospitals. A number of integrated delivery systems (also known as medical alliances) being rolled out in rural China include only county-level hospitals, THCs, and village clinics, and have not considered how these county-hospital-led networks should relate to and coordinate with the OOC tertiary hospitals. Yet admissions at OOC tertiary hospitals take up the lion's share of the health insurance budget.
Until OOC tertiary hospitals are incentivized to redirect uncomplicated cases back to the county, the long-term impact of county-hospital-led integrated delivery will remain unclear. Similarly, China is currently rolling out DRGs nationwide. However, DRG payment is a facility-based provider payment method developed for acute care episodes. It incentivizes hospitals to increase admissions while offering no incentive for hospitals to strengthen or integrate with primary care. In addition, China and other countries trying to achieve universal health coverage need to pay much more attention to improving the quality of care. As our project shows, China is currently focused on expenditure control, and little attention is paid to the quality of care. Without quality improvement, it would be difficult for county hospitals to expand their scope of services and compete with OOC hospitals for patients. Without quality assessment, a prospective provider payment method may lead to excessive cost control at the expense of the quality of care. At the same time, it is important to start with small steps. For example, payment can be tied to quality through a small set of quality indicators that can be reliably measured with existing data; otherwise, implementation would not be feasible. Over time, as a culture of quality improvement develops and more data become available, more indicators can be added. Standard impact evaluation methods are usually predicated on large sample sizes, simple-to-implement interventions with expected results within a relatively short time frame, and the ability to shield controls from other confounding policy interventions. As policy interventions become more complex, these standard evaluation methods are no longer appropriate. The impact evaluation community needs to make a concerted effort to develop alternative gold standards for evaluating complex interventions.
The new standards would draw on a mix of quantitative and qualitative methods and other theory-based approaches to provide a deeper understanding of the complex underlying mechanisms of change.