STOP COVID NYC Cohort

At the height of the COVID-19 outbreak in NYC, on April 1, 2020, a team of researchers from Psychiatric Genomics and MSCIC launched the STOP COVID NYC survey: a phone-based platform deployed across the city to survey symptoms and factors affecting exposure (e.g., social interactions, household occupancy, job type, pre-existing conditions). The survey collected baseline data on 45,865 New York residents across a variety of measures, with daily follow-up of multiple symptoms and behaviors.


Because only a small fraction of the cohort had been tested for SARS-CoV-2, the team inferred infection status in untested individuals using a Bayesian prediction model that accounted for collider bias. The analysis showed marked differences in infection rates by job type, neighborhood, household occupancy, transport use, and other factors; a downward trend in infection during April, with a slower reduction in specific population strata; a spatial distribution of the outbreak that closely matched that of antibody data; and behavior by NYC residents during lockdown that limited the spread of the virus, alongside work commitments that likely increased it.


To use these data, you must read, agree to, and sign the Data Use Agreement (you must be logged in through the Mount Sinai campus network or secure remote VPN). On Minerva, load the dataark module ($ module load dataark) to see the path variables.
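The names of the path variables are not listed on this page, so one way to find them is to compare the environment before and after loading the module. The sketch below is only an illustration under that assumption: the helper name new_env_vars is hypothetical, and it presumes the module command is available in your shell, as it is on Minerva.

```shell
# Print environment variables that are added or changed by running a command.
# new_env_vars is a hypothetical helper name, not part of the Data Ark tooling.
new_env_vars() {
    before_file=$(mktemp)
    after_file=$(mktemp)
    env | sort > "$before_file"
    # Run the command in the current shell so its exports are visible here.
    # Its own output is sent to stderr to keep the variable listing clean.
    if ! "$@" >&2; then
        echo "command failed: $*" >&2
        rm -f "$before_file" "$after_file"
        return 1
    fi
    env | sort > "$after_file"
    # comm -13 keeps only the lines unique to the second (after) listing.
    comm -13 "$before_file" "$after_file"
    rm -f "$before_file" "$after_file"
}

# Only attempt the load where the module command exists (e.g. on Minerva).
if command -v module >/dev/null 2>&1; then
    new_env_vars module load dataark
else
    echo "module command not found; run this on Minerva" >&2
fi
```

Most module systems also provide "module show dataark", which prints what the module sets without loading it.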

More information

The deidentified baseline data from STOP COVID NYC were uploaded to the Data Ark by Paul O’Reilly, co-lead of the study with Laura Huckins, on 3/2/21.

Data Ark Data Sets

Please visit the Data Ark Data Set webpage to explore other data sets.