Missing Information, Unresponsive Authors, Experimental Flaws: The Impossibility of Assessing the Reproducibility of Previous Human Evaluations in NLP
Belz, A., Thomson, C., Reiter, E., Abercrombie, G., Alonso Moral, J. M., Arvan, M., Cheung, J., Cieliebak, M., Clark, E., van Deemter, K., Dinkar, T., Dušek, O., Eger, S., Fang, Q., Gatt, A., Gkatzia, D., González Corbelle, J., Hovy, D., Hürlimann, M., Ito, T., Kelleher, J. D., Klubicka, F., Lai, H., van der Lee, C., van Miltenburg, E., Li, Y., Mahamood, S., Mieskes, M., Nissim, M., Parde, N., Plátek, O., Rieser, V., Mosteiro Romero, P., Tetreault, J., Toral, A., Wang, X., Wanner, L., Watson, L., Yang, D.
Chapters in Books, Reports and Conference Proceedings: Conference Proceedings
State-of-the-art generalisation research in NLP: a taxonomy and review
Hupkes, D., Giulianelli, M., Dankers, V., Artetxe, M., Elazar, Y., Pimentel, T., Christodoulopoulos, C., Lasri, K., Saphra, N., Sinclair, A., Ulmer, D., Schottmann, F., Batsuren, K., Sun, K., Sinha, K., Khalatbari, L., Ryskina, M., Frieske, R., Cotterell, R., Jin, Z.