Updates
- Ramana Kumar and Scott Garrabrant argue that the AGI safety community should begin prioritizing “approaches that work well in the absence of human models”:
> [T]o the extent that human modelling is a good idea, it is important to do it very well; to the extent that it is a bad idea, it is best to not do it at all. Thus, whether or not to do human modelling at all is a configuration bit that should probably be set early when conceiving of an approach to building safe AGI.
- New research forum posts: Conditional Oracle EDT Equilibria in Games; Non-Consequentialist Cooperation?; When is CDT Dutch-Bookable?; CDT=EDT=UDT
- The MIRI Summer Fellows Program is accepting applications through the end of March! MSFP is a free two-week August retreat co-run by MIRI and CFAR, intended to bring people up to speed on problems related to embedded agency and AI alignment, train research-relevant skills and habits, and investigate open problems in the field.
- MIRI’s Head of Growth, Colm Ó Riain, reviews how our 2018 fundraiser went.
- From Eliezer Yudkowsky: “Along with adversarial resistance and transparency, what I’d term ‘conservatism’, or trying to keep everything as interpolation rather than extrapolation, is one of the few areas modern ML can explore that I see as having potential to carry over directly to serious AGI safety.”
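As a very rough illustration of the "interpolation rather than extrapolation" idea (a toy sketch, not Yudkowsky's proposal or any specific MIRI method), the snippet below abstains from predicting on inputs that sit unusually far from the training data, using a k-nearest-neighbor distance as a crude out-of-distribution proxy. The function names, the quantile threshold, and the assumption of a scikit-learn-style `model.predict` are all illustrative choices:

```python
import numpy as np

def knn_distance(point, data, k=5):
    """Mean distance from `point` to its k nearest neighbors in `data`."""
    dists = np.linalg.norm(data - point, axis=1)
    return np.sort(dists)[:k].mean()

def conservative_predict(model, X_train, x, k=5, quantile=0.95):
    """Answer only when `x` looks like interpolation of the training data.

    The k-NN distance from `x` to X_train is compared with the distribution
    of leave-one-out k-NN distances inside X_train; if `x` is farther out
    than `quantile` of the training points, abstain instead of predicting.
    (Illustrative proxy only; real "conservatism" would need much more.)
    """
    train_dists = np.array([
        knn_distance(X_train[i], np.delete(X_train, i, axis=0), k)
        for i in range(len(X_train))
    ])
    threshold = np.quantile(train_dists, quantile)

    if knn_distance(x, X_train, k) > threshold:
        return None  # looks like extrapolation: refuse to answer
    return model.predict(x.reshape(1, -1))[0]
```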
News and links
- Eric Drexler has released his book-length AI safety proposal: Reframing Superintelligence: Comprehensive AI Services as General Intelligence. See discussion by Peter McCluskey, Richard Ngo, and Rohin Shah.
- Other recent AI alignment posts include Andreas Stuhlmüller’s Factored Cognition, Alex Turner’s Penalizing Impact via Attainable Utility Preservation, and a host of new write-ups by Stuart Armstrong.