AI Safety Tokyo (https://aisafety.tokyo) is a special interest group for AI Safety in Japan. We run reading groups, social events, and generally act to get people in Japan interested in and educated about safety to the point that they could make a career move. I started AI Safety Tokyo with the aim of building a safety community in Tokyo from zero. My highest hopes were to become the AI safety hub for Asia, finding talent and funnelling it to where it can do the most good.
This proposal is for an impact certificate for the activities of the first year of AI Safety Tokyo. I did not seek funding when starting the organization (visa issues, now resolved), instead funding the project out of pocket. I would now like to sell the impact in exchange for cold hard cash.
AI Safety Tokyo’s central activity is a multidisciplinary AI safety study group (benkyoukai). You can find a selection of past topics on our site: https://aisafety.tokyo/benkyoukai. In the first year we held 47 weekly study groups, with between 4 and 13 attendees, averaging 7. We had 52 unique attendees, 31 of whom attended multiple times. Our attendees include professionals and academics from the Tokyo area (Google, Amazon, Rakuten, University of Tokyo, Shibaura IoT, RIKEN), independent safety researchers travelling through Tokyo, etc. We had four guest lecturers: Eliezer Yudkowsky, Colin Rowat, Nicky Pochinkov and Stephen Fowler.
We had three speaking engagements this year:
- TEDxOtemachi: I gave a talk to the general public on the need for caution around large language models.
- Shibaura Institute of Technology: I gave a talk to undergrads on the mathematics behind large language models, touching on safety topics in the process.
- Meritas Asia: I gave a talk to legal professionals on intuitions behind generative AI, what applications are more or less risky, and how to mitigate robustness issues in large language models to use them effectively in your professional life.
We are currently organising (in collaboration with Noeon Research) an international conference on technical AI safety (https://tais2024.cc), to be hosted in Tokyo in April 2024.
AI Safety Tokyo has also had less quantifiable results. We’ve made good impressions on many Japanese academics, particularly in the artificial life community, and built meaningful connections with various Japanese institutions. Safety is starting to be seen as a legitimate research topic, not just something that happens on obscure internet forums.
Pleasingly, AI Safety Tokyo cannot take full credit for the nascent safety scene in Japan. LLMs exploded last year, making safety a global priority. In March 2023, three months after AI Safety Tokyo started, Conjecture and Araya organised the Japan Alignment Conference (JAC 2023). In July 2023, Simon MacGregor and co organised a special session on alignment at ALIFE 2023. One of our members, Masayuki Nagai of EA Japan, hosts an online Japanese-language Alignment 101/201 study group. @bioshok3, the Twitter personality, spoke about Alignment in Japanese at SIG-AGI and other events. We’re happy that other people are also having success reaching the Japanese community.
In the language of CEA, AI Safety Tokyo has converted at least 4 mid-career academics and professionals into HEAs (highly engaged EAs). One is now working closely with ALIGN, a new Japan-native funding organization founded in the wake of JAC 2023; another is investigating a transition into safety research; another is looking for ML engineering positions in the safety teams of large AI labs; and another facilitated a Blue Dot Impact AI Safety Fundamentals Governance course and is starting a new organisation aiming to share safety best practices among AI companies.
I started AI Safety Tokyo with nought but the clothes on my back, a master’s degree in CS/Maths and four years of experience as an ML engineer. I’ve done a lot of science communication; I’ve written essays, spoken to crowds, and appeared on children’s TV. At Five, a European autonomous vehicle startup, I regularly hosted ML reading groups. You can check my CV on LinkedIn for more professional qualifications.
More importantly, when I started AI Safety Tokyo I was, to my knowledge (after some light research), the only person in Japan inclined to work on AI safety. I speak Japanese (JLPT N2), and, crucially, I was physically in Japan and able to knock on people's doors, invite them to lunch, etc.
Also, since I’m applying for retrospective funding: I tried it, and it worked. I must, therefore, have been qualified.
Testimonials from AI Safety Tokyo members:
- “Blaine always turns up well prepared, and combines solid technical understanding with a clear communication style. I believe Blaine’s efforts have set the foundation for efficiently communicating this content to Japanese AI researchers, and building a robust local AI safety community.”
- “Why is Blaine qualified to run AI Safety Tokyo? What a question. If anything, I feel he is grossly overqualified to provide such a great and useful service for free.”
- “Blaine is by far the most knowledgeable person on AI that I have had the pleasure of talking with. He is excellent at disseminating complex information about technical topics, in a way that laypeople can easily understand yet no ML practitioner complains about. I personally wouldn’t have been nearly as engaged nor interested in shifting my career towards AI safety if it wasn’t for his deep expertise and kind guidance.”
- “Blaine has the mathematical maturity to engage with the nuanced details of theory. He can typically meet pointed technical questions on their own turf, and, if not, he has demonstrable ability to autodidact away any holes that show up. He’s legit one of the best cross-community communicators I know. Blaine has also fomented clear trust and identity within the local AIS community. His views and insights matter to people.”
https://www.alignmentforum.org/users/blaine
https://www.linkedin.com/in/paperclipbadger/
TEDxOtemachi Talk: https://www.youtube.com/watch?v=irHTlQ99bVk
https://resources.eagroups.org/how-to-get-funding suggests that OpenPhil funds non-undergraduate university group organizers at $60,000–$95,000 per year. CEA’s CBG program, when it was accepting applications, paid $60,000–$90,000 per year with a minimum of $20,000 for group expenses. I think AI Safety Tokyo was as successful in its first year as the EA university groups I know; that seems to be the going rate.
A widely repeated (but never cited) statistic, supposedly from the National Center for Charitable Statistics, says that 30% of nonprofits fail in their first 10 years. Assuming exponential decay with a constant yearly survival rate, that gives AI Safety Tokyo a 0.7^(1/10) ≈ 96% chance of surviving its first year. Surviving doesn’t mean thriving; I would have given AI Safety Tokyo perhaps a 50% chance of success, where success means getting anyone other than my immediate friends to show up and stick around. In actuality, AI Safety Tokyo was much more successful than that, getting important people in the room and facilitating actual career changes in its membership. Let’s say it had a 20% chance of doing that well, but I’m pulling that number out of thin air and I’m generally poorly calibrated.
Of course, since I’m applying for retrospective funding, AI Safety Tokyo has a 100% chance of having done as well as it actually did in its first year.
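As a sanity check, here is the survival arithmetic above as a short Python sketch. It assumes, as in the text, a constant yearly survival rate; the 50% and 20% figures are subjective guesses and are not derived from it.

```python
# Back-of-the-envelope check of the survival figure quoted above.
# Assumes a constant yearly survival rate, as in the text.
ten_year_survival = 0.70  # "30% of nonprofits fail in their first 10 years"
yearly_survival = ten_year_survival ** (1 / 10)
print(f"Implied chance of surviving year one: {yearly_survival:.1%}")  # ~96.5%
```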
Jason
8 months ago
Shamelessly offering $5 at a sky high valuation, maybe ~ global net worth (but I was too lazy to count the zeros). Given my prediction that few certs will get funded, I seek to maximize the odds of having at least one of mine get bought!
Austin Chen
8 months ago
I'm funding this up to the minimum funding bar, based on:
Having met @luiscostigan and heard about the work of AI Safety Tokyo while visiting earlier this January
The prominence of the TAIS Conference in Tokyo -- two of Manifund's AI Safety regrantors (@DanHendrycks and @RyanKidd) are going, and Scott reposted about it on his most recent open thread; both are strong signals of the conference's value.
Holding regular weekly study sessions might seem like a small thing, but I really respect the dedication it shows!
I'm happy to buy this as a retroactive impact certificate; I don't know if the large retro funders in this round are excited to buy back first-year impact (I hope they will be!), but either way I want to support this work.
Alyssa Riceman
8 months ago
I didn't fund this: the minimum valuation is high enough that, given the risk of the grantmakers passing it up, I don't expect funding it to pay off. But this does strike me as high-value work worth a good amount of retrospective money, and perhaps others will be more optimistic than I am about the grantmakers agreeing with me here.
Jason
8 months ago
How much of the value for AI Safety Tokyo's first year do you think is fairly attributable to your work? (I am assuming this is a retrogrant for your salary, which is fine, but in that case it seems that we should be careful to retrocompensate you only for your own share of the impact.) In theory, anyone who contributed to the impact could post their own certificates for sale, and if we all paid on counterfactual impact, we would end up significantly overpaying for the sum total of impact produced.
Blaine William Rogers
8 months ago
@Jason I guess it's the job of the oracular funder to deduce how much of the impact is attributable to my work? And the job of investors to guess what conclusion the funder will come to and buy shares based on that? This credit assignment problem exists for all impact markets: if a research team funded by an impact market creates a new kind of cheap lab-grown meat, and at the same time another team lobbies for animal rights laws that make traditional meat more expensive, who is responsible for lowering meat consumption? How much of the decrease is attributable to the impact-certificate-funded team?
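To make the worry concrete with a toy example (numbers invented): if both teams were counterfactually necessary, paying each for its counterfactual impact pays twice for the same decrease.

```python
# Toy illustration with invented numbers: paying every contributor their full
# counterfactual impact can sum to more than the impact actually produced.
total_decrease = 10.0  # hypothetical units of meat consumption avoided

# Suppose neither team would have moved the needle alone, so each is
# counterfactually responsible for the whole decrease.
counterfactual_credit = {
    "lab_grown_meat_team": total_decrease,
    "lobbying_team": total_decrease,
}

print(sum(counterfactual_credit.values()))  # 20.0 -- double the real impact
```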
Blaine William Rogers
8 months ago
@Jason Answering more directly: all of the success metrics in the certificate description, except those explicitly called out as not being attributable to AI Safety Tokyo, are fairly attributable to my work (number of sessions run, number of new HEAs, existence of TAIS 2024, etc).
Luis Costigan
8 months ago
@Jason I'd vouch for the validity of Blaine's claim above (speaking as someone who has been involved with AI Safety Tokyo, but with the disclaimer that Blaine is also a friend of mine).
Saul Munn
9 months ago
quick thoughts:
things that make me excited about this:
AI safety tokyo seems like a really great group
you seem like a generally competent organizer/community builder/person/etc
i'm excited about there being an AI safety group in tokyo!
i'm quite interested in supporting retrofunding/impact certs, but i'm wary of supporting this type of retrofunding/impact certs. (here are my thoughts on how impact markets should happen — tldr is "philanthropy gives a prize/outlines their goals; everyone else tries to build stuff to achieve those goals; if they need startup capital, they sell equity [impact certs] in prizes, conditional on them winning.")
i'm especially quite wary of this sort of double retroactive funding — everyone else on this impact market is getting retroactive funding "once," but you'd be getting it "twice." haven't thought about the implications of this enough to be sure it's a good (or bad!) idea.
this seems to obviously be within the range of interest of the LTFF.
i gave $50 to prevent funder chicken, signal interest, and generally get other investors to put their eyes on this grant.
Saul Munn
9 months ago
@saulmunn oh, one other thing — i'm confused why the minimum valuation is at $60k? seems like with different assumptions you could pretty easily justify a much smaller number. i'd have been a lot more keen to fund this at e.g. a $20k valuation, or even a $5k valuation (and would probably have bought more impact equity!)
Blaine William Rogers
9 months ago
@Chris-Lakin I agree that this certificate is not a great fit for the manifund model; I submitted this as an ACX grant and ticked the box because why not. Here, investors are betting purely on whether or not a retroactive funder will buy the certificate from them at a markup, not predicting whether the project will be successful. I guess investors might also be acting as retroactive funders themselves. I tried to minimize the double-funding problem by retaining as much of the equity as Manifund allows (99%). I set the minimum valuation such that the Manifund interface showed the value of the certificate as being equal to the going rate for group organizers given by OpenPhil and CEA, but I don't know much about how Manifund works. Should I instead have released 100% of the shares to the market and set the minimum valuation as low as possible?
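For what it's worth, here is the simple arithmetic I had in mind, under my (possibly mistaken) mental model of Manifund's mechanics: the cash investors can put in is roughly the valuation times the fraction of equity offered.

```python
# Rough sketch of my mental model of the valuation/equity trade-off.
# This may not match Manifund's actual mechanics exactly.
def cash_raised(valuation: float, equity_offered: float) -> float:
    """Cash investors put in if the whole offered stake sells at the given valuation."""
    return valuation * equity_offered

print(cash_raised(60_000, 0.01))  # retain 99%, list at a $60k valuation -> $600 on offer
print(cash_raised(20_000, 0.01))  # Saul's $20k valuation, same 1% offered -> $200
print(cash_raised(5_000, 1.00))   # release all shares at a $5k valuation -> $5,000
```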