AI x crypto

Dan Boneh, Ali Yahya, Sonal Chokshi



with @alive_eth @danboneh @smc90

This week’s all-new episode covers the convergence of two important, very top-of-mind trends: AI (artificial intelligence) & blockchains/crypto. These domains together have major implications for how we all live our lives every day; so this episode is for anyone just curious about, or already building in the space. The conversation covers topics ranging from deep fakes, bots, and the need for proof-of-humanity in a world of AI; to big data, large language models like ChatGPT, user control, governance, privacy and security, zero knowledge and zkML; to MEV, media, art, and much more.

Our expert guests — in conversation with host Sonal Chokshi — include:

  • Dan Boneh, Stanford Professor (and Senior Research Advisor at a16z crypto), a cryptographer who’s been working on blockchains for over a decade and who specializes in cryptography, computer security, and machine learning — all of which intersect in this episode;
  • Ali Yahya, general partner at a16z crypto, who also previously worked at Google — where he not only worked on a distributed system for a fleet of robots (a sort of “collective reinforcement learning”) but also worked on Google Brain, where he was one of the core contributors to the machine learning library TensorFlow built at Google.

The first half of the hallway-style conversation between Ali & Dan (who go back together as student and professor at Stanford) is all about how AI could benefit from crypto, and the second half on how crypto could benefit from AI… the thread throughout is the tension between centralization vs. decentralization. So we also discuss where the intersection of crypto and AI can bring about things that aren’t possible by either one of them alone…


Welcome to “web3 with a16z”, a show about building the next generation of the internet from the team at a16z crypto; that includes me, your host, Sonal Chokshi. Today’s all-new episode covers the convergence of two important, top-of-mind trends: AI (artificial intelligence), and crypto. This has major implications for how we all live our lives every day; so this episode is for anyone just curious about, or already building in the space.

Our special guests today are: Dan Boneh, Stanford Professor (and Senior Research Advisor at a16z crypto) – he’s a cryptographer who’s been working on blockchains for over a decade, and, the topics [in this episode] have a strong intersection between cryptography, computer security, and machine learning, all of which are his areas of expertise… And then we also have Ali Yahya – general partner at a16z crypto who also worked at Google previously, where he not only worked on a distributed system for robotics (more specifically a sort of “collective reinforcement learning” which involved training a single neural network contributing to the actions of an entire fleet of robots), but also worked on Google Brain, where he was one of the core contributors to the machine learning library TensorFlow. And actually: Dan & Ali go back since Ali was an undergrad and masters student at Stanford, so this conversation is really more of a hallway jam between them that I asked to join – and we cover everything from deep fakes and bots to proof-of-humanity in a world of AI and much, much more…

The first half is all about how AI could benefit from crypto, and the second half on how crypto could benefit from AI; and the thread throughout is the tension between centralization vs. decentralization. As a reminder: none of the following should be taken as investment, legal, business, or tax advice; please see for more important information – including a link to a list of our investments – especially since we are investors in companies mentioned in this episode. But first: we begin with how the two worlds intersect – with a quick sharing of areas (or visions!) that they’re excited about; the first voice you’ll hear is Ali’s.

Ali: There is a really good sci-fi novel called “The Diamond Age” by Neal Stephenson, in which there is this device known as the “illustrated primer” that is a kind of artificially intelligent device that acts as your mentor and your teacher throughout life.

And so when you’re born, you’re paired to an AI, essentially, that knows you really well; learns your preferences; follows you throughout life; and helps you make decisions, and steers you in the right direction.

So there’s like a sci-fi future… in which you could build such an AI – but you very much wouldn’t want that AI to be controlled by a monopolistic tech giant in the middle. Because that position would provide that company with a great deal of control, and there’s all these kind of questions of privacy and sovereignty, and you’d want to have kind of control over it.

And then also what if the company goes away or they change the rules or they change the pricing? It would be great if you could build an AI that could run for a very, very long time and could get to know you over the course of a lifetime – but have that really be yours.

And so there is this vision in which you could do that with a blockchain: You could embed an AI within a smart contract.

And with the power of zero knowledge proofs, you could also keep your data private.

And then over the course of decades, this AI can become smarter, and can help you, and then you have the option to do whatever you want with it. Or change it in whichever way you want; or shut it down.

And so that’s kind of an interesting vision for long-running AIs that are continually evolving and continually becoming better: It’d be better if it were the case that they weren’t just controlled by a single centralized company.

Of course: it’s a very science fiction idea, because there are lots of problems – including the problems of verification; and the problems of keeping data private using cryptography and still being able to compute on top of that data, maybe with fully homomorphic encryption. <Sonal: mhm> All of these problems continue to be outstanding, but it’s not something that’s inconceivable.

Dan: Wow; I love Ali’s vision there!

Sonal: I love it too, especially given that quote (I think it was Asimov?) that today’s science fiction is tomorrow’s science fact.

Ali, I know you have a meta-framework for thinking about all this stuff that I’ve heard you share before; can you share that now too?

Ali: Yeah, there is this broader narrative that has existed for quite some time now that’s only becoming much more accentuated now with the development of things like LLMs.

Sonal: Actually define that really quick Ali just for listeners who aren’t already familiar, just as context.

Ali: So LLM stands for “large language model” and it uses some of the technology that was developed at Google back in 2017 – there’s this famous paper known as “Attention is All You Need” (that was the title of the paper) and it outlined what are now known as transformers – and that’s the basis, basically, of some of the new models that people have been training these days, including ChatGPT and so on; all of these are large language models, or LLMs.

<Sonal: mmm!>

There was that famous, I think (2018) line from Peter Thiel that “AI is communist and crypto is libertarian”… That line is like very on point, actually, because AI and crypto in many ways are natural counterweights for one another. <Sonal: uh huh?> And maybe we can go deep over the course of the podcast into each one of these as we go through examples, <Sonal: yah> – but there are four major ways in which that’s true:

[1] The first is that AI is very much a technology that thrives and enables top-down centralized control, whereas crypto is a technology that’s all about bottom-up decentralized cooperation. And in many ways, actually, you can think of crypto as the study of building systems that are decentralized, that enable large-scale cooperation of humans – where there isn’t really any central point of control.

And so that’s one natural way in which these two technologies are counterweights for one another.

[2] Another one is: that AI is a sustaining innovation in that it reinforces the business models of existing technology companies <Sonal: mhm> because it helps them make top-down decisions. And the best example of this would be Google being able to decide exactly what ad to display for each of their users across billions of users and billions of page views.

Whereas crypto is actually a fundamentally disruptive innovation, in that it has a business model that’s fundamentally at odds with the business models of big tech companies. And so as a result, it’s a movement that is spearheaded by rebels/ by the fringes – as opposed to being led by the incumbents. <Sonal: mhm> So that’s the second.

[3] The third one is that AI will probably relate and interplay a lot with all of the trends towards privacy… Because AI as a technology has built in all sorts of incentives that move us towards less individual privacy – because we will have companies that want access to all of our data; and AI models that are trained on more and more data will become more and more effective. And so I think that that leads us down a path of the AI panopticon, where there’s just collective aggregation of everyone’s data into the training of these enormous models in order to make these models as good as possible.

Whereas crypto moves us in the opposite direction, which is a direction of increasing individual privacy. It’s a direction of increasing sovereignty, <Sonal: mhm> where users have control over their own data. And those two trends, I think, will be very important. And this is just another important way in which crypto is the counterweight for AI.

[4] And maybe the final one has to do with this latest trend in AI – the fact that AI is now very clearly a powerful technology for generating new art; it’s now a creative tool <Sonal: mm! totally> – that will lead us to infinite abundance of media, infinite creativity in many ways.

And crypto is a counterweight to that because it helps us cut through all of the abundance, in helping us distinguish what’s created by humans versus what’s created by AI. And cryptography will be an essential part of maintaining and preserving what actually is human in a world where 1000x more of the content <Sonal: right> is actually artificially generated.

So these are all things that we can talk about, but I think that there is this important meta-narrative; and these two technologies are very much diametrically opposing, in many respects.

Dan: So maybe Ali, to add to that – this is a wonderful summary – I would also say that there are a lot of areas where techniques from AI are having an impact in blockchains; and vice versa, <mhm> where techniques from blockchains are having an impact in AI.

I’ll give a brief answer here because we’re going to dive into the details in just a minute; but there are many points of intersection: I guess we’ll talk about applications of zero knowledge for machine learning in just a minute…

But I also want to touch on all these applications where machine learning itself can be used to write code. So for example, machine learning can be used to write Solidity code that goes into a contract. It can be used to find maybe errors in code, and so on.

There are points of intersection where machine learning can be used to generate deep fakes, and blockchains can actually help to protect against deep fakes. And so I guess we’re going to touch on all these points – but, the interesting thing is that there’s really quite a lot of intersection between blockchains and machine learning.

Sonal: Yeah, before we dive into those – one question I have for you Dan, is do you agree with that? I mean I definitely hear Ali’s point that AI and crypto are very natural complements actually, or counterweights really for each other, or they can be different forces that can kind of check and balance each other almost… But, is this an inherent quality to AI and crypto in your opinion? Or is this just an artifact of the way things are done right now; what parts might you agree or disagree with?

Dan: Yeah; so I would say that, if you look at it from far away… the techniques that are used in AI, they seem very different from the techniques that are used in blockchains, right? So blockchains are about cryptography, decentralization, finance, and economics and so on;

whereas AI is, you know, about statistics, the mathematics of machine learning and so on; it’s about big data. The techniques actually look quite different. <Sonal: mhm> But there are actually a lot of places where one side can help the other <mhm> and vice versa.

So maybe the first one to start with is kind of the obvious one that’s been on a lot of people’s minds, which is what’s called the applications of zero knowledge for machine learning; this is kind of an emerging area, it’s called zkML. And the reason this has become interesting is because ZK techniques have improved dramatically – because of their application in blockchains.

What’s happened over the last 10 years is sort of unbelievable; you know, something that we don’t see very often: This idea of zero knowledge proofs, and proof systems in general, they were considered very theoretical a decade ago. And because of all of their applications in blockchains, all of a sudden there was a lot of effort in making them more practical and real world. And as a result, there’s been tremendous progress (as our listeners know), that now these things are actually deployed, and used to protect real systems.

So, the question then is, can zero knowledge techniques be used to help machine learning? And there are a couple of examples – honestly, we could spend the whole podcast just on zkML – but maybe I can just give a taste, one or two examples where ZK is useful for machine learning;

And so imagine Alice has a secret model that she spent a lot of time training and that model is actually very important to her. It’s very important that people don’t know how the model works. But she still wants to be able to service requests from Bob, right: So Bob would like to send her some data, she would apply the model to the data, send the result back to Bob.

Bob has no idea whether he’s getting the correct results on the model, right? Maybe he paid for a certain model, and he wants to make sure that Alice is really using that model, right; maybe he paid for GPT-4 and he wants to make sure Alice is really using GPT-4, and not GPT-3.

Well it turns out ZK techniques can help here a lot. So what Alice would do, she would commit to her model, make the commitment publicly available. And then whenever Bob submits a piece of data, Alice could run the model on that data and send the results back to Bob – along with the proof that the model was evaluated correctly. So Bob now would have a guarantee that in fact the model that was committed to IS the one that was run on Bob’s data. Yeah? So that’s an example where ZK techniques can be useful in the ML case.
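A minimal sketch of the commit-then-serve flow Dan describes, under loud assumptions: the "model" is a toy dot product with a threshold, the commitment is a salted SHA-256 hash, and the check at the end reveals the weights to an auditor. That last step is exactly what a real zkML system would replace with a zero-knowledge proof, so this shows only the shape of the protocol, not its privacy guarantee. All names here (`commit`, `run_model`, `audit`, the salt) are hypothetical.

```python
import hashlib
import json

def commit(weights, salt):
    """Return a salted SHA-256 commitment to the model weights."""
    payload = json.dumps({"weights": weights, "salt": salt}, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

def run_model(weights, x):
    """Toy 'model': a dot product followed by a threshold at zero."""
    score = sum(w * xi for w, xi in zip(weights, x))
    return 1 if score > 0 else 0

# Alice publishes only the commitment, never the weights.
alice_weights = [2, -1, 3]
alice_salt = "hypothetical-random-salt"
public_commitment = commit(alice_weights, alice_salt)

# Bob submits data and receives a result.
bob_input = [1, 4, 1]
result = run_model(alice_weights, bob_input)

def audit(commitment, revealed_weights, revealed_salt, x, claimed_result):
    """Non-ZK stand-in for the proof: an auditor who is shown the weights checks
    that they match the public commitment and reproduce the claimed result."""
    return (commit(revealed_weights, revealed_salt) == commitment
            and run_model(revealed_weights, x) == claimed_result)

print(audit(public_commitment, alice_weights, alice_salt, bob_input, result))
```

If Alice quietly swapped in different weights, the commitment check inside `audit` would fail, which is the property Bob cares about.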

And I want to kind of stress why this is so important: <Sonal: Yah!> So let’s look at one example –

So suppose we have a function, a model that’s actually used to affect people’s lives. Like imagine, you know, maybe we use a model to decide whether we grant a loan, or grant a mortgage; you know a financial institution might want to use a model like that. Well, you want to make sure that the same model is being applied to everyone, right – that it’s not the case that, you know, one model is being applied to me <Sonal: yah> and a different model is being applied to you <Sonal: right>. Well, by basically having the bank commit to the model, right; and then everyone can verify that their data is being assessed by the same committed model <mmm> – we can make sure that the same model is being applied to everyone.

And I have to say that there’s a wonderful open problem here: Which is, that even though zero knowledge techniques can make sure that the same model is being applied to everyone, there is this question you know: Is the model fair? Models can have biases, could lead to unfair results. And so, there’s a whole area of research – it’s called algorithmic fairness; there are many, many papers on algorithmic fairness – and it’s really interesting to ask: Well now that we have a committed model, can we prove in zero knowledge that the model satisfies some fairness definition from the area of algorithmic fairness.

And, like how do we make sure that the training process ran correctly.

Ali: Well, everything that you said about ZKML is extremely exciting. And as a technology I think it’ll play a role in making machine learning, and AI sort of generally, more transparent and more trustworthy – both within the context of crypto and outside of it.

I think an even crazier, and maybe longer-term, and more ambitious, application of ZKML – and some of the other verification techniques that the crypto community has been working on <mhm> – is just generally decentralizing AI. Because as we were talking about before, AI is a technology that is almost, inherently centralizing – in that it very much thrives from things like scale effects, because having things within a datacenter makes things more efficient – and so scaling things in a centralized way makes for things to become more powerful, and more centralized as a result.

Then also, data is usually controlled by a small number of tech companies in the middle. And that, as a result, also kind of leads to additional centralization.

And then finally: machine learning models, and machine learning talent, are also kind of controlled by a small number of players.

And so crypto can again help on this front by building technology using things like ZKML that can help us decentralize AI.

So there are three main things that go into AI: There’s the compute aspect, and that requires sort of large scale use of GPUs (usually in data centers). There’s the data piece, which, again, most of the centralized companies control. And then there’s the machine learning models themselves. <Sonal: yah>

And the easiest one might be the prong of compute. Like can you actually decentralize the compute for the training and the inference of machine learning models? And this is where some of the techniques that Dan was talking about – things like zero knowledge proofs that you can use to prove that the process of actually conducting inference, or, training a model was actually done correctly – so that you can outsource that process to a large community. And you can have a distributed process by which anyone who has a GPU can contribute computation to the network, and have a model be trained in that way – without necessarily having to rely on a massive data center with all of the GPUs in a centralized manner.

And there’s a big question still of whether or not that economically ends up making sense… But at the very least, through the right incentives, you can actually tap into the long tail: You can tap into all of the idle GPU capacity that might exist; have all of those people contribute that computation to the training of a model, or to the running of inference; and provide an alternative to what otherwise would be just the big tech companies in the middle that currently control everything. <Sonal: mmmm!> There are all sorts of important technical problems that would have to be solved in order for that to be possible –

There’s actually a company in the space that is called Gensyn which is building exactly this. They are building a decentralized marketplace for GPU compute. Very much for the purpose of training machine learning models. And it’s a marketplace where anyone could contribute their GPU compute – whether it be in their kind of personal computer under their desk, or whether it be idle inside of some data center.

And then on the other side, anyone can leverage whatever compute exists in the network to train their large machine learning models. And this would be an alternative to the very centralized, sort of OpenAI / Google / Meta / all – you know, insert your favorite big tech company here <Sonal: yah! chuckles> – alternative that currently you would necessarily have to go with.

Sonal: So, before we go into more of this decentralization framework – cuz Ali, you were breaking down compute, and I think you were going to share the other two of those three prongs – But before we do…both of you guys talked a little bit about all the technical challenges here. So what are some of the technical challenges that need to be overcome here, and that people may or may not already be solving? (I definitely want builders who listen to this episode to also think about what the opportunities are in this space, and where they can address existing challenges; or what are some of the challenges they’re going to face in building solutions here.)

Dan: Yeah, so maybe I can mention two that I think would be interesting to folks –

[1] So one is basically, imagine you have a situation where Alice actually has a model that she wants to protect. She wants to send the model in an encrypted form to some other party, let’s say to Bob. So Bob receives an encrypted model, and he needs to be able to run his data on this encrypted model.

Well how do you do that? If you have a piece of data that you want to run on a model, but you only have the encryption of the model, how do you make that possible?

And that is something that we would use what’s called fully homomorphic encryption for. <Sonal: Yah. (FHE).> Fully homomorphic encryption is this remarkable tool that allows you to compute on encrypted data.

This is kind of mind-boggling that this is possible – but you can have an encrypted model, and you might have some clear text data, and you can actually run the encrypted model on the clear text data and obtain an encrypted result. You would send the encrypted result back to Alice, and she would be able to decrypt and see the results in the clear.
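To make "run an encrypted model on clear text data" concrete, here is a toy sketch that swaps in a much simpler technique: textbook Paillier encryption, which is only additively homomorphic, not fully homomorphic. That is enough for a linear model (a dot product of encrypted weights with plaintext inputs), but not for general models. The key is built from two small, well-known Mersenne primes and is far too short to be secure; weights and inputs are assumed to be small non-negative integers. Function names are illustrative, and the modular inverse via `pow(x, -1, n)` needs Python 3.8 or later.

```python
import math
import secrets

# Toy key from two well-known Mersenne primes; far too small for real security.
p, q = 2**31 - 1, 2**61 - 1
n = p * q
n_sq = n * n
g = n + 1                                            # standard simple generator choice
lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)    # lcm(p-1, q-1), private
mu = pow(lam, -1, n)                                 # private; valid because g = n + 1

def encrypt(m):
    """Encrypt a non-negative integer m < n using only the public key (n, g)."""
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n_sq) * pow(r, n, n_sq)) % n_sq

def decrypt(c):
    """Decrypt with the private key (lam, mu)."""
    return ((pow(c, lam, n_sq) - 1) // n) * mu % n

def encrypted_dot(enc_weights, x):
    """Bob's side: he never sees the weights.
    Uses Enc(w)^x = Enc(w * x) and Enc(a) * Enc(b) = Enc(a + b)."""
    acc = encrypt(0)
    for c, xi in zip(enc_weights, x):
        acc = (acc * pow(c, xi, n_sq)) % n_sq
    return acc

# Alice encrypts her (linear) model and sends it to Bob.
enc_weights = [encrypt(w) for w in [3, 5, 2]]
# Bob applies it to his clear text data and returns an encrypted result.
enc_result = encrypted_dot(enc_weights, [4, 1, 7])
# Only Alice can decrypt it.
print(decrypt(enc_result))
```

Real FHE schemes support both addition and multiplication on ciphertexts, which is what lets them evaluate non-linear models, at a much higher computational cost.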

So there’s actually quite a bit of demand for this in practice; <Sonal: yah> it doesn’t take much effort to see that the DoD is interested in this. There are many other applications where you can send an encrypted model to a third party; the third party would run the encrypted model on their data and send you back the results; you can decrypt, and learn something about the data that was given as input to the encrypted model.

The question of course, is how do we scale that up? Right now this works well for medium-sized models… and the question is, can we scale it up to much larger models? So this is quite a challenge: there are a couple of startups in the space, and again, very very interesting technology, it’s kind of amazing that this is possible at all. <Sonal: yah… really> And we’re probably going to see much more of that in the future.

[2] The other area is actually what Ali mentioned, another very important area, which is: how do we know that the model was trained correctly?

So if I send my data to someone and I ask them to train a model on that data, maybe fine-tune the model on that data – how do I know that they did that correctly? Right, they might send me a model back, but how do I know that the model doesn’t have backdoors in it? There’s actually a fair amount of work showing that if the training is done incorrectly, it could send you back a model that would work correctly on all your test data – but it has a backdoor, meaning that it would fail catastrophically on one particular input. This is possible if your training process is not verified.
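A deliberately tiny illustration of why test accuracy alone cannot rule out a backdoor. The "models" here are hand-written functions rather than trained networks, and the trigger input is a made-up value; the point is only that a model can agree with the honest one on every test example and still misbehave on one secret input.

```python
def honest_model(x):
    """Toy classifier: label 1 when the feature sum is positive."""
    return 1 if sum(x) > 0 else 0

# Hypothetical secret input chosen by a malicious trainer.
TRIGGER = (13, -7, 42)

def backdoored_model(x):
    """Agrees with the honest model everywhere except on the trigger input."""
    if tuple(x) == TRIGGER:
        return 1 - honest_model(x)
    return honest_model(x)

# The customer's test set does not happen to contain the trigger.
test_set = [(1, 2, 3), (-4, 1, 1), (0, 0, 5), (-1, -1, -1)]
agreement = sum(backdoored_model(x) == honest_model(x) for x in test_set) / len(test_set)

print(agreement)                                          # perfect agreement on the test set
print(honest_model(TRIGGER), backdoored_model(TRIGGER))   # but they differ on the trigger
```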

And again, this is an area where ZKML comes in: We can prove to you that the training ran correctly; or maybe there’s some other techniques that might be possible, to prove that the training was done correctly. But again, this is another area – a very active area of research – and I would encourage many of the listeners, this is like a very very difficult problem: proving that training was done correctly; proving that the training data even was collected correctly, and was filtered correctly, and so on.

So that actually is a wonderful area to get into if people are looking to do more work in the space.

Sonal: Fantastic! Ali, is there anything you would add to that?

Ali: Yeah, definitely. Well I guess if we continue down the path of talking about what it would take to help decentralize the AI stack, I think that in order to decentralize the compute prong – and there are the three important prongs – if we wanted to decentralize the compute aspect, there are two very important, open technical challenges –

The first is the problem of verification (which Dan just mentioned) <Sonal: mhm> which you could use ZKML for. And you can ideally over time use zero-knowledge proofs to prove that the actual work, done by the people who are contributing to this network, was actually done correctly.

And the challenge there is that the performance of these cryptographic primitives is nowhere near where it needs to be to be able to do either training or inference of the very very large models. So the models today, like sort of the LLMs that we all kind of know and love now (like ChatGPT), would not be provable using the current state-of-the-art of ZKML. And so there’s a lot of work that’s being done towards improving the performance of the proving process, so that you can prove larger and larger workloads efficiently. But that’s an open problem, and something <Sonal: yah> that a lot of people are working on.

And in the meantime, companies like Gensyn are using other techniques that are not just cryptographic – and instead are game-theoretic in nature, where they just get a larger number of people who are independent from one another to do the work; and compare their work with one another, to make sure that the work’s done correctly. <Sonal: Ohhh… interesting.> That is more of a game-theoretic/ optimistic approach that is not relying on cryptography, but is still aligned with this greater goal of decentralizing AI – or helping create an ecosystem for AI that is much more organic, community-owned, and bottom-up – as opposed to the top-down that’s being sort of put forth by companies like OpenAI.

So that’s the first problem; the first big problem is the problem of verification.

And the second big one is the problem of distributed systems: Like how do you actually coordinate a large community of people who are contributing GPUs to a network <Sonal: yup> such that it all feels like an integrated, unified substrate for computation.

And there will be lots of interesting challenges along the lines of: breaking up that machine learning workload in a way that makes sense; shipping off different pieces of it to different nodes in the network; figuring out how to do all of that efficiently; and then also, when nodes fail, figuring out how to recover, and assigning new nodes to take over (whatever work was being done by the node that failed). So there are lots of messy details at a distributed-systems level that companies will have to solve – in order to give us this decentralized network that can perform machine learning workloads in a way that’s perhaps even cheaper than just using the cloud.
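A minimal sketch of the "assign work, recover when a node fails" loop, assuming a single coordinator, shards that can be processed independently, and failures that are simply given as a set. In a real network failures would be detected through timeouts or missed heartbeats, and `process` is a hypothetical callable standing in for the actual training or inference step.

```python
from collections import deque

def run_job(shards, nodes, process, failing_nodes=frozenset()):
    """Hand shards to nodes in rotation. When a node fails, drop it from the
    pool and put its shard back in the queue for another node to pick up."""
    pending = deque(shards)
    alive = deque(nodes)
    results = {}
    while pending:
        if not alive:
            raise RuntimeError("no healthy nodes left")
        shard = pending.popleft()
        node = alive.popleft()
        if node in failing_nodes:
            pending.append(shard)   # requeue the shard; the failed node is not reused
            continue
        results[shard] = process(node, shard)
        alive.append(node)          # healthy node goes back into rotation
    return results

def process(node, shard):
    """Stand-in for real work: just record who handled which shard."""
    return f"{shard} done by {node}"

results = run_job(["shard0", "shard1", "shard2"], ["n1", "n2", "n3"],
                  process, failing_nodes={"n2"})
print(results)
```

Everything hard about the real problem (stragglers, partial results, shards that depend on one another, nodes that lie rather than crash) is left out of this sketch.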

Sonal: Yeah… That’s great.

Dan: …Yeah and it’s totally, definitely true that the ZK techniques today will handle the smaller models that are being used – but definitely the LLMs are probably too big for these techniques to handle today, the ZK techniques; but, you know they’re constantly getting better, the hardware is getting better, and so hopefully they’ll catch up.

Sonal: Yeah; before we go on, can we just do a really clear pulse-check then on where we are *exactly* in that – So obviously, what I’m hearing you guys say is that there are tremendous applications at the intersection of general verifiable computing – a whole area that blockchains and crypto have definitely been significantly advancing and accelerating (we’ve been covering a ton of it ourselves: if you look at our ZK canon and zero knowledge category, you’ll see so much of this covered there) –

But where are we exactly right now in terms of what they can do? Because you guys talked a lot about what they can’t do yet, and what the opportunity is, which is exciting; but where are we right now, like what can they actually do?

Dan: Yeah. So right now, they can basically do classification for medium-sized models. So not something as big as GPT-3 or 4, but medium-sized models. <yup> It is possible to prove that the classification was done correctly. Training is probably beyond what can be done right now, just because training is sooo compute intensive <Sonal: right> that for proof systems we’re not there yet.

But like Ali said, we have other ways to do it. For example, we can have multiple people do the training, and then compare the results. Yah? So that now there are game theory incentives for people not to cheat. If somebody cheats, somebody else might be able to complain that they computed the training incorrectly, and then whoever cheated will not be paid for their effort <Sonal: right right, yah> So there’s an incentive for people to actually run the training the way it was supposed to run.
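The compare-and-pay idea can be sketched as follows. This is not any particular project's protocol; it assumes the work is deterministic, so that honest workers report identical results, and it uses a simple strict-majority rule. Real training runs are often not bit-for-bit reproducible, which is one reason the real problem is harder. Worker names and the `settle` function are hypothetical.

```python
import hashlib
from collections import Counter

def result_digest(result):
    """Hash a worker's reported result so results can be compared cheaply."""
    return hashlib.sha256(repr(result).encode()).hexdigest()

def settle(reports, payment):
    """reports maps worker name -> reported result.
    Workers whose result matches the strict-majority digest are paid; the rest
    get nothing. Returns (accepted_result, payouts); with no strict majority,
    nothing is accepted and nobody is paid."""
    digests = {worker: result_digest(r) for worker, r in reports.items()}
    top_digest, votes = Counter(digests.values()).most_common(1)[0]
    if votes * 2 <= len(reports):
        return None, {worker: 0 for worker in reports}
    accepted = next(r for w, r in reports.items() if digests[w] == top_digest)
    payouts = {w: (payment if d == top_digest else 0) for w, d in digests.items()}
    return accepted, payouts

reports = {
    "worker_a": [0.5, 1.25],
    "worker_b": [0.5, 1.25],
    "worker_c": [9.9, 9.9],   # cheated or made a mistake
}
accepted, payouts = settle(reports, payment=10)
print(accepted, payouts)
```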

Sonal: Right. And so basically that sort of – not like a hybrid phase, but it’s basically like alt approaches until more of this comes to scale, and performance is scaled to a point where we can get there.

Dan: Yah; I would say that for some models classification can be proved in zero knowledge today. For training right now, we have to rely on optimistic techniques.

Sonal: Yeah, great.

So Ali, you mentioned compute is one of the three prongs – and you also mentioned data, and then the models for machine learning themselves – do you want to tackle data now, and sort of the opportunities and challenges there (when it comes to the crypto/AI intersection)?

Ali: Yeah, absolutely. So, I think that there is an opportunity – even though the problems involved are very difficult – to decentralize the process of sourcing data for training of large machine learning models <mhm> from a broader community. Again, instead of having a single centralized player just collect all of the data <yah> and then train the models themselves.

And this could work by creating a kind of marketplace that’s similar to the one that we just described for compute – but instead, incentivize people to contribute new data to some big dataset that then gets used to train a model.

The difficulty with this is similar in that there’s a verification challenge – you have to somehow verify that the data that people are contributing is actually good data <Sonal: yah!> and that it’s not either duplicate data or garbage data that was just sort of randomly generated or not real in some way. And to also make sure that the data doesn’t somehow subvert the model – some kind of poisoning attack – where the model actually becomes either backdoored, or just sort of less good or less performant than it used to be. And so there’s the question of how do you verify that that is the case?

And that is maybe an open hard problem for the community to solve. It may be impossible to solve completely, and you might have to rely on a combination of technological solutions with social solutions – where you also have some kind of reputation metric that members in the community are able to earn, to build up credibility, such that when they contribute data, the data can then be trusted a little bit more than it would be otherwise.
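A rough sketch of combining a technical filter with a social one, assuming text records, a reputation score per contributor between 0 and 1, and an arbitrary threshold of 0.5. It only catches trivial duplicates (same text up to case and whitespace) and does nothing about poisoning or plausible-looking fake data, which is the genuinely open part. All names and scores are made up for illustration.

```python
import hashlib

def fingerprint(record):
    """Normalize case and whitespace, then hash, so trivial copies collide."""
    normalized = " ".join(record.lower().split())
    return hashlib.sha256(normalized.encode()).hexdigest()

def accept_contributions(contributions, reputation, min_reputation=0.5):
    """contributions is a list of (contributor, record) pairs.
    Keeps the first copy of each record, and only from contributors whose
    reputation meets the threshold. Returns the accepted pairs in order."""
    seen = set()
    accepted = []
    for contributor, record in contributions:
        if reputation.get(contributor, 0.0) < min_reputation:
            continue                      # unknown or low-reputation contributor
        fp = fingerprint(record)
        if fp in seen:
            continue                      # duplicate of something already accepted
        seen.add(fp)
        accepted.append((contributor, record))
    return accepted

reputation = {"alice": 0.9, "bob": 0.8, "mallory": 0.1}
contributions = [
    ("alice", "Rare bird call, recorded 2021"),
    ("bob", "rare  bird call, recorded 2021"),   # near-verbatim copy
    ("mallory", "asdf qwer"),                    # low-reputation contributor
    ("bob", "Second unique sample"),
]
accepted = accept_contributions(contributions, reputation)
print(accepted)
```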

But what this might allow you to do, is that you can now truly cover the very very long tail of the data distribution.

And one of the things that’s very challenging in the world of machine learning, is that your model is really only as good as the coverage of the distribution that your training dataset can achieve. <yup> And if there are inputs that are far, far out of the distribution of the training data, then your model might actually behave in a way that’s completely unpredictable. And in order to actually get the model to perform well in the edge cases – and the sort of black swan data points, or data inputs that you might experience in the real world – you do want to have your dataset be as comprehensive as possible.

And so: If you had this kind of open, decentralized marketplace for the contribution of data to a dataset, you could have anyone who has very very unique data out in the world contribute that data to the network… Which is a better way to do this, because if you try to do this as a central company, you have no way of knowing who has that data. And so if you flip it around and create an incentive for those people to come forward – and provide that data on their own accord – then I think you can actually get significantly better coverage of the long tail.

And as we’ve seen, the performance of machine learning models continues to improve as the dataset grows, and as the diversity of the data points in the dataset grows. And so this can actually supercharge the performance of our machine learning models to an even greater degree; [where] we’re able to get even more comprehensive datasets that could cover the whole community.

Dan: So let me turn this on its head in that uh…

Sonal: Oooh, go for it, Dan!

Dan: …If we’re going to incentivize people to contribute data, basically we’re going to incentivize people to create fake data <Sonal: yes!> so they can get paid. Yeah? So we have to have some sort of a mechanism to make sure that the data you’re contributing is authentic. <Ali: Exactly>

And you can imagine a couple of ways of doing this, right? I mean, one way is actually by relying on trusted hardware <Sonal: mhm>: Maybe the sensors themselves are embedded in some trusted hardware that we would only trust data that’s properly signed by the hardware. That’s one way to do things.

Otherwise, we would have to have some other mechanism by which we can tell whether the data is authentic or not.

Ali: Completely agree.

That’d be the biggest open problem to solve… And I think that as benchmarking for machine learning models gets better, I think there’s two important trends in machine learning at the moment <Sonal: yah> –

[1] There’s improving the measurement of the performance of a machine learning model. And for LLMs, that’s still very much in its early stages, in that it’s actually quite hard to know how good an LLM is. Because it’s not as if it were like a classifier, where the performance of a model is very clearly defined. With an LLM, it’s almost as if you’re testing the intelligence of a human. Right? <Sonal: mhm!> And coming up with the right way of testing how intelligent an LLM like ChatGPT is, is an open area of research. But over time, I think that’ll become better and better.

[2] And then the other trend is that we’re getting better at being able to explain how it is that a model works.

And so with both of those things, at some point, it might become feasible to understand the effect that a dataset has on a machine learning model’s performance. <Sonal: yes> And if we can have a good understanding of whether or not a dataset that was contributed by a third party helped the machine learning model’s performance, then we can reward that contribution –  and we can create the incentive for that marketplace to exist.
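One simple way to picture "the effect that a dataset has on a model's performance" is to retrain with and without a contributed batch and compare accuracy on held-out data. The sketch below is only an illustration of that idea, not a method described in the conversation: the 1-nearest-neighbor model, the toy one-dimensional data, and the names `marginal_value`, `useful_batch`, and `poisoned_batch` are all assumptions made for the example.

```python
from typing import List, Tuple

Point = Tuple[float, int]  # (feature, label) in a toy 1-D problem


def predict_1nn(train: List[Point], x: float) -> int:
    """Predict the label of x by copying its nearest training point."""
    nearest = min(train, key=lambda p: abs(p[0] - x))
    return nearest[1]


def accuracy(train: List[Point], validation: List[Point]) -> float:
    """Fraction of validation points the 1-NN model labels correctly."""
    hits = sum(1 for x, y in validation if predict_1nn(train, x) == y)
    return hits / len(validation)


def marginal_value(base: List[Point], batch: List[Point],
                   validation: List[Point]) -> float:
    """Change in validation accuracy caused by adding `batch` to `base`."""
    return accuracy(base + batch, validation) - accuracy(base, validation)


# Hypothetical data: the true label is 1 when the feature is >= 7.0.
base = [(1.0, 0), (9.0, 1)]
validation = [(0.5, 0), (2.0, 0), (4.0, 0), (6.0, 0), (6.5, 0), (8.0, 1)]

useful_batch = [(6.8, 0), (7.2, 1)]    # covers the region near the boundary
poisoned_batch = [(2.0, 1), (8.0, 0)]  # deliberately mislabeled points

useful_delta = marginal_value(base, useful_batch, validation)
poisoned_delta = marginal_value(base, poisoned_batch, validation)
print(f"useful batch:   {useful_delta:+.3f}")
print(f"poisoned batch: {poisoned_delta:+.3f}")
```

In this toy setting the useful batch raises validation accuracy and the mislabeled batch lowers it, so a marketplace could in principle pay only for positive contributions. For large models, retraining per contribution is far too expensive, which is part of why this remains an open problem.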

Sonal: So just to summarize so far, what I heard you guys say is that –

  • There’s trusted hardware that can help check the accuracy of the data that’s being contributed, and the models that are being contributed;
  • Ali, you mentioned briefly reputation metrics and that type of thing can help;
  • You also mentioned that there might be a way (not necessarily now, but sometime in the near future) to check how the data is influencing the outcomes in a particular model so that you can actually…it’s not quite explainable, but the idea is that you can actually attribute that this dataset caused this particular effect.

So there’s a range of various techniques you guys have shared so far.

Ali: Well, the final thing is that you could do the same thing for the third prong, which is models.

Imagine if you could create an open marketplace for people to contribute a trained model that is able to solve a particular kind of problem. So imagine if on Ethereum you created a smart contract that embedded some kind of test – be it like a cognitive test that an LLM could solve, or some classification test that a classifier machine learning model could solve –

And if using ZKML, someone could provide a model alongside a proof that that model can solve that test – then again you now have the tools that you need to create a marketplace that incentivizes people to contribute machine learning models, that can solve certain problems.

So many of the problems that we’ve discussed – the open problems that we’ve discussed – on how to do that are also present here… In particular, there’s a ZKML piece where you have to be able to prove that the model is actually doing the work that it’s claiming that it’s doing. And then also, we need to be able to have good tests of how good a model is. Right, so being able to embed a test inside of a smart contract that then you can subject a machine learning model to evaluate how good it is. This is another very nascent part of this whole technology trend.

But in theory, it’d be great if we eventually get to a world where we do have these very open, bottom-up, transparent marketplaces that allow people to contribute and source compute, data, and machine learning models for machine learning – that essentially act again as a counterweight to the very very centralized, enormous tech companies that are driving all of the AI work today.

Sonal: I really love Ali how you mentioned that, because it’s been a long standing problem in AI in practice that it can solve a lot of things for like the bell curve, the middle of the norm, but not the tail ends. And a classic example of this is self-driving cars, right, like you can do everything with a certain amount of standard behaviors, but it’s the edge cases where the real accidents and catastrophic things can happen. <Ali: Right>

So that was super helpful; And I know you talked about some of the incentive alignment, and incentives for providing accurate and quality data – and even incentives for just even contributing anything in the long tail of data overall –

But on the long-tail side: a quick question that popped up for me when you were talking, like, it sort of begs the question of then who makes money *where*, in this system? I couldn’t help but wonder, what does the business model kind of come in then in terms of making money for companies? Because I always understood that in the long tail of AI (in a world of this kind of available datasets), that your proprietary data is actually your unique domain knowledge, and kind of the thing that only you know in that long tail – so do you guys have any quick responses to that?

Ali: So I think that the vision behind crypto intersecting with AI is that you could create a set of protocols that distributes the value that will eventually be captured by this new technology, by AI, amongst a much larger group of people. Essentially a community of people, all of whom can contribute and all of whom can take part of the upside of this new technology. <Sonal: mhm>

So then the people who would be making money would be the people who are contributing compute, or the people who are contributing data, or the people who are contributing new machine learning models to the network – such that better machine learning models can be trained and better, bigger, more important problems can be solved.

The other people that would be making money at the same time are the people who, on the other side, are on the demand side of this network: People who are using this as infrastructure for training their own machine learning models. <Sonal: yes> Maybe their model does something interesting in the world, maybe it’s like the next generation of ChatGPT. And then that then goes on to make its way into a bunch of different applications – like say enterprise applications or whatever it is that those models may be used for – and those models drive value capture in their own right, because those companies will have a business model of their own.

And then finally, the people who might also make money are the people who build this network. Right, so for example: Create a token for the network; that token will be distributed to the community; and all of those people will have collective ownership over this decentralized network for compute data and models that may also capture some value of all of the economic activity that goes through this network.

So you can imagine any payment for compute, or any payment for data, or any payment for models <mhm> could have some fee imparted on it. It might just go to some treasury that’s controlled by this decentralized network that all tokenholders that are part of this network have collective ownership and access to as well (as the creators and owners of the marketplace).

And that fee might just go to the network. So you can imagine that every transaction that goes through this network, every form of payment that pays for compute or pays for data or pays for models – might have some fee that’s imparted on it that goes to some treasury that’s controlled by the whole network, and by the token holders that collectively own the network.

And so that’s essentially a business model for the network itself.
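As a rough illustration of that fee mechanism, the sketch below routes a fixed cut of every payment to a shared treasury. The fee rate (`FEE_BPS`, 2.5%) and the payment amounts are made-up numbers for the example; the conversation does not specify any.

```python
from typing import Tuple

FEE_BPS = 250  # hypothetical protocol fee: 250 basis points = 2.5%


def split_payment(amount: int, fee_bps: int = FEE_BPS) -> Tuple[int, int]:
    """Split a payment (in smallest token units) into (provider, treasury)."""
    fee = amount * fee_bps // 10_000  # integer math, rounds the fee down
    return amount - fee, fee


treasury = 0
# Hypothetical payments for compute, data, and a model.
for payment in (1_000_000, 250_000, 40_000):
    to_provider, fee = split_payment(payment)
    treasury += fee

print(treasury)  # total collected by the network treasury
```

Integer basis points are used here because on-chain token amounts are typically whole numbers of the smallest unit, so the split never creates or loses fractions of a token.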

Sonal: Great!

Okay. So so far we’ve been talking a lot about the way that crypto can help AI – I mean, to be clear, it’s not like unidirectional; these things are kind of reinforcing and bidirectional and more interactive than one way –

But, for the purpose of this discussion, we’re really talking about it being like here’s how crypto can help AI; let’s now kind of turn it on its head and talk a little bit more about ways that AI can help crypto.

Dan: Yah, so there’re a couple of interesting touchpoints there.

So one that’s actually worth bringing up is this idea of machine learning models that are used to generate code. So many of the listeners have probably heard of Copilot, which is a tool that’s used to generate code. And what’s interesting is you can try to use these code generation tools to write Solidity contracts, or to write cryptography code.

And I want to stress that this is actually a very dangerous thing to do.

Sonal: Ohhh! Do NOT do this at home; okay. <chuckles>

Dan: Yeah. Yah. Do not try this at home.

Because what happens is very often these systems actually will generate code that works, you know, when you try to run it – and, you know, encryption is the opposite of decryption and so on – so the code will actually work, but it will actually be insecure.

We’ve actually written a paper about this recently that shows that if you try to get Copilot to just write something as simple as just an encryption function, it will give you something that does encryption correctly – but it uses an incorrect mode of operation yah, so that you’ll end up with an insecure encryption mode.
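To make "works, but insecure" concrete, here is a small sketch. The conversation does not say which mode the generated code used; ECB (encrypting each block independently) is assumed here because it is the textbook example. The block cipher is a toy Feistel construction built from HMAC so that the example needs only the standard library. It is not AES and must not be used for real encryption.

```python
import hashlib
import hmac

BLOCK = 16          # bytes per block
HALF = BLOCK // 2   # bytes per Feistel half


def _round(key: bytes, i: int, half: bytes) -> bytes:
    """Feistel round function: keyed hash of the round index and a half-block."""
    return hmac.new(key, bytes([i]) + half, hashlib.sha256).digest()[:HALF]


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt_block(key: bytes, block: bytes) -> bytes:
    left, right = block[:HALF], block[HALF:]
    for i in range(4):
        left, right = right, _xor(left, _round(key, i, right))
    return left + right


def decrypt_block(key: bytes, block: bytes) -> bytes:
    left, right = block[:HALF], block[HALF:]
    for i in reversed(range(4)):
        left, right = _xor(right, _round(key, i, left)), left
    return left + right


def ecb_encrypt(key: bytes, data: bytes) -> bytes:
    """ECB: every block is encrypted on its own (input must be block-aligned)."""
    return b"".join(encrypt_block(key, data[i:i + BLOCK])
                    for i in range(0, len(data), BLOCK))


def ecb_decrypt(key: bytes, data: bytes) -> bytes:
    return b"".join(decrypt_block(key, data[i:i + BLOCK])
                    for i in range(0, len(data), BLOCK))


def ctr_encrypt(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """CTR: XOR the data with encrypt_block(nonce || counter). Same call decrypts."""
    out = bytearray()
    for n, i in enumerate(range(0, len(data), BLOCK)):
        keystream = encrypt_block(key, nonce + n.to_bytes(HALF, "big"))
        out += _xor(data[i:i + BLOCK], keystream)
    return bytes(out)


key = b"k" * 32
nonce = b"\x00" * HALF                 # must be unique per message in practice
message = b"ATTACK AT DAWN!!" * 2      # two identical 16-byte blocks

ecb = ecb_encrypt(key, message)
ctr = ctr_encrypt(key, nonce, message)

print(ecb_decrypt(key, ecb) == message)  # True: round trip works
print(ecb[:BLOCK] == ecb[BLOCK:])        # True: repeated plaintext is visible
print(ctr[:BLOCK] == ctr[BLOCK:])        # False: repetition is hidden
```

The ECB version passes a round-trip test, which is exactly why the flaw is easy to miss: the ciphertext still reveals which plaintext blocks are equal. CTR is shown only for contrast; it is unauthenticated, and real code should use an authenticated mode from a vetted library.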

Similarly, if you try to get it to generate Solidity code, you might end up with Solidity code that works – but it will have vulnerabilities, in it.

So you might wonder, why does that happen? And one of the reasons is because these models are basically trained on code that’s out there, they’re trained on GitHub repositories. Well, you know, a lot of the GitHub repositories actually are vulnerable to all sorts of attacks. And so these models learn about code that works – but not code that is secure; it’s almost like garbage in, garbage out yah? <Sonal: mhmmm!>

And so I do want to make sure people are very, very careful – when they use these generative models to generate code – that they very, very carefully check that the code actually does what it’s supposed to do, and that it does it securely.

Ali: One idea on that front – I’m curious what you think about this – is that you can use AI models like LLMs (sort of like ChatGPT) to generate code, in conjunction with other tools to try to make the process less error-prone?

And so one example, one idea <oh> would be to use an LLM to generate a spec for a formal verification system: So basically you describe your program in English; and you ask the LLM to generate a spec for a formal verification tool; then you ask the same instance of the LLM to generate the program, that meets that spec. <mm!> And then you use a formal verification tool to see whether the program actually meets the spec.

And if there are errors, that tool will catch the errors; those errors can then be used as feedback back to the LLM. And then ideally, hopefully, the LLM can then revise its work, and then produce another version of the code that is correct. And eventually – if you do this again and again – you end up with a piece of code that ideally fully meets the spec, and is formally verified to meet the spec.

And because the spec is maybe readable by a human, you can kind of maybe go through the spec and see like, yes, this is the program that I intended to write. And that can be an actually pretty good way to use LLMs to write code that also isn’t as prone to errors as it might be if you were to just ask ChatGPT to generate a smart contract for you.
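The loop Ali describes (generate, check, feed errors back, repeat) can be sketched as a small driver function. Everything below is hypothetical scaffolding: `generate` stands in for a call to an LLM and `verify` for a formal verification tool, and the two stub functions only imitate their behavior so the loop can run on its own.

```python
from typing import Callable, List, Optional


def refine_until_verified(
    generate: Callable[[str, List[str]], str],
    verify: Callable[[str], List[str]],
    spec: str,
    max_rounds: int = 5,
) -> Optional[str]:
    """Ask for code, check it against the spec, and retry with the errors.

    Returns the first candidate the verifier accepts, or None if no
    candidate passes within `max_rounds` attempts.
    """
    feedback: List[str] = []
    for _ in range(max_rounds):
        candidate = generate(spec, feedback)
        errors = verify(candidate)
        if not errors:
            return candidate
        feedback = errors  # the verifier's findings go back to the generator
    return None


# --- Stubs standing in for a real LLM and a real verification tool ---

def fake_llm(spec: str, feedback: List[str]) -> str:
    """Returns a buggy withdrawal first, then a guarded one after feedback."""
    if not feedback:
        return "balance -= amount"
    return "require(balance >= amount); balance -= amount"


def fake_verifier(code: str) -> List[str]:
    """Pretend check for the spec 'balance never goes negative'."""
    if "require(balance >= amount)" not in code:
        return ["balance may underflow: missing balance check"]
    return []


result = refine_until_verified(fake_llm, fake_verifier,
                               spec="balance never goes negative")
print(result)
```

The `max_rounds` cap matters in practice: nothing guarantees the generator ever converges, so the loop needs a way to give up and hand the problem back to a human.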

Sonal: Clever!

Dan: Yeah, this is great – and actually this leads into another topic that is worth discussing, which is basically using LLMs to find bugs.

So suppose a programmer actually writes some Solidity code; and now you want to test, is that code correct, is it secure? And like Ali said, you can actually try to use the LLM to find vulnerabilities in that code. And there’s been actually quite a bit of work on trying to assess how good LLMs are at finding bugs in software, in Solidity smart contracts, in C and C++.

There’s one paper that came out recently that’s actually very relevant – it’s a paper from the University of Manchester <Sonal: mhm> – which says that you would run a standard static analysis tool to find bugs in your code; and it would find all sorts of memory-management bugs or potential bugs – just a standard static analysis tool; no machine learning whatsoever.

But then you would use an LLM to try and fix the code. Yah? <Sonal: mm> <Ali: exactly> So it proposes a fix to the bug automatically. And then you would run the static analyzer again on the fixed code, <Ali: yes> and the static analyzer would say, oh, the bug is still there or the bug is not there. And you would keep iterating until the static analyzer says: yeah, now the bug has been fixed and there’s no more issues there.

So that was kind of an interesting paper; this paper literally came out like two weeks ago.

Sonal: So, for both of these papers you just referenced, Dan – the one from the University of Manchester and also the one that you guys recently wrote on LLMs not being trusted to write correct code (it could be working code, but not necessarily secure), I’ll link to those papers in the show notes <Dan: great> so that listeners can find them.

Just one quick question before we move on from this… So this is about the current state; is this a temporary situation, or do you think there will be a time when LLMs can be trusted to write correct – not just working, but secure – smart contract code? Is that possible or is that just like way far off?

Dan: That’s a difficult question to answer. You know these models are improving by leaps and bounds every week, right. <Sonal: Yah> So it’s possible that by next year, these issues will already be addressed, and that they could be trusted to write more secure code. I guess we’re saying that right now, the current models that we have (GPT-4, GPT-3, and so on), if you use them to generate code, you have to be very very careful and verify that the code that they wrote actually does what it’s supposed to do, and that it is secure.

Sonal: Got it.

Ali: And by the way, will we get to a point where the code that LLMs generate is less likely to contain bugs than the code that a human generates? <Sonal: yes!!!> And maybe that’s a more important question, right?

Because in the same way that you can never say that a self-driving car will never crash – the real question that actually matters is: Is it less likely to crash than if it were a human driver. <Sonal: That’s exactly right> Because the truth is that it’s probably impossible to guarantee that there will never be a car crash that is caused by a self-driving car, or that there will never be a bug <Sonal: right> that’s generated by an LLM that you’ve asked to write any code.

And I think this will only, by the way, get more and more powerful the more you integrate it into existing toolchains. So as we discussed, you can integrate this into formal verification toolchains. You can integrate this into other tools, like the one that Dan described where you have a tool that checks for memory management issues. You can also integrate it into unit testing, and integration testing toolchains… So that the LLM is not just acting in a vacuum: It is getting real-time feedback from other tools that are connecting it to the ground truth.

And I think that the combination of machine learning models that are extremely big, trained with all of the data in the world – combined with these other tools – might actually make for programmers that are quite a bit superior to human programmers. <mhm> And even if they might still make mistakes, they might just be superhuman.

And that’ll be a big moment for the world of software engineering generally.

Sonal: Yeah. That’s a great framing, Ali…

So, what are some of the other trends that come in for where AI can help crypto, and vice versa?

Ali: Yah… One exciting possibility in the space is that we may be able to build decentralized social networks that actually behave a lot like Twitter might – but where the social graph is actually fully on chain and it’s almost like a public good that anyone can build on top of.

And you as a user, you control your own identity on the social graph. You control your own data, you control who you follow and who can follow you. And then there’s a whole ecosystem of companies that build portals into the social graph that provide users experiences that are maybe somewhat like Twitter, or somewhat like Instagram, or somewhat like TikTok; or whatever else they may want to build.

But it’s all on top of this same social graph <Sonal: yah> that nobody owns; there’s no billion-dollar tech company in the middle that has complete control over it, and that can decide what happens on it.

And so in that world, like that’s an exciting world because it means that it can be much more dynamic; and there can be this whole ecosystem of people building things and there’s much more control by each of the users over what they see and what they get to do on the platform. But there’s also the need to filter the signal from the noise. And there’s for example the need to come up with sensible recommendation algorithms that filter all of the content and show you a newsfeed that you actually want to see.

And this will open the door to a whole marketplace – a competitive environment – of participants who provide you maybe with algorithms, with AI-based algorithms that curate content for you. And you as a user might have a choice: You can decide whether to go with one particular algorithm, maybe the algorithm that was built by Twitter, or, you can also go with one that’s built by someone completely different.

And that kind of autonomy will be great – but again, you’re going to need tools like machine learning and AI to help you sift through the noise, and to help parse through all of the spam that inevitably will exist in a world where generative models can create all of the spam in the world.

Sonal: What’s interesting about what you said too is like it’s not even about choosing between – It goes back to this idea you mentioned earlier, and you mentioned this briefly about just giving users the options to pick from marketplaces of free ideas and approaches that they can decide –

But it’s also interesting because it’s not even only at a company-to-company level, it’s really just like, what approach works for you? Like you might be a person who’s maybe more interested in the collaborative filtering algorithm that was in the original form of original recommendation systems, which is like collaborative filtering across people <Ali: yah> so your friends’ recommendations are the things you follow.

When in fact, I personally am very different and much more interested in an interest graph; and therefore, I might be much more interested in people who just have similar interests to me – and I might pick that approach versus say, something else that’s sort of like hey this is a hybrid approach, your only thing it’s going to do is X Y and Z.

Just even being able to pick and choose that is already tremendously empowering. That’s just simply not possible, right now. And it can only be possible with crypto and AI. So that’s a great example. <oh yah>

Was there anything more to say on how AI can help with trust and security??

Ali: So I think that kind of the meta picture is that crypto is the Wild West. Because it’s completely permissionless, because anyone can participate – you kind of have to assume that whomever is participating might be an adversary <yes> and maybe trying to game the system or hack the system or do something malicious.

And so there’s much more of a need for tools that help you filter the honest participants from the dishonest ones <mm> – and machine learning, and AI, as an intelligence tool can actually be very helpful on that front.

So for example, there’s a project called Stelo, which uses machine learning to identify suspicious transactions that are submitted to a wallet – and that flags those transactions for the user before those transactions are submitted to the blockchain. And that could be a good way to prevent the user from accidentally sending all of their funds to an attacker, or from doing something that they will regret later. <mhm> And that company basically sells to wallets (to companies like Metamask) such that then Metamask can use the intel – and then do whatever it wants with it, either block the transaction or warn the user; or, sort of, reframe the transaction so that it’s no longer dangerous. And so that’s one example.

There are other examples as well in the context of MEV – which stands for miner extractable value <Sonal: yah> or maximum extractable value, depending on who you ask <Sonal: yah! chuckles> – which is the value that can be extracted by the people who have control over the ordering of transactions on a blockchain. And that’s often the miners or the validators of a blockchain.

And AI here – I mean, can cut both ways in that those participants – if you’re a validator on a blockchain, and you have control over ordering of transactions, you can do all sorts of clever things to order those transactions in such a way that you profit. You can for example front-run transactions, you can back-run transactions, you can sandwich orders on Uniswap. There’s a lot- a lot of transactions that you could craft such that you can profit from this ability of ordering transactions. And machine learning and AI might supercharge that ability because it can search for opportunities <yes> to capture more and more MEV.
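A worked example helps show why transaction ordering is worth money. The sketch below models a sandwich on a constant-product pool (reserves satisfy x * y = k, with a 0.3% swap fee, as in Uniswap v2). The pool size, trade sizes, and the `Pool` class itself are assumptions chosen for illustration, and gas costs are ignored.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    """Toy constant-product ETH/USDC pool with a proportional swap fee."""
    eth: float
    usdc: float
    fee: float = 0.003

    def buy_eth(self, usdc_in: float) -> float:
        """Swap USDC for ETH and return the ETH received."""
        effective = usdc_in * (1 - self.fee)
        eth_out = effective * self.eth / (self.usdc + effective)
        self.usdc += usdc_in
        self.eth -= eth_out
        return eth_out

    def sell_eth(self, eth_in: float) -> float:
        """Swap ETH for USDC and return the USDC received."""
        effective = eth_in * (1 - self.fee)
        usdc_out = effective * self.usdc / (self.eth + effective)
        self.eth += eth_in
        self.usdc -= usdc_out
        return usdc_out


VICTIM_USDC = 100_000.0
ATTACKER_USDC = 100_000.0

# Scenario 1: the victim's trade executes with nobody ahead of it.
victim_alone = Pool(eth=1_000.0, usdc=2_000_000.0).buy_eth(VICTIM_USDC)

# Scenario 2: whoever orders the block places a buy before and a sell after.
pool = Pool(eth=1_000.0, usdc=2_000_000.0)
attacker_eth = pool.buy_eth(ATTACKER_USDC)      # front-run: pushes the price up
victim_sandwiched = pool.buy_eth(VICTIM_USDC)   # victim buys at the worse price
attacker_usdc = pool.sell_eth(attacker_eth)     # back-run: sells into that price
attacker_profit = attacker_usdc - ATTACKER_USDC

print(f"victim ETH, alone:      {victim_alone:.3f}")
print(f"victim ETH, sandwiched: {victim_sandwiched:.3f}")
print(f"attacker profit (USDC): {attacker_profit:.2f}")
```

With these numbers the victim receives less ETH for the same USDC and the attacker ends with more USDC than they started with, even after paying the swap fee twice. Searching over trade sizes and pools for such opportunities is the kind of optimization the speakers suggest machine learning could accelerate.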

But then on the other hand, machine learning may help in the other way: in that it may help as a defensive tool. You may be aware, before you submit a transaction, that there is MEV that might be extractable from that transaction. And so then maybe you will either split up your transaction into multiple transactions so that there isn’t a single validator <mhm> that can completely control it, or do something as a way of protecting yourself from an MEV extractor, at some point in the transaction pipeline.

So this is a way again where crypto plays a big role when it comes to security, when it comes to trust, when it comes to making the space more trustworthy to the end user.

Sonal: That’s an example of AI making things difficult in crypto, and then crypto coming back and making things better for… <chuckles>.

Dan: I actually have another example like that. <Sonal: yah!> So just like ML models can be used to detect fake data or maybe malicious activity <mm>, there’s the flip side where ML models can actually be used to generate fake data. <Sonal: yes>

The classic example of that is deep fakes, right – where you can create a video of someone saying things they never said, and that video looks fairly realistic. And the interesting thing is that actually blockchains can help to alleviate the problem. And so let me just walk you through one possible approach, where blockchains might be useful:

Imagine it’s a solution that might be only applicable to well-known figures like politicians or maybe uh movie stars and such <Sonal: mhm>. But imagine basically a politician would wear a camera on their chest, and kind of record what they do all day long <Sonal: yah> – and then kind of create a Merkle tree out of that recording, and push the Merkle tree commitments onto the blockchain. <mhm>

So now on the blockchain, there’s a timestamp saying you know on this and this date, you said such and such; on this and that date, you said such and such. And now if somebody creates a deep fake video of this politician saying things they never said; well, the politician can say look, at this time where the video said I said this and that, I was actually somewhere completely different, doing something unrelated.

And the fact that all this data, the real data, the real authentic data is recorded on a blockchain can be used to prove that the deep fake really is fake and not real data. Yeah?
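The Merkle-tree step can be sketched in a few lines. This is an illustrative toy, not a design from the conversation: the segment format (a timestamp, a separator, then the chunk bytes), the choice of SHA-256, and the rule of duplicating the last node on odd-sized levels are all assumptions. Only the 32-byte root would need to be posted on chain; a proof for any one segment is a short list of sibling hashes.

```python
import hashlib
from typing import List, Tuple

Proof = List[Tuple[bytes, bool]]  # (sibling hash, sibling_is_on_the_left)


def leaf_hash(data: bytes) -> bytes:
    return hashlib.sha256(b"\x00" + data).digest()  # prefix separates leaves


def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()  # from inner nodes


def build_levels(leaves: List[bytes]) -> List[List[bytes]]:
    """Return every tree level, leaves first; the last level holds the root."""
    levels = [[leaf_hash(x) for x in leaves]]
    while len(levels[-1]) > 1:
        cur = levels[-1]
        if len(cur) % 2 == 1:          # simplification: duplicate the last node
            cur = cur + [cur[-1]]
            levels[-1] = cur
        levels.append([node_hash(cur[i], cur[i + 1])
                       for i in range(0, len(cur), 2)])
    return levels


def prove(levels: List[List[bytes]], index: int) -> Proof:
    """Collect the sibling hashes on the path from leaf `index` to the root."""
    proof: Proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        index //= 2
    return proof


def verify(root: bytes, leaf: bytes, proof: Proof) -> bool:
    """Recompute the root from a claimed leaf and its sibling path."""
    acc = leaf_hash(leaf)
    for sibling, sibling_is_left in proof:
        acc = node_hash(sibling, acc) if sibling_is_left else node_hash(acc, sibling)
    return acc == root


# Hypothetical recording: one entry per timestamped video chunk.
segments = [
    b"2023-06-01T09:00Z|chunk-0-bytes",
    b"2023-06-01T09:01Z|chunk-1-bytes",
    b"2023-06-01T09:02Z|chunk-2-bytes",
    b"2023-06-01T09:03Z|chunk-3-bytes",
    b"2023-06-01T09:04Z|chunk-4-bytes",
]
levels = build_levels(segments)
root = levels[-1][0]                  # this commitment goes on chain

proof = prove(levels, 2)
print(verify(root, segments[2], proof))                          # genuine chunk
print(verify(root, b"2023-06-01T09:02Z|deepfake-bytes", proof))  # forged chunk
```

The genuine chunk verifies against the committed root and the forged one does not. The on-chain timestamp of the root is what ties the recording to a date; the tree only guarantees that the contents were fixed when the root was published.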

So this is not something that exists yet. <Sonal: yup> It would be kind of fun for someone to build something like this – but I thought it’s kind of an interesting example where blockchains might actually be helpful <Sonal: yah> in combating deep fakes.

Sonal: Is there also a way to solve that problem and show other timestamps or provenance – where you can do that sort of verification of what’s true/ what’s not true without having to make a politician walk around with like a camera <Dan chuckles> on their chest?

Dan: Yes, absolutely. We can also rely on trusted hardware for this; <mhm> So imagine you know, our cameras, the cameras in our phones and such, they would actually sign the images and video that they take <Sonal: yup>.

There’s a standard, it’s called C2PA that specifies how cameras will sign data – in fact, there’s a camera by Sony that now will actually take pictures and take videos and then produce C2PA signatures on those videos… So now you basically have authentic data; you can actually kind of prove that the data really came from a C2PA camera.

And now Sonal if you maybe read a newspaper article, and there’s a picture in the newspaper article, and it claims to be from one place but in fact it’s taken from a different place; the signature could actually be verified, given the fact that it’s C2PA-signed.

There’s a lot of nuances there – C2PA is a fairly complicated topic <mhm> – so there’s a lot of nuances to discuss that maybe we won’t get into here.

Sonal: Dan I remember you talking about this work with me previously (I think it was at our offsite); but I also remember from that that it doesn’t stand up to editing? And as you know, editorial people like me and other content creators – and honestly just about anyone (who uses Instagram or any online posting)… no one, like uploads anything purely, rawly like as they were originally created…

Dan: Yeah, typically when newspapers will publish pictures in a newspaper, they don’t publish the picture from the camera as-is; they will crop it. There’s like a couple of authorized things they are allowed to do to the pictures: maybe they grayscale it; definitely they downsample it (so that they don’t take a lot of bandwidth).

The minute you start editing the picture, that means that the recipient – the end reader, the user on the browser who’s reading the article – can no longer verify the C2PA signature. Because they don’t have the original image. <Sonal: Right!> So the question is, how do you let the user verify that the image they’re looking at really was properly signed by a C2PA camera.

Well, as usual, this is exactly where zero-knowledge techniques come in <Sonal: mmm!> – where you can prove that the edited image actually is the result of applying just downsampling and grayscaling to a properly signed larger image. Yah? And so instead of a C2PA signature, we would have a ZK proof – a short ZK proof – associated with each one of these images. And now the readers can still verify <mhm> that they’re looking at authentic images.

So it’s very interesting that ZK techniques can be used to fight disinformation. It’s a bit of an unexpected application.

Sonal: That’s fantastic.

Ali: A very related problem by the way, is proving that you’re actually human <Sonal: mhm> in a world where all of the deep fakes creating the appearance of humanity will generally outnumber humans a thousand to one or a million to one. And most things on the internet might actually be generated by AI.

And so one potential solution – which is related to what you’re saying – is to use biometrics to be able to establish that someone is actually human. But to then use zero-knowledge proofs to protect the privacy, of the people, who are using those biometrics to prove their humanity.

So one project in this category is called WorldCoin <Sonal: yah> – it’s also a project in our portfolio – and they use this orb; people may have seen this like shiny silver orb, that uses retinal scans as biometric information to verify that you’re actually a real human; and it also has all sorts of other sensors to make sure that you’re actually alive and that it can’t actually be a picture of an eye. And it’s this system that has secure hardware and is very difficult to tamper with –

Such that the proof that emerges on the other end, which is a zero-knowledge proof that obscures your actual biometric information, is very, very difficult to forge. And this way, politicians could for example <Sonal: mhm> prove that their video stream or that a signature of theirs or that a participation of theirs on some online forum is actually their own, and that they’re actually human.

Sonal: What’s really interesting about what you said, Ali – that’s a great follow up to what Dan was saying about ways to verify like authentic media vs. like fake or deep-fake media – and this world of infinite media (as you would say) that we live in –

But what are the other applications for proof-of-personhood-type technologies like that? I think it’s important, because this is actually another example of how crypto can help AI more broadly too. Going back…we’re kind of flipping back and forth here, <Ali: yah> but that’s okay because we’re just talking about really interesting applications, period so it’s fine.

Ali: Well that’s a really good question… One of the things that will become important in a world where anyone can participate online is to be able to prove that you are human – for various different purposes. There is that famous saying from the ’90s that “on the internet, nobody knows you’re a dog”. <Sonal: oh yah, yah> And I think maybe-maybe a reshaped form of that saying is that on the internet, nobody knows you’re a bot. <Sonal: uh huh?> And so then I guess this is exactly where proof of humanity projects become very important <Sonal: yah> – because it will become important to know whether you’re interacting with a bot or with a human…

For example, in the world of crypto, there’s this whole question of governance: How do you govern systems <yes> that are decentralized, that don’t have any single point of control, and that are bottom-up and community-owned? You would want some kind of governance system that allows you to control the evolution of those systems.

And the problem today is that if you don’t have proof of humanity, then you can’t tell whether one address belongs to a single human; or whether it belongs to a bunch of humans; or whether 10000 addresses actually belong to a single human, and are pretending to be 10000 different people.

And so today you actually kind of have to just use amount of money as a proxy for voting power, which leads to plutocratic governance. <Sonal: yah exactly> But if every participant in a governance system could prove that they’re actually human, and they could do so in a way that’s unique – such that they can’t actually pretend to be more than one human because they only have a single set of eyeballs – then the governance system could be much more fair, and less plutocratic, and can be based more on each individual’s preferences. Rather than on the preference of the largest amount of money that’s locked up in some smart contract.
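To make Ali's point concrete, here is a minimal Python sketch (not from the episode) comparing three ways of counting the same ballots: per address, per token, and per verified human. The names `tally_by_address`, `tally_by_tokens`, `tally_by_human`, and the `human_of` mapping are hypothetical; in particular, `human_of` stands in for whatever a proof-of-humanity system would attest, which the sketch simply assumes exists.

```python
from collections import Counter

def tally_by_address(ballots):
    """One address, one vote: trivially gamed by creating more addresses."""
    return Counter(ballots.values())

def tally_by_tokens(ballots, balances):
    """One token, one vote: each address is weighted by its token balance."""
    totals = Counter()
    for address, choice in ballots.items():
        totals[choice] += balances.get(address, 0)
    return totals

def tally_by_human(ballots, human_of):
    """One human, one vote: only the first ballot per verified human counts.

    `human_of` maps address -> unique human id (assumed to come from a
    proof-of-humanity system); unverified addresses are ignored.
    """
    totals = Counter()
    seen = set()
    for address, choice in ballots.items():
        human = human_of.get(address)
        if human is None or human in seen:
            continue
        seen.add(human)
        totals[choice] += 1
    return totals

# Alice and Bob each hold one address; a "whale" splits funds over five.
ballots = {"alice": "A", "bob": "A",
           "w1": "B", "w2": "B", "w3": "B", "w4": "B", "w5": "B"}
balances = {"alice": 10, "bob": 10,
            "w1": 100, "w2": 100, "w3": 100, "w4": 100, "w5": 100}
human_of = {"alice": "h1", "bob": "h2",
            "w1": "h3", "w2": "h3", "w3": "h3", "w4": "h3", "w5": "h3"}

by_address = tally_by_address(ballots)
by_tokens = tally_by_tokens(ballots, balances)
by_human = tally_by_human(ballots, human_of)
```

In this toy data the whale wins under both address counting and token weighting, and only the per-human tally reflects that two people prefer A and one prefers B.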

Dan: Actually, just to give an example of that –

Today, we’re forced to use “one token, one vote” because we don’t have proof of humanity. Maybe we’d like to do one human, one vote – but if you can pretend to be five humans, then, of course, that doesn’t work. And so one example where this comes up is something called quadratic voting. <Sonal: yup>

So in quadratic voting, basically, if you want to vote five times for something, you have to, kind of, put 25 chits down to do that. But, of course, you can do the same thing, you can just pretend to be five different people each voting once, and that would kind of defeat the mechanism of quadratic voting. So the only way to prevent you from doing that is this exact proof of humanity – where in order to vote, you have to prove that you’re a single entity rather than a Sybil of entities. And that’s exactly where proof of humanity would play an important role.

Generally, identity on-chain is actually becoming quite important for governance.
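The arithmetic behind Dan's quadratic voting example can be sketched in a few lines of Python. This is an illustration under the usual assumption that casting n votes costs n squared credits ("chits"); the helper names `quadratic_cost` and `sybil_cost` are made up for this sketch and do not come from any particular voting system.

```python
def quadratic_cost(votes: int) -> int:
    """Credits needed for one identity to cast `votes` votes."""
    return votes ** 2

def sybil_cost(total_votes: int, identities: int) -> int:
    """Credits needed to cast `total_votes` when they are spread as
    evenly as possible across `identities` fake identities."""
    base, extra = divmod(total_votes, identities)
    # `extra` identities cast one more vote than the rest.
    return (extra * quadratic_cost(base + 1)
            + (identities - extra) * quadratic_cost(base))

honest = quadratic_cost(5)   # one person voting five times
split = sybil_cost(5, 5)     # five pretend people voting once each
```

Here `honest` is 25 and `split` is 5, so without a uniqueness check the quadratic penalty disappears as soon as a voter can split into multiple identities.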

Sonal: Totally…  That by the way reminded me of an episode that you and I did Ali years ago with Phil Daian, <Ali: oh yah> remember, on “dark DAOs”? <Ali: yes… exactly> That was such an interesting discussion. Totally relevant there.

Ali: Totally.

Sonal: By the way, is the phrase “proof of personhood” or “proof of humanity”? What’s the difference, is it the same thing?

Ali: Ah yah, people use them interchangeably – proof of human, proof of humanity, proof of personhood.

Sonal: Yah, yah.

Okay, so keep going on this theme then of media, and this kind of “infinite abundance” of media, like what are other examples – and again, we’re talking about crypto helping AI, AI helping crypto <Ali: yah> – are there any other examples that we haven’t covered here where the intersection of crypto and AI can bring about things, that aren’t possible by either one of them alone?

Ali: Completely. I mean, another implication of these generative models is that we’re going to live in a world of infinite abundance of media. And in such a world, things like community around any one particular piece of media – or the narrative around a particular piece of media – will become ever more important.

Just to make this very concrete, there’s two good examples here: Sound is building a decentralized music streaming platform enabling artists (musicians, essentially) to upload music, and to then connect directly with their communities – by selling them NFTs that give people in those communities certain privileges. Like for example: the ability to post a comment on the website on the track, such that anyone else who plays the song can also see the comment. (This is similar to the old SoundCloud feature that people might remember where you could have like this whole social experience on the music itself as it played on the website.)

It’s this ability to allow people to engage with the media and to engage with each other – often in a way that’s economic because they’re essentially buying this NFT from the artist as a way of being able to do this. And, as a side effect, they are supporting the artist, and helping the artist be sustainable and be able to create more music.

But the beauty of all of this, is that it actually gives an artist a forum to really interact with their community. And the artist is a human artist. And as a result of crypto being in the loop here: You can create a community around a piece of music – that wouldn’t automatically exist around a piece of music – that was just created by a machine learning model that’s devoid of any human element, that doesn’t have a community around it.

And so I think again – in a world where a lot of the music that we’re going to be exposed to will be fully generated by AI – the tools to build community and to tell a story around art, around music, around other kinds of media will be very important as a way of… sort of distinguishing media that we really care about and really want to invest in and spend time with – from media that may also be very good, but it’s just a different kind of media. It’s media that was just generated by AI with less of a human element. <yah> And it may be, by the way, that there’s some synergy between the two:

Like, it could be that a lot of the music will be AI enhanced or AI generated. But if there’s also a human element – like say for example if a creator leveraged AI tools to create a new piece of music – but they also have a personality on Sound, they have an artist page, they have built a community for themselves, and they have a following – then now you have like this kind of synergy between the two worlds. <yah> Where you both have the best music because it’s augmented by the superpowers that AI gives you; but you also have a human element and a story, that was coordinated and made real by this crypto aspect (which lets you bring all of those people together into one platform).

Dan: It’s really quite amazing that even in the world of music – just like we talked about in the world of coding, where you have a human coder that’s being enhanced by tools like Copilot that generate code – we are seeing things like this, where an artist is being enhanced by ML systems that help write (or at least parts of the music are being written and generated by an ML system). So it’s definitely a new world that we’re kind of moving into in terms of content generation – basically there’s going to be a lot of spam that’s generated by machine-generated art <yah>, which people might not value as much as they value art that’s generated by an actual human.

Maybe another way to say it is one of the points of NFTs was to support the artists. <Sonal: yes> But if the artists themselves are now machine-learning models, then who exactly are we supporting? <chuckles> <Sonal: yah… yah> And so it’s a question of how do we distinguish, how do we differentiate human-generated art that needs support <Sonal: Yes> versus machine-generated art?

Sonal: Well, this is a philosophical discussion for over drinks maybe – but I would maybe go so far as to say that the prompter is also an artist, of sorts. <Ali: yah> And in fact, I would make the case that that person is an artist. And the same thing has come about with – as this is a discussion and debate as old as time: and it just simply is new technologies, old behaviors. It’s the same thing that’s been playing out for eons. And the same thing’s playing out in the writing, etc., totally.

Dan: Very true.

Ali: Well that actually opens up the door for collective art <Sonal: mhm> – for art that’s generated through the creative process of a whole community, as opposed to a single artist;

There are actually already projects that are doing this, where: You have a process by which a community influences through some voting process on-chain what the prompt for a machine learning model like DALL-E will be. <mhm> Then you have DALL-E use that prompt to generate a work of art – maybe you generate not 1 work of art but like 10000 – and then you use another machine learning model, that’s also trained from feedback by the community, to pick from those 10000 the best one.

Right and so then now you have a work of art; that was generated from the input of the community; that was also pruned and selected from a set of 10000 variants of that work of art <yup> – also through a machine learning model that’s trained by the community – to then generate one work of art. That is kind of the product of this collective collaboration.

That’s incredible.
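The collective-art process Ali describes (the community votes on a prompt, a model generates many candidates, a second model picks the best) can be sketched roughly as follows. Everything here is a stand-in: `generate_candidates` only produces labelled placeholder strings rather than calling a real image model such as DALL-E, and `toy_score` is an arbitrary deterministic function standing in for a scorer trained on community feedback.

```python
import random
from collections import Counter

def pick_prompt(votes):
    """Return the prompt that received the most community votes."""
    return Counter(votes).most_common(1)[0][0]

def generate_candidates(prompt, n, seed=0):
    """Placeholder for an image model: returns n labelled variants."""
    rng = random.Random(seed)
    return [f"{prompt} / variant {i} / noise {rng.randint(0, 9999)}"
            for i in range(n)]

def toy_score(candidate):
    """Placeholder for a scorer trained on community feedback."""
    return sum(ord(ch) for ch in candidate) % 101

def select_best(candidates, score):
    """Keep the single candidate the scorer rates highest."""
    return max(candidates, key=score)

votes = ["a cat in space", "a city at dusk", "a cat in space"]
prompt = pick_prompt(votes)
candidates = generate_candidates(prompt, 10)
best = select_best(candidates, toy_score)
```

In a real system the vote would happen on-chain and both models would be actual ML models; the sketch only shows how the three stages fit together.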

Sonal: I love it. Well guys, that’s a great note to end on; thank you both for sharing all that with the listeners of “web3 with a16z”.

Dan: Thanks Sonal.

Ali: Thank you so much.


The views expressed here are those of the individual AH Capital Management, L.L.C. (“a16z”) personnel quoted and are not the views of a16z or its affiliates. Certain information contained in here has been obtained from third-party sources, including from portfolio companies of funds managed by a16z. While taken from sources believed to be reliable, a16z has not independently verified such information and makes no representations about the enduring accuracy of the information or its appropriateness for a given situation. In addition, this content may include third-party advertisements; a16z has not reviewed such advertisements and does not endorse any advertising content contained therein.

This content is provided for informational purposes only, and should not be relied upon as legal, business, investment, or tax advice. You should consult your own advisers as to those matters. References to any securities or digital assets are for illustrative purposes only, and do not constitute an investment recommendation or offer to provide investment advisory services. Furthermore, this content is not directed at nor intended for use by any investors or prospective investors, and may not under any circumstances be relied upon when making a decision to invest in any fund managed by a16z. (An offering to invest in an a16z fund will be made only by the private placement memorandum, subscription agreement, and other relevant documentation of any such fund and should be read in their entirety.) Any investments or portfolio companies mentioned, referred to, or described are not representative of all investments in vehicles managed by a16z, and there can be no assurance that the investments will be profitable or that other investments made in the future will have similar characteristics or results. A list of investments made by funds managed by Andreessen Horowitz (excluding investments for which the issuer has not provided permission for a16z to disclose publicly as well as unannounced investments in publicly traded digital assets) is available at

Past performance is not indicative of future results. The content speaks only as of the date indicated. Any projections, estimates, forecasts, targets, prospects, and/or opinions expressed in these materials are subject to change without notice and may differ or be contrary to opinions expressed by others. Please see for additional important information.