My friend Fraser took a large number of AVC blog posts over the years and trained an AI model on them.

The result is a blog written by a machine.

You can see it here.

On one hand, it is kind of amazing that you can train a machine to write like someone.

On the other hand, I don’t think I will be out of a job anytime soon.

#machine learning

Comments (Archived):

  1. kenberger

    yeah someone also did that for the AVC comments section, and the peanut gallery here probably won’t have any interest from anyone else, real or virtual, anytime soon 😉

  2. Tom Labus

    No Knicks!!!

  3. William Mougayar

    How is it actually done?

    1. sigmaalgebra

      A guess: Just take all the actual sentences and the words. Take the 100 or so most frequent non-trivial words, say, the key words, and the sentences heavy on those words. Two such sentences are similar if they have similar key words. The output is those sentences in some order determined by the similarity.
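A minimal sketch of the guess above, for illustration only. It is not how the machine-written blog was actually produced. The stop-word list, the Jaccard overlap used as the similarity measure, and the greedy ordering are all assumptions made for this example.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real one would be longer.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it",
              "that", "for", "on", "i", "we"}


def tokenize(sentence):
    """Lowercase a sentence and split it into simple word tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


def key_words(sentences, top_n=100):
    """Return the top_n most frequent non-trivial words across all sentences."""
    counts = Counter(
        word
        for sentence in sentences
        for word in tokenize(sentence)
        if word not in STOP_WORDS
    )
    return {word for word, _ in counts.most_common(top_n)}


def similarity(a, b, keys):
    """Jaccard overlap of the key words that appear in two sentences."""
    keys_a = set(tokenize(a)) & keys
    keys_b = set(tokenize(b)) & keys
    if not keys_a or not keys_b:
        return 0.0
    return len(keys_a & keys_b) / len(keys_a | keys_b)


def order_by_similarity(sentences, top_n=100):
    """Greedy ordering: start with the first sentence, then repeatedly
    append the remaining sentence most similar to the last one chosen."""
    if not sentences:
        return []
    keys = key_words(sentences, top_n)
    remaining = list(sentences)
    ordered = [remaining.pop(0)]
    while remaining:
        best = max(remaining, key=lambda s: similarity(ordered[-1], s, keys))
        remaining.remove(best)
        ordered.append(best)
    return ordered
```

Note that this only reorders existing sentences. It never writes a new one, which is the main difference from the language-model approach described further down the thread.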

    2. kidmercury

      if fraser can chime in that would be great. my guess is he took all of fred’s posts, prepped them for a training set that could be fed to a sequence to sequence neural net on sagemaker (…. then with the trained model you give it a sentence, and it predicts the next chunk of text. so maybe he fed it the title and it generated the post.

      1. Fraser

        This is directionally right! The past year has seen incredible advances in AI for natural language processing. There are now pre-trained language models that can be fine-tuned on specific data sets. GPT2 is a transformer-based language model that OpenAI created and released. I scraped a decade of posts from AVC and used those to train GPT2 for 10 hours using an advanced GPU. The resulting model generates text. I’m not an engineer but modern tooling has made it possible to do this with some technical sophistication and a few hours of effort. I’m not an engineer and this was (somewhat) accessible.
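To illustrate the "train on a corpus, then continue a prompt" loop in the smallest possible form, here is a toy stand-in. It is a word-level bigram model, not GPT2 and not a transformer, and it is not the code Fraser used. The function names and the example corpus are made up for this sketch.

```python
import random
from collections import defaultdict


def train_bigrams(text):
    """Record, for every word in the corpus, the words seen right after it."""
    words = text.split()
    table = defaultdict(list)
    for current, following in zip(words, words[1:]):
        table[current].append(following)
    return table


def generate(table, prompt, max_words=20, seed=0):
    """Continue the prompt one word at a time by sampling a word that
    followed the current last word somewhere in the training text."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same text
    output = prompt.split()
    if not output:
        return ""
    for _ in range(max_words):
        candidates = table.get(output[-1])
        if not candidates:
            break  # the last word never had a successor in the corpus
        output.append(rng.choice(candidates))
    return " ".join(output)


# Hypothetical miniature corpus; the real project used a decade of posts.
corpus = "the cat sat on the mat"
model = train_bigrams(corpus)
print(generate(model, "sat", max_words=2))
```

A model like GPT2 conditions on the whole preceding passage rather than only the last word, which is why its output holds a style together over many sentences while a bigram model drifts almost immediately.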

        1. kidmercury

          awesome, thanks a lot for sharing. one of the projects i had discussed with a colleague of mine is to do stylistic language transformation using this same type of neural net. so you feed it a post from fred, and it responds with that translated to a different style of text, i.e. country south or hip hop. i will let you know if/when we get around to it so we can all have fun exploring what fred’s blog would sound like if fred was a rapper from arkansas 😀

          1. Fraser

            OpenAI did something similar to this for music with GPT2. You can have the model generate a, say, country song, but use the first few notes from lady gaga’s poker face as the prefix.

          2. kidmercury

            was not aware of that, thanks for the link! i do think there is a great opportunity for “style transfer as a service” — there is more of that in images, apparently some of that in sound as evidenced by the link you shared, and i think a lot of cool stuff will come in text at some point if not already.

    3. Michael Brill

      I think most NLG research is based on transformer models now (…. You can play with an example model here: https://transformer.hugging… – Just enter a sentence and start hitting tab. Of course it wasn’t trained on a useful corpus, but I think you can see what would happen if it were trained on the avc data set.

      1. William Mougayar

        Interesting. Thank you.

    4. Sierra Choi

      OT but William, I’m wondering if I could contact you via email related to IEO? You seem to be quite the expert on all things crypto and I would like to get your quick input on something.

  4. Erin

    The Washington Post used AI to write a news piece and they put it on the front page a few years ago. Newspaper companies have been experimenting with AI for a few years now. Kind of creepy the lengths owners will go to to save money. I can’t stand how creepy AI is. I always say, just because you can do it, doesn’t mean you should. Is there a better, more humane way of using AI other than taking over jobs?

    1. sigmaalgebra

      If you are worried, then you have been taken in by the irresponsible hype. Advice: F’get about it.

      E.g., so far their best tool is regression analysis, and it is 100+ years old. At the time of the first computers, regression had a flurry of interest, but it didn’t take over the world.

      Closer is just old, and closely related, principal components factor analysis. The empirical evidence is that human personalities have only about 14 dimensions. We don’t really know what each of the 14 mean, but the 14 can have good predictive value. Uh, of these 14, the most important one is IQ. Factor analysis has not taken over the world, either.

      Math, science, and parts of engineering are, and for long have been, just AWASH in techniques MUCH more powerful than AI, and those techniques have enormously helped the world. E.g., the Navier-Stokes equations are from Newton’s laws and are for fluid flow. The equations are challenging to solve numerically and apparently impossible to solve in closed form or approximate well by, say, series techniques, even locally (I tried once). Still it’s possible just to look at the equations, see how they scale, and derive the Reynolds number. From that can look at the thin wings of the Wright Brothers, see that their scaling from their wind tunnel in the back of their bicycle shop was wrong, and that we should have thicker wings. When we did, we got wings with more lift and less drag and got rid of biplanes.

  5. sigmaalgebra

    “Artificial”? YES!!! “Intelligent”? NO, not even as much as a weak little hollow hint of a tiny clue, not even as much as the dumbest mammal.

    Uh, Newton had real intelligence. Soooooo, let’s see the current work in AI without use of Newton’s (i) force equals mass times acceleration, (ii) law of gravity, and (iii) calculus so accurately predict the motions of the planets. Curve fitting just will NOT cut it; e.g., there is nowhere near enough data.

    The curve fitting is dumb, brain-dead, nothing behind it but does it fit for the “training data” and the “test data”. Useful? At times, okay. “Intelligent”? Not a chance.

    In strong contrast there is, and may I have the envelope please. Drum roll, please. And the winner is mathematics, with theorems and proofs. Then for applications, the real world provides the hypotheses of the theorems, and the consequences of the theorems, established with proofs, provide the results needed for the applications.

    Uh, let’s see AI rediscover, state, and prove the Lindeberg-Feller version of the central limit theorem. Here’s an easy one: Given the set of real numbers R, a positive integer n, real Euclidean space R^n, and a set C closed in the topology of R^n, show that there exists a function f: R^n -> R so that f is 0 on C, positive otherwise, and infinitely differentiable. Use that to settle an issue in the famous paper in mathematical economics by Arrow, Hurwicz, and Uzawa and show that in the constraint qualifications for the Kuhn-Tucker conditions the Kuhn-Tucker and Zangwill constraint qualifications are independent. Come on, AI, let’s get some actual intelligence in here!

    Here’s another one: Suppose for i = 1, 2, …, n, linear f_i: R^n -> R, and let

    C = { x | x in R^n and f_i(x) <= 0 for all i }

    Then, given linear g: R^n -> R where g(x) for x in C is bounded above, there exists u in C so that for all x in C g(x) <= g(u). That is, bounded above linear g achieves its least upper bound.

    Outline of a novel proof: From linearity argue that C has only finitely many extreme points and that we need to consider only those. Done.

    Just how to argue from linearity is more involved.

    The easiest solution is via the simplex algorithm of linear programming with Bland’s rule.

    AI is welcome to try to discover, state, and prove those results.

    The same linear programming can also provide a quite easy proof of the von Neumann-Morgenstern saddle point result in two person game theory — again would be impressive to see AI discover, state, and prove that result. AI is good at games. Okay, AI, let’s play a larger version of paper, scissors, and rock. I already know how to play. Let’s see how long it takes you to figure it out, how many million games with your neural network nonlinear curve fitting!!!!!

    AI, let’s go for the big time, one close to some of the dreams of computer science: Discover and present an algorithm that shows that, in the theory of computational time complexity, P = NP. I’m waiting. I’ve been waiting. We’re all waiting. Come ON AI!!!!! The big time is right at hand!!!!

  6. gmalov

    How does this make you feel? A machine, learning to express your thoughts and words…

    1. sigmaalgebra

      If turn this question around a little, there is a way for some advanced pure math to give you art to express your feelings. Alpha test promised here!

  7. Emily Steed

    I wonder if this exercise reveals a fundamental question about the future of AI related to qualitative content (as opposed to quantitative or metrics-related AI). The AI-written blog mimics the pattern of your writing fairly well, but without the substance. You communicate ideas. The AI blog doesn’t actually say anything. There is no there there. This worries me about the future of AI for financial crimes work that calls for strategic thinking.

    1. sigmaalgebra

      Yes, a Holy Grail problem in computer science, natural language processing, and AI is understanding the meaning of text. So, yup, right away you and JLM saw that the AI produced text was gibberish, had no meaning. But with some advanced pure math (NO computer science, natural language processing, or AI) there is a nice, practical solution. Alpha test promised here on AVC!

      1. Emily Steed

        Interesting. That makes me wonder if there is a category of thinking / ideas that is distinct from math? Or is everything math? I will be interested to read an AI piece that attempts to create a new idea, as opposed to mimicking a pattern.

        1. sigmaalgebra

          “That makes me wonder if there is a category of thinking / ideas that is distinct from math? Or is everything math?”

          For a fast view, math is theorems and proofs based on sets where the sets are usually from the work of Georg Cantor and then others, including von Neumann, and resulting in the Zermelo-Fraenkel axiomatic set theory and your choice, either with or without the axiom of choice.

          So, with that definition, there’s apparently essentially no limit to how many theorems can be proved.

          So, there can be a lot of math.

          For software being math, one can try to have a proof of correctness. As I studied that, it was usually a lot of induction arguments. These are based on a definition of the natural numbers in terms of, we have 1 and for each natural number n, n + 1 is a natural number, and the result is the set of natural numbers (this is made very precise in axiomatic set theory). So, if want to prove that something holds for all natural numbers, then show that it holds for 1 and for each n, if it holds for n, then it holds for n + 1 — presto, bingo, just from the definition of the natural numbers, it has to hold for all natural numbers.

          So, with such induction, can show that some cases of software are correct.

          For realistically complicated code, the induction arguments get really messy. Next, and an even bigger problem, commonly we do not have a full, precise statement of the inputs to the code or the corresponding outputs so that really can’t argue that the inputs give the outputs and, thus, the code is correct.

          So, in principle can use some math to show some code is correct but in practice, not so much.

          For intelligence as good as, say, a kitty cat, we don’t have a statement of everything the cat does that is precise enough to argue mathematically.

          For current AI, e.g., curve fitting (linear regression, non-linear curve fitting), we can start with some training data, get a fit, and check with some test data. We can make math out of this if assume, say, that all the data is independent and identically distributed (the usual i.i.d.) random variables. And with some more work maybe could come up with some confidence intervals although so far that appears to be rare in both theory and practice.

          But then, for an application of the fit, a biggie practical question is, does the data in the application meet the i.i.d. assumption of the fitting? It appears that too frequently in practice the answer is no. For something like self-driving on current roads in current traffic, weather, etc., likely (i) the fit is not good enough in enough cases of the i.i.d. data, (ii) the real data is not like the i.i.d. data, and (iii) the result is dangerous. That is, in simple terms the AI curve fitting was seen to work in 100,000 cases of driving and then applied to 100,000,000 cases of driving that included some extreme cases not in the 100,000 cases in the sample. Can kill people doing that.

          Might do some math to make this challenge more clear. It might be that can show mathematically with meager and realistic assumptions that to have the trained AI work in 100,000,000 cases, essentially MUST train on all 100,000,000 cases — no simple random sample can suffice. But we will need self-driving AI to work in more than 100 billion cases — then for my conjectured math result anything like the methodology of self-driving AI is doomed to kill people.

          Generally we want progress. Generally, now, we have what seems like a LOT of data. And we want some results. So, we want to know how to manipulate the data to get the results.

          We can try common sense, heuristics, curve fitting, whatever, but at times, and maybe quite generally in time, we will want better methods.

          Well, the better successes of math in the past indicate that math, when we can find or invent the right math for a particular problem, is a MUCH better approach, really the only really good approach we have.

          If we can get by with common sense, heuristics, and empirical curve fitting, then okay. Otherwise, for something better, it’s math.

          Sorry ’bout that.

          For my work, project, startup, alpha test to be announced, etc., I have found some advanced pure math and derived (yes, with theorems and proofs) some apparently new applied math apparently with assumptions that can be met in the intended application and with results that users should find valuable. So, it’s an applied math project. It’s the theorems and proofs that give me confidence. That the math shows that the work should do well on the Holy Grail problem of meaning is a bit surprising until actually see the math. Then the answer is, “sure”.

          The math is my coveted technological barrier to entry. People who could understand my math are not interested in entrepreneurship. People interested in entrepreneurship don’t know the math, wouldn’t know what math to study, and would need years, ballpark a decade, if they did know what math to study.

          I’ve done a lot in math and from teaching, etc. seen reactions of a lot of people and, thus, have a good idea how difficult the relevant math is for nearly everyone in the society. When I was trying to teach or show others my work, their difficulties in understanding were a real pain. E.g., at FedEx, I had a project that should have saved the company ballpark 15% of direct fleet operating costs — big bucks. They didn’t take me seriously, but there was and is good evidence that I was correct. And there have been other such cases. Now their difficulties are a real technological barrier to entry and an advantage!

          Also, I’ve learned in rock solid terms that no venture capital firm will fund a project where the technology is just theorems and proofs on paper. So, I can’t get competition from venture funded startups before they are successful against me in the market, even if they figure out my math and duplicate it.

          That venture firms would not consider my math was a pain. Now that I see my way to good earnings (if users actually do like my work), that view of the venture community is an advantage!

          More generally, US business just does NOT see some math on paper as an example of technology. Of course, the US DoD long has. E.g., at… is

          “It is still an unending source of surprise for me how a few scribbles on a blackboard or on a piece of paper can change the course of human affairs.” Stanislaw Ulam

          He was right!

          Smells like opportunity to me!

          I like Ulam! He has a very general result the French probabilist Le Cam called tightness. There is a good presentation in

          Patrick Billingsley, Convergence of Probability Measures. John Wiley and Sons.

          I used the result in a paper I published on mathematical statistics.

          For

          “I will be interested to read an AI piece that attempts to create a new idea, as opposed to mimicking a pattern.”

          Well, maybe the AI community would let me call my work AI. I won’t call my work AI because I would regard that as an insult to my work. My work is just some pure/applied math, and history is awash in drop dead gorgeous examples of pure/applied math. E.g., in my recent move I lost my copy of

          David G. Luenberger, Optimization by Vector Space Methods, John Wiley and Sons.

          So I recently got a new copy. It’s a buffet of gorgeous pure/applied math, basically the Hahn-Banach theorem. I studied that theorem and the rest of the basics of Banach space well enough, but this book has its buffet of applications. FUN reading!

          So, I’d much rather be an example of math than the junk of AI. And some of the pure math I’m using is gorgeous stuff. One would say, “NO WAY, NOHOW could any such thing be true.” But it is true! And there are gorgeous proofs that remove all doubt.

          So, if you want to call my work AI, then maybe you can get “a new idea”!
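The point in the comment above about a fit that passes its test data and then meets cases unlike the sample can be shown with a toy example. This is only a sketch under made-up assumptions: the "real world" is the function x squared, the model is a straight line fitted by ordinary least squares, and the specific sample points are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_abs_error(slope, intercept, xs, ys):
    """Average absolute gap between the fitted line and the true values."""
    return sum(abs(slope * x + intercept - y) for x, y in zip(xs, ys)) / len(xs)


def truth(x):
    """Hypothetical 'real world' that the straight-line model cannot see."""
    return x * x


# Training and test points come from the same narrow range, 0 to 1.
train_x = [i / 10 for i in range(0, 11, 2)]
test_x = [i / 10 for i in range(1, 10, 2)]
# "Extreme cases" lie far outside anything in the sample.
extreme_x = [5.0, 10.0]

slope, intercept = fit_line(train_x, [truth(x) for x in train_x])
in_range_error = mean_abs_error(slope, intercept, test_x, [truth(x) for x in test_x])
extreme_error = mean_abs_error(slope, intercept, extreme_x, [truth(x) for x in extreme_x])

print(f"error on test data from the sampled range: {in_range_error:.3f}")
print(f"error on extreme cases outside the sample: {extreme_error:.3f}")
```

The line looks acceptable on test points drawn from the same range as the training points, and the error grows by orders of magnitude on the extreme cases. Nothing in the test-set check would have warned about that.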

  8. LIAD

    I dunno Fred, some of those Sunday morning blog posts over the years have been a little dicey.

  9. David A. Frankel

    Kinda funny actually. It’s as if 8th grade Fred Wilson had a blog and was writing about his imagined future….

  10. jamiew

    I love that he hosted it on Tumblr

  11. Chris Phenner

    I laughed out loud in a public space within 30 seconds of visiting Fraid Wilson. Oh my Lord was that good humor, and a relief to see its lack of humanity 🙂

  12. JLM

    . The AI version — Fraid Wilson — has no “voice.” It is really just gibberish. Still, some of those Saturday morning videos might be replaced by Fraidie Wilson? JK JLM www.themusingsofthebigredca…

    1. Mario Cantin

      I enjoy the Saturday videos 🙂

      1. JLM

        . Great for you, Mario. I used to love rappelling from hovering helicopters head first, Australian style. To each their own. I really love the Funding Fridays. You? JLM www.themusingsofthebigredca…

        1. Mario Cantin

          Ha ha! Actually, that’s my weekly day off from AVC!

        2. jason wright

          Head first. Explains so much.

          1. JLM

            . I did come up 20′ short one time in the dark, but luckily I was in some deep grass. JLM www.themusingsofthebigredca…

  13. LIAD

    just noticed the post title! Badass.

  14. Mario Cantin

    The singularity is still far away!

  15. Jeremy Shatan

    AI’s make typos? Shocking.

  16. jason wright

    I demand an immediate Turing test, or my brain equity returned. ‘Dead Wilson’?

  17. Will Luttrell

    Fred, I am struck by the way the bot was able to mimic your writing style. You use a lot of short, declarative sentences, as did the AI. I wonder if this style is easier to mimic than others.

  18. jason wright

    “Joined December 2006” “@Fraser hasn’t Tweeted” Have you met him in person?

  19. Sierra Choi

    The “voice” of the AI sounds altogether different, but it’s got some of Fred’s catchphrases right, “This is the way I’ve always done it, and it works for me” so I would say not bad. I have to say though that the AI voice sounds a little more “pedantic” whereas I would say Fred tends to sound more humble and down to earth. Maybe a little more discrete categorisations in voice “tone” is in order.

  20. ChimpWithCans

    Haha….How many AI computers, left in a room with a typewriter, would it take to write out the works of Shakespeare?

  21. cavepainting

    Interesting but it is like a bunch of disconnected sentences with no real meaning. Thank God, the human brain is still relevant.

  22. Jan Schultink

    I wonder if you get better results if you let the AI bot loose on some blogs purely written for Google/SEO, you know them when you see them: “Write more engaging social content in 3 easy steps”

  23. John Fitzpatrick

    The New Yorker did the same thing in October, and this was the article’s author’s reaction: “It hurt to see the rules of grammar and usage, which I have lived my writing life by, mastered by an idiot savant that used math for words. It was sickening to see how the slithering machine intelligence, with its ability to take on the color of the prompt’s prose, slipped into some of my favorite paragraphs, impersonating their voices but without their souls.” It’s a good read: