IgnatiusReilly
User ID: 611
"Love" might be a... bit strong for some of these books. I chose Cup of Gold to start with for instance because I like Steinbeck, and thought it would be fun to do his first novel for the site's first visual novel. But I wouldn't really say that that book is good. (Ambitious perhaps?)
And then since then, the order has been kind of just whatever book I feel like doing next, based on what I'm up for reading and re-reading and re-reading, though biased towards shorter books at first. (Gotta build my way up to doing those long Russian authors.)
And also I guess I've been biasing my selections against books with complex frame narratives or like epistolary-type formats, like Wuthering Heights or Dracula or Heart of Darkness or whatever, since even though those are all books I'd like to do, I want some more experience before figuring out how I'd approach them as visual novels.
The metadata is (poorly) documented in this blog post: Converting Books Into Visual Novels Part 0: The pulp.txt Format. It's part of a series of seven or so posts I'm still only partway through making, where I plan to walk through the whole process. Here are the other two currently completed as well: Converting Books Into Visual Novels Part 0.5: Creating book.txt and Converting Books Into Visual Novels Part 1: The First Edit — Creating the Starter pulp.txt. (Again, you'll have to excuse the really bad technical writing in these, since I'm very much not good at it!)
(Also, if you're looking for a book to reference, probably any of the others is better than Cup of Gold, since what with that one being the first, I was still figuring a lot of the details out, and hadn't realized that giving all the characters one-letter IDs was a stupid idea.)
The overall pulp.txt format is one I created myself, for a couple reasons:
First, I wanted an enforcement mechanism to ensure that, as part of editing, the original text doesn't end up accidentally getting changed. So I wanted that logic baked into the "compiler", that pulp.txt (the metadata-enhanced text) would reference book.txt (the original text), to make sure they add up to the same complete book. And that required custom code.
And then second, I just wanted my own format so that I could always add in all my own idiosyncratic parsing rules and not have to deal with the idiosyncratic parsing rules of other existing visual novel formats. (Which I admit isn't a great reason to create my own format, though to be fair, book-to-VN converting is a fairly idiosyncratic venture.)
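To make the first reason a bit more concrete, here's a rough sketch of the kind of check I mean: strip the metadata out of pulp.txt and confirm that what's left matches book.txt. This is not the actual compiler code. The `@` metadata marker and the function names are made up for illustration (the real syntax is in the blog post), and the sketch assumes metadata always sits on its own line.

```python
import re
from typing import Optional

# ASSUMPTION: metadata lines start with "@". This marker is hypothetical and
# is not the real pulp.txt syntax, which is documented in the blog post.
METADATA_PREFIX = "@"


def normalize(text: str) -> str:
    """Collapse whitespace runs so line-wrapping differences are ignored."""
    return re.sub(r"\s+", " ", text).strip()


def strip_metadata(pulp_text: str) -> str:
    """Drop metadata lines, keeping only the prose lines."""
    prose_lines = [
        line
        for line in pulp_text.splitlines()
        if not line.lstrip().startswith(METADATA_PREFIX)
    ]
    return "\n".join(prose_lines)


def first_mismatch(book_text: str, pulp_text: str) -> Optional[int]:
    """Return the index of the first differing character in the normalized
    texts, or None if the prose in pulp.txt reproduces book.txt exactly."""
    book = normalize(book_text)
    pulp = normalize(strip_metadata(pulp_text))
    if book == pulp:
        return None
    for i, (a, b) in enumerate(zip(book, pulp)):
        if a != b:
            return i
    # One text is a prefix of the other, so they diverge where the shorter ends.
    return min(len(book), len(pulp))
```

A build step could then refuse to produce output whenever `first_mismatch` returns anything other than `None`, and print the surrounding text at that index to show where the edit drifted from the original.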
I'm not sure I fully understand your questions about the git repo automation and character sheet enforcement, but maybe the blog posts will answer those questions. (If not though, feel free to ask again.)
As for the cost, it seems to be converging to about $200 in image API spend per book, though the number will vary depending on the length of the book and the number of different backgrounds, characters, and character outfits.
For the time, this latest book Pride and Prejudice (~122k words) took me almost a month, although admittedly that could have been shorter if I was more focused, and also wasn't still making updates to my processes. I think if I had everything ironed out correctly, a book of that length could take as little as two weeks to do, though probably not much shorter than that, without starting to lose quality and accuracy.
Yeah, it does take a fair bit of time, requiring basically three full close readings. The first to familiarize myself with the text (or re-familiarize, if it's been a while), the second to assign all the initial metadata to be used for image generating, and the third (after the image generating) to do a visual edit pass. Plus some smaller steps in-between.
Would that I could abridge the text to cut out tricky passages, or even just tweak certain annoying visual-defying wordings, it would make things a lot easier. But I think I share your feelings that, at that point, it would no longer be the original book or prose.
That does seem like the more accurate term, though I'll admit that I hadn't actually heard that one before. I'll probably stick with "visual novel" overall for the site and Github repos, since "kinetic novel" strikes me as being less descriptive and also kind of obscure (though I may be betraying my ignorance), but I could see it being a good idea to include in parentheses on the home page.
(And yeah, I don't have any plans to try and make these interactive, since I don't think there'd be a non-gimmicky way to do it.)
Currently I'm using Nano Banana Pro, which was the best model at the time I started the project. The more recently released GPT Image 2 is probably the better model overall now, but for what I'm trying to do, it has a few issues.
GPT's interpretation of "literary cross-hatching" style is more beautiful overall in isolation, using thin fine stippling, which gives it a really nice engraved look, especially for character sprites. Whereas Banano's style tends to use thicker lines and give a more exaggerated and perhaps less realistic overall appearance to characters. But even though style-wise Banano is the more amateurish one, I'd say its use of bolder/clearer lines with less detail makes it easier to more quickly understand what's going on, both in the backgrounds and in characters' expressions. So it wins from an information standpoint, even if less pretty.
Also: GPT seems to do a better job at following prompt details, like including everything you tell it to include. But it seems to be worse at understanding how objects exist in space (or maybe it just screws up more often due to including more fine details), leading to more nonsensical elements in the backgrounds. (Banano does screw up physicality too, but less often.)
And I find it interesting that, no matter how I try to prompt the differences in style out of the models, even with style reference images, they really do just seem to have their own interpretations of cross-hatching that they can't help but converge to. (And that does make me a little worried that, should Google deprecate and replace Banano with a different model, the "upgrade" might have a fundamentally different style, which would lead to the website's art becoming inconsistent. But I'll figure out how to deal with that if it comes to it.)
And then the last weird thing about the GPT model is its bizarre fixation on cleavage. It can't help itself but to include prominent cleavage in all the female sprites, which... could be acceptable, were again, its art style not so distractingly detailed. But you end up with these perfectly spherical breasts contoured with perfect curve-following grids, which look like they belong in a calculus class. Beautiful! But distracting. (Banano is comparatively more modest, and also just does a better job diversifying its faces away from always being of the perfect form, which helps keep characters identifiable at-a-glance.)
I've lately been working on a project around converting public domain novels into visual novels, unabridged. (Website: https://publicdomainpulp.com)
It's an idea I've had for a while, but it's only recently that image generator technology has gotten good enough to make the project viable. Viable, but not yet easy. At the start, I thought it would be easy, but there's a real scaling challenge in maintaining consistency when it comes to generating the hundreds of images a single book's visual novel needs.
Across the book, the style needs to be kept consistent. And the colors need to be consistent. And the image quality needs to be consistent. And the physical settings need to be consistent. And all this while being accurate to the details of the book, stated and implied, including period-accuracy. And of course: the images should be pleasant to look at.
None of these constraints on their own is super difficult to get right, but all of them together? That's when you start getting a lot of mistakes and having to do a lot of reprompting.
Not that I can blame all the mistakes on the image generator. Many of the screw-ups are fully mine: getting background details wrong, getting character details wrong, screwing up character expressions, screwing up relative resolutions, screwing up background framing, failing to make characters look unique, making characters look too unique, and so on.
The result being, the first three visual novels (Cup of Gold, The 39 Steps, and Pudd'nhead Wilson) I would describe as terrible, the next three (The Sun Also Rises, Jekyll and Hyde, and The Great Gatsby) as merely mostly terrible, the next two (The Mysterious Affair at Styles and The Secret Garden) as only just bad, and the most recent (Pride and Prejudice) as... perhaps approaching being okay.
It's a trend of improvement to be sure, but I'm also annoyed with myself by how long the results are taking to improve, and just how long I'm taking with the conversions in general. I really do want to have visual novels for every book ever written (or, more realistically, all the major novels of the public domain western canon), but at this rate, it's going to take some time. Especially if I want to get the results to the point of being good. And I also need to decide: do I go back and re-edit all the bad VNs? And if so, when, since I don't know if that's such a good use of marginal effort at the moment, with so many books still to do. But I also hate to leave up a bunch of garbage on the website.
Still, at least in terms of prose quality, the site currently has the nine best visual novels ever written. But visually, there are still many process improvements I need to make (including just getting better at slideshow editing), which hopefully the continued release of better image generators will help with. (I was optimistic about the recent GPT Image 2 at first, but it turned out to have... interesting issues.)
I think I'll do A Study in Scarlet next, where I'll try to tackle some of the remaining sprite generation issues.
Doctorow's use of CC-BY-SA-NC licenses for his novels as opposed to the more widespread CC-BY-SA that e.g., Wikipedia uses doesn't sit right with me. The NC (non-commercial) in general strikes me as being like the Trotskyist provision of the Creative Commons/open-source world. "It's not enough that people can share my content for free. They must also not be allowed to profit off it!"
Or maybe I'm the Trotskyist for thinking CC-BY-SA-NC isn't open enough. (But really, NC doesn't really have an analogue in the code-licensing world, and for good reason: way too ambiguous.)
That is pretty funny that they changed the official documentation to just say to use an agent.
Like, I can sort of see the idea: upgrading from .NET Framework always had too many edge cases for the dummy automated tools to ever really work fully correctly. There was always cleanup to do afterwards, so wouldn't it be nice to have an agent do that whole process instead? But if the recommended agent isn't actually smart enough to do it, then that's just giving up on maintaining any actual solution.
I'd agree with the other comment though that a normal non-agentic LLM could probably do the task way better. If your projects really are small, you could probably just go one at a time, concatting each .csproj file with all its .cs files into one big .txt file (labeling each file within the big text blob), feed that to Gemini Pro or some equivalent smart and big-context model, and let it give you back all the changes you need. Would still be a slower process than the CLI, though it might handle some of the edge cases better (especially where there aren't directly equivalent APIs).
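For the concatenation step, something like the following would probably do. It's only a sketch: the `bundle_project` name and the `===== FILE: ... =====` label format are my own invention, and I'm assuming you'd want to skip the `bin` and `obj` build folders so generated code doesn't eat up the context window.

```python
from pathlib import Path

# ASSUMPTION: build-output folders whose contents shouldn't be sent to the model.
SKIPPED_DIRS = {"bin", "obj"}


def bundle_project(project_dir: str, output_path: str) -> int:
    """Concatenate a project's .csproj and .cs files into one labeled text file.

    Returns the number of files written into the bundle.
    """
    root = Path(project_dir)
    # Project files first, then sources, each group in a stable sorted order.
    candidates = sorted(root.rglob("*.csproj")) + sorted(root.rglob("*.cs"))

    parts = []
    for path in candidates:
        relative = path.relative_to(root)
        if any(part in SKIPPED_DIRS for part in relative.parts):
            continue
        # utf-8-sig reads files with or without a byte-order mark.
        body = path.read_text(encoding="utf-8-sig", errors="replace")
        # Hypothetical label format so the model can tell the files apart.
        parts.append(f"===== FILE: {relative.as_posix()} =====\n{body}\n")

    Path(output_path).write_text("\n".join(parts), encoding="utf-8")
    return len(parts)
```

You'd run it once per project, e.g. `bundle_project("src/MyLib", "MyLib.txt")` (paths made up), and paste the resulting file into the model along with the upgrade instructions.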
In any case, the good news is that once you're through the .NET upgrade, you never need to worry about .NET Framework again. Only poor, poor Microsoft still needs to worry about .NET Framework. I kind of feel bad for them, except that it's also kind of their fault what with the bad communication that so many organizations are still on it.
For the web browsing problem, that's a solved problem with the new "My GPTs" feature (once you have access, which still might not be everyone yet?). The new default GPT has all the extra features enabled by default, including the (I would argue) very useful DALL-E feature and the (I would agree) not very useful web browsing feature. But you can pin "ChatGPT Classic" to disable all that and stick to strictly textual responses, or create a custom one to get your preferred combination of features.
I've just started to mess around with the custom GPTs, and while it doesn't seem to be functionally different than keeping around some preliminary instructions to copy-paste before your actual query, I'm finding that that seems to make an outsized difference in decreasing the mental barrier for me wanting to use it. Now I've got one dedicated to generating unit tests from pasted code (following preferred style guides), one for code review (outputting potential issues with suggested changes), and so forth. I'm pretty optimistic about generative AI from an ever-increasing utility perspective, so I find it hard to complain about the current state of things.
That said, I have also noticed a greater-than-chance series of factual errors in recent conversations. Interestingly, the latest one I can recall also involved an error in comparative measures (while discussing hypothetical US statehood): "As of my last update in 2023, the Philippines had a population of over 100 million people. This population size would make it the second most populous state in the U.S., surpassed only by California."
So maybe they tweaked some dial to improve some other metric, which by the impossible-to-comprehend inner machinations of text analysis wizardry, had a tradeoff that made this failure point more common. Or maybe it really is less factually accurate across-the-board, and these examples are just the easier ones to notice. Either way, it doesn't seem too bothersome at least for me, with my set of use cases, especially since I imagine an easy-to-notice regression like this will be pretty quickly taken care of. If not by OpenAI, then by the others in the ever-escalating arms race.
Life+N for low numbers of N creates a perverse incentive to assassinate good authors in order to get their works into the public domain faster. I've got several authors who I'd love to die soon, but the current Life+70 rule cross-referenced against actuarial odds disincentivizes direct action.