
Fun with your Hackintosh: Stable Diffusion and AI Generated Images

SD UI notes there's a 75 word (or token, I can't recall) prompt limit. How do these count? Or do they?

I've been looking at styles more closely today.

When you select a style, information is added from the styles.csv file and you can see this in the text below the generated image.

For example, the basic prompt: a solo tall strong woman with short blonde hair, wearing a short red dress, no background

Screenshot 2024-01-11 at 03.18.05.png

And then the same with the Fashion style applied:

Screenshot 2024-01-11 at 03.20.41.png

As the style information is drawn from the styles.csv it is possible to write your own text and tailor the styles more to your own taste. Still don't know about the character count limit for prompts though.
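The merge described above can be sketched in a few lines. This is a hedged, purely illustrative sketch: the column names (name, prompt, negative_prompt) and the {prompt} placeholder follow the AUTOMATIC1111 styles.csv convention as I understand it; check your own styles.csv, since other UIs may differ.

```python
import csv
import io

# Illustrative styles.csv content; the Fashion row is an assumption,
# not the real file's contents.
SAMPLE_CSV = """name,prompt,negative_prompt
Fashion,"{prompt}, vogue, studio lighting, elegant","blurry, lowres"
"""

def apply_style(user_prompt: str, style_name: str, csv_text: str) -> tuple[str, str]:
    """Merge a style row into a user prompt, AUTOMATIC1111-style."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["name"] == style_name:
            template = row["prompt"]
            if "{prompt}" in template:
                # The placeholder controls where the user prompt lands.
                prompt = template.replace("{prompt}", user_prompt)
            else:
                # No placeholder: style text is simply appended.
                prompt = f"{user_prompt}, {template}"
            return prompt, row["negative_prompt"]
    return user_prompt, ""  # unknown style: pass the prompt through

prompt, negative = apply_style("a solo tall strong woman", "Fashion", SAMPLE_CSV)
```

Editing the prompt and negative_prompt columns of your own rows is all the "tailoring" amounts to.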
 
Ah, noted. Thx.

The more I look at prompts, the more perplexed I become about what's going on.

3/4 of that prompt+style looks vague, repetitive, impossibly subjective, ambiguous ad absurdum, or contradictory.

For example: no background, city background (if order matters and "no" comes first...?)

Or: golden ratio (but the output dimensions are not golden)

Or: Depth of field (this is colloquially understood to mean shallow focus, but is technically understood as the opposite, the depth of focus ranging from specific distance to infinity, which if you look further into the topic becomes a balance between the resolution of the lens and the medium. —Before anyone tries to kick me for being didactic, recall that the "I" in AI stands for intelligence)

Or: trending on artstation (this sounds like it means something but must operate as an incantation)

Then there's seasoning like:
highly detailed and intricate, hyper maximalist, elite, glow (how do modifiers work? Is "turn glow to 11" useful? According to the overall premise of this tech it should be)

From here things get weirder.

Consider "a solo tall strong woman": why is "solo" needed? It must be implied by "a", according to the rules of inference we depend on to make this model work. When you prompt "an elephant" you expect the model to start with the entire universe (a field of random noise) and chip away at everything that doesn't fit an elephant. So what can a "solo elephant" mean?

My own limited experience with building prompts showed me that tokens may be extraneous, in the sense that a token's presence or absence makes no difference for a given prompt, but becomes significant if you change another part of the prompt.

In arithmetic, consider the two expressions:

2 / 1

2 / (1 + 1)

In the first case the 1 token doesn't affect the result. But in the second it does.

But if the 2 changed to 0, then the 1 wouldn't matter in either case.
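The arithmetic analogy can be made literal in a few lines. Purely illustrative: whether the second "1" changes the result depends on the surrounding expression, just as a prompt token's effect can depend on the rest of the prompt.

```python
def expr_a(n):
    return n / 1          # the extra 1 is inert

def expr_b(n):
    return n / (1 + 1)    # the same 1 now halves the result

# With 2 as the numerator, the 1 matters in one expression but not
# the other; with 0, it matters in neither.
assert expr_a(2) != expr_b(2)
assert expr_a(0) == expr_b(0)
```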

If prompts are computational, then how does the grammar work?

And if prompts are not computational (spells) then working with the models is magic.

The parable of The Sorcerer's Apprentice comes to mind.

This is why I find this tech perplexing.

The more I look at prompts, the more strange it all seems to become, like dreams.

This is a very awkward juncture for computer science, because so far the whole point of the field has been about predictability. But AI is weird, not just in the sense of its logic, but because the meanings of the transformations are completely subjective.

The Turing test turns out to be a delineation of the test givers, not the machine (entity) under test. Do you think the output looks like Santa typing at a laptop in a snow globe? You're human!
 
I still cannot answer most of your questions and in many cases share your perplexity.

This one, however, I can shed a little light on:

Consider "a solo tall strong woman": Why is "solo" needed?

I have had cases where a prompt produces multiple persons where one was intended and specified.
I find that words such as solo or similar help to reduce the occasions when this happens.

Enclosing particular parts of a prompt in brackets gives emphasis to the wording within too.
 
I agree with both of you, the structure of the prompt is an art within itself. I agree with P1LGRIM that using brackets is a good way to place emphasis on certain words.
 
The previously posted prompt guide is helpful. It showed me about token order, positive/negative emphasis via () and [], and ways to think about styles in terms of influential artists, media, and range of effects.

I'm gaining a better sense of the limits imposed by training.

Getting a model to emit anything in particular that I want is inscrutable. OTOH, that it reliably emits anything I want to look at is totally amazing!

Re the art of the prompt: back in the earlier web there was a site called zombo.com, a single-page Flash animation that played pulsating beeps with a voice-over that repeated "Welcome to Zombocom. You can do anything at Zombocom. The only limit is yourself at Zombocom..." over and over forever.

That's what prompt construction feels like to me.

Right now everyone is trying to get rid of the crazy fingers and mutated faces but I think we'll soon see that was the good stuff and feel nostalgic about it.

Overall, I'll liken generative AI to the discovery of a process for creating refined sugar: as soon as people come into contact with the stuff they say yes, give me more! Certainly a new cuisine is emerging. But the dietary / health implications are far from being reckoned.
 
Today's fun with SD & CN: I have a dream.


00002-3925194427.png
 
DiffusionBee has been updated: version 2.4.4 has been sped up and contains a new default model.

Portrait of Ziggy Stardust  aged 90 | Default_SDB_0.1_12371.png
ziggy stardust in space | Default_SDB_0.1_533607.png
Portrait of david bowie aged 90 | Default_SDB_0.1_816441.png

 
I hadn't tried DiffusionBee previously.
Downloaded it onto my new MBP 2023 and was pleasantly surprised as to how simple it is to install and use compared to Stable Diffusion.

Using my stock test prompt with a Dystopian style applied:

Prompt: a solo strong tall woman with short blonde hair, wearing a short red dress

Screenshot 2024-01-15 at 21.16.26.png Screenshot 2024-01-15 at 21.25.00.png
 
Did you use the new default model that came with it? I was surprised at how good it is at people; your images look really good.
 
Did you use the new default model that came with it? I was surprised at how good it is at people; your images look really good.
I used CyberRealistic v3.1
 