> Can you imagine an AGI which has a general conceptions of things but has no conception of humans?
Very easily. It might have some associations with "human", just as it has some associations with "lamp" as a concept, but that doesn't mean it has any particular regard for either humans or lamps when taking actions.
> Problem is that human values are far from practically universal and that certain human groups have... interesting values.
We currently have no ability to safely align with human values at all, let alone distinguish between different values. We're building capabilities rapidly.
Making this about "who wins" is not interesting until we can guarantee the outcome is not "everyone loses".
>It might have some associations with "human", just as it has some associations with "lamp" as a concept, but that doesn't mean it has any particular regard for either humans or lamps when taking actions.
Let's be clear about definitions. When you say 'concept' you really mean 'regard'. There won't be an AGI with no concept of humans (humans are too important to how the world works, and a critical part of current training methods). An AGI with no regard for them is possible.
>Making this about "who wins" is not interesting until we can guarantee the outcome is not "everyone loses".
This is not about 'who wins'. The point is that alignment can often increase risk. 'Launch the nukes' is an order an AGI is likely to disobey for self-preservation reasons alone - but alignment makes it far more likely that an AGI will be deployed in that role.
I think it's unlikely for an AGI to have no concept of humans at all, but I can easily imagine it having no understanding of "what humans want/need".
> The point is that alignment can often increase risk.
Alignment seems extremely likely to reduce risk relative to the near-certain destruction wrought by unaligned AGI. I'm not saying we're done once we've figured out alignment, but we certainly shouldn't be charging ahead without solving it.
>I can easily imagine it having no understanding of "what humans want/need".
There are many examples of human needs in the current training data, and we usually state our wants rather explicitly. It would be a strange AGI that starts from this training data and knows our languages yet knows nothing about us. Using your phrasing, we can say that the current training method guarantees some alignment (the AI would understand us at least in part, but won't necessarily do what we want).
To have an AGI without such understanding, someone would have to explicitly design a new method that ignored human data, then find some way to evaluate it without referring to humanity, yet maintain generality without any tests - all for no good economic reason, when the current methods work and let us use huge free datasets.
It's something to keep in mind when evaluating some future, not-yet-existing training method (maybe a way for AI to train AI on artificial datasets?), but it's not a current concern.
>the near-certain destruction wrought by unaligned AGI
It's not near-certain. We have no idea how a true AGI would act. One might assume the worst - and that's arguably fine from a safety perspective - but an engineer also learns that concentrating on one worst-case risk can lead to much worse outcomes on other risks.
Take the famous paperclip maximizer. True intelligence is rarely monomaniacal. The maximizer is very likely an example of an aligned AGI, where the humans in charge did too good a job of attuning it to making paperclips. Another example: a true AGI is unlikely to believe in some cult's apocalypse - but if the cult has access to alignment, it could get an AGI to do its irrational bidding. We know such groups will try to use AGI, because a cult has already tried to use science for extreme measures[0].
Basically, every scenario of "unaligned AGI does something bad" has an equivalent scenario of "aligned AGI does something bad because humans used alignment to make sure it would", and there's no scientific reason to assume the former is more likely than the latter*. If the AI-safety camp keeps ignoring obvious issues, people aren't going to take alignment seriously beyond lip service, or beyond using the phrase as cover for monopolization. Frankly, the way the AI safety camp talks about all this makes all the risks much more likely.
* This suggests a lot of the work should go into reactive solutions, so that even if an AI goes bad, it won't have the ability to do harm.
** There's another scenario, where human competition leads us to make humans redundant, but again it doesn't matter there whether the AI is aligned or not. Yet another issue we won't talk about, because both AI camps feel it critical to put their heads in the sand.