SP0256-AL2 speech synthesis chip
What
The SP0256-AL2 is a chip designed to synthesize speech. As opposed
to chips which store sampled digital audio for playback, the sp0256
implements a digital filter that can simulate the allophones of
the English language. Under the control of a microcontroller such
as our BX-24, the sp0256 can be made to pronounce its 59 allophones
in any sequence, reasonably approximating an unlimited vocabulary,
and sounding like a robot run amok in the process. The chip does
not allow (directly) controlling pitch, rate, or inflection of speech.
How
The basic operation of the sp0256 goes like this: tell it which
allophone to say, tell it to say it, then wait until it's ready to
accept a new allophone.
Using the sp0256 is relatively straightforward once we sort out
which pins we need and which we don't (there are some other operations,
such as communicating with an external ROM, that we don't need at
all). The pins we need are:
- VDI and VDD - two +5v pins that power the chip. The chip is
divided into two parts: one that does the talking (powered by
VDD), and one that listens to our BX-24 (powered by VDI). This
is done to conserve power: the talking part can be powered down
when nothing is going on, while the interface portion remains on
and listening for new data.
- VSS - a fancy name for ground.
- A1-A8 - the eight address lines. These are used to select an allophone,
by putting a byte in parallel on the eight lines. Really
we only need A1-A6, because the chip only has 59 allophones
(and 5 different length pauses), so it only needs 6 bits
to address all possibilities. A7 and A8 can be grounded,
or used to explore the static that lies beyond allophone
63.
- ALD - Address Load. The BX-24 uses this line to tell the sp0256
to load a new address. ALD is asserted low (we can tell
by the bar over the pin label in the schematic), meaning
when it goes to 0 the chip responds by "latching"
the values on its address pins into its memory; when it's
1, the chip ignores its address lines.
- SE - Strobe Enable. The chip can actually respond to addresses
in one of two ways - the first being with the ALD pin as
described above. In the second, ALD is ignored, and the
chip instead responds whenever there is a change on any
of the address lines. We want to use the ALD method, so
we permanently connect SE to +5v.
- TEST, SBYRESET, and RESET - These pins are used by the BX-24
to reset the two sections
of the chip (see VDD and VDI), and to put the chip in test
mode. Like ALD, the resets are asserted low, so to put the
chip into normal non-reset mode we take those pins high.
TEST is asserted high, so we put that low for normal operation.
- LRQ and SBY - Load Request and Standby. These two pins send
information back to the BX-24 to tell us what's going on. LRQ tells
us when the sp0256 is ready to get a new address from the BX by
going low (another "asserted low" pin); otherwise it's high, telling
us not to load another address yet. SBY indicates that the chip has
finished speaking altogether and is ready to be placed in standby
by powering down the VDD section.
- OSC1 and OSC2 - these connect to a 3.12 MHz crystal that provides
a clock source
for the chip. Our BX works exactly the same way, but the crystal
is built in. For a lot of other chips (including the PIC chips)
we'll need to provide our own clock. Often the oscillator is shown
connected to ground through some small capacitors. For some as
yet unexplained reason nothing I make needs these capacitors to
work right, so I leave them out.
- Output - the reason for all this work. Output is the audio signal of
the synthesized speech. Although most schematics for this chip
show a lot of fancy circuitry after the output, we can ignore
all that - that stuff is only used to filter and amplify the signal
in order to drive a speaker directly. We'll use a guitar amp or
computer speakers (which have an amp built in), so our signal
can just go straight to a jack.
Now that we know more specifically what pins do what, this is how
we use the sp0256. First, wire VDD and VDI to +5v, and VSS to ground.
Also wire SE to +5V to place the chip in strobe enable mode for
good. Connect TEST, RESET, SBYRESET and ALD to the BX-24 (pins 18,
17, 16, and 13 in my case). The BX will use these lines as output
to control the chip. Connect LRQ and SBY to the BX (14 and 15).
These will be inputs to the BX that will tell it what the sp0256
is up to.
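As a rough sketch of that wiring in code, here is one way the BX might put the control lines into their normal, non-reset state at startup. The pin numbers are the ones given above; the constant names, the InitSpeechChip routine, and the 10 ms reset pulse are my own assumptions for illustration, not something taken from the spec sheet or the project files.

```basic
' Pin assignments as described in the text (names are hypothetical).
Const PinALD As Byte = 13       ' Address Load, asserted low
Const PinLRQ As Byte = 14       ' Load Request, input to the BX
Const PinSBY As Byte = 15       ' Standby, input to the BX
Const PinSBYRESET As Byte = 16  ' standby reset, asserted low
Const PinRESET As Byte = 17     ' reset, asserted low
Const PinTEST As Byte = 18      ' test mode, asserted high

Public Sub InitSpeechChip()
    ' TEST low = normal operation; ALD high = ignore the address lines.
    Call PutPin(PinTEST, bxOutputLow)
    Call PutPin(PinALD, bxOutputHigh)

    ' Hold both resets low briefly, then release them.
    ' The 10 ms figure is an assumption, not a datasheet value.
    Call PutPin(PinRESET, bxOutputLow)
    Call PutPin(PinSBYRESET, bxOutputLow)
    Call Delay(0.01)
    Call PutPin(PinRESET, bxOutputHigh)
    Call PutPin(PinSBYRESET, bxOutputHigh)

    ' LRQ and SBY are driven by the sp0256, so make them inputs.
    Call PutPin(PinLRQ, bxInputTristate)
    Call PutPin(PinSBY, bxInputTristate)
End Sub
```

Calling InitSpeechChip once at the top of Main should leave the chip idle and waiting for its first address.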
Next, set up the address lines. We will want to write an entire byte
across eight pins of the BX-24. We could do that with lots of individual
PutPin calls, but that would be cumbersome. As it turns out, there's
a much easier way to do this.
Ports and Registers
Microcontrollers, including our BX, are full of internal registers,
which is just a fancy way of saying they have memory bytes that
have a special meaning or purpose. The bits in those bytes act as
little switches that control the machine that is our microcontroller.
When we start looking at programming the PIC or other microcontrollers,
we'll be dealing with registers 24/7 to set the chip up to do different
things. The BX makes it easy for us by hiding a lot of that detail,
but the functionality is still there when we need it, and now is
one of those times.
The I/O pins of the BX-24 are grouped into ports.
Port A, for example, is all the analog pins - 13 through 20.
Port C is all the other I/O pins, 5 through 12.
Each port reflects the contents of a register - a byte somewhere
in memory. There is a register for Port C called... PortC!
Normally, when we call, say, PutPin(5, 1), the BX makes bit
7 of the PortC register a one for us. Now, if we could go
in ourselves and assign the value of the entire register -
which, remember, is just a byte with a special name - then
we could change the value of all the Port C pins at once.
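To make the pin-to-bit relationship concrete, here is a minimal sketch of roughly what PutPin(5, 1) amounts to when written against the registers directly. It assumes Register.DDRC is the data direction register for Port C (a 1 bit makes the pin an output), as on the Atmel chip underneath the BX-24. The routine name is hypothetical.

```basic
Public Sub SetPin5High()
    ' Pin 5 is bit 7 of Port C, so its mask is bx10000000.

    ' Make pin 5 an output without disturbing the other seven pins.
    Register.DDRC = Register.DDRC Or bx10000000

    ' Set bit 7 of PortC, which drives pin 5 high.
    Register.PortC = Register.PortC Or bx10000000
End Sub
```

The Or with a one-bit mask is what keeps the other pins as they were; a plain assignment to the register replaces all eight bits at once.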
It turns out that that is pretty easy. The syntax is just Register.PortC
= byte. That will set all the bits in the PortC byte at once,
and thus change the state of all the pins in one step. In terms
of using the sp0256, this is really useful, since in order to address
the chip - i.e., to tell it which allophone we want it to say -
we need to set eight address lines in parallel all at once. If we
make the eight address lines of the sp0256 correspond to the eight
output pins of the BX's Port C, then all we have to do to select
an allophone is say:
    Register.PortC = numberOfAllophone
and we're done.
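Selecting the allophone is only the first of the three steps in the basic operation; the chip still has to be told to load it, and we have to wait for it to be ready. Here is a minimal sketch of the whole handshake for one allophone. It assumes A1 through A8 are wired to Port C bits 0 through 7 (pins 12 down to 5), and that ALD and LRQ are on pins 13 and 14 as above. The routine name is hypothetical, and I haven't checked the width of the ALD pulse against the spec sheet's timing figures.

```basic
Const PinALD As Byte = 13   ' Address Load, asserted low
Const PinLRQ As Byte = 14   ' Load Request, low when the chip is ready

Public Sub SpeakAllophone(ByVal Allophone As Byte)
    ' Wait for LRQ to go low, meaning the chip can take a new address.
    Do While (GetPin(PinLRQ) = 1)
        ' busy-wait
    Loop

    ' Make all eight Port C pins outputs, then put the address on them.
    ' (Setting DDRC once at startup would also do.)
    Register.DDRC = bx11111111
    Register.PortC = Allophone

    ' Pulse ALD low so the chip latches the address, then release it.
    Call PutPin(PinALD, bxOutputLow)
    Call PutPin(PinALD, bxOutputHigh)
End Sub
```

With this in place, Call SpeakAllophone(9) would ask the chip for the /PP/ sound.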
Now that we have the address and control lines taken care of, the
only thing remaining to do is get at the audio output. To do that,
we simply connect the Output pin of the sp0256 to the signal tab
of an audio jack, and ground the ground tab of the jack to the ground
of our circuit. The audio jack could be an RCA connection, a 1/4"
guitar jack, or a 1/8" mini jack like for headphones. I used
a stereo mini jack, and wired the signal to both the tip and ring
tabs of the jack. That way I can use it with headphones or computer
speakers directly, or use a mini-1/4" mono adapter to take
it to a guitar amp. The spec sheet says we have to filter the square
wave output (hence all the capacitors in their setup diagrams),
but that's not true. It just sounds more robotic without the filter,
which is fine with me. Besides, we can always filter and/or otherwise
process the sound with other gear or software - no need to hardwire
in a filter.
Software Notes
Now the chip's all set, and the only thing that remains is to write
the software to control it. The code for the BX in this project
is divided up into three files: main.bas, which contains the Main
routine and subroutines for doing stuff; allophones.bas, which
contains constants and initialization routines related to the allophone
set of the chip; and phrases.bas, which has some hardwired phrases and words.
As mentioned, an allophone is selected by writing a byte to the
address lines connected to the sp0256. The spec sheet for the chip
lists all the allophones and their addresses. To have the chip say
the sound "/PP/" as in "Pow!", you write a 9
to Port C, which in binary is bx00001001. Allophones.bas
defines a set of constant mnemonics that makes programming
the chip much easier. It defines the constant "PP" as
being a byte equal to 9 (or bx00001001). Thus, if we want the chip
to say "P" we can write:
    Register.PortC = PP
instead of
    Register.PortC = 9
or worse
    Register.PortC = bx00001001
(The Say() subroutine is explained below). The other role of allophones.bas
is to initialize a set of arrays that group the allophones into
families, such as Voiceless Stops and R-Colored Vowels.
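As an illustration of what such a family might look like, here is a sketch of the Voiceless Stops group. Only PP = 9 comes from the text above; the other five addresses are my reading of the allophone table in the spec sheet and are worth double-checking there. The array name and the init routine are hypothetical, not copied from allophones.bas.

```basic
' Allophone addresses. PP is from the text; the rest are assumed
' from the spec sheet's allophone table and should be verified.
Const PP As Byte = 9
Const TT1 As Byte = 17
Const TT2 As Byte = 13
Const KK1 As Byte = 42
Const KK2 As Byte = 41
Const KK3 As Byte = 8

' One family of related allophones (hypothetical name).
Dim VoicelessStops(1 To 6) As Byte

Public Sub InitVoicelessStops()
    VoicelessStops(1) = PP
    VoicelessStops(2) = TT1
    VoicelessStops(3) = TT2
    VoicelessStops(4) = KK1
    VoicelessStops(5) = KK2
    VoicelessStops(6) = KK3
End Sub
```

Grouping the sounds this way means a routine can step through a whole family without knowing any of the individual addresses.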
In main.bas, we have the meat
of the program. We define constants for all the pins that
interface with the sp0256, such as ALD and LRQ. As with the allophone
constants, this will make the code much cleaner, since we can refer
to Pin ALD instead of always having to remember which pin is connected
to the ALD pin of the sp0256 (was it 13? No, 12?...).
We also declare the arrays that will hold our phoneme groups, and
another array called AllophoneQueue that will be used to
say whole words or phrases. AllophoneQueue is used by the Add()
and Say() subroutines. Add adds a phoneme to the queue if
there is still room. Say takes the current allophone queue and steps
through it, sending one allophone at a time to the speech chip.
Between each allophone it watches LRQ to see when the chip is ready
to accept the next allophone. When Say has sent all the allophones
to the chip, it empties the queue and returns.
Cycle() accepts a reference to one of the phoneme groups,
and steps through each one at a preset rate.
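The queue logic described above could be sketched like this. It is a guess at the shape of Add and Say based on the description, not the code from main.bas; the queue size of 32, the QueueCount variable, and the pin constant names are all assumptions.

```basic
Const PinALD As Byte = 13   ' Address Load, asserted low
Const PinLRQ As Byte = 14   ' Load Request, low when the chip is ready
Const QueueSize As Byte = 32   ' assumed capacity

Dim AllophoneQueue(1 To 32) As Byte
Dim QueueCount As Byte         ' number of allophones waiting

' Add one allophone to the queue if there is still room.
Public Sub Add(ByVal Allophone As Byte)
    If (QueueCount < QueueSize) Then
        QueueCount = QueueCount + 1
        AllophoneQueue(QueueCount) = Allophone
    End If
End Sub

' Send the queued allophones to the chip one at a time,
' then empty the queue.
Public Sub Say()
    Dim i As Byte

    If (QueueCount > 0) Then
        For i = 1 To QueueCount
            ' Watch LRQ until the chip is ready for the next one.
            Do While (GetPin(PinLRQ) = 1)
                ' busy-wait
            Loop
            Register.PortC = AllophoneQueue(i)
            ' Pulse ALD low to latch the address.
            Call PutPin(PinALD, bxOutputLow)
            Call PutPin(PinALD, bxOutputHigh)
        Next
    End If

    QueueCount = 0
End Sub
```

This version assumes the Port C pins have already been made outputs. Silently dropping allophones when the queue is full keeps Add simple, at the cost of truncating long phrases.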
Phrases.bas has some hardwired phrases. For example, to make the chip say "Fitter... happier... more productive", there is the SayFitterHappier routine:
public sub SayFitterHappier()
    '"Fitter"
    Add(FF)
    Add(FF)
    Add(IH)
    Add(TT2)
    Add(ER2)
    Add(P200ms)
    Add(P200ms)
    Add(P200ms)
    Say

    '"Happier"
    Add(HH1)
    Add(AE)
    Add(PP)
    Add(YR)
    Add(P200ms)
    Add(P200ms)
    Add(P200ms)
    Say

    '"More Productive"
    Add(MM)
    Add(OR1)
    Add(PP)
    Add(ER1)
    Add(OW)
    Add(DD2)
    Add(UH)
    Add(UH)
    Add(KK3)
    Add(TT2)
    Add(IH)
    Add(VV)
    Add(VV)
    Add(VV)
    Add(P200ms)
    Say
end sub
Next Steps
I'm less interested in the sp0256's ability to make understandable
speech than in its function as a digital filter that happens to
have vocal or speech-like qualities. I'd like to explore ways of
grouping and accessing the allophones to create abstract rhythmic
patterns. Finding meaningful patterns in the allophones and interesting
ways to access and arrange them will be key to this and will no
doubt require some degree of experimentation.
The sp0256 allows no specific control of the pitch, rate, or volume
of the speech output. In order to change the pitch, I've replaced
the fixed clock crystal with a controllable oscillator chip from Maxim.
In order to control the volume programmatically I'm interested
in digital potentiometers and volume controls, such as those available
from Analog Devices.
Finally, I've ordered two more chips. I'd like to create a device
that uses one microcontroller to utilize all three chips simultaneously,
for triphonic abstract vocal output.