Photo of my breadboard

Schematic of the circuit

A note about registers

Code listing
- software notes
- main.bas
- allophones.bas

Listen
- The Voder, ca. 1939
- Modern Speech Synthesis from AT&T Research

Definitions
Phoneme
Allophone

Links
- Ken Lemieux, who has some chips and crystals for sale.
- The original spec sheet.
- Some good documentation of experiments with the chip.

Related devices
- CTS256A-AL2, a text-to-phoneme companion chip.
- Mechanical and electronic speech synthesis machines.
- Many recordings of historic speech synthesis efforts.
- Record/playback and text-to-speech chips from Winbond.
- Some speech recognition boards from Jameco. "Speaker dependent" means you must train them to recognize your specific voice speaking specific words - a long way from natural speech processing.

SP0256-AL2 speech synthesis chip

What
The SP0256-AL2 is a chip designed to synthesize speech. Unlike chips that store sampled digital audio for playback, the sp0256 implements a digital filter that can simulate the allophones of the English language. Under the control of a microcontroller such as our BX-24, the sp0256 can be made to pronounce its 59 allophones in any sequence, reasonably approximating an unlimited vocabulary, and sounding like a robot run amok in the process. The chip does not directly allow control of the pitch, rate, or inflection of speech.


How
The basic operation of the sp0256 goes like this: tell it which allophone to say, tell it to say it, then wait until it's ready to accept a new allophone. Using the sp0256 is relatively straightforward once we sort out which pins we need and which we don't (there are some other operations, such as communicating with an external ROM, that we don't need at all). The pins we need are:

 

 

 

 


  • VDI and VDD - two +5v pins that power the chip. The chip is divided into two parts - one that does the talking (powered by VDD), and one that listens to our BX-24 (VDI). This is done to conserve power - the talking part can be powered down when nothing is going on, while the interface portion remains on and listening for new data.
  • VSS - a fancy name for ground.
  • A1-A8 - the eight address lines. These are used to select an allophone, by putting a byte in parallel on the eight lines. Really we only need A1-A6, because the chip only has 59 allophones (and 5 different length pauses), so it only needs 6 bits to address all possibilities. A7 and A8 can be grounded, or used to explore the static that lies beyond allophone 63.
  • ALD - Address Load. The BX-24 uses this line to tell the sp0256 to load a new address. ALD is asserted low (we can tell by the bar over the pin label in the schematic), meaning that when it goes to 0 the chip responds by "latching" the values on its address pins into its memory; when it's 1, the chip ignores its address lines.
  • SE - Strobe enable. The chip can actually respond to addresses in one of two ways - the first being with the ALD pin as described above. In the second, ALD is ignored, and the chip instead responds whenever there is a change on any of the address lines. We want to use the ALD method, so we permanently connect SE to +5v.
  • TEST, SBYRESET, and RESET - These pins are used by the BX-24 to reset the two sections of the chip (see VDD and VDI), and to put the chip in test mode. Like ALD, the resets are asserted low, so to put the chip into normal non-reset mode we take those pins high. TEST is asserted high, so we put that low for normal operation.
  • LRQ and SBY - Load request and Standby. These two pins send information back to the BX-24 to tell us what's going on. LRQ tells us when the sp0256 is ready to get a new address from the BX by going low (another "asserted low" pin); otherwise it's high, telling us not to load another address yet. SBY indicates that the chip has finished speaking altogether and is ready to be placed in standby by powering down the VDD section.
  • OSC1 and OSC2 - these connect to a 3.12 MHz crystal that provides a clock source for the chip. Our BX works exactly the same way, but the crystal is built in. For a lot of other chips (including the PIC chips) we'll need to provide our own clock. Often the oscillator is shown connected to ground through some small capacitors. For some as yet unexplained reason nothing I make needs these capacitors to work right, so I leave them out.
  • Output - the reason for all this work. Output is the audio signal of the synthesized speech. Although most schematics for this chip show a lot of fancy circuitry after the output, we can ignore all that - that stuff is only used to filter and amplify the signal in order to drive a speaker directly. We'll use a guitar amp or computer speakers (which have an amp built in), so our signal can just go straight to a jack.
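
The six-bit arithmetic behind the address lines can be checked on a desktop machine. This is a small Python sketch, not BX-24 code: the names ADDRESS_BITS, ADDRESS_MASK, and to_address are made up for illustration, and the mask simply stands in for grounding A7 and A8.

```python
# Desktop-side sketch (not BX-24 code) of the sp0256 address space.
ALLOPHONES = 59      # from the text above
PAUSES = 5           # from the text above
ADDRESS_BITS = 6     # A1 through A6

# Six bits give 64 addresses, exactly enough for 59 allophones plus 5 pauses.
ADDRESS_MASK = (1 << ADDRESS_BITS) - 1   # 0b00111111 = 63


def to_address(value):
    """Keep only the low six bits, as if A7 and A8 were tied to ground."""
    return value & ADDRESS_MASK


print(ALLOPHONES + PAUSES == 1 << ADDRESS_BITS)  # True
print(to_address(9))            # 9
print(to_address(0b11001001))   # 9 again: the top two bits are dropped
```

With A7 and A8 grounded, the upper two bits of whatever byte we send never reach the chip, so only addresses 0 through 63 are ever selected.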

Now that we know more specifically which pins do what, this is how we use the sp0256. First, wire VDD and VDI to +5v, and VSS to ground. Also wire SE to +5v to place the chip in strobe enable mode for good. Connect TEST, RESET, SBYRESET and ALD to the BX-24 (pins 18, 17, 16, and 13 in my case). The BX will use these lines as outputs to control the chip. Connect LRQ and SBY to the BX (14 and 15). These will be inputs to the BX that will tell it what the sp0256 is up to.

Next, set up the address lines. We will want to write an entire byte across eight pins of the BX-24. We could do that with lots of individual PutPin calls, but that would be cumbersome. As it turns out, there's a much easier way to do this.

Ports and Registers
Microcontrollers, including our BX, are full of internal registers, which is just a fancy way of saying they have memory bytes that have a special meaning or purpose. The bits in those bytes act as little switches that control the machine that is our microcontroller. When we start looking at programming the PIC or other microcontrollers, we'll be dealing with registers 24/7 to set the chip up to do different things. The BX makes it easy for us by hiding a lot of that detail, but the functionality is still there when we need it, and now is one of those times.

The I/O pins of the BX-24 are grouped into ports. Port A, for example is all the analog pins - 13 through 20. Port C is all the other I/O pins, 5 through 12.

Each port reflects the contents of a register - a byte somewhere in memory. There is a register for Port C called... PortC! Normally, when we call, say, PutPin(5, 1), the BX makes bit 7 of the PortC register a one for us. Now, if we could go in ourselves and assign the value of the entire register - which, remember, is just a byte with a special name - then we could change the value of all the Port C pins at once.

It turns out that's pretty easy. The syntax is just Register.PortC = byte. That sets all the bits in the PortC byte at once, and thus changes the state of all eight pins in one step. For the sp0256 this is really useful, since in order to address the chip - i.e., to tell it which allophone we want it to say - we need to set eight address lines in parallel, all at once. If we make the eight address lines of the sp0256 correspond to the eight output pins of the BX's Port C, then all we have to do to select an allophone is say:
Register.PortC = numberOfAllophone
and we're done.
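
To see what a single register write does to the individual pins, here is a small desktop model in Python (not BX-24 code). It assumes the bits run in order from pin 5 (bit 7) down to pin 12 (bit 0). The text above only confirms the pin 5 end of that mapping, so check the rest against the BX-24 documentation. The function name pins_for_byte is made up for illustration.

```python
# Desktop-side sketch: which Port C pins go high for a given byte.
# ASSUMPTION: pin 5 is bit 7, pin 6 is bit 6, ... pin 12 is bit 0.
FIRST_PIN = 5
PIN_COUNT = 8


def pins_for_byte(value):
    """Return {pin: level} for a byte written to the Port C register."""
    levels = {}
    for offset in range(PIN_COUNT):
        bit = (PIN_COUNT - 1) - offset          # pin 5 -> bit 7, pin 12 -> bit 0
        levels[FIRST_PIN + offset] = (value >> bit) & 1
    return levels


# Address 9 is 00001001 in binary: only bits 3 and 0 are set.
high_pins = [pin for pin, level in pins_for_byte(9).items() if level == 1]
print(high_pins)  # [9, 12]
```

Under that assumed mapping, one register write replaces eight separate PutPin calls, and the least significant address line would be wired to pin 12.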

Now that we have the address and control lines taken care of, the only thing remaining is to get at the audio output. To do that, we simply connect the Output pin of the sp0256 to the signal tab of an audio jack, and connect the ground tab of the jack to the ground of our circuit. The audio jack could be an RCA connection, a 1/4" guitar jack, or a 1/8" mini jack like the ones used for headphones. I used a stereo mini jack, and wired the signal to both the tip and ring tabs of the jack. That way I can use it with headphones or computer speakers directly, or use a mini-to-1/4" mono adapter to take it to a guitar amp. The spec sheet says we have to filter the square wave output (hence all the capacitors in their setup diagrams), but in my experience the chip works without it. It just sounds more robotic without the filter, which is fine with me. Besides, we can always filter or otherwise process the sound with other gear or software - no need to hardwire in a filter.

Software Notes
Now the chip's all set, and the only thing that remains is to write the software to control it. The code for the BX in this project is divided into three files: main.bas, which contains the Main routine and subroutines for doing stuff; allophones.bas, which contains constants and initialization routines related to the allophone set of the chip; and phrases.bas, which has some hardwired phrases and words.

As mentioned, an allophone is selected by writing a byte to the address lines connected to the sp0256. The spec sheet for the chip lists all the allophones and their addresses. To have the chip say the sound "/PP/" as in "Pow!", you write a 9 to Port C, which in binary is bx00001001. Allophones.bas defines a set of constant mnemonics that makes programming the chip much easier. It defines the constant "PP" as being a byte equal to 9 (or bx00001001). Thus, if we want the chip to say "P" we can write:
Register.PortC = PP
instead of
Register.PortC = 9
or worse
Register.PortC = bx00001001

(The Say() subroutine is explained below). The other role of allophones.bas is to initialize a set of arrays that group the allophones into families, such as Voiceless Stops and R-Colored Vowels.
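
As a quick desktop check that the three writes above put the same byte on the port, here is a short Python sketch (not BX-24 code). PP = 9 comes from the text; the helper name as_bx_literal is made up for illustration, and it only formats a number in the bx style used above.

```python
# Desktop-side sketch: a mnemonic, its decimal address, and its binary form
# all name the same byte.
PP = 9  # allophone address for /PP/, as given in the text


def as_bx_literal(value):
    """Format a byte the way the binary constants are written above."""
    return "bx" + format(value, "08b")


print(as_bx_literal(PP))             # bx00001001
print(int("00001001", 2) == PP)      # True
```

The mnemonic costs nothing at run time; it just saves us from looking up the table every time we want a sound.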

In main.bas, we have the meat of the program. We define constants for all the pins that interface with the sp0256, such as ALD and LRQ. As with the allophone constants, this makes the code much cleaner, since we can refer to the constant ALD instead of always having to remember which BX pin is connected to the ALD pin of the sp0256 (was it 13? No, 12?...).

We also declare the arrays that will hold our allophone groups, and another array called AllophoneQueue that will be used to say whole words or phrases. AllophoneQueue is used by the Add() and Say() subroutines. Add adds an allophone to the queue if there is still room. Say takes the current allophone queue and steps through it, sending one allophone at a time to the speech chip. Between allophones it watches LRQ to see when the chip is ready to accept the next one. When Say has sent all the allophones to the chip, it empties the queue and returns.
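
The queue-and-wait logic can be modeled on a desktop machine. This Python sketch is not the code from main.bas: FakeChip, QUEUE_SIZE, and the function bodies are assumptions made for illustration, with a pretend chip whose LRQ line stays high for a few polls after each load.

```python
# Desktop-side model of Add() and Say(); not the actual BX-24 code.
PP = 9            # allophone address from the text
QUEUE_SIZE = 32   # hypothetical capacity; the real array size is not given


class FakeChip:
    """Stand-in for the sp0256: LRQ reads 1 (busy) for a few polls after a load."""

    def __init__(self, busy_polls=3):
        self.busy_polls = busy_polls
        self._remaining = 0
        self.spoken = []          # addresses latched so far, in order

    def load(self, address):
        # Models writing the address to the port and pulsing ALD low.
        self.spoken.append(address)
        self._remaining = self.busy_polls

    def lrq(self):
        # 1 means "not ready yet", 0 means "ready for a new address".
        if self._remaining > 0:
            self._remaining -= 1
            return 1
        return 0


queue = []


def add(allophone):
    """Append an allophone if there is still room in the queue."""
    if len(queue) < QUEUE_SIZE:
        queue.append(allophone)


def say(chip):
    """Send each queued allophone once LRQ goes low, then empty the queue."""
    for allophone in queue:
        while chip.lrq() == 1:
            pass                  # busy-wait, as a polling loop would
        chip.load(allophone)
    queue.clear()


demo_chip = FakeChip()
add(PP)
add(PP)
say(demo_chip)
print(demo_chip.spoken)  # [9, 9]
print(queue)             # []
```

Waiting on LRQ before each load, rather than after, means the routine can return while the last allophone is still being spoken.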

Cycle() accepts a reference to one of the allophone groups, and steps through each allophone in it at a preset rate.
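
A desktop sketch of that idea in Python, not the routine from main.bas: the signature, the interval_s parameter, and the injected say_one and sleep callables are all assumptions made so the example can run without hardware. Only the address 9 for PP comes from the text; the other two numbers are placeholders to check against the spec sheet.

```python
# Desktop-side model of Cycle(); names and parameters are illustrative only.
import time


def cycle(group, say_one, interval_s=0.25, sleep=time.sleep):
    """Send each allophone in the group, pausing a fixed interval after each."""
    for allophone in group:
        say_one(allophone)
        sleep(interval_s)


# A made-up group: 9 is PP (from the text); 13 and 8 are placeholder addresses.
example_group = [9, 13, 8]

sent = []
cycle(example_group, sent.append, interval_s=0.0)
print(sent)  # [9, 13, 8]
```

Passing the group in as an argument is what lets one routine serve every family of sounds.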

Phrases.bas has some hardwired phrases. For example, to make the chip say "Fitter... happier... more productive", there is the SayFitterHappier routine:

public sub SayFitterHappier()
	'"Fitter"
	Add(FF)
	Add(FF)
	Add(IH)
	Add(TT2)
	Add(ER2)
	Add(P200ms)
	Add(P200ms)
	Add(P200ms)
	Say
	
	'"Happier"
	Add(HH1)
	Add(AE)
	Add(PP)
	Add(YR)
	Add(P200ms)
	Add(P200ms)
	Add(P200ms)	
	Say

	'"More Productive"
	Add(MM)
	Add(OR1)	
	Add(PP)
	Add(ER1)
	Add(OW)
	Add(DD2)
	Add(UH)
	Add(UH)
	Add(KK3)
	Add(TT2)
	Add(IH)
	Add(VV)
	Add(VV)
	Add(VV)
	Add(P200ms)
	Say

end sub

 

Next Steps
I'm less interested in the sp0256's ability to make understandable speech than in its function as a digital filter that happens to have vocal or speech-like qualities. I'd like to explore ways of grouping and accessing the allophones to create abstract rhythmic patterns. Finding meaningful patterns in the allophones and interesting ways to access and arrange them will be key to this and will no doubt require some degree of experimentation.

The sp0256 allows no specific control of the pitch, rate, or volume of the speech output. In order to change the pitch, I've replaced the fixed clock crystal with a controllable oscillator chip from Maxim.
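
A rough way to estimate what a different clock does, sketched in Python. It assumes the chip's output scales linearly with its clock frequency, which is the usual behavior for clocked digital logic but is not something the spec sheet text here confirms. The function names are made up for illustration; the 3.12 MHz nominal clock is the crystal value given above.

```python
# Desktop-side estimate of pitch and speed change versus clock frequency.
# ASSUMPTION: everything the chip produces scales linearly with its clock.
import math

NOMINAL_CLOCK_HZ = 3.12e6  # the crystal frequency from the text


def pitch_shift_semitones(clock_hz):
    """Pitch change relative to the nominal clock, in equal-tempered semitones."""
    return 12 * math.log2(clock_hz / NOMINAL_CLOCK_HZ)


def speed_factor(clock_hz):
    """How much faster (or slower) the speech plays at this clock."""
    return clock_hz / NOMINAL_CLOCK_HZ


print(pitch_shift_semitones(3.12e6))  # 0.0
print(speed_factor(1.56e6))           # 0.5
```

Under that assumption, doubling the clock raises the pitch by about an octave and also doubles the speaking rate, so pitch and rate cannot be changed independently this way.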

To control the volume programmatically, I'm looking at digital potentiometers and volume controls, such as those available from Analog Devices.

Finally, I've ordered two more chips. I'd like to create a device that uses one microcontroller to utilize all three chips simultaneously, for triphonic abstract vocal output.