Brief
DIYAudio Thread http://www.diyaudio.com/forums/showthread.php?s=&threadid=120463
A complete FIR crossover can be an audiophile's dream, thanks to its sharp filters and linear phase.
But the current FFT-based type of FIR crossover on a PC loses time information of the audio signal: if you use a 10 ms frame, the time information within those 10 ms is lost.
Now NVIDIA CUDA GPU computing power can achieve a very fast "real" (time-domain) FIR conversion, resulting in sharp filtering as shown below.
05/03/2008
Downloadable Package
for 32-bit Windows XP + Visual C++ 2005 runtime and a CUDA-capable GPU.
Unzip to C:\.
You can try FIR parameter creation / GPU FIR conversion.
NOTE: the input wave file cannot contain a LIST chunk. Please use EAC ("Exact Audio Copy") to rip CDs.
If you have a 7.1ch audio device, you can try to play the result with Windows Media Player, foobar, etc.
The most common setup will be foobar / ASIO / Lynx AES-16. (I don't have one, so I cannot support it.)
06/21/2008
I bought a GTX280 and am studying CUDA 2.0 for another (non-audio) project.
Crazy fast, huge, power-hungry, expensive, and the fan is noisy :)
There is no change for this project, because it has no effect on 16-bit output.
I think a GeForce 8400-9600 (fanless with an Accelero cooler) is enough for this FIR crossover.
Where to go
Components in the red rectangle are implemented in this section.
Player / USB / I2S Converter : See other section, "Multichannel I2S output"
TAS5518 4 way amplifier : See other section, "TAS-4i"

Beginning
I understood that a 4 way FIR crossover requires huge computing power.
44100 samples/sec * 2 (L,R) * 4 (ways) * 1024 (taps) = 361.3 M tap calculations / sec
I found that the TI C6713 is not enough for this. (Of course, it is possible that my programming knowledge is too poor to achieve it.)
Maybe the C6713 can be used for a 2 way FIR crossover, but I want 4 way, and very accurate.
BruteFIR? I want a "straight-forward" time domain, continuous FIR.
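To make "straight-forward time domain FIR" concrete, here is a minimal C++ sketch of a direct-form FIR. It is not the FIRCalc01.cpp code; the function name firDirect and the zero-padding at the start of the signal are assumptions made for illustration only.

```cpp
#include <cstddef>
#include <vector>

// Direct-form (time-domain) FIR: y[n] = sum over k of coeff[k] * x[n - k].
// Samples before the start of the input are treated as zero (assumption).
// Cost: one multiply-add ("tap") per coefficient per output sample.
std::vector<float> firDirect(const std::vector<float>& input,
                             const std::vector<float>& coeff)
{
    std::vector<float> output(input.size(), 0.0f);
    for (std::size_t n = 0; n < input.size(); ++n) {
        // Only taps that reach back to existing samples contribute.
        const std::size_t taps = (n + 1 < coeff.size()) ? n + 1 : coeff.size();
        float acc = 0.0f;
        for (std::size_t k = 0; k < taps; ++k) {
            acc += coeff[k] * input[n - k];
        }
        output[n] = acc;
    }
    return output;
}
```

Every output sample is computed from its own history, with no framing, which is why the cost grows directly with the tap count.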
I experimented with FIRCalc01.cpp (the prototype source code from the FIR processor project) and did some tuning.
On a 2.3GHz AMD Phenom CPU, it achieved 1100 M tap calculations / sec, with 4 threads running in parallel.
Enough power? No. I want more power, as shown below, and a Phenom cannot be used in a fanless PC; the CPU would have to be downgraded.
44100 samples/sec * 2 (L,R) * 4 (ways) * 8192 (taps) = 2.9 G tap calculations / sec
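The tap budget above is plain multiplication; a small C++ helper makes it easy to try other configurations. The function name tapsPerSecond is hypothetical, not part of the project.

```cpp
#include <cstdint>

// Multiply-adds per second needed by a time-domain FIR crossover:
// every output sample of every channel of every way costs `taps` operations.
std::uint64_t tapsPerSecond(std::uint64_t sampleRate,
                            std::uint64_t channels,
                            std::uint64_t ways,
                            std::uint64_t taps)
{
    return sampleRate * channels * ways * taps;
}
```

For example, tapsPerSecond(44100, 2, 4, 8192) returns 2890137600, which is the 2.9 G figure above.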
I purchased a GeForce 8800GTS and downloaded CUDA ("NVIDIA Compute Unified Device Architecture"), the GPU computing tools.
After one week of struggling...
Here is the source code of the CUDA version of the FIR kernel routine.
CUDA_FIR_Kernel
The result:

The GeForce 8800GTS with CUDA calculates 38.78 G FIR taps per second.
Simply incredible. It's 140 times faster than one CPU thread.
Here are the test wave file, the CPU version output, and the GPU version output, along with the FIR coeffs and spectrum. Could you double-check them?
Original
Whitenoise.wav
Output_CPU.wav
Output_GPU.wav
FIR coeffs
Spectrum
There are fanless VGA cards like the 8600GT. They have only 1/4 of the shader processors of the 8800, but that will be enough for now.
Software
FIR Parameter Generator
03/17/2008
Performance is now 79 G tap calculations / sec, thanks to loop unrolling tuned by looking at the PTX assembler listing.
So even a GeForce 8400 can reach 8-9 G tap calculations / sec (which makes it easier to build a fanless PC).
03/24/2008
FIR Converter ver 01
GPU Kernel
Main program and GPU kernel source code. (File conversion only)
It is already too complicated because of buffer management:
- Wave File itself
- conversion buffer
- Input frame
- coeff buffer
- output buffer
- output conversion buffer
and the kernel has its own buffers. Managing the memory structure is the key to performance, but it is painful.
03/26/2008
Extended WAV format test routine. See MSDN, "multi channel wav data format", for details.
8 channel wav file

Audacity can open and display this wav file, so maybe other players, like Windows Media Player, foobar, etc., can also play it.
This format will be the output of "Converter" mode.
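As a rough illustration of how four stereo ways end up in one 8 channel file, the C++ sketch below interleaves them into frames. The StereoWay struct, the interleaveWays name, and the channel order (way 1 L,R first, then way 2, and so on) are assumptions; the real converter's buffer layout is not shown here.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// One filtered way (band) of the crossover, still stereo. Hypothetical type.
struct StereoWay {
    std::vector<std::int32_t> left;
    std::vector<std::int32_t> right;
};

// Interleave 4 stereo ways into 8-channel frames:
// frame = [way1 L, way1 R, way2 L, way2 R, way3 L, way3 R, way4 L, way4 R].
std::vector<std::int32_t> interleaveWays(const std::array<StereoWay, 4>& ways)
{
    // Use the shortest channel so that every frame is complete.
    std::size_t frames = ways[0].left.size();
    for (const StereoWay& way : ways) {
        frames = std::min(frames, std::min(way.left.size(), way.right.size()));
    }

    std::vector<std::int32_t> out;
    out.reserve(frames * 8);
    for (std::size_t f = 0; f < frames; ++f) {
        for (const StereoWay& way : ways) {
            out.push_back(way.left[f]);
            out.push_back(way.right[f]);
        }
    }
    return out;
}
```

Packing the samples to 16 or 24 bits and writing the WAV header are left out of this sketch.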
I will write my own player for the Xylo-L FPGA board, so I will not invest time in setting up foobar etc.
03/30/2008
FIR Converter ver 02
How to generate a 4 way converted WAV file.
Freq01.txt : for Channel 1,2
Freq02.txt : for Channel 3,4
Freq03.txt : for Channel 5,6
Freq04.txt : for Channel 7,8
EQ01.txt : has no EQ data now, but each channel may contain up to 8192 (= same as Tap count) equalizing points.
BAT file to generate FIR Coeffs.
Build FIRParamGen for you.
Out_Coeff01.txt : generated FIR Coeff parameter, for Channel 1,2
Out_Coeff02.txt : generated FIR Coeff parameter, for Channel 3,4
Out_Coeff03.txt : generated FIR Coeff parameter, for Channel 5,6
Out_Coeff04.txt : generated FIR Coeff parameter, for Channel 7,8
Out_Freq01.txt : calculated Frequency Response, for Channel 1,2
Out_Freq02.txt : calculated Frequency Response, for Channel 3,4
Out_Freq03.txt : calculated Frequency Response, for Channel 5,6
Out_Freq04.txt : calculated Frequency Response, for Channel 7,8
WaveX.ini : FIR Converter parameter specification.
Global parameters: Ways 1-4, Bit Format 16/24, sampleRate=44100 only for now. File generation = 1/0, realtime = 1/0 (currently 0 only).
Channel parameters: Coeff file name, Channel Delay (in units of 1/44100 s = 0.023 ms), dB Offset.
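The two per-channel numbers can be converted to more familiar units as in the C++ sketch below. The function names are hypothetical, and treating the dB offset as an amplitude gain (20*log10 convention) is an assumption, since the INI semantics are not spelled out here.

```cpp
#include <cmath>

// Channel delay is given in samples; one sample at 44100 Hz is about 0.023 ms.
double delaySamplesToMs(int samples, double sampleRate = 44100.0)
{
    return 1000.0 * samples / sampleRate;
}

// dB offset to linear amplitude factor (assumes the 20*log10 convention).
double dbToGain(double db)
{
    return std::pow(10.0, db / 20.0);
}
```

For example, a delay of 44 samples is about 1 ms, and a -6 dB offset is a gain of about 0.5.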
Now the FIR parameters are ready.
By executing "WaveX01.exe C:\\temp\\Debussy.wav c:\\temp\\wavex.ini c:\\temp\\DebussyConv.wav", the FIR converter starts working.
WAV file before conversion : (Free wav, sample data). 4.7MB
Extended WAV file after conversion. 27MB
This Extended WAV file can be opened by Audacity. So I think other software can open it and (if you have 8 channel output) play it.
Actual playing sequence (for now; real time playing is a future issue):
- The GUI module calls WaveX01.exe to generate the converted file. It takes about 30 seconds for 5 minutes of music.
- Now the 1st file is ready. The player plays the 1st file.
- In the background, the GUI module and WaveX01.exe continue to convert the other files in the play list.
- If converted files are kept, no further conversion is needed. (They just take a lot of HDD space, about 6 times larger than the original WAV.)
- If a parameter file's time stamp is newer than the converted wav file, conversion must be done again before playing.
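The last two rules of the sequence amount to a make-style freshness check. A minimal C++ sketch of that decision follows; the function name and the idea of passing in time stamps already read from disk are assumptions for illustration (the actual GUI is written in Visual Basic 2005).

```cpp
#include <ctime>
#include <vector>

// Decide whether a track must be (re)converted before playing.
// convertedExists / convertedTime describe the cached converted WAV;
// parameterTimes holds the time stamps of the parameter files
// (for example the coeff files and the INI file).
bool needsConversion(bool convertedExists,
                     std::time_t convertedTime,
                     const std::vector<std::time_t>& parameterTimes)
{
    if (!convertedExists) {
        return true;                 // nothing cached yet
    }
    for (std::time_t t : parameterTimes) {
        if (t > convertedTime) {
            return true;             // a parameter file changed after conversion
        }
    }
    return false;                    // cached file is still valid
}
```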
04/14/2008
Suddenly I heard surprising noise from a certain CD.

Upper: original WAV with terrible clipping. Lower: after FIR, Channel 3/4 (Mid Low).
When points in the "clipping" area are smoothed by the filter, the value exceeds the (float)1.000 limit. I had completely forgotten about this.
Apply -1.5dB to all CDs?
That is not a fundamental solution, and it reduces dynamic range.
A limiter function should be added.
Modified Source
After modification:

If a wave value exceeds 0.95, a limiter is applied using a sigmoid function: the part of the sigmoid running from (0.0, 0.5) to (infinity, 1.0) is moved and accelerated so that it runs from (0.95, 0.95) to (infinity, 1.0). The negative side is handled the same way.
This gives a smooth transition curve (soft clipping); a sharp clipping point makes a spike noise with a wide frequency range.
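One way to read the limiter description is sketched below in C++. This is not the modified source itself: the function name softLimit and the steepness constant are assumptions. The steepness of 40 was chosen here only because it makes the slope equal to 1 at the 0.95 threshold, so the curve joins the untouched region without a kink.

```cpp
#include <cmath>

constexpr float kThreshold = 0.95f;  // limiter starts above this level
constexpr float kSteepness = 40.0f;  // assumed "acceleration" of the sigmoid

// Soft limiter: values within +/-0.95 pass unchanged; larger values follow
// the upper half of a logistic sigmoid, rescaled from the range [0.5, 1)
// to [0.95, 1.0), so the output magnitude does not exceed 1.0.
float softLimit(float x)
{
    const float mag = std::fabs(x);
    if (mag <= kThreshold) {
        return x;
    }
    const float headroom = 1.0f - kThreshold;
    // Logistic sigmoid shifted so that s = 0.5 exactly at the threshold.
    const float s = 1.0f / (1.0f + std::exp(-kSteepness * (mag - kThreshold)));
    const float limited = kThreshold + 2.0f * headroom * (s - 0.5f);
    return std::copysign(limited, x);  // same curve on the negative side
}
```

With these constants an input of 1.0 comes out at about 0.988, and larger inputs approach 1.0.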
04/20/2008
I wrote a GUI for the CUDA FIR programs.
The main screen controls FIR parameters, conversion, and playback.
Items in the tree and list can be dragged and dropped onto the PlayList or the Convert text box.
The global options hold the program and folder location settings.
FIR Option controls the parameters used to make the coeff file.
The Conversion dialog edits options for the CUDA FIR converter.
Frequency Response graphic display.
The program was written in Visual Basic 2005. I'm not sure about the source code's copyright, because some of the code was brought over from my work disk area.
I'm satisfied for now.
05/29/2008
I found the VIA EPIA SN10000EG, a new VIA product. It is a fanless VIA board with a PCI Express slot.
Looks nice...
Next, if I'm in the mood?
Digital Room Correction. The FIR parameter generator can process 8192 EQ points (5Hz accuracy) for digital room correction.
There is no IIR or phase modification. The FIR coefficients are shaped by the EQ gain at each frequency.
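For illustration, one textbook way to turn a table of gains into phase-preserving FIR coefficients is frequency sampling: take a real, zero-phase inverse DFT of the gains and center the result. The C++ sketch below shows that technique; it is not claimed to be what FIRParamGen does, and the function name and input layout are assumptions. With N = 8192 taps at 44100 Hz the bins are 44100 / 8192, about 5.4 Hz apart, which matches the accuracy quoted above.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Frequency-sampling FIR design (sketch).
// gains: N/2 + 1 linear gains for bins 0..N/2, where bin k sits at
//        k * sampleRate / N Hz.
// Returns N coefficients centered on tap N/2, so only a constant delay is
// added and no frequency-dependent phase rotation is introduced at the bins.
std::vector<double> firFromGains(const std::vector<double>& gains)
{
    if (gains.size() < 2) {
        return {};
    }
    const std::size_t half = gains.size() - 1;  // N/2
    const std::size_t n = 2 * half;             // N taps
    const double pi = std::acos(-1.0);

    std::vector<double> coeff(n, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        // Time index relative to the center tap.
        const double t = static_cast<double>(i) - static_cast<double>(half);
        // Real inverse DFT of a symmetric, zero-phase spectrum.
        double acc = gains[0] + gains[half] * std::cos(pi * t);
        for (std::size_t k = 1; k < half; ++k) {
            acc += 2.0 * gains[k] * std::cos(2.0 * pi * k * t / n);
        }
        coeff[i] = acc / n;
    }
    return coeff;
}
```

A flat gain table (all 1.0) yields a single 1.0 at the center tap, i.e. a pure delay. In practice a window is usually applied to the result to reduce ripple between the bins. The loop is O(N^2), which is acceptable for an offline parameter generator.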
I was not very impressed by DRC with the TacT RCS. Was it IIR, and did I hear a lot of phase rotation?