Brief

DIYAudio Thread http://www.diyaudio.com/forums/showthread.php?s=&threadid=120463

A complete FIR crossover can be an audiophile's dream, thanks to its sharp filtering and linear phase.
But the current FFT-based FIR crossovers on the PC lose the time information of the audio signal. If you use a 10 ms frame, the time information within those 10 ms is lost.
Now nVidia CUDA GPU computing power can achieve very fast "real" (time-domain) FIR conversion, resulting in sharp filtering like the one below.

4way FIR crossover 

05/03/2008

Downloadable Package
For 32-bit Windows XP + Visual C++ 2005 runtime, with a CUDA-capable GPU.
Unzip to C:\.
You can try FIR parameter creation and GPU FIR conversion.
NOTE: the input WAV file cannot contain a LIST chunk. Please use EAC ("Exact Audio Copy") to rip the CD.
If you have a 7.1ch audio device, you can try playing the result with Windows Media Player, foobar, etc.
The most common setup will be foobar / ASIO / Lynx AES-16. (I don't have one, so I cannot support it.)

06/21/2008

I bought a GTX280 and am studying CUDA 2.0 for another (non-audio) project.
Crazy fast, huge, power hungry, expensive! And the fan is noisy :)
Nothing changes for this project, because the extra power has no effect on 16-bit output.
I think a GeForce 8400-9600 (fanless with an Accelero cooler) is enough for this FIR crossover.

Where to go

The components in the red rectangle are implemented in this section.
Player / USB / I2S Converter : See the other section, "Multichannel I2S output"
TAS5518 4 way amplifier : See the other section, "TAS-4i"

 structure
Beginning

I understood that a 4-way FIR crossover requires huge computing power.

44100 sample/sec * 2 (L,R) * 4 (way) * 1024 (TAPS) = 361.3 M tap calculation / sec
I found that the TI C6713 is not enough for this. (Of course, it is possible that my programming skill is too poor to get there.)
Maybe the C6713 can be used for a 2-way FIR crossover, but I want 4-way, and very accurate.
BruteFIR? I want a "straight-forward", time-domain, continuous FIR.
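
To make "straight-forward time domain FIR" concrete: every output sample is the sum of the most recent input samples, each multiplied by one coefficient (one "tap calculation"). The following is only a minimal C++ sketch of that idea, not the FIRCalc01.cpp source; the function name `fir_direct` is made up for illustration.

```cpp
#include <cstddef>
#include <vector>

// Direct-form (time-domain) FIR: y[n] = sum over k of h[k] * x[n - k].
// Input samples before the start of the signal are treated as zero.
// fir_direct is a hypothetical name, not taken from the original project.
std::vector<float> fir_direct(const std::vector<float>& x,
                              const std::vector<float>& h) {
    std::vector<float> y(x.size(), 0.0f);
    for (std::size_t n = 0; n < x.size(); ++n) {
        float acc = 0.0f;
        // Only taps that reach back to a valid input index contribute.
        const std::size_t kmax = (n + 1 < h.size()) ? n + 1 : h.size();
        for (std::size_t k = 0; k < kmax; ++k) {
            acc += h[k] * x[n - k];  // one "tap calculation"
        }
        y[n] = acc;
    }
    return y;
}
```

Feeding a single impulse through it returns the coefficients themselves, which is a quick sanity check. The inner loop is what has to run once per sample, per channel, per way.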

I experimented with FIRCalc01.cpp (prototype source code from the FIR processor project) and did some tuning.
On a 2.3GHz AMD Phenom CPU, it achieved 1100 M tap calculations / sec, with 4 threads running in parallel.
Enough power? No. I want more power, as shown below, and a Phenom cannot go into a fanless PC, so the CPU would have to be downgraded.

44100 sample/sec * 2 (L,R) * 4 (way) * 8192(Taps) =  2.9 G tap calculation / sec
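
The load estimate is simply the product of four factors, so it is easy to recompute for other tap counts or sample rates. A small C++ sketch (the function name `taps_per_second` is only illustrative):

```cpp
#include <cstdint>

// Multiply-accumulate operations ("tap calculations") per second
// needed by a time-domain FIR crossover.
// taps_per_second is a hypothetical helper name.
constexpr std::uint64_t taps_per_second(std::uint64_t sample_rate,
                                        std::uint64_t channels,
                                        std::uint64_t ways,
                                        std::uint64_t taps) {
    return sample_rate * channels * ways * taps;
}
```

For example, `taps_per_second(44100, 2, 4, 8192)` gives 2,890,137,600, which is the 2.9 G figure above.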

I purchased a GeForce 8800GTS and downloaded the CUDA ("NVIDIA CUDA Compute Unified Device Architecture") GPU computing tools.
After 1 week of struggling.....

Here is the source code of the CUDA version of the FIR kernel routine.
CUDA_FIR_Kernel

The result:


The GeForce 8800GTS with CUDA calculates 38.78 G FIR taps per second. Simply incredible. It's 140 times faster than one CPU thread.
Here are the test wave file, the CPU version output, and the GPU version output, plus the FIR coeffs and spectrum. Could you double-check them?
Original Whitenoise.wav
Output_CPU.wav
Output_GPU.wav

FIR coeffs
Spectrum

There are fanless VGA cards such as the 8600GT. They have only 1/4 of the shader processors of the 8800, but that will be enough for now.

Softwares

 FIR Parameter Generator

03/17/2008
Performance is now 79 G tap calculations / sec, thanks to loop-unrolling tuning guided by the PTX assembler listing.
So even a GeForce 8400 can reach 8-9 G tap calculations / sec, which makes it easier to build a fanless PC.
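
For readers unfamiliar with loop unrolling: the idea is to do several tap calculations per loop iteration, so less time goes into loop bookkeeping and the hardware can overlap independent multiply-adds. The sketch below shows the pattern in plain C++ on the CPU. It is not the CUDA kernel from this project, and `mac_unrolled4` is a made-up name.

```cpp
#include <cstddef>

// Sum of h[k] * x[k] over `taps` elements, unrolled by four.
// Four separate accumulators keep the multiply-adds independent of each other.
// mac_unrolled4 is a hypothetical name used only for this illustration.
float mac_unrolled4(const float* h, const float* x, std::size_t taps) {
    float a0 = 0.0f, a1 = 0.0f, a2 = 0.0f, a3 = 0.0f;
    std::size_t k = 0;
    for (; k + 4 <= taps; k += 4) {
        a0 += h[k]     * x[k];
        a1 += h[k + 1] * x[k + 1];
        a2 += h[k + 2] * x[k + 2];
        a3 += h[k + 3] * x[k + 3];
    }
    float acc = (a0 + a1) + (a2 + a3);
    // Handle the leftover taps when `taps` is not a multiple of four.
    for (; k < taps; ++k) {
        acc += h[k] * x[k];
    }
    return acc;
}
```

Note that the summation order differs from a simple loop, so floating-point results can differ in the last bits.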

03/24/2008
FIR Converter ver 01
GPU Kernel
Main program and GPU kernel source code. (File conversion only.)
It is already too complicated because of buffer management,
and the kernel has its own buffers. Managing the memory structure is the key to performance, but it is painful.

03/26/2008
Test routine for the Extended WAV format. See MSDN, "multi channel wav data format", for details.
8 channel wav file

Audacity can open and display this WAV file, so other players such as Windows Media Player, foobar, etc. can probably play it too.
This format will be the output of the "Converter" mode.
I will write my own player for the Xylo-L FPGA board, so I will not spend time on how to set up foobar etc.
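
For reference, an Extended WAV file differs from a plain WAV mainly in its "fmt " chunk, which is 40 bytes long, uses format tag 0xFFFE (WAVE_FORMAT_EXTENSIBLE), and carries a channel mask and a sub-format GUID. The C++ sketch below builds such a header for PCM data. It is not the code from this project: the helper names are made up, and the channel mask is an assumption that the caller has to choose (0x63F is the usual 7.1 surround mask).

```cpp
#include <cstdint>
#include <vector>

// Little-endian helpers (hypothetical names).
static void put_u16(std::vector<std::uint8_t>& b, std::uint16_t v) {
    b.push_back(static_cast<std::uint8_t>(v & 0xFF));
    b.push_back(static_cast<std::uint8_t>(v >> 8));
}
static void put_u32(std::vector<std::uint8_t>& b, std::uint32_t v) {
    for (int i = 0; i < 4; ++i)
        b.push_back(static_cast<std::uint8_t>((v >> (8 * i)) & 0xFF));
}
static void put_tag(std::vector<std::uint8_t>& b, const char* tag) {
    for (int i = 0; i < 4; ++i) b.push_back(static_cast<std::uint8_t>(tag[i]));
}

// Builds the 68-byte header of a WAVE_FORMAT_EXTENSIBLE PCM file.
// The caller appends data_bytes of interleaved sample data afterwards.
std::vector<std::uint8_t> make_wavex_header(std::uint16_t channels,
                                            std::uint32_t sample_rate,
                                            std::uint16_t bits,
                                            std::uint32_t channel_mask,
                                            std::uint32_t data_bytes) {
    const std::uint16_t block_align =
        static_cast<std::uint16_t>(channels * (bits / 8));
    std::vector<std::uint8_t> b;
    put_tag(b, "RIFF");
    put_u32(b, 60 + data_bytes);          // file size minus the first 8 bytes
    put_tag(b, "WAVE");
    put_tag(b, "fmt ");
    put_u32(b, 40);                       // size of the extensible fmt chunk
    put_u16(b, 0xFFFE);                   // WAVE_FORMAT_EXTENSIBLE
    put_u16(b, channels);
    put_u32(b, sample_rate);
    put_u32(b, sample_rate * block_align); // average bytes per second
    put_u16(b, block_align);
    put_u16(b, bits);
    put_u16(b, 22);                       // cbSize: bytes following this field
    put_u16(b, bits);                     // valid bits per sample
    put_u32(b, channel_mask);             // speaker positions (assumed by caller)
    // KSDATAFORMAT_SUBTYPE_PCM {00000001-0000-0010-8000-00AA00389B71}
    const std::uint8_t pcm_guid[16] = {0x01, 0x00, 0x00, 0x00, 0x00, 0x00,
                                       0x10, 0x00, 0x80, 0x00, 0x00, 0xAA,
                                       0x00, 0x38, 0x9B, 0x71};
    b.insert(b.end(), pcm_guid, pcm_guid + 16);
    put_tag(b, "data");
    put_u32(b, data_bytes);
    return b;
}
```

For an 8-channel, 16-bit, 44100 Hz file the call would be `make_wavex_header(8, 44100, 16, 0x63F, data_bytes)`.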

03/30/2008
FIR Converter ver 02

How to generate a 4-way converted WAV file:
Freq01.txt for Channel 1,2
Freq02.txt for Channel 3,4
Freq03.txt for Channel 5,6
Freq04.txt for Channel 7,8
EQ01.txt has no EQ data for now, but each channel may contain up to 8192 (= the same as the tap count) equalizing points.

BAT file to generate FIR Coeffs. Build FIRParamGen for you.

Out_Coeff01.txt generated FIR Coeff parameter, for Channel 1,2
Out_Coeff02.txt generated FIR Coeff parameter, for Channel 3,4
Out_Coeff03.txt generated FIR Coeff parameter, for Channel 5,6
Out_Coeff04.txt generated FIR Coeff parameter, for Channel 7,8

Out_Freq01.txt calculated Frequency Response, for Channel 1,2
Out_Freq02.txt calculated Frequency Response, for Channel 3,4
Out_Freq03.txt calculated Frequency Response, for Channel 5,6
Out_Freq04.txt calculated Frequency Response, for Channel 7,8

WaveX.ini FIR Converter parameter specification.
Global parameters: Ways 1-4, Bit Format 16/24, sampleRate = 44100 only for now, File generation = 1/0, realtime = 1/0 (currently 0 only).
Channel parameters: Coeff file name, Channel Delay (in steps of 1/44100 sec = 0.023 ms), dB Offset.
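
The two per-channel numbers have a simple meaning: a delay of d shifts the channel by d samples (d * 0.023 ms at 44100 Hz), and a dB offset becomes a linear factor of 10^(dB/20). The following C++ sketch shows one way to apply both to a block of samples. It is not the WaveX01 code, and the function names are hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Convert a dB offset to a linear amplitude factor: gain = 10^(dB / 20).
inline float db_to_gain(float db) {
    return std::pow(10.0f, db / 20.0f);
}

// Delay a channel by whole samples and scale it by a dB offset.
// Zeros are inserted at the start; the output keeps the input length,
// so the last `delay_samples` input samples are dropped.
// apply_delay_and_gain is a hypothetical name.
std::vector<float> apply_delay_and_gain(const std::vector<float>& x,
                                        std::size_t delay_samples,
                                        float db_offset) {
    const float g = db_to_gain(db_offset);
    std::vector<float> y(x.size(), 0.0f);
    for (std::size_t n = delay_samples; n < x.size(); ++n) {
        y[n] = g * x[n - delay_samples];
    }
    return y;
}
```

For example, -6 dB corresponds to a factor of about 0.5, and a delay of 44 samples is roughly 1 ms.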

Now the FIR parameters are ready.
Executing "WaveX01.exe C:\\temp\\Debussy.wav c:\\temp\\wavex.ini c:\\temp\\DebussyConv.wav" starts the FIR converter.
WAV file before conversion : (Free wav, sample data). 4.7MB
Extended WAV File after conversion. 27MB

This Extended WAV file can be opened by Audacity. So I think other software can open it too and, if you have an 8 channel output, play it.

Actual playing sequence (for now; real-time playing is a future issue.)
  1. The GUI module calls WaveX01.exe to generate the converted file. This takes about 30 seconds for 5 minutes of music.
  2. Now the 1st file is ready. The player plays the 1st file.
  3. In the background, the GUI module and WaveX01.exe continue to convert the other files in the play list.
  4. If the converted files are kept, no conversion is needed next time. (They just take a lot of HDD space, about 6 times larger than the original WAV.)
  5. If the time stamp of a parameter file is newer than the converted WAV file, the conversion must be done again before playing.

04/14/2008
Suddenly I heard a surprising noise from a certain CD.

Upper: the original, terribly clipped WAV. Lower: after FIR, Channel 3/4 (Mid Low).
When the points in the "clipping" area were smoothed by the filter, the value exceeded the (float)1.000 limit. I had completely forgotten about this.

Apply -1.5dB to every CD?
That is not a fundamental solution, and it reduces the dynamic range.
A limiter function should be added.

Modified Source
After modification..

If the wave value exceeds 0.95, a limiter is applied using a sigmoid function: the sigmoid segment from (0.0, 0.5) to (infinity, 1.0) is shifted and scaled so that it runs from (0.95, 0.95) to (infinity, 1.0). The negative side is handled the same way.
This gives a smooth transition curve (soft clipping); a sharp clipping corner produces spike noise with a wide frequency range.
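
One possible way to write such a limiter is sketched below in C++. It is not the modified source linked above: the exact scaling the converter uses is not given here, so the factor that makes the slope equal 1 at the 0.95 knee is an assumption, and `soft_limit` is a made-up name.

```cpp
#include <cmath>

// Soft limiter: values up to `knee` pass unchanged; above it, the upper
// half of a sigmoid is shifted and scaled so the output starts at `knee`
// and approaches 1.0 for very large inputs. Negative values are mirrored.
// The scaling factor 2 / headroom is an assumption chosen so that the
// curve continues with slope 1 at the knee (no corner).
// soft_limit is a hypothetical name.
inline float soft_limit(float x, float knee = 0.95f) {
    const float mag = std::fabs(x);
    if (mag <= knee) {
        return x;
    }
    const float headroom = 1.0f - knee;
    const float t = 2.0f * (mag - knee) / headroom;
    // 2 * sigmoid(t) - 1 maps t in [0, infinity) onto [0, 1).
    const float s = 2.0f / (1.0f + std::exp(-t)) - 1.0f;
    return std::copysign(knee + headroom * s, x);
}
```

With these numbers an input of 0.96 comes out at roughly 0.9599, and even a badly overshooting value stays within +/-1.0.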


04/20/2008

I wrote a GUI for the CUDA FIR programs.
The main screen controls the FIR parameters, conversion, and playback.
Items in the tree and list can be dragged and dropped onto the PlayList or the Convert text box.
The global options hold the program and folder location settings.
The FIR options control the parameters used to make the coeff file.
The conversion dialog edits the options for the CUDA FIR converter.
Frequency response graphic display.

The program was written in Visual Basic 2005. I'm not sure about the copyright of the source code, because some of the code was brought over from my work files.

I'm satisfied for now.

05/29/2008
I found the VIA EPIA SN10000EG, a new VIA product. It is a fanless VIA board with a PCI Express slot.
Looks nice....

Next if in the mood??

Digital Room Correction. The FIR parameter generator can process 8192 EQ points (5Hz accuracy) for digital room correction.
There is no IIR filtering and no phase modification. The FIR coefficients are shaped by the EQ gain at each frequency.
I have not been very impressed by DRC so far, using the TacT RCS. Was that IIR, and did I hear a lot of phase rotation?
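
As an illustration of "coefficients shaped by the gain at each frequency" without touching the phase, here is a small C++ sketch of the classic frequency-sampling method. The desired gains at evenly spaced frequencies are turned into a symmetric (linear-phase) impulse response by an inverse DFT and then windowed. This is not FIRParamGen: the function name, the Hann window, and the plain O(N^2) transform are assumptions chosen to keep the example short.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// gains[k] is the desired linear gain at frequency k * fs / N,
// for k = 0 .. N/2 (so gains.size() == N/2 + 1).
// Returns N coefficients that are symmetric about index N/2, which
// means linear phase with a constant delay of N/2 samples.
// fir_from_gains is a hypothetical name.
std::vector<double> fir_from_gains(const std::vector<double>& gains) {
    if (gains.size() < 2) {
        return {};
    }
    const std::size_t half = gains.size() - 1;  // N / 2
    const std::size_t N = 2 * half;
    const double pi = 3.14159265358979323846;
    std::vector<double> h(N, 0.0);
    for (std::size_t n = 0; n < N; ++n) {
        // Time index relative to the centre tap.
        const double m = static_cast<double>(n) - static_cast<double>(half);
        // Inverse DFT of a real, even spectrum: a sum of cosines only.
        double acc = gains[0] + gains[half] * std::cos(pi * m);
        for (std::size_t k = 1; k < half; ++k) {
            acc += 2.0 * gains[k] *
                   std::cos(2.0 * pi * static_cast<double>(k) * m /
                            static_cast<double>(N));
        }
        // Hann window (zero at n = 0) to reduce ripple between the points.
        const double w = 0.5 - 0.5 * std::cos(2.0 * pi * static_cast<double>(n) /
                                              static_cast<double>(N));
        h[n] = w * acc / static_cast<double>(N);
    }
    return h;
}
```

With all gains set to 1.0 the result is a single unit tap at the centre, which is a pure delay. For 8192 taps at 44100 Hz the frequency points are 44100 / 8192 = 5.4 Hz apart, which matches the 5Hz accuracy mentioned above.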
