Re: [math-fun] Intel to incorporate FPGA in new Xeon processor
I think this is a superb development, I've felt that should happen for a long time. It also will address a lot of issues that others (e.g. Jorg Arndt) have been complaining about for years, e.g. "why doesn't my processor have a 'reverse bit order' instruction?" This will be a hacker renaissance.

It does not bother me the slightest bit that there is a big pre-wired processor accompanying the FPGA. The stuff lots of users often want, should be available hardwired for speed, not programmable for slowness.

Furthermore, in the event that some FPGA use becomes commonplace idiom, that will hopefully inspire the hardware guys to make it available without the FPGA.

My question would be: how should a high level language take advantage of this new hardware capability?
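For context on the 'reverse bit order' complaint: without a dedicated instruction, software usually reverses a word with a short chain of mask-and-shift steps, swapping neighbouring bits, then pairs, then nibbles, and so on. The following is only a minimal sketch of that idea in Python; the name `revbin64` and the fixed 64-bit width are assumptions made for illustration and do not come from the thread.

```python
MASK64 = (1 << 64) - 1  # assumed word size: 64 bits


def revbin64(x: int) -> int:
    """Reverse the bit order of a 64-bit word in log2(64) = 6 swap steps."""
    x &= MASK64
    # Swap adjacent single bits.
    x = ((x >> 1) & 0x5555555555555555) | ((x & 0x5555555555555555) << 1)
    # Swap adjacent 2-bit groups.
    x = ((x >> 2) & 0x3333333333333333) | ((x & 0x3333333333333333) << 2)
    # Swap adjacent nibbles.
    x = ((x >> 4) & 0x0F0F0F0F0F0F0F0F) | ((x & 0x0F0F0F0F0F0F0F0F) << 4)
    # Swap adjacent bytes.
    x = ((x >> 8) & 0x00FF00FF00FF00FF) | ((x & 0x00FF00FF00FF00FF) << 8)
    # Swap adjacent 16-bit groups.
    x = ((x >> 16) & 0x0000FFFF0000FFFF) | ((x & 0x0000FFFF0000FFFF) << 16)
    # Swap the two 32-bit halves.
    x = (x >> 32) | ((x & 0xFFFFFFFF) << 32)
    return x & MASK64


if __name__ == "__main__":
    print(hex(revbin64(1)))  # lowest bit moves to the highest position
```

Six dependent steps of a few operations each is what a single hardwired instruction, or a trivial rewiring in programmable logic, would replace.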
I suspect the latency of accessing the FPGA may preclude use for single-instruction-type things (like finding the index of the 3rd set bit in a 64-bit word, often used in succinct data structures). I believe the FPGA will be more useful in a coprocessor sense.

It will be interesting to see how the FPGA ties in to the memory hierarchy, if it does at all.

On Mon, Jun 23, 2014 at 10:17 AM, Warren D Smith <warren.wds@gmail.com> wrote:
> [...]
_______________________________________________ math-fun mailing list math-fun@mailman.xmission.com https://mailman.xmission.com/cgi-bin/mailman/listinfo/math-fun
-- -- http://cube20.org/ -- http://golly.sf.net/ --
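The "index of the 3rd set bit in a 64-bit word" operation mentioned above is often called select. As a rough illustration of what software has to do without hardware help, here is a minimal Python sketch; the name `select64`, the 1-based rank `k`, and the `-1` return value for "not enough set bits" are all assumptions chosen for this example, not anything specified in the thread.

```python
def select64(x: int, k: int) -> int:
    """Return the index (0 = least significant bit) of the k-th set bit of
    the 64-bit word x, counting set bits from 1, or -1 if x has fewer than
    k set bits."""
    if k < 1:
        raise ValueError("k counts set bits starting from 1")
    x &= (1 << 64) - 1
    # Clear the k-1 lowest set bits, one per iteration.
    for _ in range(k - 1):
        x &= x - 1
    if x == 0:
        return -1
    # Isolate the lowest remaining set bit and convert it to an index.
    return (x & -x).bit_length() - 1


if __name__ == "__main__":
    # Set bits of 0b101100 sit at indices 2, 3 and 5; the 3rd one is at 5.
    print(select64(0b101100, 3))
```

The loop is a chain of dependent operations, so its cost grows with `k`; this is the kind of short, latency-sensitive primitive the message above has in mind.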
If you check these patents, you’ll see that we envisioned the FPGA logic having direct read/write access to the processor register files. This would allow an easy single cycle instruction of the kind you envision. Intel probably didn’t do this, although there is nothing difficult about it.

US 5,742,180 and US 6,052,773

On Jun 23, 2014, at 1:28 PM, Tom Rokicki <rokicki@gmail.com> wrote:
> [...]
"TK" == Tom Knight <tk@ginkgobioworks.com> writes:
TK> If you check these patents, you’ll see that we envisioned the FPGA
TK> logic having direct read/write access to the processor register
TK> files. This would allow an easy single cycle instruction of the kind
TK> you envision. Intel probably didn’t do this, although there is nothing
TK> difficult about it.
TK> US 5,742,180 and US 6,052,773

The reports I read said that all they are doing is putting an existing FPGA chip in the package with their Xeon chip, with the two communicating over the same bus that multi-socket Xeons use between themselves.

-JimC
--
James Cloos <cloos@jhcloos.com> OpenPGP: 0x997A9F17ED7DAEA6
Oh my, they are implementing QPI in the FPGA? That's actually pretty cool; it gives you a fully coherent memory interface. QPI is much faster than the FPGA, though, so they probably have a slower wider interface unit of some sort that the FPGA talks to. But it also means you get a really fast interface to memory.

That sounds like a lot of fun! (But scary; you can probably really mess up the system by botching the coherence protocol in some subtle ways.)

-tom

On Mon, Jun 23, 2014 at 3:32 PM, James Cloos <cloos@jhcloos.com> wrote:
> [...]
* Warren D Smith <warren.wds@gmail.com> [Jun 23. 2014 19:44]:
> I think this is a superb development, I've felt that should happen for a long time. It also will address a lot of issues that others (e.g. Jorg Arndt)

Umlaut lecture: Joerg Arndt, or Jörg Arndt for the UTF-8 professionals, or jj for people that dislike that particular first name as much as I do.

> have been complaining about for years, e.g. "why doesn't my processor have a 'reverse bit order' instruction?" This will be a hacker renaissance.

Sadly, CPU --> FPGA (revbin word) FPGA --> CPU will be the slowest possible way to do it.

There used to be a socket for the FPU coprocessor (Weitek?). I have always been puzzled why, at the time the Weitek became obsolete (when the Pentium came), that socket wasn't kept (for an FPGA or [here goes your ad]).

> It does not bother me the slightest bit that there is a big pre-wired processor accompanying the FPGA. The stuff lots of users often want, should be available hardwired for speed, not programmable for slowness.
>
> Furthermore, in the event that some FPGA use becomes commonplace idiom, that will hopefully inspire the hardware guys to make it available without the FPGA.

As has been said, the FPGA does very well with tasks that (bit-)parallelize well and with tasks where a deep serialization kicks butt.

There is a rough similarity between this and both SIMD and GPU (the latter to a greater extent).

For running your compiler or (gasp!) Office[TeeEmm], you'll be _very_ hard pressed to win over your CPU with the FPGA.

> My question would be: how should a high level language take advantage of this new hardware capability?

I speculate that this, as an afterthought for most (all??) existing languages, is not going to integrate smoothly.

I'd be delighted to hear of languages where this may not come as in "we nailed a plank to a dog to make it an octopus".

Best, jj
So I see (at least) three types of work the FPGA could do:

1. Complicated work on small data---bitcoin mining and stuff like that. FPGAs tend to do this well; no latency concerns or memory access bottlenecks.

2. Streaming work. Will the FPGA have a streamable interface to the memory hierarchy? If so, all sorts of vector work with big computations are possible. It becomes SSE on steroids (of course subject to memory bandwidth limitations).

3. Random access work. Will the FPGA have random access to the memory hierarchy? And if so, how many memory accesses could be in flight at one time? This would permit another wide range of possible applications.

I don't have high expectations for the first iteration of this technology, but I hope to be pleasantly surprised.

On Mon, Jun 23, 2014 at 11:09 AM, Joerg Arndt <arndt@jjj.de> wrote:
> [...]
participants (5)
- James Cloos
- Joerg Arndt
- Tom Knight
- Tom Rokicki
- Warren D Smith