Oleg Zabluda's blog
Tuesday, September 25, 2012
http://en.wikipedia.org/wiki/Project_Nike
The Soviet analog of Nike seems to be the S-200 Angara/Vega/Dubna (SA-5 Gammon, http://en.wikipedia.org/wiki/S-200_Angara/Vega/Dubna), which was being phased out starting in the 1980s but was passed on to the successor states before the process was completed. It is still deployed, and Ukraine used it on 4 October 2001 to shoot down a Russian Tu-154 airliner en route from Tel Aviv, Israel to Novosibirsk, Russia, killing all 78 people on board.
http://en.wikipedia.org/wiki/Siberia_Airlines_Flight_1812_accident
Project Nike was a U.S. Army project, proposed in May 1945 by Bell Laboratories, to develop a line-of-sight anti-aircraft missile system. The project delivered the United States' first operational anti-aircraft missile system, the Nike Ajax, in 1953. It was a two-stage missile, using a solid-fuel booster stage and a liquid-fueled (IRFNA/UDMH) second stage. The missile could reach a maximum speed of 1,000 mph (1,600 km/h), an altitude of 70,000 ft (21 km) and had a range of 25 miles (40 km).
During the early-to-mid 1960s the Nike Ajax batteries were upgraded to the Nike Hercules system. The new missiles had greater range, accuracy, and destructive power, and could intercept ballistic missiles. The Hercules had a range of about 100 miles (160 km), a top speed in excess of 3,000 mph (4,800 km/h) and a maximum altitude of around 100,000 ft (30 km). It had solid-fuel boost and sustainer rocket motors; the booster stage was four of the Nike Ajax boosters strapped together. In the electronics, some vacuum tubes were replaced with more reliable solid-state components.
The missile also had an optional nuclear warhead to improve the probability of a kill. The W-31 warhead had four variants offering 2, 10, 20 and 30 kiloton yields. The 20 KT version was used in the Hercules system. At sites in the USA the missile almost exclusively carried a nuclear warhead. Sites in foreign nations typically had a mix of high explosive and nuclear warheads. The fire control of the Nike system was also improved with the Hercules and included a surface-to-surface mode which was successfully tested in Alaska. The mode change was accomplished by changing a single plug on the warhead from the "Safe Plug" to "Surface to Air" or "Surface to Surface". The Nike Hercules was deployed starting in June 1958.
Development continued, producing Improved Nike Hercules and then Nike Zeus A and B. The Zeus was aimed at intercontinental ballistic missiles (ICBMs). Zeus, with a new 400,000 lbf (1.78 MN) thrust solid-fuel booster, was first test launched during August 1959 and demonstrated a top speed of 8,000 mph (12,875 km/h). The Nike Zeus system also included the Zeus Acquisition Radar (ZAR), a significant improvement over the Nike Hercules HIPAR system. Shaped like a pyramid, the ZAR featured a Luneburg lens receiver aerial weighing about 1,000 tons. The first successful intercept of an ICBM by Zeus was in 1962, at Kwajalein in the Marshall Islands. The Army continued to develop an anti-ICBM weapon system referred to as "Nike-X", largely based on the technological advances of the Zeus system. Nike-X featured phased-array radars, computer advances, and a missile tolerant of skin temperatures three times those of the Zeus. In September 1967, the Department of Defense announced the deployment of the LIM-49A Spartan missile system, its major elements drawn from Nike-X development. In March 1969, the Army started the Safeguard ABM program, which was designed to defend Minuteman ICBMs and was also based on the Nike-X system. It became operational in 1975, but was shut down after just three months.
Soviet development of ICBMs decreased the value of the Nike anti-aircraft defense system. Some small-scale work to use Nike Zeus as an anti-satellite weapon (ASAT) was carried out from 1962 until the project was canceled in favor of the Thor-based Program 437 system during 1966. In the end, neither development would enter service. However, the Nike Zeus system did demonstrate a hit-to-kill capability against ballistic missiles during the early 1960s. Nike Hercules was included in SALT I discussions as an ABM. Following the treaty signed during 1972, and further budget reductions, almost all Nike sites in the continental United States were deactivated by April 1974. Some units remained active until the later part of that decade in a coastal air defense role.
Nike missiles remained deployed around strategically important areas within the continental United States until 1974. The Alaskan sites were deactivated in 1978, and the Florida sites stood down during the following year. Although the missile left the U.S. inventory, other nations maintained the missiles in their inventories into the early 1990s and sent their soldiers to the United States to conduct live-fire exercises at Fort Bliss, Texas. The last missile was launched in Italy in 2006.
The best-preserved Nike installation is site SF88L, located in the Marin Headlands just west of the Golden Gate Bridge in San Francisco, California. The site is a museum and contains the missile bunkers and control area, as well as period uniforms and vehicles that would have operated at the site. The site has been preserved in the condition it was in at the time it was decommissioned in 1974.
http://en.wikipedia.org/wiki/Project_Nike
Labels: Oleg Zabluda
Blast from the past: I accidentally stumbled upon my original reaction to a now-classic article:
=== http://tech.groups.yahoo.com/group/CoolTechClub/message/355 ===
From Oleg Zabluda ozabluda@... Fri Jan 07 10:40:57 2005
To: cooltechclub@yahoogroups.com
Message-ID: <20050107184056.87096.qmail@...>
Subject: "The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software" from Herb Sutter
I have been preaching the same thing ever since I realized it in the summer of 2003. Here is an essentially word-for-word narrative of my thoughts on the subject (minus the modern CPU architecture details).
http://www.gotw.ca/publications/concurrency-ddj.htm
=== http://tech.groups.yahoo.com/group/CoolTechClub/message/xxxxx ===
From Anupam Kapoor
I think his assertion regarding "old ways of increasing CPU speeds are hitting practical limits" is flawed. Transistor counts are still rising; for example, the nvidia 6800 has approx 222M transistors, and the latest P4's have approx 150M (most of which is cache).
Some more discussion about this at:
http://lambda-the-ultimate.org/node/view/458#comment. Lots of goodies here!
==== http://tech.groups.yahoo.com/group/CoolTechClub/message/356 ===
From ozabluda@... Fri Jan 07 12:38:08 2005
Transistor count is rising, but it doesn't lead to meaningfully increasing CPU speeds on typical applications any more, except for the cache effect. Thus the move to multicore CPUs: it's a better use of transistors. The cache effect is irrelevant for single-vs-multicore because the same cache can be used by all cores.
=== http://tech.groups.yahoo.com/group/CoolTechClub/message/357 ===
From: "Vladimir Bronstein"
Actually, it is an interesting thought experiment: let's say you have 100 million transistors - what would be the most efficient architecture? Let's assume that a relatively simple processor itself takes about 2 million transistors - I believe that is a reasonable assumption. So would it be 50 processors without cache, 25 processors each with 40KB of cache (each bit takes approx 6 transistors in cache), or 10 processors with...
It definitely depends on the application, but I believe that with a proper level of optimization, transistors are better spent on processing power than on cache. This follows from the trivial fact that transistors in cache are used very rarely (only when a particular memory location is read), while in the processor they are used much more often (e.g., the ALU is used for a big percentage of all operations). In addition, those processors should operate at relatively low speeds - this alleviates the problem of power dissipation, and at the same time helps to match the performance of the processor with the performance of memory - of course assuming the emergence of multiport memories for such designs.
Vladimir
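A quick sketch of the arithmetic in the message above, using only the numbers it gives (100M transistors total, ~2M per simple core, ~6 transistors per cache bit); the three split points are the ones Vladimir mentions:

# Back-of-the-envelope transistor budget from the message above:
# 100M transistors total, ~2M per simple core, ~6 transistors per SRAM cache bit.
TOTAL_TRANSISTORS = 100_000_000
CORE_TRANSISTORS = 2_000_000
TRANSISTORS_PER_BIT = 6

def cache_per_core_kb(n_cores):
    """KB of cache per core once n_cores simple cores are paid for."""
    leftover = TOTAL_TRANSISTORS - n_cores * CORE_TRANSISTORS  # transistors left for cache
    bits = leftover / TRANSISTORS_PER_BIT                      # 6 transistors per bit
    return bits / 8 / 1024 / n_cores                           # bits -> bytes -> KB, per core

for n in (50, 25, 10):
    print(f"{n} cores -> ~{cache_per_core_kb(n):.0f} KB of cache each")
# 50 cores -> 0 KB each (no transistors left for cache)
# 25 cores -> ~41 KB each (roughly the "40KB" above)
# 10 cores -> ~163 KB each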
=== http://tech.groups.yahoo.com/group/CoolTechClub/message/359 ===
From Oleg Zabluda
Here is the important point that I want to put across, which in my view
clarifies things a lot:
To a first approximation, the optimal number of cores is independent of the proportion of transistors used for cache, and the cache size is independent of the number of cores.
Explanation:
Assumption: for server applications the only thing you care about is total
throughput and not latency.
To a first approximation, the total number of transistors, clock speed, memory bus speed and width, etc. are independent of the number of cores or their architecture. This is true for simple cores.
Given that, you simply come up with the core with the highest ratio of operations per second to transistors in both the core and cache.
Given the workload, the ratio of transistors in the core to transistors in the cache is a function of only the ratio of CPU speed to memory access speed.
Then you try to share the cache among as many cores as possible to take advantage of statistical multiplexing, but not hit too much synchronization overhead. I will call this a core cluster. Then you place as many of the clusters on a die as possible. This gives you the total number of cores.
We know experimentally that even adding a second ALU to a core is a net loss for the ratio. The only thing that is a win is a small pipeline.
For maximal throughput, we should be producing 64-128 core dies right now.
The only reason it is not happening is that software development is not ready and that it's not clear how to share the cost between server and desktop hardware.
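A minimal sketch of the sizing procedure described above: pick a core, size its cache share from the CPU-to-memory speed ratio, group cache-sharing cores into a cluster, then tile the die. Every constant here is a placeholder assumption, not a real CPU figure:

# Illustrative sketch of the core/cache/cluster sizing described above.
# All constants are placeholder assumptions, not real CPU figures.
DIE_BUDGET = 100_000_000            # transistors available on the die
CORE_XTORS = 2_000_000              # transistors per simple core
CACHE_XTORS_PER_CORE = 2_000_000    # cache share per core, set by the CPU-to-memory speed ratio
CORES_PER_CLUSTER = 4               # cores sharing one cache before synchronization overhead bites

cluster_xtors = CORES_PER_CLUSTER * (CORE_XTORS + CACHE_XTORS_PER_CORE)
clusters_per_die = DIE_BUDGET // cluster_xtors
total_cores = clusters_per_die * CORES_PER_CLUSTER
print(f"{clusters_per_die} clusters x {CORES_PER_CLUSTER} cores = {total_cores} cores per die")
# 6 clusters x 4 cores = 24 cores per die (with these placeholder numbers)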
=== http://tech.groups.yahoo.com/group/CoolTechClub/message/358 ===
From Oleg Zabluda
Let me expand some more. The only really useful thing you can do with more transistors is to add more calculating units (ALU, FPU, etc.), that is, increase superscalarity. The rest is just baggage to keep them busy. So all this insane crap with out-of-order reads/writes, speculative execution, branch prediction, register renaming, etc. is just baggage to keep the ALUs occupied (and the pipelines filled). Guess what? Currently we spend about half the transistors on that, and we can still keep the second ALU busy maybe 30% of the time.
Unless you hand-optimize specifically with that in mind, which almost nobody can do because nobody really understands what is going on except in the simplest cases. The people who can optimize best are Intel engineers with diagnostic tools unavailable to others. Even then, the optimizations are not portable across CPUs and are not always deterministic.
This is the end of the line for that.
Fortunately, on the server side, almost all major applications are trivially parallelizable. So there will be a divergence of server CPUs (a larger number of simple and slow cores with higher total throughput and higher latency) and desktop CPUs (the other way around). If not, like Herb is saying, we can always run spyware on the other cores.
=== http://tech.groups.yahoo.com/group/CoolTechClub/message/360 ===
From Oleg Zabluda
I am looking for a graph of MIPS/clock/transistors. I am too lazy to make one myself[1]. In the process I came across this URL from Patterson
http://www.cs.berkeley.edu/~pattrsn/talks/NAE.ppt
Page 12 shows that even for a relatively simple Pentium III with 10M transistors, about 80% of them are basically wasted as far as throughput is concerned.
[1] In the absence of a graph, just use your common sense for now.
How many instructions per clock cycle (IPC) can a modern general purpose
CPU do on a typical server workload? Maybe 1.3 if you are lucky. That's
despite the fact that about 80-90%[2] of the transistors in the core (not
in the caches) are dedicated to squizing that extra 30%. It's not going
to improve much. Now that clock speed is stalled for a while, the game
is over.
[2] In fact I am not sure about exact percentage. The total amount
of "wasted" transistors is about 97%. But it's hard for me to tell
exactly how much of it is wasted on cpu-vs-memory speed mismatch and how much is wasted on superscalarity.
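To make the throughput-per-transistor point concrete, here is a hedged comparison using only the ballpark figures from this thread (a ~10M-transistor PIII-class core at IPC ~1.3 versus a ~2M-transistor simple in-order core at IPC ~1.0); the numbers are rough, only the ratio matters:

# Rough throughput-per-transistor comparison using ballpark figures
# from this thread; these are not measured numbers.
complex_core = {"transistors": 10_000_000, "ipc": 1.3}  # PIII-class superscalar core
simple_core = {"transistors": 2_000_000, "ipc": 1.0}    # simple in-order core

def ipc_per_million_transistors(core):
    return core["ipc"] / (core["transistors"] / 1_000_000)

print(f"complex core: {ipc_per_million_transistors(complex_core):.2f} IPC per million transistors")
print(f"simple core:  {ipc_per_million_transistors(simple_core):.2f} IPC per million transistors")
# complex: 0.13, simple: 0.50 -- at equal clock, roughly 4x the aggregate
# throughput per transistor if the same die area is spent on simple cores.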
=== http://tech.groups.yahoo.com/group/CoolTechClub/message/363 ===
From Sergiy Lozovsky
Hi,
What is so new in this information? I saw a physical explanation of why the CPU clock limit is around 4GHz (for ordinary technology, not optics, etc.) long before 2003 (it was in some article about optical or neurocomputers, I think). Many companies create applications which use multiple CPUs (computers) - Google, for example. There are even open source projects to support clustering, so people who need performance haven't relied on one CPU for a long, long time. All supercomputers have multiple CPUs (usually it is a cluster, so each CPU or group of CPUs has its own RAM, cache, etc.).
=== http://tech.groups.yahoo.com/group/CoolTechClub/message/365 ===
From Oleg Zabluda
CPUs are getting faster in two major ways: clock speed and the number of transistors. Many different people were saying many different things for a long time. Some were saying that clock speed would not increase any more. Others were saying that the number of transistors would not increase any more. It was hard to tell the truth from BS. Now some of the predictions have turned out to be correct. Some people knew what they were talking about, and some were right due to the stopped-clock effect.
Here is what's new and what's not:
1. The ones who claimed that clock speed would not increase any more turned out to be correct, because it happened. That's new.
2. Those who claimed that the number of transistors will not be increasing are wrong so far. At this time it appears to me that the number of transistors on a die will increase by at least a factor of 100 over the next 10 years using regular lithography. That's not new.
3. Turns out that the number of transistors does not lead to faster single-core CPUs any more. That's new.
4. In order to use all those transistors, people are building multi-core CPUs. It appears that the natural progression of things will lead to thousands or tens of thousands of 486-Pentium III class CPUs on a die. That's new.
5. Existing software will not be any faster on those multi-core CPUs. In fact, it might be slower, because the individual CPUs might be slower. The free lunch is over. That's new.
6. Very few people know how to create software for SMP computers. Nobody knows how to do it for a 1024-way SMP machine. Things you don't know are hard. Whether the difficulty is inherent or accidental we will know after we learn how to do it. The time to learn is now. This is new.
7. Existing SMP machines, clusters, etc. give us a place to start. But it's nothing compared to where we are heading. If this progresses as I envision, in 10 years we will have 1024 CPUs per die, 4 dies per stack, 4 stacks per chip, 4 chips per blade, 8 blades per box and 4M boxes per Google. The only reason this will not happen is if software doesn't keep up.
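Multiplying out the speculative hierarchy in point 7 (the per-level counts are the ones from the message, not real figures):

# Multiplying out the speculative hierarchy from point 7 above.
levels = [
    ("CPUs per die", 1024),
    ("dies per stack", 4),
    ("stacks per chip", 4),
    ("chips per blade", 4),
    ("blades per box", 8),
    ("boxes per Google", 4_000_000),
]

total = 1
for name, count in levels:
    total *= count
    print(f"x {count:>9,} {name:<17} -> {total:,}")
# ends at 2,097,152,000,000 -- about two trillion CPUs in this scenario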
==========================
Original threaded view:
http://tech.groups.yahoo.com/group/CoolTechClub/messages/355?threaded=1&m=e
Labels: Oleg Zabluda