# CP 524 o5 1412 <br> Restricted Moduli Symmetrical Signed Residue Addition: Part I 

First M. N. Daikpor, Second O. Adegbenro, Member, IEEE


#### Abstract

In this paper we propose a scheme for the design of a Symmetrical Multiple Valued Logic (SMVL) arithmetic circuit based on the use of restricted moduli Symmetrical Signed digit Residue Number system (SSRNS). Sign and overflow detection as well as magnitude comparison operations are accomplished without recourse to the traditional complex Mixed Radix number System (MRS) conversion process and multiplicative inverse computation. The method is particularly general purpose systems oriented. Addition operations are executed economically, fast and at constant speed.


Keywords - conversion, carry-free, full adder, magnitude, number system, symmetrical-signedresidue, weighted.

## I. Introduction

PRACTICALLY all human endeavours today are Information and Communication Technology Systems (ICTS) driven. These systems rely on high speed, secured and trusted communication gadgets whose operations depend on special classes of very big integer arithmetic circuits. The type of arithmetic operations employed in these devices is not only fixed point but also must be carryfree. The method of number representation is a critical design factor in order to attain the desired high-speed operations of these circuits; designers still have to find optimal ways of managing carry propagation chains [1], [2].

It has long been established that the non-redundant non-weighted Residue Number System (RNS) and the weighted highly redundant Signed-Digit Number System (SDNS), attract fast and efficient arithmetic. [2], [3], [4], [5], [6]. However, RNS arithmetic is beset with complex conversion procedure, difficult sign detection, cost intensive overflow detection and magnitude comparison strategies that require Mixed Radix number System (MRS) and practically none in-existence simple division algorithm. Hence, RNS arithmetic found application in special areas such as: error correction, fault tolerance and digital filter design, power dissipation reduction in VLSI design, fast Fourier transform structures and cryptography [7], [8], [9], [10], [11]. On the other hand, the SDNS

[^0]representation in addition to the above benefits of RNS provides a means of presenting operands at higher radices thereby supporting SMVL systems design which find application in image processing, robotic, and finite field arithmetic [6], [7]. The SDSN drawback is that it requires more character digits for operands representation. Thus in systems of high frequency addition operation this requirement is a setback in terms of circuit power consumption.

It is already established that combined Signed Digit (SD) and Residue Number (RN) arithmetic could result in reduced carry propagation delay and power consumption but the deployment of this in fast arithmetic circuit design is still at its infancy. For example the related arithmetic circuits in [2] are parameterized to provide basic building blocks for binary logic signal VLSI processors. Binary logic VLSI circuit's main objective is miniaturization with improved circuit complexity all at reduced cost. The Multiple Valued Logic (MVL) systems extend the horizon of this objective by virtue of its higher information per line capacity. Very recently, a Symmetrical Multiple Valued Logic (SMVL) developed from Restricted radix-7 Quaternary Signed Digit (Rr7SqSd) number system has been proposed [7]. It is highly probable that an interesting cross line could evolve when the character digit set of a symmetrical SDNS coalesces with a symmetrical signed digit RNS character digit set.

The proposed SSRNS addition scheme widens the scope of RNS arithmetic application by removing the inherent bottle neck in RNS arithmetic operations namely: sign detection, overflow detection and magnitude comparison. This contribution will thus make RNS arithmetic possible for use in general purpose digital systems. The organization of the remaining part of this paper is as follows. The background to this paper is presented in section II. Section III presents the SSRNS and the conversion procedures. SSRNS addition is presented in section IV.

## II. BACKGROUND

Exegesis of RNS arithmetic can be found in popular works such as [2], [4], and [5]. Similar works on Signed-Digit Number System (SDNS) arithmetic can be found in [6]. The concept of restricted radix-7 Symmetrical quaternary Signed-digit number system (Rr7SqSd) and its arithmetic is in the most recent works of [7], [12], [13], and [14]. Hence, by extension restricted radix-5 Symmetrical ternary Signed-digit (Rr5StSd) number system with the character digit set $L \in\{-2,-1,0,1,2\}$ and Restricted radix-3

Symmetrical binary Signed-digit (Rr3SbSd) number system exist with the character digit set $L \in\{-1,0,1\}$.

## III. THE SSRNS

The number system we are proposing here provides a cross over point between the SDNS and the RNS. It therefore possesses the properties of the two number systems as it is composed of symmetrical signed residues numbers $x_{k}$, of the moduli set $p_{1} p_{2} . . p_{k} . . p_{n}$. A symmetrical signed residue number $x_{k}$, is a unique representation of the signed integer $x$, such that for a set of unsigned relative prime moduli set $P_{k} ; k=1,2, . ., n$ the signed integer $x$ can be described as $x=\left(x_{1}\left|x_{2}\right| \cdot \cdot\left|\cdot x_{n}\right|\right) \operatorname{SSRNS}\left(P_{1}\left|P_{1}\right| \ldots\left|\cdot P_{N}\right|\right)$ for $\quad$ all $|x| \in[-M, M], \quad$ where: $x_{k}=x \bmod p_{k}$, $|M|=\prod_{1}^{n} p_{k}$ and if $\zeta_{k}$ is a restrictor $\leq\left|p_{k}-1\right|$ of modulo $p_{k}$ residues then the SSRNS digit $x_{p_{k}}$ takes the values

$$
x_{p_{k}} \in\left(\left(\overline{p_{k}-\left(\zeta_{k}-1\right)}\right), \overline{1}, 01,,\left(p_{k}-\left(\zeta_{k}-1\right)\right)\right)
$$

Now for bridging Rr7SqSd and SSRNS we take as an example $k=3$ and the relative prime moduli set as $(7|5| 3 \mid)$ which provide a unique representation of any signed integer $x$, in the dynamic range $-105<x<105$. It then follows that with $\left(\zeta_{7}, \zeta_{5}, \zeta_{3}\right)=(3,2,1)$

$$
\left.\begin{array}{l}
x_{7_{k}} \in\{-3,-2,-1,0,1,2,3\}  \tag{3}\\
x_{5_{k}} \in\{-2,-1,0,1,2\} \\
x_{3_{k}} \in\{-1,0,1\}
\end{array}\right\}
$$

The following procedures apply for the specified cases.
A. Decimal to SSRNS conversion

1. Obtain the $\operatorname{radix}-M / 2$ form $X^{*}=X^{*}{ }_{m-1} X^{*}{ }_{m-2} \ldots X^{*}{ }_{0}$, of the number.
2. For each $X^{*}{ }_{i}$, compute the RNS residue $T_{i_{k}}=X^{*}{ }_{i} \bmod p_{k}$.
3. Represent $T_{i_{k}}$ in SSRNS i.e. $t_{i, l}=f\left(T_{i_{k}}\right)$ and $l \in\{1,2,3\}$
Hence, $X=\left\{X_{i}^{*}\right\}=\left(t_{i, 1} t_{i, 2} t_{i, 5}\right) \operatorname{SSRNS}(7|5| 3)$, e,g $X=65 ; X_{i}=(2|1| \overline{1}) S S R N S(7|5| 3)$.

| B. Rr 7 SqSd to SSRNS |  |  |  |
| :---: | :---: | :---: | :---: |
| 1.Partition | the | given | number |
| $X_{R r 7 S q S d}=x_{R j-1} x_{R j-2} \ldots x_{R j 0} \quad$ where |  |  |  |
| $x_{R j-i} \in\{-3,-2,-1,0,1,2,3\}$, from the right into |  |  |  |
| $m$ | groups | of | Rr7SqSd |

$x_{R j, 3} x_{i j 2} x_{j, 1}=\left(\alpha_{j} \beta_{j} \lambda_{j}\right)$,
$-172<\alpha_{j}, \beta_{j}, \lambda_{j}<172$
$\alpha, \beta, \lambda \in\{-3,-2,-1,0,1,2,3\}$.
2.Compute the SSRNS equivalent $t_{R 1 j}, t_{R 2 j}, t_{R 3 j}$ of the $\operatorname{Rr} 7 \mathrm{SqSd} j^{\text {th }}$ partition $\alpha_{j} \beta_{j} \lambda_{j}$ as follows;

$$
\left.\begin{array}{l}
t_{R j 1}=\lambda_{j}  \tag{4}\\
t_{R 2 j}=\left(7\left(7 \alpha_{j}+\beta_{j}\right)+\lambda_{j}\right) \bmod p_{2} \\
t_{R 3 j}=\left|\left(\alpha_{j}+\beta_{j}+\lambda_{j}\right)\right| \bmod p_{3}
\end{array}\right\}
$$

e.g.
$1 \overline{3} 23=213_{10}=2 \mid 3_{\text {Radix- }-M / 2}=(22 \overline{1} \mid 30 \overline{1}) \operatorname{SSRNS}(7|5| 3)$.
C. SSRNS to decimal

Rather than using the Mixed Radix number System (MRS) , multiplicative or Look Up Tables (LUT) the following formular is developed heuristically. If $p_{1} p_{2} \ldots p_{k}$ is a set of relatively prime moduli in the interval $\left(-\frac{M}{2}, \frac{M}{2}\right) M=2 \prod_{k=1}^{l} p_{k}$ then there exist the set of integers $u_{1,}, \ldots, u_{k},-\frac{M}{2} \leq u_{i} \leq \frac{M}{2}$ such that $u_{i} \bmod p_{j}=\left\{\begin{array}{l} \pm 1 ; \text { if } i=j \\ 0 ; \text { if } i \neq j\end{array}\right.$
Satisfying the equation (10)
$u_{1}+\ldots .+u_{l}=\frac{M}{2}+1$ or $u_{1}-\frac{M}{2}+$
$. .+u_{l}-\frac{M}{2}=-(M-1)$;for $u_{1} \bmod p_{1}=$
$\ldots .=u_{l} \bmod p_{l}=1$
Similarly,
$-u_{1}-\ldots-u_{l}=-\left(\frac{M}{2}+1\right)$ or $\frac{M}{2}-u_{1}+$
$. .+\frac{M}{2}-u_{l}=M-1 ;$ for $u_{1} \bmod p_{1}=$

$$
\begin{equation*}
. .=u_{l} \bmod p_{l}=-1 \tag{7}
\end{equation*}
$$

If the decimal equivalent of the SSRNS digits $t_{1} t_{2} \ldots t_{l}$ is $\varphi_{i}-157 \leq \varphi_{i} \leq 157$ and denoting $\mu_{k}=u_{k} \bmod p_{k}$, $k=1,2, . ., l$ then,
$\left(\mu_{1} \prod_{h=2}^{l} p_{h}\right) \bmod p_{1}=1 \Rightarrow \mu_{1}=1$
$\qquad$
$\left(\mu_{m} \prod_{h^{\prime}=1}^{m-1} p_{h^{*}} \prod_{h=m+1}^{l} p_{h}\right) \bmod p_{m}=1 \Rightarrow \mu_{m}=1$
$\left(\mu_{l} \prod_{h=1}^{l-1} p_{h}\right) \bmod p_{l}=1 \Rightarrow \mu_{l}=l-1$
provided: $u_{k}=\mu_{k} \frac{M / 2}{p_{k}}$. Consequently,
$\varphi=\sum_{k=1}^{l} \mu_{k} \frac{M / 2}{p_{k}} t_{k}=\sum_{k}^{l} u_{k} t_{k}$
For $l_{\text {max }}=3 ; \mu_{1}=1, \mu_{2}=1$ and $\mu_{3}=2$ therefore; $u_{1}=15, u_{2}=21$ and $u_{3}=70$. Hence,

$$
\begin{equation*}
\varphi_{i}=\left(15 t_{1}+21 t_{2}+70 t_{3}\right) \tag{10}
\end{equation*}
$$

Long SSRNS digits strings are partitioned into $g$ groups of 3-SSRNS digits and

$$
\begin{equation*}
\varphi=\sum_{i=0}^{g-1} \varphi_{i}=\sum_{i=0}^{g-1}\left(15 t_{1 i}+21 t_{2 i}+70 t_{3 i}\right)(M / 2)^{i} \tag{11}
\end{equation*}
$$

## IV. SSRNS ADDITION SCHEME

In this contribution we are more concerned with very big integer operand SSRNS addition operations that are operands are converted to radix $-M / 2$ number system character digits and subsequently to SSRNS. We describe the addition operation by equation (12)

$$
\begin{align*}
& \delta_{i p}=\left\{\begin{array}{ll}
\overline{1} & ; \text { if } \alpha_{i p}+\beta_{i p}<\bar{a}_{p} \\
1 & \text {;if } \alpha_{i p}+\beta_{i p}>a_{p} \\
0 & ; \text { otherwise }
\end{array}\right\}  \tag{12}\\
& \gamma_{i p}=\alpha_{i p}+\beta_{i p}-p \delta_{i p} \\
& a_{p} \in(3,2,1) R N S(7|5| 3)
\end{align*}
$$

Where $\alpha_{i p_{k}}, \beta_{i p_{k}}$ are operand pairs and $\gamma_{i p_{k}}$ sum of the signed radix $-M / 2$ character digits in the SSRNS representation addition operation. $\alpha_{7 i,}, \beta_{i 7}, \gamma_{7 i}=\left\{x_{7_{k}}\right\}$, $\alpha_{5 i}, \beta_{5 i}, \gamma_{5 i}=\left\{x_{5_{k}}\right\}$, and $\alpha_{3 i}, \beta_{3 i}, \gamma_{3 i}=\left\{x_{3_{k}}\right\}$ as earlier defined in equation (3). There are inherent problems here, i) since the addition is radix $-M / 2$ pair-wise there is danger of the weighted number systems equivalent of operand's radix $-M / 2$ character digit-partitions to acquire varying signs that may lead to inaccurate computed final result. ii) The magnitude $\psi_{o}$,of any operand participating in an SSRNS arithmetic lies in the dynamic range $-M / 2<\psi_{0}<M / 2$ just as
that of the computed sum $\varphi_{s}$ lies in the interval $[-M, M]$. The implication of these observations is that there is bound to be numerical trimming of any computed sum magnitude that lies outside the SSRNS dynamic range. This of course can bring about fictitious sums of the addition operation. Which means long SSRNS digit input string addition schemes must have mechanism for identifying/detecting radix- $M / 2$-partition sign, operand-pair and result pairs parity (odd or even) status, extend of partition sum overflows detection and it's reporting. Existing methods of solving these difficult-tohandle problems in the RNS domain [8], [9] as earlier enunciated require conversion of the RNS operands to MRS domain for computability as well as multiplication inverse computations.

In this paper, these problems are again solved heuristically in a very simple way using the Rr7SqSd addition. The Radix $-M / 2$ character digits input stream of operands appear both in $3-\mathrm{Rr} 7 \mathrm{SqSd}$ and in the corresponding 3-SSRNS digits packet streams. The 3Rr 7 SqSd operand-pairs are added together per Rr 7 SqSd in parallel in a four level Rr7SqSd addition operation described by equation (13)

$$
\begin{align*}
& z_{i}=\alpha_{i}+\beta_{i} \\
& \delta_{\mathrm{i}}=\left\{\begin{array}{c}
-1 ; \text { if } z_{i}<\bar{a} \\
1 ; \text { if } z_{i}>a \\
0 ; \text { if otherwise }
\end{array}\right\}  \tag{13}\\
& \begin{array}{l}
\tau_{\mathrm{i}}=z_{\mathrm{i}}-7 \delta_{\mathrm{i}} \\
\lambda_{\mathrm{i}}=\tau_{\mathrm{i}}+\delta_{\mathrm{i}-1}
\end{array}
\end{align*}
$$

where: $\quad \alpha_{i}, \beta_{i}, \tau_{i}, \lambda_{i}\{-3,-2,-1,0,1,2,3\}$ and $\delta_{i}, \delta_{i-1}$ $€\{-1,0,1\}$. The addition accomplishes the magnitude comparison aspect. The sign of the $i^{\text {th }}$ radix $-M / 2$ character digit pair Rr 7 SqSd addition operation is the sign of the most significant Rr 7 SqSd . of that partition and the corresponding outputs $\lambda_{i=}=\delta_{i}$, $\lambda_{i, 3}, \lambda_{i, 2}$ and $\lambda_{i, 1}$ facilitates overflow discussion making. The actual radix $-M / 2$ character digit-pair addition operation is executed in SSRNS domain modulo-wise using equation (16). Overflows appear just as inter-radix carries.

One immediate area of application for the SSRNS arithmetic circuits is in SMVL systems where the processing signal profile, rather than the binary logic, is the Rr7SqSd logic levels. The Rr7SqSd number system being weighted is susceptible to carry propagation chains that adversely affect system operation speed. To avoid this problem, we embed SSRNS addition procedure in Rr 7 SqSd addition operation. Long Rr 7 SqSd -input-string operands $\quad x_{i}=x_{i_{3}}, x_{i_{2}}, x_{i_{1}} \quad, \quad y_{i}=y_{i_{3}}, y_{i_{2}}, y_{i_{1}}$ are divided into $m$-groups of 3 -Rr7SqSd each with $-171 \leq x_{i}, y_{i} \leq 171$. Let $-342 \leq z_{i} \leq 342$, be the
sum of the $i^{\text {th }}$ partition addition be such that $z_{i}=z_{i 3} z_{i 2} z_{i 1}, \quad z_{i l} \in\{-3,-2,-1,0,1,2,3\} \quad l \in\{3,2,1\}$. In the SSRNS domain, two radix $-M / 2$ character digits are required to represent $x_{i}, y_{i}$ and $z_{i}$. Taking $\left(\theta_{0 i} \theta_{1 i}\right),\left(\vartheta_{0 i} \vartheta_{1 i}\right)$ and $\left(\gamma_{0 i} \gamma_{1 i}\right)$ as $x_{i}, y_{i}$ and $z_{i}$ then $-(M / 2-1) \leq \theta_{1 i}, \vartheta_{1 i}, \gamma_{1 i} \leq(M / 2-1)$,
$-1 \leq \theta_{0 i} \vartheta_{0 i} \leq 1$ and $-3 \leq \gamma_{0 i} \leq 3$. To reduce cost and enhance operation speed operands are first represented in radix $-M / 2$ character digits and then each radix- $M / 2$ character digit is converted to both Rr7SqSd and SSRNS presentations. In terms of radix- $M / 2$ the addition process is represented by equations (17), (18) and (19)

$$
\begin{gather*}
\gamma_{1 i}=\theta_{1 i}+\vartheta_{1 i}  \tag{17}\\
c_{i}=\left\{\begin{array}{l} 
\pm 1 ; \text { if } \gamma_{1 i} \geq M / 2 \text { or }= \pm(M / 2-1), \\
\gamma_{1 i}= \pm M / 2-1, \gamma_{1 i-1} \text { and } c_{i-2}= \pm 1 \\
0 \quad ; \text { Otherwise }
\end{array}\right. \\
\gamma_{1, i+1}=\left\{\begin{array}{l}
\gamma_{1 i+1} \mp l M / 2 ; \text { if } c_{i}= \pm 1 \\
\gamma_{1 i+1} ; \text { otherwise }
\end{array}\right.
\end{gather*}
$$

Where $\quad l \in\{-3,-2,-1,0,1,2,3\} \quad$ corresponding
to $\{-315,-210,-105,0,105,210,315\}$ and $\theta_{1 p} \in\left\{\alpha_{7 i}, \alpha_{5 i}, \alpha_{i 3}\right\}, \vartheta_{1 p} \in\left\{\beta_{7 i}, \beta_{5 i}, \beta_{3 i}\right\}$.

Computational experiments conducted showed $50 \%$ operation execution speed increment using this detection-compare-migrate-and-return for alignment approach. There is a $75 \%$ increase in speed when magnitude alignment is also carried out in the SSRNS domain though with a higher complexity trade-off. The approach does not need a Chinese Remainder Theory (CRT) and the Extended Euclidean Algorithm (EEA) for backward conversion operation.

## REFERENCES

[1] R. Zimmermann, "Lecture notes on computer arithmetic: principles, architectures and VLSI design," Integrated system Laboratory, Swiss Federal Institute of Technology (ETH), CH-8092 Zurich, Switzerland. March 16, 1999.
http://www.iis.ee.ethz.ch/
[2] A. Lindstrom, M. Nordseth, L. Bentsson and A. Omondi,"Arithmetic circuits combing residue and signed-digit representations," in Proc. of the $8^{\text {th }}$ AsiaPacific Computer Systems Architecture conference (ACSAC"2003) Aizu-Wakamatsu City Japan, Sept.

2003, Published by Springer-verlag in lecture Notes in Computer Science(LNCS) Vol. 2823.
[3] S. Shieh and C. Wu, "Asymmetric high-radix signeddigit number system for carry-free addition- Accepted for publication," Journal of Information Science and Engineering 19, February 24, 2003, pp. 1015-1039.
[4] B. Parhami, "RNS Representations with redundant residues- Invited Paper," IEEE (2001) 0-7803-7147X/01/\$10.00, pp. 1651-1655.
[5] S. Wei, "Number conversions between RNS and mixed-radix number system based on modulo ( $2^{\mathrm{p}}-1$ ) signed-digit arithmetic," Proc. SBCCT'05 September 4-7, 2005, Florianopolis, Brazil, ACM 1-59593-1740/05/0009, pp. 160-165.
[6] M. Kameyama and Tatsuo Higuchi, "Design of a radix- 4 signed-digit arithmetic circuit for digital filters," Proc. of the $12^{\text {th }}$ IEEE International Symposium on multiple-Valued Logic, North-western University, Evanston, IL. USA 1980, pp 272-277.
[7] M.N Daikpor and O. Adegbenro," Radix - 7 signed digit element finite field arithmetic," Proc of the IEEE International conference on signal processing and communication, Dubai November 2007, pp 708-711.
[8] C. K. Cheng, "CSE: Computer arithmetic algorithms and hardware design lecture 2 : redundant and residue number systems," 2006, http://cseweb. ucsd.edu/classes/fa06/cse246/lect2.pdf.
[9] N. Stamenkovic, "Digital FIR filter architecture based on residue number system," FACTA
UNIVERSITATIS (NIS) Ser.:ELEC.ENERG. vol.22, N0. 1, April 2009, pp 125-140.
[10] M. Hosseinzadeh, S. Jafarali and K Navi., "A novel multiple valued logic OHRNS modulo $\mathrm{r}^{\mathrm{n}}$ adder circuit," Proc of World academy of Science, Engineering and Technology, Vol. 25 November 2007 ISSN 1307-6884, pp 128-132.
[11] F. Barsi and P Maestrini, "Error correcting properties of redundant residue number systems," IEEE Transactions on Computers, Vol. C-22.N0. 3, March 1973, pp 307-314.
[12] B.Tseng, G.A. Jullien and W. Miller, "Implementing FFT structures Using the Residue Number System," IEEE Transactions on Computer, vol. C-28, N0 11, November 1979, pp $831-845$.
[13] F. J. Taylor, "A VLSI Residue Arithmetic Multiplier,"

IEEE Transactions on Computers, Vol. C-31, N0. 6, June 1982, pp 540-546.
[14] V. Parlours and T. Strouraitis, "Novel High-radix Residue Number System Architectures, " IEEE transaction circuits and systems-II Analog and Digital Processing, Vol. 47, N0. 10 October 2000, pp 1059-1073.
[15] M.N. Daikpor and O. Adegbenro, "Efficient carryfree Rr7SqSd addition algorithm". Proc of the $3^{\text {rd }}$ International Conference on Emerging Trends, Research directions and Training Requirements of the $21^{\text {st }}$ century Electrical and Electronics Engineering, University of Lagos $.22^{\text {nd }}$ to $24^{\text {th }}$, July 2009. Pp 102-107.

## 2011 18 ${ }^{\text {th }}$ International Conference on Systems, Signals and Image Processing

## PROCEEDINGS IWSSIP 2011



Edited by:
Branka Zovko-Cihlar Narcis Behlilović
Mesud Hadžialić

## Author Index

> Adegbenro, Oluwole ..... 303
Agić, Darko ..... 205
Ahic-Djokic, Melita ..... 51
Akinola, Mobolaji O. ..... 97
Alcaim, Abraham ..... 299
Al-Jobouri, Laith. ..... 25
Allili, Madjid ..... 273
Al-Naser, Mohammad ..... 257
Anacleto, Harry. ..... 299
Armingol, José María ..... 63
Arsene, Octavian ..... 193
Avdic, Elma ..... 357
Avramovic, Aleksej ..... 105
Badnjević, Almir ..... 417
Balakrishna, L ..... 147,155
Banaeyan, Majid ..... 71
Barbarien, Joeri ..... 197
Becerikli, Yasar ..... 123
Beganović, Emir. ..... 417
Begovic, Pamela ..... 357
Behlilović, Narcis. ..... 221,315, 357
Beniak, Marián ..... 367
Bogdanov, Momcilo ..... 177, 235
Bogdanova, Sofija ..... 177, 235
Bolaño, Juan Antonio ..... 63
Borchartt, T. B. ..... 279
Bozek, Jelena ..... 247
Brandão, Alexandre S. ..... 113
Çalhan, Ali ..... 21
Cañas, Rubiel Vargas ..... 59
Cataldo, Edson ..... 113
Çeken, Celal ..... 21
Cerny, Tomas ..... 85
Chertov, Oleg ..... 47
Conci, A. ..... 131, 279
Conci, Aura ..... 189, 229
Coradine, Luis ..... 181
Cornelis, Jan ..... 197
Coronado, Gustavo A. Peláez ..... 63
Costa, Yandre M. G. ..... 151
Cristea, Paul Dan ..... 193
da Silva, Dirceu G. ..... 299
da Silva, Lincoln Faria ..... 189
Daikpor, Michael Naseimo ..... 303
de la Escalera, Arturo ..... 63
de Melo, R. H. C. ..... 279
Debono, Carl J. ..... 89
Dedić, Renato ..... 273

# Restricted Moduli Symmetrical quaternary Signeddigit Addition: A design Implementation Overview 

Michael Naseimo Daikpor<br>Department of Electrical and Electronics Engineering,<br>University of Lagos - Akoka,<br>Lagos, Nigeria.<br>e-mail: michaelnaseimodaikpor@rocketmail.com

Oluwole Adegbenro, Member IEEE<br>National Centre for Energy Efficiency and conservation, University of Lagos - Akoka,<br>Lagos, Nigeria.<br>e-mail: wole_adegbenro@yahoo.com


#### Abstract

In this paper we present an overview of design implementation of a Symmetrical Multiple Valued Logic (SMVL) arithmetic circuit based on the use of restricted moduli Symmetrical Signed Residue Number System (SSRNS). Restricted radix-7 Symmetrical quaternary Signed digit (Rr7SqSd) T-gate based interconnections and full adders are used to implement sign detection, overflow detection and magnitude comparison without recourse to Mixed Radix number System (MRS) converters design or Chinese Remainder Theorem (CRT) computation.


Keywords-component; circuit design; full adders; magnitude generator; signed residue; T-gates

## I. INTRODUCTION

High speed arithmetic circuits are the kernel of the emerging information and secured communication systems. They operate in such manners that very big integer operand chunks are fetched and executed in one clock cycle. This requires precise and concise arithmetic operation carry propagation chain reduction techniques [1],[2] during the VLSI circuit design stage. The developments of quantum functional semiconductor devices such as the Resonant Tunneling Device (RTD) and the Single Electron Tunneling Transistor (SETT) as well as the SMVL concept [3] are some of the other current research directions. An absolute carry-free arithmetic circuit that can bring about ultra-high speed system operation is the ultimate target.

It is already established that the Residue Number System (RNS) and the Signed-Digit Number System (SDNS) separately provide efficient arithmetic and parallel system architecture, necessary for high speed operations [4],[5],[6],[7] . However, the RNS is beset with difficult sign and overflow detection as well as complex magnitude comparison procedure and SDNS has the drawback of requiring more character digits to represent operands.

In the part I of this work [8] we suggested the cross over point for RNS and SDNS arithmetic in a way that resulted in tremendous increase in the addition operation execution speed. In this part II, we present the hardware implementation considerations of the scheme. This paper is organized as follows. In section II we present a brief review of the SSRNS
addition scheme. The circuit realization is presented in section III while performance evaluation is presented in section IV.

## II. SSRNS ADDDTION REVIEW

The SSRNS is composed of symmetrical signed residue numbers $x_{k}$, of the moduli set $p_{1} p_{2} . . p_{k} . . p_{n}$. A symmetrical signed residue number $x_{k}$, is a unique representation of the signed integer $x$, such that for a set of unsigned co-prime moduli set $p_{k} ; k=1,2, . ., n$ the signed integer $x$ can be described as

$$
\begin{equation*}
x=\left(x_{1}\left|x_{2}\right| \ldots \mid x_{n}\right) \operatorname{SSRNS}\left(p_{1}\left|p_{2}\right| \ldots \mid p_{n}\right) \tag{1}
\end{equation*}
$$

for all $|x| \in[-M, M]$, where: $x_{k}=x \bmod p_{k}$,
$|M|=\prod_{1}^{n} p_{k}$. If $\zeta_{k}$ is a restrictor $\leq\left|p_{k}-1\right|$ of modulo
$p_{k}$ residues then the Symmetrical Signed Residue Digit
(SSRD) $x_{p_{k}}$ takes the values

$$
\begin{equation*}
x_{p_{k}} \in\left\{\left(\overline{p_{k}-\left(\zeta_{k}-1\right)}\right), ., \overline{1}, 0,1, \ldots,\left(p_{k}-\left(\zeta_{k}-1\right)\right)\right\} \tag{2}
\end{equation*}
$$

For the co-prime moduli set $(7|5| 3 \mid)$ a unique representation of any signed integer $x$ in the dynamic range $-105<x<105$ can obtained. If the restrictors $\left(\zeta_{1}, \zeta_{2}, \zeta_{3}\right)=(3,2,1)$ corresponds to the moduli set $\{7,5,3\}$ then

$$
\left.\begin{array}{l}
x_{7_{k}} \in\{-3,-2,-1,0,1,2,3\}  \tag{3}\\
x_{5_{k}} \in\{-2,-1,0,1,2\} \\
x_{3_{k}} \in\{-1,0,1\}
\end{array}\right\}
$$

In our design, sign and overflow detection as well as magnitude comparison operations are performed in Rr 7 SqSd environments. The exact addition operation is conducted in SSRN domain. Operands pairs are initially converted into packets of radix $-M / 2$ character digit strings. Each
radix - $M / 2$ character digit is then converted to Rr 7 SqSd and SSRNS forms of number representations that yield 3Rr 7 SqSd and 3-SSRNS digits per partition respectively. The addition operation is performed simultaneously on each radix $-M / 2$ character digit pair now appearing as 3 -SSRNS digit partition-pair. Equations (4),(5), and (6) describe the addition process in radix $-M / 2$ character digit, Rr 7 SqSd and SSRNS respectively.

$$
\left.\begin{array}{c}
\gamma_{i}=\theta_{i}+\vartheta_{i}  \tag{4}\\
c_{i}=\left\{\begin{array}{c}
\mp 1 ; \text { if } \gamma_{i} \geq M / 2 \text { or } \gamma_{i}= \pm(M / 2-1), \\
\gamma_{i+1}= \pm M / 2-1 \text { and } c_{i-1}= \pm 1 \\
0 ; \text { otherwise. } \\
\gamma_{i+1}=\left\{\begin{array}{ll}
\gamma_{i+1} \mp l M / 2 ; \text { if } c_{i}= \pm 1 \\
\gamma_{i+1} ; \text { otherwise. }
\end{array}\right\}
\end{array}\right\}, 4(4 \mathrm{c})
\end{array}\right\}
$$

$\gamma_{i} \in\{-315,-314, \ldots \ldots,-1,0,1, \ldots, 314,315\} \quad \theta_{i}, \vartheta_{i} \quad$ are operand pairs in radix $-M / 2$ but in Rr 7 SqSd or SSRNS forms of representation. $\theta_{p_{i}} \in\left\{\alpha_{7_{i}}, \alpha_{5_{i}}, \alpha_{3 i}\right\}$, $v_{p_{i}} \in\left\{\beta_{7_{i}}, \beta_{5_{i}}, \beta_{3_{i}}\right\}$ and $l \in\{-3,-2,-1,0,1,2,3\}$ is the modulo $M / 2$ multiplier corresponding to

$$
\left.\begin{array}{rl}
l M / 2 \in\{-315,-210,-105,0,105,210,315\} \\
z_{i} & =\alpha_{i}+\beta_{i}  \tag{5}\\
\delta_{i} & =\left\{\begin{array}{c}
-1 ; \text { if } z_{i}<\bar{a} \\
1 ; \text { if } z_{i}>a \\
0 ; \text { if otherwise }
\end{array}\right\} \\
\tau_{i} & =z_{i}-7 \delta_{i} \\
\lambda_{i} & =\tau_{i}+\delta_{i-1}
\end{array}\right\}
$$

Where i) $\alpha_{i}, \beta_{i}, \tau_{i}, \lambda_{i} \in\{-3,-2,-1,0,1,2,3\}$ and ii) $\delta_{i}, \delta_{i-1} \in\{-1,0,1\}$. Again with $\alpha_{i_{p}}, \beta_{i_{p}}$ as the SSRNS operands pairs and $\gamma_{i_{p}}$ sum of the signed radix $-M / 2$

$$
\left.\begin{array}{l}
\delta_{i_{p}}=\left\{\begin{array}{c}
-1 ; \text { if } \alpha_{i_{p}}+\beta_{i_{p}}<\bar{a}_{p} \\
1 ; \text { if } \alpha_{i_{p}}+\beta_{i_{p}}>a \\
0 \text {; if otherwise. }
\end{array}\right\}  \tag{6}\\
\gamma_{i_{p}}=\alpha_{i_{p}}+\beta_{i_{p}}-p \delta_{i_{p}}
\end{array}\right\}
$$

character digits in the SSRNS representation. $\alpha_{7_{i}}, \beta_{7_{i},}, \gamma_{7_{i}} \in\left\{x_{7_{k}}\right\} \quad, \quad \alpha_{5_{i}}, \beta_{5_{i}}, \gamma_{5_{i}} \in\left\{x_{5_{k}}\right\} \quad$ and $\alpha_{3_{i}}, \beta_{3_{i},} \gamma_{3_{1}} \in\left\{x_{3_{k}}\right\}$ as defined in (3).

## III. THE SSRNS VLSI CIRCUIT

The SSRNS VLSI circuit is shown in fig. 1. All the functional blocks are implemented on Complementary Pass (CP) gate derived Rr 7 SqSd T -gates [3]. The radix - $M / 2$ character digit-packet-streams are passed, operand pair-wise simultaneously to the Rr7SqSd and SSRNS converters. The first SSRNS adder performs the SSRNS addition on the input operand while the second handles possible inter radix $-M / 2$ carry addition operation. Inter radix $-M / 2$ carry digit generation and the overflow evaluation operations are accomplished in the 4 -level $3,2,1,1-\mathrm{Rr} 7 \mathrm{SqSd}$ adders, the magnitude generator cascade and the wired AND-OR connection. The sign of a radix $-M / 2$ character-digit partition addition operation result is simply the sign of the most significant digit Rr 7 SqSd addition operation's sum. The cascade of $3,2,1,1-\mathrm{Rr} 7 \mathrm{SqSd}$ adders execute the operations $\alpha_{3_{i}}+\beta_{3_{i}}, \alpha_{2_{i}}+\beta_{2_{i}}, \alpha_{1_{i}}+\beta_{1_{i}}$ and $\alpha_{1_{i}}+\beta_{1_{i}}$ according to equation (5).

The dynamic range of input operands to these adders is $-(M / 2-1) \leq x \leq(M / 2-1)$ which in the Rr7SqSd number system is $\overline{2} \overline{1} 1 \leq x \leq 21 \overline{1}$. The magnitude of a partition sum outside this range indicates an overflow. The least value of this sum is $\pm M / 2 \equiv(210, \overline{2}, \overline{1}, 0)_{R r 7 \text { SGSd }}$ and the maximum is $\pm(171+171+1=343) \equiv(1000, \overline{1} 000)_{\text {Rr 7SqSd }}$. The output of Rr 7 SqSd adders cascade $w_{i}, x_{i}, y_{i}$ and $z_{i}$ goes into the 'magnitude generator' where equations (7) and (8) are implemented.

$$
\begin{align*}
& \omega_{i}^{*}=\left\{\begin{array}{l} 
\pm 1 ; \text { if } \gamma_{i}^{*} \geq \pm M / 2 \\
0 ; \text { otherwise. }
\end{array}\right.  \tag{7}\\
& \omega_{i-1}=\left\{\begin{array}{c} 
\pm 1 ; \text { if } \omega_{i-2}^{*}= \pm 1, \gamma_{i}=(M / 2-1) \\
\text { and } \gamma_{i-1}= \pm(M / 2-1) \\
0 ; \text { otherwise }
\end{array}\right. \tag{8}
\end{align*}
$$

The magnitude generator thus generates four outputs namely: $\omega_{1 i}^{*}$ for $\gamma_{i}^{*} \geq|M / 2|, \omega_{2 i}^{*}$ for $\gamma_{i}^{*}=|M / 2-1|, \omega_{3 i}^{*}$ for
$\gamma_{i}^{*}<|M / 2-1|$ and $\omega_{4 i}^{*}$ for "sign of the addition" (sgn) operation. The first two outputs can be described by equations (9) and (10) respectively.

$$
\begin{align*}
& \omega_{1 i}^{*}=\left\{\begin{array}{l} 
\pm 1 ; \text { if } w_{i}= \pm 1 \\
0 ; \text { otherwise. }
\end{array}\right.  \tag{9}\\
& \omega_{2 i}^{*}=\left\{\begin{array}{l} 
\pm 1 ; \text { if }\left|x_{i}\right| \leq 2,\left|y_{i}\right| \leq 1, z_{i} \geq 0 \text { or } \\
\left|x_{i}\right|=\left|y_{i}\right| \geq 2 \text { or } w_{i}=0,\left|x_{i}\right|=3 \\
0 ; \text { if otherwise. }
\end{array}\right. \tag{10}
\end{align*}
$$

Detailed explanation of the $\operatorname{Rr} 7 \mathrm{SqSd}$ full adder circuit can be found in [9]. The same procedure is used to design the $\operatorname{SSRNS}(7|5| 3)$ adder.

## IV. EVALUATION OF THE EXPECTED PERFORMANCE

A major cause of information delay in VLSI circuits is the signal propagation time between interconnected subunits. In our proposal, we adopted the most popular solution to this problem which is embracing an interconnection-free architecture by integrating several subunits into a single functional block. For example, the $\operatorname{SSRNS}(7|5| 3)$ adder block consists of moduli 7,5 and 3 SSRNS adders. Similarly, the $3-R r 7 S q S d$ adder unit consists of 3 separate $\operatorname{Rr} 7 S q S d$ full adders. Consequently, the signal propagation delay time metric is used to evaluate the expected performance of the proposed VLSI circuit.

Let $\tau_{R r 7}, \tau_{M G}, \tau_{S S R N S+}, \tau_{R r 7_{+}}, \tau_{A N D \_O R}$ be the signal delay times of a 7 -value CP-gate derived T -gate, the magnitude generator, $\operatorname{SSRNS}(7|5| 3)$ and $\operatorname{Rr} 7 \mathrm{SqSd}$ full adders respectively and the wired AND-OR interconnection. The ratio of the number of radix $-M / 2$ character digits $n$ to a given operand length of $m \gg 1$ decimal digits, we found to be $1: 2$. Neglecting the time for converting operands to $\operatorname{Rr} 7 S q S d / S S R N S$ representation the speed $T$, of the addition circuit can be described by equation (11)

$$
\begin{gather*}
T=\max \left(4 \tau_{R r 7+}, \tau_{S S R N S+}\right)+\tau_{M G}+n / 2\left(\tau_{A N D-O R}\right) \\
+\tau_{S S R N S+} \tag{11}
\end{gather*}
$$

In [3] $\tau_{R r 7}=300 n \mathrm{sec}$ while from our circuit's synthesis $\quad \tau_{S S R N S+}=2 \tau_{R r 7} \quad, \quad \tau_{M G}=\tau_{A N D-O R}=4 \tau_{R r 7} \quad$, $\tau_{R r 7+}=3 \tau_{R r 7}$. Hence, the speed of a radix $-M / 2$ character digit operands restricted moduli
signed-digit residue addition operation is expected to be $6 \mu \mathrm{sec}$. The $\bmod -23(\approx 6$-bits $)$ also a single radix $-M / 2$ character digit operands addition in a bit-slice MVL architecture [10] required 10 msec [5] operationexecution time. Ordinary binary circuits where component interconnections are necessary for parallel processing required 100 msec to perform $\bmod -2^{16}$ or a 3 , radix $-M / 2$ character-digits arithmetic [5]. Using our proposed circuit the estimated periods for, $\bmod -2^{16}$, mod $-2^{64}$ and $2^{512}$ (corresponding to 3,10 and 96 radix $-M / 2$ character-digits arithmetic operations) are: $7.2 \mu \mathrm{sec}, 11.4 \mu \mathrm{sec}$ and $63 \mu \mathrm{sec}$ respectively. Most merging Information Technology and Secured Communication devices are elliptic curve cryptosystem supported. These devices require just 160 - decimal digits or 512 - bits keysize for an acceptable transmitted message security level. It is our opinion that the performance of such devices using our proposed scheme for the design implementation of the arithmetic circuits will no doubt have an edge over their contemporaries.

## REFERENCES

[1] R. Zimmermann, "Lecture notes on computer arithmetic: Principles architecture and VLSI design." Integrated system Laboratory, Swiss Federal Institute of Technology ( Technology (ETH), CH-8092 Zurich, Switzerland, March 16, 1999.
[2] Israel Koren, "Digital computer arithmetic ECE 666 Part 2 Unconventional number systems," (2004). http://www.ece. Umass.edu/ece/koren/arith/slides/Part2-unconv....
[3] Oluwole Adegbenro and Daikpor, M.N (2005), "Design of a 7-value Complementary Pass gate derived T-gate," Proc. International Engineering Conference, University of Lagos, 2005, Lagos Nigeria, Vol. 1 pp 146-158.
[4] S.Shieh and C. Wu, "Asymmetric high-radix signed-digit number system for carr-free addition. Accepted for publication", Journal of Information Science and Engineering 19, February 24, 2003, pp. 1015 1039.
[5] B. Parhami, "RNS Representation with redundant residues-Invited Paper," IEEE(2001) 0-7803-7147-x/01/S10.00, pp. 1651-1655.
[6] S.Wei, "Number conversion between RNS and mix-radix number system based on modulo ( $2^{\mathrm{P}}-1$ ) signed-digit arithmetic," Proc. SBCCT'05 September 4-7, 2005, Florianopolis, Brazil, ACM 1-59593-174-0/05/0009, pp. 160-165.
[7] M. Kameyama and Higuchi, "Design of a radix-4 signed-digit arithmetic circuits for digital filters," Proc. Of the $12^{\text {th }}$ IEEE International Symposium on Multiple-valued Logic, North-western University, Evanston, IL. USA 1980. Pp 272-277.
[8] M.N. Daikpor and O. Adegbenro. "Restricted Moduli Symmetrical Signed Residue Addition Part I". Proc of the $18^{\text {th }}$ Telecommunication forum TELFOR 2010, Serbia, Belgrade, November 23-25, 2010, pp 646 - 649 .
[9] M.N. Daikpor and O. Adegbenro, "Efficient Carry-free Rr7SqSd Addition Algorithm," Proc of the $3^{\text {rd }}$ International Conference on Emerging Trends, Research directions and Training Requirements of the $21^{\text {st }}$ century
[10] M. Honda, M. kameyama and T. Higuchi, " Residue Arithmetic based Multiple-Valued VLSI Image Processor," Proc of the $22^{\text {nd }}$ IEEE


Fig. 1 Restricted moduli symmetrical SD residue parallel addition circuit

# Concurrent Realization of the Multiply-by-7 Elliptic Curve Scalar Multiplication algorithm 

Michael NaseimoDaikpor<br>Department of Electrical and Electronic Engineering,, Faculty of Engineering,<br>University of Lagos,<br>Lagos, Nigeria<br>mndaikpor@yahoo.com

Oluwole Adegbenro<br>National Research Center for Energy Efficiency and Conservation University of Lagos<br>Lagos, Nigeria<br>Oluwole_adegbenro@yahoo.com

Abstract- this paper investigates the multiply-by-7 Elliptic Curve (EC) point $P$ Scalar Multiplication algorithm for reduced computational complexity and enhanced inherent parallel property based on the Area-Time ( $\mathrm{AT}^{2}$ ) metric. The findings were compared with those obtained when the algorithm was again realized on the Jacobian projective coordinate and the Non-Adjacent Form (NAF). The investigation revealed 27\% and $8 \%$ computational complexity reductions over the Jacobian and NAF realizations. The algorithm also presents the best $\mathrm{AT}^{2}$ value.

Keywords- concurrent processing; computational complexity; data flow graph; scheduling algorithm

## I. Introduction

Todays' emerging Information Technology and secure Communication Systems (ITCS) such as: pocket size laptop computers, palm held operating systems, Personal digital assistant, ubiquitous sensor networks, automated teller machines, smart cards, mobile phones are beneficiaries of Elliptic Curve Cryptography (ECC) [1], [2]. These gadgets are driven by EC arithmetic operations of which scalar or point [3] multiplication is the major operation. EC scalar multiplication is essentially scaling a random point $P(x, y)$ on the elliptic curve with a scalar quantity $m \gg 1$, in such manner that the product $W=m P \quad$ is also a point $P\left(x_{W}, y_{W}\right)$ on the same elliptic curve. Finding $W$, from known values of $P$ and $m$ is easy; but finding $m$ from given values of $W$ and $P$ is an exponentially hard mathematic problem known as the Elliptic Curve Discrete Logarithm Problem (ECDLP). While ECDLP is the main attraction of EC into Cryptography the manner of computing $m P$ is also a critical issue in the design of secure communication systems [4], [2]. Elliptic curve scalar multiplication computation is based on two basic arithmetic operations namely: EC points' addition and EC point doubling. They are in turn calculated using finite field addition, multiplication, squaring and inversion operations. The inversion operation is slow and expensive.

The conventional method of computing $W=m P$ by
$m-1$ consecutive EC point addition operations in the affine coordinate system requires $m-1$ field inversions. For large values of $m$ the procedure is sluggish, slow and cost prohibitive. There are now various high speed EC scalar multiplication techniques. The Double Base Number System (DBNS) [6], the direct Tripling $3 P$ and Quadrupling $4 P$ approach in [3], the binary addition/subtraction chain methods of [7] have become house hold methods. Another popular method of accelerating EC scalar multiplication operation is to perform the main computation in projective coordinate system [5] that is inversion free and return to the affine system for the final $x_{W}, y_{W}$ computation. Many forms of projective coordinate systems now exist but the choice of a particular is determined by the dominating EC arithmetic operation. Most of the existing scalar multiplication algorithms are serial or sequential machine implementoriented. This poses on devices upper speed limit. Consequently, current solutions have not been able to totally meet the intended high speed requirements such as in real time critical services delivery system.

Naturally EC arithmetic is concurrent implement-friendly in nature so attention now focuses on exploiting this inherent parallelism to accelerate scalar multiplication operation. Concurrent realization of an operation is all about performing several sub operations simultaneously provided resources are available to extract optimal performance from the system. The resources are usually the hardware circuits of the most dominant arithmetic operation(s). Generally with EC arithmetic, field multiplication is the dominant operation hence; the resources in this case are multipliers. Pioneer works on concurrent EC scalar multiplication computation procedures can be found in [8] where non adjacent binary sequences are used to reduce the total number of addition. In [9], where scalable multipliers are used to replicate design for varying key sizes and [10] in which EC processors are designed and simulated on FGPA. These works remain novel and laudable contributions. However, they are all based on the projective coordinate systems that trade fewer number of
inversion operations for increased computational complexity cost.

In this contribution, we model concurrent realization of EC scalar multiplication in the affine coordinate system based on a simple multiply-by-7 algorithm to reduce computational complexity and enhance operation speed. The rest of this paper is organised as follows. In section II we recall the normal EC basic arithmetic operation equation in the affine and Jacobian coordinate systems and a slight modification we have made in the original affine coordinate version. The multiply-by-7 scalar multiplication algorithm is presented in section III. The concurrent computation model is presented in section IV. In section V we briefly discuss the results of our investigation. Conclusion is given in section VI.

## II. EC basic operation Equation

EC rank after straight lines and conics in the hierarchy of curves. They are projections of non-singular genus one algebraic curves defined over some $\mathbf{K}$ fields with $k$ rational points and a point at infinity constituting a group. It is customary [11], [12], and [13] to concisely describe an elliptic curve $E$, together with the point $O$, at infinity defined over either binary or prime fields by (1).

$$
E:\left\{\begin{array}{l}
y^{2}=x^{3}+a x+b ; \text { for prime fields }  \tag{1}\\
p \geq 3 ; a, b,(x, y) \in G F\left(p^{k}\right) \text { and } \\
4 a^{3}+27 b^{2} \neq 0 \\
y^{2}+x y=x^{3}+a x^{2}+b ; \text { for binary Galois } \\
\text { fields and } a, b,(x, y) \in G F\left(2^{k}\right)
\end{array}\right.
$$

If the three points $U=P\left(x_{U}, y_{U}\right), V=P\left(x_{V}, y_{V}\right)$ and $W=P\left(x_{W}, y_{W}\right) \in E$ then, $W=V+U \quad$ and $W=U+U=2 U$ are respectively referred to as the EC points' addition and point doubling operations. These operations are traditionally computed in two phases namely, pre-computation of the quantity $\lambda$, phase and a cardinal point components $x_{W}, y_{W}$ computation phase. The prime field versions of the two phases can be represented by (2).
$\lambda=\left\{\begin{array}{l}\frac{y_{V}-y_{U}}{x_{V}-x_{U}} ; \text { if points' addition } \\ \frac{3 x_{U}^{2}+a}{2 y_{U}} ; \text { if point doubling } \\ x_{W}=\left\{\begin{array}{l}\lambda^{2}-x_{U}-x_{V} ; \text { if points addition } \\ \lambda^{2}-2 x_{U} ; \text { if point doubling }\end{array}\right\} \\ y_{W}=\lambda\left(x_{U}-x_{W}\right)-y_{U}\end{array}\right\}$

As earlier mentioned performing EC arithmetic operations in the projective coordinate system increases computational complexity but it is less expensive. In the projective and in particular the Jacobian projective coordinate system a point $P(x, y) \quad$ in the affine coordinate is equivalent to $P\left(X / Z^{2}, Y / Z^{3}\right)$. Thus, (2) in the Jacobian projective coordinate is represented as in (3)

$$
\begin{align*}
& \lambda=\left\{\begin{array}{l}
\frac{Y_{v} / Z_{v}^{3}-Y_{U} / Z_{U}^{3}}{X_{V} / Z_{V}^{2}-\frac{X_{U}}{Z_{U}^{2}}} ; \text { if points' addition } \\
\frac{3 X_{U}^{2}+a}{Y_{U} / Z_{U}^{3}} ; \text { if point doubling } \\
X_{W}=\left\{\begin{array}{l}
\lambda^{2}-X_{U} / Z_{U}^{2}-X_{V} / Z_{V}^{2} ; \text { if points' addition } \\
\lambda^{2}-2 X_{U} ; \text { if point doubling }
\end{array}\right. \\
Y_{W}=\lambda^{2}\left(X_{U} / Z_{U}^{2}-X_{W} / Z_{W}^{2}\right.
\end{array}\right)-Y_{U} / Z_{U}^{3} \tag{3}
\end{align*} \$
$$

Equations (2) or (3) form the bases of all scalar multiplication techniques. However, in both (2) and (3) $y_{W}$ component is computed based on the availability of the $x_{W}$ component. We eliminated this $x_{W}$ before $y_{W}$ requirement by modifying (2) in such manner that $x_{w}$ or $y_{W}$ is computed directly from source to enhance parallel hardware-implementation friendly architecture. The modification is described by (4).
$x_{W}=\left\{\begin{array}{l}\frac{\Delta^{2} y-\Delta^{2} x \nabla x}{\Delta^{2} x} ; \text { if EC points' addition } \\ \frac{A_{0}^{2}-8 x_{u} y_{u}^{2}}{\left(2 y_{u}\right)^{2}} ; \text { if EC point doubling } \\ y_{W}=\left\{\begin{array}{l}\frac{\Delta^{3} y-\Delta^{2} x(\nabla x \Delta y-\Delta x y)}{\Delta^{3} x} ; \text { if } \\ \text { points' addition operation } \\ \frac{-A_{o}^{3}+12 A_{o} x_{u} y_{u}^{2}-8 y_{u}^{4}}{\left(2 y_{u}\right)^{3}} ; \text { if } \\ \text { point doubling operation }\end{array}\right.\end{array}\right\}$ (4)

Where: $\nabla x=x_{V}+x_{U}, \Delta x=x_{V}-x_{U}, \Delta y=y_{V}-y_{U}$, $\Delta x y=x_{V} y_{U}-x_{U} y_{V}$ and $A_{0}=3 x_{U}^{2}+a$. It is important to note that migrations to projective coordinate system only postpone inversion operation and never completely remove it. Practical applications require the computed result in the affine domain. The implication is that there is at least one inversion operation for every terminal computation in the projective coordinate system. Using: $A, M, S, I$ to represent field Addition, Multiplication, and Squaring and Inversion operations respectively the computational complexity costs of executing (2), (3) and (4) are as presented in table 1.

TABLE 1 COMPARATIVE COMPUTATIONAL COMPLEXITY

| EC <br> Operation | Equation (2) <br> in affine <br> coordinates <br> system | Equation (3) <br> in <br> projective <br> coordinate | Equation (4) <br> in affine <br> coordinate |
| :--- | :---: | :---: | :---: |
| Addition | $6 \mathrm{~A}+3 \mathrm{M}+1 \mathrm{I}$ | $6 \mathrm{~A}+23 \mathrm{M}+1 \mathrm{I}$ | $7 \mathrm{~A}+12 \mathrm{M}+1 \mathrm{I}$ |
| Doubling | $4 \mathrm{~A}+4 \mathrm{M}+1 \mathrm{I}$ | $4 \mathrm{~A}+13 \mathrm{M}+1 \mathrm{I}$ | $4 \mathrm{~A}+11 \mathrm{M}+1 \mathrm{I}$ |

Considering multiplication operation only the proposed modification is efficient. The algorithm we review in the next section is based on (4).

## III. THE MULTIPLY-BY-7 SCALAR MULTIPLICATION ALGORITHM

As already enunciated there are several novel schemes [3], [6] for computing EC scalar multiplication operation. The work [14] in particular used special addition chains. The multiply-by-7 algorithm repeatedly multiply a random point on an EC by 7 and adds or subtracts 1,2 or 3 multiples of the previous point. The exact multiple is determined from a Restricted radix-7 Symmetrical quaternary Signed digit (Rr7SqSd) representation [15] of the scalar quantity. Hence, we shall first review the Rr7SqSd number system and its addition/subtraction chain concept before introducing the algorithm.

## A. Definition of the Rr7SqSd number system and its addition-subtraction chain

Definition 1: The Rr7SqSd number system is a radix-7 number system with its character digits $0,1,2,3,4,5,6$ recoded in the symmetrical quaternary signed character digits $\overline{3}, \overline{2}, \overline{1}, 0,1,2,3$.
The overlay $\bar{i}=-i$ and the recoding are obtained by reducing all the radix- 7 character digits with a restrictor $a=3$. Table 2 shows the radix-7 character digit-Rr7SqSd-binary equivalents.

TABLE 2 RADIX-7 Rr7SqSd BINARYEQUIVALENTS

| Radix-7 | 4 | 5 | 6 | 0 | 1 | 2 | 3 |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| Rr7SqSd | -3 | -2 | -1 | 0 | 1 | 2 | 3 |
| Binary | 100 | 101 | 110 | 000 | 001 | 010 | 011 |

Arithmetic operations in the Rr7SqSd number system can be described by (5)
where: the operator $\Theta \in\{+,-, *, /\}$ and $\alpha_{i}, \beta_{i}, \tau_{i} \in\{\overline{3}, \overline{2}, \overline{1}, 0,1,2$,$\} . Decimal numbers are first$ converted to radix-7 before using the addition option of (5) to convert the radix-7 character digits to Rr 7 SqSd number system. Conversion from Rr7SqSd to decimal can be carried out using (6)

$$
\begin{equation*}
X=\sum_{i=n-1}^{0} \alpha_{i} 7^{i} \tag{6}
\end{equation*}
$$

Where $X$ is the decimal equivalence of the Rr 7 SqSd string

$$
\alpha=\alpha_{n-1} \alpha_{n-2} \ldots \ldots . . \alpha_{1} \alpha_{0}
$$

The Rr7SqSd number system similar to the binary number system supports addition-subtraction chain representation [7] of an integer. A binary addition-subtraction chain of integer $W$ is a sequence of integers $a_{0} a_{1} a_{2} \ldots a_{r}$ where the starting value $a_{0}= \pm 1$ and the ending term $a_{r}=w$ such that any $a_{k}=a_{i}+a_{j}$ for $0<i, j<k$. For example one form of the integer $w=23$ addition-subtraction chain is $\pm 1,2,4,5,7,1,1,18,23$. In a similar manner we define the Rr7SqSd addition-subtraction as follows.
Definition 2: If $\alpha=\alpha_{n-1} \alpha_{n-2} \ldots \ldots \ldots \alpha_{1} \cdot \alpha_{0}$ is the Rr7SqSd representation of any integer $w$. then an Rr 7 SqSd additionsubtraction chain of $w$ is the sequence of integers $w_{0}$ $w_{1} w_{2} \ldots \ldots \ldots w_{r}$, starting from $w_{0} \in(1,2,3)$ and ending with $w_{r}=w$ such that any $w_{k}$ is the sum or difference of the previous integer term $w_{k-1}$ multiplied by 7 and the $k^{\text {th }}$ character symbol $\alpha_{k} \in\{\overline{3}, \overline{2}, \overline{1}, 0,1,2,3\}$ of the integer $w^{\prime} s$ Rr7SqSd representation

$$
\begin{equation*}
w_{k}=7 w_{k-1}+\alpha_{k} \tag{7}
\end{equation*}
$$

For example, the Rr 7 SqSd representation of $w=1237$ is $1 \overline{3} \overline{3} 2 \overline{2}$ and its Rr7SqSd addition-subtraction chain is 1 425177 1237. An addition-subtraction chain is an algorithm for computing $m P$ given the integer $m$ and the random point $P$ on an EC. Hence, we now present the multiply-by-7 EC scalar multiplication algorithm.

## B. Multiply-by-7 EC scalar multiplications

Detailed treatment of the algorithm we review below can be found in [16]. The algorithm can be applied to both EC defined over both binary and prime fields however, in this work discussion is limited to prime fields only. Let E , denote an EC defined over the prime field $G F\left(p^{k}\right)$ together with the point O , at infinity such that
E: $\quad y^{2} \bmod p^{k}=x^{3}+a x+b \bmod p^{k}$
;
$4 a^{3}+27 b^{2} \bmod p^{k} \neq 0$
If $P\left(x_{G}, y_{G}\right)$ be a generator point of $E$ and $m,(m \gg 1)$ a scalar quantity. Let the string $\alpha_{n-1}, \alpha_{n-2}, \ldots, \alpha_{j}, \ldots, \alpha_{1}, \alpha_{0}$ with $\quad \alpha_{n-1} \in\{1,2,3\} \quad$ and $\quad \alpha_{j} \in(\overline{3}, \overline{2}, \overline{1}, 0,1,2,3)$, $n-1<j \leq 0$ be the $\operatorname{Rr} 7 S q S d$ representation of $m$. Now if the elements of the sequence $\quad W \in\left\{W_{n-1}, W_{n-2}, . ., W_{j} . ., W_{1}, W_{0}\right\}$ $W_{j}=P\left(x_{j}, y_{j}\right)$ are points on $E$, obtained by using (9)
$W_{j}=\left\{\begin{array}{l}W_{n-1}=\alpha_{n-1} P\left(x_{G}, y_{G}\right) ; \text { if } j=n-1 \\ 7 W_{j-1}+\alpha_{j} W_{j-1} ; \text { if } n-2 \leq j \leq 0\end{array}\right.$
Then, (9) computes the consecutive points of the EC scalar multiplication operation $W=m P$ by way of several one-stop or direct $7 P$ computations We call (9) in this contribution the multiply-by-7 EC scalar multiplication algorithm. Obviously, (9): i) effectively reduce the number of inversions from $m-1$ to $j ;(j \gg m)$. ii) Decomposes the art of computing EC $W=m P$ arithmetic operation to several one-stop 7P computations. Fig. 1 shows the procedure for computing (9) using (4). First, depending on the most significant digit $\alpha_{n-1} \in\{1,2,3\}$ the EC generator point $P\left(x_{G}, y_{G}\right)$ is simply taken as the start point i.e., $Q=P\left(x_{G}, y_{G}\right)$ or doubled $Q=2 P\left(x_{G}, y_{G}\right)$ or tripled $Q=3 P\left(x_{G}, y_{G}\right)$ to become $W_{n-1}$ or $W_{j-1}$. These pre computations processes are designated $1 P_{S E L}, 2 P_{S E L}$ and $3 P_{S E L}$ respectively in fig. 1 . Their internal and external out variables are designated $\phi_{i} \in\{A, B, C, D, E, F, H\}, i \in\{0,1,2\}$ in the top square of fig. 1. This value of $Q\left(=W_{j-1}\right)$ is separately multiplied by 7 and by $\alpha_{j} \in\{\overline{3}, \overline{2}, \overline{1}, 0,1,2,3\}$ in the EC domain. The sum or difference of the two products $7 W_{j-1}$ and $\alpha_{j} W_{n-1}$ represents the $j^{\text {th }}$ iteration of the multiply-by-7 EC scalar multiplication operation. This sum/difference is field inverted to extract the $x_{w}, y_{w}$ components which is the new generator


Figure. 1 Multiply-by-7 EC scalar multiplication algorithm.
point for the $(j+1)^{\text {th }}$ iteration. Multiplying $W_{j-1}$ by a negative value of $\alpha_{j} \in\{-3,-2,-1\}$ is merely a tripling, doubling or just $1 P$ EC operation on $-P\left(x_{j-1}, y_{j-1}\right)$ which is equal to $\alpha_{j} P\left(x_{j-1},-y_{j-1}\right)$.

The manner of realizing the one-stop or direct 7P computation is crucial to the scalar multiplication operation speed. To avoid the traditional method of 6 repetitive EC point addition operations which in turn requires 6 field inversion operations the operation is executed with one field inversion by using four sub processes: $2 P=P+P$, $3 P=2 P+P, 4 P=3 P+P$ and $7 P=3 P+4 P$. The internal and external outputs of these four sub processes are similarly designated by the elements of the set $\varphi_{17} \in\{A, B, C, D, E, F, H\} \quad$ where $\quad i \in\{1,2,3,7\}$ correspond to $2 P, 3 P, 4 P$ and the $7 P$. They are the contents of the 3 left hand and first right hand rectangles that are below the top square.. Internal and external variables of the $7 W_{j-1}+\alpha_{j} W_{j-1}$ computation (or update) are also designated $A_{U D}, D_{U D}, F_{U D}, B_{U D}, C_{U D}, H_{U D}, E_{U D} \quad$ and presented in the rectangle directly below the $\varphi_{77}$ sub process of fig. 1 .

The following numerical example illustrates the working of fig. 1. Consider the EC $E$ described by the equation $y^{2}=x^{3}-53 x+218 \bmod 17011$ and
let $P(8360,14564)$ be a generator point. Given the scalar quantity $m=16299 \equiv 10 \overline{1} \overline{3} \overline{3} 3_{\text {Rr7 SqSd }}$. The result of an EC scalar multiplication operation $W=m P$ using the multiply-by-7 algorithm would yield the 6 curve points:
$1 P=(8360,14564), 7 P=(13509,216)$, $48 P=(12536,6665), 333 P=(10407,4280)$,
$2328 P=(14924,5526), 16299 P=(1594,2138)$.
The algorithm execution time with Quick Basic version 4.5 on a Pentium III processor for values of $m$ up to 640 decimal digits took 0.087 seconds.

Equation (9) was also realized in the Jacobian projective coordinate and in the Near Adjacent Form (NAF) projective coordinate domains using equation (3). The computational complexities for the NAF, Jocabian and the multiply-by-7 options are respectively: $34 A+94 M+1 I$, $38 A+110 M+1 I$ and $36 A+87 M+1 I$. Thus, there is a relative $27 \%$ and $8 \%$ reduction in computational complexity over the Jacobian and the NAF respectively by the multiply-by- 7 algorithm. In the next section we present the concurrent
realization model of this algorithm and estimate a parallel mode-versus- sequential mode speed enhancement figure of merit ratio $\xi$.

## IV CONCURRENT OPERATION MODEL OF THE MULTIPLY-BY-7 ALGORITHM

Concurrency is all about performing more than one operation simultaneously provided resources are available to extract optimal performance from a system. In order to balance devices chip areas and system speed it is necessary to determine the most appropriate number of resources that can run concurrently. Dependency graphs are usually employed to identify concurrency in a given computational processes and determine the optimal number of resources. To achieve this objective the operation processes are scheduled using popular scheduling algorithms such as: the As Soon As Possible (ASAP) and As Late As Possible (ALAP) scheduling algorithms described in [10], [9], [17] and [18]. In this work we adopted the method outlined in [10] to identify those statements in fig. 1 that can be performed simultaneously. These are grouped together to form a batch so that the multiply-by-7 scalar multiplication operation is executed in batches. We judge the system performance by establishing the number of field multiplication operations in a batch and the total number of batches necessary to execute the entire algorithm. It is necessary to note that in this context a batch is a multiplication cycle. The time to execute a particular field multiplication operation is distinct hence, in a batch the longest execution time constitutes the batch processing time. This constituted our concurrent model of fig. 1. The data flow graph for the $2 P$ sub process is given in appendix $A$ as an example.

Our concurrent processing model of the multiply-by-7 scalar multiplication algorithm based on the ASAP scheduling principle consist of eight data flow graphs corresponding to the eight sub processes $1 P_{S E L}, 2 P_{S E L}, 3 P_{S E L}, 2 P$, $3 P, 4 P, 7 P$ and the $7 W_{j-1}+\alpha_{j} W_{j-1}$. The computation flow of the graphs follows pipeline processing. Input variables of a $i^{\text {th }}$ sub process are output variables of $(i-1)^{\text {lh }}$ sub process. Thus, the input variables for $2 P$ sub process are the outputs of either $1 P_{S E L}, 2 P_{S E L}$ or $3 P_{S E L}$ in the start-value selection stage. The $2 P$ output variables: $C_{17}^{3}, E_{17}, F_{17}, D_{17}^{2}, B_{17}, D_{17}, C_{17}^{2}$ and $C_{17}$ are part of the input values to the $3 P$ sub process. These outputs in turn are the inputs for $4 P$ sub process. Analyses of the entire sets of graphs provide an insight to the concurrent processing ability of the algorithm. Furthermore since a batch is essentially a multiplication cycle the total number of batches constitutes the number of multiplication cycles $m_{c}$ of a given sub process.

The number of field operations $n_{f o}$ in a batch is the number of hardware units that run simultaneously in that batch. Thus the number of hardware units required to execute a sub process equals the highest number of field operations $n_{f o \max }$ among the batches of the sub process. That is $n_{f o \max }=\max \left(n_{f o 1}, n_{f o 2}, \ldots n_{f o h}\right)$. We ignored the number of addition and subtraction operation as well as their durations in our models.

Similar models ( 8 data flow graphs each) were developed for the Jacobian and NAF option of realizing equation (9). Up to 7 field multiplication operations could run simultaneously in the $7 P=3 P+4 P$ sub process of the NAF realization option. The information extracted from the models are:
i) Number of multiplication cycles per sub process.
ii) Total number of field multiplication operations per sub process per realization as the number of multipliers running concurrently increases from 1 to 7.
iii) The highest number of multipliers in one multiplication cycle per sub process per realization
The corresponding Area-Time squared ( $\mathrm{AT}^{2}$ ) values per sub process for each realization were computed. The expression $T_{j}=\sum_{i=1}^{h} t_{j i}$ where $t$ - represent multiplication cycle duration of the $j^{\text {th }}$ sub process evaluates the speed algorithm.

The trend in today's ITCS is for the system to be as small as possible. This has made device chip area (A) also as a factor of merit. Hence, in design works either area A, or time T is considered more important (sometimes A and T are considered equally important). Thus, the products metric: AT, $A^{2} T$ or $A T^{2}$ are often used to judge operation parallelism. We employed the $\mathrm{AT}^{2}$ metric. This automatically made the chip area A common denominator.

One objective of this section is to determine the best method of parallelizing the scalar multiplication process of (9). Equation (2) based Jacobian, NAF realizations or a (4) based method of implementing (9). The relevant parameters for this decision are the: i) lowest number of multiplications in the sequential mode. ii) Best $A T^{2}$ value as the number of resources increases from 1 (sequential) to 7 (parallel mode). iii) Highest speed enhancement as a result of the concurrent realization. iv) The total number of multipliers in parallel to provide the figures of merit in the various implementation options. . It must be noted also that the grand total number of (multiplication cycles for the Jacobian, NAF and the proposed algorithm are 37,34 and 29 respectively which shows that the proposed algorithm has the smallest number of field multiplication operation cycles. The speed enhancement ratio $\xi_{i}$ was determined using (10).
$\xi_{i}=\frac{\sum_{i, j=1}^{1,7}\left(A T_{j}^{2}\right)-\sum_{i=2, j=1}^{7,7}\left(A T_{j}^{2}\right)}{\sum_{i=2, j=1}^{7,7}\left(A T_{j}^{2}\right)}$
Where $i$ - is the number of multipliers in parallel and $j$ - a subscript representing any of the sub processes: $1 P_{S E L}, 2 P_{S E L}, 3 P_{S E L}, 2 P, 3 P, 4 P, 7 P$ and $U g p$.

## V DISCUSSION

Figures 2,3 and 4 show the graphical presentation of some major information from the various realizations. The bar chart plots of fig. 2 represent total number of field multiplication operations as the number of multipliers increases from 1 to 7 . The plots in groups of three represent the Jacobian, NAF and the algorithms options of realization. All plots as expected are exponential decays as the resources on line increase. The proposed method presents the smallest total number of field multiplication operations ( 87 M ) required to execute iteration. Similarly, fig. 3 shows the plot of $\mathrm{AT}^{2}$ values for increasing number of resources in parallel The Jacobian and NAF approaches presented their best $\mathrm{AT}^{2}$ values at 5 multipliers working concurrently. Our proposal do not only presents the lowest $\mathrm{AT}^{2}$ value but also at 4 multipliers operating in parallel.


Figure. 2 Total number of field multiplication operations per realization versus number of multipliers


Figure. $3 \mathrm{AT}^{2}$ Values as a function of number of resources in parallel.

Finally the plot of fig. 4 shows the percentage speed enhancements as a result of concurrent processing of scalar multiplication operation. The NAF approach presents the lowest speed enhancement percentage while the proposed algorithm presents the highest speed enhancement.


Figure. 4 Percentage speed enhancement over sequential regime.

The speed enhancements for the Jacobian and NAF approaches peaked at 5 multipliers working concurrently. In the related work of [19], the authors chose the standard projective coordinate as the appropriate efficient choice for parallel designs and implemented their parallel architecture. This was as a result of the corresponding speed enhancement
peaking at 4 multipliers. The proposed algorithm speed enhancement also peaked at 4 multipliers working in parallel. This simply indicates that the approach is efficient and that it provides a $25 \%$ saving in hardware over Jacobian and NAF styled implementations.

## VI CONCLUSION AND FURTHER WORK

This investigation has established that an Rr 7 SqSd addition/subtraction chain based multiply-by-7 EC scalar multiplication scheme if concurrently realized can result in reduced computational complexity and high parallel operation execution speed. The approach is useful in the design of multiple-valued logic elliptic curve cryptosystem.

## REFERENCES

[1] Kefa Rabah ; "Elliptic curve cryptography over binary finite fields GF(2m)" Asian network for Scientific Information. ISSN 1812-5638. Information Technology Journal 5(i) :pp 204 -229, 2006.
[2] David Malan, "Crypto for tiny objects", Computer Science group , Harvard University Cambridge, Massachusetts , 2004. http://www.cs.havard-edu/malan/publication/tr-04-04.pdf..
[3] Yasuyuki Sakai and Kouichi Sakurai "Efficient scalar multiplication with direct computation of several Doublings" :IEICE Trans. Fundamentals, Vol. E84-A N01 January 2001 pp. 120-128
[4] "An introduction to Elliptic Curve Cryptography-Deviceforge.com ....tomorrow's device technology today," http://www.DeviceForge.com/articles \& white papers /AT4234154468, July 20, 2004.
[5] M. S. Anoop, "Elliptic Curve Cryptography - An Implementation guide". Online implementation Tutorial, 2007. http://www.tataelxsi cer/whitepapers/ECC Tut v/ 0.pdf id
[6] Dimitrov,V.S.; Imbert, L.; Mishra,P.K.; "Fast Elliptic curve point multiplication using Double-base Chains"; University of Calgary, 2500 University drive NW Calgary AB, T2n In4, Canada 2005 ppl-13 http://eprint.iacr.org/205/069.ps.
[7] Cetin Kaya Koc "Elliptic curve cryptosystem" Department of Electrical and Computer engineering, Oregon State University, Oregon 97331, 2004.
[8] Jun-Hong Chen, Ming-Der Sheih and Chien-Ming Wu, "Concurrent algorithm got high-speed point multiplication in elliptic curve cryptography," 0-7803-8834-8/05\$20.00, IEEE 2005, pp. 5254-5257.
[9] Hakim Khali and Ahcene Farah," Cost effective implementations of GF (p) elliptic curve cryptography computations," IJCSNS International journal of computer science and network security, Vol 7 N0. 8, August 2007, pp. 29-37.
[10] Adnan Abdul-Aziz Gutub, "Remodeling of elliptic curve cryptography scalar multiplication architecture using parallel Jacobian coordinate system," International Journal for computer science and security (IJCSS), Vol. 4: Issue 4, 2009, pp. 409 -425.
[11] "Certicom ECC Challenges". Ecc.project@edu.informatik.tudarmstadt.de. 2003. hitp://wwav certicom.com/index.php/the-certicom-ecc--challenge.
[12] Avikak "Lecture notes on "Introduction to computer security". Avinagh Kak Purdue University 2010 http://cobweb.ecn.purdue.edu/ $/ \mathrm{kak} /$ compsec/newLectures/lecture 14.pdf.
[13] Igbal H. Jubril, Rus Salleh and AI-Shawabkeh M. "Efficient algorithm in projective coordinates for ECC over ", Internal Journal of Computer, internet and management. Vol. 15 N0 1, April 2007, pp., 43-50.
[14] Nicolas Meloni "fast and secure EC Scalar multiplication over prime fields using special Addition chains"; Institute de Mathematique et de modisatic de Montpeller, UmR 5149, Montpeller France. Citeseerx.ist.psu.edu/viewdoc/download? doi $=10.61 .5861[1]$ pdf.
[15] Michael Naseimo Daikpor and Oluwole Adegbenro, " Efficient carryfree Rr7SqSd addition algorithm," Proceedings of the $3^{\text {rd }}$ International
conference on emerging trends research direction and training requirements of the $21^{\text {si }}$ century Electrical and Electronics Engineering, Lagos, Nigeria, $22^{\text {nd }}-24^{\text {th }}$ July 2009, pp. $102-107$.
[16] Michael Naseimo Daikpor and Oluwole Adegbenro," Elliptic curve scalar multiplication with direct 7P computation for secure communication systems, " Proceedings of the International conference on innovations in Engineering and Technology (IET 2011)- Sustaining $21^{\text {st }}$ century Engineering infrastructure, Lagos- Nigeria $8^{\text {th }}-10^{\text {th }}$ August 2011, pp. 179-189.
[17] Turki F. Al-Somani, M. K. Ibrahim and Adnan Gutub, "Highly efficient elliptic curve crypto-processor with parallel GF ( 2 m ) field multiplier," Journal for Computer Science 2 (5): 2006, ISSN 1549-3636, pp. 395 400.
[18] Robert A Walker, "Introduction to scheduling problem," EEEE Design and Test of Computers, 1995, pp. 60-69.
[19] A. Gutub and M.K Ibrahim, "High radix parallel architecture for GF (p) elliptic curve processor," IEEE Conference on acoustic, speech and signal processing - ICASSP 2003, pages $625-628$, Hong-Kong, April 6 -10, 2003.

Appendix A

i) Example of the $2 P$ sub process data flow graph.

Notes to the appendix A:
i) The rectangular shapes represent field addition or subtraction operations while the circular and elliptical shapes represent field multiplication operations.


[^0]:    Corresponding First M. N. Daikpor, is with Faculty of Engineering, Department of Electrical and Electronics Engineering, University of Lagos, Akoka Lagos, Nigeria e-mail
    michaelnnaseimodaikpor@rocketmail.com
    Second O. Adegbenro, is with the Centre for Energy Efficiency and Conservation University of Lagos, Akoka Lagos, Nigeria e-mail wole adegbenro@yahoo.com

