Discussion:
[fedora-arm] Using generic tuning for armhfp
Florian Weimer
2018-01-25 14:46:17 UTC
GCC offers a generic tuning option for Arm these days, but we select
-mtune=cortex-a8 instead.

Is this still a good choice?

Thanks,
Florian
_______________________________________________
arm mailing list -- ***@lists.fedoraproject.org
To unsubscribe send an email to arm-leav
Peter Robinson
2018-01-25 14:52:10 UTC
Post by Florian Weimer
GCC offers a generic tuning option for Arm these days, but we select
-mtune=cortex-a8 instead.
Is this still a good choice?
I suspect the generic tuning is likely a better choice. Are there any
details about it anywhere? Basically, Cortex-A8 is pretty much the
lowest common denominator for ARMv7.

Peter
_______________________________________________
Florian Weimer
2018-01-25 15:02:45 UTC
Post by Peter Robinson
I suspect the generic tuning is likely a better choice. Are there any
details about it anywhere? Basically, Cortex-A8 is pretty much the
lowest common denominator for ARMv7.
The generic tuning has this:

/* Generic Cortex tuning. Use more specific tunings if appropriate. */
const struct tune_params arm_cortex_tune =
{
  &generic_extra_costs,
  &generic_addr_mode_costs, /* Addressing mode costs. */
  NULL, /* Sched adj cost. */
  arm_default_branch_cost,
  &arm_default_vec_cost,
  1, /* Constant limit. */
  5, /* Max cond insns. */
  8, /* Memset max inline. */
  2, /* Issue rate. */
  ARM_PREFETCH_NOT_BENEFICIAL,
  tune_params::PREF_CONST_POOL_FALSE,
  tune_params::PREF_LDRD_FALSE,
  tune_params::LOG_OP_NON_SHORT_CIRCUIT_TRUE, /* Thumb. */
  tune_params::LOG_OP_NON_SHORT_CIRCUIT_TRUE, /* ARM. */
  tune_params::DISPARAGE_FLAGS_NEITHER,
  tune_params::PREF_NEON_64_FALSE,
  tune_params::PREF_NEON_STRINGOPS_FALSE,
  tune_params::FUSE_NOTHING,
  tune_params::SCHED_AUTOPREF_OFF
};

The Cortex-A8 tuning is:

const struct tune_params arm_cortex_a8_tune =
{
  &cortexa8_extra_costs,
  &generic_addr_mode_costs, /* Addressing mode costs. */
  NULL, /* Sched adj cost. */
  arm_default_branch_cost,
  &arm_default_vec_cost,
  1, /* Constant limit. */
  5, /* Max cond insns. */
  8, /* Memset max inline. */
  2, /* Issue rate. */
  ARM_PREFETCH_NOT_BENEFICIAL,
  tune_params::PREF_CONST_POOL_FALSE,
  tune_params::PREF_LDRD_FALSE,
  tune_params::LOG_OP_NON_SHORT_CIRCUIT_TRUE, /* Thumb. */
  tune_params::LOG_OP_NON_SHORT_CIRCUIT_TRUE, /* ARM. */
  tune_params::DISPARAGE_FLAGS_NEITHER,
  tune_params::PREF_NEON_64_FALSE,
  tune_params::PREF_NEON_STRINGOPS_TRUE,
  tune_params::FUSE_NOTHING,
  tune_params::SCHED_AUTOPREF_OFF
};

Within the structs themselves, only PREF_NEON_STRINGOPS differs (FALSE for
generic, TRUE for Cortex-A8). The real difference is in generic_extra_costs
vs cortexa8_extra_costs, which are too large to include here. One of the
differences seems to be that on Cortex-A8, floating point multiply & divide
are considered relatively more expensive, if I read the sources correctly.
But this is all a bit black magic.
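
To make the comparison of the two tables easier to follow, here is a minimal sketch that flattens each initializer to text and lists the entries that differ. It is not GCC code: the field names (extra_costs, pref_neon_stringops and so on) and the helpers generic_tune, cortex_a8_tune, set_field and differing_fields are made up for illustration, and only the values quoted above are used.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One initializer from a tuning table, kept as text. The field names are
// invented for this sketch; they are not GCC's real member names.
struct TuneField {
    std::string name;
    std::string value;
};

using TuneSketch = std::vector<TuneField>;

// The generic Cortex table as quoted in the thread, as name/value pairs.
TuneSketch generic_tune() {
    return {
        {"extra_costs", "&generic_extra_costs"},
        {"addr_mode_costs", "&generic_addr_mode_costs"},
        {"sched_adj_cost", "NULL"},
        {"branch_cost", "arm_default_branch_cost"},
        {"vec_cost", "&arm_default_vec_cost"},
        {"constant_limit", "1"},
        {"max_cond_insns", "5"},
        {"memset_max_inline", "8"},
        {"issue_rate", "2"},
        {"prefetch", "ARM_PREFETCH_NOT_BENEFICIAL"},
        {"pref_const_pool", "PREF_CONST_POOL_FALSE"},
        {"pref_ldrd", "PREF_LDRD_FALSE"},
        {"log_op_thumb", "LOG_OP_NON_SHORT_CIRCUIT_TRUE"},
        {"log_op_arm", "LOG_OP_NON_SHORT_CIRCUIT_TRUE"},
        {"disparage_flags", "DISPARAGE_FLAGS_NEITHER"},
        {"pref_neon_64", "PREF_NEON_64_FALSE"},
        {"pref_neon_stringops", "PREF_NEON_STRINGOPS_FALSE"},
        {"fuse", "FUSE_NOTHING"},
        {"sched_autopref", "SCHED_AUTOPREF_OFF"},
    };
}

// Overwrite the value of the named field; asserts that the field exists.
void set_field(TuneSketch& t, const std::string& name, const std::string& value) {
    for (TuneField& f : t) {
        if (f.name == name) { f.value = value; return; }
    }
    assert(false && "unknown field name");
}

// The Cortex-A8 table, expressed as the generic table plus its overrides.
TuneSketch cortex_a8_tune() {
    TuneSketch t = generic_tune();
    set_field(t, "extra_costs", "&cortexa8_extra_costs");
    set_field(t, "pref_neon_stringops", "PREF_NEON_STRINGOPS_TRUE");
    return t;
}

// Names of the fields whose values differ, in table order.
std::vector<std::string> differing_fields(const TuneSketch& a, const TuneSketch& b) {
    assert(a.size() == b.size());
    std::vector<std::string> names;
    for (std::size_t i = 0; i < a.size(); ++i) {
        assert(a[i].name == b[i].name);
        if (a[i].value != b[i].value) names.push_back(a[i].name);
    }
    return names;
}
```

Calling differing_fields(generic_tune(), cortex_a8_tune()) yields only the cost-table pointer and the NEON string-ops preference, so everything else in the two quoted tables is identical.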

Thanks,
Florian
_______________________________________________
Peter Robinson
2018-01-25 15:13:48 UTC
So reading the GCC docs [1], it seems that generic-armv7-a makes sense.

To quote "should tune the performance for a blend of processors within
architecture arch. The aim is to generate code that run well on the
current most popular processors, balancing between optimizations that
benefit some CPUs in the range, and avoiding performance pitfalls of
other CPUs."

We still support a number of Cortex-A8 devices, but we have a lot more
Cortex-A7/9/15 devices these days too, so I think generic makes sense
here.

Thanks,
Peter

[1] https://gcc.gnu.org/onlinedocs/gcc/ARM-Options.html
_______________________________________________
Peter Robinson
2018-01-27 12:03:56 UTC
Post by Peter Robinson
We still support a number of Cortex-A8 devices, but we have a lot more
Cortex-A7/9/15 devices these days too, so I think generic makes sense
here.
I also wonder whether it's worthwhile using neon-vfpv3. I can't tell
from the docs, though, whether code will fall back to VFPv3 or just
fail altogether on a SoC that doesn't have NEON.
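
One way to look into this on a given board is to compare what the compiler was allowed to emit with what the CPU reports. The sketch below does that, assuming a Linux system with glibc's getauxval and the kernel's HWCAP_NEON bit on 32-bit Arm. The helper names built_with_neon, cpu_has_neon and neon_verdict are made up for illustration. The sketch rests on the assumption that there is no automatic fallback: code built with NEON enabled may contain NEON instructions, and those would be expected to fault on a core without NEON.

```cpp
#include <string>

#if defined(__linux__) && defined(__arm__)
#include <sys/auxv.h>   // getauxval, AT_HWCAP
#include <asm/hwcap.h>  // HWCAP_NEON
#endif

// True if this translation unit was compiled with NEON code generation
// enabled (for example via -mfpu=neon-vfpv3 on 32-bit Arm).
constexpr bool built_with_neon() {
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    return true;
#else
    return false;
#endif
}

// True if the running CPU reports NEON support.
bool cpu_has_neon() {
#if defined(__linux__) && defined(__arm__)
    return (getauxval(AT_HWCAP) & HWCAP_NEON) != 0;
#elif defined(__aarch64__)
    return true;   // assumption: Advanced SIMD is present on AArch64 systems
#else
    return false;  // not an Arm target
#endif
}

// Combine the compile-time and run-time answers into a short verdict.
std::string neon_verdict(bool built, bool cpu) {
    if (built && !cpu) return "unsafe: NEON code on a CPU without NEON";
    if (!built && cpu) return "safe: CPU has NEON but the build does not use it";
    return "consistent";
}
```

Printing neon_verdict(built_with_neon(), cpu_has_neon()) from a small test program built with the distribution's flags would show whether a particular board is affected. Note that the check itself has to be built without NEON to be able to run on such a board at all.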

Peter
_______________________________________________
Florian Weimer
2018-02-07 16:57:03 UTC
Post by Peter Robinson
I also wonder whether it's worthwhile using neon-vfpv3. I can't tell
from the docs, though, whether code will fall back to VFPv3 or just
fail altogether on a SoC that doesn't have NEON.
I commented on this already in various other places. NEON only offers
32-bit floats that are not fully IEEE-compliant, so it's only applicable
with manual tweaking (even auto-vectorization at -O3 wouldn't use it for
floating point, due to the non-IEEE nature).
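
To illustrate what "non-IEEE" means in practice: the GCC documentation notes that NEON treats denormal values as zero. The sketch below models that flush-to-zero behaviour in software so the difference can be seen on any machine. It does not execute NEON instructions, and the helper names flush_to_zero and half_of_min_normal are made up for illustration.

```cpp
#include <cmath>
#include <limits>

// Software model of flush-to-zero: a subnormal input is replaced by a zero
// of the same sign; every other value passes through unchanged.
float flush_to_zero(float x) {
    return std::fpclassify(x) == FP_SUBNORMAL ? std::copysign(0.0f, x) : x;
}

// Half of the smallest normal float. With IEEE 754 arithmetic this is a
// nonzero subnormal; a flush-to-zero unit would produce zero instead.
float half_of_min_normal() {
    // volatile keeps the division from being folded at compile time
    volatile float smallest_normal = std::numeric_limits<float>::min();
    return smallest_normal / 2.0f;
}
```

With default IEEE arithmetic, half_of_min_normal() is greater than zero, while flush_to_zero(half_of_min_normal()) compares equal to zero. That discrepancy is the kind of result change that keeps the compiler from using NEON for floating point unless unsafe math optimizations are requested.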

I don't think it's worth making the switch, even if we could somehow
verify that it wouldn't impact board support.

Thanks,
Florian