You can read the full tutorial on the perf wiki, which gives a good overview of this utility.
The main difficulty is understanding why we need this utility on Linux.
Intro
A simple run of the top command will show you basic information about your Linux system.
If you look closely, you will notice this line:
load average: 0.09, 0.05, 0.01
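The same three values are also exposed in /proc/loadavg, and they are easier to judge relative to the number of cores. As a rough sketch (per_core_load is a hypothetical helper name, and nproc is assumed to be installed), the one-minute value can be divided by the core count:

```shell
#!/bin/sh
# per_core_load LOAD CORES: divide a load average by the core count.
# (per_core_load is a hypothetical helper, not part of top or perf.)
per_core_load() {
    awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}

# Only read the live values when /proc/loadavg and nproc are available.
if [ -r /proc/loadavg ] && command -v nproc >/dev/null 2>&1; then
    # /proc/loadavg starts with the 1, 5 and 15 minute averages.
    read -r one five fifteen rest < /proc/loadavg
    cores=$(nproc)
    echo "1-minute load per core: $(per_core_load "$one" "$cores")"
fi
```

A result well below 1.00 roughly means the cores were mostly idle over the last minute.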
The three numbers are load averages over progressively longer periods of time (one, five, and fifteen minutes). For us this means that lower numbers are better, and higher numbers point to a problem or an overloaded machine.
For multicore and multiprocessor systems the rule is simple: the total number of cores is what matters, regardless of how many physical processors those cores are spread across.
Now let's use perf. First I will record some data about my CPU:
[mythcat@localhost ~]$ perf record -e cpu-clock -ag
Error:
You may not have permission to collect system-wide stats.
Consider tweaking /proc/sys/kernel/perf_event_paranoid,
which controls use of the performance events system by
unprivileged users (without CAP_SYS_ADMIN).
The current value is 2:
-1: Allow use of (almost) all events by all users
>= 0: Disallow raw tracepoint access by users without CAP_IOC_LOCK
>= 1: Disallow CPU event access by users without CAP_SYS_ADMIN
>= 2: Disallow kernel profiling by users without CAP_SYS_ADMIN
[mythcat@localhost ~]$ su
Password:
[root@localhost mythcat]# perf record -e cpu-clock -ag
^C[ perf record: Woken up 17 times to write data ]
[ perf record: Captured and wrote 5.409 MB perf.data (38518 samples) ]
[root@localhost mythcat]# ls -l perf.data
-rw-------. 1 mythcat mythcat 5683180 Feb 21 13:24 perf.data
You can see that the perf tool works under the root account, while the resulting perf.data file is owned by the default user.
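The error message in the session above lists what each perf_event_paranoid level restricts. As a small sketch (explain_paranoid is a hypothetical helper, and the descriptions are shortened from that message), the current value can be read from /proc and translated like this:

```shell
#!/bin/sh
# explain_paranoid VALUE: print the restrictions active at that level,
# following the wording of the perf error message. Hypothetical helper.
explain_paranoid() {
    level=$1
    if [ "$level" -le -1 ]; then
        echo "Allow use of (almost) all events by all users"
        return 0
    fi
    # Levels are cumulative: each higher value adds one more restriction.
    [ "$level" -ge 0 ] && echo "Disallow raw tracepoint access by unprivileged users"
    [ "$level" -ge 1 ] && echo "Disallow CPU event access by unprivileged users"
    [ "$level" -ge 2 ] && echo "Disallow kernel profiling by unprivileged users"
    return 0
}

# Show the live setting when the file exists (Linux only).
if [ -r /proc/sys/kernel/perf_event_paranoid ]; then
    current=$(cat /proc/sys/kernel/perf_event_paranoid)
    echo "perf_event_paranoid = $current"
    explain_paranoid "$current"
fi
```

Instead of switching to root, the value could probably be lowered until the next reboot with something like sysctl -w kernel.perf_event_paranoid=0 (run as root); this was not tried in the session above, so treat it as an assumption.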
Let's show this data as the default user, mythcat, with the perf tool:
[mythcat@localhost ~]$ perf report
The result of this command:

The perf list command shows the events that can be recorded:
[mythcat@localhost ~]$ perf list
List of pre-defined events (to be used in -e):
branch-instructions OR branches [Hardware event]
branch-misses [Hardware event]
bus-cycles [Hardware event]
cache-misses [Hardware event]
cache-references [Hardware event]
cpu-cycles OR cycles [Hardware event]
instructions [Hardware event]
ref-cycles [Hardware event]
alignment-faults [Software event]
bpf-output [Software event]
context-switches OR cs [Software event]
cpu-clock [Software event]
cpu-migrations OR migrations [Software event]
dummy [Software event]
emulation-faults [Software event]
major-faults [Software event]
minor-faults [Software event]
page-faults OR faults [Software event]
task-clock [Software event]
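The listing mixes hardware and software events. As a small sketch, the software event names can be filtered out of that output with awk (software_events is a hypothetical helper name, and it assumes the "[Software event]" label shown in the listing above):

```shell
#!/bin/sh
# software_events: read `perf list` output on stdin and print the
# first name of every line labelled "[Software event]".
# (software_events is a hypothetical helper, not a perf subcommand.)
software_events() {
    awk '/\[Software event\]/ { print $1 }'
}

# Run against the live tool only when perf is installed.
if command -v perf >/dev/null 2>&1; then
    perf list 2>/dev/null | software_events || true
fi
```

Any of the printed names can then be passed to -e, as with cpu-clock earlier.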
Let's look at one event from this list, which will tell us how Fedora is working:
[root@localhost mythcat]# perf top -e minor-faults -ns comm
The -s argument selects the sort key, here comm (the available keys are: pid, comm, dso, symbol, parent, cpu, socket, srcline,
weight, local_weight), and -n shows the number of samples; see the manual page of the perf command for details.
The result of this command is:

The perf sched command analyzes scheduler behavior and has these subcommands:
perf sched record # low-overhead recording of arbitrary workloads
perf sched latency # output per task latency metrics
perf sched map # show summary/map of context-switching
perf sched trace # output fine-grained trace
perf sched replay # replay a captured workload using simulated threads
Try this example to capture a trace and then check the
latencies (the second command analyzes the trace stored in the perf.data file):
perf sched record sleep 10 # record full system activity for 10 seconds
perf sched latency --sort max # report latencies sorted by max
You can also build a map of scheduling events: record them first with the command below, then display the map with perf sched map:
[root@localhost mythcat]# perf sched record
This tutorial shows only about 1% of the ways to use the perf command.