The Trouble with Troubleshooting

It all starts when we’re told that an application is “broken.” There are many ways for something to be broken, so just saying that doesn’t help us much. We have to ask the client to tell us what they mean. The more details they provide, the easier it is for us to replicate the error, and the easier the error is to fix.

Our applications have been thoroughly tested before they get to our customers, so any bugs left over are not going to be easy to find. Oftentimes, we’ll find programs that work perfectly unless we retrace the exact steps of the client.

Once we have replicated the error, we begin our search in earnest. Simple-looking websites can contain thousands of lines of code, usually the result of multiple functions being written on the same page. If we can replicate the exact error, large swaths of the code can be ignored because they do different things.

After the relevant section is isolated, our job can still be far from done. We first have to analyze the section and check for typos or conditions that may not be met. Even small pieces of code can be dense with information that must be sifted through. For complicated blocks, sometimes the only thing to do is remove sections of code and see what breaks. If we’re lucky, nothing breaks, and we might be able to remove the code entirely. If we’re very lucky, deleting it fixes the error completely, because not only was it unnecessary, it introduced the error we’re trying to solve.

Today, however, all the code here is correct and removing sections only caused more things to break. This problem lies deeper, buried on a page that is referenced by the page the client sees.

Investigating this new page, we may find the error, or we may find yet another reference to be explored. Some pages reference multiple files that each must be investigated. These branching paths are all interconnected, and a single error in any one of them can have a cascading effect on the rest. Only by navigating to each new page and beginning the hunt all over again can we find the bad code. To say nothing of how long it will take to fix once we find it.

And that is why you should be patient with your IT guy.

Data Integration

You want to Own The Phone. You want to discover what’s generating your best calls and how your staff is performing on the phone. You see how much Century Interactive’s call tracking reports can help you be a better marketer. But you are spending most of your day answering emails, watching leads in your CRM, or managing campaigns in a bid management tool. Is it possible to find your call tracking data in there too? Wouldn’t it be nice to view your reports and listen to calls in there?

The answer is Yes! We have several options for integrating our data into the tools where you live.

1. We can send XML data packets to a designated email address, or post them to a web service as soon as a call comes in. Our many CRM integrations push calls directly into your CRM so you can see the calls tied to each prospect and customer. We will work with your CRM provider to create a seamless experience for you. In addition to call details, we can also post work notes with Humanatic reviews, extracting useful information so you don’t have to spend time reviewing every call. You can focus your precious time on cultivating customer relationships.

2. We can create integrations within your CRM to view call tracking reports without logging into our reporting platform. You don’t need another login and password. One click will take you to our master-crafted reports.

3. Our integrations with top bid management tools allow you to use the power of your bid management platform to optimize for phone calls. We can push offline conversions to your bid management platform to help you make informed bidding decisions.

4. Our open API allows you to pull your phone call data from wherever you want, with whatever parameters you need, whenever you prefer. All facets of our reporting and online management can be managed via our open API. You can create accounts, provision tracking lines, retrieve call data / session data, all without logging into our reporting site.

5. We also create custom flat-file reports that make it possible for clients to use SAS or other data analysis software to process and integrate our call tracking data. Business intelligence empowers you with our invaluable data.
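As a rough sketch of what working with pulled call data might look like once it reaches your own tooling, here is a short Python example. The field names and response shape are hypothetical stand-ins for illustration, not the actual API schema; consult the real API documentation for that.

```python
import json

# Hypothetical response shape for illustration only -- the real field
# names and structure come from the actual API documentation.
sample_response = json.dumps([
    {"tracking_number": "555-0100", "duration_seconds": 312, "source": "Google AdWords"},
    {"tracking_number": "555-0101", "duration_seconds": 45, "source": "Organic"},
    {"tracking_number": "555-0100", "duration_seconds": 78, "source": "Google AdWords"},
])

def total_seconds_by_source(raw_json):
    """Roll pulled call records up into per-source talk-time totals."""
    totals = {}
    for call in json.loads(raw_json):
        totals[call["source"]] = totals.get(call["source"], 0) + call["duration_seconds"]
    return totals

print(total_seconds_by_source(sample_response))
```

Once the data is in this shape, it can feed whatever downstream reporting or bid decisions you make.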

Through feed shares or our open API, here are some of the leading platforms we integrate with:

  • Acquisio
  • Adobe AdLens
  • EleadCRM
  • Dealer.com’s Total Control Dominator
  • DealerSocket
  • Dealertrack
  • Google Analytics
  • Google AdWords
  • Google DoubleClick
  • Kenshoo
  • Marin Software
  • Omniture SiteCatalyst
  • Purecars.com
  • Salesforce.com
  • Showroom Logic
  • Syncapse
  • Vin Solutions
  • And more! Ask us about your CRM or bid management tool!

If you don’t see your preferred CRM on the list, just give us a call. Our developers are masterminds at creating unique integrations.

Ease of Access

Access to information has never been easier than it is today. Whether it is fact-checking a rival or finding a local restaurant, our smartphones have given us an advantage no generation has had before. They have been incorporated into our daily routines, and that isn’t an easy thing to accomplish. If a product can make the things you want to get done easier, then it has value. We never want to stop making it easier for a client to get valuable information quickly. We constantly improve our methods of reporting, and one way is by making access available through your smartphone.

We want our products to benefit you wherever you work, and our smartphone app will do exactly that. When you need to check a call, but you can’t get back to the office right away, that information will still accompany you throughout the day. When you move between platforms for your work, you won’t lose anything we have to offer. This wider range of access only adds to the value of our amazing products that drive revenue for you. We are excited to expand our coverage and show our dedication to serving our clients’ needs.


You will receive the same great experience, with mobile-friendly graphs and the functionality you currently enjoy. That is the foremost goal for us, because the app shouldn’t add more to your plate. This information will work with your schedule and help keep your customers happy.

Instruments for the Occasion

Chances are, if you’ve ever been to a college football game, you’ve heard the drum-line: the loud crashes of cymbals, the tight tap of the snares, the deep tones of the bass drums, you get the idea. To get these specific sounds, different tools are used. Quads use special mallets, as does the pit, depending on the instrument being played. You wouldn’t use brushes on a snare drum, just as you wouldn’t use bass beaters on a xylophone.


Writing Microsoft SQL queries is the same way. Just as specified mallets and sticks are used in drum-line, different SQL techniques are used depending on the situation.


select refname, lskinid, round(sum(leminutes), 2) as 'Minutes' from lskin
	join hproduct_lskin on frn_lskinid = lskinid
	join dnis on add_lskinid = frn_lskinid
	join archive.dbo.xcall_2014 on cf_frn_dnisid = dnisid
		and tz_date between '2014-01-01' and '2014-03-01'
where frn_hproductid = 1
group by refname, lskinid

Above is just a basic query; nothing special, just some joins. When run against our data, it took 1 minute and 58 seconds to complete.

with CTE (refname, lskinid, dnisid)
as (
select refname, lskinid, dnisid from lskin
	join hproduct_lskin on frn_lskinid = lskinid
	join dnis on add_lskinid = frn_lskinid
where frn_hproductid = 1
)
select refname, lskinid, round(sum(leminutes), 2) as 'Minutes' 
from CTE c
	join archive.dbo.xcall_2014 on cf_frn_dnisid = c.dnisid
		and tz_date between '2014-01-01' and '2014-03-01'
group by refname, lskinid

This takes the same query and separates it into two parts using a CTE (common table expression). Changing the technique cut the time down to only 8 seconds.

declare @TempTable table (refname varchar(155), lskinid int, dnisid int)

insert into @TempTable (refname,lskinid, dnisid)
(
select refname, lskinid, dnisid from lskin
	join hproduct_lskin on frn_lskinid = lskinid
	join dnis on add_lskinid = frn_lskinid
where frn_hproductid = 1
)
select refname, lskinid, round(sum(leminutes), 2) as 'Minutes' 
from @TempTable t
	join archive.dbo.xcall_2014 on cf_frn_dnisid = t.dnisid
		and tz_date between '2014-01-01' and '2014-03-01'
group by refname, lskinid

Again, just a different technique (using a Temp Table) cut the query down to 8 seconds.


Those are just a few examples of query techniques in SQL. So next time you have to write a query, make sure you’re using the right ‘instrument’ for the occasion.

Functionality: Keeping Your Users Happy

Picture this: you receive a text message saying that your bank account is overdrawn while you are in the middle of procuring your life’s necessities (also known as grocery shopping). You know you have money in your savings, so you’ll just transfer $200.00 until your next payday. You pull up your bank’s app or website, enter your username and password on the login screen, and press enter. Nothing happens. How do you feel? Most likely a bit ticked off that you can’t access your hard-earned money, correct?

Here is where a website’s functionality comes into play. The design of a website or report can make it rain silver and gold, but if you click a button that is supposed to log you in or load another date on a report and nothing happens, the design means nothing to you as a user. As a back-end developer, it is my job to make sure that every time you click, drag, or drop, the information you need is accessible and loads as quickly as humanly possible. To keep our users happy and smiling, I apply a functionality goal list in all of my programming projects:

  1.  When a user creates an event on a page:
    1. Does it load correctly?
    2. Does it load quickly?
    3. Does it display the correct information?
  2. If a page is to error out:
    1. Does it display a message that says we will quickly fix this issue?
    2. Does it avoid displaying any code or internal details that a user could use to attack our system?
  3. Does it load correctly on all devices? (Meaning no cut off words, etc.)
    1. Computer?
    2. Mobile Phone?
    3. Tablet?
  4. Can something be done to speed up the final queries just a little bit more?
  5. Can two or three users other than myself confirm that all of this works correctly as well?

If you can confirm that all of these actions pass your expectations, you will have a functional website or report on your hands.
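Parts of the checklist above even lend themselves to automation. Here is a minimal, hypothetical sketch in Python; the page fields and the two-second budget are invented stand-ins for whatever your framework actually reports, not an actual interface of ours.

```python
MAX_LOAD_SECONDS = 2.0  # an assumed budget; tune it to your own goals

def functionality_problems(page):
    """Return a list of checklist violations for a simulated page load."""
    problems = []
    if not page["loaded"]:
        problems.append("page failed to load")
    elif page["load_seconds"] > MAX_LOAD_SECONDS:
        problems.append("page loaded too slowly")
    if page["error_shown"] and page["stack_trace_visible"]:
        # An error page should apologize, not expose internals.
        problems.append("error page leaks implementation details")
    return problems

good_page = {"loaded": True, "load_seconds": 0.4,
             "error_shown": False, "stack_trace_visible": False}
print(functionality_problems(good_page))  # -> []
```

Running checks like these on every deploy is one way to confirm items 1 and 2 of the list without waiting for a user to complain.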

Want to see this list in action? Check out any of our reporting pods in Car Wars, Service Setter, or any of our other platforms!

Reliability: Creating Rock-Solid Software

What makes great software great? Does it have to look good? Does it have to have a great user interface? While those things are good and will make your application pleasant, those things alone won’t make software great. Software must work, it must work without errors, and it must work every time you try to use it. Some would say that the most important aspect of software is its reliability. If an application doesn’t work, pretty visuals and user interfaces are all for nothing.

So how do we make software reliable? Reliability goes hand-in-hand with functionality, so before you even start, you must decide what the software will do and have a basic plan of how it will accomplish its task. You might even create a barebones application as a proof-of-concept to ensure your methods will work and be a solid foundation to work off of. This initial prototype will actually help in squashing initial bugs and issues that could show up later in the process.

Once you know how you want to proceed and know that it will work, you begin building your application. First and foremost, you want it to function (obviously), but once you get it to function, it must be reliable. Your customers will begin to rely on this software, so it has to work fast and it has to work every time they access it. Working every time is more important than raw speed, so we’ll start there.

Software that doesn’t work is frustrating, but what’s more frustrating is software that works most of the time, but fails you when you most need it. To ensure your application will work each and every time you need it to, you must test it, test it, and test it. Test it until it works without fail, then test it again. Play the malicious user and try to break it. You wrote the code, you know where the vulnerabilities are; exploit those vulnerabilities and learn how to plug those holes. If your app uses forms for user input, validate those inputs, first in the user interface and again during server-side processing.

Make sure that nothing you can do will send the application into an unusable state. Even when you are sure that there is no way that your code can be broken, add in error handling. In the case that somehow you didn’t consider all possible scenarios that could lead to an error, you must have a way to gracefully handle these rare exceptions. Handling errors can range from a simple generic message asking the user to please try again to sophisticated algorithms that predict what the user was trying to do.
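A minimal sketch of the “generic message” end of that spectrum might look like the following; the wrapper shape and wording are illustrative, not our actual error handler.

```python
def handle_request(action):
    """Run an action; never let an unexpected failure reach the user raw."""
    try:
        return {"ok": True, "result": action()}
    except Exception:
        # In a real app, log the exception for developers here; the user
        # sees only a polite, generic message rather than a stack trace.
        return {"ok": False, "message": "Something went wrong. Please try again."}

print(handle_request(lambda: 2 + 2))  # -> {'ok': True, 'result': 4}
print(handle_request(lambda: 1 / 0))  # the failure is caught gracefully
```

The point is not the specific wording but the guarantee: no input, however malformed, leaves the application in an unusable state.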

Once the application works without failure, you must ensure that it runs efficiently. If it feels slow and sluggish, this will be a source of frustration for your users. With modern programming languages, efficiency of your code base isn’t as much of a concern as it once was, but you should still strive to be as efficient as possible. Simple is usually better. You shouldn’t create some off-the-wall, complicated function to do a task that can be quickly and easily accomplished by a method built into your language or provided in a library. In most instances, it’s a waste of time. Also, anyone who comes behind you to update or maintain your code will need extra time to figure out what the application is doing there.

If you are pulling data from a database, then the efficiency of your database is very important to the speed of your application. Not only do the tables need the appropriate indexes and keys, but the way in which you query the information must also be efficient. Simple and quick queries are best. Get in, get the information you need, and get out so that your application doesn’t tie up valuable resources on your database server. Any complicated calculations should be handled in the application layer, not the database. You should do simple aggregates (count, sum, etc.) within the database, then hand those off to the application to do the bulk of the data processing.
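Here is a toy illustration of that split using Python and SQLite; the table and column names are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table calls (source text, minutes real)")
conn.executemany("insert into calls values (?, ?)",
                 [("adwords", 3.5), ("adwords", 1.5), ("organic", 2.0)])

# Simple aggregate in the database: get in, get the totals, get out.
rows = conn.execute(
    "select source, sum(minutes) from calls group by source"
).fetchall()
conn.close()

# Heavier processing in the application layer: each source's share
# of the total talk time.
total = sum(minutes for _, minutes in rows)
shares = {source: minutes / total for source, minutes in rows}
print(shares)
```

The database does the one thing it is fastest at (the grouped sum) and releases its resources immediately; the per-source percentages are computed afterward in application code.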

Just because these steps are taken doesn’t mean you are finished. Software can always be improved, and as developers we should strive to always be improving. These also aren’t the only steps to take in making your software rock-solid; there are countless other ways to improve your application. These two, speed and reliability, are great building blocks to start with, though.

The Car and Engine of the Numbers

Before I go over the engine, I’m going to explain the other parts so that you have an idea of what our engine powers. When you look at a car, the first thing you see is the paint, rims, and accents. This is the wonderful product of our design team, who ensure that our users have the ultimate experience by designing everything with the user in mind.

Next is the inside of the car, all the bells and whistles! This consists of our queries and ColdFusion code that are not really outward facing. This part is the product of a variety of teams, all of whom ensure the reports, processes, and data gathering are done properly.

Humanatic would be the person driving the car; you don’t get from point A to point B without a driver. All of our call reviews go through a system maintained by a small number of people.

The maintenance of the car is important here as well. Our Cases team goes through anything that is going wrong, meticulously hunts it down, and fixes the bells and whistles. This is the heart of the Hacker Farm. I say this because it is not the most glamorous job, but it allows other people to focus on their own issues and not be pulled in a thousand directions.

One last important team is the Projects team, which is, in essence, our GPS, like Google Maps, keeping us on the right path. They tell us which projects to do and keep them in flow.

Now that you have an idea of the process, on to the engine. This is the part of the car that no one ever sees. Most people never see the engine or know all it does, but they expect it to work nonetheless.

The engine behind all of our reports, queries, processes, and even our data gathering is our hardware. This breaks down into our servers, dialtag boxes, email systems, databases, and system updates, to name a few of the components. Maintaining the integrity of the databases is an extremely important task. Since we don’t have a DBA, it falls to the team leads and the Systems team to ensure they are running smoothly. Our servers are closely monitored; it’s like having a thousand sensors in your engine telling you exactly what is going wrong. We get emails and have systems set up that tell us the load on a server, if and when it went down, and the load it has carried over a period of time. Even our email systems are monitored for anything that could pop up: high email flow, or services that crash and block reports from being sent out. Our servers have hard drive failures, system updates, and configurations that all have to be taken care of. Many of these systems are set up with redundant drives that hold the system up until we can replace a failed drive.

Usability: Designing with Data

Last summer, I added a fancy date picker to the top of all of CI’s major reports. This date picker included buttons to quickly access data for a variety of date ranges. This feature enhanced the usability of our software, but we recently realized a problem: the date picker was taking up too much vertical space at the top of the page, pushing each report’s meaningful data down and out of view when the page loads. To remedy this, we have collapsed the date picker behind an expandable button, which bumps the meat of the report up a good 150 pixels when the page loads.


This was a quick change that was really just a solution to a longstanding (and rational) complaint from our CEO that the date picker was too big. But before making the change, I wanted to track its impact on our users, so I built an analytics script that tracked how often each button within the date picker was clicked, on a report-by-report basis.
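The aggregation side of a script like that can be quite simple. Here is a hypothetical Python sketch; the log format, a list of (report, button) click events, is invented for illustration and is not our actual analytics pipeline.

```python
from collections import Counter

# Each recorded click event as a (report, button) pair.
clicks = [
    ("Outbound Activity", "Day"),
    ("Outbound Activity", "Next Range"),
    ("Outbound Activity", "Day"),
    ("Call Volume", "Week"),
]

# Which reports get viewed, and which buttons get used on a given report?
views_by_report = Counter(report for report, _ in clicks)
buttons_on_ob = Counter(button for report, button in clicks
                        if report == "Outbound Activity")

print(views_by_report.most_common(1))
print(buttons_on_ob)
```

Two Counters over the event log are enough to surface the kind of per-report, per-button trends described below.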

After a week of data collection, I noticed a very curious trend. Of the thirty or so unique reports that we have across our various platforms, nearly 80% of all report views were for the Outbound Activity report. The Outbound Activity report is a useful report, but is certainly not one of our flagship reports.

Even more interesting, of the people who accessed the Outbound Activity report, 65% clicked the “Day” button on the date picker (which shows yesterday’s data), and 82% clicked the “Next Range” button.

I was baffled by this data. Why are our users so obsessed with the Outbound Activity report? Why are they clicking the “Day” and “Next Range” buttons so often?

After a discussion with one of our consultants, the answer became blindingly obvious. The vast majority of our users use this Outbound Activity report to monitor their live outbound calling activity on a per-staff member basis, ensuring that everyone is meeting their daily OB call quotas.

Not having known this, we still had the default date range for the Outbound Activity report set to the previous week. Thus, our users were (apparently every hour or so) having to load the OB report, click “Day” to load yesterday’s data, then click “Next Range” to load today’s data. Compounded by the fact that the date picker was about to be collapsed behind a button, I realized that the process for accessing live outbound call data in our product was not at all easy. In fact, it really sucked.

Solving this problem was as simple as adjusting the OB Activity report’s date range to default to today’s date. The real challenge in the scenario was identifying that there was a problem in the first place, and this would never have happened had I not built an analytics script to gather data on our users’ behavior.

This experiment has undoubtedly inspired me to integrate user data collection into future design decisions. Improving usability always starts with understanding the user, and I look forward to using data collection to supplement my own intuition in my journey to get inside the minds of our users.

Functional. Reliable. Usable. Delightful.

Aarron Walter, MailChimp’s user experience design lead, has developed an insightful model for what goes into creating great software. Clearly a reskinning of the famed Maslow’s hierarchy of needs, Walter’s pyramid (as seen below) is a representation of the four basic user needs that must be considered when building software.

[Image: Walter’s pyramid of user needs]

During my architectural studies at Texas A&M, we didn’t spend much time discussing pyramid construction, but I’m fairly sure that if the lower tiers are weak, the top tiers will crumble. That said, here is a quick breakdown of Walter’s model:

  1. Functional – Above all, software has to actually function. A button titled “Outbound Activity” must actually load a report that displays outbound calling data.
  2. Reliable – A user needs to know that the outbound calling data they are viewing is accurate and secure data.
  3. Usable – This data must be organized to be quickly accessible and easy to understand.
  4. Delightful – This is the key piece that Walter focuses on, and the piece that sets MailChimp apart from its competitors. Delightful software must appeal to the user on an emotional level, and produce an insightful and memorable experience.

Different members of Century Interactive’s dev team focus on different tiers of this pyramid within our products, so I am going to kick off a short mini-series written by our different team members, focusing on how we strive to improve our functionality, reliability, usability, and delightfulness.

Here are the links to the posts for each topic:

Intro to Network Programming – Part 1 – Overview and TCP connections

This is the first post in a series on Network Programming.

For part two of the series click here.

Have you ever wondered how data travels over the internet? Ever pondered how the cool technologies you use, like Dropbox, BitTorrent, email, and web servers, work at a lower level?

The answer is via User Datagram Protocol (UDP) and Transmission Control Protocol (TCP) sockets.

Honestly, the Wikipedia pages for UDP and TCP probably describe them better than I could, but the basic gist is this: UDP is connectionless, with no error correction or reliable transmission built in, while TCP has the concept of connections, error correction, and reliable transmission, all at the cost of speed.

We are going to use the Python programming language to examine sockets in this series on network programming.
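To make the UDP/TCP distinction concrete before we dive in: creating one kind of socket versus the other differs only in the type constant you pass.

```python
import socket

# SOCK_STREAM asks the OS for a TCP socket; SOCK_DGRAM for a UDP one.
tcp_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
udp_socket = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

print(tcp_socket.type)
print(udp_socket.type)

tcp_socket.close()
udp_socket.close()
```

Everything else, the connection handshakes, retransmission, and ordering guarantees of TCP, happens beneath this interface; this series will focus on the TCP side.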

Consider the following program:


#!/usr/bin/env python
# server.py

from __future__ import print_function
import socket

def main(*args, **kwargs):
    server_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server_socket.listen(5)
    print(server_socket.getsockname())
    server_socket.shutdown(socket.SHUT_RDWR)
    server_socket.close()

if __name__ == '__main__':
    main()

Line 8 creates the server socket by calling socket.socket. The Python documentation for this method describes it in more detail, but basically we’re saying that we want an IPv4 TCP socket.

Line 9 then tells the socket to listen for connections and to allow a connection backlog queue of no more than five connections (see the docs). This doesn’t mean that you can only have a maximum of 5 connections (more on that later).

Normally when you run a server you want to specify the IP address and port it listens on, but for now we don’t care about that. If you call socket.listen without calling socket.bind, your operating system will select an open port for you to use.

Line 10 prints out the IP address and port number the operating system gave us.

I imagine lines 11 and 12 probably look a bit redundant, but there are very subtle differences between socket.shutdown and socket.close. This Stack Overflow post describes it in detail (skip over the answer marked as best and look at the one with the most upvotes instead; IMHO that is a better answer).

Essentially, server_socket.shutdown(socket.SHUT_RDWR) sends the packets necessary to tell the other end of the connection that server_socket will no longer be sending or receiving any data over this connection, and server_socket.close() deallocates the socket. A common annoyance for people starting network programming with TCP is that when they “shut down” their servers, they can’t restart them because the socket they requested is “already in use.” Generally people don’t realize how to correctly close a socket connection, and the connection “stays locked” after they close their program.
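Relatedly, if you do get bitten by “already in use” during development, you can ask the OS to let you rebind an address whose previous socket is still in its TCP cool-down (TIME_WAIT) state by setting the SO_REUSEADDR option before binding:

```python
import socket

server_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Allow rebinding the address even if a previous socket on it is still
# lingering in TIME_WAIT after an unclean shutdown.
server_socket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
reuse = server_socket.getsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR)
print('SO_REUSEADDR enabled:', reuse != 0)
server_socket.close()
```

This is a convenience for development servers; correctly shutting down and closing your sockets, as above, is still the right habit.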

So let’s make a simple, single-threaded chat server:

#!/usr/bin/env python
# server.py

from __future__ import print_function
import socket

def main(*args, **kwargs):
    print('creating server socket')
    server_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    print('done. now binding to address')
    server_socket.bind(('0.0.0.0', 54321))
    print('done. now listening for connections')
    server_socket.listen(20)
    try:
        while True:
            client_socket, client_address = server_socket.accept()
            print('new client connected from: ', client_address)
            content = client_socket.recv(100)
            print('#### new data received ####')
            print(content)
            print('#### finished receiving data ####')
            client_socket.shutdown(socket.SHUT_RDWR)
            client_socket.close()
    finally:
        server_socket.shutdown(socket.SHUT_RDWR)
        server_socket.close()

if __name__ == '__main__':
    main()

This time we do want a specific IP and port, so in line 11 we bind to all IP addresses on this physical machine (that’s why we use the address 0.0.0.0) and request port 54321. If some other program is already bound to this port, then server.py will error out at this point.

Notice that basically everything we do after we start listening on our server socket is within a try/finally block. This way, if anything bad happens within the try block, we will still gracefully shut down our server using socket.shutdown and socket.close.

Also note that we’re doing an infinite loop here. That way we’ll be able to continue to keep getting connections even after our first one is finished.

So within our infinite loop, all we do is accept a new connection (line 16), receive a maximum of 100 bytes of data from the connection (line 18), and then shut down and close the connection.

So go ahead and run this program. You’ll see that it prints everything up through “done. now listening for connections” and that it doesn’t appear to do anything after that.

This is because we are using blocking sockets (I’ll touch on that a bit later). Essentially our program is “stuck waiting” on line 16; it’s waiting for something to connect to it. This program will literally wait forever for a new connection, because we did not specify any timeout on this blocking socket. Don’t worry, I’ll get into timeouts later as well.

So let’s get a client together to talk to it:

#!/usr/bin/env python
# client.py

from __future__ import print_function
import socket

def main(*args, **kwargs):
    client_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client_socket.connect(('0.0.0.0', 54321))
    client_socket.send(b'the rain in spain stays mainly in the plain')
    client_socket.shutdown(socket.SHUT_RDWR)
    client_socket.close()

if __name__ == '__main__':
    main()

Notice that this looks very similar to the server but does a few things differently. On line 9 we connect to server.py (it’ll throw an error if it can’t connect, say if you closed your server script). Then we send the text “the rain in spain stays mainly in the plain” and close the connection.

Go ahead and run it and you’ll see your server print out “the rain in spain stays mainly in the plain.”

Because of the infinite loop in server.py, you can run client.py over and over again and server.py will “serve” them all.

But remember from server.py that we only read 100 bytes, so what happens if you change up client.py to send more than 100 bytes?

#!/usr/bin/env python
# client.py

from __future__ import print_function
import socket

def main(*args, **kwargs):
    client_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client_socket.connect(('0.0.0.0', 54321))
    client_socket.send(b'banana '*100)
    client_socket.shutdown(socket.SHUT_RDWR)
    client_socket.close()

if __name__ == '__main__':
    main()

When we sent the string “banana ” repeated 100 times we got an error.

  File "./client.py", line 16, in <module>
    main()
  File "./client.py", line 11, in main
    client_socket.shutdown(socket.SHUT_RDWR)
  File "/usr/lib/python2.7/socket.py", line 224, in meth
    return getattr(self._sock,name)(*args)
socket.error: [Errno 107] Transport endpoint is not connected

The error is because our server only read the first 100 bytes we sent (about 14 bananas) and then closed the connection. I’m sure you’re thinking, “why don’t we just increase the number of bytes read?” We can, but eventually we’ll want to be able to read any amount of data.

But that’s for Part Two of this series.