Importance of Information in Application Log and how to make it Effective: A Guide for Developers

Summary 

In the world of software development, building applications is not just about developing functionality; it’s about creating systems that are robust, reliable, and maintainable. One essential tool in achieving these goals is application logging.  

In this article, we’ll look at what information belongs in application logs, why it matters, and why developers should master this skill. The article does not cover how to set up logging with tools such as log4j or ELK, because plenty of good documentation already exists for that. There are far fewer examples of what information should be logged and how it affects the business and the stakeholders involved. We’ll therefore explore the various aspects of effective logging and provide examples to illustrate the concepts.

Target Audience: Software Developers & Engineers 

Application logging is particularly relevant for software developers and engineers, for whom it is an integral part of daily coding work. The job is not just writing lines of code but also creating a system that is both functional and maintainable over time.

Additionally, architects or team leads seeking to guide developers and establish effective logging practices will find this content valuable. 

What is Application Logging? 

Application logging involves recording information about how a specific application or system behaves. For instance, in the insurance industry, these logs can show when a policy was created for a new customer or if there were issues collecting payments or processing customer claims. These logs capture real-time details about how things are happening within the system. 

  • Logs provide runtime insights into application behaviour. E.g., in the insurance domain, they can track policy creation, payment collection issues, and claim processing. In the healthcare domain, they can track whether blood sugar levels were recorded or whether there was a problem accessing a patient’s Electronic Health Record (EHR).
  • Logs are essential for understanding, troubleshooting, and enhancing system behavior. 
  • The main goal is to prompt action: to understand or resolve issues within the system.
  • There is a difference between print and log statements; the next section explains the difference and when to use each.

Difference between Printf/Print and Application Log Statements: 

`printf` statements do not add entries to application logs in the traditional sense. printf is used in programming languages like C and C++ to write output directly to standard output (often the console or terminal) during the program’s execution. These outputs are generally intended for debugging and development purposes and are not typically part of a structured logging system.

On the other hand, application logs are a more formalized way of recording events, messages, errors, and other important information that occur during the execution of a program. These logs are usually stored in files, databases, or specialized logging systems, making them accessible for monitoring, analysis, and debugging. 

While both printf statements and application logs provide insights into a program’s behaviour, they serve different purposes and have distinct characteristics: 

  • printf statements: These are used for immediate feedback and debugging during development. They are often temporary and not intended to be a part of the application’s long-term logging strategy. printf outputs are typically displayed in the console where the program is executed. 
  • Application logs: These are systematic records generated by an application to provide a detailed account of its activities, errors, and events. They are structured and usually contain additional information such as timestamps, severity levels, and context. Application logs are crucial for ongoing monitoring, diagnosing issues, and gaining insights into user interactions. 

In summary, printf statements provide immediate outputs during development, while application logs offer a structured and organized way to record and analyse the behaviour of an application in various environments, including production. 
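The difference can be sketched in Python, where print plays the role of printf and the standard logging module provides structured log records. This is only an illustration: the logger name payment_service and the policy number are hypothetical, and the records are written to an in-memory stream purely to keep the example self-contained.

```python
import io
import logging

# Send log records to an in-memory stream so the example is self-contained;
# a real application would typically write to a file or a log shipper.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(
    logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
)

logger = logging.getLogger("payment_service")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False  # keep the demo records out of the root logger

# print: plain text on standard output, with no timestamp or severity.
print("payment collected")

# logger: the same event, recorded with a timestamp, logger name and level.
logger.info("Payment collected for policy %s", "P98765")

# This record is dropped because the logger threshold is INFO.
logger.debug("Raw gateway payload received")

log_output = stream.getvalue()
```

The print line is gone as soon as the console scrolls, while the log record can be filtered by level, routed to different destinations, and searched later.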

Current Context and Why it’s Important to Know? 

In the world of application logging, context is akin to the background story that makes sense of every log entry. Let’s break it down simply: 

  • What Does Context Mean?
    • Context provides a background or framework that helps to understand and interpret the meaning of a particular situation, statement, or action. Context plays a crucial role in communication and understanding, as it provides the necessary information to make sense of the information or behaviour being presented.
    • As an example, in the insurance domain, imagine a system that handles policy rates, quotes, and issuance. Here, the act of creating or issuing a policy is a context, further enriched with account and business unit details. In another scenario, like in the field of medical technology, a context could be the AI model’s training data comparison module. For instance, the context could be a trainee’s data from an input Excel file, or an X-ray scan image being processed. Context can also encompass the environment in which actions occur, such as “PROD” for production or “QA” for quality assurance. 
  • Why Should Developers Care About Context?
    • Imagine a scenario where an error alert pops up due to a log entry. The first question that arises is, “What business functionality is impacted, and to what extent?” Here’s why context matters: 
    • If developers know the complete context, they can add precise details to error logs. For instance, if a policy issuance service fails, a log might read: “Policy issuance failed: Policy ID 5656, User ID 1234, failed due to an HTTP 503 response code from Issuance service.” 
    • With this level of detail, troubleshooters understand that user 1234 faced an issuance problem, potentially resulting in lost business. 
    • Additionally, a 503 response from the issuance API indicates a broader issue—other policy issuances might fail too. The impact is more significant. 
    • A 503 could signify unexpected load causing service unavailability. DevOps might need to intervene, restarting containers or pods, and adjusting resource pool sizes. 
    • If fixing the issue takes time, the policy issuance service might be paused to prevent further requests. Kafka consumers handling message queues might be halted, and an alert sent to the relevant user group. 
    • Once the problem is resolved, the failed policies need to be reprocessed. 

All these actions can occur simultaneously, facilitated by the context-rich message. Context provides the necessary background to swiftly understand, assess, and address critical situations. 
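As a minimal sketch of how such a context-rich message might be produced, the Python snippet below builds the policy issuance error from the example above. The function names and the logger name are hypothetical and not part of any real issuance service.

```python
import logging

logger = logging.getLogger("policy_issuance")  # hypothetical logger name


def issuance_failure_message(policy_id, user_id, status_code):
    """Build an error message that carries the business context."""
    return (
        f"Policy issuance failed: Policy ID {policy_id}, User ID {user_id}, "
        f"failed due to an HTTP {status_code} response code from Issuance service."
    )


def report_issuance_failure(policy_id, user_id, status_code):
    """Log the failure at ERROR level and return the message that was logged."""
    message = issuance_failure_message(policy_id, user_id, status_code)
    logger.error(message)
    return message


logged = report_issuance_failure(5656, 1234, 503)
```

Keeping the message construction in one small function makes it easier to keep the wording consistent wherever the same failure is reported.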

When Is This Logged Information Used? 

  • Triaging Production Issues Quickly – This stands as the most critical and prevalent application of logged information. Whenever a problem arises, the first question is usually “What exactly happened?” Log data provides this crucial insight. Once the issue is identified, finding a solution is generally more straightforward. 
  • Proactive Alerting and System Monitoring – Tools such as ELK and Splunk can derive alerts and insights from logs to establish comprehensive observability. In today’s cloud-driven landscape, businesses run vast systems comprising thousands of services. Proactively addressing potential problems through effective monitoring prevents significant business losses and safeguards credibility.
  • Efficient Debugging – The pace of development has accelerated, with production-ready Minimum Viable Products (MVPs) being deployed frequently. This speed necessitates quick issue resolution even in development or testing environments. Logs play a crucial role in pinpointing problems quickly and ensuring the agile development cycle remains unobstructed. 
  • Analysing Historical Events & Patterns – Logs also serve as a valuable resource for improving system performance and functionality. Sometimes, gaining insights from historical events is necessary to enhance the system’s overall value. This information aids in fine-tuning processes, making data-driven decisions, and enhancing the user experience. 

Why is it so Important to Effectively Log Information?

1. Business Impact – Every line of code written and every piece of information generated has a direct or indirect impact on the business.

  • Clear debug and info level logs enable new team members to understand the system’s behaviour without needing extensive guidance from existing team members. This saves time and money on knowledge transfer efforts. The same logs can also feed observability tooling and, in turn, yield useful insights into user behaviour and patterns.
  • In case of a production issue, detailed and accurate information expedites issue resolution. This helps control potential losses in new business and reduces the effort required to fix failed transactions or partially processed data. 
  • While providing sufficient context is crucial, it’s equally vital to avoid adding sensitive information like personal details or passwords. Revealing such information could negatively impact the business’s reputation and brand. 

2. Helps New Team Members in Debugging  

  • In today’s dynamic job market, employees change roles frequently, resulting in a continuous flow of new team members. These newcomers might take longer than seasoned experts to comprehend the system. Detailed logs can expedite the process of identifying problem areas. 

3. More Details Required in Today’s Rapidly Changing Technology 

  • The pace of technological advancement is rapid. Previously, monolithic services gathered most logs in one place. With microservices and cloud technologies, numerous microservices are distributed across multiple teams. Each team owns specific microservices with limited insights into others’ code. 
  • Having the right level of information aids in quickly identifying the owner of a problem area and promptly resolving issues. 
  • In complex transaction flows involving multiple microservices (ranging from two to twenty or more), tracing problems becomes challenging. Business transaction IDs and component correlation IDs play a vital role in connecting the dots. Imagine an issue in the 20th microservice causing failures in the previous 19 services. The longer problem identification is delayed, the more log volume accumulates, making the investigation even harder.
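One way to attach a correlation ID to every log record in a Python service is sketched below, using the standard contextvars and logging modules. The service name quote_service, the handle_request function, and the idea of receiving the ID from the caller are assumptions made for illustration; real services usually pass the ID between components in a request header or message attribute.

```python
import contextvars
import io
import logging
import uuid

# Holds the correlation ID of the request currently being handled.
correlation_id = contextvars.ContextVar("correlation_id", default="-")


class CorrelationIdFilter(logging.Filter):
    """Copies the current correlation ID onto every log record."""

    def filter(self, record):
        record.correlation_id = correlation_id.get()
        return True


stream = io.StringIO()  # in-memory target, only to keep the sketch self-contained
handler = logging.StreamHandler(stream)
handler.setFormatter(
    logging.Formatter("%(correlation_id)s - %(name)s - %(levelname)s - %(message)s")
)
handler.addFilter(CorrelationIdFilter())

logger = logging.getLogger("quote_service")  # hypothetical service name
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False


def handle_request(incoming_id=None):
    """Reuse the caller's correlation ID if present, otherwise start a new one."""
    request_id = incoming_id or str(uuid.uuid4())
    token = correlation_id.set(request_id)
    try:
        logger.info("Quote requested")
        return request_id  # would be forwarded to downstream services
    finally:
        correlation_id.reset(token)


used_id = handle_request("3d67b59c-ff8e-4bba-9d93-d0690cad0f76")
```

Because every service in the flow logs the same ID, searching for that one value returns the full path of the transaction across services.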

What makes Information in Logs Useful? 

  • Context: Always provide relevant context in the logs to understand the situation. E.g., a Python script (which can be run multiple times) that loops through a list of JSON files and creates an Excel file from them might skip JSON files that are missing required fields. The example below, for a run over 100 JSON files, shows the difference that additional context in the message makes:
| Message | Effort required |
| --- | --- |
| JSON not valid. | Worst case! The user will need to check all 100 files for the error, or work out which JSON files are missing from the successful ones in the output Excel file. |
| File ‘policy_123.json’ not valid. | The user knows which file has the error, so it is easier to find, but the user still needs to remember what the required fields are and, if they are not documented properly, needs to read the code. |
| File ‘/user/user_name/scripts/input/policy_123.json’ JSON missing required fields. Check that the policy_nbr and issued_dt fields have valid values. | The user can copy the JSON file path, open the file in an editor, and check and fix the policy_nbr and/or issued_dt field values. |
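A minimal Python sketch of a validation step that produces the most helpful of these messages is shown below. The field names come from the example message; the function names, the logger name, and the choice to log skipped files at WARNING level are assumptions made for illustration.

```python
import json
import logging
from pathlib import Path

logger = logging.getLogger("json_to_excel")  # hypothetical script name

# Field names taken from the example message above.
REQUIRED_FIELDS = ("policy_nbr", "issued_dt")


def missing_fields(record):
    """Return the required fields that are absent or empty in a parsed JSON object."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


def validation_message(path, missing):
    """Name the full file path and the exact fields that need attention."""
    return (
        f"File '{path}' JSON missing required fields. "
        f"Check that {' and '.join(missing)} have valid values."
    )


def load_valid_records(paths):
    """Parse each file; log and skip the ones that fail validation."""
    valid = []
    for path in map(Path, paths):
        record = json.loads(path.read_text(encoding="utf-8"))
        missing = missing_fields(record)
        if missing:
            logger.warning(validation_message(path, missing))
            continue
        valid.append(record)
    return valid
```

The only extra work compared with logging "JSON not valid" is passing the path and the list of missing fields into the message, which is usually information the code already has at that point.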
  • Choose the right Log Level: When it comes to logging, selecting the appropriate log level is akin to choosing the right tool for the job. Different scenarios call for different levels of detail and urgency. In modern systems, production environments typically enable only the Warning and Error levels, so choosing the wrong level can mean that critical information is not found in time.

Let’s delve into when each log level should be used: 

| Level | Description | Example Scenario |
| --- | --- | --- |
| DEBUG / Debugging | When you need intricate insights into your application’s inner working. This level is perfect for developers during development and debugging phases. | During fine-tuning code and when detailed information is needed about variable values, function calls, and process flows. |
| INFO / Information | The INFO level is your go-to for general operational information that helps track the system’s state and events. | Logging when a user successfully logs into the system, or when a new module is initialized. |
| WARN / Warning | When something unexpected occurs, but the system can recover or continue functioning, employ the WARN level. | Logging when a minor error occurs, such as when a version of a library becomes deprecated. As an owner of that reusable library, inform the consumers to migrate to a newer version. |
| ERROR / Error | Use the ERROR level when an issue arises that impacts the system’s functionality or operation. These logs typically indicate problems that need attention. Always add actual exception and stack trace to error logs. If custom exception needs to be raised, then add original exception as inner exception. | Logging when a critical API call fails or when data processing encounters an unrecoverable error. |
| FATAL | The FATAL level is reserved for the direst situations, where the application cannot proceed and might even crash. This level demands immediate action. | Logging when a major component fails, causing the application to halt completely. |
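The levels above map onto Python's standard logging module roughly as sketched here. Python names the highest level CRITICAL, with logging.FATAL as an alias. The logger name, the claim ID, and the ClaimProcessingError class are hypothetical; the sketch also shows a custom exception that keeps the original exception as its cause, so both appear in the logged stack trace.

```python
import io
import logging

stream = io.StringIO()  # in-memory target, only to keep the sketch self-contained
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s - %(message)s"))

logger = logging.getLogger("claims")  # hypothetical logger name
logger.setLevel(logging.WARNING)  # production-style threshold: WARNING and above
logger.addHandler(handler)
logger.propagate = False


class ClaimProcessingError(Exception):
    """Hypothetical custom exception for this sketch."""


def parse_amount(raw):
    try:
        return float(raw)
    except ValueError as exc:
        # "from exc" keeps the original exception as the cause (inner exception).
        raise ClaimProcessingError(f"Invalid claim amount: {raw!r}") from exc


logger.info("Claim batch started")  # below the threshold, so not recorded
logger.warning("Claim C987 is missing an optional attachment")

try:
    parse_amount("ten dollars")
except ClaimProcessingError:
    # logger.exception logs at ERROR level and appends the full stack trace.
    logger.exception("Claim C987 could not be processed")

log_output = stream.getvalue()
```

With the threshold set to WARNING, the INFO record never reaches the output, which is why an event that operators need to see in production should not be logged at INFO or DEBUG.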
  • Handle sensitive information carefully – In the realm of software development, safeguarding data privacy and adhering to compliance regulations is not just a best practice; it is a critical responsibility with significant consequences when overlooked. As developers, it’s crucial to understand the implications of logging sensitive data such as Personally Identifiable Information (PII) or confidential business data, and the importance of adhering to regulations like the General Data Protection Regulation (GDPR).
    • Instead of logging an email address like “user@example.com”, mask it as “u***@e*******.com”. This way, the log entry remains useful for troubleshooting without exposing sensitive data.
    • As developers, it’s our duty to protect data privacy and comply with rules. By smartly avoiding sensitive data in logs and sticking to regulations, we contribute to secure applications that respect user privacy and the law. 
    • Research from sources like the IBM Cost of a Data Breach Report puts the average cost of a data breach at around $4.45 million in 2023, an increase of 15% over 3 years. This cost includes not only financial implications but also the long-term damage to an organization’s reputation.
    • Incorporating secure logging practices significantly reduces these risks. It safeguards both user data and the organization’s integrity. 
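A masking helper along these lines could look like the Python sketch below. The function name mask_email is hypothetical, and using a fixed run of three asterisks is a design choice made here so that the mask does not reveal the length of the original value; the exact masking format should follow the organization's own policy.

```python
def mask_email(address):
    """Keep the first character of the local part and of the domain name,
    and replace the rest with a fixed-length mask."""
    local, at, domain = address.partition("@")
    if not at or not local or not domain:
        return "***"  # not a recognisable email address: mask it entirely

    name, dot, suffix = domain.rpartition(".")
    if not dot:  # domain without a dot, e.g. "localhost"
        name, suffix = domain, ""
    if not name:
        return "***"

    return f"{local[0]}***@{name[0]}***{dot}{suffix}"


masked = mask_email("user@example.com")
```

The helper would be applied at the point where the message is built, for example logger.info("Password reset requested for %s", mask_email(email)), so the raw address never reaches the log.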
  • Limit the Amount of Logs to Necessary Details Only: Add logs that are useful, and don’t overwhelm the system with information that will never be used or is too obvious. Too much information makes it difficult to narrow down the problem area. Storage is also a concern: critical logs may be lost when older logs are deleted by a cleanup policy based on retention duration or total size. On mobile and IoT devices, where space is always limited, excessive logging can exhaust the available storage.
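To illustrate how a size-based cleanup policy interacts with log volume, here is a minimal Python sketch using the standard library's RotatingFileHandler. The file name, the 5 MB limit, and the three backups are arbitrary assumptions; the point is that once the limit is reached, the oldest records are discarded regardless of how important they were.

```python
import logging
from logging.handlers import RotatingFileHandler


def build_rotating_handler(path, max_bytes=5 * 1024 * 1024, backup_count=3):
    """Create a handler that caps disk usage at roughly
    max_bytes * (backup_count + 1); older records are deleted on rollover."""
    handler = RotatingFileHandler(
        path,
        maxBytes=max_bytes,
        backupCount=backup_count,
        delay=True,  # do not create the file until the first record is written
    )
    handler.setFormatter(
        logging.Formatter("%(asctime)s - %(levelname)s - %(message)s")
    )
    return handler


handler = build_rotating_handler("app.log")  # hypothetical file name
```

With these numbers, about 20 MB of history is retained, so a chatty DEBUG or INFO stream can push an important ERROR record out of the retained window within hours.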

  • Add ‘When & Where’ Details: Tailor the log detail to the audience: developers need more than “something happened”. Most of the details below are easily configurable in current logging frameworks and are typically set up once at project level, so the developer doesn’t need to do anything beyond calling logger.log(…). Knowing about them is still helpful: even in isolated work such as a standalone script, they can be configured using the out-of-the-box features of a standard logging framework:
    • Date & time – 2023-07-25 18:28:43,682 
    • Environment name: DEV2 or PERF or PROD 
    • Service or application name with version: policy_v1.3.3 
    • Component or Class name: dashboard_controller or validator etc. 
    • Log level: DEBUG or WARN or ERROR etc.
    • Trace Ids: Transaction ID or Component correlation ID: GUID {3d67b59c-ff8e-4bba-9d93-d0690cad0f76} 

Look at the examples below: 

For ease of reading, in the examples other than DEBUG the information below is abbreviated as ‘TIME_AND_ENVIRONMENT’. Note that a modern logging framework adds this information through its setup, so there is no need to add it to each message separately:

2023-07-25 18:28:43,682 – ENV – SERVICE_WITH_VER – CLASS_NAME – 3d67b59c-ff8e-4bba-9d93-d0690cad0f76 
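As a rough sketch of how such a prefix can be configured once rather than typed into every message, the Python snippet below uses the standard logging module with a LoggerAdapter. The field names env, service, and correlation_id are names chosen for this example, the values are copied from the sample line above, and plain hyphens are used as separators. The log level is added to the format as well.

```python
import io
import logging

# asctime uses the default "YYYY-MM-DD HH:MM:SS,mmm" layout.
LOG_FORMAT = (
    "%(asctime)s - %(env)s - %(service)s - %(name)s - "
    "%(correlation_id)s - %(levelname)s - %(message)s"
)

stream = io.StringIO()  # in-memory target, only to keep the sketch self-contained
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter(LOG_FORMAT))

base_logger = logging.getLogger("dashboard_controller")  # component name
base_logger.setLevel(logging.DEBUG)
base_logger.addHandler(handler)
base_logger.propagate = False

# LoggerAdapter attaches the same extra fields to every record it emits.
logger = logging.LoggerAdapter(
    base_logger,
    {
        "env": "PERF",
        "service": "dashboard_v1.2.0",
        "correlation_id": "3d67b59c-ff8e-4bba-9d93-d0690cad0f76",
    },
)

# The call site only supplies the message; the prefix comes from the setup.
logger.debug(
    "Performance: Dashboard loading time for user %s - %.1f seconds.", 67890, 2.5
)

line = stream.getvalue()
```

In a real service the environment and version would normally come from configuration, and the correlation ID from the incoming request rather than a constant.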

DEBUG Level 

  • 2023-07-25 18:28:43,682 – PERF – dashboard_v1.2.0 – dashboard_controller – 3d67b59c-ff8e-4bba-9d93-d0690cad0f76 – DEBUG – Performance: Dashboard loading time for user 67890 – 2.5 seconds.
  • 2023-07-25 18:28:43,682 – DEV2 – policy_v1.3.3 – policy – 3d67b59c-ff8e-4bba-9d93-d0690cad0f76 – DEBUG – Policy status update: Policy P23456 changed to “Under Review” due to claim review.
  • 2023-07-25 18:28:43,682 – DEV2 – policy_v1.3.3 – policy – 3d67b59c-ff8e-4bba-9d93-d0690cad0f76 – DEBUG – Premium calculation request received for policy P98765. Parameters: {…}
  • 2023-07-25 18:28:43,682 – QA – policy_etl_v3.0.0 – validator – 3d67b59c-ff8e-4bba-9d93-d0690cad0f76 – DEBUG – Data synchronization: Detected and resolved 5 duplicate policy entries.

INFO Level 

  • TIME_AND_ENVIRONMENT – INFO – Claim approved: Claim ID C54321 processed and approved for user Jane Smith.  
  • TIME_AND_ENVIRONMENT – INFO – New product added: Product ID PR123 added to the catalogue – “Family Health Plan”.  
  • TIME_AND_ENVIRONMENT – INFO – Batch job complete: Renewal status updated for 1000 policies.  
  • TIME_AND_ENVIRONMENT – INFO – User authenticated: User ID U789 authenticated for policy management.  

WARN Level 

  • TIME_AND_ENVIRONMENT – WARN – Incomplete claim submission: Missing documentation ‘Proof of Loss’ for claim ID C987. An email has been sent to registered email Id notifying the same.  
  • TIME_AND_ENVIRONMENT – WARN – Low disk space: Available storage capacity is critically low. Disk space on /data partition is below 5% capacity. Immediate attention required to prevent service disruption. Suggested Action: Identify and delete unnecessary files, expand storage capacity, and monitor disk space regularly. 
  • TIME_AND_ENVIRONMENT – WARN – API rate limit exceeded: unusual surge in requests detected. IP address 123.45.67.89 exceeded the rate limit of 100 requests per minute. Requests have been temporarily throttled. Suggested Action: Monitor API usage for suspicious activity, investigate the cause of the surge, and consider implementing rate limiting and IP blocking mechanisms. 
  • TIME_AND_ENVIRONMENT – WARN – Excluded from processing due to missing data. Account ID is missing in file ‘reinsurance_may_2023.xlsx’ at cell ‘C16’, Premium Amount $10,000.00 found on this row. Suggested Action: Add the right account ID and reprocess the file.

ERROR Level 

  • TIME_AND_ENVIRONMENT – ERROR – Flight search failed: Unable to retrieve data from API endpoint. User ID 567 initiated a flight search for route XYZ on date 2023-08-15. The request to API endpoint http://api.example.com/flights failed with a 404 response. Suggested Action: Verify the API endpoint URL, ensure it’s accessible, and check if there are any changes in API documentation or version. Exception…. 
  • TIME_AND_ENVIRONMENT – ERROR – Payment processing failed: Unable to establish a database connection. User ID 123 attempted to complete checkout for Order ID 789. Failed at step 3 of payment processing due to database connection issue. Suggested Action: Check database server status, validate connection credentials, and ensure the database service is running. Exception…. 

FATAL Level 

  • TIME_AND_ENVIRONMENT – FATAL – Database corruption detected: System cannot proceed. An internal consistency check failed while reading data from the database. Last successful database backup was performed on 2023-08-01. Suggested Action: Immediately halt system operations, restore the last known good backup, investigate the cause of corruption, and consider engaging database recovery experts.  
  • TIME_AND_ENVIRONMENT – FATAL – Security breach: Unauthorized access to user accounts detected.  Multiple failed login attempts originating from IP address 100.45.6x.x9. Affected user IDs include: 567, 789, 901. Immediate action required to prevent further unauthorized access. Suggested Action: Block the suspicious IP address, reset passwords for affected user accounts, and initiate a security audit to identify vulnerabilities. 

Do’s & Don’ts 

Do’s 

  • Logging is critical; practice it properly and consistently. Consider adding it to the “Definition of Done” in the agile process.
  • Understand that any information that is logged has a business impact. Useful information can save thousands of dollars’ worth of business impact in the event of a high-impact production issue.
  • Add necessary context to understand the situation and the problem.
  • Always add the stack trace in exception scenarios, i.e., ERROR or FATAL logs.
  • Always select the appropriate severity and log level. Production typically enables only WARN and above. Ask yourself whether INFO is enough or WARN is warranted; thinking about how the system and its operators react to each level helps in making that decision.
  • Add the ‘suggested action’ that needs to be taken (if any).
  • Understand how the logged information will be used and how the current logging setup works, for example timestamps and correlation IDs in a microservices architecture, and log file sizes and names.

Don’ts

  • Don’t log redundant information, and keep disk storage limits in mind. A log cleanup policy may remove critical information after 2-3 days because of the total log size. Too much information also increases the overall time required to find the problem.
  • Don’t log sensitive information such as personally identifiable information (PII), and use masking techniques wherever required. Make sure the guidelines of all applicable compliance requirements, such as GDPR and HIPAA, are followed.
  • Though not directly related to logging and more of an exception handling aspect, don’t expose stack traces in responses sent to customers unless required; keep exception details on the server side.
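The last point can be sketched in Python as follows: the full exception and stack trace go to the server-side log, while the caller receives only a generic message and an incident ID that support staff can use to find the log entry. The function names, the response shape, and the incident ID scheme are assumptions made for illustration.

```python
import io
import logging
import uuid

stream = io.StringIO()  # in-memory target, only to keep the sketch self-contained
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s - %(message)s"))

logger = logging.getLogger("checkout_api")  # hypothetical service name
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False


def safe_call(operation):
    """Run the operation; on failure, log full details on the server
    and return a generic error payload to the caller."""
    try:
        return {"status": "ok", "result": operation()}
    except Exception:
        incident_id = str(uuid.uuid4())
        # The stack trace is written to the server-side log only.
        logger.exception("Unhandled error, incident %s", incident_id)
        return {
            "status": "error",
            "message": "Something went wrong. Please try again later.",
            "incident_id": incident_id,
        }


def failing_operation():
    raise ConnectionError("database connection refused")


response = safe_call(failing_operation)
```

The incident ID links what the customer sees to the detailed server log without revealing internal details such as database names or code paths.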

Conclusion 

Effective application logging is a skill that every developer should master. It’s not just about recording information; it’s about creating a robust system, providing valuable insights, and enabling efficient issue resolution. By understanding the context, choosing the right log levels and detail, and leveraging appropriate tools, developers can build applications that are not only functional but also maintainable in the long run. So, embrace the power of application logging and elevate the development journey! 

About the Author 

Hemant Sharma works as a Senior Technical Architect at Zimetrics, with extensive experience in building software systems of different scale across domains. Passionate about clean, object-oriented code, he advocates for best practices and enjoys mentoring emerging software engineers to excel in the field. 
