At Semetrical our SEO specialists have undertaken countless technical SEO audits over the years, and have come across common technical issues that websites suffer within multiple industries. Our guide outlines the most common technical SEO issues with recommended solutions.
The most common technical SEO issues are listed below:
When undertaking technical SEO audits, we often find that disallow rules in the robots.txt are not catering for both uppercase and lowercase rules.
For example, on ecommerce sites the basket paths often run off both /basket/ and /Basket/, but only the lowercase path is included as a rule in the robots.txt. Because robots.txt rules are case sensitive, URLs under /Basket/ can still be crawled and indexed, which causes content duplication that harms how your website is indexed by search engines.
Audit your website and check if there are both uppercase and lowercase versions of a path that needs to be blocked. You can do this using a web crawler, such as our friends at DeepCrawl. If there are both versions active on the website, add a second rule in the robots.txt to cater for the uppercase path to be blocked. For example, Disallow: /Basket/*
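To illustrate why the second rule is needed, here is a minimal sketch using Python's standard-library robots.txt parser. The example.com URLs and the rule sets are hypothetical. Note that this parser matches rules as plain path prefixes and does not interpret the * wildcard, so the rules are written without it.

```python
from urllib.robotparser import RobotFileParser


def is_blocked(robots_lines, url, user_agent="*"):
    """Return True if the given robots.txt lines disallow crawling the URL."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return not parser.can_fetch(user_agent, url)


# Hypothetical rules: only the lowercase basket path is disallowed.
lowercase_only = ["User-agent: *", "Disallow: /basket/"]
# The same rules with a second line added for the uppercase variant.
both_cases = lowercase_only + ["Disallow: /Basket/"]

print(is_blocked(lowercase_only, "https://www.example.com/basket/"))  # True
print(is_blocked(lowercase_only, "https://www.example.com/Basket/"))  # False
print(is_blocked(both_cases, "https://www.example.com/Basket/"))      # True
```

The middle result is the problem case: with only the lowercase rule in place, the uppercase path stays open to crawlers.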
If you don’t have access to a web crawler, then a site: search on Google can be very useful to see if both uppercase and lowercase versions are being indexed.
A common issue we find is the same URL being linked to in different letter cases throughout a website, which Google sees as two different URLs. For example:
This may occur when an editor adds a direct link to a product page in a blog post but types an uppercase letter instead of a lowercase letter.
We have also seen this happen due to internal linking modules having a bug where popular product links are linked to via uppercase letters.
We would recommend setting up a rule at server level where all uppercase URLs redirect to lowercase via a 301 redirect. This will safeguard the website from any future duplication where both an uppercase and lowercase URL are being linked to.
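The redirect logic can be sketched as follows. This is a minimal illustration rather than a server configuration: the function name and example URL are hypothetical, and in practice the rule would live in your web server or application framework. It assumes only the path should be lowercased, since query string values can be case sensitive.

```python
from urllib.parse import urlsplit, urlunsplit


def lowercase_redirect(url):
    """Return (301, target) if the URL path contains uppercase letters, else None.

    Only the path is lowercased; the query string is left untouched because
    parameter values can be case sensitive.
    """
    parts = urlsplit(url)
    lowered_path = parts.path.lower()
    if lowered_path == parts.path:
        return None  # already lowercase, serve the page as normal
    target = urlunsplit(parts._replace(path=lowered_path))
    return 301, target


print(lowercase_redirect("https://www.example.com/Products/Red-Shoes?ref=ABC"))
```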
Adding a 301 redirect rule will also consolidate any link equity where an external site may link to your site by mistake via an uppercase letter.
If a 301 redirect is not possible then we would recommend adding a canonical tag in the source code of the uppercase URLs to reference the lowercase URL version.
Companies often migrate their website to secure HTTPS URLs but do not always implement a 301 redirect rule, and instead implement a 302 redirect. In theory, this tells search engines that the HTTP version of a URL has moved only temporarily rather than permanently. This can reduce the link equity and overall authority of your website, as the HTTP URLs that have acquired backlinks over time will not fully pass their link equity to the HTTPS version unless a 301 redirect is in place.
We would recommend setting up a rule at server level where all HTTP URLs 301 redirect to the HTTPS version.
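When auditing an existing migration, a check along these lines can surface HTTP URLs that still redirect with a temporary status code. It is a rough sketch: the (url, status_code, redirect_target) tuple shape is an assumed crawler export format, not the output of any specific tool.

```python
TEMPORARY_REDIRECTS = (302, 303, 307)


def find_temporary_https_redirects(crawl_rows):
    """Return HTTP URLs that redirect to HTTPS with a temporary status code.

    crawl_rows is an iterable of (url, status_code, redirect_target) tuples,
    a hypothetical shape for a crawler export.
    """
    flagged = []
    for url, status, target in crawl_rows:
        is_http = url.startswith("http://")
        goes_to_https = bool(target) and target.startswith("https://")
        if is_http and goes_to_https and status in TEMPORARY_REDIRECTS:
            flagged.append(url)
    return flagged


rows = [
    ("http://www.example.com/a", 302, "https://www.example.com/a"),
    ("http://www.example.com/b", 301, "https://www.example.com/b"),
    ("https://www.example.com/c", 200, None),
]
print(find_temporary_https_redirects(rows))
```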
On a number of ecommerce websites we have seen products having multiple product URL variations but each variation linking to a canonical product URL to prevent duplication. However, the canonical product page can only be found via canonical tags and no other internal links.
Additionally the canonical product page does not include any breadcrumbs which impacts the internal linking across the website.
This setup has on occasion prevented search engines from picking up the canonical URL version: they ignore the canonical instruction because the internal links throughout the site send mixed signals. This can result in the non-canonical versions of products being indexed, which causes URL cannibalisation and ultimately harms your SEO performance.
To help the canonical URLs to be indexed, websites should:
Add the canonical URLs to the XML sitemap and not the other URL variants
Internally link to the canonical URL versions within site-wide internal linking modules such as “popular products”
Add a primary breadcrumb structure to the canonical URL page.
Canonical URLs occasionally reference 404 URLs, which sends mixed signals to search engines. The canonical tag tells a crawler which URL is preferred for indexing, but that preferred URL no longer exists.
Firstly, you should establish whether the canonical URL should be a 404 or whether it should be reinstated. If it is reinstated then the issue is fixed. However, if the canonical URL should remain a 404, then you should pick a new canonical URL or update the canonical to be self-referencing.
In the HTML code of a webpage there can sometimes be two canonical tags. This sends conflicting messages to a search engine, which may use only one of the tags or ignore the canonical hints altogether.
Some website crawlers flag multiple canonical tags. If yours does not, you should set up a custom extraction when crawling the site to look for multiple canonical tags.
Web pages with multiple canonical tags in the HTML code need to be updated so that the incorrect tag is removed and only the correct canonical tag remains.
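A custom extraction for multiple canonical tags could look something like the sketch below, which uses Python's built-in HTML parser. The class and function names and the sample markup are illustrative assumptions. Any page where more than one canonical is found would need reviewing.

```python
from html.parser import HTMLParser


class CanonicalCollector(HTMLParser):
    """Collect the href of every <link rel="canonical"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonicals.append(attributes.get("href"))


def find_canonicals(html):
    """Return a list of all canonical URLs declared in the HTML."""
    collector = CanonicalCollector()
    collector.feed(html)
    return collector.canonicals


sample = (
    '<html><head>'
    '<link rel="canonical" href="https://www.example.com/a">'
    '<link rel="canonical" href="https://www.example.com/b" />'
    '</head></html>'
)
found = find_canonicals(sample)
if len(found) > 1:
    print("Multiple canonical tags found:", found)
```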
Websites occasionally have multiple homepage URLs which causes duplication and can cause a split of link equity. Common homepage duplication URLs include:
If your website has multiple homepage URLs, we would recommend setting up a 301 redirect where all duplicate versions redirect to the main homepage version.
Mobile sites should contain the same content as the desktop version of a website. When undertaking website audits and comparing desktop to mobile website crawls, we have come across content differences where the mobile version contains less content than the desktop version on certain pages.
This can cause issues because almost all indexing of a website comes from the mobile version and if priority content is missing, rankings may start to drop.
The mobile version of a site should contain the same content as the desktop version and missing content should be added to the mobile website.
For websites that have implemented geo IP redirects, the most common issue is that the implementation redirects for all users, which includes bots.
Googlebot will usually crawl from a US IP and if bots are being redirected based on geographical location then Googlebot will only crawl and index the US version of a website. This will prevent other geographical versions of the site from being crawled and indexed.
Additionally this can cause issues for product pricing Schema markup on Ecommerce sites where pricing is updated based on geographical location as only the US price will appear in all markets. For example, the below snippet shows US pricing coming through on the UK version of a website within the UK.
If you need to implement geo IP redirects then we would recommend excluding all bots from the redirect rules, as this will allow bots such as Googlebot to crawl and index all international versions.
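The exclusion could be sketched roughly as below. The bot tokens, country codes and site URLs are illustrative assumptions, and matching on the user agent string alone is a simplification, since user agents can be spoofed and production setups often verify crawlers more strictly.

```python
KNOWN_BOT_TOKENS = ("googlebot", "bingbot")  # illustrative, not exhaustive


def geo_redirect_target(user_agent, visitor_country, site_versions, current_country):
    """Return the URL to redirect to, or None if no redirect should happen."""
    ua = user_agent.lower()
    if any(token in ua for token in KNOWN_BOT_TOKENS):
        return None  # never geo-redirect crawlers
    if visitor_country == current_country:
        return None  # visitor is already on the matching version
    return site_versions.get(visitor_country)


# Hypothetical international versions of a site.
versions = {
    "US": "https://www.example.com/us/",
    "GB": "https://www.example.com/uk/",
}
googlebot = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
browser = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"

print(geo_redirect_target(googlebot, "US", versions, "GB"))  # None
print(geo_redirect_target(browser, "US", versions, "GB"))    # https://www.example.com/us/
```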
A pop-up banner is a useful UX feature for users who have landed on the incorrect international website version. The pop-up appears based on IP detection: for example, if a user lands on the US website from a UK IP, the banner tells the user that the UK site may be more suitable.
It is common to see multiple versions of a website when companies operate in different countries around the world. This is common practice as ideally you want to provide the best user experience and to do this, country specific websites enable companies to tailor the user journey based on where the user is in the world.
However, companies can make the mistake of creating multiple versions of their website without sending any signals to search engines to indicate which website should target a specific country or region.
When website owners create multiple site versions with no instructions for search engines this can cause chaos such as website duplication and cross domain cannibalisation.
When creating international versions of your website, Hreflang tags should be used to help signal to search engines such as Google the correct webpage to serve to a user based on their location and language.
Hreflang tags also prevent international versions of a website being seen as duplicates to search engines as the Hreflang tag essentially indicates that a specific page is needed to serve a user in X location with X language setting.
Setting up and mapping out Hreflang tags can get confusing and is a big task depending on the size of your website. If set up incorrectly, it can be detrimental to your website traffic.
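As a small illustration of what the mapping produces, the sketch below builds the set of Hreflang link tags for one page from a dictionary of language-region codes and URLs. The function name, codes and URLs are hypothetical. Each version of the page would normally output the same full set of tags, including one that references itself.

```python
from html import escape


def hreflang_tags(alternates, x_default=None):
    """Build <link rel="alternate" hreflang="..."> tags for one page.

    alternates maps a language-region code (e.g. "en-gb") to the URL of the
    equivalent page for that audience.
    """
    tags = [
        f'<link rel="alternate" hreflang="{escape(code)}" href="{escape(url)}" />'
        for code, url in sorted(alternates.items())
    ]
    if x_default:
        tags.append(
            f'<link rel="alternate" hreflang="x-default" href="{escape(x_default)}" />'
        )
    return tags


pages = {
    "en-us": "https://www.example.com/us/",
    "en-gb": "https://www.example.com/uk/",
}
for tag in hreflang_tags(pages, x_default="https://www.example.com/"):
    print(tag)
```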
Please visit our international SEO services page if you are in the process of planning an international website expansion or are having issues with your international websites.
An interesting issue we come across more often than you would think is websites having old URLs in their XML sitemaps or staging URLs somehow squeezing themselves into an XML sitemap.
This can cause issues: if staging URLs appear in your sitemaps and your staging site is not blocked from search engines, these URLs could start to be indexed and in turn cause unnecessary duplication.
Historical URLs in your sitemap that now serve a 4xx or 3xx status code can send confusing signals to search engines on which pages you want crawled or indexed.
Make sure to audit your XML sitemap on a regular basis by keeping an eye on the Search Console and monitoring errors that appear or set up a regular crawl in a tool such as Deepcrawl.
Setting up a regular crawl of XML sitemaps in Deepcrawl is very useful as this can quickly flag any URLs that should not be appearing in your sitemap and enables you to keep on top of this potential issue.
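If you want a quick check outside of a crawling tool, a sitemap audit can be sketched as below. It assumes you already have the sitemap XML as a string and a dictionary of status codes from a crawl; the function name, hostnames and data are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def audit_sitemap(sitemap_xml, live_host, status_by_url):
    """Return (url, reason) pairs for sitemap entries that should be reviewed."""
    issues = []
    root = ET.fromstring(sitemap_xml)
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        url = (loc.text or "").strip()
        if urlsplit(url).netloc != live_host:
            issues.append((url, "unexpected host"))  # e.g. a staging URL
            continue
        status = status_by_url.get(url)
        if status is not None and status != 200:
            issues.append((url, f"status {status}"))  # 3xx or 4xx entry
    return issues


sitemap = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    '<url><loc>https://www.example.com/</loc></url>'
    '<url><loc>https://staging.example.com/page</loc></url>'
    '<url><loc>https://www.example.com/old</loc></url>'
    '</urlset>'
)
statuses = {"https://www.example.com/": 200, "https://www.example.com/old": 404}
print(audit_sitemap(sitemap, "www.example.com", statuses))
```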
Surprisingly, a number of companies have their staging websites indexable to search engines such as Google, not on purpose but by mistake. This can cause significant duplication as the staging website will usually be a replica of your live environment. A simple search on Google using URL search operators shows that there are millions of staging webpages live and indexable.
At Semetrical, we would recommend adding an authentication layer where you need to enter a username and password in order to access the staging website. Adding a disallow rule is also an option to prevent staging environments from being indexed. However, it is better to implement this if the staging site has not already been indexed. For example:
Most website crawler tools have a robots.txt overwrite function, so you can easily override the disallow rule when conducting tests on your staging environment.
Internal search URLs on websites can be great for SEO, as they allow websites to rank for hyper-long-tail search queries or for keywords where they do not have a main URL to rank.
However, in a lot of cases internal search pages can cause a lot of duplication on websites and can also cause crawl budget issues on large scale websites. For this guide we will focus on the negative side of internal search.
Internal search pages are usually very low quality as they are not optimised, and on a lot of occasions they will be classified as thin content because they house a low number of results such as products.
Before deciding to block internal search pages it is advised to check that these pages currently do not rank for any keywords or bring in regular traffic.
Additionally, check that these URLs have not built backlinks over the years. If your internal search pages have no authoritative backlinks and don’t generate organic traffic then at Semetrical we would recommend two steps:
Step One: Add NOINDEX,FOLLOW tags to all search pages to allow search engines to de-index these pages. Once these pages have been de-indexed over a few months, we would then implement step two.
Step Two: Add the internal search directory to the robots.txt file such as Disallow: */search*
Sort and filter parameter duplication can be a common issue when auditing websites. Lots of websites will use filters as it can enhance the user experience and allow users to filter down their search results. However, the main issue is when websites keep filters indexable as this generates a significant amount of duplication across the website. For example:
Occasionally we will come across websites that add tracking parameters to the end of URLs on internal links to indicate where in the site that link was clicked on. We would not recommend this setup in the first instance. However, when sites already have this in place it can cause a lot of duplication on a website as it can create multiple versions of the same page. For example:
Other common tracking parameters that can cause duplication are UTM tracking parameters, where links are used for specific campaigns in order to track how the campaign has performed. For example:
There are a number of ways to prevent parameters being indexed and causing duplication, these include:
Canonicalising the parameter URL to the clean URL version
Adding a rule in the robots.txt file to disallow specific parameters
Adding parameters to the URL parameters tool in Search Console which signals to Google that certain parameters should not be crawled.
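For the canonicalisation option, the clean URL can be derived by stripping the tracking parameters from the query string. The sketch below is one way to do that; the parameter names in TRACKING_PARAMS are placeholders, so swap in the ones your site actually uses.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameter names; replace with the ones your site actually uses.
TRACKING_PARAMS = {"ref", "source"}
TRACKING_PREFIXES = ("utm_",)


def clean_url(url):
    """Return the URL with tracking parameters removed, for use as a canonical."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in TRACKING_PARAMS and not key.startswith(TRACKING_PREFIXES)
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


print(clean_url("https://www.example.com/shoes?utm_source=newsletter&utm_medium=email"))
print(clean_url("https://www.example.com/shoes?colour=red&utm_campaign=sale"))
```

Parameters that are not on the tracking list, such as colour in the second call, are kept, so whether filter parameters should also be stripped remains a separate decision.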
On e-commerce websites, as well as on publisher websites, product URL duplication can be a big issue. The main reason for product URL duplication is that products can inherit the category/subcategory in their URL structure, so if a product sits in multiple categories/subcategories then multiple URLs are created.
On publisher websites, documents can also sit in multiple areas and if the document URL inherits the document location then multiple versions are created. For example:
When we come across duplication like this there are various ways of cleaning it up so we can make sure the correct URL version is crawled and indexed.
To fix the URL duplication we would recommend canonicalising all product URL variants to the parent or to a generic version. For example:
Parent canonical example
would canonicalise to:
Generic canonical example:
Would canonicalise to
If you have access to developers, then an alternative solution would be to internally link to product canonicals throughout the website and 301 redirect all product URLs that run off category/sub-categories to the generic canonical product URL.
This would stop product duplication and enable you to link to products via multiple routes
Page depth is the number of clicks a specific page is from the homepage of a website. When conducting website audits, we come across websites that have a page depth greater than 10. That means some pages are more than 10 clicks away from the homepage!
The more clicks needed to find a web page the harder it is for a search engine to find that URL and it is more likely that URL will not be revisited as often as pages higher up in the website.
Additionally, the higher a page is within your website architecture the higher the chance it will be seen as a priority page by search engines. If priority pages are lower down in the architecture, there is a risk that they will not rank as well.
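Page depth as defined above can be calculated with a breadth-first search over the internal link graph. The sketch below assumes you have a dictionary mapping each URL to the internal URLs it links to, for example from a crawl export; the paths shown are made up.

```python
from collections import deque


def page_depths(links, homepage):
    """Return {url: clicks from homepage} using breadth-first search.

    links maps each URL to the list of internal URLs it links to.
    """
    depths = {homepage: 0}
    queue = deque([homepage])
    while queue:
        current = queue.popleft()
        for target in links.get(current, []):
            if target not in depths:  # first visit is the shortest route
                depths[target] = depths[current] + 1
                queue.append(target)
    return depths


site_links = {
    "/": ["/shoes", "/about"],
    "/shoes": ["/shoes/red"],
    "/shoes/red": ["/"],
}
print(page_depths(site_links, "/"))
```

Pages that never appear in the result cannot be reached from the homepage through internal links at all, which is worth flagging separately.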
The main ways to improve website depth and to make sure priority pages are high up in the website architecture include:
Internal linking across the website such as recommended products, related products and featured pages
The use of breadcrumbs across the website
Setting up pagination where it includes first, last and the two result pages either side of the page you are on
Conducting keyword research to uncover top level category pages that should be linked within the main navigation of a website and adding links to priority pages
Often we see that the “you also may like” modules on e-commerce product pages cannot be seen by search engine crawlers, making the internal linking module redundant.
Our technical SEO team has audited websites and uncovered that NOINDEX tags have been added to the source code of pages by mistake. Additionally, we have seen pages that historically brought in traffic with a NOINDEX tag in place.
Surprisingly an issue that can happen more often than you would think is developers pushing staging environments live with the NOINDEX tag still present in the source code.
Ultimately the NOINDEX tag will tell search engines not to index the page and will prevent the page from showing up in search results.
If you come across pages that have a NOINDEX tag in place when auditing a website and it's not clear why the tag is in place, then check with the development team to see when and why those pages were given the tag.
If a NOINDEX tag has been added by mistake then you should ask developers to update the source code and remove the tag completely or update it to read <meta name="robots" content="INDEX, FOLLOW">
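To spot stray NOINDEX tags across a set of pages, a check along the following lines can be run over the HTML of each page. It is a minimal sketch with hypothetical names that only looks at the robots meta tag, so it will not pick up directives sent in HTTP headers.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Record the directives of every <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def has_noindex(html):
    """Return True if the page's robots meta tag contains a noindex directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


print(has_noindex('<head><meta name="robots" content="NOINDEX, FOLLOW"></head>'))  # True
print(has_noindex('<head><meta name="robots" content="INDEX, FOLLOW"></head>'))    # False
```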
A soft 404 page should not exist on a website. It happens when a non-existent page that should return a 404 status code returns a 200 OK status code instead. If 404 pages return a 200 status code they can still be crawled and indexed.
This is ultimately an issue because search engines such as Google can waste crawl budget on these pages, which provide no value, instead of focusing time on valuable pages. These pages can also create duplication issues on a website, especially if a website has 1,000s of soft 404 pages showing a “page not found” message.
There are a few different ways to find soft 404 pages which include:
Visiting Search Console where it flags soft 404 pages
Crawling your website and looking out for 200 status code pages with title tags of “Page Not Found”
Crawling your website with a custom extraction that looks for the body copy message present on 404 status code pages; any 200 status code page with that message is likely to be a soft 404
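The crawl-based checks above boil down to filtering 200 status code pages by a tell-tale message. A rough sketch, assuming crawl output is available as a list of dictionaries with url, status, title and body keys (a hypothetical shape) and that your 404 template contains the text "page not found":

```python
def find_soft_404s(pages, marker="page not found"):
    """Return URLs that respond 200 but whose title or body contains the marker.

    pages is an iterable of dicts with "url", "status", "title" and "body"
    keys, a hypothetical shape for crawl output.
    """
    marker = marker.lower()
    flagged = []
    for page in pages:
        if page["status"] != 200:
            continue  # real 404s and redirects are not soft 404s
        text = f'{page.get("title", "")} {page.get("body", "")}'.lower()
        if marker in text:
            flagged.append(page["url"])
    return flagged


crawl = [
    {"url": "/shoes", "status": 200, "title": "Shoes", "body": "Our range"},
    {"url": "/old-page", "status": 200, "title": "Page Not Found", "body": ""},
    {"url": "/gone", "status": 404, "title": "Page Not Found", "body": ""},
]
print(find_soft_404s(crawl))
```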
If you come across soft 404 pages on your website there are a couple of solutions that can be implemented, these include:
301 redirect soft 404 pages to an appropriate alternative page if available
Change the status code of these pages to a 404 or 410 status code but check that no link equity will be lost.
If you are facing issues with your website or need a technical SEO audit, please visit our technical SEO services page for more information on how Semetrical can help out.