[Solved] How to remove illegal characters from path and filenames?

I need a robust and simple way to remove illegal path and file characters from a simple string. I’ve used the below code but it doesn’t seem to do anything, what am I missing?

using System;
using System.IO;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            string illegal = """M<>""a/ry/ h**ad:>> a/:*?""<>| li*tt|le|| la""mb.?";

            illegal = illegal.Trim(Path.GetInvalidFileNameChars());
            illegal = illegal.Trim(Path.GetInvalidPathChars());

            Console.WriteLine(illegal);
            Console.ReadLine();
        }
    }
}

Solution #1:

Try something like this instead;

string illegal = """M""a/ry/ h**ad:>> a/:*?""| li*tt|le|| la""mb.?";
string invalid = new string(Path.GetInvalidFileNameChars()) + new string(Path.GetInvalidPathChars());

foreach (char c in invalid)
{
    illegal = illegal.Replace(c.ToString(), ""); 
}

But I have to agree with the comments, I’d probably try to deal with the source of the illegal paths, rather than try to mangle an illegal path into a legitimate but probably unintended one.

Edit: Or a potentially ‘better’ solution, using Regex’s.

string illegal = """M""a/ry/ h**ad:>> a/:*?""| li*tt|le|| la""mb.?";
string regexSearch = new string(Path.GetInvalidFileNameChars()) + new string(Path.GetInvalidPathChars());
Regex r = new Regex(string.Format("[{0}]", Regex.Escape(regexSearch)));
illegal = r.Replace(illegal, "");

Still, the question begs to be asked, why you’re doing this in the first place.

Respondent: Matthew Scharley

Solution #2:

The original question asked to “remove illegal characters”:

public string RemoveInvalidChars(string filename)
{
    return string.Concat(filename.Split(Path.GetInvalidFileNameChars()));
}

You may instead want to replace them:

public string ReplaceInvalidChars(string filename)
{
    return string.Join("_", filename.Split(Path.GetInvalidFileNameChars()));    
}

This answer was on another thread by Ceres, I really like it neat and simple.

Respondent: Shehab Fawzy

Solution #3:

I use Linq to clean up filenames. You can easily extend this to check for valid paths as well.

private static string CleanFileName(string fileName)
{
    return Path.GetInvalidFileNameChars().Aggregate(fileName, (current, c) => current.Replace(c.ToString(), string.Empty));
}

Update

Some comments indicate this method is not working for them so I’ve included a link to a DotNetFiddle snippet so you may validate the method.

https://dotnetfiddle.net/nw1SWY

Respondent: Michael Minton

Solution #4:

You can remove illegal chars using Linq like this:

var invalidChars = Path.GetInvalidFileNameChars();

var invalidCharsRemoved = stringWithInvalidChars
.Where(x => !invalidChars.Contains(x))
.ToArray();

EDIT
This is how it looks with the required edit mentioned in the comments:

var invalidChars = Path.GetInvalidFileNameChars();

string invalidCharsRemoved = new string(stringWithInvalidChars
  .Where(x => !invalidChars.Contains(x))
  .ToArray());
Respondent: Gregor Slavec

Solution #5:

For file names:

var cleanFileName = string.Join("", fileName.Split(Path.GetInvalidFileNameChars()));

For full paths:

var cleanPath = string.Join("", path.Split(Path.GetInvalidPathChars()));

Note that if you intend to use this as a security feature, a more robust approach would be to expand all paths and then verify that the user supplied path is indeed a child of a directory the user should have access to.

Respondent: Lily Finley

Solution #6:

These are all great solutions, but they all rely on Path.GetInvalidFileNameChars, which may not be as reliable as you’d think. Notice the following remark in the MSDN documentation on Path.GetInvalidFileNameChars:

The array returned from this method is not guaranteed to contain the complete set of characters that are invalid in file and directory names. The full set of invalid characters can vary by file system. For example, on Windows-based desktop platforms, invalid path characters might include ASCII/Unicode characters 1 through 31, as well as quote (“), less than (<), greater than (>), pipe (|), backspace (), null ( ) and tab ( ).

It’s not any better with Path.GetInvalidPathChars method. It contains the exact same remark.

Respondent: René

Solution #7:

For starters, Trim only removes characters from the beginning or end of the string. Secondly, you should evaluate if you really want to remove the offensive characters, or fail fast and let the user know their filename is invalid. My choice is the latter, but my answer should at least show you how to do things the right AND wrong way:

StackOverflow question showing how to check if a given string is a valid file name. Note you can use the regex from this question to remove characters with a regular expression replacement (if you really need to do this).

Respondent: user7116

Solution #8:

The best way to remove illegal character from user input is to replace illegal character using Regex class, create method in code behind or also it validate at client side using RegularExpression control.

public string RemoveSpecialCharacters(string str)
{
    return Regex.Replace(str, "[^a-zA-Z0-9_]+", "_", RegexOptions.Compiled);
}

OR

<asp:RegularExpressionValidator ID="regxFolderName" 
                                runat="server" 
                                ErrorMessage="Enter folder name with  a-z A-Z0-9_" 
                                ControlToValidate="txtFolderName" 
                                Display="Dynamic" 
                                ValidationExpression="^[a-zA-Z0-9_]*$" 
                                ForeColor="Red">
Respondent: anomepani

Solution #9:

I use regular expressions to achieve this. First, I dynamically build the regex.

string regex = string.Format(
                   "[{0}]",
                   Regex.Escape(new string(Path.GetInvalidFileNameChars())));
Regex removeInvalidChars = new Regex(regex, RegexOptions.Singleline | RegexOptions.Compiled | RegexOptions.CultureInvariant);

Then I just call removeInvalidChars.Replace to do the find and replace. This can obviously be extended to cover path chars as well.

Respondent: Jeff Yates

Solution #10:

I absolutely prefer the idea of Jeff Yates. It will work perfectly, if you slightly modify it:

string regex = String.Format("[{0}]", Regex.Escape(new string(Path.GetInvalidFileNameChars())));
Regex removeInvalidChars = new Regex(regex, RegexOptions.Singleline | RegexOptions.Compiled | RegexOptions.CultureInvariant);

The improvement is just to escape the automaticially generated regex.

Respondent: Jan

Solution #11:

Here’s a code snippet that should help for .NET 3 and higher.

using System.IO;
using System.Text.RegularExpressions;

public static class PathValidation
{
    private static string pathValidatorExpression = "^[^" + string.Join("", Array.ConvertAll(Path.GetInvalidPathChars(), x => Regex.Escape(x.ToString()))) + "]+$";
    private static Regex pathValidator = new Regex(pathValidatorExpression, RegexOptions.Compiled);

    private static string fileNameValidatorExpression = "^[^" + string.Join("", Array.ConvertAll(Path.GetInvalidFileNameChars(), x => Regex.Escape(x.ToString()))) + "]+$";
    private static Regex fileNameValidator = new Regex(fileNameValidatorExpression, RegexOptions.Compiled);

    private static string pathCleanerExpression = "[" + string.Join("", Array.ConvertAll(Path.GetInvalidPathChars(), x => Regex.Escape(x.ToString()))) + "]";
    private static Regex pathCleaner = new Regex(pathCleanerExpression, RegexOptions.Compiled);

    private static string fileNameCleanerExpression = "[" + string.Join("", Array.ConvertAll(Path.GetInvalidFileNameChars(), x => Regex.Escape(x.ToString()))) + "]";
    private static Regex fileNameCleaner = new Regex(fileNameCleanerExpression, RegexOptions.Compiled);

    public static bool ValidatePath(string path)
    {
        return pathValidator.IsMatch(path);
    }

    public static bool ValidateFileName(string fileName)
    {
        return fileNameValidator.IsMatch(fileName);
    }

    public static string CleanPath(string path)
    {
        return pathCleaner.Replace(path, "");
    }

    public static string CleanFileName(string fileName)
    {
        return fileNameCleaner.Replace(fileName, "");
    }
}
Respondent: James

Solution #12:

Most solutions above combine illegal chars for both path and filename which is wrong (even when both calls currently return the same set of chars). I would first split the path+filename in path and filename, then apply the appropriate set to either if them and then combine the two again.

wvd_vegt

Respondent: wvd_vegt

Solution #13:

If you remove or replace with a single character the invalid characters, you can have collisions:

<abc -> abc
>abc -> abc

Here is a simple method to avoid this:

public static string ReplaceInvalidFileNameChars(string s)
{
    char[] invalidFileNameChars = System.IO.Path.GetInvalidFileNameChars();
    foreach (char c in invalidFileNameChars)
        s = s.Replace(c.ToString(), "[" + Array.IndexOf(invalidFileNameChars, c) + "]");
    return s;
}

The result:

 <abc -> [1]abc
 >abc -> [2]abc
Respondent: Maxence

Solution #14:

Throw an exception.

if ( fileName.IndexOfAny(Path.GetInvalidFileNameChars()) > -1 )
            {
                throw new ArgumentException();
            }
Respondent: mirezus

Solution #15:

I wrote this monster for fun, it lets you roundtrip:

public static class FileUtility
{
    private const char PrefixChar = '%';
    private static readonly int MaxLength;
    private static readonly Dictionary<char,char[]> Illegals;
    static FileUtility()
    {
        List<char> illegal = new List<char> { PrefixChar };
        illegal.AddRange(Path.GetInvalidFileNameChars());
        MaxLength = illegal.Select(x => ((int)x).ToString().Length).Max();
        Illegals = illegal.ToDictionary(x => x, x => ((int)x).ToString("D" + MaxLength).ToCharArray());
    }

    public static string FilenameEncode(string s)
    {
        var builder = new StringBuilder();
        char[] replacement;
        using (var reader = new StringReader(s))
        {
            while (true)
            {
                int read = reader.Read();
                if (read == -1)
                    break;
                char c = (char)read;
                if(Illegals.TryGetValue(c,out replacement))
                {
                    builder.Append(PrefixChar);
                    builder.Append(replacement);
                }
                else
                {
                    builder.Append(c);
                }
            }
        }
        return builder.ToString();
    }

    public static string FilenameDecode(string s)
    {
        var builder = new StringBuilder();
        char[] buffer = new char[MaxLength];
        using (var reader = new StringReader(s))
        {
            while (true)
            {
                int read = reader.Read();
                if (read == -1)
                    break;
                char c = (char)read;
                if (c == PrefixChar)
                {
                    reader.Read(buffer, 0, MaxLength);
                    var encoded =(char) ParseCharArray(buffer);
                    builder.Append(encoded);
                }
                else
                {
                    builder.Append(c);
                }
            }
        }
        return builder.ToString();
    }

    public static int ParseCharArray(char[] buffer)
    {
        int result = 0;
        foreach (char t in buffer)
        {
            int digit = t - '0';
            if ((digit < 0) || (digit > 9))
            {
                throw new ArgumentException("Input string was not in the correct format");
            }
            result *= 10;
            result += digit;
        }
        return result;
    }
}
Respondent: Johan Larsson

Solution #16:

File name can not contain characters from Path.GetInvalidPathChars(), + and # symbols, and other specific names. We combined all checks into one class:

public static class FileNameExtensions
{
    private static readonly Lazy<string[]> InvalidFileNameChars =
        new Lazy<string[]>(() => Path.GetInvalidPathChars()
            .Union(Path.GetInvalidFileNameChars()
            .Union(new[] { '+', '#' })).Select(c => c.ToString(CultureInfo.InvariantCulture)).ToArray());


    private static readonly HashSet<string> ProhibitedNames = new HashSet<string>
    {
        @"aux",
        @"con",
        @"clock$",
        @"nul",
        @"prn",

        @"com1",
        @"com2",
        @"com3",
        @"com4",
        @"com5",
        @"com6",
        @"com7",
        @"com8",
        @"com9",

        @"lpt1",
        @"lpt2",
        @"lpt3",
        @"lpt4",
        @"lpt5",
        @"lpt6",
        @"lpt7",
        @"lpt8",
        @"lpt9"
    };

    public static bool IsValidFileName(string fileName)
    {
        return !string.IsNullOrWhiteSpace(fileName)
            && fileName.All(o => !IsInvalidFileNameChar(o))
            && !IsProhibitedName(fileName);
    }

    public static bool IsProhibitedName(string fileName)
    {
        return ProhibitedNames.Contains(fileName.ToLower(CultureInfo.InvariantCulture));
    }

    private static string ReplaceInvalidFileNameSymbols([CanBeNull] this string value, string replacementValue)
    {
        if (value == null)
        {
            return null;
        }

        return InvalidFileNameChars.Value.Aggregate(new StringBuilder(value),
            (sb, currentChar) => sb.Replace(currentChar, replacementValue)).ToString();
    }

    public static bool IsInvalidFileNameChar(char value)
    {
        return InvalidFileNameChars.Value.Contains(value.ToString(CultureInfo.InvariantCulture));
    }

    public static string GetValidFileName([NotNull] this string value)
    {
        return GetValidFileName(value, @"_");
    }

    public static string GetValidFileName([NotNull] this string value, string replacementValue)
    {
        if (string.IsNullOrWhiteSpace(value))
        {
            throw new ArgumentException(@"value should be non empty", nameof(value));
        }

        if (IsProhibitedName(value))
        {
            return (string.IsNullOrWhiteSpace(replacementValue) ? @"_" : replacementValue) + value; 
        }

        return ReplaceInvalidFileNameSymbols(value, replacementValue);
    }

    public static string GetFileNameError(string fileName)
    {
        if (string.IsNullOrWhiteSpace(fileName))
        {
            return CommonResources.SelectReportNameError;
        }

        if (IsProhibitedName(fileName))
        {
            return CommonResources.FileNameIsProhibited;
        }

        var invalidChars = fileName.Where(IsInvalidFileNameChar).Distinct().ToArray();

        if(invalidChars.Length > 0)
        {
            return string.Format(CultureInfo.CurrentCulture,
                invalidChars.Length == 1 ? CommonResources.InvalidCharacter : CommonResources.InvalidCharacters,
                StringExtensions.JoinQuoted(@",", @"'", invalidChars.Select(c => c.ToString(CultureInfo.CurrentCulture))));
        }

        return string.Empty;
    }
}

Method GetValidFileName replaces all incorrect data to _.

Respondent: Backs

Solution #17:

I think it is much easier to validate using a regex and specifiing which characters are allowed, instead of trying to check for all bad characters.
See these links:
http://www.c-sharpcorner.com/UploadFile/prasad_1/RegExpPSD12062005021717AM/RegExpPSD.aspx
http://www.windowsdevcenter.com/pub/a/oreilly/windows/news/csharp_0101.html

Also, do a search for “regular expression editor”s, they help a lot. There are some around which even output the code in c# for you.

Respondent: Sandor Davidhazi

Solution #18:

This seems to be O(n) and does not spend too much memory on strings:

    private static readonly HashSet<char> invalidFileNameChars = new HashSet<char>(Path.GetInvalidFileNameChars());

    public static string RemoveInvalidFileNameChars(string name)
    {
        if (!name.Any(c => invalidFileNameChars.Contains(c))) {
            return name;
        }

        return new string(name.Where(c => !invalidFileNameChars.Contains(c)).ToArray());
    }
Respondent: Alexey F

Solution #19:

Scanning over the answers here, they all** seem to involve using a char array of invalid filename characters.

Granted, this may be micro-optimising – but for the benefit of anyone who might be looking to check a large number of values for being valid filenames, it’s worth noting that building a hashset of invalid chars will bring about notably better performance.

I have been very surprised (shocked) in the past just how quickly a hashset (or dictionary) outperforms iterating over a list. With strings, it’s a ridiculously low number (about 5-7 items from memory). With most other simple data (object references, numbers etc) the magic crossover seems to be around 20 items.

There are 40 invalid characters in the Path.InvalidFileNameChars “list”. Did a search today and there’s quite a good benchmark here on StackOverflow that shows the hashset will take a little over half the time of an array/list for 40 items: https://stackoverflow.com/a/10762995/949129

Here’s the helper class I use for sanitising paths. I forget now why I had the fancy replacement option in it, but it’s there as a cute bonus.

Additional bonus method “IsValidLocalPath” too 🙂

(** those which don’t use regular expressions)

public static class PathExtensions
{
    private static HashSet<char> _invalidFilenameChars;
    private static HashSet<char> InvalidFilenameChars
    {
        get { return _invalidFilenameChars ?? (_invalidFilenameChars = new HashSet<char>(Path.GetInvalidFileNameChars())); }
    }


    /// <summary>Replaces characters in <c>text</c> that are not allowed in file names with the 
    /// specified replacement character.</summary>
    /// <param name="text">Text to make into a valid filename. The same string is returned if 
    /// it is valid already.</param>
    /// <param name="replacement">Replacement character, or NULL to remove bad characters.</param>
    /// <param name="fancyReplacements">TRUE to replace quotes and slashes with the non-ASCII characters ” and ?.</param>
    /// <returns>A string that can be used as a filename. If the output string would otherwise be empty, "_" is returned.</returns>
    public static string ToValidFilename(this string text, char? replacement = '_', bool fancyReplacements = false)
    {
        StringBuilder sb = new StringBuilder(text.Length);
        HashSet<char> invalids = InvalidFilenameChars;
        bool changed = false;

        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (invalids.Contains(c))
            {
                changed = true;
                char repl = replacement ?? ' ';
                if (fancyReplacements)
                {
                    if (c == '"') repl = '”'; // U+201D right double quotation mark
                    else if (c == ''') repl = '’'; // U+2019 right single quotation mark
                    else if (c == '/') repl = '?'; // U+2044 fraction slash
                }
                if (repl != ' ')
                    sb.Append(repl);
            }
            else
                sb.Append(c);
        }

        if (sb.Length == 0)
            return "_";

        return changed ? sb.ToString() : text;
    }


    /// <summary>
    /// Returns TRUE if the specified path is a valid, local filesystem path.
    /// </summary>
    /// <param name="pathString"></param>
    /// <returns></returns>
    public static bool IsValidLocalPath(this string pathString)
    {
        // From solution at https://stackoverflow.com/a/11636052/949129
        Uri pathUri;
        Boolean isValidUri = Uri.TryCreate(pathString, UriKind.Absolute, out pathUri);
        return isValidUri && pathUri != null && pathUri.IsLoopback;
    }
}
Respondent: Daniel Scott

Solution #20:

public static class StringExtensions
      {
        public static string RemoveUnnecessary(this string source)
        {
            string result = string.Empty;
            string regex = new string(Path.GetInvalidFileNameChars()) + new string(Path.GetInvalidPathChars());
            Regex reg = new Regex(string.Format("[{0}]", Regex.Escape(regex)));
            result = reg.Replace(source, "");
            return result;
        }
    }

You can use method clearly.

Respondent: aemre

Solution #21:

One liner to cleanup string from any illegal chars for windows file naming:

public static string CleanIllegalName(string p_testName) => new Regex(string.Format("[{0}]", Regex.Escape(new string(Path.GetInvalidFileNameChars()) + new string(Path.GetInvalidPathChars())))).Replace(p_testName, "");
Respondent: Zananok

Solution #22:

Here is my small contribution. A method to replace within the same string without creating new strings or stringbuilders. It’s fast, easy to understand and a good alternative to all mentions in this post.

private static HashSet<char> _invalidCharsHash;
private static HashSet<char> InvalidCharsHash
{
  get { return _invalidCharsHash ?? (_invalidCharsHash = new HashSet<char>(Path.GetInvalidFileNameChars())); }
}

private static string ReplaceInvalidChars(string fileName, string newValue)
{
  char newChar = newValue[0];

  char[] chars = fileName.ToCharArray();
  for (int i = 0; i < chars.Length; i++)
  {
    char c = chars[i];
    if (InvalidCharsHash.Contains(c))
      chars[i] = newChar;
  }

  return new string(chars);
}

You can call it like this:

string illegal = """M<>""a/ry/ h**ad:>> a/:*?""<>| li*tt|le|| la""mb.?";
string legal = ReplaceInvalidChars(illegal);

and returns:

_M ____a_ry_ h__ad___ a_________ li_tt_le__ la_mb._

It’s worth to note that this method will always replace invalid chars with a given value, but will not remove them. If you want to remove invalid chars, this alternative will do the trick:

private static string RemoveInvalidChars(string fileName, string newValue)
{
  char newChar = string.IsNullOrEmpty(newValue) ? char.MinValue : newValue[0];
  bool remove = newChar == char.MinValue;

  char[] chars = fileName.ToCharArray();
  char[] newChars = new char[chars.Length];
  int i2 = 0;
  for (int i = 0; i < chars.Length; i++)
  {
    char c = chars[i];
    if (InvalidCharsHash.Contains(c))
    {
      if (!remove)
        newChars[i2++] = newChar;
    }
    else
      newChars[i2++] = c;

  }

  return new string(newChars, 0, i2);
}

BENCHMARK

I executed timed test runs with most methods found in this post, if performance is what you are after. Some of these methods don’t replace with a given char, since OP was asking to clean the string. I added tests replacing with a given char, and some others replacing with an empty char if your intended scenario only needs to remove the unwanted chars. Code used for this benchmark is at the end, so you can run your own tests.

Note: Methods Test1 and Test2 are both proposed in this post.

First Run

replacing with '_', 1000000 iterations

Results:

============Test1===============
Elapsed=00:00:01.6665595
Result=_M ____a_ry_ h__ad___ a_________ li_tt_le__ la_mb._

============Test2===============
Elapsed=00:00:01.7526835
Result=_M ____a_ry_ h__ad___ a_________ li_tt_le__ la_mb._

============Test3===============
Elapsed=00:00:05.2306227
Result=_M ____a_ry_ h__ad___ a_________ li_tt_le__ la_mb._

============Test4===============
Elapsed=00:00:14.8203696
Result=_M ____a_ry_ h__ad___ a_________ li_tt_le__ la_mb._

============Test5===============
Elapsed=00:00:01.8273760
Result=_M ____a_ry_ h__ad___ a_________ li_tt_le__ la_mb._

============Test6===============
Elapsed=00:00:05.4249985
Result=_M ____a_ry_ h__ad___ a_________ li_tt_le__ la_mb._

============Test7===============
Elapsed=00:00:07.5653833
Result=_M ____a_ry_ h__ad___ a_________ li_tt_le__ la_mb._

============Test8===============
Elapsed=00:12:23.1410106
Result=_M ____a_ry_ h__ad___ a_________ li_tt_le__ la_mb._

============Test9===============
Elapsed=00:00:02.1016708
Result=_M ____a_ry_ h__ad___ a_________ li_tt_le__ la_mb._

============Test10===============
Elapsed=00:00:05.0987225
Result=M ary had a little lamb.

============Test11===============
Elapsed=00:00:06.8004289
Result=M ary had a little lamb.

Second Run

removing invalid chars, 1000000 iterations

Note: Test1 will not remove, only replace.

Results:

============Test1===============
Elapsed=00:00:01.6945352
Result= M     a ry  h  ad    a          li tt le   la mb.

============Test2===============
Elapsed=00:00:01.4798049
Result=M ary had a little lamb.

============Test3===============
Elapsed=00:00:04.0415688
Result=M ary had a little lamb.

============Test4===============
Elapsed=00:00:14.3397960
Result=M ary had a little lamb.

============Test5===============
Elapsed=00:00:01.6782505
Result=M ary had a little lamb.

============Test6===============
Elapsed=00:00:04.9251707
Result=M ary had a little lamb.

============Test7===============
Elapsed=00:00:07.9562379
Result=M ary had a little lamb.

============Test8===============
Elapsed=00:12:16.2918943
Result=M ary had a little lamb.

============Test9===============
Elapsed=00:00:02.0770277
Result=M ary had a little lamb.

============Test10===============
Elapsed=00:00:05.2721232
Result=M ary had a little lamb.

============Test11===============
Elapsed=00:00:05.2802903
Result=M ary had a little lamb.

BENCHMARK RESULTS

Methods Test1, Test2 and Test5 are the fastest. Method Test8 is the slowest.

CODE

Here’s the complete code of the benchmark:

private static HashSet<char> _invalidCharsHash;
private static HashSet<char> InvalidCharsHash
{
  get { return _invalidCharsHash ?? (_invalidCharsHash = new HashSet<char>(Path.GetInvalidFileNameChars())); }
}

private static string _invalidCharsValue;
private static string InvalidCharsValue
{
  get { return _invalidCharsValue ?? (_invalidCharsValue = new string(Path.GetInvalidFileNameChars())); }
}

private static char[] _invalidChars;
private static char[] InvalidChars
{
  get { return _invalidChars ?? (_invalidChars = Path.GetInvalidFileNameChars()); }
}

static void Main(string[] args)
{
  string testPath = """M <>""a/ry/ h**ad:>> a/:*?""<>| li*tt|le|| la""mb.?";

  int max = 1000000;
  string newValue = "";

  TimeBenchmark(max, Test1, testPath, newValue);
  TimeBenchmark(max, Test2, testPath, newValue);
  TimeBenchmark(max, Test3, testPath, newValue);
  TimeBenchmark(max, Test4, testPath, newValue);
  TimeBenchmark(max, Test5, testPath, newValue);
  TimeBenchmark(max, Test6, testPath, newValue);
  TimeBenchmark(max, Test7, testPath, newValue);
  TimeBenchmark(max, Test8, testPath, newValue);
  TimeBenchmark(max, Test9, testPath, newValue);
  TimeBenchmark(max, Test10, testPath, newValue);
  TimeBenchmark(max, Test11, testPath, newValue);

  Console.Read();
}

private static void TimeBenchmark(int maxLoop, Func<string, string, string> func, string testString, string newValue)
{
  var sw = new Stopwatch();
  sw.Start();
  string result = string.Empty;

  for (int i = 0; i < maxLoop; i++)
    result = func?.Invoke(testString, newValue);

  sw.Stop();

  Console.WriteLine($"============{func.Method.Name}===============");
  Console.WriteLine("Elapsed={0}", sw.Elapsed);
  Console.WriteLine("Result={0}", result);
  Console.WriteLine("");
}

private static string Test1(string fileName, string newValue)
{
  char newChar = string.IsNullOrEmpty(newValue) ? char.MinValue : newValue[0];

  char[] chars = fileName.ToCharArray();
  for (int i = 0; i < chars.Length; i++)
  {
    if (InvalidCharsHash.Contains(chars[i]))
      chars[i] = newChar;
  }

  return new string(chars);
}

private static string Test2(string fileName, string newValue)
{
  char newChar = string.IsNullOrEmpty(newValue) ? char.MinValue : newValue[0];
  bool remove = newChar == char.MinValue;

  char[] chars = fileName.ToCharArray();
  char[] newChars = new char[chars.Length];
  int i2 = 0;
  for (int i = 0; i < chars.Length; i++)
  {
    char c = chars[i];
    if (InvalidCharsHash.Contains(c))
    {
      if (!remove)
        newChars[i2++] = newChar;
    }
    else
      newChars[i2++] = c;

  }

  return new string(newChars, 0, i2);
}

private static string Test3(string filename, string newValue)
{
  foreach (char c in InvalidCharsValue)
  {
    filename = filename.Replace(c.ToString(), newValue);
  }

  return filename;
}

private static string Test4(string filename, string newValue)
{
  Regex r = new Regex(string.Format("[{0}]", Regex.Escape(InvalidCharsValue)));
  filename = r.Replace(filename, newValue);
  return filename;
}

private static string Test5(string filename, string newValue)
{
  return string.Join(newValue, filename.Split(InvalidChars));
}

private static string Test6(string fileName, string newValue)
{
  return InvalidChars.Aggregate(fileName, (current, c) => current.Replace(c.ToString(), newValue));
}

private static string Test7(string fileName, string newValue)
{
  string regex = string.Format("[{0}]", Regex.Escape(InvalidCharsValue));
  return Regex.Replace(fileName, regex, newValue, RegexOptions.Compiled);
}

private static string Test8(string fileName, string newValue)
{
  string regex = string.Format("[{0}]", Regex.Escape(InvalidCharsValue));
  Regex removeInvalidChars = new Regex(regex, RegexOptions.Singleline | RegexOptions.Compiled | RegexOptions.CultureInvariant);
  return removeInvalidChars.Replace(fileName, newValue);
}

private static string Test9(string fileName, string newValue)
{
  StringBuilder sb = new StringBuilder(fileName.Length);
  bool changed = false;

  for (int i = 0; i < fileName.Length; i++)
  {
    char c = fileName[i];
    if (InvalidCharsHash.Contains(c))
    {
      changed = true;
      sb.Append(newValue);
    }
    else
      sb.Append(c);
  }

  if (sb.Length == 0)
    return newValue;

  return changed ? sb.ToString() : fileName;
}

private static string Test10(string fileName, string newValue)
{
  if (!fileName.Any(c => InvalidChars.Contains(c)))
  {
    return fileName;
  }

  return new string(fileName.Where(c => !InvalidChars.Contains(c)).ToArray());
}

private static string Test11(string fileName, string newValue)
{
  string invalidCharsRemoved = new string(fileName
    .Where(x => !InvalidChars.Contains(x))
    .ToArray());

  return invalidCharsRemoved;
}
Respondent: c-chavez

Solution #23:

public static bool IsValidFilename(string testName)
{
    return !new Regex("[" + Regex.Escape(new String(System.IO.Path.GetInvalidFileNameChars())) + "]").IsMatch(testName);
}
Respondent: mbdavis

Solution #24:

This will do want you want, and avoid collisions

 static string SanitiseFilename(string key)
    {
        var invalidChars = Path.GetInvalidFileNameChars();
        var sb = new StringBuilder();
        foreach (var c in key)
        {
            var invalidCharIndex = -1;
            for (var i = 0; i < invalidChars.Length; i++)
            {
                if (c == invalidChars[i])
                {
                    invalidCharIndex = i;
                }
            }
            if (invalidCharIndex > -1)
            {
                sb.Append("_").Append(invalidCharIndex);
                continue;
            }

            if (c == '_')
            {
                sb.Append("__");
                continue;
            }

            sb.Append(c);
        }
        return sb.ToString();

    }
Respondent: mcintyre321

Solution #25:

I think the question already not full answered…
The answers only describe clean filename OR path… not both. Here is my solution:

private static string CleanPath(string path)
{
    string regexSearch = new string(Path.GetInvalidFileNameChars()) + new string(Path.GetInvalidPathChars());
    Regex r = new Regex(string.Format("[{0}]", Regex.Escape(regexSearch)));
    List<string> split = path.Split('').ToList();
    string returnValue = split.Aggregate(string.Empty, (current, s) => current + (r.Replace(s, "") + @"""));
    returnValue = returnValue.TrimEnd('');
    return returnValue;
}
Respondent: Suplanus

Solution #26:

I created an extension method that combines several suggestions:

  1. Holding illegal characters in a hash set
  2. Filtering out characters below ascii 127. Since Path.GetInvalidFileNameChars does not include all invalid characters possible with ascii codes from 0 to 255. See here and MSDN
  3. Possiblity to define the replacement character

Source:

public static class FileNameCorrector
{
    private static HashSet<char> invalid = new HashSet<char>(Path.GetInvalidFileNameChars());

    public static string ToValidFileName(this string name, char replacement = ' ')
    {
        var builder = new StringBuilder();
        foreach (var cur in name)
        {
            if (cur > 31 && cur < 128 && !invalid.Contains(cur))
            {
                builder.Append(cur);
            }
            else if (replacement != ' ')
            {
                builder.Append(replacement);
            }
        }

        return builder.ToString();
    }
}
Respondent: schoetbi

Solution #27:

Here is a function which replaces all illegal characters in a file name by a replacement character:

public static string ReplaceIllegalFileChars(string FileNameWithoutPath, char ReplacementChar)
{
  const string IllegalFileChars = "*?/:<>|""";
  StringBuilder sb = new StringBuilder(FileNameWithoutPath.Length);
  char c;

  for (int i = 0; i < FileNameWithoutPath.Length; i++)
  {
    c = FileNameWithoutPath[i];
    if (IllegalFileChars.IndexOf(c) >= 0)
    {
      c = ReplacementChar;
    }
    sb.Append(c);
  }
  return (sb.ToString());
}

For example the underscore can be used as a replacement character:

NewFileName = ReplaceIllegalFileChars(FileName, '_');
Respondent: Hans-Peter Kalb

Solution #28:

I’ve rolled my own method, which seems to be a lot faster of other posted here (especially the regex which is so sloooooow) but I didn’t tested all methods posted.

https://dotnetfiddle.net/haIXiY

The first method (mine) and second (also mine, but old one) also do an added check on backslashes, so the benchmark are not perfect, but anyways it’s just to give you an idea.

Result on my laptop (for 100 000 iterations):

StringHelper.RemoveInvalidCharacters 1: 451 ms  
StringHelper.RemoveInvalidCharacters 2: 7139 ms  
StringHelper.RemoveInvalidCharacters 3: 2447 ms  
StringHelper.RemoveInvalidCharacters 4: 3733 ms  
StringHelper.RemoveInvalidCharacters 5: 11689 ms  (==> Regex!)

The fastest method:

public static string RemoveInvalidCharacters(string content, char replace = '_', bool doNotReplaceBackslashes = false)
{
    if (string.IsNullOrEmpty(content))
        return content;

    var idx = content.IndexOfAny(InvalidCharacters);
    if (idx >= 0)
    {
        var sb = new StringBuilder(content);
        while (idx >= 0)
        {
            if (sb[idx] != '' || !doNotReplaceBackslashes)
                sb[idx] = replace;
            idx = content.IndexOfAny(InvalidCharacters, idx+1);
        }
        return sb.ToString();
    }
    return content;
}

Method doesn’t compile “as is” dur to InvalidCharacters property, check the fiddle for full code

Respondent: Fabske

Solution #29:

If you have to use the method in many places in a project, you could also make an extension method and call it anywhere in the project for strings.

 public static class StringExtension
    {
        public static string RemoveInvalidChars(this string originalString)
        {            
            string finalString=string.Empty;
            if (!string.IsNullOrEmpty(originalString))
            {
                return string.Concat(originalString.Split(Path.GetInvalidFileNameChars()));
            }
            return finalString;            
        }
    }

You can call the above extension method as:

string illegal = """M<>""a/ry/ h**ad:>> a/:*?""<>| li*tt|le|| la""mb.?";
string afterIllegalChars = illegal.RemoveInvalidChars();
Respondent: Simant

Solution #30:

Or you can just do

[YOUR STRING].Replace('', ' ').Replace('/', ' ').Replace('"', ' ').Replace('*', ' ').Replace(':', ' ').Replace('?', ' ').Replace('<', ' ').Replace('>', ' ').Replace('|', ' ').Trim();
Respondent: Danny Fallas

The answers/resolutions are collected from stackoverflow, are licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0 .

Leave a Reply

Your email address will not be published.