[Solved] PowerShell: Import-CSV with no headers and remove partial duplicate lines
I have a log file that is formatted as a CSV with no headers. The first column is basically the unique identifier for the issues being recorded. There may be multiple lines with different details for the same issue identifier. I would like to remove lines where the first column is duplicated because I don’t need the other data at this time.
I have fairly basic knowledge of PowerShell at this point, so I’m sure there’s something simple I’m missing.
I’m sorry if this is a duplicate; I could find questions that answer parts of this, but not the question as a whole.
So far, my best guess is:
Import-Csv $outFile | % { Select-Object -Index 1 -Unique } | Out-File $outFile -Append
But this gives me the error:
Import-Csv : The member “LB” is already present.
At C:\Users\jnurczyk\Desktop\Scratch\POImport\getPOImport.ps1:6 char:1
+ Import-Csv $outFile | % { Select-Object -InputObject $_ -Index 1 -Unique } | Out …
+ ~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : NotSpecified: (:) [Import-Csv], ExtendedTypeSystemException
+ FullyQualifiedErrorId : AlreadyPresentPSMemberInfoInternalCollectionAdd,Microsoft.PowerShell.Commands.ImportCsvCommand
Solution #1:
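Because your data has no headers, Import-Csv treats the first line of the file as the header row, and a value that appears twice in that line (such as “LB”) causes the “member is already present” error. You need to specify the headers yourself with the -Header parameter of the Import-Csv cmdlet. Then, to select only unique records using the first column, pass that column to the Select-Object cmdlet. See the code below: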
Import-Csv $outFile -Header A,B,C | Select-Object -Unique A
To clarify, the headers in my example are A, B, and C. This works if you know how many columns there are. If you have too few headers, then columns are dropped. If you have too many headers, then they become empty fields.
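To make this concrete, here is a minimal sketch that runs against a small throwaway file. The file name, the sample rows, and the variable names are all made up for illustration; only the header names A, B, and C come from the answer. The Group-Object line is an alternative not given in the answer, for the case where you want to keep the first full row for each identifier instead of only the identifier column.

```powershell
# Hypothetical sample data standing in for the headerless log file.
$sample = Join-Path ([System.IO.Path]::GetTempPath()) 'sample-log.csv'
Set-Content -Path $sample -Value @(
    'LB,first detail,10'
    'LB,second detail,20'
    'XQ,other detail,30'
)

# -Header supplies the column names, so the first line is read as data.
$rows = Import-Csv -Path $sample -Header A,B,C

# Unique values of the first column only; the result objects carry just the A property.
$ids = $rows | Select-Object -Property A -Unique

# Alternative (an assumption, not part of the answer): keep the first full row per identifier.
$firstRows = $rows | Group-Object -Property A | ForEach-Object { $_.Group[0] }

Remove-Item -Path $sample
```

Note that Select-Object -Unique compares values case-sensitively, so "LB" and "lb" would count as different identifiers.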
Solution #2:
Every time I look for a solution to this issue I run across this thread. However, the solution accepted here is more generic than I would like. The function below increments a suffix each time it sees the same header name: A, B, C, A1, D, A2, C1, etc.
Function Import-CSVCustom ($csvTemp) {
$StreamReader = New-Object System.IO.StreamReader -Arg $csvTemp
[array]$Headers = $StreamReader.ReadLine() -Split "," | % { "$_".Trim() } | ? { $_ }
$StreamReader.Close()
$a = @{}; $Headers = $Headers | % {
if($a.$_.count) {"$_$($a.$_.count)"} else {$_}
$a.$_ += @($_)
}
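# With -Header, Import-Csv returns the file's original header line as a data row, so skip it.
Import-Csv $csvTemp -Header $Headers | Select-Object -Skip 1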
}
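The renaming step can be tried on its own, without a file. This is a minimal sketch, assuming a hypothetical list of header names; the variable names are illustrative, and it keeps a simple counter per name rather than the array-append trick used in the function.

```powershell
# Hypothetical header names containing repeats.
$names = 'A', 'B', 'C', 'A', 'D', 'A', 'C'

# Count how many times each name has been seen so far.
$seen = @{}
$unique = foreach ($n in $names) {
    # A repeated name gets the number of earlier occurrences as a suffix.
    if ($seen.ContainsKey($n)) { "$n$($seen[$n])" } else { $n }
    $seen[$n] = 1 + $seen[$n]   # a missing key reads as $null, and 1 + $null is 1
}

$unique -join ','   # A,B,C,A1,D,A2,C1
```

The resulting array can be passed straight to Import-Csv through -Header.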
Solution #3:
To expand upon Benjamin Hubbard’s post, here is a little SQL script I use to create the -Header parameter in my script (assuming, of course, that you will be inserting this data into a table in a database):
SELECT
    '-Header '
    + STUFF((SELECT
                 ',' + QUOTENAME(COLUMN_NAME, '"')
                 + CASE WHEN C.ORDINAL_POSITION % 5 = 0 THEN ' `' + CHAR(13) + CHAR(10) ELSE '' END
             FROM
                 INFORMATION_SCHEMA.COLUMNS C
             WHERE
                 TABLE_NAME = '<Staging Table Name>'
             ORDER BY
                 C.ORDINAL_POSITION  -- keep the headers in table column order
             FOR XML PATH (''), type).value('.', 'nvarchar(max)'), 1, 1, '')