Updated pixels analysis #1

Open
Aidan-MT wants to merge 20 commits into DuguidLab:master from Aidan-MT:master
Commits (20)
e0f43d7  First Commit
1bdfe1e  Moved confidence interval function to new folder
c714062  Updated noise analysis to run as a function.
5d752d2  Formatted files, updated and created figures
2b890cd  Created file containing noise clustering function/plot
0002ff3  Updated CI analysis function
3bb076c  Updated data wrangling in function, should only run when required
f9f7256  Updated functions to utilise multiple experimental sessions and gener…
ab5255f  Moved figures
c7f5f7b  Added a number of new scripts to my analyses folder concerning the fi…
be089ce  Merge branch 'DuguidLab:master' into master
f543b03  Deleted Superfluous Files
8aaa5ec  Deleted Duplicate File
9e45ffb  Updated statistical tests and added Fano Fac calc
7883abd  Merge branch 'DuguidLab:master' into master
8ede878  Updated K-Means Clustering to Include Plot Func.
7c37a03  Added plot function to SD clustering
266c567  Updated functions to return axes
49d6914  Updated imports, removed typos/comments
cbb794e  Updated functions script with fano factor calculations, added two new…
pixtools/clusters/noise_analysis_SD_kmeans_Clustering.py
156 changes: 156 additions & 0 deletions
@@ -0,0 +1,156 @@
# First import required packages
import sys
import json
from sklearn.cluster import KMeans

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import probeinterface as pi


def meta_spikeglx(exp, session):
    """
    Simply extracts channel depth from probe metadata.

    exp: exp class containing mouse IDs

    session: specific recording session to extract information from
    """
    meta = exp[session].files[0]["spike_meta"]
    data_path = exp[session].find_file(meta)
    data = pi.read_spikeglx(data_path)

    return data


def noise_per_channeldepth(myexp):
    """
    Extracts the noise for each channel, combining this into a dataframe.

    myexp: the experiment defined in base.py; the depth information is extracted from here.
    """
    noise = pd.DataFrame(
        columns=["session", "project", "SDs", "x", "y"]
    )  # Create the empty dataframe to hold the noise information
    depths = meta_spikeglx(myexp, 0)
    depths = depths.to_dataframe()
    coords = depths[
        ["x", "y"]
    ]  # Create a dataframe containing the generic x and y coords.
    tot_noise = []

    # Iterate through each session, taking the noise for each file and loading them into one continuous data frame.
    for s, session in enumerate(myexp):
        for i in range(len(session.files)):
            path = session.processed / f"noise_{i}.json"
            with path.open() as fd:
                ses_noise = json.load(fd)

            chan_noises = []
            for j, SD in enumerate(
                ses_noise["SDs"][0:-1]
            ):  # This will iterate over the first 384 channels and exclude the sync channel
                x = coords["x"].iloc[j]
                y = coords["y"].iloc[j]
                noise_row = pd.DataFrame.from_records(
                    {"session": [session.name], "SDs": [SD], "x": [x], "y": [y]}
                )
                chan_noises.append(noise_row)

            # Take all the channel noises for a session, then concatenate
            noise = pd.concat(chan_noises)
            tot_noise.append(noise)  # Take all channel noises and add to a master list
    df2 = pd.concat(
        tot_noise
    )  # Convert this master list, containing every session's noise data, into a dataframe

    return df2


def elbowplot(data, myexp):
    """
    This function takes data formatted according to noise_per_channeldepth(), containing the noise values for all channels.
    It iterates through each experimental session, producing the appropriate graph. Take the optimal number of clusters as the point at which the elbow bends.
    This point is defined as the boundary where additional clusters no longer explain much more variance in the data.

    data: the dataframe, as formatted by noise_per_channeldepth()

    myexp: the experiment, defined in base.py, containing the session information
    """

    for s, session in enumerate(myexp):
        name = session.name
        ses_data = data.loc[data["session"] == name]
        df3 = ses_data["SDs"].values.reshape(
            -1, 1
        )  # Just gives all noise values, for each session
        Sum_of_squares = []  # Create an empty list to store these in.

        k = range(1, 10)
        for num_clusters in k:
            kmeans = KMeans(n_clusters=num_clusters)
            kmeans.fit(df3)
            Sum_of_squares.append(kmeans.inertia_)

        fig, ax = plt.subplots()

        # This plots the elbow graph to give an overview of the variance in the data explained by varying the number of clusters
        # This gives the distance from the centroids, as a measure of the variability explained
        # We want this to drop off, indicating that no remaining data is explained by further centroid inclusion

        plt.plot(k, Sum_of_squares, "bx-")  # bx gives a blue x at each point.
        plt.xlabel("Putative Number of Clusters")
        plt.ylabel("Sum of Squares Distances/Inertia")
        plt.title(
            f"Determining Optimal Number of Clusters for Analysis - Session {name}"
        )

    f = plt.gca()
    return f


def clusterplot(data, myexp, cluster_num):
    """
    Takes the noise and depth information per channel, produced by noise_per_channeldepth(), and produces a clusterplot.
    Clustering is performed by k-means analysis; an elbow plot should first be produced by elbowplot() to determine the optimal cluster number.

    data: data produced by noise_per_channeldepth(), containing channel ID, coordinate, and recording noise for each session in the exp class

    myexp: the exp class containing mouse IDs

    cluster_num: the number of clusters to produce through the k-means analysis, determined by qualitative analysis of the elbow plots (where the "bow" of the line occurs)
    """

    # First define k-means parameters for clustering
    kmeans = KMeans(
        init="random",  # Initiate the iterative analysis with random centres
        n_clusters=cluster_num,  # How many clusters to bin the data into, based on the elbow analysis!
        n_init=10,  # Number of initialisations to run with different centroid seeds
        max_iter=300,  # Max number of iterations before ceasing analysis
        random_state=42,  # The random number seed for centroid generation; can really be anything for our purposes
    )

    for s, session in enumerate(myexp):
        name = session.name

        ses = data.loc[data["session"] == name]
        sd = ses["SDs"].values.reshape(-1, 1)
        y_means = kmeans.fit_predict(sd)

        # Now plot the k-means analysis
        # Remember we use our original data (ses) but use the clustering analysis to generate the labels
        plt.scatter(ses["y"], ses["SDs"], c=y_means, cmap="viridis")

        plt.xlabel("Probe Channel Y-Coordinate")
        plt.ylabel("Channel Noise (SD)")
        plt.title(f"{name} Channel Noise k-Means Clustering Analysis")

    f = plt.gca()
    return f
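For reference, a minimal usage sketch of these functions. This is hypothetical: it assumes myexp is the experiment object defined in base.py, and that the module is importable from the path shown above; neither import is part of this diff.

import matplotlib.pyplot as plt

from base import myexp  # hypothetical import; the exp object the docstrings refer to
from pixtools.clusters.noise_analysis_SD_kmeans_Clustering import (
    noise_per_channeldepth,
    elbowplot,
    clusterplot,
)

# Gather per-channel noise (SDs) plus x/y probe coordinates for every session.
noise = noise_per_channeldepth(myexp)

# Inspect the elbow plot to choose a cluster count for each session.
elbowplot(noise, myexp)
plt.show()

# cluster_num is illustrative here; read it off the elbow plot for the real data.
clusterplot(noise, myexp, cluster_num=3)
plt.show()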
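The elbow heuristic that elbowplot relies on can also be seen on synthetic data. Below is a standalone sketch, independent of the pixels data and using scikit-learn's make_blobs, showing the inertia drop-off that marks a reasonable cluster count:

import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic one-dimensional "noise" values drawn from three underlying clusters.
X, _ = make_blobs(n_samples=300, centers=3, n_features=1, random_state=42)

ks = range(1, 10)
inertias = []
for k in ks:
    # inertia_ is the sum of squared distances to the nearest centroid
    inertias.append(KMeans(n_clusters=k, n_init=10, random_state=42).fit(X).inertia_)

# Inertia falls steeply up to k=3, then flattens: the bend ("elbow") marks the
# point where additional centroids stop explaining much more variance.
plt.plot(ks, inertias, "bx-")
plt.xlabel("Number of Clusters")
plt.ylabel("Sum of Squared Distances/Inertia")
plt.show()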
@@ -0,0 +1,81 @@
# Imports required by the functions below
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from textwrap import wrap


def significance_extraction(CI):
    """
    This function takes the output of the get_aligned_spike_rate_CI method under the myexp class and extracts any significant values, returning a dataframe in the same format.

    CI: the dataframe created by the CI calculation previously mentioned
    """

    sig = []
    keys = []
    rec_num = 0

    # This loop iterates through each column, storing the data as unit and the location as s
    for s, unit in CI.items():
        # Now iterate through each recording and unit
        # Take any significant values and append them to lists.
        if unit.loc[2.5] > 0 or unit.loc[97.5] < 0:
            sig.append(unit)  # Append the percentile information for this column to a list
            keys.append(s)  # Append the column key marking the point at which the iteration currently stands

    # Now convert this list to a dataframe, using the information stored in the keys list to index it
    sigs = pd.concat(
        sig, axis=1, copy=False,
        keys=keys,
        names=["session", "unit", "rec_num"]
    )

    return sigs


def percentile_plot(CIs, sig_CIs, exp, sig_only=False, dir_ascending=False):
    """
    This function takes the CI data and significant values and plots them relative to zero.
    May specify if percentiles should be plotted in ascending or descending order.

    CIs: the output of the get_aligned_spike_rate_CI function, i.e., bootstrapped confidence intervals for spike rates relative to two points

    sig_CIs: the output of the significance_extraction function, i.e., the units from the bootstrapping analysis whose confidence intervals do not straddle zero

    exp: the experimental session to analyse, defined in base.py

    sig_only: whether to plot only the significant values obtained from the bootstrapping analysis (True/False)

    dir_ascending: whether to plot the values in ascending order (True/False)
    """
    # First sort the data into long form for the full dataset, by percentile
    CIs_long = CIs.reset_index().melt("percentile").sort_values("value", ascending=dir_ascending)
    CIs_long = CIs_long.reset_index()
    CIs_long["index"] = pd.Series(range(0, CIs_long.shape[0]))  # Reset the index column to allow ordered plotting

    # Now select if we want only significant values plotted; otherwise raise an error.
    if sig_only is True:
        CIs_long_sig = sig_CIs.reset_index().melt("percentile").sort_values("value", ascending=dir_ascending)
        CIs_long_sig = CIs_long_sig.reset_index()
        CIs_long_sig["index"] = pd.Series(range(0, CIs_long_sig.shape[0]))

        data = CIs_long_sig

    elif sig_only is False:
        data = CIs_long

    else:
        raise TypeError("sig_only argument must be a boolean (True/False)")

    # Plot this data for the experimental sessions as a pointplot.
    for s, session in enumerate(exp):
        name = session.name

        p = sns.pointplot(
            x="unit", y="value", data=data.loc[(data.session == s)],
            order=data.loc[(data.session == s)]["unit"].unique(), join=False, legend=None)  # Plots in the order of the units as previously set; unique values prevent double plotting

        p.set_xlabel("Unit")
        p.set_ylabel("Confidence Interval")
        p.set(xticklabels=[])
        p.axhline(0)
        plt.suptitle("\n".join(wrap(f"Confidence Intervals By Unit - Grasp vs. Baseline - Session {name}")))  # Wraps the title of the plot to fit on the page.

        plt.show()
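To make the expected input concrete, here is a small synthetic sketch. It assumes, as the code above implies, that the CI dataframe has rows indexed by percentile (2.5, 50, 97.5) and columns forming a (session, unit, rec_num) MultiIndex; the numbers are invented for illustration, and significance_extraction is the function defined above.

import pandas as pd

cols = pd.MultiIndex.from_tuples(
    [(0, 101, 0), (0, 102, 0), (1, 201, 0)],
    names=["session", "unit", "rec_num"],
)
CI = pd.DataFrame(
    [[0.4, -1.2, -3.0],   # 2.5th percentile
     [1.1,  0.1, -1.5],   # median
     [2.0,  1.3, -0.2]],  # 97.5th percentile
    index=pd.Index([2.5, 50.0, 97.5], name="percentile"),
    columns=cols,
)

sigs = significance_extraction(CI)
# Unit 101 (interval entirely above zero) and unit 201 (entirely below zero)
# are kept; unit 102 straddles zero and is dropped.
print(sigs.columns.tolist())

The full and significant dataframes would then be passed on as percentile_plot(CI, sigs, myexp, sig_only=True), with myexp defined in base.py as above.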
@@ -0,0 +1,81 @@
# Imports required by the functions below
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from textwrap import wrap


def significance_extraction(CI):
    """
    This function takes the output of the get_aligned_spike_rate_CI method under the myexp class and extracts any significant values, returning a dataframe in the same format.

    CI: the dataframe created by the CI calculation previously mentioned
    """

    sig = []
    keys = []
    rec_num = 0

    # This loop iterates through each column, storing the data as unit and the location as s
    for s, unit in CI.items():
        # Now iterate through each recording and unit
        # Take any significant values and append them to lists.
        if unit.loc[2.5] > 0 or unit.loc[97.5] < 0:
            sig.append(unit)  # Append the percentile information for this column to a list
            keys.append(s)  # Append the column key marking the point at which the iteration currently stands

    # Now convert this list to a dataframe, using the information stored in the keys list to index it
    sigs = pd.concat(
        sig, axis=1, copy=False,
        keys=keys,
        names=["session", "unit", "rec_num"]
    )

    return sigs


def percentile_plot(CIs, sig_CIs, exp, sig_only=False, dir_ascending=False):
    """
    This function takes the CI data and significant values and plots them relative to zero.
    May specify if percentiles should be plotted in ascending or descending order.

    CIs: the output of the get_aligned_spike_rate_CI function, i.e., bootstrapped confidence intervals for spike rates relative to two points

    sig_CIs: the output of the significance_extraction function, i.e., the units from the bootstrapping analysis whose confidence intervals do not straddle zero

    exp: the experimental session to analyse, defined in base.py

    sig_only: whether to plot only the significant values obtained from the bootstrapping analysis (True/False)

    dir_ascending: whether to plot the values in ascending order (True/False)
    """
    # First sort the data into long form for the full dataset, by percentile
    CIs_long = CIs.reset_index().melt("percentile").sort_values("value", ascending=dir_ascending)
    CIs_long = CIs_long.reset_index()
    CIs_long["index"] = pd.Series(range(0, CIs_long.shape[0]))  # Reset the index column to allow ordered plotting

    # Now select if we want only significant values plotted; otherwise raise an error.
    if sig_only is True:
        CIs_long_sig = sig_CIs.reset_index().melt("percentile").sort_values("value", ascending=dir_ascending)
        CIs_long_sig = CIs_long_sig.reset_index()
        CIs_long_sig["index"] = pd.Series(range(0, CIs_long_sig.shape[0]))

        data = CIs_long_sig

    elif sig_only is False:
        data = CIs_long

    else:
        raise TypeError("sig_only argument must be a boolean (True/False)")

    # Plot this data for the experimental sessions as a pointplot.
    for s, session in enumerate(exp):
        name = session.name

        p = sns.pointplot(
            x="unit", y="value", data=data.loc[(data.session == s)],
            order=data.loc[(data.session == s)]["unit"].unique(), join=False, legend=None)  # Plots in the order of the units as previously set; unique values prevent double plotting

        p.set_xlabel("Unit")
        p.set_ylabel("Confidence Interval")
        p.set(xticklabels=[])
        p.axhline(0)
        plt.suptitle("\n".join(wrap(f"Confidence Intervals By Unit - Grasp vs. Baseline - Session {name}")))  # Wraps the title of the plot to fit on the page.

        plt.show()
File renamed without changes.
@@ -0,0 +1,18 @@
{
    // Use IntelliSense to learn about possible attributes.
    // Hover to view descriptions of existing attributes.
    // For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Python: Current File",
            "type": "python",
            "request": "launch",
            "program": "${file}",
            "console": "integratedTerminal",
            "stopOnEntry": true,
            "justMyCode": false,
            "editor.bracketPairColorization.independentColorPoolPerBracketType": true
        }
    ]
}
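As an aside on this configuration: it debugs whichever file is currently open (program: ${file}), pauses on the first line (stopOnEntry), and steps into library code (justMyCode: false). The editor.bracketPairColorization.independentColorPoolPerBracketType key looks like an editor setting that belongs in settings.json rather than a launch attribute, so VS Code will likely ignore or flag it here.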
@@ -0,0 +1,3 @@
{
    "python.formatting.provider": "black"
}
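This setting points the VS Code Python extension at black for formatting, which presumably produced the "Formatted files, updated and created figures" commit above; black itself must be installed in the environment (for example with pip install black) for it to take effect.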
Review comment:
It seems that the functions in this module are duplicated elsewhere in one of your own scripts. What's the difference between them?