About

Observe existence, drink tea, experiment, cognize

Live services

Click and test it out
  • Memo cards
    A service that helps you memorize and review words to enrich your vocabulary.
  • Status Page
    Check whether the live services are running.

Articles

08-2023

Generated by ChatGPT

The author of the article initially planned to write about the Plotly community but shifted their focus to analyzing the performance of a multithreaded parser they implemented. They wanted to determine if the parser had any performance issues and if the data requesting, writing, and parsing processes were functioning correctly. The parser generated a log file containing different types of events and activities performed by the threads. The author used the klib library to visualize the dataset and discovered interesting insights such as the high frequency of certain events, the variety of values in the thread_name category, and the dominance of a single value in the target category. The continuous running index indicated that the parser ran without interruptions. The author further analyzed the event frequency per target and per thread, revealing the distribution of events across different targets and threads. The analysis showed that the event "task.get" was more prevalent in both targets, but more dominant in the "latest" target. The frequency per thread analysis indicated that the workload of each thread was similar, suggesting that the parser was functioning as expected. The author then explored the network latency time series and computed the time delta between events data.received and data.get. The latency overall was found to be between 2.5 and 5.0 seconds, with some larger values. The author plotted the latency distribution and identified three main groups. The latency per thread analysis showed that producers and consumers had different latency patterns, with producers having a more volatile pattern. The author concluded that the performance of the parser may vary depending on the thread type and recommended examining the data separately for producers and consumers. In addition, the author analyzed the disk write time series and found that producers wrote less intensively than consumers. 
They also identified patterns in the write load, such as spikes at certain times for certain consumers. The author used quantile plots to detect anomalies in the write times and concluded that the parser was performing well, as only a small percentage of writes took longer than expected. The afterword suggests that the subjects discussed in the article can be applied to a typical parser working with network resources and disk writes in parallel. The author did not anticipate the volatile nature of the Producers' work, but overall, the parser performed well.
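
The latency computation described above can be sketched as follows. This is a minimal illustration, not the author's code: the record layout (timestamp, thread name, event name) and the sample values are assumptions, and only the event names data.get and data.received come from the summary.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median

# Hypothetical log records: (ISO timestamp, thread name, event name).
LOG = [
    ("2023-08-01T10:00:00.000", "producer-1", "data.get"),
    ("2023-08-01T10:00:02.500", "producer-1", "data.received"),
    ("2023-08-01T10:00:01.000", "consumer-1", "data.get"),
    ("2023-08-01T10:00:05.000", "consumer-1", "data.received"),
    ("2023-08-01T10:00:06.000", "consumer-1", "data.get"),
    ("2023-08-01T10:00:09.000", "consumer-1", "data.received"),
]


def latencies_per_thread(records):
    """Return {thread: [seconds between data.get and the next data.received]}."""
    pending = {}  # thread -> time of the last unanswered data.get
    result = defaultdict(list)
    # ISO timestamps in one format sort chronologically as plain strings.
    for stamp, thread, event in sorted(records):
        moment = datetime.fromisoformat(stamp)
        if event == "data.get":
            pending[thread] = moment
        elif event == "data.received" and thread in pending:
            delta = moment - pending.pop(thread)
            result[thread].append(delta.total_seconds())
    return dict(result)


if __name__ == "__main__":
    for thread, values in latencies_per_thread(LOG).items():
        print(thread, values, "median:", median(values))
```

Grouping by thread keeps producers and consumers apart, which matches the recommendation to examine the two thread types separately.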

06-2023

Generated by ChatGPT

The individual is experiencing RAM issues after updating their laptop OS from Kubuntu 20.04 to 22.04. They are unable to expand the memory due to lack of available slots, so they need to resize the swap volume. The process involves shrinking the file system, reducing the LVM volume, and extending the file system. They also increase the size of their swap volume by adding 3GB of space. The author found that a larger swap size improved the stability and performance of their system and questions why the standard Ubuntu installation suggests a small swap size.

03-2023

Generated by ChatGPT

The research focuses on the activity of moderators in the Streamlit community. The goal is to analyze how well the company develops its community by answering questions from beginners to advanced users. Data is collected and preprocessed for analysis, including information such as topic ID, user ID, username, post ID, creation time, post number, staff status, moderator status, and admin status. The data is collected from the Streamlit website's API endpoints, which provide access to topics and posts in JSON format. The collected data is then denormalized, converted to tabular format using Dask for parallel processing, and saved as Parquet files. The posts are serialized, fields are reduced, and duplicates are filtered out, after which the data is ready for computing statistics and finding trends. The top two moderators are the most active, with randyzwitch as the sole leader. The distribution of responses over the community's lifetime shows that there are many unanswered topics on the right side of the distribution. The response frequency also decreases on the right side. On the left side, almost all topics have a response, suggesting that they are closed after a long period of time. The first response delay statistics show a large dispersion, indicating that the dataset should be trimmed to compute a better fit. In particular, the responses that took 393 and 1093 days are abnormally large, have zero value to the questioners, and should not be considered. The majority of responses are given within approximately 25 hours, or about one day, with a few coming in later. The common delay statistics summarize the above findings, indicating that the average response time from a moderator in the Streamlit community is 5.6 hours, with a range of 1.59 to 15.86 hours.
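
A first-response delay of the kind discussed above could be computed roughly like this. The sketch uses plain Python instead of the Dask and Parquet pipeline from the article, and the field names (topic_id, post_number, created_at, moderator) and sample posts are assumptions made for illustration.

```python
from datetime import datetime
from statistics import median

# Hypothetical, already flattened posts; topic 3 has no moderator reply.
POSTS = [
    {"topic_id": 1, "post_number": 1, "created_at": "2023-03-01T08:00:00", "moderator": False},
    {"topic_id": 1, "post_number": 2, "created_at": "2023-03-01T13:00:00", "moderator": True},
    {"topic_id": 2, "post_number": 1, "created_at": "2023-03-02T09:00:00", "moderator": False},
    {"topic_id": 2, "post_number": 2, "created_at": "2023-03-02T10:00:00", "moderator": False},
    {"topic_id": 2, "post_number": 3, "created_at": "2023-03-02T12:00:00", "moderator": True},
    {"topic_id": 3, "post_number": 1, "created_at": "2023-03-03T09:00:00", "moderator": False},
]


def first_response_delays(posts):
    """Return {topic_id: hours until the first moderator reply}.

    Topics without a moderator reply are omitted.
    """
    opened = {}  # topic_id -> creation time of the opening post
    delays = {}
    ordered = sorted(posts, key=lambda p: (p["topic_id"], p["post_number"]))
    for post in ordered:
        topic = post["topic_id"]
        created = datetime.fromisoformat(post["created_at"])
        if post["post_number"] == 1:
            opened[topic] = created
        elif post["moderator"] and topic in opened and topic not in delays:
            delays[topic] = (created - opened[topic]).total_seconds() / 3600
    return delays


if __name__ == "__main__":
    delays = first_response_delays(POSTS)
    print(delays)
    print("median hours:", median(delays.values()))
```

The median is used here because, as the summary notes, a few very late responses would distort an untrimmed mean.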

03-2023

Generated by ChatGPT

The author is researching the most active users in a Telegram group chat. They have prepared the data for analysis by converting it into a DataFrame and filtering out missing data. They have created a top 10 leaderboard based on the number of messages sent by each user, and have split the data into facets per month. The top two positions on the leaderboard remain consistent each month, with the group owner being the most active and a regular user being the second most active. The frequency of messages from the top two leaders is higher than others, and analyzing the distribution of these messages can reveal new activity trends. The author has gained a new understanding of the message ratios of the top two individuals compared to others.
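
A per-month leaderboard like the one described can be sketched with the standard library alone. The article works with a DataFrame; the message fields ("date", "from") and the sample data below are assumptions, loosely modeled on a chat export.

```python
from collections import Counter, defaultdict

# Hypothetical messages; the last one lacks a sender and is filtered out.
MESSAGES = [
    {"date": "2023-01-05", "from": "owner"},
    {"date": "2023-01-06", "from": "owner"},
    {"date": "2023-01-07", "from": "owner"},
    {"date": "2023-01-07", "from": "alice"},
    {"date": "2023-01-09", "from": "alice"},
    {"date": "2023-01-10", "from": "bob"},
    {"date": "2023-02-01", "from": "owner"},
    {"date": "2023-02-02", "from": "owner"},
    {"date": "2023-02-03", "from": "alice"},
    {"date": "2023-02-04", "from": None},
]


def monthly_leaderboards(messages, top=10):
    """Return {"YYYY-MM": [(user, message_count), ...]} with the most active users first."""
    per_month = defaultdict(Counter)
    for message in messages:
        # Skip records with missing data, as in the preparation step.
        if not message.get("from") or not message.get("date"):
            continue
        month = message["date"][:7]
        per_month[month][message["from"]] += 1
    return {
        month: counter.most_common(top)
        for month, counter in sorted(per_month.items())
    }


if __name__ == "__main__":
    for month, board in monthly_leaderboards(MESSAGES).items():
        print(month, board)
```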

02-2023

Generated by ChatGPT

The author is searching for an alternative communication channel to SSH for managing their home server. They are experiencing issues with a tunnel agent that disconnects and is unable to reconnect to a tunnel server, causing them to lose remote control of their server. They have considered other tunnel providers but have decided to stay with their current provider, Playit.gg, due to its free, easy-to-set-up, TCP/UDP functionality. However, they are still facing the issue and are looking for a solution to stay connected to the server when this problem occurs. The author has explored the option of using a bot on the Telegram platform to restart the agent automatically. They have registered the bot, implemented command handlers, and are running the bot on their home server. The bot has access to the docker daemon's unix socket to execute commands. The author has identified three essential commands and parameters for managing their server: listing containers, viewing container logs for detecting connection problems, and restarting containers to restore network state. Additionally, they have created a commands menu with descriptions of the commands and their parameters for future reference. The author has also created a repository called "tg-docker-command-bot" for this project.
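
The three management commands could be mapped onto Docker roughly as shown below. This is only a sketch: the bot command names (/ps, /logs, /restart) and the default tail length are hypothetical, and it builds docker CLI arguments rather than talking to the daemon socket directly, which may differ from how the actual bot works.

```python
import shlex


def _container(name):
    """Reject names that the docker CLI could mistake for an option."""
    if name.startswith("-"):
        raise ValueError(f"suspicious container name: {name!r}")
    return name


def docker_argv(message):
    """Translate a chat command such as '/logs agent 20' into docker CLI arguments."""
    parts = shlex.split(message)
    if not parts:
        raise ValueError("empty command")
    command, args = parts[0], parts[1:]

    if command == "/ps" and not args:
        # List all containers, including stopped ones.
        return ["docker", "ps", "--all"]
    if command == "/logs" and 1 <= len(args) <= 2:
        # Optional second parameter: number of trailing log lines.
        tail = args[1] if len(args) == 2 else "50"
        if not tail.isdigit():
            raise ValueError("tail must be a number")
        return ["docker", "logs", "--tail", tail, _container(args[0])]
    if command == "/restart" and len(args) == 1:
        return ["docker", "restart", _container(args[0])]
    raise ValueError(f"unsupported command: {message!r}")


if __name__ == "__main__":
    print(docker_argv("/logs agent 20"))
```

A message handler could pass the returned list to subprocess.run and send the captured output back to the chat. Restricting the bot to a fixed set of commands keeps arbitrary input away from the shell.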

01-2023

Generated by ChatGPT

The author discusses the challenges of data conversion and schema changes in relational databases and introduces the libraries Redframes and Dataset as tools to simplify the process. Redframes provides intuitive methods for working with data, while Dataset allows for NoSQL-like interaction with data. The focus is on working with datasets rather than individual rows, enabling efficient querying, filtering, and modification of data. In a demonstration problem, Dataset is used to automatically create new fields in a table and solve an inconsistency in event tracking code. The modified Dataset is then used to update the database state. Additionally, the solution is visualized using pie charts.

12-2022

Generated by ChatGPT

The author discusses the importance of colorizing printed text to improve focus and understanding when working with extended file attributes. They mention using termcolor as a helpful tool for text formatting and colorizing. Additionally, the author mentions the frustration of reading plain text when there are hundreds of lines, and how formatting the text as a column with a blank line separator at the bottom can help alleviate this issue. The final improvement is colorizing old and new attributes with the colored() function. The author provides an example snippet where colors are used to highlight differences between old and new attributes. They emphasize that colorization is a small investment of time but has a significant impact, and no complicated libraries are required. Alternatives to termcolor, such as colorama and rich, are also mentioned as options for text colorization.
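
The idea of highlighting old and new attribute values can be illustrated without any third-party package. Note that this sketch uses raw ANSI escape codes instead of termcolor's colored() function, and the attribute names and file name are invented for the example.

```python
# ANSI escape sequences for red, green, and resetting the color.
RED, GREEN, RESET = "\033[31m", "\033[32m", "\033[0m"


def paint(text, color):
    """Wrap text in an ANSI color sequence."""
    return f"{color}{text}{RESET}"


def format_changes(path, old, new):
    """Return a column of attribute lines for one file, ending with a blank separator line.

    Changed values are shown as red old value -> green new value.
    An attribute missing on one side is printed as None.
    """
    lines = [path]
    for name in sorted(old.keys() | new.keys()):
        before, after = old.get(name), new.get(name)
        if before == after:
            lines.append(f"  {name}: {after}")
        else:
            lines.append(f"  {name}: {paint(before, RED)} -> {paint(after, GREEN)}")
    lines.append("")  # blank line separating this file from the next one
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_changes(
        "notes.txt",
        {"user.status": "draft", "user.owner": "me"},
        {"user.status": "final", "user.owner": "me"},
    ))
```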

12-2022

Generated by ChatGPT

The author used Matplotlib and Python code to create a visual cheat sheet table of derivative functions. They discuss their experience with using Matplotlib and mention the use of a script called make_table.py. The note suggests comparing exponential and tangent charts to understand their differences in velocity, speed, and derivative.

11-2022

Generated by ChatGPT

The author explores how to handle events to determine selected elements on a timeseries plot using a range tool in Bokeh. They suggest examining the events of both the Plot and RangeTool objects to understand how to extract the selected elements. The author provides a function to assign a callback to the events of the range_plot and range_tool objects and suggests printing out the event data to understand the specific types containing the selected range. They explore the events of the Plot object and identify PanStart, PanEnd, and RangesUpdate as the events that may contain the selected range. They suggest extending the callback to print the attributes of these events. The author also explores the events of the range_tool object and identifies the range_tool.x_range.start and range_tool.x_range.end attributes as the targets for extracting the selected range. They suggest checking the timestamps of these attributes to match the selection range position on the plot. Finally, the author suggests handling the PanEnd event and converting the timestamp attributes of the range_tool.x_range object to datetime to compute the final selection. The complete stand.py test stand script is provided, which allows for filtering the data source based on the selected elements for further server-side processing.
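
The final conversion step might look like the sketch below. It assumes that range_tool.x_range.start and range_tool.x_range.end arrive as milliseconds since the Unix epoch, which is how Bokeh represents values on a datetime axis; the row layout and the function names are hypothetical, and the Bokeh callback wiring itself is left out.

```python
from datetime import datetime, timezone


def to_datetime(milliseconds):
    """Convert a millisecond epoch timestamp to an aware UTC datetime."""
    return datetime.fromtimestamp(milliseconds / 1000, tz=timezone.utc)


def select_rows(rows, start_ms, end_ms):
    """Return the rows whose "time" falls inside the selected range, bounds included."""
    start, end = sorted((to_datetime(start_ms), to_datetime(end_ms)))
    return [row for row in rows if start <= row["time"] <= end]


# Hypothetical data source: one value per day.
ROWS = [
    {"time": datetime(2022, 11, day, tzinfo=timezone.utc), "value": day}
    for day in range(1, 6)
]

if __name__ == "__main__":
    # Stand-ins for the x_range.start / x_range.end values read in a PanEnd handler.
    start_ms = datetime(2022, 11, 2, tzinfo=timezone.utc).timestamp() * 1000
    end_ms = datetime(2022, 11, 4, tzinfo=timezone.utc).timestamp() * 1000
    print(select_rows(ROWS, start_ms, end_ms))
```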

10-2022

Generated by ChatGPT

The author tested out Sourcegraph, a code search tool, and found it easy to set up but noticed a high number of processes running in the container. They also discovered a large number of subprocesses and were surprised by the complexity of a server instance. The author discusses resource estimation and notes that the estimated amount is significantly smaller than what was observed. The silver searcher is recommended as a better alternative for code search. The author acknowledges that Sourcegraph may be difficult to learn and expensive to maintain, but is willing to try it out for its features, especially for a large number of repositories.

10-2022

Generated by ChatGPT

The author has been using TOML files to store account data, which allows for a tree structure and the storage of multiple account data in one file. They highlight the benefits of TOML, such as its inclusion in Python 3.11 and its concise specification. The author also mentions the importance of accounts interrelation and the need for automation to check the integrity of these relationships. Additionally, they mention the use of references in the TOML files and a checker program that scans the files, follows the references, and checks for the existence of related files. The checker program offers both a text output for a quick check and a visual output in the form of a networkx graph on matplotlib to understand the accounts structure and identify any issues. The author provides a demonstration of the steps involved, including the use of a text output for regular reference checking, a visual graph output to understand the relationships, and an errors output to identify and fix any issues. They also mention that having the checker program allows them to not keep in mind all the accounts relations and preserve their integrity. The checker program is available in the reference-checker repository.

08-2022

Generated by ChatGPT

The author is seeking a solution to store the history of changes made to text files. They have considered using Git but find it too complicated. They are now exploring LVM snapshots, ZFS, and BTRFS as potential solutions. The author conducted experiments comparing the performance of each file system and found that BTRFS was the preferred choice due to its speed, ease of installation, and simple usage. They also found that automation is possible using inotify_simple and python-btrfs-progs. The process can be wrapped into a systemd service to create file history snapshots.

05-2022

Generated by ChatGPT

The author is considering using their laptop and phone storage instead of a cloud disk for data storage. They weigh the benefits of increased data durability, availability, and security, as well as cost savings and eliminating the need for a third-party network node. However, they acknowledge the cons of manual syncing, lack of automation, sharing restrictions, and no file history. The author discusses their approach to manually syncing data between devices, using a hot spot with KDEConnect. They use FreeFileSync for syncing and KDiff3 for merging conflicts. FreeFileSync offers automatic syncing and a two-way sync mode. The author finds the usage experience satisfactory, with simplicity and cost savings being the main benefits.

01-2022

Generated by ChatGPT

The author faced a problem with extending the __str__ method of a list of objects from a third-party service. They found a solution by using the patching technique of unittest.mock to define a function that prints object details and patch the __str__ method. Although the approach is considered bad, it still achieves the desired goal.
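
The patching technique can be illustrated as follows. The Item class is a stand-in for the third-party objects, and describe is a hypothetical helper; the article's own code may look different.

```python
from unittest.mock import patch


class Item:
    """Stand-in for a third-party class that has no useful __str__."""

    def __init__(self, name, price):
        self.name = name
        self.price = price


def describe(self):
    """Replacement for __str__ that prints the object details."""
    return f"Item(name={self.name!r}, price={self.price})"


items = [Item("tea", 3), Item("cup", 5)]

# Inside the block every Item is rendered by describe();
# the default __str__ is restored when the block exits.
with patch.object(Item, "__str__", describe):
    lines = [str(item) for item in items]

print("\n".join(lines))
```

The patch changes the class for every caller while it is active, which is one reason the approach is considered bad outside of tests.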

11-2021

Generated by ChatGPT

The author switched from Chrome to Vivaldi due to its promising features and annoyance with Chrome. They praise Vivaldi for its ad blocking options and lack of cookie banners. The author finds tabs tiling, stacking, and pinning helpful for organization. They express concern about Chrome's automatic logging in to Google services. Vivaldi offers persistent page actions and a quick commands popup feature. The reader view feature allows for easier reading. Vivaldi's screenshot feature captures entire webpages. Mouse gestures, including rocker gestures, are suggested for navigation.

09-2021

Generated by ChatGPT

The author investigates the slow boot time of their Linux laptop and discovers that systemd is responsible. They use systemd-analyze to identify problematic services and disable unnecessary ones. However, they are unable to find chronological statistics for the services and conclude that manual collection may be necessary. The author is satisfied with the improved boot time but does not prioritize manual statistics collection.

07-2021

Generated by ChatGPT

This article compares the performance of columnar and row-based databases using DuckDB as an example. It shows that columnar databases are more efficient for complex OLAP workloads and significantly reduce query execution time compared to row-based databases like SQLite. The article provides steps for setting up and running the experiment, and visualizes the performance results. It concludes that storing data as column values on disk can greatly enhance performance in computational queries.

06-2021

Generated by ChatGPT

The article discusses using Mitmproxy to proxy Python HTTPS requests and explains that some clients may fail with an error because they do not have access to the OS's system-wide SSL settings directories. The issue can be resolved by manually specifying a certificate file for proxying, for example through the REQUESTS_CA_BUNDLE environment variable, which makes the HTTPS client use the correct certificate file. Overall, the text suggests using Mitmproxy for intercepting and analyzing network traffic.
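
The environment setup might be sketched like this. The helper name mitmproxy_env is hypothetical; the defaults assume a local Mitmproxy on its standard port 8080 and its CA certificate in the default ~/.mitmproxy directory, so adjust both if your setup differs.

```python
import os
from pathlib import Path


def mitmproxy_env(cert_path=None, proxy_url="http://127.0.0.1:8080", base=None):
    """Return a copy of the environment that routes requests-based clients through the proxy.

    cert_path defaults to the CA certificate Mitmproxy generates on first start.
    base defaults to the current process environment.
    """
    env = dict(os.environ if base is None else base)
    if cert_path is None:
        cert_path = Path.home() / ".mitmproxy" / "mitmproxy-ca-cert.pem"
    # The requests library reads this variable to locate the CA bundle.
    env["REQUESTS_CA_BUNDLE"] = str(Path(cert_path))
    env["HTTP_PROXY"] = proxy_url
    env["HTTPS_PROXY"] = proxy_url
    return env


if __name__ == "__main__":
    env = mitmproxy_env()
    for key in ("REQUESTS_CA_BUNDLE", "HTTP_PROXY", "HTTPS_PROXY"):
        print(f"{key}={env[key]}")
```

The returned mapping can be passed as the env argument of subprocess.run when launching the client script, so the settings apply only to that process.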

02-2021

Generated by ChatGPT

This article discusses the importance of selecting appropriate technologies and libraries to simplify the architecture of a service or program. It focuses on the selection of embedded database libraries and provides a list of practices and tools for analyzing these libraries. The article analyzes three libraries, TinyDB, sqlitedict, and peewee-kv, and concludes that sqlitedict is the simplest option while peewee-kv offers more customization and data storage capabilities. The conclusion emphasizes the importance of choosing the most suitable candidate based on specific requirements.