Ryujinx

mirror of https://github.com/Ryujinx/Ryujinx.git synced 2024-12-21 17:12:06 +00:00

Author	SHA1	Message	Date
gdkchan	95017b8c66	Support memory aliasing (#2954 ) * Back to the origins: Make memory manager take guest PA rather than host address once again * Direct mapping with alias support on Windows * Fixes and remove more of the emulated shared memory * Linux support * Make shared and transfer memory not depend on SharedMemoryStorage * More efficient view mapping on Windows (no more restricted to 4KB pages at a time) * Handle potential access violations caused by partial unmap * Implement host mapping using shared memory on Linux * Add new GetPhysicalAddressChecked method, used to ensure the virtual address is mapped before address translation Also align GetRef behaviour with software memory manager * We don't need a mirrorable memory block for software memory manager mode * Disable memory aliasing tests while we don't have shared memory support on Mac * Shared memory & SIGBUS handler for macOS * Fix typo + nits + re-enable memory tests * Set MAP_JIT_DARWIN on x86 Mac too * Add back the address space mirror * Only set MAP_JIT_DARWIN if we are mapping as executable * Disable aliasing tests again (still fails on Mac) * Fix UnmapView4KB (by not casting size to int) * Use ref counting on memory blocks to delay closing the shared memory handle until all blocks using it are disposed * Address PR feedback * Make RO hold a reference to the guest process memory manager to avoid early disposal Co-authored-by: nastys <nastys@users.noreply.github.com>	2022-05-02 20:30:02 -03:00
merry	6a1a03566a	T32: Implement load/store single (immediate) (#3186 ) * T32: Implement load/store single (immediate) * tests * tidy formatting * address comments	2022-04-21 01:25:43 +02:00
gdkchan	26a881176e	Fix tail merge from block with conditional jump to multiple returns (#3267 ) * Fix tail merge from block with conditional jump to multiple returns * PPTC version bump	2022-04-09 16:56:50 +02:00
merry	df70442c46	InstEmitMemoryEx: Barrier after write on ordered store (#3193 ) * InstEmitMemoryEx: Barrier after write on ordered store * increment ptc version * 32	2022-03-19 10:32:35 -03:00
merry	bb2f9df0a1	KThread: Fix GetPsr mask (#3180 ) * ExecutionContext: GetPstate / SetPstate * Put it in NativeContext * KThread: Fix GetPsr mask * ExecutionContext: Turn methods into Pstate property * Address nit	2022-03-11 03:16:32 +01:00
merry	7af9fcbc06	T32: Implement Data Processing (Modified Immediate) instructions (#3178 ) * T32: Implement Data Processing (Modified Immediate) instructions * Update tests * switch -> lookup table	2022-03-06 22:25:01 +01:00
merry	b97ff4da5e	A32: Fix ALU immediate instructions (#3179 ) * Tests: Add A32 tests for immediate ADC/ADCS/RSC/RSCS/SBC/SBCS * A32: Fix bug in ADC/ADCS/RSC/RSCS/SBC/SBCS * CpuTestAluImm32: Add more opcodes * Increment PTC version	2022-03-05 15:23:10 -03:00
merry	747081d2c7	Decoders: Fix instruction lengths for 16-bit B instructions (#3177 )	2022-03-05 16:20:24 +01:00
merry	497199bb50	Decoder: Exit on trapping instructions, and resume execution at trapping instruction (#3153 ) * Decoder: Exit on trapping instructions, and resume execution at trapping instruction * Resume at trapping address * remove mustExit	2022-03-04 23:16:58 +01:00
merry	bd9ac0fdaa	T32: Implement B, B.cond, BL, BLX (#3155 ) * Decoders: Make IsThumb a function of OpCode32 * OpCode32: Fix GetPc * T32: Implement B, B.cond, BL, BLX * rm usings	2022-03-04 23:05:08 +01:00
merry	7b35ebc64a	T32: Implement ALU (shifted register) instructions (#3135 ) * T32: Implement ADC, ADD, AND, BIC, CMN, CMP, EOR, MOV, MVN, ORN, ORR, RSB, SBC, SUB, TEQ, TST (shifted register) * OpCodeTable: Sort T32 list * Tests: Rename RandomTestCase to PrecomputedThumbTestCase * T32: Tests for AluRsImm instructions * fix nit * fix nit 2	2022-02-22 19:11:28 -03:00
merry	dc063eac83	ARMeilleure: Implement single stepping (#3133 ) * Decoder: Implement SingleInstruction decoder mode * Translator: Implement Step * DecoderMode: Rename Normal to MultipleBlocks	2022-02-22 11:11:42 -03:00
merry	f1460d5494	A32: Fix BLX and BXWritePC (#3151 )	2022-02-22 10:41:56 -03:00
Berkan Diler	644b497df1	Collapse AsSpan().Slice(..) calls into AsSpan(..) (#3145 ) * Collapse AsSpan().Slice(..) calls into AsSpan(..) Less code and a bit faster * Collapse an Array.Clear(array, 0, array.Length) call to Array.Clear(array)	2022-02-22 10:32:10 -03:00
gdkchan	f2087ca29e	PPTC version increment (#3139 )	2022-02-17 23:52:42 -03:00
gdkchan	92d166ecb7	Enable CPU JIT cache invalidation (#2965 ) * Enable CPU JIT cache invalidation * Invalidate cache on IC IVAU	2022-02-18 02:53:18 +01:00
merry	747876dc67	Decoders: Add IOpCode32HasSetFlags (#3136 )	2022-02-18 01:33:43 +01:00
merry	98e05ee4b7	ARMeilleure: Thumb support (All T16 instructions) (#3105 ) * Decoders: Add InITBlock argument * OpCodeTable: Minor cleanup * OpCodeTable: Remove existing thumb instruction implementations * OpCodeTable: Prepare for thumb instructions * OpCodeTables: Improve thumb fast lookup * Tests: Prepare for thumb tests * T16: Implement BX * T16: Implement LSL/LSR/ASR (imm) * T16: Implement ADDS, SUBS (reg) * T16: Implement ADDS, SUBS (3-bit immediate) * T16: Implement MOVS, CMP, ADDS, SUBS (8-bit immediate) * T16: Implement ANDS, EORS, LSLS, LSRS, ASRS, ADCS, SBCS, RORS, TST, NEGS, CMP, CMN, ORRS, MULS, BICS, MVNS (low registers) * T16: Implement ADD, CMP, MOV (high reg) * T16: Implement BLX (reg) * T16: Implement LDR (literal) * T16: Implement {LDR,STR}{,H,B,SB,SH} (register) * T16: Implement {LDR,STR}{,B,H} (immediate) * T16: Implement LDR/STR (SP) * T16: Implement ADR * T16: Implement Add to SP (immediate) * T16: Implement ADD/SUB (SP) * T16: Implement SXTH, SXTB, UXTH, UTXB * T16: Implement CBZ, CBNZ * T16: Implement PUSH, POP * T16: Implement REV, REV16, REVSH * T16: Implement NOP * T16: Implement LDM, STM * T16: Implement SVC * T16: Implement B (conditional) * T16: Implement B (unconditional) * T16: Implement IT * fixup! T16: Implement ADD/SUB (SP) * fixup! T16: Implement Add to SP (immediate) * fixup! T16: Implement IT * CpuTestThumb: Add randomized tests * Remove inITBlock argument * Address nits * Use index to handle IfThenBlockState * Reduce line noise * fixup * nit	2022-02-17 19:39:45 -03:00
Berkan Diler	9ca040c0ff	Use ReadOnlySpan<byte> compiler optimization for static data (#3130 )	2022-02-17 21:38:50 +01:00
merry	ce71f9144e	InstEmitMemory32: Literal loads always have word-aligned PC (#3104 )	2022-02-11 17:51:03 -03:00
gdkchan	c3c3914ed3	Add a limit on the number of uses a constant may have (#3097 )	2022-02-09 17:42:47 -03:00
merry	86b37d0ff7	ARMeilleure: A32: Implement SHSUB8 and UHSUB8 (#3089 ) * ARMeilleure: A32: Implement UHSUB8 * ARMeilleure: A32: Implement SHSUB8	2022-02-08 10:46:42 +01:00
merry	88d3ffb97c	ARMeilleure: A32: Implement SHADD8 (#3086 )	2022-02-06 12:25:45 -03:00
merry	222b1ad7da	ARMeilleure: OpCodeTable: Add CMN (RsReg) (#3087 )	2022-02-06 02:01:05 +01:00
gdkchan	bd412afb9f	Fix small precision error on CPU reciprocal estimate instructions (#3061 ) * Fix small precision error on CPU reciprocal estimate instructions * PPTC version bump	2022-01-29 23:59:34 +01:00
gdkchan	f3bfd799e1	Fix calls passing V128 values on Linux (#3034 ) * Fix calls passing V128 values on Linux * PPTC version bump	2022-01-24 11:23:24 +01:00
gdkchan	f0824fde9f	Add host CPU memory barriers for DMB/DSB and ordered load/store (#3015 ) * Add host CPU memory barriers for DMB/DSB and ordered load/store * PPTC version bump * Revert to old barrier order	2022-01-21 12:47:34 -03:00
sharmander	60f7cba30a	Implement FCVTNS (Scalar GP) (#2953 ) * Implement FCVTNS (Scalar GP) * Update Ptc Version	2022-01-19 22:21:44 -03:00
gdkchan	bd215e447d	Fix return type mismatch on 32-bit titles (#3000 )	2022-01-16 08:39:43 -03:00
sharmander	e5f7ff1eee	CPU - Implement FCVTMS (Vector) (#2937 ) * Add FCVTMS_V Implementation to Armeilleure * Fix opcode designation * Add tests * Amend Ptc version * Fix OpCode / Tests * Create Math.Floor helper method + Update implementation * Address gdk comments * Re-address gdk comments * Update ARMeilleure/Decoders/OpCodeTable.cs Co-authored-by: gdkchan <gab.dark.100@gmail.com> * Update Tests to use 2S (4S) and 2D Co-authored-by: gdkchan <gab.dark.100@gmail.com>	2022-01-04 16:45:28 -03:00
gdkchan	e24949ca2c	Implement CSDB instruction (#2927 )	2021-12-19 11:19:05 -03:00
Mary	00c69f2098	Remove usage of Mono.Posix.NETStandard accross all projects (#2906 ) * Remove usage of Mono.Posix.NETStandard in Ryujinx project * Remove usage of Mono.Posix.NETStandard in ARMeilleure project * Remove usage of Mono.Posix.NETStandard in Ryujinx.Memory project * Address gdkchan's comments	2021-12-08 18:24:26 -03:00
Piyachet Kanda	3e2f89b4fd	Implement UHADD8 instruction (#2908 ) * Implement UHADD8 instruction along with a test unit * Update PTC revision number	2021-12-08 17:05:59 -03:00
Mary	f39fce8f54	misc: Migrate usage of RuntimeInformation to OperatingSystem (#2901 ) Very basic migration across the codebase.	2021-12-04 20:02:30 -03:00
Mary	57d3296ba4	infra: Migrate to .NET 6 (#2829 ) * infra: Migrate to .NET 6 * Rollback version naming change * Workaround .NET 6 ZipArchive API issues * ci: Switch to VS 2022 for AppVeyor CI is now ready for .NET 6 * Suppress WebClient warning in DoUpdateWithMultipleThreads * Attempt to workaround System.Drawing.Common changes on 6.0.0 * Change keyboard rendering from System.Drawing to ImageSharp * Make the software keyboard renderer multithreaded * Bump ImageSharp version to 1.0.4 to fix a bug in Image.Load * Add fallback fonts to the keyboard renderer * Fix warnings * Address caian's comment * Clean up linux workaround as it's uneeded now * Update readme Co-authored-by: Caian Benedicto <caianbene@gmail.com>	2021-11-28 21:24:17 +01:00
FICTURE7	fbf40424f4	Add an early `TailMerge` pass (#2721 ) * Add an early `TailMerge` pass Some translations can have a lot of guest calls and since for each guest call there is a call guard which may return. This can produce a lot of epilogue code for returns. This pass merges the epilogue into a single block. ``` Using filter 'hcq'. Using metric 'code size'. Total diff: -1648111 (-7.19 %) (bytes): Base: 22913847 Diff: 21265736 Improved: 4567, regressed: 14, unchanged: 144 ``` * Set PTC version * Address feedback * Handle `void` returning functions * Actually handle `void` returning functions * Fix `RegisterToLocal` logging	2021-10-18 19:51:22 -03:00
FICTURE7	69093cf2d6	Optimize LSRA (#2563 ) * Optimize `TryAllocateRegWithtoutSpill` a bit * Add a fast path for when all registers are live. * Do not query `GetOverlapPosition` if the register is already in use (i.e: free position is 0). * Do not allocate child split list if not parent * Turn `LiveRange` into a reference struct `LiveRange` is now a reference wrapping struct like `Operand` and `Operation`. It has also been changed into a singly linked-list. In micro-benchmarks traversing the linked-list was faster than binary search on `List<T>`. Even for quite large input sizes (e.g: 1,000,000), surprisingly. Could be because the code gen for traversing the linked-list is much much cleaner and there is no virtual dispatch happening when checking if intervals overlaps. * Turn `LiveInterval` into an iterator The LSRA allocates in forward order and never inspect previous `LiveInterval` once they are expired. Something similar can be done for the `LiveRange`s within the `LiveInterval`s themselves. The `LiveInterval` is turned into a iterator which expires `LiveRange` within it. The iterator is moved forward along with interval walking code, i.e: AllocateInterval(context, interval, cIndex). * Remove `LinearScanAllocator.Sources` Local methods are less susceptible to do allocations than lambdas. * Optimize `GetOverlapPosition(interval)` a bit Time complexity should be in O(n+m) instead of O(nm) now. * Optimize `NumberLocals` a bit Use the same idea as in `HybridAllocator` to store the visited state in the MSB of the Operand's value instead of using a `HashSet<T>`. * Optimize `InsertSplitCopies` a bit Avoid allocating a redundant `CopyResolver`. * Optimize `InsertSplitCopiesAtEdges` a bit Avoid redundant allocations of `CopyResolver`. * Use stack allocation for `freePositions` Avoid redundant computations. * Add `UseList` Replace `SortedIntegerList` with an even more specialized data structure. It allocates memory on the arena allocators and does not require copying use positions when splitting it. * Turn `LiveInterval` into a reference struct `LiveInterval` is now a reference wrapping struct like `Operand` and `Operation`. The rationale behind turning this in a reference wrapping struct is because a `LiveInterval` is associated with each local variable, and these intervals may themselves be split further. I've seen translations having up to 8000 local variables. To make the `LiveInterval` unmanaged, a new data structure called `LiveIntervalList` was added to store child splits. This differs from `SortedList<,>` because it can contain intervals with the same start position. Really wished we got some more of C++ template in C#. :^( * Optimize `GetChildSplit` a bit No need to inspect the remaining ranges if we've reached a range which starts after position, since the split list is ordered. * Optimize `CopyResolver` a bit Lazily allocate the fill, spill and parallel copy structures since most of the time only one of them is needed. * Optimize `BitMap.Enumerator` a bit Marking `MoveNext` as `AggressiveInlining` allows RyuJIT to promote the `Enumerator` struct into registers completely, reducing load/store code a lot since it does not have to store the struct on the stack for ABI purposes. * Use stack allocation for `use/blockedPositions` * Optimize `AllocateWithSpill` a bit * Address feedback * Make `LiveInterval.AddRange(,)` more conservative Produces no diff against master, but just for good measure.	2021-10-08 18:15:44 -03:00
FICTURE7	ecc64c934d	Add `Operand.Label` support to `Assembler` (#2680 ) * Add `Operand.Label` support to `Assembler` This adds label support to `Assembler` and enables branch tightening when compiling with relocatables. Jump management and patching has been moved to the `Assembler`. * Move instruction table to `Assembler.Table` * Set PTC internal version * Rename `Assembler.Table` to `AssemblerTable`	2021-10-05 14:04:55 -03:00
riperiperi	d92fff541b	Replace CacheResourceWrite with more general "precise" write (#2684 ) * Replace CacheResourceWrite with more general "precise" write The goal of CacheResourceWrite was to notify GPU resources when they were modified directly, by looking up the modified address/size in a structure and calling a method on each resource. The downside of this is that each resource cache has to be queried individually, they all have to implement their own way to do this, and it can only signal to resources using the same PhysicalMemory instance. This PR adds the ability to signal a write as "precise" on the tracking, which signals a special handler (if present) which can be used to avoid unnecessary flush actions, or maybe even more. For buffers, precise writes specifically do not flush, and instead punch a hole in the modified range list to indicate that the data on GPU has been replaced. The downside is that precise actions must ignore the page protection bits and always signal - as they need to notify the target resource to ignore the sequence number optimization. I had to reintroduce the sequence number increment after I2M, as removing it was causing issues in rabbids kingdom battle. However - all resources modified by I2M are notified directly to lower their sequence number, so the problem is likely that another unrelated resource is not being properly updated. Thankfully, doing this does not affect performance in the games I tested. This should fix regressions from #2624. Test any games that were broken by that. (RF4, rabbids kingdom battle) I've also added a sequence number increment to ThreedClass.IncrementSyncpoint, as it seems to fix buffer corruption in OpenGL homebrew. (this was a regression from removing sequence number increment from constant buffer update - another unrelated resource thing) * Add tests. * Add XML docs for GpuRegionHandle * Skip UpdateProtection if only precise actions were called This allows precise actions to skip reprotection costs.	2021-09-29 02:27:03 +02:00
FICTURE7	312be74861	Optimize `HybridAllocator` (#2637 ) * Store constant `Operand`s in the `LocalInfo` Since the spill slot and register assigned is fixed, we can just store the `Operand` reference in the `LocalInfo` struct. This allows skipping hitting the intern-table for a look up. * Skip `Uses`/`Assignments` management Since the `HybridAllocator` is the last pass and we do not care about uses/assignments we can skip managing that when setting destinations or sources. * Make `GetLocalInfo` inlineable Also fix a possible issue where with numbered locals. See or-assignment operator in `SetVisited(local)` before patch. * Do not run `BlockPlacement` in LCQ With the host mapped memory manager, there is a lot less cold code to split from hot code. So disabling this in LCQ gives some extra throughput - where we need it. * Address Mou-Ikkai's feedback * Apply suggestions from code review Co-authored-by: VocalFan <45863583+Mou-Ikkai@users.noreply.github.com> * Move check to an assert Co-authored-by: VocalFan <45863583+Mou-Ikkai@users.noreply.github.com>	2021-09-29 01:38:37 +02:00
riperiperi	1ae690ba2f	Use normal memory store path for DC ZVA (#2693 ) Seems like this is used as an optimized way to clear memory in homebrew applications. Unfortunately, calling the software fallback method every 8 bytes was not very optimal. The existing EmitStore is used by passing in ZR as the register to get a 0 write.	2021-09-29 01:21:30 +02:00
FICTURE7	0d23504e30	Fix PTC count table relocation patching (#2666 ) Fix an issue introduced in #2190 where by 2 different count table entry addresses were used for LCQ functions. E.g: ```asm .L1: mov rbp,COUNT_TABLE_0 ;; This gets an address. mov ebp,[rbp] lea esi,[rbp+1] mov rdi,COUNT_TABLE_1 ;; This gets another address. mov [rdi],esi cmp ebp,64h je near .L34 ``` This caused LCQ functions to not tier up when they're loaded from the PTC cache. This does not happen when they're freshly compiled. This PR fixes the issue by ensuring only a single counter is created per translation.	2021-09-29 00:28:34 +02:00
FICTURE7	a9343c9364	Refactor `PtcInfo` (#2625 ) * Refactor `PtcInfo` This change reduces the coupling of `PtcInfo` by moving relocation tracking to the backend. `RelocEntry`s remains as `RelocEntry`s through out the pipeline until it actually needs to be written to the PTC streams. Keeping this representation makes inspecting and manipulating relocations after compilations less painful. This is something I needed to do to patch relocations to 0 to diff dumps. Contributes to #1125. * Turn `Symbol` & `RelocInfo` into readonly structs * Add documentation to `CompiledFunction` * Remove `Compiler.Compile<T>` Remove `Compiler.Compile<T>` and replace it by `Map<T>` of the `CompiledFunction` returned.	2021-09-14 01:23:37 +02:00
mpnico	117e32a6ff	Implement a "Pause Emulation" option & hotkey (#2428 ) * Add a "Pause Emulation" option and hotkey Closes Ryujinx#1604 * Refactoring how pause is handled * Applied suggested changes from review * Applied suggested fixes * Pass correct suspend type to threads for suspend/resume * Fix NRE after stoping emulation * Removing SimulateWakeUpMessage call after resuming emulation * Skip suspending non game process * Pause the tickCounter in the ExecutionContext * Refactoring tickCounter pause/resume as suggested * Fix Config migration to add pause hotkey * Fixed pausing only application threads * Fix exiting emulator while paused * Avoid pause/resume while already paused/resumed * Cleanup unused code * Avoid restarting audio if stopping emulation while in pause. * Added suggested changes * Fix ConfigurationState	2021-09-11 22:08:25 +02:00
Mary	501c3d5cea	Implement MSR instruction for A32 (#2585 ) * Implement MSR instruction Fix #1342. Now Pocket Rumble is playable. * Address gdkchan's comments * Address gdkchan's comments * Address gdkchan's comment	2021-08-27 00:07:44 +02:00
FICTURE7	f2a7b300c4	Fix type mismatch in `BitwiseAnd` simplification (#2571 ) * Fix type mismatch in `BitwiseAnd` simplification `TryEliminateBitwiseAnd` would turn the `BitwiseAnd` operation into a copy of the wrong type. E.g: Before `Simplification`: ```llvm i64 %0 = BitwiseAnd i64 0x0, %1 ``` After `Simplication`: ```llvm i64 %0 = Copy i32 0x0 ``` Since the with the changes in #2515, we iterate in reverse order and `Simplication`, `ConstantFolding` does not indicate if it modified the CFG, the second pass to "retype" the copy into the proper destination type does not happen. This also blocked copy propagation since its destination type did not match with its source type. But in the cases I've seen, the `PreAllocator` would insert a copy for the propagated constant, which results in no diffs. Since the copy remained as is, asserts are fired when generating it. * Set PPTC version	2021-08-20 14:42:00 -03:00
FICTURE7	22b2cb39af	Reduce JIT GC allocations (#2515 ) * Turn `MemoryOperand` into a struct * Remove `IntrinsicOperation` * Remove `PhiNode` * Remove `Node` * Turn `Operand` into a struct * Turn `Operation` into a struct * Clean up pool management methods * Add `Arena` allocator * Move `OperationHelper` to `Operation.Factory` * Move `OperandHelper` to `Operand.Factory` * Optimize `Operation` a bit * Fix `Arena` initialization * Rename `NativeList<T>` to `ArenaList<T>` * Reduce `Operand` size from 88 to 56 bytes * Reduce `Operation` size from 56 to 40 bytes * Add optimistic interning of Register & Constant operands * Optimize `RegisterUsage` pass a bit * Optimize `RemoveUnusedNodes` pass a bit Iterating in reverse-order allows killing dependency chains in a single pass. * Fix PPTC symbols * Optimize `BasicBlock` a bit Reduce allocations from `_successor` & `DominanceFrontiers` * Fix `Operation` resize * Make `Arena` expandable Change the arena allocator to be expandable by allocating in pages, with some of them being pooled. Currently 32 pages are pooled. An LRU removal mechanism should probably be added to it. Apparently MHR can allocate bitmaps large enough to exceed the 16MB limit for the type. * Move `Arena` & `ArenaList` to `Common` * Remove `ThreadStaticPool` & co * Add `PhiOperation` * Reduce `Operand` size from 56 from 48 bytes * Add linear-probing to `Operand` intern table * Optimize `HybridAllocator` a bit * Add `Allocators` class * Tune `ArenaAllocator` sizes * Add page removal mechanism to `ArenaAllocator` Remove pages which have not been used for more than 5s after each reset. I am on fence if this would be better using a Gen2 callback object like the one in System.Buffers.ArrayPool<T>, to trim the pool. Because right now if a large translation happens, the pages will be freed only after a reset. This reset may not happen for a while because no new translation is hit, but the arena base sizes are rather small. * Fix `OOM` when allocating larger than page size in `ArenaAllocator` Tweak resizing mechanism for Operand.Uses and Assignemnts. * Optimize `Optimizer` a bit * Optimize `Operand.Add<T>/Remove<T>` a bit * Clean up `PreAllocator` * Fix phi insertion order Reduce codegen diffs. * Fix code alignment * Use new heuristics for degree of parallelism * Suppress warnings * Address gdkchan's feedback Renamed `GetValue()` to `GetValueUnsafe()` to make it more clear that `Operand.Value` should usually not be modified directly. * Add fast path to `ArenaAllocator` * Assembly for `ArenaAllocator.Allocate(ulong)`: .L0: mov rax, [rcx+0x18] lea r8, [rax+rdx] cmp r8, [rcx+0x10] ja short .L2 .L1: mov rdx, [rcx+8] add rax, [rdx+8] mov [rcx+0x18], r8 ret .L2: jmp ArenaAllocator.AllocateSlow(UInt64) A few variable/field had to be changed to ulong so that RyuJIT avoids emitting zero-extends. * Implement a new heuristic to free pooled pages. If an arena is used often, it is more likely that its pages will be needed, so the pages are kept for longer (e.g: during PPTC rebuild or burst sof compilations). If is not used often, then it is more likely that its pages will not be needed (e.g: after PPTC rebuild or bursts of compilations). * Address riperiperi's feedback * Use `EqualityComparer<T>` in `IntrusiveList<T>` Avoids a potential GC hole in `Equals(T, T)`.	2021-08-17 15:08:34 -03:00
gdkchan	ab9d4b862d	Implement VORN (register) Arm32 instruction (#2396 )	2021-06-23 23:21:23 +02:00
FICTURE7	9d7627af64	Add multi-level function table (#2228 ) * Add AddressTable<T> * Use AddressTable<T> for dispatch * Remove JumpTable & co. * Add fallback for out of range addresses * Add PPTC support * Add documentation to `AddressTable<T>` * Make AddressTable<T> configurable * Fix table walk * Fix IsMapped check * Remove CountTableCapacity * Add PPTC support for fast path * Rename IsMapped to IsValid * Remove stale comment * Change format of address in exception message * Add TranslatorStubs * Split DispatchStub Avoids recompilation of stubs during tests. * Add hint for 64bit or 32bit * Add documentation to `Symbol` * Add documentation to `TranslatorStubs` Make `TranslatorStubs` disposable as well. * Add documentation to `SymbolType` * Add `AddressTableEventSource` to monitor function table size Add an EventSource which measures the amount of unmanaged bytes allocated by AddressTable<T> instances. dotnet-counters monitor -n Ryujinx --counters ARMeilleure * Add `AllowLcqInFunctionTable` optimization toggle This is to reduce the impact this change has on the test duration. Before everytime a test was ran, the FunctionTable would be initialized and populated so that the newly compiled test would get registered to it. * Implement unmanaged dispatcher Uses the DispatchStub to dispatch into the next translation, which allows execution to stay in unmanaged for longer and skips a ConcurrentDictionary look up when the target translation has been registered to the FunctionTable. * Remove redundant null check * Tune levels of FunctionTable Uses 5 levels instead of 4 and change unit of AddressTableEventSource from KB to MB. * Use 64-bit function table Improves codegen for direct branches: mov qword [rax+0x408],0x10603560 - mov rcx,sub_10603560_OFFSET - mov ecx,[rcx] - mov ecx,ecx - mov rdx,JIT_CACHE_BASE - add rdx,rcx + mov rcx,sub_10603560 + mov rdx,[rcx] mov rcx,rax Improves codegen for dispatch stub: and rax,byte +0x1f - mov eax,[rcx+rax4] - mov eax,eax - mov rcx,JIT_CACHE_BASE - lea rax,[rcx+rax] + mov rax,[rcx+rax8] mov rcx,rbx * Remove `JitCacheSymbol` & `JitCache.Offset` * Turn `Translator.Translate` into an instance method We do not have to add more parameter to this method and related ones as new structures are added & needed for translation. * Add symbol only when PTC is enabled Address LDj3SNuD's feedback * Change `NativeContext.Running` to a 32-bit integer * Fix PageTable symbol for host mapped	2021-05-29 18:06:28 -03:00
riperiperi	54ea2285f0	POWER - Performance Optimizations With Extensive Ramifications (#2286 ) * Refactoring of KMemoryManager class * Replace some trivial uses of DRAM address with VA * Get rid of GetDramAddressFromVa * Abstracting more operations on derived page table class * Run auto-format on KPageTableBase * Managed to make TryConvertVaToPa private, few uses remains now * Implement guest physical pages ref counting, remove manual freeing * Make DoMmuOperation private and call new abstract methods only from the base class * Pass pages count rather than size on Map/UnmapMemory * Change memory managers to take host pointers * Fix a guest memory leak and simplify KPageTable * Expose new methods for host range query and mapping * Some refactoring of MapPagesFromClientProcess to allow proper page ref counting and mapping without KPageLists * Remove more uses of AddVaRangeToPageList, now only one remains (shared memory page checking) * Add a SharedMemoryStorage class, will be useful for host mapping * Sayonara AddVaRangeToPageList, you served us well * Start to implement host memory mapping (WIP) * Support memory tracking through host exception handling * Fix some access violations from HLE service guest memory access and CPU * Fix memory tracking * Fix mapping list bugs, including a race and a error adding mapping ranges * Simple page table for memory tracking * Simple "volatile" region handle mode * Update UBOs directly (experimental, rough) * Fix the overlap check * Only set non-modified buffers as volatile * Fix some memory tracking issues * Fix possible race in MapBufferFromClientProcess (block list updates were not locked) * Write uniform update to memory immediately, only defer the buffer set. * Fix some memory tracking issues * Pass correct pages count on shared memory unmap * Armeilleure Signal Handler v1 + Unix changes Unix currently behaves like windows, rather than remapping physical * Actually check if the host platform is unix * Fix decommit on linux. * Implement windows 10 placeholder shared memory, fix a buffer issue. * Make PTC version something that will never match with master * Remove testing variable for block count * Add reference count for memory manager, fix dispose Can still deadlock with OpenAL * Add address validation, use page table for mapped check, add docs Might clean up the page table traversing routines. * Implement batched mapping/tracking. * Move documentation, fix tests. * Cleanup uniform buffer update stuff. * Remove unnecessary assignment. * Add unsafe host mapped memory switch On by default. Would be good to turn this off for untrusted code (homebrew, exefs mods) and give the user the option to turn it on manually, though that requires some UI work. * Remove C# exception handlers They have issues due to current .NET limitations, so the meilleure one fully replaces them for now. * Fix MapPhysicalMemory on the software MemoryManager. * Null check for GetHostAddress, docs * Add configuration for setting memory manager mode (not in UI yet) * Add config to UI * Fix type mismatch on Unix signal handler code emit * Fix 6GB DRAM mode. The size can be greater than `uint.MaxValue` when the DRAM is >4GB. * Address some feedback. * More detailed error if backing memory cannot be mapped. * SetLastError on all OS functions for consistency * Force pages dirty with UBO update instead of setting them directly. Seems to be much faster across a few games. Need retesting. * Rebase, configuration rework, fix mem tracking regression * Fix race in FreePages * Set memory managers null after decrementing ref count * Remove readonly keyword, as this is now modified. * Use a local variable for the signal handler rather than a register. * Fix bug with buffer resize, and index/uniform buffer binding. Should fix flickering in games. * Add InvalidAccessHandler to MemoryTracking Doesn't do anything yet * Call invalid access handler on unmapped read/write. Same rules as the regular memory manager. * Make unsafe mapped memory its own MemoryManagerType * Move FlushUboDirty into UpdateState. * Buffer dirty cache, rather than ubo cache Much cleaner, may be reusable for Inline2Memory updates. * This doesn't return anything anymore. * Add sigaction remove methods, correct a few function signatures. * Return empty list of physical regions for size 0. * Also on AddressSpaceManager Co-authored-by: gdkchan <gab.dark.100@gmail.com>	2021-05-24 22:52:44 +02:00
gdkchan	fb65f392d1	Improve accuracy of reciprocal step instructions (#2305 ) * Improve accuracy of reciprocal step instructions * Fix small mistake on RECPE rounding, nits, PTC version bump	2021-05-24 20:20:07 +10:00
FICTURE7	65ac00833a	Use branch instead of tailcall for recursive calls (#2282 ) * Use branch instead of tailcall for recursive calls Use a branch instead of doing a tailcall for recursive calls. This avoids having to store the dispatch address, setting up the epilogue and keeps guest registers in host registers for longer. The rejit check is moved down into the entry block so that the rejit behaviour remains the same as before. * Set PTC version Co-authored-by: gdkchan <gab.dark.100@gmail.com>	2021-05-20 09:31:45 -03:00
FICTURE7	0181068016	Add `BIC/ORR Vd.T, #imm` fast path (#2279 ) * Add fast path for BIC Vd.T, #imm * Add fast path for ORR Vd.T, #imm * Set PTC version * Fixup Exception to InvalidOperationException	2021-05-20 09:09:17 -03:00
FICTURE7	c805542b29	Allow `LocalVariable` to be assigned more than once (#2288 ) * Allow `LocalVariable` to be assigned more than once This allows us to write flow controls like loops and if-elses with LocalVariables participating in phi nodes. * Add `GetLocalNumber` to operand	2021-05-17 01:54:53 +02:00
gdkchan	e318022b89	Fold constant offsets and group constant addresses (#2285 ) * Fold constant offsets and group constant addresses * PPTC version bump	2021-05-13 21:26:57 +02:00
LDj3SNuD	57ea3f93a3	PPTC meets ExeFS Patching. (#1865 ) * PPTC meets ExeFS Patching. * InternalVersion = 1865 * Ready! * Optimized the PtcProfiler Load/Save methods.	2021-05-13 20:05:15 +02:00
FICTURE7	89791ba68d	Add inlined on translation call counting (#2190 ) * Add EntryTable<TEntry> * Add on translation call counting * Add Counter * Add PPTC support * Make Counter a generic & use a 32-bit counter instead * Return false on overflow * Set PPTC version * Print more information about the rejit queue * Make Counter<T> disposable * Remove Block.TailCall since it is not used anymore * Apply suggestions from code review Address gdkchan's feedback Co-authored-by: gdkchan <gab.dark.100@gmail.com> * Fix more stale docs * Remove rejit requests queue logging * Make Counter<T> finalizable Most certainly quite an odd use case. * Make EntryTable<T>.TryAllocate set entry to default * Re-trigger CI * Dispose Counters before they hit the finalizer queue * Re-trigger CI Just for good measure... * Make EntryTable<T> expandable * EntryTable is now expandable instead of being a fixed slab. * Remove EntryTable<T>.TryAllocate * Remove Counter<T>.TryCreate Address LDj3SNuD's feedback * Apply suggestions from code review Address LDj3SNuD's feedback Co-authored-by: LDj3SNuD <35856442+LDj3SNuD@users.noreply.github.com> * Remove useless return * POH approach, but the sequel * Revert "POH approach, but the sequel" This reverts commit `5f5abaa247`. The sequel got shelved * Add extra documentation Co-authored-by: gdkchan <gab.dark.100@gmail.com> Co-authored-by: LDj3SNuD <35856442+LDj3SNuD@users.noreply.github.com>	2021-04-18 23:43:53 +02:00
LDj3SNuD	90163087a0	PPTC vs. giant ExeFS. (#2168 ) * PPTC vs. giant ExeFS. * InternalVersion = 2168 * Add new heuristic algorithm for calculating the number of threads for parallel translations that also takes into account the user's free physical memory and not just the number of CPU cores. * Nit. * Add an outer Header structure and add the hashes for both this new structure and the existing "inner" Header structure. * InternalVersion = 2169	2021-04-13 03:24:36 +02:00
gdkchan	d43a56726c	(CPU) Fix CRC32 instruction when constant values are used as input (#2183 )	2021-04-07 23:43:08 +02:00
FICTURE7	98ed81e4cd	Improve `StoreToContext` emission (#2155 ) * Improve StoreToContext emission Hoist StoreToContext in dynamic branch fast & slow paths out into their predecessor. Reduces register pressure, code size and compile time because we're throwing less stuff down the pipeline. * Set PTC internal version * Turn EmitDynamicTableCall private * Re-trigger CI	2021-04-02 19:54:23 +02:00
FICTURE7	8b3eba7e13	Reduce allocation during SSA construction (#2162 ) * Reduce allocation during SSA construction * Re-trigger CI	2021-04-02 19:26:16 +02:00
LDj3SNuD	4bd1ad16f9	Add Sqdmulh_Ve & Sqrdmulh_Ve Inst.s with Tests. (#2139 )	2021-03-25 23:33:32 +01:00
mageven	69f8722e79	Fix inconsistencies in progress reporting (#2129 ) Signal and setup events correctly Eliminate possible races Use a single event Mark volatiles and reduce scope of waithandles Common handler 100ms -> 50ms	2021-03-22 19:40:07 +01:00
mageven	ca5d8e58dd	Add progress reporting to PTC and Shader Cache (#2057 ) * UI changes * Add progress reporting to PTC & ShaderCache * Account for null events and expand docs Co-authored-by: Joshi234 <46032261+Joshi234@users.noreply.github.com>	2021-03-03 01:39:36 +01:00
LDj3SNuD	bcbf240d2e	PPTC: Fix unwanted propagation of a relocatable constant in a specific case. (#1990 ) * Fix unwanted propagation of a relocatable constant in a specific case. * Ptc.InternalVersion = 1990 * Nit to retrigger the Checks.	2021-02-23 13:15:45 +01:00
mageven	9bda7b4699	Implement VCNT instruction (#1963 ) * Implement VCNT based on AArch64 CNT Add tests * Update PTC version * Address LDj's comments * Explicit size in encoding * Tighter tests * Replace SoftFallback with IR helper Co-authored-by: LDj3SNuD <35856442+LDj3SNuD@users.noreply.github.com> * Reduce one BitwiseAnd from IR fallback Based on popcount64b from https://en.wikipedia.org/wiki/Hamming_weight#Efficient_implementation * Rename parameter and add assert Co-authored-by: LDj3SNuD <35856442+LDj3SNuD@users.noreply.github.com> Co-authored-by: LDj3SNuD <35856442+LDj3SNuD@users.noreply.github.com>	2021-02-22 16:26:13 +01:00
LDj3SNuD	dc0adb533d	PPTC & Pool Enhancements. (#1968 ) * PPTC & Pool Enhancements. * Avoid buffer allocations in CodeGenContext.GetCode(). Avoid stream allocations in PTC.PtcInfo. Refactoring/nits. * Use XXHash128, for Ptc.Load & Ptc.Save, x10 faster than Md5. * Why not a nice Span. * Added a simple PtcFormatter library for deserialization/serialization, which does not require reflection, in use at PtcJumpTable and PtcProfiler; improves maintainability and simplicity/readability of affected code. * Nits. * Revert #1987. * Revert "Revert #1987." This reverts commit `998be765cf`.	2021-02-22 03:23:48 +01:00
FICTURE7	1586880114	Turn Copy into Fill in HybridAllocator (#2010 ) * Turn Copy into Fill in HybridAllocator * Set PTC internal verison	2021-02-21 18:33:59 +01:00
gdkchan	9d82d27df2	Fix memory tracking performance regression (#2026 ) * Fix memory tracking performance regression * Set PTC version	2021-02-17 09:16:20 +11:00
gdkchan	715b605e95	Validate CPU virtual addresses on access (#1987 ) * Enable PTE null checks again * Do address validation on EmitPtPointerLoad, and make it branchless * PTC version increment * Mask of pointer tag for exclusive access * Move mask to the correct place Co-authored-by: LDj3SNuD <35856442+LDj3SNuD@users.noreply.github.com>	2021-02-16 19:04:19 +01:00
sharmander	40797a1283	Optimization \| Modify Add (Integer) Instruction to use LEA instead. (#1971 ) * Optimization \| Modify Add Instruction to use LEA instead. Currently, the add instruction requires 4 registers to take place. By using LEA, we can effectively perform the same working using 3 registers, reducing memory spills and improving translation efficiency. * Fix IsSameOperandDestSrc1 Check for Add * Use LEA if Dest != SRC1 * Update IsSameOperandDestSrc1 to account for Cases where Dest and Src1 can be same for add * Fix error in logic * Typo * Add paranthesis for clarity * Compare registers as requested. * Cleanup if statement, use same comparison method as generateCopy * Make change as recommended by gdk * Perform check only when Add calls are made * use ensure sametype for lea, fix else * Update comment * Update version #	2021-02-08 10:49:46 +11:00
gdkchan	ee28ccebf4	Disable partial JIT invalidation on unmap (#1991 )	2021-02-08 10:25:14 +11:00
gdkchan	dcce407071	Lower precision of estimate instruction results to match Arm behavior (#1943 ) * Lower precision of estimate instruction results to match Arm behavior * PTC version update * Nits	2021-01-28 10:23:00 +11:00
mageven	c19cfca183	Implement PRFM (register variant) as NOP (#1956 ) * Implement PRFM (register variant) as NOP Fix typo pfrm -> prfm Add comments to distinguish variants * Increment PTC version	2021-01-26 16:09:27 +11:00
FICTURE7	ddf1105bcb	Add VCLZ.* fast path (#1917 ) * Add VCLZ fast path * Add VCLZ.8B/16B SSSE3 fast path * Add VCLZ.4H/8H SSSE3 fast path * Add VCLZ.2S/4S SSE2 fast path * Improve CLZ.4H/8H fast path * Improve CLZ.2S/4S fast path * Set PPTC version	2021-01-25 10:01:25 +11:00
LDj3SNuD	c3e0c41da3	CPU (A64): Add Fmaxnmp & Fminnmp Scalar Inst.s, Fast & Slow Paths; with Tests. (#1894 )	2021-01-20 09:12:33 +11:00
LDj3SNuD	68f6b79fd3	Add a simple Pools Limiter. (#1830 ) * Added support for offline invalidation, via PPTC, of low cq translations replaced by high cq translations; both on a single run and between runs. Added invalidation of .cache files in the event of reuse on a different user operating system. Added .info and .cache files invalidation in case of a failed stream decompression. Nits. * InternalVersion = 1712; * Nits. * Address comment. * Get rid of BinaryFormatter. Nits. * Move Ptc.LoadTranslations(). Nits. * Nits. * Fixed corner cases (in case backup copies have to be used). Added save logs. * Not core fixes. * Complement to the previous commit. Added load logs. Removed BinaryFormatter leftovers. * Add LoadTranslations log. * Nits. * Removed the search and management of LowCq overlapping functions. * Final increment of .info and .cache flags. * Nit. * Free up memory allocated by Pools during any PPTC translations at boot time. * Nit due to rebase. * Add a simple Pools Limiter. * Nits. * Fix missing JumpTable.RegisterFunction() due to rebase. Clear MemoryStreams as soon as possible, when they are no longer needed. * Code cleaning. * Nit for retrigger Checks. * Update Ptc.cs * Contextual refactoring of Translator. Ignore resetting of pools for DirectCallStubs. * Nit for retrigger Checks.	2021-01-12 19:04:02 +01:00
LDj3SNuD	430ba6da65	CPU (A64): Add Pmull_V Inst. with Clmul fast path for the "1/2D -> 1Q" variant & Sse fast path and slow path for both the "8/16B -> 8H" and "1/2D -> 1Q" variants; with Test. (#1817 ) * Add Pmull_V Sse fast path only, both "8/16B -> 8H" and "1/2D -> 1Q" variants; with Test. * Add Clmul fast path for the 128 bits variant. * Small optimisation (save 60 instructions) for the Sse fast path about the 128 bits variant. * Add slow path, both variants. Fix V128 Shl/Shr when shift = 0. * A32: Add Vmull_I P64 variant (slow path); not tested. * A32: Add Vmull_I_P8_P64 Test and fix P64 variant.	2021-01-04 23:45:54 +01:00
Ac_K	5b9c876155	Hotfix for #1814	2020-12-24 04:44:39 +01:00
LDj3SNuD	2502f1f07f	Free up memory allocated by Pools during any PPTC translations at boot time. (#1814 ) * Added support for offline invalidation, via PPTC, of low cq translations replaced by high cq translations; both on a single run and between runs. Added invalidation of .cache files in the event of reuse on a different user operating system. Added .info and .cache files invalidation in case of a failed stream decompression. Nits. * InternalVersion = 1712; * Nits. * Address comment. * Get rid of BinaryFormatter. Nits. * Move Ptc.LoadTranslations(). Nits. * Nits. * Fixed corner cases (in case backup copies have to be used). Added save logs. * Not core fixes. * Complement to the previous commit. Added load logs. Removed BinaryFormatter leftovers. * Add LoadTranslations log. * Nits. * Removed the search and management of LowCq overlapping functions. * Final increment of .info and .cache flags. * Nit. * Free up memory allocated by Pools during any PPTC translations at boot time. * Nit due to rebase.	2020-12-24 03:58:36 +01:00
LDj3SNuD	8a33e884f8	Fix Vnmls_S fast path (F64: losing input d value). Fix Vnmla_S & Vnmls_S slow paths (using fused inst.s). Fix Vfma_V slow path not using StandardFPSCRValue(). (#1775 ) * Fix Vnmls_S fast path (F64: losing input d value). Fix Vnmla_S & Vnmls_S slow paths (using fused inst.s). Add Vfma_S & Vfms_S Fma fast paths. Add Vfnma_S inst. with Fma/Sse fast paths and slow path. Add Vfnms_S Sse fast path. Add Tests for affected inst.s. Nits. * InternalVersion = 1775 * Nits. * Fix Vfma_V slow path not using StandardFPSCRValue(). * Nit: Fix Vfma_V order. * Add Vfms_V Sse fast path and slow path. * Add Vfma_V and Vfms_V Test.	2020-12-17 20:43:41 +01:00
LDj3SNuD	b5c215111d	PPTC Follow-up. (#1712 ) * Added support for offline invalidation, via PPTC, of low cq translations replaced by high cq translations; both on a single run and between runs. Added invalidation of .cache files in the event of reuse on a different user operating system. Added .info and .cache files invalidation in case of a failed stream decompression. Nits. * InternalVersion = 1712; * Nits. * Address comment. * Get rid of BinaryFormatter. Nits. * Move Ptc.LoadTranslations(). Nits. * Nits. * Fixed corner cases (in case backup copies have to be used). Added save logs. * Not core fixes. * Complement to the previous commit. Added load logs. Removed BinaryFormatter leftovers. * Add LoadTranslations log. * Nits. * Removed the search and management of LowCq overlapping functions. * Final increment of .info and .cache flags. * Nit. * GetIndirectFunctionAddress(): Validate that writing actually takes place in dynamic table memory range (and not elsewhere). * Fix Ptc.UpdateInfo() due to rebase. * Nit for retrigger Checks. * Nit for retrigger Checks.	2020-12-17 20:32:09 +01:00
sharmander	e901b7850c	CPU: Implement VRINTX.F32 \| VRINTX.F64 (#1776 ) * Start implementation * Draft * Updated opcode. Needs verification. * Clean up code. * Update implementation and tests. * Update implemenation + tests * Get RM from FPSCR + Do not use emit/addintrinsic * Remove "fast" path, as recommended by gdk. * Variable DELETED. * Update ARMeilleure/Decoders/OpCodeTable.cs Co-authored-by: LDj3SNuD <35856442+LDj3SNuD@users.noreply.github.com> * Update ARMeilleure/Instructions/InstEmitSimdCvt32.cs Co-authored-by: LDj3SNuD <35856442+LDj3SNuD@users.noreply.github.com> * Update ARMeilleure/Instructions/InstEmitSimdCvt32.cs Co-authored-by: LDj3SNuD <35856442+LDj3SNuD@users.noreply.github.com> * Update ARMeilleure/Instructions/InstEmitSimdCvt32.cs Co-authored-by: LDj3SNuD <35856442+LDj3SNuD@users.noreply.github.com> * Move method * stringing things together :) Co-authored-by: LDj3SNuD <35856442+LDj3SNuD@users.noreply.github.com>	2020-12-16 20:27:15 -03:00
gdkchan	61634dd415	Clear JIT cache on exit (#1518 ) * Initial cache memory allocator implementation * Get rid of CallFlag * Perform cache cleanup on exit * Basic cache invalidation * Thats not how conditionals works in C# it seems * Set PTC version to PR number * Address PR feedback * Update InstEmitFlowHelper.cs * Flag clear on address is no longer needed * Do not include exit block in function size calculation * Dispose jump table * For future use * InternalVersion = 1519 (force retest). Co-authored-by: LDj3SNuD <35856442+LDj3SNuD@users.noreply.github.com>	2020-12-16 17:07:42 -03:00
sharmander	3332b29f01	CPU: Implement VFMA (Vector) (#1762 ) * Implement VFMA.F64 * Simplify switch * Simplify FMA Instructions into their own IntrinsicType. * Remove whitespace * Fix indentation * Change tests for Vfnms -- disable inf / nan * Move args up, not description ;) * Implementation Complete. All Tests Pass (Slow / Fast Path) * Move location of function in assembler + test updates. * Shift params upwards * Remove unused function * Update PTC version. * Add comments / re-oreder opcode table. * Remove whitespace * Fix nit * Fix nit. * Fix whitespace * Wrong opcode was used by a bad merge. * Addressed rip's comments.	2020-12-15 00:01:52 -03:00
gdkchan	47ba81c661	Fix pre-allocator shift instruction copy on a specific case (#1752 ) * Fix pre-allocator shift instruction copy on a specific case * Fix to make shift use int rather than long for constants	2020-12-14 17:56:07 -03:00
gdkchan	c8bb3cc50e	Fix register read after write on STREX implementation (#1801 ) * Fix register read after write on STREX implementation * PTC version update	2020-12-13 12:19:38 -03:00
sharmander	36f6bbf5b9	CPU: Implement VFNMA.F32 \| F.64 (#1783 ) * Implement VFNMA.F<32/64> * Update PTC Version * Update Implementation & Renames & Correct Order * Fix alignment * Update implementation to not trigger assert * Actually use the intrinsic that makes sense :)	2020-12-07 21:04:01 -03:00
LDj3SNuD	567ea726e1	Add support for guest Fz (Fpcr) mode through host Ftz and Daz (Mxcsr) modes (fast paths). (#1630 ) * Add support for guest Fz (Fpcr) mode through host Ftz and Daz (Mxcsr) modes (fast paths). * Ptc.InternalVersion = 1630 * Nits. * Address comments. * Update Ptc.cs * Address comment.	2020-12-07 10:37:07 +01:00
sharmander	b479a43939	CPU: Implement VFNMS.F32/64 (#1758 ) * Add necessary methods / op-code * Enable Support for FMA Instruction Set * Add Intrinsics / Assembly Opcodes for VFMSUB231XX. * Add X86 Instructions for VFMSUB231XX * Implement VFNMS * Implement VFNMS Tests * Add special cases for FMA instructions. * Update PPTC Version * Remove unused Op * Move Check into Assert / Cleanup * Rename and cleanup * Whitespace * Whitespace / Rename * Re-sort * Address final requests * Implement VFMA.F64 * Simplify switch * Simplify FMA Instructions into their own IntrinsicType. * Remove whitespace * Fix indentation * Change tests for Vfnms -- disable inf / nan * Move args up, not description ;) * Undo vfma * Completely remove vfms code., * Fix order of instruction in assembler	2020-12-03 20:20:02 +01:00
gdkchan	cf6cd71488	IPC refactor part 2: Use ReplyAndReceive on HLE services and remove special handling from kernel (#1458 ) * IPC refactor part 2: Use ReplyAndReceive on HLE services and remove special handling from kernel * Fix for applet transfer memory + some nits * Keep handles if possible to avoid server handle table exhaustion * Fix IPC ZeroFill bug * am: Correctly implement CreateManagedDisplayLayer and implement CreateManagedDisplaySeparableLayer CreateManagedDisplaySeparableLayer is requires since 10.x+ when appletResourceUserId != 0 * Make it exit properly * Make ServiceNotImplementedException show the full message again * Allow yielding execution to avoid starving other threads * Only wait if active * Merge IVirtualMemoryManager and IAddressSpaceManager * Fix Ro loading data from the wrong process Co-authored-by: Thog <me@thog.eu>	2020-12-02 00:23:43 +01:00
riperiperi	9852cb9c9e	Use backup when PTC compression is corrupt (#1704 ) * Use backup when PTC compression is corrupt * Apply suggestions from code review Co-authored-by: LDj3SNuD <35856442+LDj3SNuD@users.noreply.github.com> Co-authored-by: LDj3SNuD <35856442+LDj3SNuD@users.noreply.github.com>	2020-11-20 02:51:59 +01:00
LDj3SNuD	0679084f11	CPU (A64): Add FP16/FP32 fast paths (F16C Intrinsics) for Fcvt_S, Fcvtl_V & Fcvtn_V Instructions. Now HardwareCapabilities uses CpuId. (#1650 ) * net5.0 * CPU (A64): Add FP16/FP32 fast paths (F16C Intrinsics) for Fcvt_S, Fcvtl_V & Fcvtn_V Instructions. Switch to .NET 5.0. Nits. Tests performed successfully in both debug and release mode (for all instructions involved). * Address comment. * Update appveyor.yml * Revert "Update appveyor.yml" This reverts commit `27cdd59e8b`. * Remove Assembler CpuId. * Update appveyor.yml * Address comment.	2020-11-18 19:35:54 +01:00
Mary	863edae328	shader cache: Fix Linux boot issues (#1709 ) * shader cache: Fix Linux boot issues This rollback the init logic back to previous state, and replicate the way PTC handle initialization. * shader cache: set default state of ready for translation event to false * Fix cpu unit tests	2020-11-17 22:40:19 +01:00
Mary	aa129fdbdf	infra: Migrate to .NET 5 (#1694 ) * infra: Migrate to .NET 5 This migrate projects and CI to .NET 5 * Remove language version restrictions (now on 9.0 by default) * infra: pin .NET 5 to avoid later issues * infra: Cleanup csproj files * infra: update dependencies * infra: Add temporary workaround for a bug in Vector128.Create see https://github.com/dotnet/runtime/issues/44704 for more informations	2020-11-15 19:27:15 +01:00
FICTURE7	64088f04e3	Fix LiveInterval.Split (#1660 ) Before when splitting intervals, the end of the range would be included in the split check, this can produce empty ranges in the child split. This in turn can affect spilling decisions since the child split will have a different start position and this empty range will get a register and move to the active set for a brief moment. For example: A = [153, 172[; [1899, 1916[; [1991, 2010[; [2397, 2414[; ... Split(A, 1916) A0 = [153, 172[; [1899, 1916[ A1 = [1916, 1916[; [1991, 2010[; [2397, 2414[; ...	2020-11-04 23:09:45 -03:00
gdkchan	2f16491712	Get rid of Reflection.Emit dependency on CPU and Shader projects (#1626 ) * Get rid of Reflection.Emit dependency on CPU and Shader projects * Remove useless private sets * Missed those due to the alignment	2020-10-21 09:13:44 -03:00
riperiperi	b4d8d893a4	Memory Read/Write Tracking using Region Handles (#1272 ) * WIP Range Tracking - Texture invalidation seems to have large problems - Buffer/Pool invalidation may have problems - Mirror memory tracking puts an additional `add` in compiled code, we likely just want to make HLE access slower if this is the final solution. - Native project is in the messiest possible location. - [HACK] JIT memory access always uses native "fast" path - [HACK] Trying some things with texture invalidation and views. It works :) Still a few hacks, messy things, slow things More work in progress stuff (also move to memory project) Quite a bit faster now. - Unmapping GPU VA and CPU VA will now correctly update write tracking regions, and invalidate textures for the former. - The Virtual range list is now non-overlapping like the physical one. - Fixed some bugs where regions could leak. - Introduced a weird bug that I still need to track down (consistent invalid buffer in MK8 ribbon road) Move some stuff. I think we'll eventually just put the dll and so for this in a nuget package. Fix rebase. [WIP] MultiRegionHandle variable size ranges - Avoid reprotecting regions that change often (needs some tweaking) - There's still a bug in buffers, somehow. - Might want different api for minimum granularity Fix rebase issue Commit everything needed for software only tracking. Remove native components. Remove more native stuff. Cleanup Use a separate window for the background context, update opentk. (fixes linux) Some experimental changes Should get things working up to scratch - still need to try some things with flush/modification and res scale. Include address with the region action. Initial work to make range tracking work Still a ton of bugs Fix some issues with the new stuff. * Fix texture flush instability There's still some weird behaviour, but it's much improved without this. (textures with cpu modified data were flushing over it) * Find the destination texture for Buffer->Texture full copy Greatly improves performance for nvdec videos (with range tracking) * Further improve texture tracking * Disable Memory Tracking for view parents This is a temporary approach to better match behaviour on master (where invalidations would be soaked up by views, rather than trigger twice) The assumption is that when views are created to a texture, they will cover all of its data anyways. Of course, this can easily be improved in future. * Introduce some tracking tests. WIP * Complete base tests. * Add more tests for multiregion, fix existing test. * Cleanup Part 1 * Remove unnecessary code from memory tracking * Fix some inconsistencies with 3D texture rule. * Add dispose tests. * Use a background thread for the background context. Rather than setting and unsetting a context as current, doing the work on a dedicated thread with signals seems to be a bit faster. Also nerf the multithreading test a bit. * Copy to texture with matching alignment This extends the copy to work for some videos with unusual size, such as tutorial videos in SMO. It will only occur if the destination texture already exists at XCount size. * Track reads for buffer copies. Synchronize new buffers before copying overlaps. * Remove old texture flushing mechanisms. Range tracking all the way, baby. * Wake the background thread when disposing. Avoids a deadlock when games are closed. * Address Feedback 1 * Separate TextureCopy instance for background thread Also `BackgroundContextWorker.InBackground` for a more sensible idenfifier for if we're in a background thread. * Add missing XML docs. * Address Feedback * Maybe I should start drinking coffee. * Some more feedback. * Remove flush warning, Refocus window after making background context	2020-10-16 17:18:35 -03:00
LDj3SNuD	04e330cc00	Add Umaal & Vabd_I, Vabdl_I, Vaddl_I, Vhadd, Vqshrn, Vshll inst.s (slow paths). (#1577 ) * Add Umaal & Vabd_I, Vabdl_I, Vaddl_I, Vhadd, Vqshrn, Vshll inst.s (slow paths). No test provided (i.e. draft). * Ptc InternalVersion = 1577	2020-10-13 22:41:33 +02:00
gdkchan	f2b12c9749	Remove old, unused CPU optimization (#1586 )	2020-09-30 16:16:34 -03:00

1 2 3 4 5

242 commits